linux-kernel.vger.kernel.org archive mirror
* Catching NForce2 lockup with NMI watchdog
@ 2003-12-05  4:54 Jesse Allen
  2003-12-05  7:40 ` Mikael Pettersson
  0 siblings, 1 reply; 62+ messages in thread
From: Jesse Allen @ 2003-12-05  4:54 UTC (permalink / raw)
  To: linux-kernel

Hi,

I have a NForce2 board and can easily reproduce a lockup with grep on an IDE 
hard disk at UDMA 100.  The lockup occurs when both Local APIC + IO-APIC are 
enabled.  It was suggested to me to use NMI watchdog to catch it.  However, the 
NMI watchdog doesn't seem to work.

When I set the kernel parameter "nmi_watchdog=1" I get this message in 
/var/log/syslog:
Dec  4 20:10:30 tesore kernel: ..MP-BIOS bug: 8254 timer not connected to 
IO-APIC
Dec  4 20:10:30 tesore kernel: timer doesn't work through the IO-APIC - 
disabling NMI Watchdog!

"nmi_watchdog=2" seems to work at first.  In /var/log/messages:
Dec  4 20:13:11 tesore kernel: testing NMI watchdog ... OK.
but it still locks up.

I have the complete logs when running with nmi_watchdog, kernel config, and more
here:
http://www.chez.com/alors/nforce-lockup-logs.tar.gz

If you have any ideas please give them =)

Jesse

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-05  4:54 Catching NForce2 lockup with NMI watchdog Jesse Allen
@ 2003-12-05  7:40 ` Mikael Pettersson
  2003-12-05  8:33   ` Josh McKinney
  2003-12-05  8:58   ` Mike Fedyk
  0 siblings, 2 replies; 62+ messages in thread
From: Mikael Pettersson @ 2003-12-05  7:40 UTC (permalink / raw)
  To: Jesse Allen; +Cc: linux-kernel

Jesse Allen writes:
 > Hi,
 > 
 > I have a NForce2 board and can easily reproduce a lockup with grep on an IDE 
 > hard disk at UDMA 100.  The lockup occurs when both Local APIC + IO-APIC are 
 > enabled.  It was suggested to me to use NMI watchdog to catch it.  However, the 
 > NMI watchdog doesn't seem to work.
 > 
 > When I set the kernel parameter "nmi_watchdog=1" I get this message in 
 > /var/log/syslog:
 > Dec  4 20:10:30 tesore kernel: ..MP-BIOS bug: 8254 timer not connected to 
 > IO-APIC
 > Dec  4 20:10:30 tesore kernel: timer doesn't work through the IO-APIC - 
 > disabling NMI Watchdog!
 > 
 > "nmi_watchdog=2" seems to work at first, In /var/log/messages:
 > Dec  4 20:13:11 tesore kernel: testing NMI watchdog ... OK.
 > but it still locks up.

The NMI watchdog can only handle software lockups, since it relies on
the CPU, and for nmi_watchdog=1 the I/O-APIC + bus, still running.
Hardware lockups result in, well, hardware lockups :-(
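
To illustrate the distinction, here is a toy model of a timer-based lockup
detector.  It is a sketch and not kernel code: the class name, the threshold
of 5 stale checks, and the idea of passing the timer count in as an argument
are all assumptions made for the example.  The point it shows is that the
check itself runs on the CPU, so it can only fire while the CPU still
executes NMI handlers.

```python
# Toy model of a timer-based lockup detector; NOT kernel code.
# Assumption: a lockup is declared after `threshold` consecutive NMIs
# during which the timer interrupt count did not advance.

class ToyNmiWatchdog:
    def __init__(self, threshold=5):
        self.threshold = threshold    # stale checks tolerated before alarm
        self.last_timer_count = 0     # timer count seen at the previous NMI
        self.stale_checks = 0         # consecutive NMIs with no timer progress

    def on_nmi(self, timer_count):
        """Called on each periodic NMI; returns True if a lockup is declared."""
        if timer_count == self.last_timer_count:
            # No timer interrupt was serviced since the last NMI: the CPU is
            # probably spinning with interrupts disabled (a software lockup).
            self.stale_checks += 1
        else:
            self.last_timer_count = timer_count
            self.stale_checks = 0
        return self.stale_checks >= self.threshold


def simulate(timer_counts):
    """Feed one timer count per NMI; return the index of the first alarm, or None."""
    watchdog = ToyNmiWatchdog()
    for index, count in enumerate(timer_counts):
        if watchdog.on_nmi(count):
            return index
    return None
```

With a steadily advancing count, simulate() never raises an alarm; once the
count stops advancing, the alarm fires after five further NMIs.  If the
hardware itself hangs, on_nmi() is simply never called, which is the case
described above.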


* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-05  7:40 ` Mikael Pettersson
@ 2003-12-05  8:33   ` Josh McKinney
  2003-12-05 12:14     ` Mikael Pettersson
  2003-12-05  8:58   ` Mike Fedyk
  1 sibling, 1 reply; 62+ messages in thread
From: Josh McKinney @ 2003-12-05  8:33 UTC (permalink / raw)
  To: linux-kernel

On approximately Fri, Dec 05, 2003 at 08:40:58AM +0100, Mikael Pettersson wrote:
> Jesse Allen writes:
>  > Hi,
>  > 
>  > I have a NForce2 board and can easily reproduce a lockup with grep on an IDE 
>  > hard disk at UDMA 100.  The lockup occurs when both Local APIC + IO-APIC are 
>  > enabled.  It was suggested to me to use NMI watchdog to catch it.  However, the 
>  > NMI watchdog doesn't seem to work.
>  > 
>  > When I set the kernel parameter "nmi_watchdog=1" I get this message in 
>  > /var/log/syslog:
>  > Dec  4 20:10:30 tesore kernel: ..MP-BIOS bug: 8254 timer not connected to 
>  > IO-APIC
>  > Dec  4 20:10:30 tesore kernel: timer doesn't work through the IO-APIC - 
>  > disabling NMI Watchdog!
>  > 
>  > "nmi_watchdog=2" seems to work at first, In /var/log/messages:
>  > Dec  4 20:13:11 tesore kernel: testing NMI watchdog ... OK.
>  > but it still locks up.
> 
> The NMI watchdog can only handle software lockups, since it relies on
> the CPU, and for nmi_watchdog=1 the I/O-APIC + bus, still running.
> Hardware lockups result in, well, hardware lockups :-(

So does this confirm that the lockups with nforce2 chipsets and apic
are actually a hardware problem after all? 
  
-- 
Josh McKinney		     |	Webmaster: http://joshandangie.org
--------------------------------------------------------------------------
                             | They that can give up essential liberty
Linux, the choice       -o)  | to obtain a little temporary safety deserve 
of the GNU generation    /\  | neither liberty or safety. 
                        _\_v |                          -Benjamin Franklin


* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-05  7:40 ` Mikael Pettersson
  2003-12-05  8:33   ` Josh McKinney
@ 2003-12-05  8:58   ` Mike Fedyk
  2003-12-05 12:06     ` Mikael Pettersson
  2003-12-08  2:20     ` Bob
  1 sibling, 2 replies; 62+ messages in thread
From: Mike Fedyk @ 2003-12-05  8:58 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: Jesse Allen, linux-kernel

On Fri, Dec 05, 2003 at 08:40:58AM +0100, Mikael Pettersson wrote:
> Jesse Allen writes:
>  > Hi,
>  > 
>  > I have a NForce2 board and can easily reproduce a lockup with grep on an IDE 
>  > hard disk at UDMA 100.  The lockup occurs when both Local APIC + IO-APIC are 
>  > enabled.  It was suggested to me to use NMI watchdog to catch it.  However, the 
>  > NMI watchdog doesn't seem to work.
>  > 
>  > When I set the kernel parameter "nmi_watchdog=1" I get this message in 
>  > /var/log/syslog:
>  > Dec  4 20:10:30 tesore kernel: ..MP-BIOS bug: 8254 timer not connected to 
>  > IO-APIC
>  > Dec  4 20:10:30 tesore kernel: timer doesn't work through the IO-APIC - 
>  > disabling NMI Watchdog!
>  > 
>  > "nmi_watchdog=2" seems to work at first, In /var/log/messages:
>  > Dec  4 20:13:11 tesore kernel: testing NMI watchdog ... OK.
>  > but it still locks up.
> 
> The NMI watchdog can only handle software lockups, since it relies on
> the CPU, and for nmi_watchdog=1 the I/O-APIC + bus, still running.
> Hardware lockups result in, well, hardware lockups :-(

But nmi_watchdog=1 is supposed to work with APIC, or IO-APIC, and it isn't
for his motherboard.  It doesn't increment NMI in /proc/interrupts.  And it
gives the above error message.  Isn't that a bug?


* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-05  8:58   ` Mike Fedyk
@ 2003-12-05 12:06     ` Mikael Pettersson
  2003-12-08  2:20     ` Bob
  1 sibling, 0 replies; 62+ messages in thread
From: Mikael Pettersson @ 2003-12-05 12:06 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: Jesse Allen, linux-kernel

Mike Fedyk writes:
 > On Fri, Dec 05, 2003 at 08:40:58AM +0100, Mikael Pettersson wrote:
 > > Jesse Allen writes:
 > >  > Hi,
 > >  > 
 > >  > I have a NForce2 board and can easily reproduce a lockup with grep on an IDE 
 > >  > hard disk at UDMA 100.  The lockup occurs when both Local APIC + IO-APIC are 
 > >  > enabled.  It was suggested to me to use NMI watchdog to catch it.  However, the 
 > >  > NMI watchdog doesn't seem to work.
 > >  > 
 > >  > When I set the kernel parameter "nmi_watchdog=1" I get this message in 
 > >  > /var/log/syslog:
 > >  > Dec  4 20:10:30 tesore kernel: ..MP-BIOS bug: 8254 timer not connected to 
 > >  > IO-APIC
 > >  > Dec  4 20:10:30 tesore kernel: timer doesn't work through the IO-APIC - 
 > >  > disabling NMI Watchdog!
 > >  > 
 > >  > "nmi_watchdog=2" seems to work at first, In /var/log/messages:
 > >  > Dec  4 20:13:11 tesore kernel: testing NMI watchdog ... OK.
 > >  > but it still locks up.
 > > 
 > > The NMI watchdog can only handle software lockups, since it relies on
 > > the CPU, and for nmi_watchdog=1 the I/O-APIC + bus, still running.
 > > Hardware lockups result in, well, hardware lockups :-(
 > 
 > But nmi_watchdog=1 is supposed to work with APIC, or IO-APIC, and it isn't
 > for his motherboard.  It doesn't increment NMI in /proc/interrupts.  And it
 > gives the above error message.  Isn't that a bug?

nmi_watchdog=1 only falls back to nmi_watchdog=2 if no SMP is detected.
If the I/O-APIC is detected but doesn't work, then the fallback
does not happen, and you need to set nmi_watchdog=2 explicitly.
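
Put as a decision table, the behaviour described above looks roughly like
this.  It is a simplified reading of this message, not the kernel's actual
code; the function and parameter names are made up for the example, and
mode 0 is used here to mean "watchdog disabled".

```python
# Simplified reading of the fallback rule described above; NOT kernel code.
# Modes: 0 = watchdog off, 1 = I/O-APIC watchdog, 2 = local APIC watchdog.

def effective_watchdog(requested, smp_config_detected, ioapic_timer_works):
    """Return the watchdog mode that ends up in effect (hypothetical helper)."""
    if requested != 1:
        # Only nmi_watchdog=1 has a fallback path in this sketch.
        return requested
    if not smp_config_detected:
        # No SMP configuration detected: fall back to mode 2.
        return 2
    if not ioapic_timer_works:
        # I/O-APIC detected but the timer does not work through it:
        # the watchdog is disabled and no fallback happens.
        return 0
    return 1
```

The middle case is the one reported at the start of the thread: the I/O-APIC
is found, the timer does not work through it, and the watchdog ends up
disabled until nmi_watchdog=2 is requested explicitly.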


* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-05  8:33   ` Josh McKinney
@ 2003-12-05 12:14     ` Mikael Pettersson
  2003-12-05 14:19       ` Craig Bradney
  0 siblings, 1 reply; 62+ messages in thread
From: Mikael Pettersson @ 2003-12-05 12:14 UTC (permalink / raw)
  To: Josh McKinney; +Cc: linux-kernel

Josh McKinney writes:
 > On approximately Fri, Dec 05, 2003 at 08:40:58AM +0100, Mikael Pettersson wrote:
 > > Jesse Allen writes:
 > >  > Hi,
 > >  > 
 > >  > I have a NForce2 board and can easily reproduce a lockup with grep on an IDE 
 > >  > hard disk at UDMA 100.  The lockup occurs when both Local APIC + IO-APIC are 
 > >  > enabled.  It was suggested to me to use NMI watchdog to catch it.  However, the 
 > >  > NMI watchdog doesn't seem to work.
 > >  > 
 > >  > When I set the kernel parameter "nmi_watchdog=1" I get this message in 
 > >  > /var/log/syslog:
 > >  > Dec  4 20:10:30 tesore kernel: ..MP-BIOS bug: 8254 timer not connected to 
 > >  > IO-APIC
 > >  > Dec  4 20:10:30 tesore kernel: timer doesn't work through the IO-APIC - 
 > >  > disabling NMI Watchdog!
 > >  > 
 > >  > "nmi_watchdog=2" seems to work at first, In /var/log/messages:
 > >  > Dec  4 20:13:11 tesore kernel: testing NMI watchdog ... OK.
 > >  > but it still locks up.
 > > 
 > > The NMI watchdog can only handle software lockups, since it relies on
 > > the CPU, and for nmi_watchdog=1 the I/O-APIC + bus, still running.
 > > Hardware lockups result in, well, hardware lockups :-(
 > 
 > So does this confirm that the lockups with nforce2 chipsets and apic
 > is actually a hardware problem after all? 

Confirm with very high probability. There may be quirks in nVidia's
chipset that we (unlike their Windoze drivers) don't know about.

Ask nVidia for detailed chipset documentation. Then maybe we can fix this.


* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-05 12:14     ` Mikael Pettersson
@ 2003-12-05 14:19       ` Craig Bradney
  2003-12-05 17:05         ` Craig Bradney
  2003-12-05 18:11         ` Josh McKinney
  0 siblings, 2 replies; 62+ messages in thread
From: Craig Bradney @ 2003-12-05 14:19 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: Josh McKinney, linux-kernel

I'm getting those in dmesg too...

..MP-BIOS bug: 8254 timer not connected to IO-APIC
...trying to set up timer (IRQ0) through the 8259A ...  failed.
...trying to set up timer as Virtual Wire IRQ... failed.
...trying to set up timer as ExtINT IRQ... works.


Do you really think this could be the problem?

If so, any ideas why I am relatively lucky to not have the crashes
people are having? 5.5 days, then 5 hours, and now I'm up to 17 hours...
with a decent amount of use combined with idle time.

Craig


On Fri, 2003-12-05 at 13:14, Mikael Pettersson wrote:
> Josh McKinney writes:
>  > On approximately Fri, Dec 05, 2003 at 08:40:58AM +0100, Mikael Pettersson wrote:
>  > > Jesse Allen writes:
>  > >  > Hi,
>  > >  > 
>  > >  > I have a NForce2 board and can easily reproduce a lockup with grep on an IDE 
>  > >  > hard disk at UDMA 100.  The lockup occurs when both Local APIC + IO-APIC are 
>  > >  > enabled.  It was suggested to me to use NMI watchdog to catch it.  However, the 
>  > >  > NMI watchdog doesn't seem to work.
>  > >  > 
>  > >  > When I set the kernel parameter "nmi_watchdog=1" I get this message in 
>  > >  > /var/log/syslog:
>  > >  > Dec  4 20:10:30 tesore kernel: ..MP-BIOS bug: 8254 timer not connected to 
>  > >  > IO-APIC
>  > >  > Dec  4 20:10:30 tesore kernel: timer doesn't work through the IO-APIC - 
>  > >  > disabling NMI Watchdog!
>  > >  > 
>  > >  > "nmi_watchdog=2" seems to work at first, In /var/log/messages:
>  > >  > Dec  4 20:13:11 tesore kernel: testing NMI watchdog ... OK.
>  > >  > but it still locks up.
>  > > 
>  > > The NMI watchdog can only handle software lockups, since it relies on
>  > > the CPU, and for nmi_watchdog=1 the I/O-APIC + bus, still running.
>  > > Hardware lockups result in, well, hardware lockups :-(
>  > 
>  > So does this confirm that the lockups with nforce2 chipsets and apic
>  > is actually a hardware problem after all? 
> 
> Confirm with very high probability. There may be quirks in nVidia's
> chipset that we (unlike their Windoze drivers) don't know about.
> 
> Ask nVidia for detailed chipset documentation. Then maybe we can fix this.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 



* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-05 14:19       ` Craig Bradney
@ 2003-12-05 17:05         ` Craig Bradney
  2003-12-05 18:11         ` Josh McKinney
  1 sibling, 0 replies; 62+ messages in thread
From: Craig Bradney @ 2003-12-05 17:05 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: Josh McKinney, linux-kernel

Having just had another hang, I tried booting with nmi_watchdog=1 and
then with 2.

I am currently running from the boot with 2 selected.

In my current dmesg I have these lines, which don't normally appear and
didn't appear in the boot with 1 set.

Any ideas?

hda: IRQ probe failed (0xfffffcfa)
hdb: IRQ probe failed (0xfffffcfa)
hdb: IRQ probe failed (0xfffffcfa)

Craig



On Fri, 2003-12-05 at 15:19, Craig Bradney wrote:
> I'm getting those in dmesg too...
> 
> ..MP-BIOS bug: 8254 timer not connected to IO-APIC
> ...trying to set up timer (IRQ0) through the 8259A ...  failed.
> ...trying to set up timer as Virtual Wire IRQ... failed.
> ...trying to set up timer as ExtINT IRQ... works.
> 
> 
> Do you really think this could be the problem?
> 
> If so, any ideas why I am relatively lucky to not have the crashes
> people are having? 5.5 days, then 5 hours, and now Im up to 17 hours...
> with a decent amount of use combined with idle time.
> 
> Craig
> 
> 
> On Fri, 2003-12-05 at 13:14, Mikael Pettersson wrote:
> > Josh McKinney writes:
> >  > On approximately Fri, Dec 05, 2003 at 08:40:58AM +0100, Mikael Pettersson wrote:
> >  > > Jesse Allen writes:
> >  > >  > Hi,
> >  > >  > 
> >  > >  > I have a NForce2 board and can easily reproduce a lockup with grep on an IDE 
> >  > >  > hard disk at UDMA 100.  The lockup occurs when both Local APIC + IO-APIC are 
> >  > >  > enabled.  It was suggested to me to use NMI watchdog to catch it.  However, the 
> >  > >  > NMI watchdog doesn't seem to work.
> >  > >  > 
> >  > >  > When I set the kernel parameter "nmi_watchdog=1" I get this message in 
> >  > >  > /var/log/syslog:
> >  > >  > Dec  4 20:10:30 tesore kernel: ..MP-BIOS bug: 8254 timer not connected to 
> >  > >  > IO-APIC
> >  > >  > Dec  4 20:10:30 tesore kernel: timer doesn't work through the IO-APIC - 
> >  > >  > disabling NMI Watchdog!
> >  > >  > 
> >  > >  > "nmi_watchdog=2" seems to work at first, In /var/log/messages:
> >  > >  > Dec  4 20:13:11 tesore kernel: testing NMI watchdog ... OK.
> >  > >  > but it still locks up.
> >  > > 
> >  > > The NMI watchdog can only handle software lockups, since it relies on
> >  > > the CPU, and for nmi_watchdog=1 the I/O-APIC + bus, still running.
> >  > > Hardware lockups result in, well, hardware lockups :-(
> >  > 
> >  > So does this confirm that the lockups with nforce2 chipsets and apic
> >  > is actually a hardware problem after all? 
> > 
> > Confirm with very high probability. There may be quirks in nVidia's
> > chipset that we (unlike their Windoze drivers) don't know about.
> > 
> > Ask nVidia for detailed chipset documentation. Then maybe we can fix this.



* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-05 14:19       ` Craig Bradney
  2003-12-05 17:05         ` Craig Bradney
@ 2003-12-05 18:11         ` Josh McKinney
  1 sibling, 0 replies; 62+ messages in thread
From: Josh McKinney @ 2003-12-05 18:11 UTC (permalink / raw)
  To: linux-kernel

Please don't CC me, I am on the list, thanks.

On approximately Fri, Dec 05, 2003 at 03:19:33PM +0100, Craig Bradney wrote:
> I'm getting those in dmesg too...
> 
> ..MP-BIOS bug: 8254 timer not connected to IO-APIC
> ...trying to set up timer (IRQ0) through the 8259A ...  failed.
> ...trying to set up timer as Virtual Wire IRQ... failed.
> ...trying to set up timer as ExtINT IRQ... works.
> 
> 
> Do you really think this could be the problem?
> 
> If so, any ideas why I am relatively lucky to not have the crashes
> people are having? 5.5 days, then 5 hours, and now Im up to 17 hours...
> with a decent amount of use combined with idle time.
> 
> Craig
> 

At least two of us are lucky.  I can't reproduce the crashes "anymore"
either.  I am up to 2 days now, was up to 3 or 4 before I booted
2.4.23 for a while to see if I could make that kernel crash, which I
couldn't.  I will see how long I can go, since 5 days or so seems to
be the top uptime. 

-- 
Josh McKinney		     |	Webmaster: http://joshandangie.org
--------------------------------------------------------------------------
                             | They that can give up essential liberty
Linux, the choice       -o)  | to obtain a little temporary safety deserve 
of the GNU generation    /\  | neither liberty or safety. 
                        _\_v |                          -Benjamin Franklin


* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-05  8:58   ` Mike Fedyk
  2003-12-05 12:06     ` Mikael Pettersson
@ 2003-12-08  2:20     ` Bob
  2003-12-09 14:21       ` Maciej W. Rozycki
  1 sibling, 1 reply; 62+ messages in thread
From: Bob @ 2003-12-08  2:20 UTC (permalink / raw)
  To: linux-kernel

Mike Fedyk wrote:

> But nmi_watchdog=1 is supposed to work with APIC, or IO-APIC, and it isn't
> for his motherboard.  It doesn't increment NMI in /proc/interrupts.  And it
> gives the above error message.  Isn't that a bug?

Do you mean like this?  I have an MSI K7N2 Delta MCP2-T mboard with nmi in
the kernel and append="nmi_watchdog=1" in /etc/lilo.conf.  Nothing "nmi" or
"NMI" is logged, and /proc/interrupts shows this:

 cat /proc/interrupts
           CPU0      
  0:  241105839          XT-PIC  timer
  1:      27337    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          1    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:     217952    IO-APIC-edge  i8042
 14:         22    IO-APIC-edge  ide0
 15:         24    IO-APIC-edge  ide1
 16:    4245875   IO-APIC-level  3ware Storage Controller, yenta, yenta
 17:    5428737   IO-APIC-level  eth0
 21:          0   IO-APIC-level  NVidia nForce2
NMI:          0
LOC:  241091187
ERR:          0
MIS:          6






* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-08  2:20     ` Bob
@ 2003-12-09 14:21       ` Maciej W. Rozycki
  2003-12-09 16:35         ` Bob
  0 siblings, 1 reply; 62+ messages in thread
From: Maciej W. Rozycki @ 2003-12-09 14:21 UTC (permalink / raw)
  To: Bob; +Cc: linux-kernel

On Sun, 7 Dec 2003, Bob wrote:

> I have append="nmi_watchdog=1" ? Nothing "nmi" or "NMI" is logged.
> 
>  cat /proc/interrupts
>            CPU0      
>   0:  241105839          XT-PIC  timer
>   1:      27337    IO-APIC-edge  i8042
>   2:          0          XT-PIC  cascade
>   8:          1    IO-APIC-edge  rtc
>   9:          0   IO-APIC-level  acpi
>  12:     217952    IO-APIC-edge  i8042
>  14:         22    IO-APIC-edge  ide0
>  15:         24    IO-APIC-edge  ide1
>  16:    4245875   IO-APIC-level  3ware Storage Controller, yenta, yenta
>  17:    5428737   IO-APIC-level  eth0
>  21:          0   IO-APIC-level  NVidia nForce2
> NMI:          0
> LOC:  241091187
> ERR:          0
> MIS:          6

 You don't have the NMI watchdog working, because the timer interrupt is
configured as an 8259A interrupt ("XT-PIC" for IRQ 0 in the output above).  
This usually means the wiring of a particular system doesn't provide any
other alternative or configuration data provided by the BIOS is broken.
The timer interrupt has to be configured as an I/O APIC interrupt for the
watchdog to work, or you can select "nmi_watchdog=2" for an alternative 
watchdog internal to processors if they support it.
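
 The same check can be scripted.  The sketch below reads text in the
/proc/interrupts layout shown above and reports which controller handles
IRQ 0 and what the NMI count is.  The function names are made up for the
example, and the parsing assumes a single-CPU layout like the one quoted
(on a multi-CPU system only the CPU0 column of the NMI line is read).

```python
# Sketch: inspect /proc/interrupts text for the two symptoms discussed above.
# Assumption: layout as in the quoted output, i.e.
#   "  0:  <count>  <controller>  timer"   and   "NMI:  <count>".

def diagnose_interrupts(text):
    """Return (timer_controller, nmi_count); either may be None if not found."""
    timer_controller = None
    nmi_count = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "0:" and fields[-1] == "timer":
            # The controller name is the field just before the device name.
            timer_controller = fields[-2]
        elif fields[0] == "NMI:" and len(fields) > 1:
            # Only the first (CPU0) column is read in this sketch.
            nmi_count = int(fields[1])
    return timer_controller, nmi_count


def ioapic_watchdog_plausible(timer_controller):
    """True if IRQ 0 is routed through the I/O APIC rather than the 8259A."""
    return timer_controller is not None and timer_controller.startswith("IO-APIC")
```

On a live system the text would come from open("/proc/interrupts").read().
For the output quoted above the result is ("XT-PIC", 0), so the I/O APIC
watchdog is not plausible on that setup.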

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +


* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-09 14:21       ` Maciej W. Rozycki
@ 2003-12-09 16:35         ` Bob
  2003-12-10 13:41           ` Maciej W. Rozycki
  0 siblings, 1 reply; 62+ messages in thread
From: Bob @ 2003-12-09 16:35 UTC (permalink / raw)
  To: linux-kernel

Maciej W. Rozycki wrote:

>On Sun, 7 Dec 2003, Bob wrote:
>
>  
>
>>I have append="nmi_watchdog=1" ? Nothing "nmi" or "NMI" is logged.
>>
>> cat /proc/interrupts
>>           CPU0      
>>  0:  241105839          XT-PIC  timer...................
>>NMI:          0...........
>>
> You don't have the NMI watchdog working, because the timer interrupt is
>configured as an 8259A interrupt ("XT-PIC" for IRQ 0 in the output above).  
>This usually means the wiring of a particular system doesn't provide any
>other alternative or configuration data provided by the BIOS is broken.
>The timer interrupt has to be configured as an I/O APIC interrupt for the
>watchdog to work, or you can select "nmi_watchdog=2" for an alternative 
>watchdog internal to processors if they support it.
>
>  
>
Using a patch that fixes a number of people's nforce2
lockups while enabling the io-apic edge timer, I can now
use nmi_watchdog=2 but not =1.

This patch turns on the io-apic edge timer:

http://www.kernel.org/pub/linux/kernel/people/bart/2.6.0-test11-bart1/broken-out/nforce2-apic.patch

We're all trying to get acpi, apic, lapic, io-apic working
when turned on in cmos/bios and kernel.

Three things have each, on their own, achieved stability
on somebody's system here:
1) a bios update
2) cpu disconnect turned off, either in cmos if available, or by
   athcool, or by a kernel patch that does the same
3) a timing delay patch

For CPU disconnect you still need athcool or this one
http://www.kernel.org/pub/linux/kernel/people/bart/2.6.0-test11-bart1/broken-out/nforce2-disconnect-quirk.patch 


Both patches are for 2.6.0-test11 kernel.



* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-09 16:35         ` Bob
@ 2003-12-10 13:41           ` Maciej W. Rozycki
  2003-12-12 16:01             ` bill davidsen
  0 siblings, 1 reply; 62+ messages in thread
From: Maciej W. Rozycki @ 2003-12-10 13:41 UTC (permalink / raw)
  To: Bob; +Cc: linux-kernel

On Tue, 9 Dec 2003, Bob wrote:

> > You don't have the NMI watchdog working, because the timer interrupt is
> >configured as an 8259A interrupt ("XT-PIC" for IRQ 0 in the output above).  
> >This usually means the wiring of a particular system doesn't provide any
> >other alternative or configuration data provided by the BIOS is broken.
> >The timer interrupt has to be configured as an I/O APIC interrupt for the
> >watchdog to work, or you can select "nmi_watchdog=2" for an alternative 
> >watchdog internal to processors if they support it.
> >
> Using a patch that fixes a number of people's nforce2
> lockups while enabling io-apic edge timer, I can now
> use nmi_watchdog=2 but not =1

 The I/O APIC NMI watchdog utilizes the property of being transparent to a
single IRQ source of a specially reconfigured 8259A PIC (the master one in
the IA32 PC architecture).  There are more prerequisites that have to be
met, and all of them indeed are for a 100% compatible PC as specified by
Intel's Multiprocessor Specification.

1. The INT output of the master 8259A PIC has to be connected to the LINT0
(or LINTIN0; the name varies by implementations) inputs of all local APICs
in the system.

2a. The OUT0 output of the 8254 PIT (IOW the timer source) has to be 
directly connected to the INTIN2 input of the first I/O APIC.

2b. Alternatively the INT output of the master 8259A PIC has to be
connected to the INTIN0 input of the first I/O APIC.

3. There must be no glue logic that would change logical properties of the
signal between the INT output of the master 8259A PIC and the respective
APIC interrupt inputs.

In practice, assuming the MP IRQ routing information provided by the BIOS has
been correct (which is not always the case), prerequisites #1 and #2 have
been met so far, but #3 has proved to be occasionally problematic.
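
The way the prerequisites combine can be written down as a small predicate:
#1 and #3 are both required, together with either #2a or #2b.  This is only
a restatement of the list above for illustration; the field names are
invented, and nothing here probes real hardware.

```python
# Restatement of the prerequisite list above as a predicate; illustrative only.
from dataclasses import dataclass


@dataclass
class Wiring:
    pic_int_to_all_lint0: bool        # 1.  8259A INT -> LINT0 of every local APIC
    pit_out0_to_ioapic_intin2: bool   # 2a. 8254 OUT0 -> INTIN2 of first I/O APIC
    pic_int_to_ioapic_intin0: bool    # 2b. 8259A INT -> INTIN0 of first I/O APIC
    no_glue_logic: bool               # 3.  signal reaches the APIC inputs unaltered


def ioapic_watchdog_prerequisites_met(wiring):
    """True if #1, #3 and at least one of #2a/#2b hold."""
    timer_path = wiring.pit_out0_to_ioapic_intin2 or wiring.pic_int_to_ioapic_intin0
    return wiring.pic_int_to_all_lint0 and timer_path and wiring.no_glue_logic
```

A board with correct routing but glue logic in the path, for example
Wiring(True, True, False, False), fails the check because of #3.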

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +


* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-10 13:41           ` Maciej W. Rozycki
@ 2003-12-12 16:01             ` bill davidsen
  2003-12-12 16:47               ` Maciej W. Rozycki
  2003-12-12 22:27               ` George Anzinger
  0 siblings, 2 replies; 62+ messages in thread
From: bill davidsen @ 2003-12-12 16:01 UTC (permalink / raw)
  To: linux-kernel

In article <Pine.LNX.4.55.0312101421540.31543@jurand.ds.pg.gda.pl>,
Maciej W. Rozycki <macro@ds2.pg.gda.pl> wrote:

|  The I/O APIC NMI watchdog utilizes the property of being transparent to a
| single IRQ source of a specially reconfigured 8259A PIC (the master one in
| the IA32 PC architecture).  There are more prerequisites that have to be
| met and all indeed are for a 100% compatible PC as specified by the
| Intel's Multiprocessor Specification.
| 
| 1. The INT output of the master 8259A PIC has to be connected to the LINT0
| (or LINTIN0; the name varies by implementations) inputs of all local APICs
| in the system.
| 
| 2a. The OUT0 output of the 8254 PIT (IOW the timer source) has to be 
| directly connected to the INTIN2 input of the first I/O APIC.
| 
| 2b. Alternatively the INT output of the master 8259A PIC has to be
| connected to the INTIN0 input of the first I/O APIC.
| 
| 3. There must be no glue logic that would change logical properties of the
| signal between the INT output of the master 8259A PIC and the respective
| APIC interrupt inputs.
| 
| In practice, assuming the MP IRQ routing information provided the BIOS has
| been correct (which is not always the case), prerequisites #1 and #2 have
| been met so far, but #3 has proved to be occasionally problematic.

In practice many systems seem to take a good bit of guessing and testing.
I have an old P-II which only works with acpi=force and nmi_watchdog=2,
for instance.

It would be nice if there were a program which could poke at the
hardware and suggest options which might work, as in eliminating the
ones which can be determined not to work. Absent that, trial and error
rules, unfortunately.
-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-12 16:01             ` bill davidsen
@ 2003-12-12 16:47               ` Maciej W. Rozycki
  2003-12-12 16:57                 ` Richard B. Johnson
  2003-12-13  5:16                 ` Bill Davidsen
  2003-12-12 22:27               ` George Anzinger
  1 sibling, 2 replies; 62+ messages in thread
From: Maciej W. Rozycki @ 2003-12-12 16:47 UTC (permalink / raw)
  To: bill davidsen; +Cc: linux-kernel

On Fri, 12 Dec 2003, bill davidsen wrote:

> | In practice, assuming the MP IRQ routing information provided the BIOS has
> | been correct (which is not always the case), prerequisites #1 and #2 have
> | been met so far, but #3 has proved to be occasionally problematic.
> 
> In practice many system seem to take a good bit of guessing and testing.
> I have an old P-II which only works with acpi=force and nmi_watchdog=2,
> for instance.

 Well, the NMI watchdog is a side-effect feature that works by chance
rather than by design.  So you can't really complain it doesn't work
somewhere, although I wouldn't mind if new hardware was designed such that
it works.  You shouldn't have to use "acpi=force" for the watchdog to work
though and for a PII system if "nmi_watchdog=1" doesn't work, then I
suspect a BIOS bug (set APIC_DEBUG to 1 in asm-i386/apic.h and send me the
bootstrap log and a dump from `mptable' for a diagnosis, if interested).

> It would be nice if there were a program which could poke at the
> hardware and suggest options which might work, as in eliminating the
> ones which can be determined not to work. Absent that trial and error
> rule, unfortunately.

 Linux has all appropriate bits to set up hardware reasonably as long as
the BIOS provides accurate information.  The only case where our code fails
is when the BIOS tells us lies, and then there's little we can do about it.
Actually we are doing hardware manufacturers a favor when we try to handle
some cases at all -- it's the BIOS that should be fixed instead; it is
software and it is stored in Flash memories these days, so there's no
excuse.  So if there's a problem with running Linux because of BIOS bugs,
then please bug the manufacturer in the first place (and avoid the company
in the future if they don't support Linux).

 Sometimes the NMI watchdog works in principle, but its activation leads
to system instability -- almost always this is a symptom of buggy SMM code
executed by the BIOS behind our back (NMIs are disabled by default in the
SMM, but careless code may enable them by accident).

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +


* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-12 16:47               ` Maciej W. Rozycki
@ 2003-12-12 16:57                 ` Richard B. Johnson
  2003-12-12 17:21                   ` Maciej W. Rozycki
  2003-12-13  5:16                 ` Bill Davidsen
  1 sibling, 1 reply; 62+ messages in thread
From: Richard B. Johnson @ 2003-12-12 16:57 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: bill davidsen, linux-kernel

On Fri, 12 Dec 2003, Maciej W. Rozycki wrote:

> On Fri, 12 Dec 2003, bill davidsen wrote:
>
> > | In practice, assuming the MP IRQ routing information provided the BIOS has
> > | been correct (which is not always the case), prerequisites #1 and #2 have
> > | been met so far, but #3 has proved to be occasionally problematic.
> >
> > In practice many system seem to take a good bit of guessing and testing.
> > I have an old P-II which only works with acpi=force and nmi_watchdog=2,
> > for instance.
>
>  Well, the NMI watchdog is a side-effect feature that works by chance
> rather than by design.  So you can't really complain it doesn't work
> somewhere, although I wouldn't mind if new hardware was designed such that
> it works.  You shouldn't have to use "acpi=force" for the watchdog to work
> though and for a PII system if "nmi_watchdog=1" doesn't work, then I
> suspect a BIOS bug (set APIC_DEBUG to 1 in asm-i386/apic.h and send me the
> bootstrap log and a dump from `mptable' for a diagnosis, if interested).
>
> > It would be nice if there were a program which could poke at the
> > hardware and suggest options which might work, as in eliminating the
> > ones which can be determined not to work. Absent that trial and error
> > rule, unfortunately.
>
>  Linux has all appropriate bits to set up hardware reasonably as long as
> BIOS provides accurate information.  The only case our code fails is when
> BIOS tells us lies and then there's little we can do about it.  Actually we
> are doing hardware manufacturers a favor when we try to handle some cases at
> all -- it's the BIOS that should be fixed instead and it is software and
> it is stored in Flash memories these days, so there's no excuse.  So if
> there's a problem with running Linux because of BIOS bugs, then please
> bug the manufacturer in the first place (and avoid the company in the
> future if they don't support Linux).
>
>  Sometimes the NMI watchdog works in principle, but its activation leads
> to system instability -- almost always this is a symptom of buggy SMM code
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> executed by the BIOS behind our back (NMIs are disabled by default in the
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> SMM, but careless code may enable them by accident).

The NMI vector goes to Linux code. In fact all interrupt vectors
go to Linux code. There is no way that some BIOS code could possibly
be accidentally executed here. Some Linux code would have to
call some 16-bit BIOS code somewhere, and it doesn't even know
where..........

Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (797.90 BogoMips).
            Note 96.31% of all statistics are fiction.



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-12 16:57                 ` Richard B. Johnson
@ 2003-12-12 17:21                   ` Maciej W. Rozycki
  0 siblings, 0 replies; 62+ messages in thread
From: Maciej W. Rozycki @ 2003-12-12 17:21 UTC (permalink / raw)
  To: Richard B. Johnson; +Cc: bill davidsen, linux-kernel

On Fri, 12 Dec 2003, Richard B. Johnson wrote:

> >  Sometimes the NMI watchdog works in principle, but its activation leads
> > to system instability -- almost always this is a symptom of buggy SMM code
>                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > executed by the BIOS behind our back (NMIs are disabled by default in the
>   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > SMM, but careless code may enable them by accident).
> 
> The NMI vector goes to Linux code. In fact all interrupt vectors
> go to Linux code. There is no way that some BIOS code could possibly
> be accidentally executed here. Some Linux code would have to
> call some 16-bit BIOS code somewhere, and it doesn't even know
> where..........

 The problem happens when the SMM is active (i.e. the BIOS code is being
executed) after an SMI has been received during Linux operation (SMIs may
get triggered due to various reasons -- a parity/ECC error caught by the
chipset, an access to an emulated 8042 controller, a power failure in a
notebook, etc.) and an NMI arrives.  When in the SMM, no interrupt
(including the NMI) causes a switch back into the protected mode (and the
processor expects real-mode style interrupt vectors), so Linux's NMI
handler is never reached and the SMM's NMI handler (if initialized at all)
isn't appropriate for handling the NMI watchdog.  Since the SMM cannot
know what NMIs are used for in a particular OS, the code should best keep
NMIs disabled -- then an arriving NMI event is latched and postponed until
after the RSM instruction is executed.

 The SMM was invented to be transparent to a running OS, but care has to
be taken for this to be true and firmware bugs sometimes make the SMM
activity visible.
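
The latching behaviour described above can be modelled with a toy state
machine.  This is only a sketch under simplifying assumptions: a single
CPU, at most one pending NMI, and well-behaved SMM code that leaves NMIs
blocked.  The struct and function names are hypothetical and do not
correspond to any kernel or firmware interface.

```c
#include <stdbool.h>

/* Toy model of NMI handling around SMM; every name here is hypothetical. */
struct cpu_model {
	bool in_smm;       /* true between SMI entry and the RSM instruction */
	bool nmi_latched;  /* an NMI that arrived while blocked in SMM */
	int os_nmi_count;  /* NMIs that reached the OS (Linux) handler */
};

/* An SMI arrives: the CPU enters SMM and NMIs become blocked. */
void smi_enter(struct cpu_model *c)
{
	c->in_smm = true;
}

/* An NMI arrives: delivered to the OS at once, or latched while in SMM. */
void nmi_arrive(struct cpu_model *c)
{
	if (c->in_smm)
		c->nmi_latched = true;  /* held back, not lost */
	else
		c->os_nmi_count++;
}

/* RSM executes: SMM is left and a latched NMI is delivered to the OS. */
void rsm(struct cpu_model *c)
{
	c->in_smm = false;
	if (c->nmi_latched) {
		c->nmi_latched = false;
		c->os_nmi_count++;
	}
}
```

In this model an NMI that arrives during SMM is delayed, not dropped, which
is the transparent case.  The buggy case (SMM code that re-enables NMIs)
is deliberately not modelled.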

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-12 16:01             ` bill davidsen
  2003-12-12 16:47               ` Maciej W. Rozycki
@ 2003-12-12 22:27               ` George Anzinger
  2003-12-15 13:13                 ` Maciej W. Rozycki
  1 sibling, 1 reply; 62+ messages in thread
From: George Anzinger @ 2003-12-12 22:27 UTC (permalink / raw)
  To: macro; +Cc: bill davidsen, linux-kernel

Having had cause to try and figure out all this, I vote for the following being 
included in the source somewhere...

-g

bill davidsen wrote:
> In article <Pine.LNX.4.55.0312101421540.31543@jurand.ds.pg.gda.pl>,
> Maciej W. Rozycki <macro@ds2.pg.gda.pl> wrote:
> 
> |  The I/O APIC NMI watchdog utilizes the property of being transparent to a
> | single IRQ source of a specially reconfigured 8259A PIC (the master one in
> | the IA32 PC architecture).  There are more prerequisites that have to be
> | met and all indeed are for a 100% compatible PC as specified by the
> | Intel's Multiprocessor Specification.
> | 
> | 1. The INT output of the master 8259A PIC has to be connected to the LINT0
> | (or LINTIN0; the name varies by implementations) inputs of all local APICs
> | in the system.
> | 
> | 2a. The OUT0 output of the 8254 PIT (IOW the timer source) has to be 
> | directly connected to the INTIN2 input of the first I/O APIC.
> | 
> | 2b. Alternatively the INT output of the master 8259A PIC has to be
> | connected to the INTIN0 input of the first I/O APIC.
> | 
> | 3. There must be no glue logic that would change logical properties of the
> | signal between the INT output of the master 8259A PIC and the respective
> | APIC interrupt inputs.
> | 
> | In practice, assuming the MP IRQ routing information provided the BIOS has
> | been correct (which is not always the case), prerequisites #1 and #2 have
> | been met so far, but #3 has proved to be occasionally problematic.
> 
> In practice many system seem to take a good bit of guessing and testing.
> I have an old P-II which only works with acpi=force and nmi_watchdog=2,
> for instance.
> 
> It would be nice if there were a program which could poke at the
> hardware and suggest options which might work, as in eliminating the
> ones which can be determined not to work. Absent that trial and error
> rule, unfortunately.

-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-12 16:47               ` Maciej W. Rozycki
  2003-12-12 16:57                 ` Richard B. Johnson
@ 2003-12-13  5:16                 ` Bill Davidsen
  2003-12-15 13:23                   ` Maciej W. Rozycki
  1 sibling, 1 reply; 62+ messages in thread
From: Bill Davidsen @ 2003-12-13  5:16 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-kernel

On Fri, 12 Dec 2003, Maciej W. Rozycki wrote:

> On Fri, 12 Dec 2003, bill davidsen wrote:
> 
> > | In practice, assuming the MP IRQ routing information provided the BIOS has
> > | been correct (which is not always the case), prerequisites #1 and #2 have
> > | been met so far, but #3 has proved to be occasionally problematic.
> > 
> > In practice many system seem to take a good bit of guessing and testing.
> > I have an old P-II which only works with acpi=force and nmi_watchdog=2,
> > for instance.
> 
>  Well, the NMI watchdog is a side-effect feature that works by chance
> rather than by design.  So you can't really complain it doesn't work
> somewhere, although I wouldn't mind if new hardware was designed such that
> it works.  You shouldn't have to use "acpi=force" for the watchdog to work
> though and for a PII system if "nmi_watchdog=1" doesn't work, then I
> suspect a BIOS bug (set APIC_DEBUG to 1 in asm-i386/apic.h and send me the
> bootstrap log and a dump from `mptable' for a diagnosis, if interested).

Has the check to see if the BIOS is older than very recent been removed? I
used to get a message that the BIOS was too old, I believe that's what
prompted the acpi to enable the local apic. Sorry, I've been running that
feature since 2.5.3x or so and I just carried it forward.

> 
> > It would be nice if there were a program which could poke at the
> > hardware and suggest options which might work, as in eliminating the
> > ones which can be determined not to work. Absent that trial and error
> > rule, unfortunately.
> 
>  Linux has all appropriate bits to set up hardware reasonably as long as
> BIOS provides accurate information.  The only case our code fails is when
> BIOS tells us lies and then there's little we can do about it.  Actually we
> are doing hardware manufacturers a favor when we try to handle some cases at
> all -- it's the BIOS that should be fixed instead and it is software and
> it is stored in Flash memories these days, so there's no excuse.  So if
> there's a problem with running Linux because of BIOS bugs, then please
> bug the manufacturer in the first place (and avoid the company in the
> future if they don't support Linux).
> 
>  Sometimes the NMI watchdog works in principle, but its activation leads
> to system instability -- almost always this is a symptom of buggy SMM code
> executed by the BIOS behind our back (NMIs are disabled by default in the
> SMM, but careless code may enable them by accident).

Works fine for me, system stays up for 30-40 days when I let it... I also
run softdog to catch hangs in user mode but not in the kernel. That also
works.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-12 22:27               ` George Anzinger
@ 2003-12-15 13:13                 ` Maciej W. Rozycki
  2003-12-15 21:42                   ` George Anzinger
  0 siblings, 1 reply; 62+ messages in thread
From: Maciej W. Rozycki @ 2003-12-15 13:13 UTC (permalink / raw)
  To: George Anzinger; +Cc: linux-kernel

On Fri, 12 Dec 2003, George Anzinger wrote:

> Having had cause to try and figure out all this, I vote for the following being 
> included in the source somewhere...

 Hmm, you could have simply asked... ;-)  Anyway, an inclusion is doable,
I guess.

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-13  5:16                 ` Bill Davidsen
@ 2003-12-15 13:23                   ` Maciej W. Rozycki
  0 siblings, 0 replies; 62+ messages in thread
From: Maciej W. Rozycki @ 2003-12-15 13:23 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: linux-kernel

On Sat, 13 Dec 2003, Bill Davidsen wrote:

> >  Well, the NMI watchdog is a side-effect feature that works by chance
> > rather than by design.  So you can't really complain it doesn't work
> > somewhere, although I wouldn't mind if new hardware was designed such that
> > it works.  You shouldn't have to use "acpi=force" for the watchdog to work
> > though and for a PII system if "nmi_watchdog=1" doesn't work, then I
> > suspect a BIOS bug (set APIC_DEBUG to 1 in asm-i386/apic.h and send me the
> > bootstrap log and a dump from `mptable' for a diagnosis, if interested).
> 
> Has the check to see if the BIOS is older than very recent been removed? I
> used to get a message that the BIOS was too old, I believe that's what
> prompted the acpi to enable the local apic. Sorry, I've been running that
> feature since 2.5.3x or so and I just carried it forward.

 I don't know what check you refer to, sorry.  I don't think we do any
version checks in the APIC code.  Perhaps ACPI does some, but having no
use for it anywhere I'm not familiar with that area.

 If the "nmi_watchdog=1" option doesn't work for a PII system, then it's
most likely a bug in BIOS IRQ routing tables -- either missing or broken
entries for the 8254 timer and/or the 8259A ExtINTA source.

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-15 13:13                 ` Maciej W. Rozycki
@ 2003-12-15 21:42                   ` George Anzinger
  2003-12-16 13:37                     ` Maciej W. Rozycki
  0 siblings, 1 reply; 62+ messages in thread
From: George Anzinger @ 2003-12-15 21:42 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-kernel

Maciej W. Rozycki wrote:
> On Fri, 12 Dec 2003, George Anzinger wrote:
> 
> 
>>Having had cause to try and figure out all this, I vote for the following being 
>>included in the source somewhere...
> 
> 
>  Hmm, you could have simply asked... ;-)  Anyway, an inclusion is doable,
> I guess.
> 

I suspect I did, but most likely in the wrong place.  In any case, I would like to
think that "read the source, Luke" is the right answer.

So, while I am in the asking mode, is there a simple way to turn off the PIT 
interrupt without changing the PIT program?  I would like a way to stop the 
interrupts AND also stop the NMIs that it generates for the watchdog.  I suspect 
that this is a bit more complex than it would appear, due to how it's wired.

-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-15 21:42                   ` George Anzinger
@ 2003-12-16 13:37                     ` Maciej W. Rozycki
  2003-12-16 13:57                       ` Richard B. Johnson
  2003-12-16 17:26                       ` George Anzinger
  0 siblings, 2 replies; 62+ messages in thread
From: Maciej W. Rozycki @ 2003-12-16 13:37 UTC (permalink / raw)
  To: George Anzinger; +Cc: linux-kernel

On Mon, 15 Dec 2003, George Anzinger wrote:

> >  Hmm, you could have simply asked... ;-)  Anyway, an inclusion is doable,
> > I guess.
> 
> I suspect I did, but most likely in the wrong place.  In any case, I would like to
> think that "read the source, Luke" is the right answer.

 Certainly it is, but not necessarily the only one. ;-)

> So, while I am in the asking mode, is there a simple way to turn off the PIT 
> interrupt without changing the PIT program?  I would like a way to stop the 
> interrupts AND also stop the NMIs that it generates for the watchdog.  I suspect 
> that this is a bit more complex than it would appear, due to how it's wired.

 Well, in PC/AT compatible implementations, the counter #0 of the PIT has
its gate hardwired to active, so you cannot mask the PIT output itself.  
So the only other choices are either reprogramming the counter to a mode
that won't cause periodic triggers (which is probably the easiest way, but
you don't want to do that for some purpose, right?) or reprogramming
interrupt controllers not to accept interrupts arriving from the PIT.

 Note that Linux may behave strangely then. ;-)

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-16 13:37                     ` Maciej W. Rozycki
@ 2003-12-16 13:57                       ` Richard B. Johnson
  2003-12-16 15:47                         ` Maciej W. Rozycki
  2003-12-16 17:26                       ` George Anzinger
  1 sibling, 1 reply; 62+ messages in thread
From: Richard B. Johnson @ 2003-12-16 13:57 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: George Anzinger, linux-kernel

On Tue, 16 Dec 2003, Maciej W. Rozycki wrote:

> On Mon, 15 Dec 2003, George Anzinger wrote:
>
> > >  Hmm, you could have simply asked... ;-)  Anyway, an inclusion is doable,
> > > I guess.
> >
> > I suspect I did, but most likely in the wrong place.  In any case, I would like to
> > think that "read the source, Luke" is the right answer.
>
>  Certainly it is, but not necessarily the only one. ;-)
>
> > So, while I am in the asking mode, is there a simple way to turn off the PIT
> > interrupt without changing the PIT program?  I would like a way to stop the
> > interrupts AND also stop the NMIs that it generates for the watchdog.  I suspect
> > that this is a bit more complex than it would appear, due to how it's wired.
>
>  Well, in PC/AT compatible implementations, the counter #0 of the PIT has
> its gate hardwired to active, so you cannot mask the PIT output itself.
> So the only other choices are either reprogramming the counter to a mode
> that won't cause periodic triggers (which is probably the easiest way, but
> you don't want to do that for some purpose, right?) or reprogramming
> interrupt controllers not to accept interrupts arriving from the PIT.
>
>  Note that Linux may behave strangely then. ;-)
>

Masking OFF the timer channel 0 in the interrupt controller
is probably the easiest thing to do. The port is read-write,
and the OCW default to having it accessible.

	movw	$0x21, %dx	# Controller 0, mask register
	inb	%dx, %al	# Get mask
	orb	$1, %al		# Mask off bit 0
	outb	%al, %dx	# Write it back

You can reenable by:

	movw	$0x21, %dx
	inb	%dx, %al
	andb	$~1, %al
	outb	%al, %dx

With port numbers less than 256, you actually don't need the
DX register but I forget if the AT&T assembler needs a $ before
the port number when doing this.
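
The bit arithmetic of the two assembly fragments above can be sketched in
C without touching any hardware.  The helper names are hypothetical, and
the functions only compute the new value of the master 8259A mask register
(port 0x21); the actual port read and write, which need I/O privileges,
are left out on purpose.

```c
#include <stdint.h>

/* Master 8259A interrupt mask register port, as used in the assembly. */
#define PIC1_IMR_PORT 0x21

/*
 * Hypothetical helpers: given the current mask register value, return the
 * value to write back.  A set bit masks (disables) that IRQ line.
 * irq must be in the range 0..7.
 */
uint8_t pic_imr_mask(uint8_t imr, unsigned int irq)
{
	return (uint8_t)(imr | (1u << irq));    /* same as: orb $1, %al for irq 0 */
}

uint8_t pic_imr_unmask(uint8_t imr, unsigned int irq)
{
	return (uint8_t)(imr & ~(1u << irq));   /* same as: andb $~1, %al for irq 0 */
}
```

For the timer, irq is 0, so a mask register holding 0xB8 becomes 0xB9 when
masked and returns to 0xB8 when unmasked.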

Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (797.90 BogoMips).
            Note 96.31% of all statistics are fiction.



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-16 13:57                       ` Richard B. Johnson
@ 2003-12-16 15:47                         ` Maciej W. Rozycki
  2003-12-16 16:44                           ` Richard B. Johnson
  0 siblings, 1 reply; 62+ messages in thread
From: Maciej W. Rozycki @ 2003-12-16 15:47 UTC (permalink / raw)
  To: Richard B. Johnson; +Cc: George Anzinger, linux-kernel

On Tue, 16 Dec 2003, Richard B. Johnson wrote:

> Masking OFF the timer channel 0 in the interrupt controller
> is probably the easiest thing to do. The port is read-write,
> and the OCW default to having it accessible.

 Note we are writing about configurations involving an I/O APIC, so things
are not that easy -- the 8254 timer IRQ may be wired in different ways.

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-16 15:47                         ` Maciej W. Rozycki
@ 2003-12-16 16:44                           ` Richard B. Johnson
  2003-12-16 16:50                             ` Maciej W. Rozycki
  0 siblings, 1 reply; 62+ messages in thread
From: Richard B. Johnson @ 2003-12-16 16:44 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: George Anzinger, Linux kernel

On Tue, 16 Dec 2003, Maciej W. Rozycki wrote:

> On Tue, 16 Dec 2003, Richard B. Johnson wrote:
>
> > Masking OFF the timer channel 0 in the interrupt controller
> > is probably the easiest thing to do. The port is read-write,
> > and the OCW default to having it accessible.
>
>  Note we are writing about configurations involving an I/O APIC, so things
> are not that easy -- the 8254 timer IRQ may be wired in different ways.
>
> --
> +  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
> +--------------------------------------------------------------+
> +        e-mail: macro@ds2.pg.gda.pl, PGP key available        +


Well if I was trying to isolate a problem, I would make it
that easy. You boot the machine in its simplest configuration
and work "up" from there.

Although I haven't looked at recent source-code, with APIC, the
problem is even simpler. If you booted with APIC, just set
the global "using_apic_timer" to zero and, voila`, timer-ticks
stop.

In any event, the caller needs to know that if there is
any code executing anywhere that does the equivalent of

		for(;;)
                      ;

...the machine will lock-up forever because without that timer,
there will be no preemption. Once a CPU-hog gets the CPU, only
an interrupt can get it away.


Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (797.90 BogoMips).
            Note 96.31% of all statistics are fiction.



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-16 16:44                           ` Richard B. Johnson
@ 2003-12-16 16:50                             ` Maciej W. Rozycki
  0 siblings, 0 replies; 62+ messages in thread
From: Maciej W. Rozycki @ 2003-12-16 16:50 UTC (permalink / raw)
  To: Richard B. Johnson; +Cc: George Anzinger, Linux kernel

On Tue, 16 Dec 2003, Richard B. Johnson wrote:

> Although I haven't looked at recent source-code, with APIC, the
> problem is even simpler. If you booted with APIC, just set
> the global "using_apic_timer" to zero and, voila`, timer-ticks
> stop.

 Except we are writing of the 8254 timer, not the local APIC one...

> ...the machine will lock-up forever because without that timer,
> there will be no preemption. Once a CPU-hog gets the CPU, only
> an interrupt can get it away.

 And the 8254 timer isn't used for preemption when local APICs are used,
so disabling it won't break the whole system, only the timekeeping.

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-16 13:37                     ` Maciej W. Rozycki
  2003-12-16 13:57                       ` Richard B. Johnson
@ 2003-12-16 17:26                       ` George Anzinger
  2003-12-16 20:54                         ` Maciej W. Rozycki
  1 sibling, 1 reply; 62+ messages in thread
From: George Anzinger @ 2003-12-16 17:26 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-kernel

Maciej W. Rozycki wrote:
> On Mon, 15 Dec 2003, George Anzinger wrote:
> 
> 
>>> Hmm, you could have simply asked... ;-)  Anyway, an inclusion is doable,
>>>I guess.
>>
>>I suspect I did, but most likely in the wrong place.  In any case, I would like to
>>think that "read the source, Luke" is the right answer.
> 
> 
>  Certainly it is, but not necessarily the only one. ;-)
> 
> 
>>So, while I am in the asking mode, is there a simple way to turn off the PIT 
>>interrupt without changing the PIT program?  I would like a way to stop the 
>>interrupts AND also stop the NMIs that it generates for the watchdog.  I suspect 
>>that this is a bit more complex than it would appear, due to how it's wired.
> 
> 
>  Well, in PC/AT compatible implementations, the counter #0 of the PIT has
> its gate hardwired to active, so you cannot mask the PIT output itself.  
> So the only other choices are either reprogramming the counter to a mode
> that won't cause periodic triggers (which is probably the easiest way, but
> you don't want to do that for some purpose, right?) or reprogramming
> interrupt controllers not to accept interrupts arriving from the PIT.
> 
>  Note that Linux may behave strangely then. ;-)

This is for the VST code where we want to stop the timer interrupts for a bit IF 
and only if we are in the idle task AND there are no timers to service, i.e. the 
interrupt would be useless.  We don't want to mess with the PIT program as that 
would mess up the time when we turn it on again.  So we just want to stop a few 
interrupts from time to time.  We catch up after turning the PIT back on by 
using the TSC or pm_timer or some other source that keeps something close to 
reasonable time.
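
The catch-up step described above could look roughly like the following
sketch.  It assumes a free-running cycle counter (such as the TSC) that
keeps counting while the PIT interrupt is held off, and a known number of
cycles per timer tick; the function name and parameters are hypothetical
and are not taken from the VST patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of whole timer ticks that elapsed while the
 * timer interrupt was held off, derived from a free-running cycle counter.
 * Unsigned subtraction keeps the result correct across one counter wrap.
 */
uint64_t missed_ticks(uint64_t cycles_at_stop, uint64_t cycles_now,
		      uint64_t cycles_per_tick)
{
	if (cycles_per_tick == 0)
		return 0;   /* no calibration available, nothing to add */
	return (cycles_now - cycles_at_stop) / cycles_per_tick;
}
```

The remainder of the division is discarded here; a real implementation
would presumably carry it forward so that time does not drift.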
> 

-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-16 17:26                       ` George Anzinger
@ 2003-12-16 20:54                         ` Maciej W. Rozycki
  2003-12-16 21:53                           ` George Anzinger
  0 siblings, 1 reply; 62+ messages in thread
From: Maciej W. Rozycki @ 2003-12-16 20:54 UTC (permalink / raw)
  To: George Anzinger; +Cc: linux-kernel

On Tue, 16 Dec 2003, George Anzinger wrote:

> This is for the VST code where we want to stop the timer interrupts for a bit IF 
> and only if we are in the idle task AND there are no timers to service, i.e. the 
> interrupt would be useless.  We don't want to mess with the PIT program as that 
> would mess up the time when we turn it on again.  So we just want to stop a few 
> interrupts from time to time.  We catch up after turning the PIT back on by 
> using the TSC or pm_timer or some other source that keeps something close to 
> reasonable time.

 I see.  Well, then disable_irq(0) may be the easiest way to do that for
the regular timer interrupt.  For the NMI watchdog from the I/O APIC you'd
use disable_8259A_irq(0) and for one from the local APIC -- just mask the
APIC_LVTPC interrupt (there's no wrapper function, but that's easy).
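
As a sketch of the last suggestion: masking a local vector table entry
means setting bit 16 of the register, and the performance counter entry
sits at offset 0x340 in the local APIC register space.  The helpers below
only compute the new register value; the macro and function names are
made up for this example, and in the kernel the read-modify-write would
presumably go through apic_read() and apic_write().

```c
#include <stdint.h>

#define LVTPC_OFFSET    0x340u      /* LVT performance counter register */
#define LVT_MASKED_BIT  (1u << 16)  /* mask bit of a local vector table entry */
#define LVT_DM_NMI      (4u << 8)   /* delivery mode field set to NMI */

/* Hypothetical helpers: return the LVT value with the mask bit set/cleared. */
uint32_t lvt_mask(uint32_t lvt)
{
	return lvt | LVT_MASKED_BIT;
}

uint32_t lvt_unmask(uint32_t lvt)
{
	return lvt & ~LVT_MASKED_BIT;
}
```

Only the mask bit changes, so the NMI delivery mode programmed for the
watchdog is preserved and the entry can be unmasked again later.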

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-16 20:54                         ` Maciej W. Rozycki
@ 2003-12-16 21:53                           ` George Anzinger
  2003-12-17 14:03                             ` Maciej W. Rozycki
  0 siblings, 1 reply; 62+ messages in thread
From: George Anzinger @ 2003-12-16 21:53 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-kernel

Maciej W. Rozycki wrote:
> On Tue, 16 Dec 2003, George Anzinger wrote:
> 
> 
>>This is for the VST code where we want to stop the timer interrupts for a bit IF 
>>and only if we are in the idle task AND there are no timers to service, i.e. the 
>>interrupt would be useless.  We don't want to mess with the PIT program as that 
>>would mess up the time when we turn it on again.  So we just want to stop a few 
>>interrupts from time to time.  We catch up after turning the PIT back on by 
>>using the TSC or pm_timer or some other source that keeps something close to 
>>reasonable time.
> 
> 
>  I see.  Well, then disable_irq(0) may be the easiest way to do that for
> the regular timer interrupt.  For the NMI watchdog from the I/O APIC you'd
> use disable_8259A_irq(0) and for one from the local APIC -- just mask the
> APIC_LVTPC interrupt (there's no wrapper function, but that's easy).

How confusing :(  Could you give me some idea how this works?  I have tried 
disable_irq(0) and, as best as I can tell, it does not do the trick.  The 
confusion I have is understanding where in the chain of hardware each of these 
things is taking place.

For example, it would be "nice" if I could just turn off the PIT interrupt line 
so that both the NMI (PIT generated) and the PIT interrupt would be put on hold. 
  Your answer seems to indicate that disable_irq() is working down stream from 
where the NMI signal is connected to the PIT interrupt line, so we need to turn 
off the NMI as well.  A picture would be nice here :)

> 

-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-16 21:53                           ` George Anzinger
@ 2003-12-17 14:03                             ` Maciej W. Rozycki
  0 siblings, 0 replies; 62+ messages in thread
From: Maciej W. Rozycki @ 2003-12-17 14:03 UTC (permalink / raw)
  To: George Anzinger; +Cc: linux-kernel

On Tue, 16 Dec 2003, George Anzinger wrote:

> How confusing :(  Could you give me some idea how this works?  I have tried 
> disable_irq(0) and, as best as I can tell, it does not do the trick.  The 
> confusion I have is understanding where in the chain of hardware each of these 
> things is taking place.

 Well, strange -- it should mask the timer interrupt.  But I've never
tried that and proposed it based on a study of the source only -- perhaps it
needs to be investigated further.

> For example, it would be "nice" if I could just turn off the PIT interrupt line 
> so that both the NMI (PIT generated) and the PIT interrupt would be put on hold. 

 The counter gate of the 8254 chip is designed to do just that -- it's a
pity it's hardwired, but I can understand another SSI TTL latch of a
dubious utility was just too costly for the original PC in 1981.

>   Your answer seems to indicate that disable_irq() is working down stream from 
> where the NMI signal is connected to the PIT interrupt line, so we need to turn 
> off the NMI as well.  A picture would be nice here :)

 I'll try my best:

 +------+ OUT0                                INTIN2 +--------+
 | 8254 +--+-----------------------------------------+        |
 +------+  |                                         |  I/O   |
           | IR0 +------+ INT +------+        INTIN0 |  APIC  |
           +-----+ 8259 +-----+ glue +-+-------------+        |
                 +------+     +------+ |             +---++---+
                                       |                 ||
                                       |                 ||
                                       |                 ||
                           +-----------+---------+-...   ||
                           |                     |       ||
        +--------+         |  +--------+         |       ||
        | CPU #0 |         |  | CPU #1 |         |       ||
        +--------+         |  +--------+         |       ||
        |        | LINT0   |  |        | LINT0   | ...   ||
        | local  +---------+  | local  +---------+       ||
        | APIC   |            | APIC   |                 ||
        |        |            |        |                 ||
        +---++---+            +---++---+                 ||
            ||                    ||                     ||
            || inter-APIC bus     ||                     ||
            ++====================++===============...===++

The system is a traditional i82489DX/Pentium/P6-style virtual-wire setup
with a serial inter-APIC bus and a full MP-spec feature set.  More limited
systems may miss the OUT0->INTIN2 line and/or one or more of the
INT->INTIN0 or INT->LINT0 lines -- there needs to be only one.  If any INT
connections are missing, then either the INT->LINT0 one for the bootstrap
processor (BSP) or the INT->INTIN0 one has to exist; the others are optional.

 For the system above the path for the 8254 timer interrupt is via INTIN2
and the inter-APIC bus as a LoPri APIC interrupt.  The path for the NMI
watchdog is via the 8259 reconfigured to pass IR0 transparently to INT and
then LINT0 inputs of all processors, reconfigured for a NMI APIC
interrupt.  Some glue at the INT output may prevent the NMI watchdog from
working -- the LINT0 inputs may not toggle back and forth.

 If the OUT0->INTIN2 line is missing, the path for the 8254 timer
interrupt is via the 8259 reconfigured to pass IR0 transparently to INT,
then INTIN0 and the inter-APIC bus as a LoPri APIC interrupt.  The path
for the NMI watchdog is also via the 8259 and then LINT0 inputs of all
processors, reconfigured for a NMI APIC interrupt.  Again, some glue at
the INT output may prevent this set up from working, but if it does work,
then both the timer interrupt and the NMI watchdog do -- I've not heard of
a system having different glue logic for INTIN0 and LINT0.

 If the above variant does not work, as a last resort, the path for the
8254 timer interrupt is via the 8259 reconfigured back into its usual mode
and then LINT0 of the BSP reconfigured for an ExtINTA APIC interrupt.  
Additionally, since at this point the glue logic has probably already
locked up due to the messing done above, a few artificial sets of double
INTA cycles are sent to the system bus using the RTC chip and INTIN8
reconfigured temporarily to send ExtINTA APIC interrupts via the
inter-APIC bus.

 I do hope a thorough read of the description will make the available
variants clear.  The I/O APIC input numbers may differ but so far they are
almost always as noted above.
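
 The fallback order described above can be sketched in C.  This is only an
illustration: the struct, its fields and both function names are
hypothetical, and the real kernel finds a working path by actually testing
whether timer ticks arrive rather than by consulting wiring flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the wires present on a given board. */
struct board_wiring {
	bool out0_to_intin2;	/* 8254 OUT0 -> I/O APIC INTIN2 */
	bool int_to_intin0;	/* 8259 INT  -> I/O APIC INTIN0 */
	bool int_to_lint0;	/* 8259 INT  -> local APIC LINT0 */
	bool glue_passes_int;	/* glue lets INT toggle in "transparent" mode */
};

enum timer_path {
	TIMER_VIA_INTIN2,	/* direct line, LoPri over the inter-APIC bus */
	TIMER_VIA_8259_INTIN0,	/* "transparent" 8259, then INTIN0, LoPri */
	TIMER_VIA_EXTINTA,	/* 8259 in its usual mode, BSP LINT0 as ExtINTA */
	TIMER_NONE
};

/* Try the three variants in the order the description lists them. */
enum timer_path pick_timer_path(const struct board_wiring *w)
{
	assert(w != NULL);
	if (w->out0_to_intin2)
		return TIMER_VIA_INTIN2;
	if (w->int_to_intin0 && w->glue_passes_int)
		return TIMER_VIA_8259_INTIN0;
	if (w->int_to_lint0)
		return TIMER_VIA_EXTINTA;
	return TIMER_NONE;
}

/*
 * The watchdog needs LINT0 free to be reconfigured for NMI and needs
 * the glue to let INT toggle; the ExtINTA fallback occupies LINT0.
 */
bool nmi_watchdog_possible(const struct board_wiring *w)
{
	enum timer_path path = pick_timer_path(w);

	if (path == TIMER_VIA_EXTINTA || path == TIMER_NONE)
		return false;
	return w->int_to_lint0 && w->glue_passes_int;
}
```

 With every wire present the first variant is chosen and the watchdog can
run; a board that only works in the ExtINTA variant gets a timer but no
LINT0-based watchdog.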

  Maciej

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +


* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-19  4:17       ` Ross Dickson
@ 2003-12-19 15:35         ` Maciej W. Rozycki
  0 siblings, 0 replies; 62+ messages in thread
From: Maciej W. Rozycki @ 2003-12-19 15:35 UTC (permalink / raw)
  To: Ross Dickson; +Cc: George Anzinger, linux-kernel

On Fri, 19 Dec 2003, Ross Dickson wrote:

> Do you know if the Athlon apic programming docs are available or under NDA?

 No idea -- I've been only loosely interested in IA-32-based hardware 
recently.

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +


* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-19  4:06   ` Ross Dickson
@ 2003-12-19 15:33     ` Maciej W. Rozycki
  0 siblings, 0 replies; 62+ messages in thread
From: Maciej W. Rozycki @ 2003-12-19 15:33 UTC (permalink / raw)
  To: Ross Dickson; +Cc: george, linux-kernel

On Fri, 19 Dec 2003, Ross Dickson wrote:

> >  It does have such a mode. ;-)  You just have not to ack a pending
> > interrupt -- if a request goes away, the INT output gets deasserted as
> > well.  We are super cautious though and we reprogram the 8259A into the
> > AEOI mode to prevent a lockup in case INTA cycles escape to the 8259A
> > (which is theoretically possible for a broken design of an i82489DX-based
> > system).  See the 8259A datasheet for details.
> 
> I believe what you have written because you say it is how the code works.

 Well, since I'm actually the author of the relevant bits (though Ingo did
some clean-ups before applying them), I must have been completely sure the 
assumptions are valid.

> I take it you mean that the INT is either never latched? or only latched with IS bit
> after receipt of first INTA?

 Yes, one of these conditions is true, although I've never bothered to
investigate exactly which one. ;-)

> It is not obvious in 8259A Datasheet published in Intel Microsystem Components
> Handbook Volume 1 1983 nor in datasheet December 1988.

 Yep, the datasheet is indeed not that clear on the matter.  The latest
version (version 3, dated Nov 1988) used to be available at Intel's
FTP site, but I can't find it anymore.

 The 8259A core is documented in many other datasheets, perhaps more
clearly -- e.g. I've found at least one Intel datasheet providing an
unambiguous explanation of how the SFNM mode works.  I have pretty much
always known of the volatile property of the INT output and it can be
quite easily verified with hardware.  Given this property some people find
the way Intel defines edge-triggered interrupts quite surprising.

> Could you please point me to the document where it is made clear? It may be
> in the i82489DX docs as I do not have them or in a later 8259A data sheet
> revision?

 Well, there is actually a hint on how this "transparent" property of the
8259A PIC can be used for delivery of EISA chaining interrupts as APIC
interrupts in the i82489DX datasheet.  The problem with these interrupts
appears with the 82357 ISP EISA component that has a pair of 8259A PICs
embedded and does not provide the interrupt line externally, only wiring
it to IRQ 13 (IR5 of the slave 8259A -- so both 8259A cores have to be
treated as "transparent"!) internally.  The same problem exists with the
8254 interrupt in this chip, but the datasheet disregards it, assuming the
local APIC timer will be used for periodic interrupts exclusively.

 Linux would use IRQ 0 in the "transparent" 8259A mode with this chip and
if that failed (which would be quite possible, since an ISP erratum
required glue logic in the 8259A path when used with an APIC and
Intel's suggestion wasn't the most fortunate) the mixed mode with ExtINTA
interrupts would be configured.  Of course the mixed mode would also
permit simultaneous use of IRQ 0 and IRQ 13 with ISP -- the
"transparent" 8259A mode can support only a single interrupt source.

 Note the interesting internal inconsistency of the document --
implementation of the erratum workaround as proposed by Intel would make
the suggested "transparent" 8259A mode inoperative. ;-)

  Maciej

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +


* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-19  5:38     ` Ross Dickson
@ 2003-12-19 10:36       ` Craig Bradney
  0 siblings, 0 replies; 62+ messages in thread
From: Craig Bradney @ 2003-12-19 10:36 UTC (permalink / raw)
  To: ross; +Cc: Maciej W. Rozycki, george, linux-kernel

On Fri, 2003-12-19 at 06:38, Ross Dickson wrote:
> On Friday 19 December 2003 00:22, Craig Bradney wrote:
> > Just as an FYI, still going strong here with the old api and ioapic
> > patches. 5d 20h now.
> > 
> > When the official 2.6.0 comes to Gentoo Linux I can try that with
> > whatever patches people are finding stable for these nforce fixes.
> > 
> > Anyone had any luck in talking to ASUS re a BIOS update?
> > 
> > Craig
> > 
> 
> I have not talked to ASUS. I note from people's postings that with the
> latest Award BIOS we may need no apic patches (C1 disconnect auto),
> just an ioapic one to work round a buggy BIOS. I don't think you can run
> nmi_watchdog=1 with the old io-apic (not of my doing) patch.
> 
> I have Phoenix BIOS mobos from Albatron and Epox so Award BIOS doesn't help me.
> No disconnect options available in setup.
> My apic ack delay patch lets the bios have its disconnect on and keep the cpu a
> few degrees cooler besides whatever else it and the nforce2 chipset might want
> to control it for.
> 
> I have been advised my query wrt my apic ack delay patch is progressing
> with AMD but I have nothing technical to report on it.
> 
> I have made and am trialling, but have not yet posted a kernel arg controlled
> version combining my v1 and v2 apic ack delay patches. This would be better
> than what I have released in the past because people can fix bioses as the
> fixes become available and use timer ack delay in the mean time.
> Of course there is still athcool and the earlier disconnect patch to force 
> things if desired.
> 
> Regards
> Ross.

Ok Ross. Well, Gentoo's 2.6 is out now so whenever you want me to test
your new patch I can try it. I've been looking back through the list for
the updated patches but things seem to have changed here and there
even for the v2 patches so I think I'll wait for the next round of
patches as things seem a little confusing.

2.6test11 is still running happily.. 6d15h now.

Craig



* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-18 14:22   ` Craig Bradney
@ 2003-12-19  5:38     ` Ross Dickson
  2003-12-19 10:36       ` Craig Bradney
  0 siblings, 1 reply; 62+ messages in thread
From: Ross Dickson @ 2003-12-19  5:38 UTC (permalink / raw)
  To: Craig Bradney, Maciej W. Rozycki; +Cc: george, linux-kernel

On Friday 19 December 2003 00:22, Craig Bradney wrote:
> Just as an FYI, still going strong here with the old api and ioapic
> patches. 5d 20h now.
> 
> When the official 2.6.0 comes to Gentoo Linux I can try that with
> whatever patches people are finding stable for these nforce fixes.
> 
> Anyone had any luck in talking to ASUS re a BIOS update?
> 
> Craig
> 

I have not talked to ASUS. I note from people's postings that with the
latest Award BIOS we may need no apic patches (C1 disconnect auto),
just an ioapic one to work round a buggy BIOS. I don't think you can run
nmi_watchdog=1 with the old io-apic (not of my doing) patch.

I have Phoenix BIOS mobos from Albatron and Epox so Award BIOS doesn't help me.
No disconnect options available in setup.
My apic ack delay patch lets the bios have its disconnect on and keep the cpu a
few degrees cooler besides whatever else it and the nforce2 chipset might want
to control it for.

I have been advised my query wrt my apic ack delay patch is progressing
with AMD but I have nothing technical to report on it.

I have made and am trialling, but have not yet posted a kernel arg controlled
version combining my v1 and v2 apic ack delay patches. This would be better
than what I have released in the past because people can fix bioses as the
fixes become available and use timer ack delay in the mean time.
Of course there is still athcool and the earlier disconnect patch to force 
things if desired.

Regards
Ross.


* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-18 14:32     ` Maciej W. Rozycki
@ 2003-12-19  4:17       ` Ross Dickson
  2003-12-19 15:35         ` Maciej W. Rozycki
  0 siblings, 1 reply; 62+ messages in thread
From: Ross Dickson @ 2003-12-19  4:17 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: George Anzinger, linux-kernel

On Friday 19 December 2003 00:32, Maciej W. Rozycki wrote:
> On Thu, 18 Dec 2003, Ross Dickson wrote:
> 
> > I grabbed the manuals that google search found.  By the look of it what I had
> > covered P3 and earlier. Yours are more up to date and cover P4.
> 
>  Newer manuals sometimes lack details that are present in older ones.  If
> you want to have a thorough view of the APIC, you certainly want to have
> all four variations of processor manuals, i.e. the one for P4+, the one
> for PII+, the one for PPro and the one for Pentium.  Plus manuals for the
> I/O APIC, e.g. the one for the i82093AA and perhaps for ones embedded into
> various chipsets.  All of them are or used to be available online.  If you
> want to go back to the i82489DX, there is a datasheet and a programming
> manual for the part, which are IMO the most exhaustive descriptions,
> though the implementation differed a bit from newer ones (the chip was so
> far the most powerful implementation of the APIC).  These were
> unfortunately never available online.
> 

Point taken, I generally play embedded MPU where the codebase matches the 
specific hardware version and one set of docs suffice, although it is not
uncommon to rediscover an unpublished bug.

This one codebase for all hardware certainly is a lot more work!

Do you know if the Athlon apic programming docs are available or under NDA?
I do not even want to ask about the nvidia nforce2 chipset.

Regards
Ross.

> -- 
> +  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
> +--------------------------------------------------------------+
> +        e-mail: macro@ds2.pg.gda.pl, PGP key available        +
> 
> 
> 



* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-18 14:04 ` Maciej W. Rozycki
  2003-12-18 14:22   ` Craig Bradney
@ 2003-12-19  4:06   ` Ross Dickson
  2003-12-19 15:33     ` Maciej W. Rozycki
  1 sibling, 1 reply; 62+ messages in thread
From: Ross Dickson @ 2003-12-19  4:06 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: george, linux-kernel

On Friday 19 December 2003 00:04, Maciej W. Rozycki wrote:
> On Thu, 18 Dec 2003, Ross Dickson wrote:
> 
> > Here is where to find Intel's MP arch spec Maciej mentions.
> > I had to find it recently wrt nforce2 issues
> > 
> > http://www.intel.com/design/pentium/datashts/24201606.pdf
> > 
> > Section 3.6.1 Apic Architecture is relevant
> > particularly
> > Section 3.6.2.2 Virtual Wire Mode
> 
>  BTW, I have revision 1.1 as well in case anyone was interested in the 
> differences.

Yes please if it is forwardable or downloadable.

> 
> > I would like to add a footnote to highlight a potential gotcha as I
> > understand it.
> > 
> > To clarify, the xt pic 8259A does not in itself have a transparent mode
> > as would a logic buffer or inverter. It always needs inta cycles to
> > function. In PIC mode it is wired to processor pins as per old 8086 and
> > original cpu architecture provides the inta cycles to it (bypasses apic,
> > apic seems off).
> 
>  It does have such a mode. ;-)  You just have not to ack a pending
> interrupt -- if a request goes away, the INT output gets deasserted as
> well.  We are super cautious though and we reprogram the 8259A into the
> AEOI mode to prevent a lockup in case INTA cycles escape to the 8259A
> (which is theoretically possible for a broken design of an i82489DX-based
> system).  See the 8259A datasheet for details.
> 

I believe what you have written because you say it is how the code works.

I take it you mean that the INT is either never latched? or only latched with IS bit
after receipt of first INTA?

It is not obvious in 8259A Datasheet published in Intel Microsystem Components
Handbook Volume 1 1983 nor in datasheet December 1988.

I note the data sheet is almost silent on the topic of INT behaviour without
INTA cycle.

Almost, as the WAVEFORMS diagrams which have INT displayed only
show it in conjunction with the INTA and under said diagram I read

NOTES: Interrupt output must remain HIGH at least until leading edge of first 
INTA.
implying it can go low for some reason?

And the,
1. Cycle 1 in iAPX 86 , ....
Only indicates its trailing edge is synchronous with the machine cycle.

Figure 10 in the data sheet does not help either as when the IR goes low it has
a LATCH* ARMED notation which I took to mean the INT output was then 
latched.
 
I now think it was referring to the transparent D type latch "REQUEST LATCH"
in the priority cell diagram but I cannot see a footnote to the *.

Could you please point me to the document where it is made clear? It may be
in the i82489DX docs as I do not have them or in a later 8259A data sheet
revision?

Thanks,
Ross.

> > I certainly agree with Maciej's comments that mixed mode of having 8254 PIT
> > routed via the 8259A was never meant to occur alongside ioapic handling of
> > the other interrupts. It is very problematic not to mention confusing. 
> 
>  Well, the true "mixed mode", i.e. where certain interrupts are delivered
> as I/O APIC (either LoPri or Fixed) interrupts and others are routed
> through an 8259A controller and delivered as ExtINTA interrupts was
> specifically designed to work since the i82489DX APIC.  What wasn't 
> designed but works by the properties of the 8259A PIC is the transparent
> "through-8259A" mode.

Clarified thanks.

> 
>   Maciej
> 
> -- 
> +  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
> +--------------------------------------------------------------+
> +        e-mail: macro@ds2.pg.gda.pl, PGP key available        +
> 
> 
> 



* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-18  1:30   ` Ross Dickson
@ 2003-12-18 14:32     ` Maciej W. Rozycki
  2003-12-19  4:17       ` Ross Dickson
  0 siblings, 1 reply; 62+ messages in thread
From: Maciej W. Rozycki @ 2003-12-18 14:32 UTC (permalink / raw)
  To: Ross Dickson; +Cc: George Anzinger, linux-kernel

On Thu, 18 Dec 2003, Ross Dickson wrote:

> I grabbed the manuals that google search found.  By the look of it what I had
> covered P3 and earlier. Yours are more up to date and cover P4.

 Newer manuals sometimes lack details that are present in older ones.  If
you want to have a thorough view of the APIC, you certainly want to have
all four variations of processor manuals, i.e. the one for P4+, the one
for PII+, the one for PPro and the one for Pentium.  Plus manuals for the
I/O APIC, e.g. the one for the i82093AA and perhaps for ones embedded into
various chipsets.  All of them are or used to be available online.  If you
want to go back to the i82489DX, there is a datasheet and a programming
manual for the part, which are IMO the most exhaustive descriptions,
though the implementation differed a bit from newer ones (the chip was so
far the most powerful implementation of the APIC).  These were
unfortunately never available online.

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +


* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-18 14:04 ` Maciej W. Rozycki
@ 2003-12-18 14:22   ` Craig Bradney
  2003-12-19  5:38     ` Ross Dickson
  2003-12-19  4:06   ` Ross Dickson
  1 sibling, 1 reply; 62+ messages in thread
From: Craig Bradney @ 2003-12-18 14:22 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Ross Dickson, george, linux-kernel

Just as an FYI, still going strong here with the old api and ioapic
patches. 5d 20h now.

When the official 2.6.0 comes to Gentoo Linux I can try that with
whatever patches people are finding stable for these nforce fixes.

Anyone had any luck in talking to ASUS re a BIOS update?

Craig

On Thu, 2003-12-18 at 15:04, Maciej W. Rozycki wrote:
> On Thu, 18 Dec 2003, Ross Dickson wrote:
> 
> > Here is where to find Intel's MP arch spec Maciej mentions.
> > I had to find it recently wrt nforce2 issues
> > 
> > http://www.intel.com/design/pentium/datashts/24201606.pdf
> > 
> > Section 3.6.1 Apic Architecture is relevant
> > particularly
> > Section 3.6.2.2 Virtual Wire Mode
> 
>  BTW, I have revision 1.1 as well in case anyone was interested in the 
> differences.
> 
> > I would like to add a footnote to highlight a potential gotcha as I
> > understand it.
> > 
> > To clarify, the xt pic 8259A does not in itself have a transparent mode
> > as would a logic buffer or inverter. It always needs inta cycles to
> > function. In PIC mode it is wired to processor pins as per old 8086 and
> > original cpu architecture provides the inta cycles to it (bypasses apic,
> > apic seems off).
> 
>  It does have such a mode. ;-)  You just have not to ack a pending
> interrupt -- if a request goes away, the INT output gets deasserted as
> well.  We are super cautious though and we reprogram the 8259A into the
> AEOI mode to prevent a lockup in case INTA cycles escape to the 8259A
> (which is theoretically possible for a broken design of an i82489DX-based
> system).  See the 8259A datasheet for details.
> 
> > I certainly agree with Maciej's comments that mixed mode of having 8254 PIT
> > routed via the 8259A was never meant to occur alongside ioapic handling of
> > the other interrupts. It is very problematic not to mention confusing. 
> 
>  Well, the true "mixed mode", i.e. where certain interrupts are delivered
> as I/O APIC (either LoPri or Fixed) interrupts and others are routed
> through an 8259A controller and delivered as ExtINTA interrupts was
> specifically designed to work since the i82489DX APIC.  What wasn't 
> designed but works by the properties of the 8259A PIC is the transparent
> "through-8259A" mode.
> 
>   Maciej



* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-17 18:14 Ross Dickson
  2003-12-17 21:41 ` George Anzinger
  2003-12-17 21:48 ` George Anzinger
@ 2003-12-18 14:04 ` Maciej W. Rozycki
  2003-12-18 14:22   ` Craig Bradney
  2003-12-19  4:06   ` Ross Dickson
  2 siblings, 2 replies; 62+ messages in thread
From: Maciej W. Rozycki @ 2003-12-18 14:04 UTC (permalink / raw)
  To: Ross Dickson; +Cc: george, linux-kernel

On Thu, 18 Dec 2003, Ross Dickson wrote:

> Here is where to find Intel's MP arch spec Maciej mentions.
> I had to find it recently wrt nforce2 issues
> 
> http://www.intel.com/design/pentium/datashts/24201606.pdf
> 
> Section 3.6.1 Apic Architecture is relevant
> particularly
> Section 3.6.2.2 Virtual Wire Mode

 BTW, I have revision 1.1 as well in case anyone was interested in the 
differences.

> I would like to add a footnote to highlight a potential gotcha as I
> understand it.
> 
> To clarify, the xt pic 8259A does not in itself have a transparent mode
> as would a logic buffer or inverter. It always needs inta cycles to
> function. In PIC mode it is wired to processor pins as per old 8086 and
> original cpu architecture provides the inta cycles to it (bypasses apic,
> apic seems off).

 It does have such a mode. ;-)  You just have not to ack a pending
interrupt -- if a request goes away, the INT output gets deasserted as
well.  We are super cautious though and we reprogram the 8259A into the
AEOI mode to prevent a lockup in case INTA cycles escape to the 8259A
(which is theoretically possible for a broken design of an i82489DX-based
system).  See the 8259A datasheet for details.
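
 A toy model of that "transparent" behaviour, assuming only what is stated
above: while no INTA is ever delivered, INT simply follows any unmasked
request.  The struct and function names are hypothetical, and the model
deliberately ignores edge detection, priorities, the ISR and AEOI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified view of one 8259A. */
struct pic_model {
	uint8_t ir_lines;	/* current levels of IR0..IR7, 1 = asserted */
	uint8_t imr;		/* interrupt mask register, 1 = masked */
};

/* Drive one IR input high or low. */
void pic_set_ir(struct pic_model *p, unsigned int line, bool level)
{
	assert(p != NULL && line < 8);
	if (level)
		p->ir_lines |= (uint8_t)(1u << line);
	else
		p->ir_lines &= (uint8_t)~(1u << line);
}

/*
 * With no acknowledge ever arriving, INT is asserted exactly while
 * some unmasked request is active, and drops when the request does.
 */
bool pic_int_output(const struct pic_model *p)
{
	assert(p != NULL);
	return (p->ir_lines & (uint8_t)~p->imr) != 0;
}
```

 With everything but IR0 masked, toggling IR0 toggles INT, which is what
lets the 8254 output reach INTIN0 or LINT0 through the 8259A.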

> I certainly agree with Maciej's comments that mixed mode of having 8254 PIT
> routed via the 8259A was never meant to occur alongside ioapic handling of
> the other interrupts. It is very problematic not to mention confusing. 

 Well, the true "mixed mode", i.e. where certain interrupts are delivered
as I/O APIC (either LoPri or Fixed) interrupts and others are routed
through an 8259A controller and delivered as ExtINTA interrupts was
specifically designed to work since the i82489DX APIC.  What wasn't 
designed but works by the properties of the 8259A PIC is the transparent
"through-8259A" mode.

  Maciej

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +


* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-17 21:48 ` George Anzinger
@ 2003-12-18  1:30   ` Ross Dickson
  2003-12-18 14:32     ` Maciej W. Rozycki
  0 siblings, 1 reply; 62+ messages in thread
From: Ross Dickson @ 2003-12-18  1:30 UTC (permalink / raw)
  To: George Anzinger; +Cc: Maciej W. Rozycki, linux-kernel

On Thursday 18 December 2003 07:48, George Anzinger wrote:
> Ross Dickson wrote:
> > 
> > Section 7.5.11 covers it
> > 24319202.pdf available here
> 
> I wonder if you might know the difference between the 243190/1/2 and the 
> 245470/1/2 manuals.  I have hard copies of the latter.
>
I grabbed the manuals that google search found.  By the look of it what I had
covered P3 and earlier. Yours are more up to date and cover P4.
I have since found them on the web

http://www.intel.com/design/pentium4/manuals/245470.htm

Regards
Ross.

> 
> 
> -- 
> George Anzinger   george@mvista.com
> High-res-timers:  http://sourceforge.net/projects/high-res-timers/
> Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml
> 
> 
> 
> 



* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-17 18:14 Ross Dickson
  2003-12-17 21:41 ` George Anzinger
@ 2003-12-17 21:48 ` George Anzinger
  2003-12-18  1:30   ` Ross Dickson
  2003-12-18 14:04 ` Maciej W. Rozycki
  2 siblings, 1 reply; 62+ messages in thread
From: George Anzinger @ 2003-12-17 21:48 UTC (permalink / raw)
  To: ross; +Cc: Maciej W. Rozycki, linux-kernel

Ross Dickson wrote:
> 
> Section 7.5.11 covers it
> 24319202.pdf available here

I wonder if you might know the difference between the 243190/1/2 and the 
245470/1/2 manuals.  I have hard copies of the latter.



-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml



* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-17 18:14 Ross Dickson
@ 2003-12-17 21:41 ` George Anzinger
  2003-12-17 21:48 ` George Anzinger
  2003-12-18 14:04 ` Maciej W. Rozycki
  2 siblings, 0 replies; 62+ messages in thread
From: George Anzinger @ 2003-12-17 21:41 UTC (permalink / raw)
  To: ross, Maciej W. Rozycki; +Cc: linux-kernel

I really want to thank you both for all this information.  It will make the code 
and comments much easier to understand.

Thanks again

George

Ross Dickson wrote:
>>On Tue, 16 Dec 2003, George Anzinger wrote: 
> 
>  
> 
> 
> 
>>>How confusing :( Could you give me some idea how this works? I have tried 
>>>disable_irq(0) and, as best as I can tell, it does not do the trick. The 
>>>confusion I have is understanding where in the chain of hardware each of these 
>>>thing is taking place. 
> 
> 
> Here is where to find Intel's MP arch spec Maciej mentions.
> I had to find it recently wrt nforce2 issues
> 
> http://www.intel.com/design/pentium/datashts/24201606.pdf
> 
> Section 3.6.1 Apic Architecture is relevant
> particularly
> Section 3.6.2.2 Virtual Wire Mode
> 
> <snip>
> great diagram!
> 
> <snip>
> 
>>If the above variant does not work, as a last resort, the path for the 
>>8254 timer interrupt is via the 8259 reconfigured back into its usual mode 
>>and then LINT0 of the BSP reconfigured for an ExtINTA APIC interrupt. 
>>Additionally, since at this point the glue logic has probably already 
>>locked up due to the messing done above, a few artificial sets of double 
>>INTA cycles are sent to the system bus using the RTC chip and INTIN8 
>>reconfigured temporarily to send ExtINTA APIC interrupts via the 
>>inter-APIC bus. 
> 
>  
> 
>>I do hope a thorough read of the description will make the available 
>>variants clear. The I/O APIC input numbers may differ but so far they are 
>>almost always as noted above. 
> 
>  
> 
>> Maciej 
> 
> 
> All good.
> 
> I would like to add a footnote to highlight a potential gotcha as I understand it.
> 
> To clarify, the xt pic 8259A does not in itself have a transparent mode as would
> a logic buffer or inverter. It always needs inta cycles to function. In PIC mode
> it is wired to processor pins as per old 8086 and original cpu architecture
> provides the inta cycles to it (bypasses apic, apic seems off).
> 
> In virtual wire mode the 8259A output is wired either to a local apic pin on the
> cpu or through the io-apic. In this mode it is the local apic which has to provide
> the inta cycles on the bus back to the 8259A for it to function correctly. 
> 
> The delivery mode has to be set to ExtInt for the register associated with the pin
> that the 8259A output (int on Maciej's diagram) is connected to. This is the only
> way to force the apic to deliver the inta cycles to the 8259A and that is how it
> appears transparent to the system. Spec says there can only be one source 
> register (local apic) or redirection register (ioapic) of mode ExtInt per system
> regardless of how many local apic and io-apic pins it (int on Maciej's diagram)
> is connected to. 
> 
> Gotcha: If none are set to ExtInt then the 8259A will hang for lack of IntA 
> cycles.
> 
> Section 7.5.11 covers it
> 24319202.pdf available here
> 
> http://www.intel.com/design/pentiumii/manuals/243192.htm
> 
> Why only one Extint source in virtual wire mode?:
> 
> The 8259A in X86 architecture systems needs two inta cycles per interrupt event.
> Do not confuse them with the EOI which is software, the inta is purely hardware.
> It only works properly with one source causing inta cycles. Docs I have do not
> say what happens with more than one source.
> 
> How 8259A works in a nutshell (it is more complex in cascade mode).
> 
> First the 8259A gets a request from H/ware and if unmasked etc generates its int 
> (int on Maciej's diagram) out.  8259A then sits there waiting for Inta from cpu 
> (PIC mode) or local apic (Virtual wire mode). When the inta arrives the 8259A
> latches its internal ISR bit and waits for second inta. When second inta arrives
> it outputs a vector onto the data bus indicating which ISR bit was set. 
> 
> If the request from H/ware is still active when the first inta arrives then we get
> the correct vector number.
> 
> If it is NOT still asserted then it's tough luck: the vector we get is 7 for the
> first 8259A or 15 for the second 8259A, and it is too late to try and find out
> where the real source was, hence the spurious irq7 messages and corresponding
> irq 7 count increase.
> 
> It is pretty bad when the apic system that is handling the 8259A in virtual
> wire mode cannot get the inta to the 8259A in time while the int request
> hardware is still asserting but it happens.
> 
> I certainly agree with Maciej's comments that mixed mode of having 8254 PIT
> routed via the 8259A was never meant to occur alongside ioapic handling of
> the other interrupts. It is very problematic not to mention confusing. 
> 
> I do not know how smoothly the apic handles the 8259A if you would be turning
> that source on and off frequently.
> 
> Regards
> Ross Dickson
> 
> 
> 
> 

-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml



* Re: Catching NForce2 lockup with NMI watchdog
@ 2003-12-17 18:14 Ross Dickson
  2003-12-17 21:41 ` George Anzinger
                   ` (2 more replies)
  0 siblings, 3 replies; 62+ messages in thread
From: Ross Dickson @ 2003-12-17 18:14 UTC (permalink / raw)
  To: Maciej W. Rozycki, george; +Cc: linux-kernel

> On Tue, 16 Dec 2003, George Anzinger wrote:
>
> > How confusing :( Could you give me some idea how this works? I have tried
> > disable_irq(0) and, as best as I can tell, it does not do the trick. The
> > confusion I have is understanding where in the chain of hardware each of these
> > things is taking place.

Here is where to find Intel's MP arch spec Maciej mentions.
I had to find it recently wrt nforce2 issues

http://www.intel.com/design/pentium/datashts/24201606.pdf

Section 3.6.1 Apic Architecture is relevant
particularly
Section 3.6.2.2 Virtual Wire Mode

<snip>
great diagram!

<snip>
> If the above variant does not work, as a last resort, the path for the 
> 8254 timer interrupt is via the 8259 reconfigured back into its usual mode 
> and then LINT0 of the BSP reconfigured for an ExtINTA APIC interrupt. 
> Additionally, since at this point the glue logic has probably already 
> locked up due to the messing done above, a few artificial sets of double 
> INTA cycles are sent to the system bus using the RTC chip and INTIN8 
> reconfigured temporarily to send ExtINTA APIC interrupts via the 
> inter-APIC bus. 
 
> I do hope a thorough read of the description will make the available 
> variants clear. The I/O APIC input numbers may differ but so far they are 
> almost always as noted above. 
 
>  Maciej 

All good.

I would like to add a footnote to highlight a potential gotcha as I understand it.

To clarify, the XT-PIC 8259A does not in itself have a transparent mode as would
a logic buffer or inverter. It always needs INTA cycles to function. In PIC mode
it is wired to the processor pins as per the old 8086, and the original CPU
architecture provides the INTA cycles to it (this bypasses the APIC; the APIC
seems off).

In virtual wire mode the 8259A output is wired either to a local APIC pin on the
CPU or through the IO-APIC. In this mode it is the local APIC which has to
provide the INTA cycles on the bus back to the 8259A for it to function correctly. 

The delivery mode has to be set to ExtInt for the register associated with the pin
that the 8259A output (INT on Maciej's diagram) is connected to. This is the only
way to force the APIC to deliver the INTA cycles to the 8259A, and that is how it
appears transparent to the system. The spec says there can only be one source 
register (local APIC) or redirection register (IO-APIC) of mode ExtInt per system,
regardless of how many local APIC and IO-APIC pins it (INT on Maciej's diagram)
is connected to. 

Gotcha: If none are set to ExtInt then the 8259A will hang for lack of INTA 
cycles.

Section 7.5.11 covers it
24319202.pdf available here

http://www.intel.com/design/pentiumii/manuals/243192.htm
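
The ExtInt rule above can be sketched as a small check. This is only an
illustration, not kernel code: the pin labels, the delivery-mode strings and
the helper check_extint() are hypothetical names made up for the example.

```python
# Illustrative model only; pin labels and check_extint() are hypothetical.
from typing import Dict


def check_extint(entries: Dict[str, str]) -> str:
    """Classify a configuration by how many entries use ExtInt delivery.

    `entries` maps a pin label (for example "LAPIC0.LINT0" or
    "IOAPIC0.INTIN0") to its delivery mode ("ExtInt", "Fixed", "NMI", ...).
    """
    extint_pins = sorted(pin for pin, mode in entries.items() if mode == "ExtInt")
    if not extint_pins:
        # Nobody issues INTA cycles, so the 8259A never completes a request.
        return "hang: no ExtInt entry, the 8259A gets no INTA cycles"
    if len(extint_pins) > 1:
        # The text above says only one ExtInt source is allowed per system.
        return "invalid: more than one ExtInt entry: " + ", ".join(extint_pins)
    return "ok: INTA cycles are driven for " + extint_pins[0]


if __name__ == "__main__":
    print(check_extint({"LAPIC0.LINT0": "ExtInt", "LAPIC0.LINT1": "NMI"}))
    print(check_extint({"LAPIC0.LINT0": "Fixed", "IOAPIC0.INTIN0": "Fixed"}))
```

The second call models the gotcha: with no entry in ExtInt mode, the check
reports the hang case.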

Why only one ExtInt source in virtual wire mode?

The 8259A in x86 architecture systems needs two INTA cycles per interrupt event.
Do not confuse them with the EOI, which is software; the INTA is purely hardware.
The 8259A only works properly with one source causing INTA cycles. The docs I have
do not say what happens with more than one source.

How the 8259A works in a nutshell (it is more complex in cascade mode):

First the 8259A gets a request from hardware and, if it is unmasked etc., asserts
its INT output (INT on Maciej's diagram). The 8259A then sits there waiting for
INTA from the CPU (PIC mode) or the local APIC (virtual wire mode). When the INTA
arrives the 8259A latches its internal ISR bit and waits for the second INTA.
When the second INTA arrives it outputs a vector onto the data bus indicating
which ISR bit was set. 

If the request from hardware is still active when the first INTA arrives then we
get the correct vector number.

If it is NOT still asserted then it is tough luck: the vector we get is 7 for the
first 8259A or 15 for the second 8259A, and it is too late to try to find out
where the real source was, hence the spurious irq7 messages and the corresponding
irq 7 count increase.
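
The two-INTA handshake described above can be modelled with a toy sketch. It
is a simplification under stated assumptions, not a cycle-accurate 8259A: a
single non-cascaded chip, no mask register, request lines treated as plain
levels, and a vector base of 0x20 picked only for the example. The class name
Pic8259A and its methods are hypothetical.

```python
# Toy model of one non-cascaded 8259A; names and vector base are assumptions.
class Pic8259A:
    SPURIOUS_IRQ = 7  # lowest-priority input, reported when the request is gone

    def __init__(self, vector_base: int = 0x20) -> None:
        self.vector_base = vector_base  # on real hardware this is programmed
        self.lines = 0      # request lines currently asserted (bit n = IRn)
        self.isr = 0        # in-service bits latched by the first INTA
        self._selected = None  # IRQ chosen during the first INTA

    def raise_irq(self, n: int) -> None:
        self.lines |= 1 << n

    def lower_irq(self, n: int) -> None:
        self.lines &= ~(1 << n)

    def first_inta(self) -> None:
        """First INTA: latch the highest-priority request still asserted."""
        for n in range(8):  # IR0 has the highest priority
            if self.lines & (1 << n):
                self.isr |= 1 << n
                self._selected = n
                return
        # The request vanished before INTA arrived: no ISR bit is set.
        self._selected = self.SPURIOUS_IRQ

    def second_inta(self) -> int:
        """Second INTA: put the vector for the selected IRQ on the bus."""
        if self._selected is None:
            raise RuntimeError("second INTA without a first INTA")
        irq, self._selected = self._selected, None
        return self.vector_base + irq


if __name__ == "__main__":
    pic = Pic8259A()
    pic.raise_irq(0)          # timer request still asserted at first INTA
    pic.first_inta()
    print(hex(pic.second_inta()))

    pic = Pic8259A()
    pic.raise_irq(0)
    pic.lower_irq(0)          # request dropped before INTA got there
    pic.first_inta()
    print(hex(pic.second_inta()))
```

In the second run the request has gone by the time the first INTA arrives, so
the model hands back the IRQ 7 vector without setting any ISR bit, which is the
spurious irq7 case.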

It is pretty bad when the APIC system that is handling the 8259A in virtual
wire mode cannot get the INTA to the 8259A in time while the int request
hardware is still asserting, but it happens.

I certainly agree with Maciej's comments that the mixed mode of having the 8254 PIT
routed via the 8259A was never meant to occur alongside IO-APIC handling of
the other interrupts. It is very problematic, not to mention confusing. 

I do not know how smoothly the APIC handles the 8259A if you would be turning
that source on and off frequently.

Regards
Ross Dickson





* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-13  3:56 Ross Dickson
@ 2003-12-15 13:16 ` Maciej W. Rozycki
  0 siblings, 0 replies; 62+ messages in thread
From: Maciej W. Rozycki @ 2003-12-15 13:16 UTC (permalink / raw)
  To: Ross Dickson; +Cc: george, linux-kernel

On Sat, 13 Dec 2003, Ross Dickson wrote:

> Please consider adding
> 
> 2c. Alternatively the OUT0 output of the 8254 PIT (IOW the timer source) may be 
> directly connected to the INTIN0 input of the first I/O APIC. 
> 
> which we have found for nforce2 boards.

 Actually the code can handle routing to any INTIN pins, so the whole text
needs to be reworded.  It's just that I've got used to INTIN0 and INTIN2
after that many years. ;-)

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +


* Re: Catching NForce2 lockup with NMI watchdog
@ 2003-12-13  3:56 Ross Dickson
  2003-12-15 13:16 ` Maciej W. Rozycki
  0 siblings, 1 reply; 62+ messages in thread
From: Ross Dickson @ 2003-12-13  3:56 UTC (permalink / raw)
  To: george; +Cc: Maciej W. Rozycki, linux-kernel

>Having had cause to try and figure out all this, I vote for the following being 
> included in the source somewhere... 
>-g 

Please consider adding

2c. Alternatively the OUT0 output of the 8254 PIT (IOW the timer source) may be 
directly connected to the INTIN0 input of the first I/O APIC. 

which we have found for nforce2 boards.
ref:

http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-12/2375.html

Ross Dickson


>bill davidsen wrote: 
> > In article <Pine.LNX.4.55.0312101421540.31543@jurand.ds.pg.gda.pl>, 
> > Maciej W. Rozycki <macro@ds2.pg.gda.pl> wrote: 
> > 
> > | The I/O APIC NMI watchdog utilizes the property of being transparent to a 
> > | single IRQ source of a specially reconfigured 8259A PIC (the master one in 
> > | the IA32 PC architecture). There are more prerequisites that have to be 
> > | met and all indeed are for a 100% compatible PC as specified by the 
> > | Intel's Multiprocessor Specification. 
> > | 
> > | 1. The INT output of the master 8259A PIC has to be connected to the LINT0 
> > | (or LINTIN0; the name varies by implementations) inputs of all local APICs 
> > | in the system. 
> > | 
> > | 2a. The OUT0 output of the 8254 PIT (IOW the timer source) has to be 
> > | directly connected to the INTIN2 input of the first I/O APIC. 
> > | 
> > | 2b. Alternatively the INT output of the master 8259A PIC has to be 
> > | connected to the INTIN0 input of the first I/O APIC. 
> > | 
> > | 3. There must be no glue logic that would change logical properties of the 
> > | signal between the INT output of the master 8259A PIC and the respective 
> > | APIC interrupt inputs. 
> > | 
> > | In practice, assuming the MP IRQ routing information provided by the BIOS has 
> > | been correct (which is not always the case), prerequisites #1 and #2 have 
> > | been met so far, but #3 has proved to be occasionally problematic. 
> > 
> > In practice many systems seem to take a good bit of guessing and testing. 
> > I have an old P-II which only works with acpi=force and nmi_watchdog=2, 
> > for instance. 
> > 
> > It would be nice if there were a program which could poke at the 
> > hardware and suggest options which might work, as in eliminating the 
> > ones which can be determined not to work. Absent that, trial and error 
> > rule, unfortunately. 
 


* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-10  6:14       ` Bob
@ 2003-12-10  7:51         ` Craig Bradney
  0 siblings, 0 replies; 62+ messages in thread
From: Craig Bradney @ 2003-12-10  7:51 UTC (permalink / raw)
  To: Bob; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 4535 bytes --]

Hi,

Thanks to all for their replies. Of course, when I got to the PC this
morning it had hung. With about 4 days of uptime on the old IRQ0 patch it
was OK until 2am this morning.

So I have enabled preempt now, and as for the patches, I have applied the
two 2.6test11 patches for APIC and IO_APIC that Jesse Allen attached, which
I think were originally created by Ross Dickson for 2.4.2x.

Should I also be adding a CPU Disconnect patch (or running athcool, as
there's nothing in my ASUS BIOS)?

Should I be running an ATA133 patch (previous emails indicate yes), and
if so, is there a 2.6test11 patch?

When I boot with nmi_watchdog=1 I get NMI values of about the same as
IRQ0, just a bit less (1500 less at this point). With nmi_watchdog=2, I
get barely any compared to IRQ0. IRQ0/timer is on IO-APIC-edge.

This is from my current boot, with nmi_watchdog=1:
           CPU0
  0:     344998    IO-APIC-edge  timer
  1:       1517    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          2    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:       5313    IO-APIC-edge  i8042
 14:      10179    IO-APIC-edge  ide0
 15:        927    IO-APIC-edge  ide1
 19:      23551   IO-APIC-level  radeon@PCI:3:0:0
 21:       3882   IO-APIC-level  ehci_hcd, NVidia nForce2, eth0
 22:          3   IO-APIC-level  ohci1394
NMI:     343501
LOC:     343354
ERR:          0
MIS:          0
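
As a rough sketch of how the comparison above could be automated, the snippet
below pulls the first-CPU counts out of text in this format and prints the gap
between IRQ0 and NMI. It assumes a single CPU column as in the listing; the
helper parse_counts() and the SAMPLE string are made up for the example, with
the figures copied from the listing.

```python
# Hypothetical helper; assumes one CPU column, as in the listing above.
def parse_counts(text: str) -> dict:
    """Map each row label ("0", "NMI", "LOC", ...) to its first count."""
    counts = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip the "CPU0" header and anything without a numeric first field.
        if not sep or not fields or not fields[0].isdigit():
            continue
        counts[label.strip()] = int(fields[0])
    return counts


SAMPLE = """\
           CPU0
  0:     344998    IO-APIC-edge  timer
  1:       1517    IO-APIC-edge  i8042
NMI:     343501
LOC:     343354
ERR:          0
"""

if __name__ == "__main__":
    counts = parse_counts(SAMPLE)
    print("IRQ0 - NMI =", counts["0"] - counts["NMI"])
```

On a live system the same function could be fed the contents of
/proc/interrupts instead of SAMPLE.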

I have attached my dmesg outputs from the start-ups with the two
nmi_watchdog options.

regards
Craig

On Wed, 2003-12-10 at 07:14, Bob wrote: 
> Craig Bradney wrote:
> 
> >What do the IDE ones[patches] claim to fix? I have had no real issue with IDE at
> >all.. being able to burn CDs, DVDs, use my ATA133 drive for hdparm,
> >greps, compilation, and general use.....
> >
> >Craig
> >
> 
> These patches belong together because the same
> necessity is the mother of their invention.
> 
> You may not have an offboard promise or sis hd
> controller.
> 
> Alan Cox looked at "nforce2 irq storm" and the
> offboard promise and sis controllers exposing
> that dma operations might be running out of
> time(time? timing..."timer"? a timer is a given
> so "timer" was unthinkable!) waiting for irq
> availability. That was months ago. It was only
> evident that giving a "bight of slack(1)" to those
> ops could help slightly, but we have a timer in
> any case, don't we?
> 
> One person with a timer patch may have backed into
> the nforce2 solution while just trying to get
> nmi_watchdog to work, right?
> 
> Ian Kumlien looks most likely to reason the problem
> all the way through(2).
> 
> -Bob D
> 
> (1) "give me a bight of slack"
>      "ah, for a bitty byte of pre-unicode slack loop"
>   http://www.bartleby.com/61/13/B0241300.html
> 
> *bight*
> 
> PRONUNCIATION <http://www.bartleby.com/61/12.html>: bīt
> <http://www.bartleby.com/61/wavs/13/B0241300.wav>
> NOUN: 1a. A loop in a rope. b. The middle or slack part of an extended
> rope. 2a. A bend or curve, especially in a shoreline. b. A wide bay
> formed by such a bend or curve.
> ETYMOLOGY: Middle English, bend, angle, from Old English byht. See
> bheug- <http://www.bartleby.com/61/roots/IE63.html> in Appendix I.
> 
> 
> (2)  voted most likely to finesse through on a level above monkeys
> 
>  From Ian Kumlien:
> 
> I did some reading on amd's site, and if the disconnect + apic fixed the
> same problem as the ~500ns delay, then it could be as i suspect...
> 
> I suspect that something goes wrong with apic ack when the cpu is
> disconnected and according to the amd docs we could check the
> Northbridge's CLKFWDRST or isn't that avail on the outside?
> (It would be interesting to see if that fixes the problem as well.)
> 
> http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26237.PDF
> 
> I don't really have the knowledge but it would sure be nicer to fix this
> by checking this than to just disable it. I dunno if there is something
> we could do from within the kernel as well with the sending of HLT but i
> doubt it.
> 
> Anyways, we need a generalized patch that does better checking on the
> NMI bit (like Ross' patch). 
> 
> PS. Anyone that can point me to northbridge tech docs? and CC
> 
> -- Ian Kumlien <pomac () vapor ! com> -- http://pomac.netswarm.net
> 
> 

[-- Attachment #2: dmesg_afterpatch_r3_nmi_1 --]
[-- Type: text/plain, Size: 15364 bytes --]

000 Nvidia                                    ) @ 0x000f75e0
ACPI: RSDT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x1fff3000
ACPI: FADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x1fff3040
ACPI: MADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x1fff74c0
ACPI: DSDT (v001 NVIDIA AWRDACPI 0x00001000 MSFT 0x0100000e) @ 0x00000000
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 6:10 APIC version 16
ACPI: LAPIC_NMI (acpi_id[0x00] polarity[0x1] trigger[0x1] lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] global_irq_base[0x0])
IOAPIC[0]: Assigned apic_id 2
IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, IRQ 0-23
ACPI: INT_SRC_OVR (bus[0] irq[0x0] global_irq[0x2] polarity[0x0] trigger[0x0])
ACPI: INT_SRC_OVR (bus[0] irq[0x9] global_irq[0x9] polarity[0x1] trigger[0x3])
ACPI: INT_SRC_OVR (bus[0] irq[0xe] global_irq[0xe] polarity[0x1] trigger[0x1])
ACPI: INT_SRC_OVR (bus[0] irq[0xf] global_irq[0xf] polarity[0x1] trigger[0x1])
Enabling APIC mode:  Flat.  Using 1 I/O APICs
Using ACPI (MADT) for SMP configuration information
Building zonelist for node : 0
Kernel command line: root=/dev/hda6 nmi_watchdog=1
Initializing CPU#0
PID hash table entries: 2048 (order 11: 16384 bytes)
Detected 1913.382 MHz processor.
Console: colour VGA+ 80x25
Memory: 514424k/524224k available (2563k kernel code, 9052k reserved, 933k data, 168k init, 0k highmem)
Calibrating delay loop... 3784.70 BogoMIPS
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
CPU:     After generic identify, caps: 0383fbff c1c3fbff 00000000 00000000
CPU:     After vendor identify, caps: 0383fbff c1c3fbff 00000000 00000000
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU:     After all inits, caps: 0383fbff c1c3fbff 00000000 00000020
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: AMD Athlon(tm) XP 2600+ stepping 00
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
ENABLING IO-APIC IRQs
init IO_APIC IRQs
 IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
..TIMER: vector=0x31 pin1=2 pin2=-1
..MP-BIOS bug: 8254 timer not connected to IO-APIC
..TIMER: Is timer irq0 connected to IOAPIC Pin0? ...
IOAPIC[0]: Set PCI routing entry (2-0 -> 0x31 -> IRQ 0 Mode:0 Active:0)
activating NMI Watchdog ... done.
testing NMI watchdog ... OK.
..TIMER: works OK on apic pin0 irq0
number of MP IRQ sources: 15.
number of IO-APIC #2 registers: 24.
testing the IO APIC.......................
IO APIC #2......
.... register #00: 02000000
.......    : physical APIC id: 02
.......    : Delivery Type: 0
.......    : LTS          : 0
.... register #01: 00170011
.......     : max redirection entries: 0017
.......     : PRQ implemented: 0
.......     : IO APIC version: 0011
.... register #02: 00000000
.......     : arbitration: 00
.... IRQ redirection table:
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:   
 00 001 01  0    0    0   0   0    1    1    31
 01 001 01  0    0    0   0   0    1    1    39
 02 000 00  0    0    0   0   0    0    0    00
 03 001 01  0    0    0   0   0    1    1    41
 04 001 01  0    0    0   0   0    1    1    49
 05 001 01  0    0    0   0   0    1    1    51
 06 001 01  0    0    0   0   0    1    1    59
 07 001 01  0    0    0   0   0    1    1    61
 08 001 01  0    0    0   0   0    1    1    69
 09 001 01  1    1    0   0   0    1    1    71
 0a 001 01  0    0    0   0   0    1    1    79
 0b 001 01  0    0    0   0   0    1    1    81
 0c 001 01  0    0    0   0   0    1    1    89
 0d 001 01  0    0    0   0   0    1    1    91
 0e 001 01  0    0    0   0   0    1    1    99
 0f 001 01  0    0    0   0   0    1    1    A1
 10 000 00  1    0    0   0   0    0    0    00
 11 000 00  1    0    0   0   0    0    0    00
 12 000 00  1    0    0   0   0    0    0    00
 13 000 00  1    0    0   0   0    0    0    00
 14 000 00  1    0    0   0   0    0    0    00
 15 000 00  1    0    0   0   0    0    0    00
 16 000 00  1    0    0   0   0    0    0    00
 17 000 00  1    0    0   0   0    0    0    00
IRQ to pin mappings:
IRQ0 -> 0:2-> 0:0
IRQ1 -> 0:1
IRQ3 -> 0:3
IRQ4 -> 0:4
IRQ5 -> 0:5
IRQ6 -> 0:6
IRQ7 -> 0:7
IRQ8 -> 0:8
IRQ9 -> 0:9
IRQ10 -> 0:10
IRQ11 -> 0:11
IRQ12 -> 0:12
IRQ13 -> 0:13
IRQ14 -> 0:14
IRQ15 -> 0:15
.................................... done.
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 1912.0876 MHz.
..... host bus clock speed is 332.0674 MHz.
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xfb490, last bus=3
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20031002
IOAPIC[0]: Set PCI routing entry (2-9 -> 0x71 -> IRQ 9 Mode:1 Active:0)
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (00:00)
PCI: Probing PCI hardware (bus 00)
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.HUB0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.AGPB._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.HUB1._PRT]
ACPI: PCI Interrupt Link [LNK1] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNK2] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNK3] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNK4] (IRQs 3 4 *5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNK5] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LUBA] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LUBB] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LMAC] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LAPU] (IRQs 3 4 *5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LACI] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LMCI] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LSMB] (IRQs 3 4 *5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LUB2] (IRQs 3 4 *5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LFIR] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [L3CM] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LIDE] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [APC1] (IRQs 16)
ACPI: PCI Interrupt Link [APC2] (IRQs 17)
ACPI: PCI Interrupt Link [APC3] (IRQs 18)
ACPI: PCI Interrupt Link [APC4] (IRQs *19)
ACPI: PCI Interrupt Link [APC5] (IRQs 16)
ACPI: PCI Interrupt Link [APCF] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCG] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCH] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCI] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCJ] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCK] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCS] (IRQs *23)
ACPI: PCI Interrupt Link [APCL] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCM] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [AP3C] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCZ] (IRQs 20 21 22)
Linux Plug and Play Support v0.97 (c) Adam Belay
SCSI subsystem initialized
drivers/usb/core/usb.c: registered new driver usbfs
drivers/usb/core/usb.c: registered new driver hub
ACPI: PCI Interrupt Link [APCS] enabled at IRQ 23
IOAPIC[0]: Set PCI routing entry (2-23 -> 0xa9 -> IRQ 23 Mode:1 Active:0)
00:00:01[A] -> 2-23 -> IRQ 23
Pin 2-23 already programmed
ACPI: PCI Interrupt Link [APCF] enabled at IRQ 20
IOAPIC[0]: Set PCI routing entry (2-20 -> 0xb1 -> IRQ 20 Mode:1 Active:0)
00:00:02[A] -> 2-20 -> IRQ 20
ACPI: PCI Interrupt Link [APCG] enabled at IRQ 22
IOAPIC[0]: Set PCI routing entry (2-22 -> 0xb9 -> IRQ 22 Mode:1 Active:0)
00:00:02[B] -> 2-22 -> IRQ 22
ACPI: PCI Interrupt Link [APCL] enabled at IRQ 21
IOAPIC[0]: Set PCI routing entry (2-21 -> 0xc1 -> IRQ 21 Mode:1 Active:0)
00:00:02[C] -> 2-21 -> IRQ 21
ACPI: PCI Interrupt Link [APCH] enabled at IRQ 20
Pin 2-20 already programmed
ACPI: PCI Interrupt Link [APCI] enabled at IRQ 22
Pin 2-22 already programmed
ACPI: PCI Interrupt Link [APCJ] enabled at IRQ 21
Pin 2-21 already programmed
ACPI: PCI Interrupt Link [APCK] enabled at IRQ 20
Pin 2-20 already programmed
ACPI: PCI Interrupt Link [APCM] enabled at IRQ 22
Pin 2-22 already programmed
ACPI: PCI Interrupt Link [AP3C] enabled at IRQ 21
Pin 2-21 already programmed
ACPI: PCI Interrupt Link [APCZ] enabled at IRQ 20
Pin 2-20 already programmed
ACPI: PCI Interrupt Link [APC1] enabled at IRQ 16
IOAPIC[0]: Set PCI routing entry (2-16 -> 0xc9 -> IRQ 16 Mode:1 Active:0)
00:01:06[A] -> 2-16 -> IRQ 16
ACPI: PCI Interrupt Link [APC2] enabled at IRQ 17
IOAPIC[0]: Set PCI routing entry (2-17 -> 0xd1 -> IRQ 17 Mode:1 Active:0)
00:01:06[B] -> 2-17 -> IRQ 17
ACPI: PCI Interrupt Link [APC3] enabled at IRQ 18
IOAPIC[0]: Set PCI routing entry (2-18 -> 0xd9 -> IRQ 18 Mode:1 Active:0)
00:01:06[C] -> 2-18 -> IRQ 18
ACPI: PCI Interrupt Link [APC4] enabled at IRQ 19
IOAPIC[0]: Set PCI routing entry (2-19 -> 0xe1 -> IRQ 19 Mode:1 Active:0)
00:01:06[D] -> 2-19 -> IRQ 19
Pin 2-19 already programmed
Pin 2-16 already programmed
Pin 2-17 already programmed
Pin 2-18 already programmed
Pin 2-18 already programmed
Pin 2-19 already programmed
Pin 2-16 already programmed
Pin 2-17 already programmed
Pin 2-17 already programmed
Pin 2-18 already programmed
Pin 2-19 already programmed
Pin 2-16 already programmed
Pin 2-16 already programmed
Pin 2-17 already programmed
Pin 2-18 already programmed
Pin 2-19 already programmed
Pin 2-18 already programmed
Pin 2-18 already programmed
Pin 2-18 already programmed
Pin 2-18 already programmed
Pin 2-19 already programmed
Pin 2-21 already programmed
Pin 2-21 already programmed
Pin 2-21 already programmed
Pin 2-21 already programmed
PCI: Using ACPI for IRQ routing
PCI: if you experience problems, try using option 'pci=noacpi' or even 'acpi=off'
Machine check exception polling timer started.
devfs: v1.22 (20021013) Richard Gooch (rgooch@atnf.csiro.au)
devfs: boot_options: 0x1
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
udf: registering filesystem
Supermount version 2.0.2a for kernel 2.6
ACPI: Power Button (FF) [PWRF]
ACPI: Processor [CPU0] (supports C1)
pty: 256 Unix98 ptys configured
request_module: failed /sbin/modprobe -- parport_lowlevel. error = -16
lp: driver loaded but no devices found
Real Time Clock Driver v1.12
Linux agpgart interface v0.100 (c) Dave Jones
agpgart: Detected NVIDIA nForce2 chipset
agpgart: Maximum main memory to use for agp memory: 439M
agpgart: AGP aperture is 64M @ 0xd0000000
[drm] Initialized radeon 1.9.0 20020828 on minor 0
Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing disabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
parport0: PC-style at 0x378 (0x778) [PCSPP(,...)]
parport0: irq 7 detected
lp0: using parport0 (polling).
Using anticipatory io scheduler
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
loop: loaded (max 8 devices)
3c59x: Donald Becker and others. www.scyld.com/network/vortex.html
0000:02:01.0: 3Com PCI 3c920 Tornado at 0x9000. Vers LK1.1.19
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
NFORCE2: IDE controller at PCI slot 0000:00:09.0
NFORCE2: chipset revision 162
NFORCE2: not 100% native mode: will probe irqs later
NFORCE2: BIOS didn't set cable bits correctly. Enabling workaround.
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
NFORCE2: 0000:00:09.0 (rev a2) UDMA133 controller
    ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:DMA
    ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:DMA, hdd:DMA
hda: Maxtor 6Y080P0, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hdc: SONY DVD RW DRU-510A, ATAPI CD/DVD-ROM drive
hdd: SAMSUNG CD-ROM SC-152C, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 160086528 sectors (81964 MB) w/7936KiB Cache, CHS=65535/16/63, UDMA(133)
 /dev/ide/host0/bus0/target0/lun0: p1 p2 < p5 p6 p7 p8 >
hdc: ATAPI 32X DVD-ROM DVD-R CD-R/RW drive, 8192kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.12
hdd: ATAPI 52X CD-ROM drive, 128kB Cache, DMA
ohci1394: $Rev: 1045 $ Ben Collins <bcollins@debian.org>
PCI: Setting latency timer of device 0000:00:0d.0 to 64
ohci1394_0: OHCI-1394 1.1 (PCI): IRQ=[22]  MMIO=[e0083000-e00837ff]  Max Packet=[2048]
ohci1394_0: SelfID received outside of bus reset sequence
ehci_hcd 0000:00:02.2: EHCI Host Controller
PCI: Setting latency timer of device 0000:00:02.2 to 64
ehci_hcd 0000:00:02.2: irq 21, pci mem e0848000
ehci_hcd 0000:00:02.2: new USB bus registered, assigned bus number 1
PCI: cache line size of 64 is not supported by device 0000:00:02.2
ehci_hcd 0000:00:02.2: USB 2.0 enabled, EHCI 1.00, driver 2003-Jun-13
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 6 ports detected
drivers/usb/host/uhci-hcd.c: USB Universal Host Controller Interface driver v2.1
drivers/usb/core/usb.c: registered new driver usblp
drivers/usb/class/usblp.c: v0.13: USB Printer Device Class driver
Initializing USB Mass Storage driver...
drivers/usb/core/usb.c: registered new driver usb-storage
USB Mass Storage support registered.
drivers/usb/core/usb.c: registered new driver hid
drivers/usb/input/hid-core.c: v2.0:USB HID core driver
mice: PS/2 mouse device common for all mice
input: ImExPS/2 Logitech Explorer Mouse on isa0060/serio1
serio: i8042 AUX port at 0x60,0x64 irq 12
input: AT Translated Set 2 keyboard on isa0060/serio0
serio: i8042 KBD port at 0x60,0x64 irq 1
Advanced Linux Sound Architecture Driver Version 0.9.7 (Thu Sep 25 19:16:36 2003 UTC).
request_module: failed /sbin/modprobe -- snd-card-0. error = -16
ALSA device list:
  No soundcards found.
NET: Registered protocol family 2
IP: routing cache hash table of 4096 buckets, 32Kbytes
TCP: Hash tables configured (established 32768 bind 65536)
NET: Registered protocol family 1
NET: Registered protocol family 17
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Mounted devfs on /dev
Freeing unused kernel memory: 168k freed
ieee1394: Host added: ID:BUS[0-00:1023]  GUID[00e018000044dec8]
Adding 2008084k swap on /dev/hda5.  Priority:-1 extents:1
EXT3 FS on hda6, internal journal
i2c_adapter i2c-0: nForce2 SMBus adapter at 0x5000
i2c_adapter i2c-1: nForce2 SMBus adapter at 0x5500
registering 1-002d
registering 1-0049
registering 1-0048
kjournald starting.  Commit interval 5 seconds
EXT3 FS on hda7, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on hda8, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
PCI: Setting latency timer of device 0000:00:06.0 to 64
intel8x0: clocking to 47450
agpgart: Found an AGP 2.0 compliant device at 0000:00:00.0.
agpgart: Putting AGP V2 device at 0000:00:00.0 into 1x mode
agpgart: Putting AGP V2 device at 0000:03:00.0 into 1x mode
agpgart: Putting AGP V2 device at 0000:03:00.1 into 1x mode
[drm] Loading R200 Microcode

[-- Attachment #3: dmesg_afterpatch_r3_nmi_2 --]
[-- Type: text/plain, Size: 15361 bytes --]


DMI 2.2 present.
ACPI: RSDP (v000 Nvidia                                    ) @ 0x000f75e0
ACPI: RSDT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x1fff3000
ACPI: FADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x1fff3040
ACPI: MADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x1fff74c0
ACPI: DSDT (v001 NVIDIA AWRDACPI 0x00001000 MSFT 0x0100000e) @ 0x00000000
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 6:10 APIC version 16
ACPI: LAPIC_NMI (acpi_id[0x00] polarity[0x1] trigger[0x1] lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] global_irq_base[0x0])
IOAPIC[0]: Assigned apic_id 2
IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, IRQ 0-23
ACPI: INT_SRC_OVR (bus[0] irq[0x0] global_irq[0x2] polarity[0x0] trigger[0x0])
ACPI: INT_SRC_OVR (bus[0] irq[0x9] global_irq[0x9] polarity[0x1] trigger[0x3])
ACPI: INT_SRC_OVR (bus[0] irq[0xe] global_irq[0xe] polarity[0x1] trigger[0x1])
ACPI: INT_SRC_OVR (bus[0] irq[0xf] global_irq[0xf] polarity[0x1] trigger[0x1])
Enabling APIC mode:  Flat.  Using 1 I/O APICs
Using ACPI (MADT) for SMP configuration information
Building zonelist for node : 0
Kernel command line: root=/dev/hda6 nmi_watchdog=2
Initializing CPU#0
PID hash table entries: 2048 (order 11: 16384 bytes)
Detected 1913.393 MHz processor.
Console: colour VGA+ 80x25
Memory: 514424k/524224k available (2563k kernel code, 9052k reserved, 933k data, 168k init, 0k highmem)
Calibrating delay loop... 3784.70 BogoMIPS
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
CPU:     After generic identify, caps: 0383fbff c1c3fbff 00000000 00000000
CPU:     After vendor identify, caps: 0383fbff c1c3fbff 00000000 00000000
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU:     After all inits, caps: 0383fbff c1c3fbff 00000000 00000020
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: AMD Athlon(tm) XP 2600+ stepping 00
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
testing NMI watchdog ... OK.
ENABLING IO-APIC IRQs
init IO_APIC IRQs
 IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
..TIMER: vector=0x31 pin1=2 pin2=-1
..MP-BIOS bug: 8254 timer not connected to IO-APIC
..TIMER: Is timer irq0 connected to IOAPIC Pin0? ...
IOAPIC[0]: Set PCI routing entry (2-0 -> 0x31 -> IRQ 0 Mode:0 Active:0)
..TIMER: works OK on apic pin0 irq0
number of MP IRQ sources: 15.
number of IO-APIC #2 registers: 24.
testing the IO APIC.......................
IO APIC #2......
.... register #00: 02000000
.......    : physical APIC id: 02
.......    : Delivery Type: 0
.......    : LTS          : 0
.... register #01: 00170011
.......     : max redirection entries: 0017
.......     : PRQ implemented: 0
.......     : IO APIC version: 0011
.... register #02: 00000000
.......     : arbitration: 00
.... IRQ redirection table:
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:   
 00 001 01  0    0    0   0   0    1    1    31
 01 001 01  0    0    0   0   0    1    1    39
 02 000 00  0    0    0   0   0    0    0    00
 03 001 01  0    0    0   0   0    1    1    41
 04 001 01  0    0    0   0   0    1    1    49
 05 001 01  0    0    0   0   0    1    1    51
 06 001 01  0    0    0   0   0    1    1    59
 07 001 01  0    0    0   0   0    1    1    61
 08 001 01  0    0    0   0   0    1    1    69
 09 001 01  1    1    0   0   0    1    1    71
 0a 001 01  0    0    0   0   0    1    1    79
 0b 001 01  0    0    0   0   0    1    1    81
 0c 001 01  0    0    0   0   0    1    1    89
 0d 001 01  0    0    0   0   0    1    1    91
 0e 001 01  0    0    0   0   0    1    1    99
 0f 001 01  0    0    0   0   0    1    1    A1
 10 000 00  1    0    0   0   0    0    0    00
 11 000 00  1    0    0   0   0    0    0    00
 12 000 00  1    0    0   0   0    0    0    00
 13 000 00  1    0    0   0   0    0    0    00
 14 000 00  1    0    0   0   0    0    0    00
 15 000 00  1    0    0   0   0    0    0    00
 16 000 00  1    0    0   0   0    0    0    00
 17 000 00  1    0    0   0   0    0    0    00
IRQ to pin mappings:
IRQ0 -> 0:2-> 0:0
IRQ1 -> 0:1
IRQ3 -> 0:3
IRQ4 -> 0:4
IRQ5 -> 0:5
IRQ6 -> 0:6
IRQ7 -> 0:7
IRQ8 -> 0:8
IRQ9 -> 0:9
IRQ10 -> 0:10
IRQ11 -> 0:11
IRQ12 -> 0:12
IRQ13 -> 0:13
IRQ14 -> 0:14
IRQ15 -> 0:15
.................................... done.
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 1912.0941 MHz.
..... host bus clock speed is 332.0685 MHz.
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xfb490, last bus=3
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20031002
IOAPIC[0]: Set PCI routing entry (2-9 -> 0x71 -> IRQ 9 Mode:1 Active:0)
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (00:00)
PCI: Probing PCI hardware (bus 00)
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.HUB0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.AGPB._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.HUB1._PRT]
ACPI: PCI Interrupt Link [LNK1] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNK2] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNK3] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNK4] (IRQs 3 4 *5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNK5] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LUBA] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LUBB] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LMAC] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LAPU] (IRQs 3 4 *5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LACI] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LMCI] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LSMB] (IRQs 3 4 *5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LUB2] (IRQs 3 4 *5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LFIR] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [L3CM] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LIDE] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [APC1] (IRQs 16)
ACPI: PCI Interrupt Link [APC2] (IRQs 17)
ACPI: PCI Interrupt Link [APC3] (IRQs 18)
ACPI: PCI Interrupt Link [APC4] (IRQs *19)
ACPI: PCI Interrupt Link [APC5] (IRQs 16)
ACPI: PCI Interrupt Link [APCF] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCG] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCH] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCI] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCJ] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCK] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCS] (IRQs *23)
ACPI: PCI Interrupt Link [APCL] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCM] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [AP3C] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCZ] (IRQs 20 21 22)
Linux Plug and Play Support v0.97 (c) Adam Belay
SCSI subsystem initialized
drivers/usb/core/usb.c: registered new driver usbfs
drivers/usb/core/usb.c: registered new driver hub
ACPI: PCI Interrupt Link [APCS] enabled at IRQ 23
IOAPIC[0]: Set PCI routing entry (2-23 -> 0xa9 -> IRQ 23 Mode:1 Active:0)
00:00:01[A] -> 2-23 -> IRQ 23
Pin 2-23 already programmed
ACPI: PCI Interrupt Link [APCF] enabled at IRQ 20
IOAPIC[0]: Set PCI routing entry (2-20 -> 0xb1 -> IRQ 20 Mode:1 Active:0)
00:00:02[A] -> 2-20 -> IRQ 20
ACPI: PCI Interrupt Link [APCG] enabled at IRQ 22
IOAPIC[0]: Set PCI routing entry (2-22 -> 0xb9 -> IRQ 22 Mode:1 Active:0)
00:00:02[B] -> 2-22 -> IRQ 22
ACPI: PCI Interrupt Link [APCL] enabled at IRQ 21
IOAPIC[0]: Set PCI routing entry (2-21 -> 0xc1 -> IRQ 21 Mode:1 Active:0)
00:00:02[C] -> 2-21 -> IRQ 21
ACPI: PCI Interrupt Link [APCH] enabled at IRQ 20
Pin 2-20 already programmed
ACPI: PCI Interrupt Link [APCI] enabled at IRQ 22
Pin 2-22 already programmed
ACPI: PCI Interrupt Link [APCJ] enabled at IRQ 21
Pin 2-21 already programmed
ACPI: PCI Interrupt Link [APCK] enabled at IRQ 20
Pin 2-20 already programmed
ACPI: PCI Interrupt Link [APCM] enabled at IRQ 22
Pin 2-22 already programmed
ACPI: PCI Interrupt Link [AP3C] enabled at IRQ 21
Pin 2-21 already programmed
ACPI: PCI Interrupt Link [APCZ] enabled at IRQ 20
Pin 2-20 already programmed
ACPI: PCI Interrupt Link [APC1] enabled at IRQ 16
IOAPIC[0]: Set PCI routing entry (2-16 -> 0xc9 -> IRQ 16 Mode:1 Active:0)
00:01:06[A] -> 2-16 -> IRQ 16
ACPI: PCI Interrupt Link [APC2] enabled at IRQ 17
IOAPIC[0]: Set PCI routing entry (2-17 -> 0xd1 -> IRQ 17 Mode:1 Active:0)
00:01:06[B] -> 2-17 -> IRQ 17
ACPI: PCI Interrupt Link [APC3] enabled at IRQ 18
IOAPIC[0]: Set PCI routing entry (2-18 -> 0xd9 -> IRQ 18 Mode:1 Active:0)
00:01:06[C] -> 2-18 -> IRQ 18
ACPI: PCI Interrupt Link [APC4] enabled at IRQ 19
IOAPIC[0]: Set PCI routing entry (2-19 -> 0xe1 -> IRQ 19 Mode:1 Active:0)
00:01:06[D] -> 2-19 -> IRQ 19
Pin 2-19 already programmed
Pin 2-16 already programmed
Pin 2-17 already programmed
Pin 2-18 already programmed
Pin 2-18 already programmed
Pin 2-19 already programmed
Pin 2-16 already programmed
Pin 2-17 already programmed
Pin 2-17 already programmed
Pin 2-18 already programmed
Pin 2-19 already programmed
Pin 2-16 already programmed
Pin 2-16 already programmed
Pin 2-17 already programmed
Pin 2-18 already programmed
Pin 2-19 already programmed
Pin 2-18 already programmed
Pin 2-18 already programmed
Pin 2-18 already programmed
Pin 2-18 already programmed
Pin 2-19 already programmed
Pin 2-21 already programmed
Pin 2-21 already programmed
Pin 2-21 already programmed
Pin 2-21 already programmed
PCI: Using ACPI for IRQ routing
PCI: if you experience problems, try using option 'pci=noacpi' or even 'acpi=off'
Machine check exception polling timer started.
devfs: v1.22 (20021013) Richard Gooch (rgooch@atnf.csiro.au)
devfs: boot_options: 0x1
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
udf: registering filesystem
Supermount version 2.0.2a for kernel 2.6
ACPI: Power Button (FF) [PWRF]
ACPI: Processor [CPU0] (supports C1)
pty: 256 Unix98 ptys configured
request_module: failed /sbin/modprobe -- parport_lowlevel. error = -16
lp: driver loaded but no devices found
Real Time Clock Driver v1.12
Linux agpgart interface v0.100 (c) Dave Jones
agpgart: Detected NVIDIA nForce2 chipset
agpgart: Maximum main memory to use for agp memory: 439M
agpgart: AGP aperture is 64M @ 0xd0000000
[drm] Initialized radeon 1.9.0 20020828 on minor 0
Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing disabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
parport0: PC-style at 0x378 (0x778) [PCSPP(,...)]
parport0: irq 7 detected
lp0: using parport0 (polling).
Using anticipatory io scheduler
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
loop: loaded (max 8 devices)
3c59x: Donald Becker and others. www.scyld.com/network/vortex.html
0000:02:01.0: 3Com PCI 3c920 Tornado at 0x9000. Vers LK1.1.19
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
NFORCE2: IDE controller at PCI slot 0000:00:09.0
NFORCE2: chipset revision 162
NFORCE2: not 100% native mode: will probe irqs later
NFORCE2: BIOS didn't set cable bits correctly. Enabling workaround.
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
NFORCE2: 0000:00:09.0 (rev a2) UDMA133 controller
    ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:DMA
    ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:DMA, hdd:DMA
hda: Maxtor 6Y080P0, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hdc: SONY DVD RW DRU-510A, ATAPI CD/DVD-ROM drive
hdd: SAMSUNG CD-ROM SC-152C, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 160086528 sectors (81964 MB) w/7936KiB Cache, CHS=65535/16/63, UDMA(133)
 /dev/ide/host0/bus0/target0/lun0: p1 p2 < p5 p6 p7 p8 >
hdc: ATAPI 32X DVD-ROM DVD-R CD-R/RW drive, 8192kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.12
hdd: ATAPI 52X CD-ROM drive, 128kB Cache, DMA
ohci1394: $Rev: 1045 $ Ben Collins <bcollins@debian.org>
PCI: Setting latency timer of device 0000:00:0d.0 to 64
ohci1394_0: OHCI-1394 1.1 (PCI): IRQ=[22]  MMIO=[e0083000-e00837ff]  Max Packet=[2048]
ohci1394_0: SelfID received outside of bus reset sequence
ehci_hcd 0000:00:02.2: EHCI Host Controller
PCI: Setting latency timer of device 0000:00:02.2 to 64
ehci_hcd 0000:00:02.2: irq 21, pci mem e0848000
ehci_hcd 0000:00:02.2: new USB bus registered, assigned bus number 1
PCI: cache line size of 64 is not supported by device 0000:00:02.2
ehci_hcd 0000:00:02.2: USB 2.0 enabled, EHCI 1.00, driver 2003-Jun-13
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 6 ports detected
drivers/usb/host/uhci-hcd.c: USB Universal Host Controller Interface driver v2.1
drivers/usb/core/usb.c: registered new driver usblp
drivers/usb/class/usblp.c: v0.13: USB Printer Device Class driver
Initializing USB Mass Storage driver...
drivers/usb/core/usb.c: registered new driver usb-storage
USB Mass Storage support registered.
drivers/usb/core/usb.c: registered new driver hid
drivers/usb/input/hid-core.c: v2.0:USB HID core driver
mice: PS/2 mouse device common for all mice
input: ImExPS/2 Logitech Explorer Mouse on isa0060/serio1
serio: i8042 AUX port at 0x60,0x64 irq 12
input: AT Translated Set 2 keyboard on isa0060/serio0
serio: i8042 KBD port at 0x60,0x64 irq 1
Advanced Linux Sound Architecture Driver Version 0.9.7 (Thu Sep 25 19:16:36 2003 UTC).
request_module: failed /sbin/modprobe -- snd-card-0. error = -16
ALSA device list:
  No soundcards found.
NET: Registered protocol family 2
IP: routing cache hash table of 4096 buckets, 32Kbytes
TCP: Hash tables configured (established 32768 bind 65536)
NET: Registered protocol family 1
NET: Registered protocol family 17
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Mounted devfs on /dev
Freeing unused kernel memory: 168k freed
ieee1394: Host added: ID:BUS[0-00:1023]  GUID[00e018000044dec8]
Adding 2008084k swap on /dev/hda5.  Priority:-1 extents:1
EXT3 FS on hda6, internal journal
i2c_adapter i2c-0: nForce2 SMBus adapter at 0x5000
i2c_adapter i2c-1: nForce2 SMBus adapter at 0x5500
registering 1-002d
registering 1-0049
registering 1-0048
kjournald starting.  Commit interval 5 seconds
EXT3 FS on hda7, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on hda8, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
PCI: Setting latency timer of device 0000:00:06.0 to 64
intel8x0: clocking to 49371
agpgart: Found an AGP 2.0 compliant device at 0000:00:00.0.
agpgart: Putting AGP V2 device at 0000:00:00.0 into 1x mode
agpgart: Putting AGP V2 device at 0000:03:00.0 into 1x mode
agpgart: Putting AGP V2 device at 0000:03:00.1 into 1x mode
[drm] Loading R200 Microcode
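
For anyone comparing boot logs like the one above between boards, the IRQ
redirection table is easier to diff once it is parsed. What follows is only a
minimal sketch in Python, assuming the column layout printed by this kernel
(NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect) and assuming every field
is hexadecimal. The name parse_redirection_table is hypothetical; the sample
rows are copied from the log above.

```python
import re

# Column names, in the order printed in the "IRQ redirection table" header.
COLUMNS = ["nr", "log", "phy", "mask", "trig", "irr",
           "pol", "stat", "dest", "deli", "vect"]

# A table row is a two-digit hex pin number followed by ten more hex fields.
ROW = re.compile(r"^\s*([0-9a-fA-F]{2})((?:\s+[0-9a-fA-F]+){10})\s*$")


def parse_redirection_table(text):
    """Return one dict per table row, with every field parsed as a hex integer."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header line or unrelated log output
        fields = [match.group(1)] + match.group(2).split()
        rows.append({name: int(value, 16) for name, value in zip(COLUMNS, fields)})
    return rows


sample = """\
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
 00 001 01  0    0    0   0   0    1    1    31
 02 000 00  0    0    0   0   0    0    0    00
 09 001 01  1    1    0   0   0    1    1    71
 10 000 00  1    0    0   0   0    0    0    00
"""

rows = parse_redirection_table(sample)
# Pins that are unmasked and actually have a vector assigned.
unmasked = [row["nr"] for row in rows if row["mask"] == 0 and row["vect"] != 0]
print("unmasked pins with a vector:", unmasked)
```

On these four sample rows only pin 0 is reported, which is consistent with
the "..TIMER: works OK on apic pin0 irq0" line earlier in the log.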


* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-09 22:04     ` Craig Bradney
  2003-12-09 23:13       ` Ian Kumlien
@ 2003-12-10  6:14       ` Bob
  2003-12-10  7:51         ` Craig Bradney
  1 sibling, 1 reply; 62+ messages in thread
From: Bob @ 2003-12-10  6:14 UTC (permalink / raw)
  To: linux-kernel

Craig Bradney wrote:

>What do the IDE ones[patches] claim to fix? I have had no real issue with IDE at
>all.. being able to burn CDs, DVDs, use my ATA133 drive for hdparm,
>greps, compilation, and general use.....
>
>Craig
>

These patches belong together because the same
necessity is the mother of their invention.

You may not have an offboard promise or sis hd
controller.

Alan Cox looked at the "nforce2 irq storm" and at the
offboard Promise and SiS controllers, which exposed
that DMA operations might be running out of
time (time? timing... "timer"? A timer is a given,
so "timer" was unthinkable!) waiting for IRQ
availability. That was months ago. It was only
evident that giving a "bight of slack(1)" to those
ops could help slightly, but we have a timer in
any case, don't we?

One person with a timer patch may have backed into
the nforce2 solution while just trying to get
nmi_watchdog to work, right?

Ian Kumlien looks most likely to reason the problem
all the way through(2).

-Bob D

(1) "give me a bight of slack"
     "ah, for a bitty byte of pre-unicode slack loop"
  http://www.bartleby.com/61/13/B0241300.html

*bight*

PRONUNCIATION: bīt
NOUN: 1a. A loop in a rope. b. The middle or slack part of an extended
rope. 2a. A bend or curve, especially in a shoreline. b. A wide bay
formed by such a bend or curve.
ETYMOLOGY: Middle English, bend, angle, from Old English byht. See
bheug- in Appendix I.


(2)  voted most likely to finesse through on a level above monkeys

 From Ian Kumlien:

I did some reading on AMD's site, and if the disconnect + apic fixed the
same problem as the ~500ns delay, then it could be as I suspect...

I suspect that something goes wrong with the APIC ack when the CPU is
disconnected, and according to the AMD docs we could check the
Northbridge's CLKFWDRST. Or isn't that available on the outside?
(It would be interesting to see if that fixes the problem as well.)

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26237.PDF

I don't really have the knowledge, but it would sure be nicer to fix this
by checking this than by just disabling it. I don't know if there is
something we could do from within the kernel as well with the sending of
HLT, but I doubt it.

Anyway, we need a generalized patch that does better checking on the
NMI bit (like Ross' patch).

PS. Anyone that can point me to northbridge tech docs? and CC

-- Ian Kumlien <pomac () vapor ! com> -- http://pomac.netswarm.net




* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-09 22:04     ` Craig Bradney
@ 2003-12-09 23:13       ` Ian Kumlien
  2003-12-10  6:14       ` Bob
  1 sibling, 0 replies; 62+ messages in thread
From: Ian Kumlien @ 2003-12-09 23:13 UTC (permalink / raw)
  To: Craig Bradney; +Cc: ross, linux-kernel, recbo


On Tue, 2003-12-09 at 23:04, Craig Bradney wrote:
> On Tue, 2003-12-09 at 19:12, Ian Kumlien wrote:
> > Bob wrote:
> > > Using a patch that fixes a number of people's nforce2
> > > lockups while enabling io-apic edge timer, I can now
> > > use nmi_watchdog=2 but not =1
> > 
> > Why regurgitate patches that are outdated? Personally I find them
> > outdated now that Ross has made his patches available, and they DO enable
> > nmi_watchdog=1. (I have seen the old patches mentioned more than once;
> > if something better comes along, please move to that instead.)
> > 
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=107080280512734&w=2
> > 
> > Anyway, is there any way to detect whether the CPU is "disconnected", or
> > is there any way to see when the kernel sends the halts that trigger the
> > disconnect? (Or is it automagic?)
> > 
> > If there were a way to check, then that's all that's needed; all delays
> > could be removed and the code could be more generalized.
> > 
> > (I doubt that this is APIC torment. It's more likely the APIC trying to
> > talk to a disconnected CPU, which both approaches hint at, IMHO.)
> 
> Have these patches been submitted for review for inclusion into the main
> kernel?

No, there is no final patch in any way; there are just dodgy workarounds.
I just deem this one better because nmi_watchdog=1 works with it.

> I'm still running the old IO-APIC patch (Uptime 3d 20h) and having no
> issues whatsoever.

They fix the same problem.

> Are all of the patches at that address you provide necessary? 

Nope, but they are all nforce2 related.

> What do the IDE ones claim to fix? I have had no real issue with IDE at
> all.. being able to burn CDs, DVDs, use my ATA133 drive for hdparm,
> greps, compilation, and general use.....

It's just a cleanup, AFAIR.

Anyway, I think that if we find some way to detect CPU disconnect, then
we just need that "detection" prior to the APIC ack...
(just a guess though)

-- 
Ian Kumlien <pomac () vapor ! com> -- http://pomac.netswarm.net



* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-09 18:12   ` Catching NForce2 lockup with NMI watchdog Ian Kumlien
@ 2003-12-09 22:04     ` Craig Bradney
  2003-12-09 23:13       ` Ian Kumlien
  2003-12-10  6:14       ` Bob
  0 siblings, 2 replies; 62+ messages in thread
From: Craig Bradney @ 2003-12-09 22:04 UTC (permalink / raw)
  To: Ian Kumlien; +Cc: ross, linux-kernel, recbo

On Tue, 2003-12-09 at 19:12, Ian Kumlien wrote:
> Bob wrote:
> > Using a patch that fixes a number of people's nforce2
> > lockups while enabling io-apic edge timer, I can now
> > use nmi_watchdog=2 but not =1
> 
> Why regurgitate patches that are outdated? Personally I find them
> outdated now that Ross has made his patches available, and they DO enable
> nmi_watchdog=1. (I have seen the old patches mentioned more than once;
> if something better comes along, please move to that instead.)
> 
> http://marc.theaimsgroup.com/?l=linux-kernel&m=107080280512734&w=2
> 
> Anyway, is there any way to detect whether the CPU is "disconnected", or
> is there any way to see when the kernel sends the halts that trigger the
> disconnect? (Or is it automagic?)
> 
> If there were a way to check, then that's all that's needed; all delays
> could be removed and the code could be more generalized.
> 
> (I doubt that this is APIC torment. It's more likely the APIC trying to
> talk to a disconnected CPU, which both approaches hint at, IMHO.)

Have these patches been submitted for review for inclusion into the main
kernel?

I'm still running the old IO-APIC patch (Uptime 3d 20h) and having no
issues whatsoever.

Are all of the patches at that address you provide necessary? 

What do the IDE ones claim to fix? I have had no real issue with IDE at
all.. being able to burn CDs, DVDs, use my ATA133 drive for hdparm,
greps, compilation, and general use.....

Craig



* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-08  2:07 ` Ross Dickson
@ 2003-12-09 18:12   ` Ian Kumlien
  2003-12-09 22:04     ` Craig Bradney
  0 siblings, 1 reply; 62+ messages in thread
From: Ian Kumlien @ 2003-12-09 18:12 UTC (permalink / raw)
  To: ross; +Cc: linux-kernel, recbo


Bob wrote:
> Using a patch that fixes a number of people's nforce2
> lockups while enabling io-apic edge timer, I can now
> use nmi_watchdog=2 but not =1

Why regurgitate patches that are outdated? Personally I find them
outdated now that Ross has made his patches available, and they DO enable
nmi_watchdog=1. (I have seen the old patches mentioned more than once;
if something better comes along, please move to that instead.)

http://marc.theaimsgroup.com/?l=linux-kernel&m=107080280512734&w=2

Anyway, is there any way to detect whether the CPU is "disconnected", or
is there any way to see when the kernel sends the halts that trigger the
disconnect? (Or is it automagic?)

If there were a way to check, then that's all that's needed; all delays
could be removed and the code could be more generalized.

(I doubt that this is APIC torment. It's more likely the APIC trying to
talk to a disconnected CPU, which both approaches hint at, IMHO.)

-- 
Ian Kumlien <pomac () vapor ! com> -- http://pomac.netswarm.net



* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-05 20:18 ` cheuche+lkml
  2003-12-05 20:34   ` Prakash K. Cheemplavam
  2003-12-05 20:55   ` Jesse Allen
@ 2003-12-06  3:20   ` Jesse Allen
  2 siblings, 0 replies; 62+ messages in thread
From: Jesse Allen @ 2003-12-06  3:20 UTC (permalink / raw)
  To: linux-kernel

On Fri, Dec 05, 2003 at 09:18:12PM +0100, cheuche+lkml@free.fr wrote:
> On Fri, Dec 05, 2003 at 11:11:39AM -0800, Allen Martin wrote:
> With a little patch in arch/i386/kernel/mpparse.c in the acpi section, I
> managed to get the timer interrupt back on IO-APIC-edge, maybe the nmi
> watchdog could work with the ioapic then ?
> 


As reported, with the patch the timer uses IO-APIC-edge and the noise on
IRQ 7 is gone, but I am still unable to catch a lockup with nmi_watchdog.

=(

Jesse


* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-05 22:55 ` Mike Fedyk
@ 2003-12-05 23:11   ` Craig Bradney
  0 siblings, 0 replies; 62+ messages in thread
From: Craig Bradney @ 2003-12-05 23:11 UTC (permalink / raw)
  To: linux-kernel

On Fri, 2003-12-05 at 23:55, Mike Fedyk wrote:
> On Fri, Dec 05, 2003 at 11:11:39AM -0800, Allen Martin wrote:
> > NVIDIA doesn't provide a windows driver to setup APIC interrupts.  APIC
> > functionality is exported through the ACPI methods and MP table in the
> > system BIOS which the motherboard vendors supply.
> > 
> > Likely the root of the problem has to do with the way the Linux kernel is
> > using the ACPI methods to setup the interrupts which is different from win
> > 9x/2k/XP.  I can help track this down, unfortunately so far I've been unable
> > to reproduce the hangs on any of the boards I have.
> 
> Can the people with nforce chips run a command that will show the chipset
> config space like was done back when there were problems with via chipsets
> (before via released the specs on how to set the bits correctly).
> 
> Maybe you'll see some correlation between the boards that are crashing, and
> a few bits that are different for the boards that aren't crashing.
> -

Is there such a command, or is that your question? I'm ready to run it as
soon as someone lets me know.

Craig
Uptime: 6.5 hours



* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-05 19:11 Allen Martin
  2003-12-05 20:18 ` cheuche+lkml
  2003-12-05 20:36 ` Jesse Allen
@ 2003-12-05 22:55 ` Mike Fedyk
  2003-12-05 23:11   ` Craig Bradney
  2 siblings, 1 reply; 62+ messages in thread
From: Mike Fedyk @ 2003-12-05 22:55 UTC (permalink / raw)
  To: Allen Martin; +Cc: 'Mikael Pettersson', Josh McKinney, linux-kernel

On Fri, Dec 05, 2003 at 11:11:39AM -0800, Allen Martin wrote:
> NVIDIA doesn't provide a windows driver to setup APIC interrupts.  APIC
> functionality is exported through the ACPI methods and MP table in the
> system BIOS which the motherboard vendors supply.
> 
> Likely the root of the problem has to do with the way the Linux kernel is
> using the ACPI methods to setup the interrupts which is different from win
> 9x/2k/XP.  I can help track this down, unfortunately so far I've been unable
> to reproduce the hangs on any of the boards I have.

Can the people with nforce chips run a command that will show the chipset
config space, like was done back when there were problems with VIA chipsets
(before VIA released the specs on how to set the bits correctly)?

Maybe you'll see some correlation between the boards that are crashing, and
a few bits that are different for the boards that aren't crashing.
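
One way to act on this request is `lspci -xxx` from pciutils, which (run as
root) prints a hex dump of each device's PCI configuration space. Below is a
rough sketch in Python, not a tested tool, for diffing two such dumps from a
crashing and a non-crashing board. The helper names parse_lspci_hex and
diff_config are hypothetical, and the byte values in the sample are made up
for illustration; they are not taken from any real board.

```python
import re

# A hex-dump line: 2-3 hex digits of offset, a colon, then byte pairs.
DUMP_LINE = re.compile(r"^([0-9a-f]{2,3}): ((?:[0-9a-f]{2}\s*)+)$")


def parse_lspci_hex(text):
    """Parse `lspci -xxx`-style output into {slot: bytearray of config space}."""
    devices = {}
    slot = None
    for line in text.splitlines():
        stripped = line.strip()
        match = DUMP_LINE.match(stripped)
        if match and slot is not None:
            offset = int(match.group(1), 16)
            data = bytes.fromhex(match.group(2))
            buf = devices[slot]
            end = offset + len(data)
            if len(buf) < end:
                buf.extend(b"\x00" * (end - len(buf)))  # pad any skipped range
            buf[offset:end] = data
        elif stripped:
            # Device header line, e.g. "00:09.0 IDE interface: ..."
            slot = stripped.split()[0]
            devices[slot] = bytearray()
    return devices


def diff_config(a, b):
    """Yield (slot, offset, byte_a, byte_b) for bytes that differ in shared slots."""
    for slot in sorted(set(a) & set(b)):
        for offset, (x, y) in enumerate(zip(a[slot], b[slot])):
            if x != y:
                yield slot, offset, x, y


# Invented sample dumps: identical except for one byte at offset 0x50.
board_a = """\
00:09.0 IDE interface: example device
00: de 10 65 00 05 00 b0 00 a2 8a 01 01 00 00 00 00
50: 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
"""
board_b = board_a.replace("50: 02", "50: 03")

for slot, offset, x, y in diff_config(parse_lspci_hex(board_a),
                                      parse_lspci_hex(board_b)):
    print(f"{slot} offset 0x{offset:02x}: {x:02x} -> {y:02x}")
```

A difference found this way only says where two boards disagree; without
chipset documentation it does not say what the bit means.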


* RE: Catching NForce2 lockup with NMI watchdog
@ 2003-12-05 22:41 b
  0 siblings, 0 replies; 62+ messages in thread
From: b @ 2003-12-05 22:41 UTC (permalink / raw)
  To: mfedyk; +Cc: linux-kernel

 >Everyone with this problem should turn on the nmi_watchdog, as someone may
 >have the right circumstances to produce an oops where the others didn't.
 >
 >I say that you're not serious about getting this fixed unless you're going
 >to do all of:

To quote Allen Martin:

 >NVIDIA doesn't provide a windows driver to setup APIC interrupts.
 >
 >APIC functionality is exported through the ACPI methods and MP
 >table in the system BIOS which the motherboard vendors supply.

 >Likely the root of the problem has to do with the way the Linux
 >kernel is using the ACPI methods to setup the interrupts which
 >is different from win 9x/2k/XP.  I can help track this down,
 >unfortunately so far I've been unable to reproduce the hangs
 >on any of the boards I have.

and

 >> Do you know whether the nforce2's with apic support the timer (IRQ 0) in
 >> IO-APIC mode?  To me, it seems like a bug:
 >> "Dec  4 20:13:11 tesore kernel: ..MP-BIOS bug: 8254 timer not connected to
 >> IO-APIC"
 >> (This message originates in arch/i386/kernel/io_apic.c)
 >>
 >
 >Yes, Win 9x/2k/XP use the system timer on irq0 and have no problem.  I
 >haven't looked at this yet.
 >

Is it not possible that Linux could be made to handle this hardware
correctly?


 >
 > o turn on nmi_watchdog
 > o try the patches posted[1]
 > o contact nvidia or your motherboard manufacturer saying you need linux
 >   support, and return the board if they don't. (phone, fax, email, or even
 >   local office if there is one)
 >
 >I bought a VIA board to avoid the problems I expected from the nforce, and I
 >needed a system (server) that would *work* now.
 >
 >[1] If you're worried about your file system, just boot the patched kernel in
 >single mode, and that will mount all of your file systems read-only so there
 >will be little chance of corruption.
 >
 >Mike



* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-05 20:34   ` Prakash K. Cheemplavam
@ 2003-12-05 21:02     ` Mike Fedyk
  0 siblings, 0 replies; 62+ messages in thread
From: Mike Fedyk @ 2003-12-05 21:02 UTC (permalink / raw)
  To: Prakash K. Cheemplavam; +Cc: cheuche+lkml, linux-kernel

On Fri, Dec 05, 2003 at 09:34:46PM +0100, Prakash K. Cheemplavam wrote:
> Hmm, interesting observation. This makes me remember something: When my 
> machine freezes doing hdparm, the cursor still blinks, but I can't do 
> anything anymore. Maybe a connection to your observation? I haven't 
> tried to run the NMI watchdog, as you guys haven't had success with it yet.

Everyone with this problem should turn on the nmi_watchdog, as someone may
have the right circumstances to produce an oops where the others didn't.

I say that you're not serious about getting this fixed unless you're going
to do all of:

 o turn on nmi_watchdog
 o try the patches posted[1]
 o contact nvidia or your motherboard manufacturer saying you need linux
   support, and return the board if they don't. (phone, fax, email, or even
   local office if there is one)

I bought a VIA board to avoid the problems I expected from the nforce, and I
needed a system (server) that would *work* now.

[1] If you're worried about your filesystem, just boot the patched kernel in
single mode, and that will mount all of your filesystems read-only so there
will be little chance of corruption.

Mike
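
A quick way to check whether the NMI watchdog is actually firing after
booting with nmi_watchdog=1 or =2 is to watch the NMI row in
/proc/interrupts, which should grow over time when the watchdog is active.
The following is a minimal sketch in Python, assuming the usual
/proc/interrupts layout (a CPU header line, then "label: count ..." rows).
The function names are hypothetical and the two snapshots are invented
sample text, not output from a real machine.

```python
def irq_counts(proc_interrupts_text):
    """Map each /proc/interrupts row label (e.g. "0", "NMI") to its total count."""
    counts = {}
    for line in proc_interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # the "CPU0 CPU1 ..." header line has no colon
        total = 0
        for token in rest.split():
            if not token.isdigit():
                break  # per-CPU counters end where the description starts
            total += int(token)
        counts[label.strip()] = total
    return counts


def nmi_is_ticking(before_text, after_text):
    """True if the NMI count increased between two snapshots."""
    before = irq_counts(before_text).get("NMI", 0)
    after = irq_counts(after_text).get("NMI", 0)
    return after > before


# Invented snapshots, as if taken a few seconds apart.
snapshot_1 = """\
           CPU0
  0:     150000    IO-APIC-edge  timer
 14:       9000    IO-APIC-edge  ide0
NMI:        120
LOC:     149900
"""
snapshot_2 = """\
           CPU0
  0:     155000    IO-APIC-edge  timer
 14:       9400    IO-APIC-edge  ide0
NMI:        180
LOC:     154900
"""

print("NMI watchdog ticking:", nmi_is_ticking(snapshot_1, snapshot_2))
```

On a live system the two snapshots would come from reading /proc/interrupts
twice with a short sleep in between. A count that stays at zero suggests the
watchdog was disabled at boot, as in the "disabling NMI Watchdog!" message
quoted at the start of this thread.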


* RE: Catching NForce2 lockup with NMI watchdog
@ 2003-12-05 20:56 Allen Martin
  0 siblings, 0 replies; 62+ messages in thread
From: Allen Martin @ 2003-12-05 20:56 UTC (permalink / raw)
  To: 'Jesse Allen'; +Cc: linux-kernel

> -----Original Message-----
> From: Jesse Allen [mailto:the3dfxdude@hotmail.com] 
> Sent: Friday, December 05, 2003 12:36 PM
>
> Do you know whether the nforce2's with apic support the timer (IRQ 0) in
> IO-APIC mode?  To me, it seems like a bug:
> "Dec  4 20:13:11 tesore kernel: ..MP-BIOS bug: 8254 timer not connected to
> IO-APIC"
> (This message originates in arch/i386/kernel/io_apic.c)
> 

Yes, Win 9x/2k/XP use the system timer on irq0 and have no problem.  I
haven't looked at this yet.

-Allen


* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-05 20:18 ` cheuche+lkml
  2003-12-05 20:34   ` Prakash K. Cheemplavam
@ 2003-12-05 20:55   ` Jesse Allen
  2003-12-06  3:20   ` Jesse Allen
  2 siblings, 0 replies; 62+ messages in thread
From: Jesse Allen @ 2003-12-05 20:55 UTC (permalink / raw)
  To: linux-kernel

On Fri, Dec 05, 2003 at 09:18:12PM +0100, cheuche+lkml@free.fr wrote:
> With a little patch in arch/i386/kernel/mpparse.c in the acpi section, I
> managed to get the timer interrupt back on IO-APIC-edge, maybe the nmi
> watchdog could work with the ioapic then ?

Maybe!  thanks!

> 
> With the patch, the interrupt flood on IRQ7 I reported on the nvidia2 
> lockups thread also disappeared, but then I noticed something odd when
> there is ide activity :

Yeah, I have been writing trace code to try to identify where it fails.
Somehow what I did seems to have made IRQ 7 less noisy, but I have no idea
why. =)  So I do think the IRQ is related somehow...

> 
> There may be something wrong with the timer using apic and the
> amd/nforce ide driver does not handle this situation that should not
> occur and just freezes. This is pure speculation of course.
> 
> *Disclaimer*
> The modification is certainly not the proper fix, does a wrong thing,
> but it shows an interesting behavior, especially it fixed the
> interrupt flood on IRQ7 I and some others are able to see.
> 
> Here the little patch of arch/i386/kernel/mpparse.c I used :
> 

I'll check it out.


* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-05 19:11 Allen Martin
  2003-12-05 20:18 ` cheuche+lkml
@ 2003-12-05 20:36 ` Jesse Allen
  2003-12-05 22:55 ` Mike Fedyk
  2 siblings, 0 replies; 62+ messages in thread
From: Jesse Allen @ 2003-12-05 20:36 UTC (permalink / raw)
  To: Allen Martin; +Cc: linux-kernel

On Fri, Dec 05, 2003 at 11:11:39AM -0800, Allen Martin wrote:
> > -----Original Message-----
> > From: Mikael Pettersson [mailto:mikpe@csd.uu.se] 
> > Sent: Friday, December 05, 2003 4:15 AM
> >
> >  > So does this confirm that the lockups with nforce2 chipsets and apic
> >  > is actually a hardware problem after all? 
> > 
> > Confirm with very high probability. There may be quirks in nVidia's
> > chipset that we (unlike their Windoze drivers) don't know about.
> > 
> > Ask nVidia for detailed chipset documentation. Then maybe we can fix this.
> 
> NVIDIA doesn't provide a windows driver to setup APIC interrupts.  APIC
> functionality is exported through the ACPI methods and MP table in the
> system BIOS which the motherboard vendors supply.
> 
> Likely the root of the problem has to do with the way the Linux kernel is
> using the ACPI methods to setup the interrupts which is different from win
> 9x/2k/XP.  I can help track this down, unfortunately so far I've been unable
> to reproduce the hangs on any of the boards I have.
> 

Do you know whether the nforce2's with apic support the timer (IRQ 0) in 
IO-APIC mode?  To me, it seems like a bug:
"Dec  4 20:13:11 tesore kernel: ..MP-BIOS bug: 8254 timer not connected to 
IO-APIC"
(This message originates in arch/i386/kernel/io_apic.c)

nmi_watchdog doesn't seem to work at all because of this.  If it were working,
then maybe I could catch the lockup, because if it's as you say, it's probably
the kernel, not the hardware.

Jesse


* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-05 20:18 ` cheuche+lkml
@ 2003-12-05 20:34   ` Prakash K. Cheemplavam
  2003-12-05 21:02     ` Mike Fedyk
  2003-12-05 20:55   ` Jesse Allen
  2003-12-06  3:20   ` Jesse Allen
  2 siblings, 1 reply; 62+ messages in thread
From: Prakash K. Cheemplavam @ 2003-12-05 20:34 UTC (permalink / raw)
  To: cheuche+lkml; +Cc: linux-kernel

> through easily. I first thought the box freezed but I realized the
> software cursor was blinking *very* slowly. In fact 1 second for the
> kernel took about 12 seconds. Stopping the IO load on ide and
> everything seems back to normal.

Hmm, interesting observation. This makes me remember something: when my 
machine freezes doing hdparm, the cursor still blinks, but I can't do 
anything anymore. Maybe there is a connection to your observation? I haven't 
tried to run the NMI watchdog, as you guys haven't had success with it yet.

Prakash




* Re: Catching NForce2 lockup with NMI watchdog
  2003-12-05 19:11 Allen Martin
@ 2003-12-05 20:18 ` cheuche+lkml
  2003-12-05 20:34   ` Prakash K. Cheemplavam
                     ` (2 more replies)
  2003-12-05 20:36 ` Jesse Allen
  2003-12-05 22:55 ` Mike Fedyk
  2 siblings, 3 replies; 62+ messages in thread
From: cheuche+lkml @ 2003-12-05 20:18 UTC (permalink / raw)
  To: linux-kernel

On Fri, Dec 05, 2003 at 11:11:39AM -0800, Allen Martin wrote:
> 
> Likely the root of the problem has to do with the way the Linux kernel is
> using the ACPI methods to setup the interrupts which is different from win
> 9x/2k/XP.  I can help track this down, unfortunately so far I've been unable
> to reproduce the hangs on any of the boards I have.
> 
With a little patch in arch/i386/kernel/mpparse.c in the acpi section, I
managed to get the timer interrupt back on IO-APIC-edge, so maybe the nmi
watchdog could work with the ioapic then?

With the patch, the interrupt flood on IRQ7 I reported on the nvidia2 
lockups thread also disappeared, but then I noticed something odd when
there is ide activity:
With the amd74xx/nforce driver, I can almost instantly hang the machine
(nothing new there), but with the generic ide driver and the IO load a
cat /dev/hda > /dev/null can generate, timer interrupts don't seem to get
through easily. I first thought the box had frozen, but then I realized the
software cursor was blinking *very* slowly. In fact 1 second for the
kernel took about 12 seconds. After stopping the IO load on ide,
everything seems back to normal.
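
To put a number on that kind of slowdown, the growth of the IRQ 0
counter can be compared against an independent clock (a stopwatch or
another machine, since the kernel's own clock is driven by the same
interrupt).  The function below is only a sketch of that arithmetic;
timer_slowdown is a made-up name, and the tick rate of 100 per second
used in the note after it is an assumption about a 2.4 i386 kernel, not
something measured in this thread.

```c
/* Ratio of reference time to the time the kernel believes has passed.
 *   ticks       - increase of the IRQ 0 counter over the interval
 *   hz          - timer interrupts per second the kernel expects
 *   ref_seconds - interval length measured by an independent clock
 * Returns -1.0 when no ticks arrived (the ratio is undefined). */
static double timer_slowdown(unsigned long ticks, unsigned int hz,
                             double ref_seconds)
{
    if (ticks == 0 || hz == 0)
        return -1.0;
    return ref_seconds / ((double)ticks / hz);
}
```

With hz = 100, seeing only 100 ticks over 12 externally measured
seconds gives a ratio of 12, which corresponds to the "1 second took
about 12 seconds" observation.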

There may be something wrong with the timer when using apic, and the
amd/nforce ide driver may not handle this situation, which should not
occur, and just freezes. This is pure speculation of course.

I looked in mpparse.c because this is where I noticed the difference
about the timer interrupt setup with apic between 2.4.22 and 2.4.23.
However it is in the path of ACPI source interrupt override, maybe the
modification I made just overrides the override (sigh).

*Disclaimer*
The modification is certainly not the proper fix and does a wrong thing,
but it shows an interesting behavior. In particular, it fixed the
interrupt flood on IRQ7 that I and some others are able to see.

Here is the little patch to arch/i386/kernel/mpparse.c I used:

--- mpparse.c.old       2003-12-05 14:42:10.000000000 +0100
+++ mpparse.c   2003-12-05 14:43:41.000000000 +0100
@@ -962,7 +962,8 @@
 	*/
 	for (i = 0; i < mp_irq_entries; i++) {
 		if ((mp_irqs[i].mpc_dstapic == intsrc.mpc_dstapic)
-			&& (mp_irqs[i].mpc_srcbusirq == intsrc.mpc_srcbusirq)) {
+			&& (mp_irqs[i].mpc_srcbusirq == intsrc.mpc_srcbusirq)
+			&& (mp_irqs[i].mpc_irqtype == intsrc.mpc_irqtype)) {
 			mp_irqs[i] = intsrc;
 			found = 1;
 			break;
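
To see what the extra mpc_irqtype test changes, the matching loop can be
modelled in user space.  The sketch below is only an illustration:
struct intsrc is a cut-down stand-in for the kernel's interrupt source
entry, apply_override and its match_type switch are made-up names, and
the table values in the note after it are hypothetical, not taken from
any nforce2 board.  What happens when no entry matches lies outside the
quoted hunk and is not modelled here.

```c
#include <stddef.h>

/* Interrupt types as numbered in the MP specification. */
enum { IRQTYPE_INT = 0, IRQTYPE_EXTINT = 3 };

/* Cut-down stand-in for the kernel's interrupt source entry; only the
 * fields the loop in the patch touches are kept. */
struct intsrc {
    unsigned char irqtype;    /* IRQTYPE_INT, IRQTYPE_EXTINT, ... */
    unsigned char srcbusirq;  /* IRQ number on the source bus */
    unsigned char dstapic;    /* destination IO-APIC id */
    unsigned char dstirq;     /* destination IO-APIC pin */
};

/* Replace the first table entry that matches the override.
 * match_type == 0 models the original test (apic id and bus irq only);
 * match_type != 0 models the patched test, which also requires the
 * interrupt type to be equal.
 * Returns the index of the replaced entry, or -1 if none matched. */
static int apply_override(struct intsrc *tbl, size_t n,
                          const struct intsrc *ovr, int match_type)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (tbl[i].dstapic == ovr->dstapic
            && tbl[i].srcbusirq == ovr->srcbusirq
            && (!match_type || tbl[i].irqtype == ovr->irqtype)) {
            tbl[i] = *ovr;
            return (int)i;
        }
    }
    return -1;
}
```

Take a table whose first entry is an ExtINT source for bus irq 0 and an
override of type INT for the same bus irq and apic.  With match_type 0
the ExtINT entry is overwritten; with match_type 1 it is left alone and
the function reports no match.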



I hope this helps,

Mathieu


* RE: Catching NForce2 lockup with NMI watchdog
@ 2003-12-05 19:11 Allen Martin
  2003-12-05 20:18 ` cheuche+lkml
                   ` (2 more replies)
  0 siblings, 3 replies; 62+ messages in thread
From: Allen Martin @ 2003-12-05 19:11 UTC (permalink / raw)
  To: 'Mikael Pettersson', Josh McKinney; +Cc: linux-kernel

> -----Original Message-----
> From: Mikael Pettersson [mailto:mikpe@csd.uu.se] 
> Sent: Friday, December 05, 2003 4:15 AM
>
>  > So does this confirm that the lockups with nforce2 chipsets and apic
>  > is actually a hardware problem after all? 
> 
> Confirm with very high probability. There may be quirks in nVidia's
> chipset that we (unlike their Windoze drivers) don't know about.
> 
> Ask nVidia for detailed chipset documentation. Then maybe we can fix
> this.

NVIDIA doesn't provide a windows driver to setup APIC interrupts.  APIC
functionality is exported through the ACPI methods and MP table in the
system BIOS which the motherboard vendors supply.

Likely the root of the problem has to do with the way the Linux kernel is
using the ACPI methods to setup the interrupts which is different from win
9x/2k/XP.  I can help track this down, unfortunately so far I've been unable
to reproduce the hangs on any of the boards I have.

-Allen


Thread overview: 62+ messages
2003-12-05  4:54 Catching NForce2 lockup with NMI watchdog Jesse Allen
2003-12-05  7:40 ` Mikael Pettersson
2003-12-05  8:33   ` Josh McKinney
2003-12-05 12:14     ` Mikael Pettersson
2003-12-05 14:19       ` Craig Bradney
2003-12-05 17:05         ` Craig Bradney
2003-12-05 18:11         ` Josh McKinney
2003-12-05  8:58   ` Mike Fedyk
2003-12-05 12:06     ` Mikael Pettersson
2003-12-08  2:20     ` Bob
2003-12-09 14:21       ` Maciej W. Rozycki
2003-12-09 16:35         ` Bob
2003-12-10 13:41           ` Maciej W. Rozycki
2003-12-12 16:01             ` bill davidsen
2003-12-12 16:47               ` Maciej W. Rozycki
2003-12-12 16:57                 ` Richard B. Johnson
2003-12-12 17:21                   ` Maciej W. Rozycki
2003-12-13  5:16                 ` Bill Davidsen
2003-12-15 13:23                   ` Maciej W. Rozycki
2003-12-12 22:27               ` George Anzinger
2003-12-15 13:13                 ` Maciej W. Rozycki
2003-12-15 21:42                   ` George Anzinger
2003-12-16 13:37                     ` Maciej W. Rozycki
2003-12-16 13:57                       ` Richard B. Johnson
2003-12-16 15:47                         ` Maciej W. Rozycki
2003-12-16 16:44                           ` Richard B. Johnson
2003-12-16 16:50                             ` Maciej W. Rozycki
2003-12-16 17:26                       ` George Anzinger
2003-12-16 20:54                         ` Maciej W. Rozycki
2003-12-16 21:53                           ` George Anzinger
2003-12-17 14:03                             ` Maciej W. Rozycki
2003-12-05 19:11 Allen Martin
2003-12-05 20:18 ` cheuche+lkml
2003-12-05 20:34   ` Prakash K. Cheemplavam
2003-12-05 21:02     ` Mike Fedyk
2003-12-05 20:55   ` Jesse Allen
2003-12-06  3:20   ` Jesse Allen
2003-12-05 20:36 ` Jesse Allen
2003-12-05 22:55 ` Mike Fedyk
2003-12-05 23:11   ` Craig Bradney
2003-12-05 20:56 Allen Martin
2003-12-05 22:41 b
2003-12-07 19:58 Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered Ian Kumlien
2003-12-08  2:07 ` Ross Dickson
2003-12-09 18:12   ` Catching NForce2 lockup with NMI watchdog Ian Kumlien
2003-12-09 22:04     ` Craig Bradney
2003-12-09 23:13       ` Ian Kumlien
2003-12-10  6:14       ` Bob
2003-12-10  7:51         ` Craig Bradney
2003-12-13  3:56 Ross Dickson
2003-12-15 13:16 ` Maciej W. Rozycki
2003-12-17 18:14 Ross Dickson
2003-12-17 21:41 ` George Anzinger
2003-12-17 21:48 ` George Anzinger
2003-12-18  1:30   ` Ross Dickson
2003-12-18 14:32     ` Maciej W. Rozycki
2003-12-19  4:17       ` Ross Dickson
2003-12-19 15:35         ` Maciej W. Rozycki
2003-12-18 14:04 ` Maciej W. Rozycki
2003-12-18 14:22   ` Craig Bradney
2003-12-19  5:38     ` Ross Dickson
2003-12-19 10:36       ` Craig Bradney
2003-12-19  4:06   ` Ross Dickson
2003-12-19 15:33     ` Maciej W. Rozycki
