* Re: Catching NForce2 lockup with NMI watchdog - found
@ 2003-12-08 10:43 Mikael Pettersson
0 siblings, 0 replies; 8+ messages in thread
From: Mikael Pettersson @ 2003-12-08 10:43 UTC (permalink / raw)
To: linux-kernel, recbo
On Mon, 08 Dec 2003 03:13:55 -0500, Bob <recbo@nishanet.com> wrote:
>Does the kernel opt "user HPET timer" relate to io-apic-edge timer?
No. HPET is a newer piece of timer HW. IO-APIC-edge on the timer
relates to how it's connected to the CPU, not where it comes from.
>Does the kernel opt "hangcheck timer" relate to nmi_watchdog?
No.
* RE: Catching NForce2 lockup with NMI watchdog
@ 2003-12-05 20:56 Allen Martin
2003-12-05 23:49 ` Catching NForce2 lockup with NMI watchdog - found? Prakash K. Cheemplavam
0 siblings, 1 reply; 8+ messages in thread
From: Allen Martin @ 2003-12-05 20:56 UTC (permalink / raw)
To: 'Jesse Allen'; +Cc: linux-kernel
> -----Original Message-----
> From: Jesse Allen [mailto:the3dfxdude@hotmail.com]
> Sent: Friday, December 05, 2003 12:36 PM
>
> Do you know whether the nforce2's with apic support the timer (IRQ 0) in
> IO-APIC mode? To me, it seems like a bug:
> "Dec 4 20:13:11 tesore kernel: ..MP-BIOS bug: 8254 timer not connected to IO-APIC"
> (This message originates in arch/i386/kernel/io_apic.c)
>
Yes, Win 9x/2k/XP use the system timer on irq0 and have no problem. I
haven't looked at this yet.
-Allen
* Re: Catching NForce2 lockup with NMI watchdog - found?
2003-12-05 20:56 Catching NForce2 lockup with NMI watchdog Allen Martin
@ 2003-12-05 23:49 ` Prakash K. Cheemplavam
2003-12-06 8:18 ` Catching NForce2 lockup with NMI watchdog - found cheuche+lkml
0 siblings, 1 reply; 8+ messages in thread
From: Prakash K. Cheemplavam @ 2003-12-05 23:49 UTC (permalink / raw)
To: Allen Martin; +Cc: 'Jesse Allen', linux-kernel
Hi,
*maybe* I found the bugger, at least I got APIC more stable (need to
test whether it is really stable, compiling kernel right now...):
It is a problem with the CPU disconnect function. I tried various parameters
in the bios and turned cpu disconnect off, and tada, I could do several
subsequent hdparms and the machine is still running! As CPU disconnect is an
ACPI state, if I am not mistaken, I think there is something broken in ACPI
right now, or in APIC, and cpu disconnect triggers the bug.
Maybe now my windows environment is stable, as well. It was much more
stable with cpu disconnect and apic, but it still locked up occasionally.
So gals and guys, try disabling cpu disconnect in bios and see whether
apic now runs stable.
I have an Abit NF7-S Rev2.0 with Bios 2.0.
Prakash
* Re: Catching NForce2 lockup with NMI watchdog - found
2003-12-05 23:49 ` Catching NForce2 lockup with NMI watchdog - found? Prakash K. Cheemplavam
@ 2003-12-06 8:18 ` cheuche+lkml
2003-12-06 11:22 ` Julien Oster
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: cheuche+lkml @ 2003-12-06 8:18 UTC (permalink / raw)
To: linux-kernel
On Sat, Dec 06, 2003 at 12:49:50AM +0100, Prakash K. Cheemplavam wrote:
>
> So gals and guys, try disabling cpu disconnect in bios and see whether
> apic now runs stable.
>
Yes, that fixes it. Well, time will tell, but I cannot make it crash with
hdparm -tT or cat /dev/hda so far. I'm dumping hda to /dev/null right
now.
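The sustained read used as a stress test here (dumping hda to /dev/null) can be sketched roughly as follows. This is only an illustration, not the tool anyone in the thread used: the function name `drain` and its `chunk_size` and `limit` parameters are made up for the example, and reading a raw disk such as /dev/hda normally needs root.

```python
def drain(path, chunk_size=1 << 20, limit=None):
    """Read `path` sequentially and throw the data away.

    Roughly what `cat path > /dev/null` does. Returns the number of
    bytes read. `limit` is a hypothetical convenience that stops the
    read early; None means read until end of file.
    """
    total = 0
    # buffering=0 opens the file unbuffered, so every read() hits the OS.
    with open(path, "rb", buffering=0) as src:
        while limit is None or total < limit:
            want = chunk_size if limit is None else min(chunk_size, limit - total)
            block = src.read(want)
            if not block:
                break  # end of file or device
            total += len(block)
    return total
```

Something like `drain("/dev/hda")` would then keep the IDE interrupt busy until the whole disk has been read or the box locks up.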
After testing to make it crash, I used athcool to reenable CPU
disconnect, and guess what, test after that just crashed the box.
You found the problem, congratulations.
If you experience crashes with apic and your bios does not have such an
option, try athcool at
http://members.jcom.home.ne.jp/jacobi/linux/softwares.html
Its purpose is to *enable* cpu disconnect but it can also disable it. Your
best bet is to run it to disable cpu disconnect as early as possible at
boot.
On the other hand, it isn't the cause of the IRQ7 rogue interrupts. As I
initially suspected, it now seems totally unrelated. The ACPI override
handling may be buggy, since putting the timer back on IO-APIC-edge
solves it.
Nevertheless this is still a problem: other chipsets for Athlon
processors seem to be able to have cpu disconnect and ioapic enabled
without any crashes. But so far I don't see any thermal differences, so I'm
happy with that.
Mathieu
* Re: Catching NForce2 lockup with NMI watchdog - found
2003-12-06 8:18 ` Catching NForce2 lockup with NMI watchdog - found cheuche+lkml
@ 2003-12-06 11:22 ` Julien Oster
2003-12-08 3:34 ` Bob
2003-12-06 12:24 ` Prakash K. Cheemplavam
2003-12-08 3:25 ` Bob
2 siblings, 1 reply; 8+ messages in thread
From: Julien Oster @ 2003-12-06 11:22 UTC (permalink / raw)
To: linux-kernel
cheuche+lkml@free.fr writes:
Hello,
>> So gals and guys, try disabling cpu disconnect in bios and see whether
>> apic now runs stable.
> Yes, that fixes it. Well, time will tell, but I cannot make it crash with
> hdparm -tT or cat /dev/hda so far. I'm dumping hda to /dev/null right
> now.
> After testing to make it crash, I used athcool to reenable CPU
> disconnect, and guess what, test after that just crashed the box.
> You found the problem, congratulations.
Well, now I'm stunned.
With APIC and ACPI enabled, my machine isn't even able to boot
completely, it'll most certainly crash before the init scripts are
finished.
Now, I modified the init scripts to do "athcool off" as the first
thing at all (I don't have any "CPU disconnect" BIOS setting) and it
not only booted, but I even can't seem to make it crash using my
hdparm/grep/whatever tests...
I don't know if it's "rock solid" yet, but at least the difference is
huge. It really seems like that made the problem go away!
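Running "athcool off" as the first step of the init scripts could be wrapped along these lines. This is a sketch under assumptions: it assumes athcool is installed somewhere in PATH and that "on" is the counterpart of the "off" argument mentioned above; the function names are invented for the example.

```python
import shutil
import subprocess


def athcool_command(enable):
    """Build the athcool command line.

    "off" disables CPU disconnect (the workaround discussed here);
    "on" is assumed to be the matching argument that enables it.
    """
    return ["athcool", "on" if enable else "off"]


def disable_cpu_disconnect():
    """Run `athcool off` if the tool is installed; return True on success."""
    if shutil.which("athcool") is None:
        return False  # athcool not found in PATH, nothing was changed
    result = subprocess.run(athcool_command(False), check=False)
    return result.returncode == 0
```

The call has to run as root and before anything that generates heavy disk I/O, otherwise the machine may lock up before the setting is changed.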
Regards,
Julien
* Re: Catching NForce2 lockup with NMI watchdog - found
2003-12-06 11:22 ` Julien Oster
@ 2003-12-08 3:34 ` Bob
2003-12-08 8:13 ` Bob
0 siblings, 1 reply; 8+ messages in thread
From: Bob @ 2003-12-08 3:34 UTC (permalink / raw)
To: linux-kernel
Wasn't this formerly fixed by a kernel config
option to not detect cpu idle on amd cpu's?
I don't see that in 2.6
Julien Oster wrote:
>cheuche+lkml@free.fr writes:
>
>Hello,
>
>>>So gals and guys, try disabling cpu disconnect in bios and see whether
>>>apic now runs stable.
>
>>Yes, that fixes it. Well, time will tell, but I cannot make it crash with
>>hdparm -tT or cat /dev/hda so far. I'm dumping hda to /dev/null right
>>now.
>>After testing to make it crash, I used athcool to reenable CPU
>>disconnect, and guess what, test after that just crashed the box.
>>You found the problem, congratulations.
>
>Well, now I'm stunned.
>
>With APIC and ACPI enabled, my machine isn't even able to boot
>completely, it'll most certainly crash before the init scripts are
>finished.
>
>Now, I modified the init scripts to do "athcool off" as the first
>thing at all (I don't have any "CPU disconnect" BIOS setting) and it
>not only booted, but I even can't seem to make it crash using my
>hdparm/grep/whatever tests...
>
>I don't know if it's "rock solid" yet, but at least the difference is
>huge. It really seems like that made the problem go away!
>
>Regards,
>Julien
* Re: Catching NForce2 lockup with NMI watchdog - found
2003-12-08 3:34 ` Bob
@ 2003-12-08 8:13 ` Bob
0 siblings, 0 replies; 8+ messages in thread
From: Bob @ 2003-12-08 8:13 UTC (permalink / raw)
To: linux-kernel
The athcool patch seemed to apply, as far as patch reported,
but there were undef and unused problems on compile, so
I don't have that in right now. The timer patch is in.
http://www.kernel.org/pub/linux/kernel/people/bart/2.6.0-test11-bart1/broken-out/nforce2-apic.patch
The patch succeeded in giving me the IO-APIC-edge timer.
With lilo, append="nmi_watchdog=1" did not work, but
nmi_watchdog=2 did get NMI ticks, as shown below.
cat /proc/interrupts
           CPU0
  0:     617651    IO-APIC-edge  timer
  1:        868    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          1    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:       8736    IO-APIC-edge  i8042
 14:         22    IO-APIC-edge  ide0
 15:         24    IO-APIC-edge  ide1
 16:      92853   IO-APIC-level  3ware Storage Controller, yenta, yenta
 17:       2793   IO-APIC-level  eth0
 21:          0   IO-APIC-level  NVidia nForce2
NMI:        122
LOC:     617511
ERR:          0
MIS:          0
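For anyone who wants to check the same two things on their own box (which interrupt controller the timer on IRQ 0 is routed through, and whether the NMI count is non-zero), here is a rough parsing sketch. It assumes the single-CPU layout shown above, with one count column; the names `parse_interrupts`, `timer_on_ioapic` and `SAMPLE` are made up for the example, and the sample is a subset of the output above.

```python
SAMPLE = """\
           CPU0
  0:     617651    IO-APIC-edge  timer
  2:          0          XT-PIC  cascade
 16:      92853   IO-APIC-level  3ware Storage Controller, yenta, yenta
NMI:        122
LOC:     617511
ERR:          0
MIS:          0
"""


def parse_interrupts(text):
    """Parse single-CPU /proc/interrupts text.

    Returns (irqs, counters): irqs maps an IRQ number to a tuple
    (count, controller, devices); counters maps labels such as "NMI"
    or "LOC" to their count.
    """
    irqs, counters = {}, {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # header line such as "CPU0"
        label = label.strip()
        fields = rest.split()
        if not fields or not fields[0].isdigit():
            continue  # not a count line
        count = int(fields[0])
        if label.isdigit():
            controller = fields[1] if len(fields) > 1 else ""
            irqs[int(label)] = (count, controller, " ".join(fields[2:]))
        else:
            counters[label] = count
    return irqs, counters


def timer_on_ioapic(irqs):
    """True if IRQ 0 is listed with an IO-APIC controller."""
    entry = irqs.get(0)
    return entry is not None and entry[1].startswith("IO-APIC")
```

With the sample above, `timer_on_ioapic` is true and the NMI counter is 122. Reading the NMI counter twice a few seconds apart and comparing the values is one way to see whether the watchdog is actually ticking.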
Does the kernel opt "user HPET timer" relate to io-apic-edge timer?
Does the kernel opt "hangcheck timer" relate to nmi_watchdog?
Does the kernel opt "ACPI, Processor (c2) (c3 states)" relate to
the cmos/bios "processor disconnect" option and athcool patch?
kernel 2.6.0-test11, pre-emptive, apic, lapic, acpi, anticipatory
scheduling not deadline scheduling, cpu and fsb clock 1:1 333mhz,
amd xp3000+ and high-performance settings(CAS2) other than
1:1 fsb/ram which is slow for the ram, 41C - 48C cpu temp, MSI K7N2
mboard.
My system was already stable (apic, lapic, pre-empt) after a bios
update which stopped all irq storms and crashes except "IRQ7 disabled"
and "spurious 8259A interrupts", possibly related to the XT-PIC timer
running when the other was expected, due to apic, lapic, acpi
and kernel opt "use HPET timer" all being on. Turning on onboard
ethernet set off the irq7 and 8259a errs, so I have not been using
onboard eth. USB did not work. I can now test the timer patch with
the onboard ethernet, forcedeth driver, usb, and the "nvidia" X
driver, which was crashing linux so I had to use "nv". Those
items are where my stability frontier is.
-Bob
* Re: Catching NForce2 lockup with NMI watchdog - found
2003-12-06 8:18 ` Catching NForce2 lockup with NMI watchdog - found cheuche+lkml
2003-12-06 11:22 ` Julien Oster
@ 2003-12-06 12:24 ` Prakash K. Cheemplavam
2003-12-08 3:25 ` Bob
2 siblings, 0 replies; 8+ messages in thread
From: Prakash K. Cheemplavam @ 2003-12-06 12:24 UTC (permalink / raw)
To: cheuche+lkml; +Cc: linux-kernel
cheuche+lkml@free.fr wrote:
> On Sat, Dec 06, 2003 at 12:49:50AM +0100, Prakash K. Cheemplavam wrote:
>
>>So gals and guys, try disabling cpu disconnect in bios and see whether
>>apic now runs stable.
>>
>
> Yes, that fixes it. Well, time will tell, but I cannot make it crash with
> hdparm -tT or cat /dev/hda so far. I'm dumping hda to /dev/null right
> now.
>
> After testing to make it crash, I used athcool to reenable CPU
> disconnect, and guess what, test after that just crashed the box.
> You found the problem, congratulations.
:-)
Isn't it possible to add athcool's code into the kernel, maybe into the
pm section, or even make it a kernel option? It seems to be a nice
workaround for the time being.
Prakash
* Re: Catching NForce2 lockup with NMI watchdog - found
2003-12-06 8:18 ` Catching NForce2 lockup with NMI watchdog - found cheuche+lkml
2003-12-06 11:22 ` Julien Oster
2003-12-06 12:24 ` Prakash K. Cheemplavam
@ 2003-12-08 3:25 ` Bob
2003-12-08 3:18 ` Bartlomiej Zolnierkiewicz
2 siblings, 1 reply; 8+ messages in thread
From: Bob @ 2003-12-08 3:25 UTC (permalink / raw)
To: linux-kernel
cheuche+lkml@free.fr wrote:
>... If you experience crashes with apic and your bios does not have
>such an option, try athcool at
>http://members.jcom.home.ne.jp/jacobi/linux/softwares.html
>Its purpose is to *enable* cpu disconnect but it can also disable it. Your
>best bet is to run it to disable cpu disconnect as early as possible at
>boot.
>
>On the other hand, it isn't the cause of the IRQ7 rogue interrupts. As I
>initially suspected, it now seems totally unrelated. The ACPI override
>handling may be buggy, since putting the timer back on IO-APIC-edge
>solves it.
>
>Nevertheless this is still a problem: other chipsets for Athlon
>processors seem to be able to have cpu disconnect and ioapic enabled
>without any crashes. But so far I don't see any thermal differences, so I'm
>happy with that.
>
>Mathieu
>
I presently have this in /proc/interrupts:
  0:  244393560          XT-PIC  timer
but when I tried the nvnet driver and onboard
ethernet I think I saw both "IRQ7 disabled"
and some 8259A spurious interrupt err.
Presently, grepping the logs for timer, TIMER
or 8259A finds nothing. Does the 8259A have to do
with the IO-APIC timer? It would make sense that
nvnet would see apic and lapic on in bios
and linux and look for the io-apic timer as
well as the apic table, then fail confused.
Is there a link to that patch? I keep deleting
this list's mail because it's huge, so I lost a patch in a message.
-Bob