linux-kernel.vger.kernel.org archive mirror
* Re: Catching NForce2 lockup with NMI watchdog - found
@ 2003-12-08 10:43 Mikael Pettersson
  0 siblings, 0 replies; 8+ messages in thread
From: Mikael Pettersson @ 2003-12-08 10:43 UTC
  To: linux-kernel, recbo

On Mon, 08 Dec 2003 03:13:55 -0500, Bob <recbo@nishanet.com> wrote:
>Does the kernel opt "use HPET timer" relate to io-apic-edge timer?

No. HPET is a newer piece of timer HW. IO-APIC-edge on the timer
describes how its interrupt is delivered to the CPU (through the
IO-APIC, edge-triggered), not which hardware generates it.
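
As an illustration of that distinction, here is a minimal Python sketch that splits a /proc/interrupts type string into the delivering controller and its trigger mode. It assumes the 2.6-era type names seen in this thread ("IO-APIC-edge", "IO-APIC-level", "XT-PIC"); the function name split_chip_type is hypothetical. Nothing in the string identifies the timer source (for example PIT or HPET).

```python
from typing import Optional, Tuple


def split_chip_type(chip: str) -> Tuple[str, Optional[str]]:
    """Split a /proc/interrupts type string into (controller, trigger).

    Only the delivery path to the CPU is encoded here; the string says
    nothing about which hardware (PIT, HPET, a NIC, ...) raised the IRQ.
    """
    head, sep, tail = chip.rpartition("-")
    if sep and tail in ("edge", "level"):
        return head, tail
    # No trigger-mode suffix, e.g. the legacy 8259A "XT-PIC" type.
    return chip, None


for name in ("IO-APIC-edge", "IO-APIC-level", "XT-PIC"):
    print(name, "->", split_chip_type(name))
```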

>Does the kernel opt "hangcheck timer" relate to nmi_watchdog?

No.

* Re: Catching NForce2 lockup with NMI watchdog - found
  2003-12-08  3:34       ` Bob
@ 2003-12-08  8:13         ` Bob
  0 siblings, 0 replies; 8+ messages in thread
From: Bob @ 2003-12-08  8:13 UTC
  To: linux-kernel

The athcool patch seemed to apply fine, as far as patch reported,
but there were "undefined" and "unused" problems on compile, so
I don't have it in right now. The timer patch is in.

http://www.kernel.org/pub/linux/kernel/people/bart/2.6.0-test11-bart1/broken-out/nforce2-apic.patch

The patch succeeded in giving me the IO-APIC-edge timer. After that,
lilo append="nmi_watchdog=1" did not work, but nmi_watchdog=2 did
produce NMI ticks, as shown below.

cat /proc/interrupts

           CPU0       
  0:     617651    IO-APIC-edge  timer
  1:        868    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          1    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:       8736    IO-APIC-edge  i8042
 14:         22    IO-APIC-edge  ide0
 15:         24    IO-APIC-edge  ide1
 16:      92853   IO-APIC-level  3ware Storage Controller, yenta, yenta
 17:       2793   IO-APIC-level  eth0
 21:          0   IO-APIC-level  NVidia nForce2
NMI:        122 
LOC:     617511 
ERR:          0
MIS:          0
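
A minimal Python sketch of how a table like the one above could be checked from a script, assuming the layout shown (a CPU header row, numeric IRQ rows with one count per CPU followed by the type and device names, and summary rows such as NMI: and LOC:). The helper names parse_interrupts and nmi_watchdog_ticking and the one-second sampling interval are illustrative assumptions, not an existing tool.

```python
import time
from typing import Dict, Optional, Tuple


def parse_interrupts(text: str) -> Tuple[Dict[str, int], Optional[str]]:
    """Parse /proc/interrupts text in the 2.6-era layout shown above.

    Returns (totals, timer_type): totals maps each row label ("0", "NMI",
    "LOC", ...) to its count summed over all CPUs; timer_type is the type
    column of the row whose device name is "timer" (e.g. "IO-APIC-edge"
    or "XT-PIC"), or None if there is no such row.
    """
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: "CPU0 CPU1 ..."
    totals: Dict[str, int] = {}
    timer_type: Optional[str] = None
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        # Leading numeric fields are the per-CPU counts (ERR:/MIS: have one).
        counts = []
        for field in fields:
            if len(counts) == ncpus or not field.isdigit():
                break
            counts.append(int(field))
        totals[label.strip()] = sum(counts)
        # Whatever follows the counts is "<type> <device name...>".
        extra = fields[len(counts):]
        if len(extra) >= 2 and extra[-1] == "timer":
            timer_type = extra[0]
    return totals, timer_type


def nmi_watchdog_ticking(interval: float = 1.0,
                         path: str = "/proc/interrupts") -> bool:
    """Return True if the NMI: count rises over `interval` seconds."""
    with open(path) as f:
        before, _ = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        after, _ = parse_interrupts(f.read())
    return after.get("NMI", 0) > before.get("NMI", 0)


# Trimmed copy of the output above; no /proc access needed.
SAMPLE = """\
           CPU0
  0:     617651    IO-APIC-edge  timer
  2:          0          XT-PIC  cascade
NMI:        122
LOC:     617511
ERR:          0
"""

totals, timer = parse_interrupts(SAMPLE)
print(timer, totals["NMI"])  # prints: IO-APIC-edge 122
```

On a live system, nmi_watchdog_ticking() reads /proc/interrupts twice and reports whether the NMI: count went up in between.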

Does the kernel opt "use HPET timer" relate to io-apic-edge timer?
Does the kernel opt "hangcheck timer" relate to nmi_watchdog?
Does the kernel opt "ACPI, Processor (c2) (c3 states)" relate to
  the cmos/bios "processor disconnect" option and athcool patch?

Kernel 2.6.0-test11 with preemption, APIC, local APIC, ACPI, and the
anticipatory (not deadline) I/O scheduler; CPU and FSB clock 1:1 at
333 MHz; AMD XP3000+ with high-performance settings (CAS2) apart from
the 1:1 FSB/RAM ratio, which is slow for the RAM; CPU temperature
41 C to 48 C; MSI K7N2 motherboard.

My system was already stable (APIC, local APIC, preemption) after a
BIOS update that stopped all IRQ storms and crashes except "IRQ7
disabled" and "spurious 8259A interrupts". Those were possibly related
to the XT-PIC timer running when the IO-APIC timer was expected, since
APIC, local APIC, ACPI, and the kernel option "use HPET timer" were all
on. Turning on the onboard ethernet set off the IRQ7 and 8259A errors,
so I have not been using onboard eth. USB did not work. I can now test
the timer patch with the onboard ethernet (forcedeth driver), USB, and
the "nvidia" X driver, which was crashing Linux so I had to use "nv".
Those items are where my stability frontier is.

-Bob




* Re: Catching NForce2 lockup with NMI watchdog - found
  2003-12-06 11:22     ` Julien Oster
@ 2003-12-08  3:34       ` Bob
  2003-12-08  8:13         ` Bob
  0 siblings, 1 reply; 8+ messages in thread
From: Bob @ 2003-12-08  3:34 UTC
  To: linux-kernel

Wasn't this formerly fixed by a kernel config
option to not detect CPU idle on AMD CPUs?
I don't see that in 2.6.

Julien Oster wrote:

>cheuche+lkml@free.fr writes:
>
>Hello,
>
>>>So gals and guys, try disabling cpu disconnect in bios and see whether
>>>apic now runs stable.
>
>>Yes, that fixes it. Well, time will tell, but I cannot make it crash with
>>hdparm -tT or cat /dev/hda so far. I'm dumping hda to /dev/null right
>>now.
>>After testing to make it crash, I used athcool to reenable CPU
>>disconnect, and guess what, the test after that just crashed the box.
>>You found the problem, congratulations.
>
>Well, now I'm stunned.
>
>With APIC and ACPI enabled, my machine isn't even able to boot
>completely; it'll most certainly crash before the init scripts are
>finished.
>
>Now, I modified the init scripts to do "athcool off" as the very first
>thing (I don't have any "CPU disconnect" BIOS setting), and it
>not only booted, but I can't even seem to make it crash using my
>hdparm/grep/whatever tests...
>
>I don't know if it's "rock solid" yet, but at least the difference is
>huge. It really seems like that made the problem go away!
>
>Regards,
>Julien



* Re: Catching NForce2 lockup with NMI watchdog - found
  2003-12-06  8:18   ` Catching NForce2 lockup with NMI watchdog - found cheuche+lkml
  2003-12-06 11:22     ` Julien Oster
  2003-12-06 12:24     ` Prakash K. Cheemplavam
@ 2003-12-08  3:25     ` Bob
  2003-12-08  3:18       ` Bartlomiej Zolnierkiewicz
  2 siblings, 1 reply; 8+ messages in thread
From: Bob @ 2003-12-08  3:25 UTC
  To: linux-kernel

cheuche+lkml@free.fr wrote:

>...If you experience crashes with apic and your bios does not have such an
>option, try athcool at
>http://members.jcom.home.ne.jp/jacobi/linux/softwares.html
>Its purpose is to *enable* cpu disconnect, but it can also disable it. Your
>best bet is to run it to disable cpu disconnect as early as possible at
>boot.
>
>On the other hand, it isn't the cause of the IRQ7 rogue interrupts. As I
>initially suspected, that now seems totally unrelated. The ACPI override
>handling may be buggy, since putting the timer back on IO-APIC-edge
>solves it.
>
>Nevertheless this is still a problem: other chipsets for Athlon
>processors seem to be able to have cpu disconnect and ioapic enabled
>without any crashes. But so far I don't see any thermal differences, so I'm
>happy with that.
>
>Mathieu

Presently /proc/interrupts shows:

 0:  244393560          XT-PIC  timer

But when I tried the nvnet driver with the onboard
ethernet, I think I saw both "IRQ7 disabled"
and some "spurious 8259A interrupt" errors.

Presently, grepping the logs for "timer", "TIMER", or "8259A"
finds nothing. Does the 8259A have to do with the IO-APIC
timer? It would make sense that nvnet would see APIC and local
APIC enabled in the BIOS and in Linux, look for an IO-APIC timer
as well as an APIC table, and then fail, confused.
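
The log search mentioned above can be sketched in Python as follows. It assumes the kernel log lives in /var/log/messages (this varies by distribution and syslog setup); the names suspicious_lines and scan_log are hypothetical. Matching case-insensitively covers both "timer" and "TIMER".

```python
import re
from typing import Iterable, List

# Equivalent of grepping for "timer", "TIMER" or "8259A"; matching is
# case-insensitive, so any capitalization of either word is found.
PATTERN = re.compile(r"timer|8259a", re.IGNORECASE)


def suspicious_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention the timer or the 8259A PIC."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


def scan_log(path: str = "/var/log/messages") -> List[str]:
    """Scan a syslog file; the default path is an assumption."""
    with open(path, errors="replace") as log:
        return suspicious_lines(log)
```

An empty list from scan_log() corresponds to the situation described above, where the logs show no such messages.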

Is there a link to that patch? I keep deleting mail from
this list since it's huge, so I lost a patch that was in a message.

-Bob


* Re: Catching NForce2 lockup with NMI watchdog - found
  2003-12-08  3:25     ` Bob
@ 2003-12-08  3:18       ` Bartlomiej Zolnierkiewicz
  0 siblings, 0 replies; 8+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2003-12-08  3:18 UTC
  To: Bob; +Cc: linux-kernel


> Is there a link to that patch? I keep deleting mail from
> this list since it's huge, so I lost a patch that was in a message.
>
> -Bob

http://www.kernel.org/pub/linux/kernel/people/bart/2.6.0-test11-bart1/broken-out/nforce2-apic.patch


* Re: Catching NForce2 lockup with NMI watchdog - found
  2003-12-06  8:18   ` Catching NForce2 lockup with NMI watchdog - found cheuche+lkml
  2003-12-06 11:22     ` Julien Oster
@ 2003-12-06 12:24     ` Prakash K. Cheemplavam
  2003-12-08  3:25     ` Bob
  2 siblings, 0 replies; 8+ messages in thread
From: Prakash K. Cheemplavam @ 2003-12-06 12:24 UTC
  To: cheuche+lkml; +Cc: linux-kernel

cheuche+lkml@free.fr wrote:
> On Sat, Dec 06, 2003 at 12:49:50AM +0100, Prakash K. Cheemplavam wrote:
> 
>>So gals and guys, try disabling cpu disconnect in bios and see whether
>>apic now runs stable.
>>
> 
> Yes, that fixes it. Well, time will tell, but I cannot make it crash with
> hdparm -tT or cat /dev/hda so far. I'm dumping hda to /dev/null right
> now.
> 
> After testing to make it crash, I used athcool to reenable CPU
> disconnect, and guess what, the test after that just crashed the box.
> You found the problem, congratulations.

:-)

Isn't it possible to add athcool's code to the kernel, maybe in the
PM section, or even make it a kernel option? It seems to be a nice
workaround for the time being.

Prakash


* Re: Catching NForce2 lockup with NMI watchdog - found
  2003-12-06  8:18   ` Catching NForce2 lockup with NMI watchdog - found cheuche+lkml
@ 2003-12-06 11:22     ` Julien Oster
  2003-12-08  3:34       ` Bob
  2003-12-06 12:24     ` Prakash K. Cheemplavam
  2003-12-08  3:25     ` Bob
  2 siblings, 1 reply; 8+ messages in thread
From: Julien Oster @ 2003-12-06 11:22 UTC
  To: linux-kernel

cheuche+lkml@free.fr writes:

Hello,

>> So gals and guys, try disabling cpu disconnect in bios and see whether
>> apic now runs stable.

> Yes, that fixes it. Well, time will tell, but I cannot make it crash with
> hdparm -tT or cat /dev/hda so far. I'm dumping hda to /dev/null right
> now.
> After testing to make it crash, I used athcool to reenable CPU
> disconnect, and guess what, the test after that just crashed the box.
> You found the problem, congratulations.

Well, now I'm stunned.

With APIC and ACPI enabled, my machine isn't even able to boot
completely; it'll most certainly crash before the init scripts are
finished.

Now, I modified the init scripts to do "athcool off" as the very first
thing (I don't have any "CPU disconnect" BIOS setting), and it
not only booted, but I can't even seem to make it crash using my
hdparm/grep/whatever tests...

I don't know if it's "rock solid" yet, but at least the difference is
huge. It really seems like that made the problem go away!

Regards,
Julien

* Re: Catching NForce2 lockup with NMI watchdog - found
  2003-12-05 23:49 ` Catching NForce2 lockup with NMI watchdog - found? Prakash K. Cheemplavam
@ 2003-12-06  8:18   ` cheuche+lkml
  2003-12-06 11:22     ` Julien Oster
                       ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: cheuche+lkml @ 2003-12-06  8:18 UTC
  To: linux-kernel

On Sat, Dec 06, 2003 at 12:49:50AM +0100, Prakash K. Cheemplavam wrote:
> 
> So gals and guys, try disabling cpu disconnect in bios and see whether
> apic now runs stable.
> 
Yes, that fixes it. Well, time will tell, but I cannot make it crash with
hdparm -tT or cat /dev/hda so far. I'm dumping hda to /dev/null right
now.

After testing to make it crash, I used athcool to reenable CPU
disconnect, and guess what, the test after that just crashed the box.
You found the problem, congratulations.

If you experience crashes with apic and your bios does not have such an
option, try athcool at
http://members.jcom.home.ne.jp/jacobi/linux/softwares.html
Its purpose is to *enable* cpu disconnect, but it can also disable it. Your
best bet is to run it to disable cpu disconnect as early as possible at
boot.

On the other hand, it isn't the cause of the IRQ7 rogue interrupts. As I
initially suspected, that now seems totally unrelated. The ACPI override
handling may be buggy, since putting the timer back on IO-APIC-edge
solves it.

Nevertheless this is still a problem: other chipsets for Athlon
processors seem to be able to have cpu disconnect and ioapic enabled
without any crashes. But so far I don't see any thermal differences, so I'm
happy with that.

Mathieu

Thread overview: 8+ messages
2003-12-08 10:43 Catching NForce2 lockup with NMI watchdog - found Mikael Pettersson
2003-12-05 20:56 Catching NForce2 lockup with NMI watchdog Allen Martin
2003-12-05 23:49 ` Catching NForce2 lockup with NMI watchdog - found? Prakash K. Cheemplavam
2003-12-06  8:18   ` Catching NForce2 lockup with NMI watchdog - found cheuche+lkml
2003-12-06 11:22     ` Julien Oster
2003-12-08  3:34       ` Bob
2003-12-08  8:13         ` Bob
2003-12-06 12:24     ` Prakash K. Cheemplavam
2003-12-08  3:25     ` Bob
2003-12-08  3:18       ` Bartlomiej Zolnierkiewicz
