linux-kernel.vger.kernel.org archive mirror
* RE: Catching NForce2 lockup with NMI watchdog
@ 2003-12-05 20:56 Allen Martin
  2003-12-05 23:49 ` Catching NForce2 lockup with NMI watchdog - found? Prakash K. Cheemplavam
  0 siblings, 1 reply; 21+ messages in thread
From: Allen Martin @ 2003-12-05 20:56 UTC (permalink / raw)
  To: 'Jesse Allen'; +Cc: linux-kernel

> -----Original Message-----
> From: Jesse Allen [mailto:the3dfxdude@hotmail.com] 
> Sent: Friday, December 05, 2003 12:36 PM
>
> Do you know whether the nforce2's with apic support the timer 
> (IRQ 0) in 
> IO-APIC mode?  To me, it seems like a bug:
> "Dec  4 20:13:11 tesore kernel: ..MP-BIOS bug: 8254 timer not 
> connected to 
> IO-APIC"
> (This message originates in arch/i386/kernel/io_apic.c)
> 

Yes, Win 9x/2k/XP use the system timer on irq0 and have no problem.  I
haven't looked at this yet.

-Allen


* Re: Catching NForce2 lockup with NMI watchdog - found?
  2003-12-05 20:56 Catching NForce2 lockup with NMI watchdog Allen Martin
@ 2003-12-05 23:49 ` Prakash K. Cheemplavam
  2003-12-05 23:55   ` Prakash K. Cheemplavam
                     ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Prakash K. Cheemplavam @ 2003-12-05 23:49 UTC (permalink / raw)
  To: Allen Martin; +Cc: 'Jesse Allen', linux-kernel

Hi,

*maybe* I found the bugger; at least I got APIC more stable (I still need
to test whether it is really stable, compiling a kernel right now...):

It is a problem with the CPU disconnect function. I tried various
parameters in the BIOS and turned CPU disconnect off, and tada, I could
run several hdparms in a row and the machine is still running! As CPU
disconnect is an ACPI state, if I am not mistaken, I think there is
something broken in ACPI or in APIC right now, and CPU disconnect
triggers the bug.

Maybe my Windows environment is stable now as well. It was much more
stable with CPU disconnect and APIC, but it nevertheless locked up on
rare occasions.


So gals and guys, try disabling CPU disconnect in the BIOS and see whether
APIC now runs stable.

I have an Abit NF7-S Rev2.0 with Bios 2.0.

Prakash



* Re: Catching NForce2 lockup with NMI watchdog - found?
  2003-12-05 23:49 ` Catching NForce2 lockup with NMI watchdog - found? Prakash K. Cheemplavam
@ 2003-12-05 23:55   ` Prakash K. Cheemplavam
  2003-12-06  0:15   ` Craig Bradney
  2003-12-06  8:18   ` Catching NForce2 lockup with NMI watchdog - found cheuche+lkml
  2 siblings, 0 replies; 21+ messages in thread
From: Prakash K. Cheemplavam @ 2003-12-05 23:55 UTC (permalink / raw)
  Cc: Allen Martin, 'Jesse Allen', linux-kernel

Prakash K. Cheemplavam wrote:
> *maybe* I found the bugger, at least I got APIC more stable (need to 
> test whether it is really stable, compiling kernel right now...):

So, new kernel is up. So far so good:

            CPU0
   0:      47118          XT-PIC  timer
   1:         34    IO-APIC-edge  i8042
   2:          0          XT-PIC  cascade
   8:          3    IO-APIC-edge  rtc
   9:          0   IO-APIC-level  acpi
  12:        864    IO-APIC-edge  i8042
  14:         10    IO-APIC-edge  ide0
  15:         16    IO-APIC-edge  ide1
  16:          0   IO-APIC-level  Skystar2
  18:       7690   IO-APIC-level  libata
  19:       1910   IO-APIC-level  nvidia
  20:         43   IO-APIC-level  ohci_hcd, eth0
  21:        540   IO-APIC-level  NVidia nForce2
  22:          0   IO-APIC-level  ohci_hcd
NMI:          0
LOC:      46934
ERR:          0
MIS:          0
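
To check which interrupt controller the timer ended up on without eyeballing the table, the IRQ 0 line can be parsed. This is only a minimal sketch in C, assuming the single-CPU /proc/interrupts layout shown above (one count column); the function names irq_controller and timer_on_ioapic are invented for this illustration and are not kernel or libc APIs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one single-CPU /proc/interrupts line such as
 *   "  0:      47118          XT-PIC  timer"
 * If the line describes IRQ `irq`, copy the controller name into `ctrl`
 * (at most 31 characters plus NUL) and return 1; otherwise return 0.
 * Hypothetical helper: assumes exactly one count column. */
int irq_controller(const char *line, int irq, char ctrl[32])
{
    int n;
    char name[32];

    assert(line != NULL && ctrl != NULL);
    /* %*u skips the interrupt count; lines like "NMI: 0" fail on %d. */
    if (sscanf(line, " %d: %*u %31s", &n, name) != 2)
        return 0;
    if (n != irq)
        return 0;
    strcpy(ctrl, name);
    return 1;
}

/* Non-zero if the line is the IRQ 0 line and names an IO-APIC controller. */
int timer_on_ioapic(const char *line)
{
    char ctrl[32];

    return irq_controller(line, 0, ctrl) &&
           strncmp(ctrl, "IO-APIC", 7) == 0;
}
```

Fed the IRQ 0 line from the listing above, timer_on_ioapic returns 0, because the timer is still routed through XT-PIC there.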


Prakash



* Re: Catching NForce2 lockup with NMI watchdog - found?
  2003-12-05 23:49 ` Catching NForce2 lockup with NMI watchdog - found? Prakash K. Cheemplavam
  2003-12-05 23:55   ` Prakash K. Cheemplavam
@ 2003-12-06  0:15   ` Craig Bradney
  2003-12-06  0:21     ` Prakash K. Cheemplavam
  2003-12-08  3:03     ` Bob
  2003-12-06  8:18   ` Catching NForce2 lockup with NMI watchdog - found cheuche+lkml
  2 siblings, 2 replies; 21+ messages in thread
From: Craig Bradney @ 2003-12-06  0:15 UTC (permalink / raw)
  To: linux-kernel

On Sat, 2003-12-06 at 00:49, Prakash K. Cheemplavam wrote:
> Hi,
> 
> *maybe* I found the bugger, at least I got APIC more stable (need to 
> test whether it is really stable, compiling kernel right now...):
> 
> It is a problem with CPU disconnect function. I tried various parameters 
> in bios and turned cpu disconnect off, and tada, I could do several 
> subsequent hdparms and machine is running! As CPU disconnect is an ACPI 
> state, if I am not mistaken, I think there is something broken in ACPI 
> right now or in APIC and cpu disconnect triggers the bug.
> 
> Maybe now my windows environment is stable, as well. It was much more 
> stable with cpu disconnect and apic, nevertheless seldomly locked up.
> 
> 
> So gals and guys, try disabling cpu disconnect in bios and see whether 
> APIC now runs stable.

> I have an Abit NF7-S Rev2.0 with Bios 2.0.

> 
> Prakash

I rebooted and checked my BIOS; I don't seem to have "CPU Disconnect".
Is there another name? I also downloaded the motherboard manual for your
NF7-S and can't find it there either.

Craig




* Re: Catching NForce2 lockup with NMI watchdog - found?
  2003-12-06  0:15   ` Craig Bradney
@ 2003-12-06  0:21     ` Prakash K. Cheemplavam
  2003-12-06  0:37       ` Craig Bradney
  2003-12-08  3:03     ` Bob
  1 sibling, 1 reply; 21+ messages in thread
From: Prakash K. Cheemplavam @ 2003-12-06  0:21 UTC (permalink / raw)
  To: Craig Bradney; +Cc: linux-kernel

Craig Bradney wrote:
> On Sat, 2003-12-06 at 00:49, Prakash K. Cheemplavam wrote:
> 
>>Hi,
>>
>>*maybe* I found the bugger, at least I got APIC more stable (need to 
>>test whether it is really stable, compiling kernel right now...):
>>
>>It is a problem with CPU disconnect function. I tried various parameters 
>>in bios and turned cpu disconnect off, and tada, I could do several 
>>subsequent hdparms and machine is running! As CPU disconnect is an ACPI 
>>state, if I am not mistaken, I think there is something broken in ACPI 
>>right now or in APIC and cpu disconnect triggers the bug.
>>
>>Maybe now my windows environment is stable, as well. It was much more 
>>stable with cpu disconnect and apic, nevertheless seldomly locked up.
>>
>>
>>So gals and guys, try disabling cpu disconnect in bios and see whether 
>>APIC now runs stable.
> 
> 
>>I have an Abit NF7-S Rev2.0 with Bios 2.0.
> 
> 
>>Prakash
> 
> 
> I rebooted and checked in my BIOS, I dont seem to have "CPU Disconnect"?
> Is there another name. I also downloaded the motherboard manual for your
> NF7-S and cant find it there either?

The full name should be "CPU Disconnect Function". It is on the page
with "enhanced pci performance", "enable system bios caching", ".. video
bios caching" and all the spread spectrum settings. I have forgotten the
name of that page in the main menu. It should be the 3rd or 4th entry in
the first column.

Perhaps your BIOS is too old. I remember the option only came with 1.8 (or
thereabouts) and later. But usually this setting should be disabled by
default.

My machine still hasn't locked, btw. :-)

Prakash



* Re: Catching NForce2 lockup with NMI watchdog - found?
  2003-12-06  0:21     ` Prakash K. Cheemplavam
@ 2003-12-06  0:37       ` Craig Bradney
  2003-12-08  3:08         ` Bob
  0 siblings, 1 reply; 21+ messages in thread
From: Craig Bradney @ 2003-12-06  0:37 UTC (permalink / raw)
  To: linux-kernel

On Sat, 2003-12-06 at 01:21, Prakash K. Cheemplavam wrote:
> Craig Bradney wrote:
> > On Sat, 2003-12-06 at 00:49, Prakash K. Cheemplavam wrote:
> > 
> >>Hi,
> >>
> >>*maybe* I found the bugger, at least I got APIC more stable (need to 
> >>test whether it is really stable, compiling kernel right now...):
> >>
> >>It is a problem with CPU disconnect function. I tried various parameters 
> >>in bios and turned cpu disconnect off, and tada, I could do several 
> >>subsequent hdparms and machine is running! As CPU disconnect is an ACPI 
> >>state, if I am not mistaken, I think there is something broken in ACPI 
> >>right now or in APIC and cpu disconnect triggers the bug.
> >>
> >>Maybe now my windows environment is stable, as well. It was much more 
> >>stable with cpu disconnect and apic, nevertheless seldomly locked up.
> >>
> >>
> >>So gals and guys, try disabling cpu disconnect in bios and see whether 
> >>APIC now runs stable.
> > 
> > 
> >>I have an Abit NF7-S Rev2.0 with Bios 2.0.
> > 
> > 
> >>Prakash
> > 
> > 
> > I rebooted and checked in my BIOS, I dont seem to have "CPU Disconnect"?
> > Is there another name. I also downloaded the motherboard manual for your
> > NF7-S and cant find it there either?
> 
> The full name should be "CPU Disconnect Function". It is on the page 
> with "enhanced pci performance", "enable system bios caching" ".. video 
> bios caching" and all the spread spectrums. I have forgotten the name of 
> that page in the main menu. Should be the 3 or 4 in the first column.
> 
> Perhaps your BIOS is too old. I remember it only came with 1.8 (or 
> alike) and later. But usually this setting should be disabled at default.
> 
> My machine still hasn't locked, btw. :-)


Sounds great.. maybe you have come across something. Yes, the CPU
Disconnect function arrived in your BIOS in the revision of 2003/03/27:
"6. Adds "CPU Disconnect Function" to adjust C1 disconnects. The Chipset
does not support C2 disconnect; thus, disable C2 function."

For me though.. I'm on an ASUS A7N8X Deluxe v2 with BIOS 1007. From what I
can see, CPU Disconnect isn't even in the Uber BIOS 1007 for this ASUS
that has been discussed.

Craig



* Re: Catching NForce2 lockup with NMI watchdog - found
  2003-12-05 23:49 ` Catching NForce2 lockup with NMI watchdog - found? Prakash K. Cheemplavam
  2003-12-05 23:55   ` Prakash K. Cheemplavam
  2003-12-06  0:15   ` Craig Bradney
@ 2003-12-06  8:18   ` cheuche+lkml
  2003-12-06 11:22     ` Julien Oster
                       ` (2 more replies)
  2 siblings, 3 replies; 21+ messages in thread
From: cheuche+lkml @ 2003-12-06  8:18 UTC (permalink / raw)
  To: linux-kernel

On Sat, Dec 06, 2003 at 12:49:50AM +0100, Prakash K. Cheemplavam wrote:
> 
> So gals and guys, try disabling cpu disconnect in bios and see whether 
> APIC now runs stable.
> 
Yes, that fixes it. Well, time will tell, but so far I cannot make it crash
with hdparm -tT or cat /dev/hda. I'm dumping hda to /dev/null right
now.

After trying and failing to make it crash, I used athcool to re-enable CPU
disconnect, and guess what, the next test just crashed the box.
You found the problem, congratulations.

If you experience crashes with APIC and your BIOS does not have such an
option, try athcool at
http://members.jcom.home.ne.jp/jacobi/linux/softwares.html
Its purpose is to *enable* CPU disconnect, but it can also disable it. Your
best bet is to run it to disable CPU disconnect as early as possible at
boot.

On the other hand, it isn't the cause of the IRQ7 rogue interrupts. As I
initially suspected, that now seems totally unrelated. The ACPI override
handling may be buggy, since putting the timer back on IO-APIC-edge
solves it.

Nevertheless this is still a problem: other chipsets for Athlon
processors seem to be able to have CPU disconnect and IO-APIC enabled
without any crashes. But so far I don't see any thermal difference, so I'm
happy with that.

Mathieu


* Re: Catching NForce2 lockup with NMI watchdog - found
  2003-12-06  8:18   ` Catching NForce2 lockup with NMI watchdog - found cheuche+lkml
@ 2003-12-06 11:22     ` Julien Oster
  2003-12-08  3:34       ` Bob
  2003-12-06 12:24     ` Prakash K. Cheemplavam
  2003-12-08  3:25     ` Bob
  2 siblings, 1 reply; 21+ messages in thread
From: Julien Oster @ 2003-12-06 11:22 UTC (permalink / raw)
  To: linux-kernel

cheuche+lkml@free.fr writes:

Hello,

>> So gals and guys, try disabling cpu disconnect in bios and see whether 
>> APIC now runs stable.

> Yes that fix it. Well time will tell but I cannot make it crash with
> hdparm -tT or cat /dev/hda so far. I'm dumping hda to /dev/null right
> now.
> After testing to make it crash, I used athcool to reenable CPU
> disconnect, and guess what, test after that just crashed the box.
> You found the problem, congratulations.

Well, now I'm stunned.

With APIC and ACPI enabled, my machine isn't even able to boot
completely, it'll most certainly crash before the init scripts are
finished.

Now, I modified the init scripts to do "athcool off" as the very first
thing (I don't have any "CPU disconnect" BIOS setting) and it
not only booted, but I can't even seem to make it crash using my
hdparm/grep/whatever tests...

I don't know if it's "rock solid" yet, but at least the difference is
huge. It really seems like that made the problem go away!

Regards,
Julien


* Re: Catching NForce2 lockup with NMI watchdog - found
  2003-12-06  8:18   ` Catching NForce2 lockup with NMI watchdog - found cheuche+lkml
  2003-12-06 11:22     ` Julien Oster
@ 2003-12-06 12:24     ` Prakash K. Cheemplavam
  2003-12-06 13:11       ` [PATCH] " Bartlomiej Zolnierkiewicz
  2003-12-08  3:25     ` Bob
  2 siblings, 1 reply; 21+ messages in thread
From: Prakash K. Cheemplavam @ 2003-12-06 12:24 UTC (permalink / raw)
  To: cheuche+lkml; +Cc: linux-kernel

cheuche+lkml@free.fr wrote:
> On Sat, Dec 06, 2003 at 12:49:50AM +0100, Prakash K. Cheemplavam wrote:
> 
>>So gals and guys, try disabling cpu disconnect in bios and see whether 
>>APIC now runs stable.
>>
> 
> Yes that fix it. Well time will tell but I cannot make it crash with
> hdparm -tT or cat /dev/hda so far. I'm dumping hda to /dev/null right
> now.
> 
> After testing to make it crash, I used athcool to reenable CPU
> disconnect, and guess what, test after that just crashed the box.
> You found the problem, congratulations.

:-)

Isn't it possible to add athcool's code to the kernel, maybe in the
pm section, or even make it a kernel option? It seems to be a nice
workaround for the time being.

Prakash



* [PATCH] Re: Catching NForce2 lockup with NMI watchdog - found
  2003-12-06 12:24     ` Prakash K. Cheemplavam
@ 2003-12-06 13:11       ` Bartlomiej Zolnierkiewicz
  2003-12-06 15:10         ` Prakash K. Cheemplavam
  2003-12-06 15:35         ` Vladimir Grebinskiy
  0 siblings, 2 replies; 21+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2003-12-06 13:11 UTC (permalink / raw)
  To: Prakash K. Cheemplavam; +Cc: cheuche+lkml, linux-kernel


It is possible :-).  Here is a completely untested patch.

[PATCH] fix lockups with APIC support on nForce2

Add PCI quirk to disable Halt Disconnect and Stop Grant Disconnect
(based on athcool program by Osamu Kayasono).

 arch/i386/pci/fixup.c |   13 +++++++++++++
 1 files changed, 13 insertions(+)

diff -puN arch/i386/pci/fixup.c~nforce2_disconnect_quirk arch/i386/pci/fixup.c
--- linux-2.6.0-test11/arch/i386/pci/fixup.c~nforce2_disconnect_quirk	2003-12-06 13:36:56.147911576 +0100
+++ linux-2.6.0-test11-root/arch/i386/pci/fixup.c	2003-12-06 14:03:41.655837272 +0100
@@ -187,6 +187,18 @@ static void __devinit pci_fixup_transpar
 		dev->transparent = 1;
 }
 
+/*
+ * Halt Disconnect and Stop Grant Disconnect (bit 4 at offset 0x6F)
+ * must be disabled when APIC is used (or lockups will happen).
+ */
+static void __devinit pci_fixup_nforce2_disconnect(struct pci_dev *d)
+{
+	u8 t;
+
+	pci_read_config_byte(d, 0x6F, &t);
+	pci_write_config_byte(d, 0x6F, (t & 0xef));
+}
+
 struct pci_fixup pcibios_fixups[] = {
 	{ PCI_FIXUP_HEADER,	PCI_VENDOR_ID_INTEL,	PCI_DEVICE_ID_INTEL_82451NX,	pci_fixup_i450nx },
 	{ PCI_FIXUP_HEADER,	PCI_VENDOR_ID_INTEL,	PCI_DEVICE_ID_INTEL_82454GX,	pci_fixup_i450gx },
@@ -205,5 +217,6 @@ struct pci_fixup pcibios_fixups[] = {
 	{ PCI_FIXUP_HEADER,	PCI_VENDOR_ID_VIA,	PCI_DEVICE_ID_VIA_8367_0,	pci_fixup_via_northbridge_bug },
 	{ PCI_FIXUP_HEADER,	PCI_VENDOR_ID_NCR,	PCI_DEVICE_ID_NCR_53C810,	pci_fixup_ncr53c810 },
 	{ PCI_FIXUP_HEADER,	PCI_VENDOR_ID_INTEL,	PCI_ANY_ID,			pci_fixup_transparent_bridge },
+	{ PCI_FIXUP_HEADER,	PCI_VENDOR_ID_NVIDIA,	PCI_DEVICE_ID_NVIDIA_NFORCE2,	pci_fixup_nforce2_disconnect },
 	{ 0 }
 };

_
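
For anyone following the patch, the mask arithmetic can be checked in isolation. This is a minimal user-space sketch, not kernel code: the helper names nforce2_disconnect_off and nforce2_disconnect_enabled and both macro names are made up for illustration; only the offset 0x6F, bit 4 and the 0xef mask come from the patch above.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the patch: config register offset 0x6F, bit 4.
 * The macro names themselves are hypothetical. */
#define NFORCE2_DISCONNECT_REG 0x6F
#define NFORCE2_DISCONNECT_BIT (1u << 4)   /* 0x10 */

/* Return the register value with the disconnect bit cleared.
 * Equivalent to the patch's (t & 0xef); all other bits are preserved. */
uint8_t nforce2_disconnect_off(uint8_t reg)
{
    return (uint8_t)(reg & ~NFORCE2_DISCONNECT_BIT);
}

/* Non-zero if the disconnect bit is set in the given register value. */
int nforce2_disconnect_enabled(uint8_t reg)
{
    return (reg & NFORCE2_DISCONNECT_BIT) != 0;
}
```

Because only bit 4 is masked out, a register value of 0xFF becomes 0xEF and every other bit stays as the BIOS left it.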

On Saturday 06 of December 2003 13:24, Prakash K. Cheemplavam wrote:
> cheuche+lkml@free.fr wrote:
> > On Sat, Dec 06, 2003 at 12:49:50AM +0100, Prakash K. Cheemplavam wrote:
> >>So gals and guys, try disabling cpu disconnect in bios and see whether
> >>APIC now runs stable.
> >
> > Yes that fix it. Well time will tell but I cannot make it crash with
> > hdparm -tT or cat /dev/hda so far. I'm dumping hda to /dev/null right
> > now.
> >
> > After testing to make it crash, I used athcool to reenable CPU
> > disconnect, and guess what, test after that just crashed the box.
> > You found the problem, congratulations.
> >
> :-)
>
> Isn't it possible to add athcool's code into the kernel, maybe into the
> pm section or even make it a kernel option. It seems to be a nice
> workaround for the time-being.
>
> Prakash



* Re: [PATCH] Re: Catching NForce2 lockup with NMI watchdog - found
  2003-12-06 13:11       ` [PATCH] " Bartlomiej Zolnierkiewicz
@ 2003-12-06 15:10         ` Prakash K. Cheemplavam
  2003-12-06 15:37           ` Craig Bradney
  2003-12-06 15:35         ` Vladimir Grebinskiy
  1 sibling, 1 reply; 21+ messages in thread
From: Prakash K. Cheemplavam @ 2003-12-06 15:10 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Prakash K. Cheemplavam, cheuche+lkml, linux-kernel, Allen Martin

Bartlomiej Zolnierkiewicz wrote:
> It is possible :-).  Here is a completely untested patch.
> 
> [PATCH] fix lockups with APIC support on nForce2


I tried it (applied the patch and *enabled* CPU disconnect in the BIOS) and
it works! Good work. Nevertheless, it is no real fix, just a workaround.
Perhaps someone from nvidia should comment on this... or some APIC guru
needs to take a look at the code.

Prakash



* Re: [PATCH] Re: Catching NForce2 lockup with NMI watchdog - found
  2003-12-06 13:11       ` [PATCH] " Bartlomiej Zolnierkiewicz
  2003-12-06 15:10         ` Prakash K. Cheemplavam
@ 2003-12-06 15:35         ` Vladimir Grebinskiy
  1 sibling, 0 replies; 21+ messages in thread
From: Vladimir Grebinskiy @ 2003-12-06 15:35 UTC (permalink / raw)
  To: linux-kernel

I hope you nailed the problem! I applied the "timer APIC-IO-edge" change 
and "athcool off" patch and the box seemed to be working with APIC/LAPIC 
(for the first time) !!! The board is the Leadtek K7nCR18G-Pro, BIOS 
9/30/2003, kernel 2.6.0-test11 and Debian/Sid (all debug turned on, 
preempt off).

The box used to lock up in seconds when running "md5sum bigfile" unless 
booted with "noapic nolapic". Now it survived parallel read from two 
drives + "ping -f router" + glxgears (with nvidia video driver). I'll 
keep testing it for a while.

Vladimir

lspci:
00:00.0 Host bridge: nVidia Corporation nForce2 AGP (different version?) (rev a2)
00:00.1 RAM memory: nVidia Corporation nForce2 Memory Controller 1 (rev a2)
00:00.2 RAM memory: nVidia Corporation nForce2 Memory Controller 4 (rev a2)
00:00.3 RAM memory: nVidia Corporation nForce2 Memory Controller 3 (rev a2)
00:00.4 RAM memory: nVidia Corporation nForce2 Memory Controller 2 (rev a2)
00:00.5 RAM memory: nVidia Corporation nForce2 Memory Controller 5 (rev a2)
00:01.0 ISA bridge: nVidia Corporation nForce2 ISA Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation nForce2 SMBus (MCP) (rev a2)
00:02.0 USB Controller: nVidia Corporation nForce2 USB Controller (rev a3)
00:02.1 USB Controller: nVidia Corporation nForce2 USB Controller (rev a3)
00:05.0 Multimedia audio controller: nVidia Corporation nForce MultiMedia audio [Via VT82C686B] (rev a2)
00:06.0 Multimedia audio controller: nVidia Corporation nForce2 AC97 Audio Controler (MCP) (rev a1)
00:08.0 PCI bridge: nVidia Corporation nForce2 External PCI Bridge (rev a3)
00:09.0 IDE interface: nVidia Corporation nForce2 IDE (rev a2)
00:0d.0 FireWire (IEEE 1394): nVidia Corporation nForce2 FireWire (IEEE 1394) Controller (rev a3)
00:1e.0 PCI bridge: nVidia Corporation nForce2 AGP (rev a2)
01:09.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 6c)
03:00.0 VGA compatible controller: nVidia Corporation NV18 [GeForce4 MX - nForce GPU] (rev a3)

Interrupts:

           CPU0
  0:    2865299    IO-APIC-edge  timer
  1:       3017    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          4    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:      13046    IO-APIC-edge  i8042
 14:     223268    IO-APIC-edge  ide0
 15:         49    IO-APIC-edge  ide1
 16:     616405   IO-APIC-level  nvidia
 17:    2333041   IO-APIC-level  eth0
 20:      22450   IO-APIC-level  ohci_hcd
 21:       2621   IO-APIC-level  NVidia nForce2
 22:          0   IO-APIC-level  ohci_hcd
NMI:          0
LOC:    2864269
ERR:          0
MIS:          0





* Re: [PATCH] Re: Catching NForce2 lockup with NMI watchdog - found
  2003-12-06 15:10         ` Prakash K. Cheemplavam
@ 2003-12-06 15:37           ` Craig Bradney
  0 siblings, 0 replies; 21+ messages in thread
From: Craig Bradney @ 2003-12-06 15:37 UTC (permalink / raw)
  To: linux-kernel

On Sat, 2003-12-06 at 16:10, Prakash K. Cheemplavam wrote:
> Bartlomiej Zolnierkiewicz wrote:
> > It is possible :-).  Here is a completely untested patch.
> > 
> > [PATCH] fix lockups with APIC support on nForce2
> 
> 
> I tried it (applied patch and *enabled* CPU disconnect in bios) and it 
> works! Good work. Nevertheless, it is no real fix, just a work-around. 
> Perhaps someone from nvidia should comment on this...or some APIC guru 
> needs to take a look into the code.

So.. if you find long-term stability with this.. then maybe it relates
to disconnect, but perhaps it is just a way of increasing the time
between crashes and the patch is a correct workaround? But why isn't it a
"real" fix if the timer IRQ is not set up correctly without it?

I would also like an nvidia or apic opinion on this one.

Craig



* Re: Catching NForce2 lockup with NMI watchdog - found?
  2003-12-06  0:15   ` Craig Bradney
  2003-12-06  0:21     ` Prakash K. Cheemplavam
@ 2003-12-08  3:03     ` Bob
  1 sibling, 0 replies; 21+ messages in thread
From: Bob @ 2003-12-08  3:03 UTC (permalink / raw)
  To: linux-kernel

Craig Bradney wrote:

>On Sat, 2003-12-06 at 00:49, Prakash K. Cheemplavam wrote:
>  
>
>>...try disabling cpu disconnect in bios and see whether 
>>APIC now runs stable.
>>    
>>
>
>  
>
>>I have an Abit NF7-S Rev2.0 with Bios 2.0.
>>    
>>
>
>  
>
>>Prakash
>>    
>>
>
>I rebooted and checked in my BIOS, I dont seem to have "CPU Disconnect"?
>Is there another name. I also downloaded the motherboard manual for your
>NF7-S and cant find it there either?
>
>Craig
>
I don't have that either on MSI K7N2  -Bob



* Re: Catching NForce2 lockup with NMI watchdog - found?
  2003-12-08  3:08         ` Bob
@ 2003-12-08  3:06           ` Bartlomiej Zolnierkiewicz
  0 siblings, 0 replies; 21+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2003-12-08  3:06 UTC (permalink / raw)
  To: Bob; +Cc: linux-kernel

On Monday 08 of December 2003 04:08, Bob wrote:
> >Sounds great.. maybe you have come across something. Yes, the CPU
> >Disconnect function arrived in your BIOS in revision of 2003/03/27
> >"6.Adds"CPU Disconnect Function" to adjust C1 disconnects. The Chipset
> >does not support C2 disconnect; thus, disable C2 function."
> >
> >For me though.. Im on an ASUS A7N8X Deluxe v2 BIOS 1007. From what I can
> >see the CPU Disconnect isnt even in the Uber BIOS 1007 for this ASUS
> >that has been discussed.
> >
> >Craig
>
> I don't have that in MSI K7N2 MCP2-T near the
> agp and fsb spread spectrum items or anywhere
> else.

Use athcool:
	http://members.jcom.home.ne.jp/jacobi/linux/softwares.html#athcool
or apply kernel patch (2.4 and 2.6 versions were posted already).

--bart



* Re: Catching NForce2 lockup with NMI watchdog - found?
  2003-12-06  0:37       ` Craig Bradney
@ 2003-12-08  3:08         ` Bob
  2003-12-08  3:06           ` Bartlomiej Zolnierkiewicz
  0 siblings, 1 reply; 21+ messages in thread
From: Bob @ 2003-12-08  3:08 UTC (permalink / raw)
  To: linux-kernel

Craig Bradney wrote:

>On Sat, 2003-12-06 at 01:21, Prakash K. Cheemplavam wrote:
>  
>
>>Craig Bradney wrote:
>>    
>>
>>>On Sat, 2003-12-06 at 00:49, Prakash K. Cheemplavam wrote:
>>>
>>>      
>>>
>>>>Hi,
>>>>
>>>>*maybe* I found the bugger, at least I got APIC more stable (need to 
>>>>test whether it is really stable, compiling kernel right now...):
>>>>
>>>>It is a problem with CPU disconnect function. I tried various parameters 
>>>>in bios and turned cpu disconnect off, and tada, I could do several 
>>>>subsequent hdparms and machine is running! As CPU disconnect is an ACPI 
>>>>state, if I am not mistaken, I think there is something broken in ACPI 
>>>>right now or in APIC and cpu disconnect triggers the bug.
>>>>
>>>>Maybe now my windows environment is stable, as well. It was much more 
>>>>stable with cpu disconnect and apic, nevertheless seldomly locked up.
>>>>
>>>>
>>>>So gals and guys, try disabling cpu disconnect in bios and see whether 
>>>>APIC now runs stable.
>>>>        
>>>>
>>>      
>>>
>>>>I have an Abit NF7-S Rev2.0 with Bios 2.0.
>>>>        
>>>>
>>>      
>>>
>>>>Prakash
>>>>        
>>>>
>>>I rebooted and checked in my BIOS, I dont seem to have "CPU Disconnect"?
>>>Is there another name. I also downloaded the motherboard manual for your
>>>NF7-S and cant find it there either?
>>>      
>>>
>>The full name should be "CPU Disconnect Function". It is on the page 
>>with "enhanced pci performance", "enable system bios caching" ".. video 
>>bios caching" and all the spread spectrums. I have forgotten the name of 
>>that page in the main menu. Should be the 3 or 4 in the first column.
>>
>>Perhaps your BIOS is too old. I remember it only came with 1.8 (or 
>>alike) and later. But usually this setting should be disabled at default.
>>
>>My machine still hasn't locked, btw. :-)
>>    
>>
>
>
>Sounds great.. maybe you have come across something. Yes, the CPU
>Disconnect function arrived in your BIOS in revision of 2003/03/27
>"6.Adds"CPU Disconnect Function" to adjust C1 disconnects. The Chipset
>does not support C2 disconnect; thus, disable C2 function."
>
>For me though.. Im on an ASUS A7N8X Deluxe v2 BIOS 1007. From what I can
>see the CPU Disconnect isnt even in the Uber BIOS 1007 for this ASUS
>that has been discussed.
>
>Craig
>
I don't have that in MSI K7N2 MCP2-T near the
agp and fsb spread spectrum items or anywhere
else.



* Re: Catching NForce2 lockup with NMI watchdog - found
  2003-12-08  3:25     ` Bob
@ 2003-12-08  3:18       ` Bartlomiej Zolnierkiewicz
  0 siblings, 0 replies; 21+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2003-12-08  3:18 UTC (permalink / raw)
  To: Bob; +Cc: linux-kernel


> Is there a link to that patch? I keep deleting
> this list it's huge so I lost a patch in a message.
>
> -Bob

http://www.kernel.org/pub/linux/kernel/people/bart/2.6.0-test11-bart1/broken-out/nforce2-apic.patch



* Re: Catching NForce2 lockup with NMI watchdog - found
  2003-12-06  8:18   ` Catching NForce2 lockup with NMI watchdog - found cheuche+lkml
  2003-12-06 11:22     ` Julien Oster
  2003-12-06 12:24     ` Prakash K. Cheemplavam
@ 2003-12-08  3:25     ` Bob
  2003-12-08  3:18       ` Bartlomiej Zolnierkiewicz
  2 siblings, 1 reply; 21+ messages in thread
From: Bob @ 2003-12-08  3:25 UTC (permalink / raw)
  To: linux-kernel

cheuche+lkml@free.fr wrote:

> ...................If you experience crashes with apic and your bios 
> does not have such
>
>option, try athcool at
>http://members.jcom.home.ne.jp/jacobi/linux/softwares.html
>Its purpose is to *enable* cpu disconnect but can also disable it. Your
>best bet is to run it to disable cpu disconnect the soonest possible at
>boot.
>
>On the other hand, it isn't the cause of IRQ7 rogue interrupts. As I
>initially suspected, it seems now totally unrelated. The ACPI override
>handling may be buggy ? Since putting back the timer on IO-APIC-edge
>solves it.
>
>Nevertheless this is still a problem, other chipsets for Athlon
>processors seems to be able to have cpu disconnect and ioapic enabled
>without any crashes. But so far I don't see any thermal differences, I'm
>happy with that.
>
>Mathieu
>
I presently have /proc/interrupts
 0:  244393560          XT-PIC  timer

but when I tried nvnet driver and onboard
ethernet I think I saw both IRQ7 disabled
and some 8259A spurious interrupt err.

Presently there is no grep timer or TIMER
or 8259A in logs. 8259A has to do with
IO-APIC timer? It would make sense that
nvnet would see apic and lapic on in bios
and linux and look for io-apic timer as
well as apic table, then fail confused.

Is there a link to that patch? I keep deleting
this list it's huge so I lost a patch in a message.

-Bob



* Re: Catching NForce2 lockup with NMI watchdog - found
  2003-12-06 11:22     ` Julien Oster
@ 2003-12-08  3:34       ` Bob
  2003-12-08  8:13         ` Bob
  0 siblings, 1 reply; 21+ messages in thread
From: Bob @ 2003-12-08  3:34 UTC (permalink / raw)
  To: linux-kernel

Wasn't this formerly fixed by a kernel config
option to not detect cpu idle on amd cpu's?
I don't see that in 2.6

Julien Oster wrote:

>cheuche+lkml@free.fr writes:
>
>Hello,
>
>  
>
>>>So gals and guys, try disabling cpu disconnect in bios and see whether 
>>>APIC now runs stable.
>>>      
>>>
>
>  
>
>>Yes that fix it. Well time will tell but I cannot make it crash with
>>hdparm -tT or cat /dev/hda so far. I'm dumping hda to /dev/null right
>>now.
>>After testing to make it crash, I used athcool to reenable CPU
>>disconnect, and guess what, test after that just crashed the box.
>>You found the problem, congratulations.
>>    
>>
>
>Well, now I'm stunned.
>
>With APIC and ACPI enabled, my machine isn't even able to boot
>completely, it'll most certainly crash before the init scripts are
>finished.
>
>Now, I modified the init scripts to do "athcool off" as the first
>thing at all (I don't have any "CPU disconnect" BIOS setting) and it
>not only booted, but I even can't seem to make it crash using my
>hdparm/grep/whatever tests...
>
>I don't know if it's "rock solid" yet, but at least the difference is
>huge. It really seems like that made the problem go away!
>
>Regards,
>Julien
>  
>




* Re: Catching NForce2 lockup with NMI watchdog - found
  2003-12-08  3:34       ` Bob
@ 2003-12-08  8:13         ` Bob
  0 siblings, 0 replies; 21+ messages in thread
From: Bob @ 2003-12-08  8:13 UTC (permalink / raw)
  To: linux-kernel

The athcool patch seemed to work as far as patch reported,
but there were undef and unused problems on compile so
I don't have that in right now. The timer patch is in.

http://www.kernel.org/pub/linux/kernel/people/bart/2.6.0-test11-bart1/broken-out/nforce2-apic.patch

Patch succeeded in giving me the ioapic-edge timer,
then lilo append="nmi_watchdog=1" did not work but
=2 did get NMI ticks as shown below.

cat /proc/interrupts

           CPU0       
  0:     617651    IO-APIC-edge  timer
  1:        868    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          1    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:       8736    IO-APIC-edge  i8042
 14:         22    IO-APIC-edge  ide0
 15:         24    IO-APIC-edge  ide1
 16:      92853   IO-APIC-level  3ware Storage Controller, yenta, yenta
 17:       2793   IO-APIC-level  eth0
 21:          0   IO-APIC-level  NVidia nForce2
NMI:        122 
LOC:     617511 
ERR:          0
MIS:          0
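
One way to confirm that the watchdog is actually ticking is to compare the NMI line from two reads of /proc/interrupts taken a few seconds apart. This is a minimal C sketch, assuming the single-CPU layout above; parse_nmi_line and nmi_watchdog_ticking are hypothetical helper names, not part of any kernel or libc interface.

```c
#include <assert.h>
#include <stdio.h>

/* Extract the count from a single-CPU "NMI:" line of /proc/interrupts.
 * Returns 1 and stores the count on success, 0 if the line is not an
 * NMI line. Hypothetical helper: assumes one count column. */
int parse_nmi_line(const char *line, unsigned long *count)
{
    assert(line != NULL && count != NULL);
    return sscanf(line, " NMI: %lu", count) == 1;
}

/* Given the NMI lines from two reads taken some seconds apart, return
 * 1 if the counter advanced, 0 if it did not, and -1 if either line
 * could not be parsed. */
int nmi_watchdog_ticking(const char *earlier, const char *later)
{
    unsigned long a, b;

    if (!parse_nmi_line(earlier, &a) || !parse_nmi_line(later, &b))
        return -1;
    return b > a;
}
```

A counter that stays at 0 across reads, as in the earlier listings in this thread, would make nmi_watchdog_ticking return 0; a line reading 122 after one reading 0 gives 1.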

Does the kernel opt "use HPET timer" relate to io-apic-edge timer?
Does the kernel opt "hangcheck timer" relate to nmi_watchdog?
Does the kernel opt "ACPI, Processor (c2) (c3 states)" relate to
  the cmos/bios "processor disconnect" option and athcool patch?

kernel 2.6.0-test11, pre-emptive, apic, lapic, acpi, anticipatory
scheduling not deadline scheduling, cpu and fsb clock 1:1 333mhz,
amd xp3000+ and high-performance settings(CAS2) other than
1:1 fsb/ram which is slow for the ram, 41C - 48C cpu temp, MSI K7N2
mboard.

My system was stable already(apic, lapic, pre-empt) after a bios
update which stopped all irq storm and crashes except "IRQ7 disabled"
and "spurious 8259A interrupts" possibly related to the XT-PIC timer
running when the other was expected due to apic and lapic and acpi
and kernel opt "use HPET timer" all being on. Turning on onboard
ethernet set off the irq7 and 8259a errs so I have not been using
onboard eth. USB did not work. I can now test the timer patch with
the onboard ethernet, forcedeth driver, usb, and the "nvidia" X 
driver, which was crashing linux so I had to use "nv". Those
items are where my stability frontier is.

-Bob





* Re: Catching NForce2 lockup with NMI watchdog - found
@ 2003-12-08 10:43 Mikael Pettersson
  0 siblings, 0 replies; 21+ messages in thread
From: Mikael Pettersson @ 2003-12-08 10:43 UTC (permalink / raw)
  To: linux-kernel, recbo

On Mon, 08 Dec 2003 03:13:55 -0500, Bob <recbo@nishanet.com> wrote:
>Does the kernel opt "use HPET timer" relate to io-apic-edge timer?

No. HPET is a newer piece of timer HW. IO-APIC-edge on the timer
relates to how it's connected to the CPU, not where it comes from.

>Does the kernel opt "hangcheck timer" relate to nmi_watchdog?

No.

