* RE: Catching NForce2 lockup with NMI watchdog
@ 2003-12-05 20:56 Allen Martin
2003-12-05 23:49 ` Catching NForce2 lockup with NMI watchdog - found? Prakash K. Cheemplavam
0 siblings, 1 reply; 21+ messages in thread
From: Allen Martin @ 2003-12-05 20:56 UTC
To: 'Jesse Allen'; +Cc: linux-kernel
> -----Original Message-----
> From: Jesse Allen [mailto:the3dfxdude@hotmail.com]
> Sent: Friday, December 05, 2003 12:36 PM
>
> Do you know whether the nforce2's with apic support the timer (IRQ 0) in
> IO-APIC mode? To me, it seems like a bug:
> "Dec 4 20:13:11 tesore kernel: ..MP-BIOS bug: 8254 timer not connected to IO-APIC"
> (This message originates in arch/i386/kernel/io_apic.c)
>
Yes, Win 9x/2k/XP use the system timer on irq0 and have no problem. I
haven't looked at this yet.
-Allen

* Re: Catching NForce2 lockup with NMI watchdog - found?
2003-12-05 20:56 Catching NForce2 lockup with NMI watchdog Allen Martin
@ 2003-12-05 23:49 ` Prakash K. Cheemplavam
2003-12-05 23:55 ` Prakash K. Cheemplavam
` (2 more replies)
0 siblings, 3 replies; 21+ messages in thread
From: Prakash K. Cheemplavam @ 2003-12-05 23:49 UTC
To: Allen Martin; +Cc: 'Jesse Allen', linux-kernel

Hi,

*maybe* I found the bugger, at least I got APIC more stable (need to
test whether it is really stable, compiling kernel right now...):

It is a problem with CPU disconnect function. I tried various parameters
in bios and turned cpu disconnect off, and tada, I could do several
subsequent hdparms and machine is running! As CPU disconnect is an ACPI
state, if I am not mistaken, I think there is something broken in ACPI
right now or in APIC and cpu disconnect triggers the bug.

Maybe now my windows environment is stable, as well. It was much more
stable with cpu disconnect and apic, nevertheless seldomly locked up.

So gals and guys, try disabling cpu disconnect in bios and see whether
apic now runs stable.

I have an Abit NF7-S Rev2.0 with Bios 2.0.

Prakash

* Re: Catching NForce2 lockup with NMI watchdog - found?
2003-12-05 23:49 ` Catching NForce2 lockup with NMI watchdog - found? Prakash K. Cheemplavam
@ 2003-12-05 23:55 ` Prakash K. Cheemplavam
2003-12-06 0:15 ` Craig Bradney
2003-12-06 8:18 ` Catching NForce2 lockup with NMI watchdog - found cheuche+lkml
2 siblings, 0 replies; 21+ messages in thread
From: Prakash K. Cheemplavam @ 2003-12-05 23:55 UTC
Cc: Allen Martin, 'Jesse Allen', linux-kernel

Prakash K. Cheemplavam wrote:
> *maybe* I found the bugger, at least I got APIC more stable (need to
> test whether it is really stable, compiling kernel right now...):

So, new kernel is up. So far so good:

           CPU0
  0:      47118          XT-PIC  timer
  1:         34    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          3    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:        864    IO-APIC-edge  i8042
 14:         10    IO-APIC-edge  ide0
 15:         16    IO-APIC-edge  ide1
 16:          0   IO-APIC-level  Skystar2
 18:       7690   IO-APIC-level  libata
 19:       1910   IO-APIC-level  nvidia
 20:         43   IO-APIC-level  ohci_hcd, eth0
 21:        540   IO-APIC-level  NVidia nForce2
 22:          0   IO-APIC-level  ohci_hcd
NMI:          0
LOC:      46934
ERR:          0
MIS:          0

Prakash

* Re: Catching NForce2 lockup with NMI watchdog - found?
2003-12-05 23:49 ` Catching NForce2 lockup with NMI watchdog - found? Prakash K. Cheemplavam
2003-12-05 23:55 ` Prakash K. Cheemplavam
@ 2003-12-06 0:15 ` Craig Bradney
2003-12-06 0:21 ` Prakash K. Cheemplavam
2003-12-08 3:03 ` Bob
2003-12-06 8:18 ` Catching NForce2 lockup with NMI watchdog - found cheuche+lkml
2 siblings, 2 replies; 21+ messages in thread
From: Craig Bradney @ 2003-12-06 0:15 UTC
To: linux-kernel

On Sat, 2003-12-06 at 00:49, Prakash K. Cheemplavam wrote:
> Hi,
>
> *maybe* I found the bugger, at least I got APIC more stable (need to
> test whether it is really stable, compiling kernel right now...):
>
> It is a problem with CPU disconnect function. I tried various parameters
> in bios and turned cpu disconnect off, and tada, I could do several
> subsequent hdparms and machine is running! As CPU disconnect is an ACPI
> state, if I am not mistaken, I think there is something broken in ACPI
> right now or in APIC and cpu disconnect triggers the bug.
>
> Maybe now my windows environment is stable, as well. It was much more
> stable with cpu disconnect and apic, nevertheless seldomly locked up.
>
> So gals and guys, try disabling cpu disconnect in bios and see whether
> apic now runs stable.
>
> I have an Abit NF7-S Rev2.0 with Bios 2.0.
>
> Prakash

I rebooted and checked in my BIOS. I don't seem to have "CPU Disconnect".
Is there another name? I also downloaded the motherboard manual for your
NF7-S and can't find it there either.

Craig

* Re: Catching NForce2 lockup with NMI watchdog - found?
2003-12-06 0:15 ` Craig Bradney
@ 2003-12-06 0:21 ` Prakash K. Cheemplavam
2003-12-06 0:37 ` Craig Bradney
2003-12-08 3:03 ` Bob
1 sibling, 1 reply; 21+ messages in thread
From: Prakash K. Cheemplavam @ 2003-12-06 0:21 UTC
To: Craig Bradney; +Cc: linux-kernel

Craig Bradney wrote:
> On Sat, 2003-12-06 at 00:49, Prakash K. Cheemplavam wrote:
>
>>Hi,
>>
>>*maybe* I found the bugger, at least I got APIC more stable (need to
>>test whether it is really stable, compiling kernel right now...):
>>
>>It is a problem with CPU disconnect function. I tried various parameters
>>in bios and turned cpu disconnect off, and tada, I could do several
>>subsequent hdparms and machine is running! As CPU disconnect is an ACPI
>>state, if I am not mistaken, I think there is something broken in ACPI
>>right now or in APIC and cpu disconnect triggers the bug.
>>
>>Maybe now my windows environment is stable, as well. It was much more
>>stable with cpu disconnect and apic, nevertheless seldomly locked up.
>>
>>So gals and guys, try disabling cpu disconnect in bios and see whether
>>apic now runs stable.
>>
>>I have an Abit NF7-S Rev2.0 with Bios 2.0.
>>
>>Prakash
>
> I rebooted and checked in my BIOS, I dont seem to have "CPU Disconnect"?
> Is there another name. I also downloaded the motherboard manual for your
> NF7-S and cant find it there either?

The full name should be "CPU Disconnect Function". It is on the page
with "enhanced pci performance", "enable system bios caching", ".. video
bios caching" and all the spread spectrums. I have forgotten the name of
that page in the main menu. It should be the 3rd or 4th in the first column.

Perhaps your BIOS is too old. I remember it only came with 1.8 (or
thereabouts) and later. But usually this setting should be disabled by default.

My machine still hasn't locked, btw. :-)

Prakash

* Re: Catching NForce2 lockup with NMI watchdog - found?
2003-12-06 0:21 ` Prakash K. Cheemplavam
@ 2003-12-06 0:37 ` Craig Bradney
2003-12-08 3:08 ` Bob
0 siblings, 1 reply; 21+ messages in thread
From: Craig Bradney @ 2003-12-06 0:37 UTC
To: linux-kernel

On Sat, 2003-12-06 at 01:21, Prakash K. Cheemplavam wrote:
> Craig Bradney wrote:
> > On Sat, 2003-12-06 at 00:49, Prakash K. Cheemplavam wrote:
> >
> >>Hi,
> >>
> >>*maybe* I found the bugger, at least I got APIC more stable (need to
> >>test whether it is really stable, compiling kernel right now...):
> >>
> >>It is a problem with CPU disconnect function. I tried various parameters
> >>in bios and turned cpu disconnect off, and tada, I could do several
> >>subsequent hdparms and machine is running! As CPU disconnect is an ACPI
> >>state, if I am not mistaken, I think there is something broken in ACPI
> >>right now or in APIC and cpu disconnect triggers the bug.
> >>
> >>Maybe now my windows environment is stable, as well. It was much more
> >>stable with cpu disconnect and apic, nevertheless seldomly locked up.
> >>
> >>So gals and guys, try disabling cpu disconnect in bios and see whether
> >>apic now runs stable.
> >>
> >>I have an Abit NF7-S Rev2.0 with Bios 2.0.
> >>
> >>Prakash
> >
> > I rebooted and checked in my BIOS, I dont seem to have "CPU Disconnect"?
> > Is there another name. I also downloaded the motherboard manual for your
> > NF7-S and cant find it there either?
>
> The full name should be "CPU Disconnect Function". It is on the page
> with "enhanced pci performance", "enable system bios caching", ".. video
> bios caching" and all the spread spectrums. I have forgotten the name of
> that page in the main menu. It should be the 3rd or 4th in the first column.
>
> Perhaps your BIOS is too old. I remember it only came with 1.8 (or
> thereabouts) and later. But usually this setting should be disabled by default.
>
> My machine still hasn't locked, btw. :-)

Sounds great.. maybe you have come across something. Yes, the CPU
Disconnect function arrived in your BIOS in the revision of 2003/03/27:
"6. Adds "CPU Disconnect Function" to adjust C1 disconnects. The Chipset
does not support C2 disconnect; thus, disable C2 function."

For me though.. I'm on an ASUS A7N8X Deluxe v2, BIOS 1007. From what I can
see, CPU Disconnect isn't even in the Uber BIOS 1007 for this ASUS
that has been discussed.

Craig

* Re: Catching NForce2 lockup with NMI watchdog - found?
2003-12-06 0:37 ` Craig Bradney
@ 2003-12-08 3:08 ` Bob
2003-12-08 3:06 ` Bartlomiej Zolnierkiewicz
0 siblings, 1 reply; 21+ messages in thread
From: Bob @ 2003-12-08 3:08 UTC
To: linux-kernel

Craig Bradney wrote:
>On Sat, 2003-12-06 at 01:21, Prakash K. Cheemplavam wrote:
>>Craig Bradney wrote:
>>>On Sat, 2003-12-06 at 00:49, Prakash K. Cheemplavam wrote:
>>>
>>>>Hi,
>>>>
>>>>*maybe* I found the bugger, at least I got APIC more stable (need to
>>>>test whether it is really stable, compiling kernel right now...):
>>>>
>>>>It is a problem with CPU disconnect function. I tried various parameters
>>>>in bios and turned cpu disconnect off, and tada, I could do several
>>>>subsequent hdparms and machine is running! As CPU disconnect is an ACPI
>>>>state, if I am not mistaken, I think there is something broken in ACPI
>>>>right now or in APIC and cpu disconnect triggers the bug.
>>>>
>>>>Maybe now my windows environment is stable, as well. It was much more
>>>>stable with cpu disconnect and apic, nevertheless seldomly locked up.
>>>>
>>>>So gals and guys, try disabling cpu disconnect in bios and see whether
>>>>apic now runs stable.
>>>>
>>>>I have an Abit NF7-S Rev2.0 with Bios 2.0.
>>>>
>>>>Prakash
>>>
>>>I rebooted and checked in my BIOS, I dont seem to have "CPU Disconnect"?
>>>Is there another name. I also downloaded the motherboard manual for your
>>>NF7-S and cant find it there either?
>>
>>The full name should be "CPU Disconnect Function". It is on the page
>>with "enhanced pci performance", "enable system bios caching", ".. video
>>bios caching" and all the spread spectrums. I have forgotten the name of
>>that page in the main menu. It should be the 3rd or 4th in the first column.
>>
>>Perhaps your BIOS is too old. I remember it only came with 1.8 (or
>>thereabouts) and later. But usually this setting should be disabled by default.
>>
>>My machine still hasn't locked, btw. :-)
>
>Sounds great.. maybe you have come across something. Yes, the CPU
>Disconnect function arrived in your BIOS in revision of 2003/03/27
>"6.Adds"CPU Disconnect Function" to adjust C1 disconnects. The Chipset
>does not support C2 disconnect; thus, disable C2 function."
>
>For me though.. Im on an ASUS A7N8X Deluxe v2 BIOS 1007. From what I can
>see the CPU Disconnect isnt even in the Uber BIOS 1007 for this ASUS
>that has been discussed.
>
>Craig

I don't have that in MSI K7N2 MCP2-T near the
agp and fsb spread spectrum items or anywhere
else.

* Re: Catching NForce2 lockup with NMI watchdog - found?
2003-12-08 3:08 ` Bob
@ 2003-12-08 3:06 ` Bartlomiej Zolnierkiewicz
0 siblings, 0 replies; 21+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2003-12-08 3:06 UTC
To: Bob; +Cc: linux-kernel

On Monday 08 of December 2003 04:08, Bob wrote:
> >Sounds great.. maybe you have come across something. Yes, the CPU
> >Disconnect function arrived in your BIOS in revision of 2003/03/27
> >"6.Adds"CPU Disconnect Function" to adjust C1 disconnects. The Chipset
> >does not support C2 disconnect; thus, disable C2 function."
> >
> >For me though.. Im on an ASUS A7N8X Deluxe v2 BIOS 1007. From what I can
> >see the CPU Disconnect isnt even in the Uber BIOS 1007 for this ASUS
> >that has been discussed.
> >
> >Craig
>
> I don't have that in MSI K7N2 MCP2-T near the
> agp and fsb spread spectrum items or anywhere
> else.

Use athcool:
http://members.jcom.home.ne.jp/jacobi/linux/softwares.html#athcool
or apply kernel patch (2.4 and 2.6 versions were posted already).

--bart

* Re: Catching NForce2 lockup with NMI watchdog - found?
2003-12-06 0:15 ` Craig Bradney
2003-12-06 0:21 ` Prakash K. Cheemplavam
@ 2003-12-08 3:03 ` Bob
1 sibling, 0 replies; 21+ messages in thread
From: Bob @ 2003-12-08 3:03 UTC
To: linux-kernel

Craig Bradney wrote:
>On Sat, 2003-12-06 at 00:49, Prakash K. Cheemplavam wrote:
>
>>...try disabling cpu disconnect in bios and see whether
>>apic now runs stable.
>>
>>I have an Abit NF7-S Rev2.0 with Bios 2.0.
>>
>>Prakash
>
>I rebooted and checked in my BIOS, I dont seem to have "CPU Disconnect"?
>Is there another name. I also downloaded the motherboard manual for your
>NF7-S and cant find it there either?
>
>Craig

I don't have that either on MSI K7N2.

-Bob

* Re: Catching NForce2 lockup with NMI watchdog - found
2003-12-05 23:49 ` Catching NForce2 lockup with NMI watchdog - found? Prakash K. Cheemplavam
2003-12-05 23:55 ` Prakash K. Cheemplavam
2003-12-06 0:15 ` Craig Bradney
@ 2003-12-06 8:18 ` cheuche+lkml
2003-12-06 11:22 ` Julien Oster
` (2 more replies)
2 siblings, 3 replies; 21+ messages in thread
From: cheuche+lkml @ 2003-12-06 8:18 UTC
To: linux-kernel

On Sat, Dec 06, 2003 at 12:49:50AM +0100, Prakash K. Cheemplavam wrote:
>
> So gals and guys, try disabling cpu disconnect in bios and see whether
> apic now runs stable.
>

Yes, that fixes it. Well, time will tell, but I cannot make it crash with
hdparm -tT or cat /dev/hda so far. I'm dumping hda to /dev/null right
now.

After failing to make it crash, I used athcool to re-enable CPU
disconnect and, guess what, the same test then crashed the box.
You found the problem, congratulations.

If you experience crashes with apic and your bios does not have such an
option, try athcool at
http://members.jcom.home.ne.jp/jacobi/linux/softwares.html
Its purpose is to *enable* cpu disconnect, but it can also disable it. Your
best bet is to run it to disable cpu disconnect as early as possible at
boot.

On the other hand, it isn't the cause of the IRQ7 rogue interrupts. As I
initially suspected, that now seems totally unrelated. Maybe the ACPI
override handling is buggy, since putting the timer back on IO-APIC-edge
solves it.

Nevertheless this is still a problem: other chipsets for Athlon
processors seem to be able to have cpu disconnect and ioapic enabled
without any crashes. But so far I don't see any thermal differences, so I'm
happy with that.

Mathieu
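
Editorial sketch, not part of the original mail: before and after running athcool it may help to confirm the state of the disconnect setting. The following C fragment is only an illustration. It assumes the register location stated in the patch posted in this thread (bit 4 of the config byte at offset 0x6F), and the helper names `read_config_byte` and `disconnect_state` are hypothetical. It reads from any stream that looks like PCI config space.

```c
#include <stdint.h>
#include <stdio.h>

/* Register location taken from the patch comment in this thread. */
#define NFORCE2_DISCONNECT_REG 0x6FL
#define NFORCE2_DISCONNECT_BIT 0x10u   /* bit 4 */

/*
 * Read one byte at `offset` from an open config-space stream.
 * Returns 0 on success, -1 if the seek fails or the byte is not there.
 */
int read_config_byte(FILE *cfg, long offset, uint8_t *out)
{
    int c;

    if (fseek(cfg, offset, SEEK_SET) != 0)
        return -1;
    c = fgetc(cfg);
    if (c == EOF)
        return -1;
    *out = (uint8_t)c;
    return 0;
}

/*
 * Returns 1 if the disconnect bit is set, 0 if it is clear,
 * and -1 if the register could not be read.
 */
int disconnect_state(FILE *cfg)
{
    uint8_t reg;

    if (read_config_byte(cfg, NFORCE2_DISCONNECT_REG, &reg) != 0)
        return -1;
    return (reg & NFORCE2_DISCONNECT_BIT) ? 1 : 0;
}
```

A caller would open the host bridge's config space and pass the stream in, for example `fopen("/sys/bus/pci/devices/0000:00:00.0/config", "rb")`. That path is an assumption: it presumes sysfs is mounted and that the nForce2 host bridge sits at 00:00.0 as in the lspci listing posted later in the thread. Reading offsets past the standard header typically requires root.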

* Re: Catching NForce2 lockup with NMI watchdog - found
2003-12-06 8:18 ` Catching NForce2 lockup with NMI watchdog - found cheuche+lkml
@ 2003-12-06 11:22 ` Julien Oster
2003-12-08 3:34 ` Bob
2003-12-06 12:24 ` Prakash K. Cheemplavam
2003-12-08 3:25 ` Bob
2 siblings, 1 reply; 21+ messages in thread
From: Julien Oster @ 2003-12-06 11:22 UTC
To: linux-kernel

cheuche+lkml@free.fr writes:

Hello,

>> So gals and guys, try disabling cpu disconnect in bios and see whether
>> apic now runs stable.

> Yes that fix it. Well time will tell but I cannot make it crash with
> hdparm -tT or cat /dev/hda so far. I'm dumping hda to /dev/null right
> now.
> After testing to make it crash, I used athcool to reenable CPU
> disconnect, and guess what, test after that just crashed the box.
> You found the problem, congratulations.

Well, now I'm stunned.

With APIC and ACPI enabled, my machine isn't even able to boot
completely; it'll almost certainly crash before the init scripts are
finished.

Now, I modified the init scripts to do "athcool off" as the very first
thing (I don't have any "CPU disconnect" BIOS setting) and it
not only booted, but I can't even seem to make it crash using my
hdparm/grep/whatever tests...

I don't know if it's "rock solid" yet, but at least the difference is
huge. It really seems like that made the problem go away!

Regards,
Julien

* Re: Catching NForce2 lockup with NMI watchdog - found
2003-12-06 11:22 ` Julien Oster
@ 2003-12-08 3:34 ` Bob
2003-12-08 8:13 ` Bob
0 siblings, 1 reply; 21+ messages in thread
From: Bob @ 2003-12-08 3:34 UTC
To: linux-kernel

Wasn't this formerly fixed by a kernel config option
to not detect cpu idle on amd cpu's? I don't see that
in 2.6.

Julien Oster wrote:
>cheuche+lkml@free.fr writes:
>
>Hello,
>
>>>So gals and guys, try disabling cpu disconnect in bios and see whether
>>>apic now runs stable.
>
>>Yes that fix it. Well time will tell but I cannot make it crash with
>>hdparm -tT or cat /dev/hda so far. I'm dumping hda to /dev/null right
>>now.
>>After testing to make it crash, I used athcool to reenable CPU
>>disconnect, and guess what, test after that just crashed the box.
>>You found the problem, congratulations.
>
>Well, now I'm stunned.
>
>With APIC and ACPI enabled, my machine isn't even able to boot
>completely, it'll most certainly crash before the init scripts are
>finished.
>
>Now, I modified the init scripts to do "athcool off" as the first
>thing at all (I don't have any "CPU disconnect" BIOS setting) and it
>not only booted, but I even can't seem to make it crash using my
>hdparm/grep/whatever tests...
>
>I don't know if it's "rock solid" yet, but at least the difference is
>huge. It really seems like that made the problem go away!
>
>Regards,
>Julien

* Re: Catching NForce2 lockup with NMI watchdog - found
2003-12-08 3:34 ` Bob
@ 2003-12-08 8:13 ` Bob
0 siblings, 0 replies; 21+ messages in thread
From: Bob @ 2003-12-08 8:13 UTC
To: linux-kernel

The athcool patch seemed to apply as far as patch reported,
but there were undef and unused problems on compile, so
I don't have that in right now.

The timer patch is in.
http://www.kernel.org/pub/linux/kernel/people/bart/2.6.0-test11-bart1/broken-out/nforce2-apic.patch

The patch succeeded in giving me the ioapic-edge timer. After that,
lilo append="nmi_watchdog=1" did not work, but =2 did get NMI
ticks, as shown below.

cat /proc/interrupts
           CPU0
  0:     617651    IO-APIC-edge  timer
  1:        868    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          1    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:       8736    IO-APIC-edge  i8042
 14:         22    IO-APIC-edge  ide0
 15:         24    IO-APIC-edge  ide1
 16:      92853   IO-APIC-level  3ware Storage Controller, yenta, yenta
 17:       2793   IO-APIC-level  eth0
 21:          0   IO-APIC-level  NVidia nForce2
NMI:        122
LOC:     617511
ERR:          0
MIS:          0

Does the kernel opt "user HPET timer" relate to io-apic-edge timer?

Does the kernel opt "hangcheck timer" relate to nmi_watchdog?

Does the kernel opt "ACPI, Processor (c2) (c3 states)" relate
to the cmos/bios "processor disconnect" option and athcool patch?

kernel 2.6.0-test11, pre-emptive, apic, lapic, acpi, anticipatory
scheduling not deadline scheduling, cpu and fsb clock 1:1 333mhz,
amd xp3000+ and high-performance settings (CAS2) other than 1:1
fsb/ram which is slow for the ram, 41C - 48C cpu temp, MSI K7N2 mboard.

My system was stable already (apic, lapic, pre-empt) after a bios
update which stopped all irq storms and crashes except "IRQ7 disabled"
and "spurious 8259A interrupts", possibly related to the XT-PIC timer
running when the other was expected, due to apic and lapic and acpi
and kernel opt "use HPET timer" all being on. Turning on onboard
ethernet set off the irq7 and 8259a errs, so I have not been using
onboard eth. USB did not work.

I can now test the timer patch with the onboard ethernet, forcedeth
driver, usb, and the "nvidia" X driver, which was crashing linux so I
had to use "nv". Those items are where my stability frontier is.

-Bob

* Re: Catching NForce2 lockup with NMI watchdog - found
2003-12-06 8:18 ` Catching NForce2 lockup with NMI watchdog - found cheuche+lkml
2003-12-06 11:22 ` Julien Oster
@ 2003-12-06 12:24 ` Prakash K. Cheemplavam
2003-12-06 13:11 ` [PATCH] " Bartlomiej Zolnierkiewicz
2003-12-08 3:25 ` Bob
2 siblings, 1 reply; 21+ messages in thread
From: Prakash K. Cheemplavam @ 2003-12-06 12:24 UTC
To: cheuche+lkml; +Cc: linux-kernel

cheuche+lkml@free.fr wrote:
> On Sat, Dec 06, 2003 at 12:49:50AM +0100, Prakash K. Cheemplavam wrote:
>
>>So gals and guys, try disabling cpu disconnect in bios and see whether
>>apic now runs stable.
>>
>
> Yes that fix it. Well time will tell but I cannot make it crash with
> hdparm -tT or cat /dev/hda so far. I'm dumping hda to /dev/null right
> now.
>
> After testing to make it crash, I used athcool to reenable CPU
> disconnect, and guess what, test after that just crashed the box.
> You found the problem, congratulations.

:-)

Isn't it possible to add athcool's code to the kernel, maybe in the
pm section, or even make it a kernel option? It seems to be a nice
workaround for the time being.

Prakash

* [PATCH] Re: Catching NForce2 lockup with NMI watchdog - found
2003-12-06 12:24 ` Prakash K. Cheemplavam
@ 2003-12-06 13:11 ` Bartlomiej Zolnierkiewicz
2003-12-06 15:10 ` Prakash K. Cheemplavam
2003-12-06 15:35 ` Vladimir Grebinskiy
0 siblings, 2 replies; 21+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2003-12-06 13:11 UTC
To: Prakash K. Cheemplavam; +Cc: cheuche+lkml, linux-kernel

It is possible :-). Here is a completely untested patch.

[PATCH] fix lockups with APIC support on nForce2

Add PCI quirk to disable Halt Disconnect and Stop Grant Disconnect
(based on athcool program by Osamu Kayasono).

 arch/i386/pci/fixup.c |   13 +++++++++++++
 1 files changed, 13 insertions(+)

diff -puN arch/i386/pci/fixup.c~nforce2_disconnect_quirk arch/i386/pci/fixup.c
--- linux-2.6.0-test11/arch/i386/pci/fixup.c~nforce2_disconnect_quirk	2003-12-06 13:36:56.147911576 +0100
+++ linux-2.6.0-test11-root/arch/i386/pci/fixup.c	2003-12-06 14:03:41.655837272 +0100
@@ -187,6 +187,18 @@ static void __devinit pci_fixup_transpar
 	dev->transparent = 1;
 }
 
+/*
+ * Halt Disconnect and Stop Grant Disconnect (bit 4 at offset 0x6F)
+ * must be disabled when APIC is used (or lockups will happen).
+ */
+static void __devinit pci_fixup_nforce2_disconnect(struct pci_dev *d)
+{
+	u8 t;
+
+	pci_read_config_byte(d, 0x6F, &t);
+	pci_write_config_byte(d, 0x6F, (t & 0xef));
+}
+
 struct pci_fixup pcibios_fixups[] = {
 	{ PCI_FIXUP_HEADER,	PCI_VENDOR_ID_INTEL,	PCI_DEVICE_ID_INTEL_82451NX,	pci_fixup_i450nx },
 	{ PCI_FIXUP_HEADER,	PCI_VENDOR_ID_INTEL,	PCI_DEVICE_ID_INTEL_82454GX,	pci_fixup_i450gx },
@@ -205,5 +217,6 @@ struct pci_fixup pcibios_fixups[] = {
 	{ PCI_FIXUP_HEADER,	PCI_VENDOR_ID_VIA,	PCI_DEVICE_ID_VIA_8367_0,	pci_fixup_via_northbridge_bug },
 	{ PCI_FIXUP_HEADER,	PCI_VENDOR_ID_NCR,	PCI_DEVICE_ID_NCR_53C810,	pci_fixup_ncr53c810 },
 	{ PCI_FIXUP_HEADER,	PCI_VENDOR_ID_INTEL,	PCI_ANY_ID,	pci_fixup_transparent_bridge },
+	{ PCI_FIXUP_HEADER,	PCI_VENDOR_ID_NVIDIA,	PCI_DEVICE_ID_NVIDIA_NFORCE2,	pci_fixup_nforce2_disconnect },
 	{ 0 }
 };

_

On Saturday 06 of December 2003 13:24, Prakash K. Cheemplavam wrote:
> cheuche+lkml@free.fr wrote:
> > On Sat, Dec 06, 2003 at 12:49:50AM +0100, Prakash K. Cheemplavam wrote:
> >>So gals and guys, try disabling cpu disconnect in bios and see whether
> >>apic now runs stable.
> >
> > Yes that fix it. Well time will tell but I cannot make it crash with
> > hdparm -tT or cat /dev/hda so far. I'm dumping hda to /dev/null right
> > now.
> >
> > After testing to make it crash, I used athcool to reenable CPU
> > disconnect, and guess what, test after that just crashed the box.
> > You found the problem, congratulations.
>
> :-)
>
> Isn't it possible to ad athcool's code into the kernel, maybe into the
> pm section or even make it an kernel option. It seems to be a nice
> workaround for the time-being.
>
> Prakash
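
Editorial sketch, not part of the original mail: the quirk in the patch reads the config byte at offset 0x6F and writes it back ANDed with 0xef, which clears bit 4 and leaves the other seven bits untouched. The user-space C fragment below only illustrates that masking step. The names `clear_disconnect` and `disconnect_enabled` are hypothetical and do not appear in the kernel or in athcool.

```c
#include <stdint.h>

/* Bit 4 of the config byte at offset 0x6F, per the patch comment. */
#define NFORCE2_DISCONNECT_BIT 0x10u

/*
 * Return `reg` with the disconnect bit cleared and every other bit
 * preserved. This is the same operation as (t & 0xef) in the patch.
 */
uint8_t clear_disconnect(uint8_t reg)
{
    return (uint8_t)(reg & (uint8_t)~NFORCE2_DISCONNECT_BIT);
}

/* Return 1 if the disconnect bit is set in `reg`, 0 otherwise. */
int disconnect_enabled(uint8_t reg)
{
    return (reg & NFORCE2_DISCONNECT_BIT) != 0;
}
```

For example, a register value of 0x1F becomes 0x0F after `clear_disconnect`, and a value that already has bit 4 clear is returned unchanged, so applying the mask repeatedly is harmless.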

* Re: [PATCH] Re: Catching NForce2 lockup with NMI watchdog - found
2003-12-06 13:11 ` [PATCH] " Bartlomiej Zolnierkiewicz
@ 2003-12-06 15:10 ` Prakash K. Cheemplavam
2003-12-06 15:37 ` Craig Bradney
2003-12-06 15:35 ` Vladimir Grebinskiy
1 sibling, 1 reply; 21+ messages in thread
From: Prakash K. Cheemplavam @ 2003-12-06 15:10 UTC
To: Bartlomiej Zolnierkiewicz
Cc: Prakash K. Cheemplavam, cheuche+lkml, linux-kernel, Allen Martin

Bartlomiej Zolnierkiewicz wrote:
> It is possible :-). Here is a completly untested patch.
>
> [PATCH] fix lockups with APIC support on nForce2

I tried it (applied the patch and *enabled* CPU disconnect in bios) and it
works! Good work. Nevertheless, it is no real fix, just a work-around.
Perhaps someone from nvidia should comment on this... or some APIC guru
needs to take a look into the code.

Prakash

* Re: [PATCH] Re: Catching NForce2 lockup with NMI watchdog - found
2003-12-06 15:10 ` Prakash K. Cheemplavam
@ 2003-12-06 15:37 ` Craig Bradney
0 siblings, 0 replies; 21+ messages in thread
From: Craig Bradney @ 2003-12-06 15:37 UTC
To: linux-kernel

On Sat, 2003-12-06 at 16:10, Prakash K. Cheemplavam wrote:
> Bartlomiej Zolnierkiewicz wrote:
> > It is possible :-). Here is a completly untested patch.
> >
> > [PATCH] fix lockups with APIC support on nForce2
>
> I tried it (applied pacth and *enabled* CPU disconnect in bios) and it
> works! Good work. Nevertheless, it is no real fix, just a work-around.
> Perhaps somone from nvidia should comment on this...or some APIC guru
> needs to take a look into the code.

So.. if you find long term stability with this.. then maybe it relates
to disconnect, but perhaps it is just a method of increasing the time
between crashes and the patch is a correct workaround?

But why isn't it a "real" fix if the timer IRQ is not set up correctly
without it? I would also like an nvidia or apic opinion on this one.

Craig

* Re: [PATCH] Re: Catching NForce2 lockup with NMI watchdog - found
2003-12-06 13:11 ` [PATCH] " Bartlomiej Zolnierkiewicz
2003-12-06 15:10 ` Prakash K. Cheemplavam
@ 2003-12-06 15:35 ` Vladimir Grebinskiy
1 sibling, 0 replies; 21+ messages in thread
From: Vladimir Grebinskiy @ 2003-12-06 15:35 UTC
To: linux-kernel

I hope you nailed the problem! I applied the "timer APIC-IO-edge" change and
"athcool off" patch and the box seemed to be working with APIC/LAPIC (for the
first time) !!! The board is the Leadtek K7nCR18G-Pro, BIOS 9/30/2003, kernel
2.6.0-test11 and Debian/Sid (all debug turned on, preempt off). The box used
to lock up in seconds when running "md5sum bigfile" unless booted with
"noapic nolapic". Now it survived parallel read from two drives +
"ping -f router" + glxgears (with nvidia video driver). I'll keep testing it
for a while.

Vladimir

lspci:
00:00.0 Host bridge: nVidia Corporation nForce2 AGP (different version?) (rev a2)
00:00.1 RAM memory: nVidia Corporation nForce2 Memory Controller 1 (rev a2)
00:00.2 RAM memory: nVidia Corporation nForce2 Memory Controller 4 (rev a2)
00:00.3 RAM memory: nVidia Corporation nForce2 Memory Controller 3 (rev a2)
00:00.4 RAM memory: nVidia Corporation nForce2 Memory Controller 2 (rev a2)
00:00.5 RAM memory: nVidia Corporation nForce2 Memory Controller 5 (rev a2)
00:01.0 ISA bridge: nVidia Corporation nForce2 ISA Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation nForce2 SMBus (MCP) (rev a2)
00:02.0 USB Controller: nVidia Corporation nForce2 USB Controller (rev a3)
00:02.1 USB Controller: nVidia Corporation nForce2 USB Controller (rev a3)
00:05.0 Multimedia audio controller: nVidia Corporation nForce MultiMedia audio [Via VT82C686B] (rev a2)
00:06.0 Multimedia audio controller: nVidia Corporation nForce2 AC97 Audio Controler (MCP) (rev a1)
00:08.0 PCI bridge: nVidia Corporation nForce2 External PCI Bridge (rev a3)
00:09.0 IDE interface: nVidia Corporation nForce2 IDE (rev a2)
00:0d.0 FireWire (IEEE 1394): nVidia Corporation nForce2 FireWire (IEEE 1394) Controller (rev a3)
00:1e.0 PCI bridge: nVidia Corporation nForce2 AGP (rev a2)
01:09.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 6c)
03:00.0 VGA compatible controller: nVidia Corporation NV18 [GeForce4 MX - nForce GPU] (rev a3)

Interrupts:
           CPU0
  0:    2865299    IO-APIC-edge  timer
  1:       3017    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          4    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:      13046    IO-APIC-edge  i8042
 14:     223268    IO-APIC-edge  ide0
 15:         49    IO-APIC-edge  ide1
 16:     616405   IO-APIC-level  nvidia
 17:    2333041   IO-APIC-level  eth0
 20:      22450   IO-APIC-level  ohci_hcd
 21:       2621   IO-APIC-level  NVidia nForce2
 22:          0   IO-APIC-level  ohci_hcd
NMI:          0
LOC:    2864269
ERR:          0
MIS:          0

* Re: Catching NForce2 lockup with NMI watchdog - found
2003-12-06 8:18 ` Catching NForce2 lockup with NMI watchdog - found cheuche+lkml
2003-12-06 11:22 ` Julien Oster
2003-12-06 12:24 ` Prakash K. Cheemplavam
@ 2003-12-08 3:25 ` Bob
2003-12-08 3:18 ` Bartlomiej Zolnierkiewicz
2 siblings, 1 reply; 21+ messages in thread
From: Bob @ 2003-12-08 3:25 UTC
To: linux-kernel

cheuche+lkml@free.fr wrote:
>...................If you experience crashes with apic and your bios
>does not have such option, try athcool at
>http://members.jcom.home.ne.jp/jacobi/linux/softwares.html
>Its purpose is to *enable* cpu disconnect but can also disable it. Your
>best bet is to run it to disable cpu disconnect the soonest possible at
>boot.
>
>On the other hand, it isn't the cause of IRQ7 rogue interrupts. As I
>initially suspected, it seems now totally unrelated. The ACPI override
>handling may be buggy ? Since putting back the timer on IO-APIC-edge
>solves it.
>
>Nevertheless this is still a problem, other chipsets for Athlon
>processors seems to be able to have cpu disconnect and ioapic enabled
>without any crashes. But so far I don't see any thermal differences, I'm
>happy with that.
>
>Mathieu

I presently have in /proc/interrupts
  0:  244393560          XT-PIC  timer
but when I tried the nvnet driver and onboard ethernet
I think I saw both "IRQ7 disabled" and some 8259A
spurious interrupt err. Presently there is no
grep timer or TIMER or 8259A in logs.

Does 8259A have to do with the IO-APIC timer? It would
make sense that nvnet would see apic and lapic
on in bios and linux and look for the io-apic timer
as well as the apic table, then fail confused.

Is there a link to that patch? I keep deleting
this list, it's huge, so I lost a patch in a message.

-Bob

* Re: Catching NForce2 lockup with NMI watchdog - found
2003-12-08 3:25 ` Bob
@ 2003-12-08 3:18 ` Bartlomiej Zolnierkiewicz
0 siblings, 0 replies; 21+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2003-12-08 3:18 UTC
To: Bob; +Cc: linux-kernel

> Is there a link to that patch? I keep deleting
> this list it's huge so I lost a patch in a message.
>
> -Bob

http://www.kernel.org/pub/linux/kernel/people/bart/2.6.0-test11-bart1/broken-out/nforce2-apic.patch

* Re: Catching NForce2 lockup with NMI watchdog - found
@ 2003-12-08 10:43 Mikael Pettersson
0 siblings, 0 replies; 21+ messages in thread
From: Mikael Pettersson @ 2003-12-08 10:43 UTC
To: linux-kernel, recbo

On Mon, 08 Dec 2003 03:13:55 -0500, Bob <recbo@nishanet.com> wrote:
>Does the kernel opt "user HPET timer" relate to io-apic-edge timer?

No. HPET is a newer piece of timer HW. IO-APIC-edge on the timer
relates to how it's connected to the CPU, not where it comes from.

>Does the kernel opt "hangcheck timer" relate to nmi_watchdog?

No.
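
Editorial sketch, not part of the original mail: the point that "IO-APIC-edge" describes how the timer interrupt is routed can be checked against the /proc/interrupts listings posted in this thread, where IRQ 0 shows either XT-PIC or IO-APIC-edge. The C fragment below is a rough illustration that pulls the controller name out of one such line. It assumes the single-CPU, three-column layout shown in this thread (IRQ number, count, controller); the function name `irq_controller` is hypothetical, and multi-CPU or newer kernels lay the file out differently.

```c
#include <stdio.h>

/*
 * Parse one /proc/interrupts line of the form
 *     "  0:      47118          XT-PIC  timer"
 * If the line belongs to IRQ `irq`, copy the controller name
 * (e.g. "XT-PIC" or "IO-APIC-edge") into `ctrl` and return 1.
 * Return 0 for any other line, including the CPU0 header and the
 * NMI/LOC/ERR/MIS summary lines, which do not start with a number.
 */
int irq_controller(const char *line, int irq, char *ctrl, size_t ctrl_len)
{
    int n;
    unsigned long count;
    char name[32];

    if (sscanf(line, " %d: %lu %31s", &n, &count, name) != 3 || n != irq)
        return 0;
    snprintf(ctrl, ctrl_len, "%s", name);
    return 1;
}
```

A caller would read /proc/interrupts line by line with `fgets` and stop at the first line for which `irq_controller(line, 0, buf, sizeof buf)` returns 1; `buf` then names the controller that delivers the timer interrupt.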