linux-kernel.vger.kernel.org archive mirror
From: Craig Bradney <cbradney@zip.com.au>
To: b@netzentry.com
Cc: ross.alexander@uk.neceur.com, s0348365@sms.ed.ac.uk,
	linux-kernel@vger.kernel.org, pomac@vapor.com,
	forming@charter.net
Subject: Re: NForce2 pseudoscience stability testing (2.6.0-test11)
Date: Wed, 03 Dec 2003 01:28:58 +0100
Message-ID: <1070411338.2452.66.camel@athlonxp.bradney.info>
In-Reply-To: <3FCD21E1.5080300@netzentry.com>

Jump down for my replies..


On Wed, 2003-12-03 at 00:36, b@netzentry.com wrote:
> Regardless of BIOS version, doesn’t the kernel get to
> fix values if things are "wrong?" I've seen this message
> on several boards where various tables and ACPI and APIC
> things are "fixed up." The only reason I want to find out so badly
> is that the board survives unusual amounts of abuse on Windows 2000. (I
> haven't been able to hang Windows as Prakash had.)
> 
> Right now I can't even afford to test 2.6.0.test11 in terms
> of time but very similar problems exist in 2.4, suggesting
> something fundamental?
> 
> About the IDE, it seems to be the easiest way to promote the
> problem but time seems to be the biggest factor. Some have
> suggested wrt this NFORCE2 problem that idle time makes it
> worse, but I've seen the hang under both conditions.

Not idle time.. my PC idles (with KDE/xchat/evolution running, if that
is idling :)) while I am out or sleeping.. that's at least 4 hours
non-stop per day...

> I have a very minimal set of things running on this board, I
> shut off the USB1.x and 2.0 controller, jumpered the Si SATA
> off and turned off Audio and the NVIDIA LAN. (Prakash suggested
> the Si SATA was the evil but I have it jumpered off.)

Audio on and used, USB 2.0 and 1.x on but nothing plugged in, NVIDIA LAN off,
3Com LAN on and connected at 100Mbit.

> For me the formula to reproduce this problem is this simple:
> - Do anything (including passing the noapic nolapix noapixio etc)
> in a UP-kernel with APIC compiled in, get hang.
> - To avoid the problem, recompile the UP-kernel with APIC turned
> off.
> 
> What probably isn't the problem:
> - Motherboard flavor (This problem has cropped up on all nforce2 boards)

A7N8X Deluxe v2 with a standard-clocked Athlon XP 2600+

> - BIOS (This problem can change nature with a BIOS rev, but I haven't
> seen it go away and I tried several).

BIOS 1007.

> - SATA - problem appears even if SATA is shut off via jumper, as Craig also reports.

SATA is shut off via jumper. No SATA in kernel.

> - Memory - I swapped memory in a test to see. I also tried the dual 
> channel and the single channel mode. No change.

Dual channel, 2x256MB.

> - SMP / UP / passed flags Julien said: "I run strictly non-SMP kernels 
> and they always crash if APIC (or local APIC?) is enabled." - I see the 
> same things.

SMP off, preempt off. LAPIC on, APIC on, ACPI on.
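
For anyone comparing setups, a minimal sketch of pulling those five settings out of a kernel .config so reports can be lined up side by side. The symbol names are the i386 2.6-era ones and are an assumption about the reader's tree; SAMPLE is an illustrative fragment that mirrors the settings listed above, not the actual .config from this machine.

```python
import re

# Options matching the settings listed above. Symbol names are assumed to be
# the i386 2.6-era ones; adjust for the tree actually in use.
OPTIONS = [
    "CONFIG_SMP",
    "CONFIG_PREEMPT",
    "CONFIG_X86_LOCAL_APIC",
    "CONFIG_X86_IO_APIC",
    "CONFIG_ACPI",
]


def read_options(config_text, names=OPTIONS):
    """Return {name: value}; value is 'n' when the option is unset or absent."""
    found = {}
    for line in config_text.splitlines():
        line = line.strip()
        enabled = re.match(r"^(CONFIG_\w+)=(.*)$", line)
        if enabled:
            found[enabled.group(1)] = enabled.group(2)
            continue
        disabled = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if disabled:
            found[disabled.group(1)] = "n"
    return {name: found.get(name, "n") for name in names}


# Hypothetical fragment for illustration only.
SAMPLE = """\
# CONFIG_SMP is not set
# CONFIG_PREEMPT is not set
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_ACPI=y
"""

for name, value in read_options(SAMPLE).items():
    print(f"{name:24s} {value}")
```

To run it against a real build, pass the contents of the .config file in the kernel source directory to read_options() instead of SAMPLE.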

> (Julien also suggested a quick way to see if the system is stable: type
> hdparm -t /dev/hd<someharddrive> several times).

As my uptime below suggests.. that didn't hurt a thing. I haven't rebooted
since that test.
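
A rough sketch of automating that "several times" part, assuming hdparm is installed and the script is run as root. The function name, the default run count and the runner parameter are made up for illustration; on an affected board the loop would simply never finish.

```python
import subprocess


def repeat_read_test(device, runs=5, runner=subprocess.run):
    """Run `hdparm -t <device>` `runs` times and return the exit codes.

    `runner` defaults to subprocess.run; it is a parameter only so the loop
    can be exercised without touching a real drive.
    """
    codes = []
    for attempt in range(1, runs + 1):
        print(f"run {attempt}/{runs}: hdparm -t {device}")
        result = runner(["hdparm", "-t", device], check=False)
        codes.append(result.returncode)
    return codes
```

Usage would be something like repeat_read_test("/dev/hda", runs=10), where /dev/hda stands in for whichever drive hangs off the nForce2 IDE controller.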

> - ASUS. I have a rev2.0 PCB, and several rev 1.x PCB, and they all 
> suffer. Others also report boards from different manufacturers hanging 
> with the NFORCE2. (Prakash: Abit NF7-S V2.0), Lenar (Epox 8RDA+ , 8RDA3+  )
> - Kernel 2.4 or 2.6 specifically. This problem occurs both in later 2.6 
> builds and 2.4.23 (and below)
> - IDE (NFORCE/AMD IDE driver in kernel) It seems the problem is easily 
> exacerbated by IDE, but I think that this is not the root cause. In 
> NO-APIC mode the IDE behaves reliably.
> 
> 
> Craig said he ran it for 18 hours with abuse, including Julien's hdparm
> test.


Uptime is 4 days 1:55 and counting...


Things I can remember running in that time:
apache, mysql, proftpd, kde, evolution, xchat2, mozilla, konsole,
scribus, any updates that came through gentoo's portage system
(including gcc 3.2.3 from 3.2.2), frequent recompilation of scribus
with gcc (cvs updates and my own coding), and gdb for about 3 hours
hunting one scribus segfault, testing over and over again with many
crashes (fixed now btw for those scribus users out there :))


> -> Craig, which kernel are you using? Distro (RedHat Taroon's kernel has
> LAPIC turned off)? PCB rev of motherboard? BIOS revision? What do
> lspci and cat /proc/interrupts look like? dmesg?

I'm using Gentoo's gentoo-dev-sources 2.6.0_beta11 which is 2.6 test11
plus the patches to be found at:
http://dev.gentoo.org/~brad_mssw/kernel_patches/2.6.0/genpatches-0.6/

They have released a 2.6_beta11-r1 that I haven't upgraded to, which uses
http://dev.gentoo.org/~brad_mssw/kernel_patches/2.6.0/genpatches-0.7/

and I notice there is a 0.8 directory.


So.. sorry if my comments have misled.. it's not exactly standard 2.6 now
that I do check it. There is one nvidia nforce network patch.

As above, PCB is 2.0, BIOS is standard 1007 from ASUS.

lspci and /proc/interrupts follow. dmesg has been wiped out by some
packet command errors with drive seek errors on hdc (dvdrw) from when I
was playing music with that drive.

-----
lspci:
00:00.0 Host bridge: nVidia Corporation nForce2 AGP (different version?) (rev c1)
00:00.1 RAM memory: nVidia Corporation nForce2 Memory Controller 1 (rev c1)
00:00.2 RAM memory: nVidia Corporation nForce2 Memory Controller 4 (rev c1)
00:00.3 RAM memory: nVidia Corporation nForce2 Memory Controller 3 (rev c1)
00:00.4 RAM memory: nVidia Corporation nForce2 Memory Controller 2 (rev c1)
00:00.5 RAM memory: nVidia Corporation nForce2 Memory Controller 5 (rev c1)
00:01.0 ISA bridge: nVidia Corporation nForce2 ISA Bridge (rev a4)
00:01.1 SMBus: nVidia Corporation nForce2 SMBus (MCP) (rev a2)
00:02.0 USB Controller: nVidia Corporation nForce2 USB Controller (rev a4)
00:02.1 USB Controller: nVidia Corporation nForce2 USB Controller (rev a4)
00:02.2 USB Controller: nVidia Corporation nForce2 USB Controller (rev a4)
00:05.0 Multimedia audio controller: nVidia Corporation nForce MultiMedia audio [Via VT82C686B] (rev a2)
00:06.0 Multimedia audio controller: nVidia Corporation nForce2 AC97 Audio Controler (MCP) (rev a1)
00:08.0 PCI bridge: nVidia Corporation nForce2 External PCI Bridge (rev a3)
00:09.0 IDE interface: nVidia Corporation nForce2 IDE (rev a2)
00:0c.0 PCI bridge: nVidia Corporation nForce2 PCI Bridge (rev a3)
00:0d.0 FireWire (IEEE 1394): nVidia Corporation nForce2 FireWire (IEEE 1394) Controller (rev a3)
00:1e.0 PCI bridge: nVidia Corporation nForce2 AGP (rev c1)
02:01.0 Ethernet controller: 3Com Corporation 3C920B-EMB Integrated Fast Ethernet Controller (rev 40)
03:00.0 VGA compatible controller: ATI Technologies Inc Radeon R250 If [Radeon 9000] (rev 01)
03:00.1 Display controller: ATI Technologies Inc Radeon R250 [Radeon 9000] (Secondary) (rev 01)
-----
cat /proc/interrupts
           CPU0
  0:  350520478          XT-PIC  timer
  1:     346366    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          2    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:    3568875    IO-APIC-edge  i8042
 14:    2286009    IO-APIC-edge  ide0
 15:     153023    IO-APIC-edge  ide1
 19:   26315864   IO-APIC-level  radeon@PCI:3:0:0
 21:     855007   IO-APIC-level  ehci_hcd, NVidia nForce2, eth0
 22:          3   IO-APIC-level  ohci1394
NMI:          0
LOC:  350520402
ERR:          0
MIS:          0
-----
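
Since several people in the thread are posting /proc/interrupts, here is a small sketch for summarising one: which IRQs sit on which interrupt controller, and which lines are shared. It assumes the single-CPU layout shown above (one count column); the function name is made up, and SAMPLE is just the listing above pasted in.

```python
from collections import defaultdict

SAMPLE = """\
           CPU0
  0:  350520478          XT-PIC  timer
  1:     346366    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          2    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:    3568875    IO-APIC-edge  i8042
 14:    2286009    IO-APIC-edge  ide0
 15:     153023    IO-APIC-edge  ide1
 19:   26315864   IO-APIC-level  radeon@PCI:3:0:0
 21:     855007   IO-APIC-level  ehci_hcd, NVidia nForce2, eth0
 22:          3   IO-APIC-level  ohci1394
NMI:          0
LOC:  350520402
ERR:          0
MIS:          0
"""


def parse_interrupts(text):
    """Parse single-CPU /proc/interrupts text.

    Returns a list of (irq, count, controller, devices) tuples. Rows whose
    label is not a number (NMI, LOC, ERR, MIS, the CPU header) are skipped.
    """
    rows = []
    for line in text.splitlines():
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue
        fields = rest.split(None, 2)  # count, controller, device list
        devices = []
        if len(fields) > 2:
            devices = [name.strip() for name in fields[2].split(",")]
        rows.append((int(label), int(fields[0]), fields[1], devices))
    return rows


rows = parse_interrupts(SAMPLE)

# Group IRQ numbers by the controller type reported for them.
by_controller = defaultdict(list)
for irq, _count, controller, _devices in rows:
    by_controller[controller].append(irq)

# IRQ lines with more than one device attached.
shared = {irq: devices for irq, _c, _ctl, devices in rows if len(devices) > 1}

for controller, irqs in sorted(by_controller.items()):
    print(f"{controller:14s} {irqs}")
print("shared:", shared)
```

On the listing above this groups IRQ 0 (timer) and IRQ 2 (cascade) under XT-PIC with everything else under IO-APIC, and flags IRQ 21 as shared between ehci_hcd, the nForce2 audio and eth0.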
> -> Josh, Craig & anyone else who gets this working on 2.6.test11 or some 
> other fork of 2.6 can they please try 2.4.23-release as well and see if 
> the machine hangs as well?

I was running gentoo's 2.4.23 pre 8 before this.. it's still on the
machine. I get no hangs with it.

This machine has never run Win* (it's only 3 weeks old). Wouldn't go near
an Uber BIOS on a still-warranted motherboard.

Hope some of it helps.

Craig


> The Linux APIC code generically works on most other hardware. Something 
> specific to the NFORCE2 chips and its interaction with Linux's APIC code 
> causes the hard hangs. The Windows 2000's APIC code was made before the 
> NFORCE2 existed, and it seems to run fine there.
> 
> - About that Uber BIOS for the Asus Deluxe board: anyone running
> this: a) do you really want to run a hacked bios when other OS run fine 
> on the unhacked BIOS b) do you believe that any of the un-hidden 
> settings the uber bios or settings you may have changed helps solve this 
> problem?
> 
> These bugs on the LK Bugzilla seem related:
> http://bugme.osdl.org/show_bug.cgi?id=1203
> 
> Loosely related:
> http://bugme.osdl.org/show_bug.cgi?id=1530
> http://bugme.osdl.org/show_bug.cgi?id=1440
> http://bugme.osdl.org/show_bug.cgi?id=1269
> 
> 
> Does anyone know which developers would be interested in looking at 
> this? I think it would be better if a specific
> patch fixed this problem than try a kernel from bitkeeper
> on a daily basis and wait for the problem to go away without
> ever knowing what caused it.
> 
> 
> 
> 
> 
>  > To me the strangest thing is that when I first got this
>  > board a month or so ago it would hang with APIC or LAPIC
>  > enabled. Now it works fine without disabling APIC. All I
>  > did was update the BIOS and use it for a while with APIC
>  > disabled. 2.6.0-test9-mm through 2.6.0-test11 all work just
>  > fine. Still at the same time some people are reporting that
>  > it works, some are reporting that it doesn't. I probably
>  > wouldn't think too much of this except I was one of the ones
>  > that said APIC causes crashes with IDE load, but now it
>  > doesn't?
>  >
> 
> 
> 
>  > On approximately Tue, Dec 02, 2003 at 10:13:46AM +0000,
>  > ross.alexander wrote: Alistair,
>  >
>  > I upgraded the BIOS about a week ago to 1007. I personally
>  > found it to be less stable than 1006. I don't believe it is
>  > a problem with my hardware combination since it has been
>  > stable for long periods of time. I was running the SMP
>  > kernel simply because I (wrongly) presumed a) you needed it
>  > to get the IO-APIC working, and b) it didn't do any harm.
>  >
>  > It is clear that the UP kernel is considerably more stable
>  > than the SMP kernel. This is a very useful fact since it
>  > suggests that it is not a problem with the IDE device
>  > driver per se. The whole purpose of my testing is to try to
>  > determine which options increased the stability and hence
>  > highlight where the problem could be.
>  >
>  > One of the reasons I don't like ACPI is the huge amount of
>  > additional complexity it adds and the amount of stuff it
>  > could screw up. Now I have not heard that any of the VIA
>  > KTxxx based motherboards have any problems. If this is true
>  > then the problem does not lie with the LAPIC, since that is
>  > in the processor, not the MB. The fact that it seems to
>  > only occur with the NForce2 chipset means it could well be
>  > some interrupt coming into the LAPIC from Interrupt Bus.
>  > However I certainly don't claim to be an expert on this so
>  > I could well be talking complete crap.
>  >
>  > Conclusion: More testing required.
>  >
>  > Cheers,
>  >
>  > Ross
>  >
>  > Alistair John Strachan <s0348365 28/11/2003
>  > 04:46 p.m.
>  >
>  > To: ross.alexander
>  > <brendan cc: linux-kernel
>  > Subject: Re: NForce2 pseudoscience stability testing
>  > (2.6.0-test11)
>  >
>  > On Friday 28 November 2003 15:13,
>  > ross.alexander
>  >
>  > The conclusion to this is the problem is in Local APIC with
>  > SMP. I'm not saying this is actually true only that is what
>  > the data suggests. If anybody wants me to try some other
>  > stuff feel free to suggest ideas.
>  >
>  > Cheers,
>  >
>  > Ross
>  >
>  > It's evidently a configuration problem, be it BIOS,
>  > mainboard revision, memory quality, etc. because I and many
>  > others like me are able to run Linux 2.4/2.6 with all the
>  > options you tested and still achieve absolute stability, on
>  > the nForce 2 platform.
>  >
>  > My system is an EPOX 8RDA+, with an Athlon 2500+ (Barton)
>  > overclocked to 2.2GHz, and 2x256MB TwinMOS PC3200 DIMMs.
>  > FSB is at 400MHz, and the RAM timings are 4,2,2,2. One
>  > might expect such a configuration to be unstable,
>  >
>  > but it is not.
>  >
>  > I'm currently running 2.6.0-test10-mm1 with full ACPI (+
>  > routing), APIC and local APIC, no preempt, UP, and
>  > everything has been rock-solid, despite the machine being
>  > under constant 100% CPU load and fairly active IO load.
>  >
>  > Also, many others have found that just disabling local apic
>  > (and the MPS setting in the BIOS) as well as ACPI solves
>  > their problem, so I'm skeptical that SMP really causes
>  > *nForce 2 specific* instability.
>  >
>  > -- Cheers, Alistair.
>  >
>  > personal: alistair()devzero!co!uk university:
>  > s0348365()sms!ed!ac!uk student: CS/AI Undergraduate
>  > contact: 7/10 Darroch Court, University of Edinburgh.
>  >
> 





