linux-kernel.vger.kernel.org archive mirror
* RE: Crash on boot (2.4.5)
@ 2001-06-25  6:17 Andy Ward
  2001-06-25  6:32 ` VIA Southbridge bug (Was: Crash on boot (2.4.5)) Steven Walter
  0 siblings, 1 reply; 16+ messages in thread
From: Andy Ward @ 2001-06-25  6:17 UTC (permalink / raw)
  To: Steven Walter; +Cc: linux-kernel

Well, I have tried your suggestion, and it works beautifully...  The
only change I made was to the cpu type (to 686), and everything *just*
works now...  Thanks, all!!!

-- andyw

-----Original Message-----
From: Steven Walter [mailto:srwalter@yahoo.com]
Sent: Sunday, June 24, 2001 3:42 PM
To: Andy Ward
Cc: linux-kernel@vger.kernel.org
Subject: Re: Crash on boot (2.4.5)


On Sun, Jun 24, 2001 at 01:51:09PM -0500, Daniel Fraley wrote:
> Hi, everyone..  I'm borrowing my roommate's email, so please send replies to
> andyw@edafio.com.  Thanks!
> 
> Here's my problem...  when I boot anything 2.4, I get several oopsen in a
> row, all of which are either (most commonly) kernel paging request could not
> be handled, or (much less common) unable to handle kernel Null pointer
> dereference.  I will send any info on request, but here's my hardware and
> kernel config:
> 
> iWill KKR-266R (Via 8363 Northbridge, 686B south)
> AMD tbird 1GHz
> 256MB cas2 pc133 sdram
> ATI Radeon DDR 64MB VIVO
> Kingston KNE120TX (Realtek 8139 chip)
> SBLive! 5.1
> IBM GXP75 30GB (on the via ide controller)
> Pioneer 16x dvd
> ls120
> 
> This happens regardless if I turn on swap or not.  When swap is on, it is a
> 128MB partition (and yes, I'm aware of the recommendation of 2x RAM, but I
> believe I read somewhere that someone was working on that, and I didn't want
> to waste the extra 384MB on swap).
> 
> Is there anything I can do to fix this?
> 
> -- andyw
> 
> p.s., booting with devfs=nomount is better, but still causes oopsen (I get
> to a login prompt, but if I do much more than mount a disk and copy to it, the
> system freaks)

From the look of things, you're being bitten by the VIA southbridge
problem.  As I've gathered, it's some sort of interaction with that chip
and the 3DNow! fast copy routines the kernel uses.

If you compile the kernel for a 686, does the problem go away?  What
about 586 or lower?  If so, I believe there are some people working on
finding common aspects of the hardware that experience this problem,
though I don't remember who.  You should get in contact with them, or
they might get into contact with you.

Good luck on working this out.
-- 
-Steven
In a time of universal deceit, telling the truth is a revolutionary act.
			-- George Orwell

^ permalink raw reply	[flat|nested] 16+ messages in thread

* VIA Southbridge bug (Was: Crash on boot (2.4.5))
  2001-06-25  6:17 Crash on boot (2.4.5) Andy Ward
@ 2001-06-25  6:32 ` Steven Walter
  2001-06-25  7:06   ` Alan Cox
  0 siblings, 1 reply; 16+ messages in thread
From: Steven Walter @ 2001-06-25  6:32 UTC (permalink / raw)
  To: Andy Ward; +Cc: linux-kernel

Great, glad to hear it.  Who (if anyone) is still attempting to unravel
the puzzle of the Via southbridge bug?  You, Andy, should try and get in
touch with them and help debug this thing, if you're up to it.

On Mon, Jun 25, 2001 at 01:17:57AM -0500, Andy Ward wrote:
> Well, I have tried your suggestion, and it works beautifully...  The
> only change I made was to the cpu type (to 686), and everything *just*
> works now...  Thanks, all!!!
> 
> > From the look of things, you're being bitten by the VIA southbridge
> > problem.  As I've gathered, it's some sort of interaction with that chip
> > and the 3DNow! fast copy routines the kernel uses.
> > 
> > If you compile the kernel for a 686, does the problem go away?  What
> > about 586 or lower?  If so, I believe there are some people working on
> > finding common aspects of the hardware that experience this problem,
> > though I don't remember who.  You should get in contact with them, or
> > they might get into contact with you.
> > 
> > Good luck on working this out.
> > -- 
> > -Steven
> > In a time of universal deceit, telling the truth is a revolutionary act.
> > 			-- George Orwell
-- 
-Steven
In a time of universal deceit, telling the truth is a revolutionary act.
			-- George Orwell

* Re: VIA Southbridge bug (Was: Crash on boot (2.4.5))
  2001-06-25  6:32 ` VIA Southbridge bug (Was: Crash on boot (2.4.5)) Steven Walter
@ 2001-06-25  7:06   ` Alan Cox
  2001-06-30 13:58     ` Pavel Machek
  0 siblings, 1 reply; 16+ messages in thread
From: Alan Cox @ 2001-06-25  7:06 UTC (permalink / raw)
  To: Steven Walter; +Cc: Andy Ward, linux-kernel

> Great, glad to hear it.  Who (if anyone) is still attempting to unravel
> the puzzle of the Via southbridge bug?  You, Andy, should try and get in
> touch with them and help debug this thing, if you're up to it.

The IWILL problem seems unrelated. It's the board that, more than any other,
people report as failing totally when streaming memory copies with movntq
instructions.

The Athlon optimised kernel places pretty much the absolute maximum load 
possible on the memory bus. Several people have reported that machines that
are otherwise stable on the BIOS fast options require the proper conservative
settings to be stable with the Athlon optimisations.

Alan


* Re: VIA Southbridge bug (Was: Crash on boot (2.4.5))
  2001-06-25  7:06   ` Alan Cox
@ 2001-06-30 13:58     ` Pavel Machek
  2001-07-08 17:37       ` Alan Cox
  0 siblings, 1 reply; 16+ messages in thread
From: Pavel Machek @ 2001-06-30 13:58 UTC (permalink / raw)
  To: Alan Cox; +Cc: Steven Walter, Andy Ward, linux-kernel

Hi!

> > Great, glad to hear it.  Who (if anyone) is still attempting to unravel
> > the puzzle of the Via southbridge bug?  You, Andy, should try and get in
> > touch with them and help debug this thing, if you're up to it.
> 
> The IWILL problem seems unrelated. It's the board that, more than any other,
> people report as failing totally when streaming memory copies with movntq
> instructions.
> 
> The Athlon optimised kernel places pretty much the absolute maximum load 
> possible on the memory bus. Several people have reported that machines that
> are otherwise stable on the BIOS fast options require the proper conservative
> settings to be stable with the Athlon optimisations.

Do we need a patch to memtest to use 3dnow?
									Pavel
-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.


* Re: VIA Southbridge bug (Was: Crash on boot (2.4.5))
  2001-06-30 13:58     ` Pavel Machek
@ 2001-07-08 17:37       ` Alan Cox
  2001-07-09 16:48         ` Rob Landley
  0 siblings, 1 reply; 16+ messages in thread
From: Alan Cox @ 2001-07-08 17:37 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Alan Cox, Steven Walter, Andy Ward, linux-kernel

> > possible on the memory bus. Several people have reported that machines that
> > are otherwise stable on the bios fast options require  the proper conservative
> > settings to be stable with the Athlon optimisations
> 
> Do we need patch to memtest to use 3dnow?

Possibly yes. Although memtest86 really tries to test for onchip not bus
related problems


* Re: VIA Southbridge bug (Was: Crash on boot (2.4.5))
  2001-07-08 17:37       ` Alan Cox
@ 2001-07-09 16:48         ` Rob Landley
  2001-07-10  9:17           ` Ville Herva
  0 siblings, 1 reply; 16+ messages in thread
From: Rob Landley @ 2001-07-09 16:48 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

On Sunday 08 July 2001 13:37, Alan Cox wrote:
> > > possible on the memory bus. Several people have reported that machines
> > > that are otherwise stable on the bios fast options require  the proper
> > > conservative settings to be stable with the Athlon optimisations
> >
> > Do we need patch to memtest to use 3dnow?
>
> Possibly yes. Although memtest86 really tries to test for onchip not bus
> related problems

What else tends to fail on the motherboard that might be easy to test for?  
Processor overheating?  (When the thermometer circuitry's there, anyway.)  
Something to do with DMA?  (Would DMA to/from a common card like VGA catch 
chipset-side DMA problems?)  There was an SMP exception thing floating by 
recently, is that common and testable?

I know there's a lot of funky peripheral combinations that behave strangely, 
but without opening that can of worms what kind of common problems on the 
motherboard itself might be easy to test for in a "run this overnight and see 
if it finds a problem with your hardware" sort of way?

Rob

(P.S. What kind of CPU load is most likely to send a processor into overheat? 
 (Other than "a tight loop", thanks.  I mean what kind of instructions?)  
This is going to be CPU specific, isn't it?  Or would a general instruction 
mix that doesn't call halt be enough?  It would need to keep the FPU busy 
too, wouldn't it?  And maybe handle interrupts.  Hmmm...)

I wonder...  The torture test Tom's Hardware guide uses for processor 
overheating is GCC compiling the Linux kernel.  (That's what caught the 
Pentium III 1.13 gigahertz instability when nothing else would.)  I wonder, 
maybe if a stripped down subset of a known version of GCC and a known version 
of the kernel were running from a ramdisk...  It USED to fit in 8 megs with 
no swap, might still fit in 32 with a decent chunk of kernel source.  Throw 
the compile in a loop, add in a processor temperature detector daemon to kill 
the test and HLT the system if the temperature went too high...
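
A minimal sketch of that loop-plus-watchdog idea, in Python for brevity.
Everything named here is an assumption for illustration: run_until_too_hot,
workload (one compile pass, say) and read_temp_c (however the board exposes
its sensor) are hypothetical, and the function only stops the loop; it does
not HLT the machine.

```python
from typing import Callable, Tuple


def run_until_too_hot(
    workload: Callable[[], None],
    read_temp_c: Callable[[], float],
    limit_c: float,
    max_iterations: int,
) -> Tuple[int, bool]:
    """Repeat workload() until the temperature limit is hit or the budget runs out.

    Returns (iterations completed, whether the limit tripped).
    Both callables are hypothetical hooks supplied by the caller.
    """
    for completed in range(max_iterations):
        # Check the sensor before each pass so a hot system never starts another one.
        if read_temp_c() >= limit_c:
            return completed, True
        workload()
    return max_iterations, False
```

Polling between passes is the simplest design; a separate daemon, as
suggested above, would also catch overheating in the middle of a long pass.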

I wonder what bits of the kernel GCC actually needs to run these days?  
(System V inter-process communication?  sysctl support?  Hmmm...  Would 
2.4.anything be a stable enough base for this yet, or should it be 2.2.19?  
Is 2.4 still psychotic with less swap space than ram (I.E. no swap space at 
all)?)

Off to play...

Still Rob.

* Re: VIA Southbridge bug (Was: Crash on boot (2.4.5))
  2001-07-09 16:48         ` Rob Landley
@ 2001-07-10  9:17           ` Ville Herva
  2001-07-10 15:28             ` Hardware testing [was Re: VIA Southbridge bug (Was: Crash on boot (2.4.5))] Rob Landley
                               ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Ville Herva @ 2001-07-10  9:17 UTC (permalink / raw)
  To: Rob Landley; +Cc: linux-kernel

On Mon, Jul 09, 2001 at 12:48:59PM -0400, you [Rob Landley] claimed:
> 
> (P.S. What kind of CPU load is most likely to send a processor into overheat? 
>  (Other than "a tight loop", thanks.  I mean what kind of instructions?)  
> This is going to be CPU specific, isn't it?  Or would a general instruction 
> mix that doesn't call halt be enough?  It would need to keep the FPU busy 
> too, wouldn't it?  And maybe handle interrupts.  Hmmm...)

See Robert Redelmeier's cpuburn:

http://users.ev1.net/~redelm/

It is coded in assembly specifically to heat the CPU as much as possible. See
the README for details, but it seems that floating point operations are
tougher than integers and MMX can be even harder (depending on CPU model, of
course). Not sure what kind of role SSE, SSE2, 3dNow! play these days.
Perhaps Alan knows?
 
> I wonder...  The torture test Tom's Hardware guide uses for processor 
> overheating is GCC compiling the Linux kernel. 

That shouldn't really be that good a test. During compilation, CPU spends a
_lot_ of time waiting for the memory and even for the disk io. For maximum
heat, you really want a tight loop of instructions, that sits firmly in L1
cache.

The gcc compile is a good test for many other things - it uses a lot of
memory with complex pointer references (tests memory, and bit errors in
pointers are likely to sig11 rather than produce subtle errors in output),
stresses chipset somewhat (memory throughput), and cpu somewhat. But to test
CPU overheating and nothing else, cpuburn should be a lot better. (Even
seti@home is better as it uses FPU). Just run them and observe the sensor
readings. Cpuburn gets several degrees higher.

> the compile in a loop, add in a processor temperature detector daemon to kill 
> the test and HLT the system if the temperature went too high...

Cpuburn exits when the CPU miscalculates something (a sign of overheating).

I'm not sure if cpuburn is included in cerberus these days (istr it is), but
a nice test set for memory, cpu, disk etc to run over night or over weekend
to catch most of the hw faults would definitely be nice. 


-- v --

v@iki.fi

* Hardware testing [was Re: VIA Southbridge bug (Was: Crash on boot (2.4.5))]
  2001-07-10  9:17           ` Ville Herva
@ 2001-07-10 15:28             ` Rob Landley
  2001-07-11  4:18               ` Albert D. Cahalan
                                 ` (2 more replies)
  2001-07-10 21:24             ` VIA Southbridge bug (Was: Crash on boot (2.4.5)) Adam Sampson
  2001-07-11  9:03             ` Eyal Lebedinsky
  2 siblings, 3 replies; 16+ messages in thread
From: Rob Landley @ 2001-07-10 15:28 UTC (permalink / raw)
  To: Ville Herva; +Cc: linux-kernel

On Tuesday 10 July 2001 05:17, Ville Herva wrote:
> On Mon, Jul 09, 2001 at 12:48:59PM -0400, you [Rob Landley] claimed:
> > (P.S. What kind of CPU load is most likely to send a processor into
> > overheat? (Other than "a tight loop", thanks.  I mean what kind of
> > instructions?) This is going to be CPU specific, isn't it?  Or would a
> > general instruction mix that doesn't call halt be enough?  It would need
> > to keep the FPU busy too, wouldn't it?  And maybe handle interrupts. 
> > Hmmm...)
>
> See Robert Redelmeier's cpuburn:
>
> http://users.ev1.net/~redelm/

Cool.  If nothing else, this is a much better starting point for further work 
than starting from scratch...

> It is coded in assembly specifically to heat the CPU as much as possible. See
> the README for details, but it seems that floating point operations are
> tougher than integers and MMX can be even harder (depending on CPU model,
> of course). Not sure what kind of role SSE, SSE2, 3dNow! play these days.
> Perhaps Alan knows?

There are at least three separate things that need testing here.  memtest86 
tests whether your memory is OK.  CPUburn seems to do a good job testing 
processor heat (not that I'm running it on my laptop, which doesn't seem to 
have a thermal readout thingy anyway...)

The third thing (which started this thread) was memory bus.  The new 3DNow 
optimizations drove a memory bus into failure, and that IS processor 
specific...

> The gcc compile is a good test for many other things - it uses a lot of
> memory with complex pointer references (tests memory, and bit errors in
> pointers are likely to sig11 rather than produce subtle errors in output),
> stresses chipset somewhat (memory throughput), and cpu somewhat. But to
> test CPU overheating and nothing else, cpuburn should be a lot better.
> (Even seti@home is better as it uses FPU). Just run them and observe the
> sensor readings. Cpuburn gets several degrees higher.

The downside of a test like gcc is that it does test many things, meaning 
when it fails you still don't know why.

memtest86 is great because it ONLY tests memory.  CPUburn is similarly 
specific.  A memory bus buster would be a good tool to add to the mix.  (DMA 
is another common problem, but the more I look into it, the more it seems to 
be dependent on whatever peripheral you're talking to, which is more 
complication than I'm looking to bite off...)

The downside of memtest86 is that your system can pass it and still have an 
obvious problem (for example, overclocking stresses both memory bus AND 
heat...)

It might be possible to put all three testers into a menu where you could 
switch on and off what you wanted to test, and run them overnight.  That way, 
if you are testing for three things (perhaps alternating tests every few 
minutes?), and you get it to fail, you can switch some off to get more 
specific tests to narrow down the problem...
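
As a rough sketch of the switch-on/switch-off idea, the following Python
builds a round-robin schedule from whichever tests are enabled. The test
names and the build_schedule helper are made up for illustration; actually
launching memtest86, cpuburn or a bus tester is left out entirely.

```python
from itertools import cycle, islice
from typing import Dict, List


def build_schedule(enabled: Dict[str, bool], slots: int) -> List[str]:
    """Return `slots` time slices, cycling through the enabled tests in order."""
    active = [name for name, on in enabled.items() if on]
    if not active:
        return []  # nothing switched on, so nothing to run
    return list(islice(cycle(active), slots))


# Hypothetical menu state: heat testing switched off to narrow down a failure.
menu = {"memory": True, "cpu_heat": False, "memory_bus": True}
schedule = build_schedule(menu, 5)
```

Each slot would stand for a few minutes of one test; switching an entry to
False and rerunning narrows down which test provokes the failure.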

> > the compile in a loop, add in a processor temperature detector daemon to
> > kill the test and HLT the system if the temperature went too high...
>
> Cpuburn exits when the CPU miscalculates something (a sign of overheating).
>
> I'm not sure if cpuburn is included in cerberus these days (istr it is),
> but a nice test set for memory, cpu, disk etc to run over night or over
> weekend to catch most of the hw faults would definitely be nice.

I've heard of cerberus but thought it was just a disk test suite...  One more 
thing to download and look into...  (If the tests in it can be switched 
on/off, maybe this is what I'm looking for...) 

Rob

* Re: VIA Southbridge bug (Was: Crash on boot (2.4.5))
  2001-07-10  9:17           ` Ville Herva
  2001-07-10 15:28             ` Hardware testing [was Re: VIA Southbridge bug (Was: Crash on boot (2.4.5))] Rob Landley
@ 2001-07-10 21:24             ` Adam Sampson
  2001-07-11  8:32               ` Ville Herva
  2001-07-11  9:03             ` Eyal Lebedinsky
  2 siblings, 1 reply; 16+ messages in thread
From: Adam Sampson @ 2001-07-10 21:24 UTC (permalink / raw)
  To: Ville Herva; +Cc: Rob Landley, linux-kernel

Ville Herva <vherva@niksula.hut.fi> writes:

> It is coded in assembly specifically to heat the CPU as much as possible. See
> the README for details, but it seems that floating point operations are
> tougher than integers and MMX can be even harder (depending on CPU model, of
> course). Not sure what kind of role SSE, SSE2, 3dNow! play these days.
> Perhaps Alan knows?

I would have thought this would be a nice problem for a genetic
algorithm to solve---start with random blocks of data, execute them
repeatedly for a period of time (restarting upon CPU traps), and
"breed" those that cause the greatest temperature increase. Any bored
research students out there?
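
A toy version of that breeding loop, in Python, might look like the sketch
below. It is only the selection/crossover/mutation skeleton: the fitness
function is a caller-supplied stand-in (the real one would execute each block
and measure the temperature rise, which this sketch does not attempt), and
all names and default parameters are assumptions.

```python
import random
from typing import Callable, List


def breed(a: bytes, b: bytes, rng: random.Random, mutation_rate: float) -> bytes:
    """One-point crossover of two equal-length blocks, then random byte mutation."""
    cut = rng.randrange(1, len(a))
    child = bytearray(a[:cut] + b[cut:])
    for i in range(len(child)):
        if rng.random() < mutation_rate:
            child[i] = rng.randrange(256)
    return bytes(child)


def evolve(
    fitness: Callable[[bytes], float],
    block_len: int = 16,
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> bytes:
    """Return the fittest block found; the top half survives each generation."""
    rng = random.Random(seed)
    population: List[bytes] = [
        bytes(rng.randrange(256) for _ in range(block_len)) for _ in range(pop_size)
    ]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[: pop_size // 2]
        children = [
            breed(rng.choice(survivors), rng.choice(survivors), rng, mutation_rate)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=fitness)
```

Because the best blocks are carried over unchanged, the best score never
gets worse from one generation to the next; whether it converges on anything
useful for real hardware is an open question.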

-- 
Adam Sampson <azz@gnu.org>                  <URL:http://azz.us-lot.org/>

* Re: Hardware testing [was Re: VIA Southbridge bug (Was: Crash on boot (2.4.5))]
  2001-07-10 15:28             ` Hardware testing [was Re: VIA Southbridge bug (Was: Crash on boot (2.4.5))] Rob Landley
@ 2001-07-11  4:18               ` Albert D. Cahalan
  2001-07-11  8:43               ` Ville Herva
  2001-07-11  9:11               ` Vojtech Pavlik
  2 siblings, 0 replies; 16+ messages in thread
From: Albert D. Cahalan @ 2001-07-11  4:18 UTC (permalink / raw)
  To: landley; +Cc: Ville Herva, linux-kernel

Rob Landley writes:

> The third thing (which started this thread) was memory bus.  The new 3DNow 
> optimizations drove a memory bus into failure, and that IS processor 
> specific...
...
> memtest86 is great because it ONLY tests memory.  CPUburn is similarly 
> specific.  A memory bus buster would be a good tool to add to the mix.  (DMA 
> is another common problem, but the more I look into it, the more it seems to 
> be dependent on whatever peripheral you're talking to, which is more 
> complication than I'm looking to bite off...)

DMA could be done in a sane manner. Let drivers register a function
to exercise DMA. When you want to test, tell all registered drivers
to start wild excessive DMA. Use a timer to stop this, because you
might end up pretty well locked out of your system while the bus is
busy moving test data.
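
The shape of such a registration interface can be sketched in a few lines of
Python. This is a user-space illustration only: DmaTestRegistry, register and
run are hypothetical names, not an existing kernel API, and the deadline is
checked between rounds rather than enforced by a real timer interrupt.

```python
import time
from typing import Callable, Dict


class DmaTestRegistry:
    """Hypothetical registry of per-driver 'do one burst of DMA' callbacks."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._exercisers: Dict[str, Callable[[], None]] = {}

    def register(self, name: str, exercise: Callable[[], None]) -> None:
        """A driver hands over a function that triggers one burst of test DMA."""
        self._exercisers[name] = exercise

    def run(self, seconds: float) -> Dict[str, int]:
        """Call every registered exerciser in turn until the deadline passes."""
        deadline = self._clock() + seconds
        bursts = {name: 0 for name in self._exercisers}
        while self._clock() < deadline:
            for name, exercise in self._exercisers.items():
                exercise()
                bursts[name] += 1
        return bursts
```

The clock is injectable so the stop condition can be checked without waiting
in real time; in a kernel the stop would have to come from a timer, since the
caller may be locked out while the bus is saturated.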

* Re: VIA Southbridge bug (Was: Crash on boot (2.4.5))
  2001-07-10 21:24             ` VIA Southbridge bug (Was: Crash on boot (2.4.5)) Adam Sampson
@ 2001-07-11  8:32               ` Ville Herva
  0 siblings, 0 replies; 16+ messages in thread
From: Ville Herva @ 2001-07-11  8:32 UTC (permalink / raw)
  To: Adam Sampson; +Cc: Rob Landley, linux-kernel

On Tue, Jul 10, 2001 at 10:24:21PM +0100, you [Adam Sampson] claimed:
> Ville Herva <vherva@niksula.hut.fi> writes:
> 
> > It is coded in assembly specifically to heat the CPU as much as possible. See
> > the README for details, but it seems that floating point operations are
> > tougher than integers and MMX can be even harder (depending on CPU model, of
> > course). Not sure what kind of role SSE, SSE2, 3dNow! play these days.
> > Perhaps Alan knows?
> 
> I would have thought this would be a nice problem for a genetic
> algorithm to solve---start with random blocks of data, execute them
> repeatedly for a period of time (restarting upon CPU traps), and
> "breed" those that cause the greatest temperature increase. Any bored
> research students out there?

I'm sure getting an Intel or AMD engineer to comment on this would be far
more fertile. After all, engineers developed a computer in just 50 years,
but it took millions of years for evolution to come up with something like a
human being... [1]


-- v --

v@iki.fi

[1] Now, of course someone will insist that it was in fact God who created
    man... Perhaps someone ought to go to the desert and wait for an
    enlightenment on the Right Instruction Sequence.

    Ob-;), no offense intended.



* Re: Hardware testing [was Re: VIA Southbridge bug (Was: Crash on boot (2.4.5))]
  2001-07-10 15:28             ` Hardware testing [was Re: VIA Southbridge bug (Was: Crash on boot (2.4.5))] Rob Landley
  2001-07-11  4:18               ` Albert D. Cahalan
@ 2001-07-11  8:43               ` Ville Herva
  2001-07-11  9:11               ` Vojtech Pavlik
  2 siblings, 0 replies; 16+ messages in thread
From: Ville Herva @ 2001-07-11  8:43 UTC (permalink / raw)
  To: Rob Landley; +Cc: linux-kernel

On Tue, Jul 10, 2001 at 11:28:25AM -0400, you [Rob Landley] claimed:
> 
> The downside of a test like gcc is that it does test many things, meaning 
> when it fails you still don't know why.

True.
 
> memtest86 is great because it ONLY tests memory.  

Yes, and because it also accurately tells you which memory location is bad.
(This can't be easily done from user space, I gather). You can use this
information to workaround the memory problem with the BadRam patch from Rick
Van Rein.

> CPUburn is similarly specific.  A memory bus buster would be a good tool
> to add to the mix.  (DMA is another common problem, but the more I look
> into it, the more it seems to be dependent on whatever peripheral you're
> talking to, which is more complication than I'm looking to bite off...)

True.
 
> It might be possible to put all three testers into a menu where you could 
> switch on and off what you wanted to test, and run them overnight.  That way, 
> if you are testing for three things (perhaps alternating tests every few 
> minutes?), and you get it to fail, you can switch some off to get more 
> specific tests to narrow down the problem...

Actually lilo is just about enough for such a menu system...

Something like

image = /boot/memtest86
        label = memtest86
image = /boot/vmlinux
        label = cpuburn
        root = /dev/hda2
        append = "init=/usr/local/bin/burnP6"
        read-only
image = /boot/vmlinux
        label = testdma
        root = /dev/hda2
        append = "init=/usr/local/bin/testDMA"
        read-only

It would take some scripting to alternate the tests automatically, but
perhaps it could be done.
 
> I've heard of cerberus but thought it was just a disk test suite...  One more 
> thing to download and look into...  (If the tests in it can be switched 
> on/off, maybe this is what I'm looking for...) 

AFAIK it's a pretty complete test suite VA uses (used?) for testing their
hw. I'm not sure, though.


-- v --

v@iki.fi

* Re: VIA Southbridge bug (Was: Crash on boot (2.4.5))
  2001-07-10  9:17           ` Ville Herva
  2001-07-10 15:28             ` Hardware testing [was Re: VIA Southbridge bug (Was: Crash on boot (2.4.5))] Rob Landley
  2001-07-10 21:24             ` VIA Southbridge bug (Was: Crash on boot (2.4.5)) Adam Sampson
@ 2001-07-11  9:03             ` Eyal Lebedinsky
  2 siblings, 0 replies; 16+ messages in thread
From: Eyal Lebedinsky @ 2001-07-11  9:03 UTC (permalink / raw)
  To: linux-kernel

Ville Herva wrote:
> 
> On Mon, Jul 09, 2001 at 12:48:59PM -0400, you [Rob Landley] claimed:
> >
> > (P.S. What kind of CPU load is most likely to send a processor into overheat?
> >  (Other than "a tight loop", thanks.  I mean what kind of instructions?)
> > This is going to be CPU specific, isn't it?  Or would a general instruction
> > mix that doesn't call halt be enough?  It would need to keep the FPU busy
> > too, wouldn't it?  And maybe handle interrupts.  Hmmm...)
> 
> See Robert Redelmeier's cpuburn:
> 
> http://users.ev1.net/~redelm/

I took this program for a spin and I noted the reported CPU temp
went up by 12 degrees C (43->55).

However, more interesting, the +5V line dropped from 4.82 to 4.72.
This is on a Gigabyte GA-7ZX with an Athlon/1200 and 2x128MB.

Some mobos may actually have their voltages pushed outside accepted
levels and cause a failure, which is actually not related to the
temperature. And you do not need to run the test for a long time,
the drop is immediate and stable.
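
To put numbers on that, here is a small Python sketch that expresses the two
readings above as a deviation from the nominal 5 V. The +/-5% limit is an
assumption (it is the figure commonly quoted for the +5V rail in ATX power
supply guidelines, not something stated in this thread), and motherboard
sensor readings are not calibrated instruments, so treat the result as a hint
rather than a verdict.

```python
def deviation_percent(measured_v: float, nominal_v: float) -> float:
    """Signed deviation of a rail from its nominal voltage, in percent."""
    return (measured_v - nominal_v) / nominal_v * 100.0


def within_tolerance(
    measured_v: float, nominal_v: float, tolerance_percent: float = 5.0
) -> bool:
    """True if the rail is inside +/- tolerance_percent (assumed limit)."""
    return abs(deviation_percent(measured_v, nominal_v)) <= tolerance_percent


# The idle and loaded readings reported above for the +5V line.
idle_ok = within_tolerance(4.82, 5.0)
loaded_ok = within_tolerance(4.72, 5.0)
```

Under that assumed limit the idle reading is about 3.6% low and still inside
the band, while the loaded reading is about 5.6% low and falls outside it.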

I can only imagine what will happen if some game pushes the CPU to
the limit while running a hot video card hard, as I expect some
highly optimized graphics drivers might do. May cause some
interesting crashes.

Anyone up to enhancing the program to stress the video memory at the
same time?


In other words, this is a good stress test for the whole mobo design
and setup, not just the CPU/HSF combo.

--
Eyal Lebedinsky (eyal@eyal.emu.id.au) <http://samba.anu.edu.au/eyal/>

* Re: Hardware testing [was Re: VIA Southbridge bug (Was: Crash on boot (2.4.5))]
  2001-07-10 15:28             ` Hardware testing [was Re: VIA Southbridge bug (Was: Crash on boot (2.4.5))] Rob Landley
  2001-07-11  4:18               ` Albert D. Cahalan
  2001-07-11  8:43               ` Ville Herva
@ 2001-07-11  9:11               ` Vojtech Pavlik
  2001-07-11 15:05                 ` Rob Landley
  2 siblings, 1 reply; 16+ messages in thread
From: Vojtech Pavlik @ 2001-07-11  9:11 UTC (permalink / raw)
  To: Rob Landley; +Cc: Ville Herva, linux-kernel

On Tue, Jul 10, 2001 at 11:28:25AM -0400, Rob Landley wrote:
> On Tuesday 10 July 2001 05:17, Ville Herva wrote:
> > On Mon, Jul 09, 2001 at 12:48:59PM -0400, you [Rob Landley] claimed:
> > > (P.S. What kind of CPU load is most likely to send a processor into
> > > overheat? (Other than "a tight loop", thanks.  I mean what kind of
> > > instructions?) This is going to be CPU specific, isn't it?  Or would a
> > > general instruction mix that doesn't call halt be enough?  It would need
> > > to keep the FPU busy too, wouldn't it?  And maybe handle interrupts. 
> > > Hmmm...)
> >
> > See Robert Redelmeier's cpuburn:
> >
> > http://users.ev1.net/~redelm/
> 
> Cool.  If nothing else, this is a much better starting point for further work 
> than starting from scratch...
> 
> > It is coded in assembly specifically to heat the CPU as much as possible. See
> > the README for details, but it seems that floating point operations are
> > tougher than integers and MMX can be even harder (depending on CPU model,
> > of course). Not sure what kind of role SSE, SSE2, 3dNow! play these days.
> > Perhaps Alan knows?
> 
> There are at least three separate things that need testing here.  memtest86 
> tests whether your memory is OK.  CPUburn seems to do a good job testing 
> processor heat (not that I'm running it on my laptop, which doesn't seem to 
> have a thermal readout thingy anyway...)
> 
> The third thing (which started this thread) was memory bus.  The new 3DNow 
> optimizations drove a memory bus into failure, and that IS processor 
> specific...

Don't forget the L1/L2/L3 caches. I once had a mainboard with a faulty
L2 cache chip ('twas a K6-3 CPU, plus a FIC VA-503+ mainboard). No memory
or CPU test found the failure, yet kernel compilation was still crashing
after 6-8 hours.

I modified the 'memtest.c' little proggy (not the big memtest86, just a
little utility that runs under Linux), to use patterns and test sizes
that test the L1 and then L2, and the error showed up after ten seconds
of running the test.
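
The write-a-pattern, read-it-back logic of such a test can be sketched as
follows. This is not the modified memtest.c (which is not shown in this
thread); the patterns and function names are assumptions, and Python gives no
control over which cache level a buffer lands in, so a real cache test would
be written in C or assembly with buffer sizes chosen to fit the L1 and then
the L2 of the CPU under test.

```python
from typing import List, Tuple

# Classic fill patterns: all zeros, all ones, and the two alternating-bit bytes.
PATTERNS = (0x00, 0xFF, 0x55, 0xAA)

Mismatch = Tuple[int, int, int]  # (offset, expected byte, byte actually read)


def find_mismatches(buf: bytearray, expected: int) -> List[Mismatch]:
    """Report every offset whose content differs from the expected pattern."""
    return [(i, expected, b) for i, b in enumerate(buf) if b != expected]


def pattern_test(size_bytes: int, passes: int = 1) -> List[Mismatch]:
    """Fill a buffer of the given size with each pattern and verify it."""
    buf = bytearray(size_bytes)
    errors: List[Mismatch] = []
    for _ in range(passes):
        for pattern in PATTERNS:
            buf[:] = bytes([pattern]) * size_bytes
            errors.extend(find_mismatches(buf, pattern))
    return errors
```

On healthy hardware pattern_test returns an empty list for any size; the
point of varying size_bytes is to keep the working set small enough that the
traffic stays in the cache level being tested.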

-- 
Vojtech Pavlik
SuSE Labs

* Re: Hardware testing [was Re: VIA Southbridge bug (Was: Crash on boot (2.4.5))]
  2001-07-11  9:11               ` Vojtech Pavlik
@ 2001-07-11 15:05                 ` Rob Landley
  2001-07-12  6:57                   ` Ville Herva
  0 siblings, 1 reply; 16+ messages in thread
From: Rob Landley @ 2001-07-11 15:05 UTC (permalink / raw)
  To: Vojtech Pavlik; +Cc: linux-kernel

On Wednesday 11 July 2001 05:11, Vojtech Pavlik wrote:

> Don't forget the L1/L2/L3 caches. I had once a mainboard with a faulty
> L2 cache chip ('twas a K6-3 CPU, plus a FIC VA-503+ mainboard). No memory
> or CPU test found the failure, yet kernel compilation was still crashing
> after 6-8 hours.
>
> I modified the 'memtest.c' little proggy (not the big memtest86, just a
> little utility that runs under Linux), to use patterns and test size
> that tests the L1 and then L2, and the error has shown after ten seconds
> of running the test.

I don't suppose you still have that lying around somewhere? :)

Rob

* Re: Hardware testing [was Re: VIA Southbridge bug (Was: Crash on boot (2.4.5))]
  2001-07-11 15:05                 ` Rob Landley
@ 2001-07-12  6:57                   ` Ville Herva
  0 siblings, 0 replies; 16+ messages in thread
From: Ville Herva @ 2001-07-12  6:57 UTC (permalink / raw)
  To: Rob Landley; +Cc: Vojtech Pavlik, linux-kernel

On Wed, Jul 11, 2001 at 11:05:19AM -0400, you [Rob Landley] claimed:
> On Wednesday 11 July 2001 05:11, Vojtech Pavlik wrote:
> 
> > I modified the 'memtest.c' little proggy (not the big memtest86, just a
> > little utility that runs under Linux), to use patterns and test size
> > that tests the L1 and then L2, and the error has shown after ten seconds
> > of running the test.
> 
> I don't suppose you still have that lying around somewhere? :)

I'm not sure if it's any good, but I have one at

http://v.iki.fi/~vherva/memburn.c

(It did find one bad memory case a while ago...)


-- v --

v@iki.fi

end of thread, other threads:[~2001-07-12  6:58 UTC | newest]

Thread overview: 16+ messages
2001-06-25  6:17 Crash on boot (2.4.5) Andy Ward
2001-06-25  6:32 ` VIA Southbridge bug (Was: Crash on boot (2.4.5)) Steven Walter
2001-06-25  7:06   ` Alan Cox
2001-06-30 13:58     ` Pavel Machek
2001-07-08 17:37       ` Alan Cox
2001-07-09 16:48         ` Rob Landley
2001-07-10  9:17           ` Ville Herva
2001-07-10 15:28             ` Hardware testing [was Re: VIA Southbridge bug (Was: Crash on boot (2.4.5))] Rob Landley
2001-07-11  4:18               ` Albert D. Cahalan
2001-07-11  8:43               ` Ville Herva
2001-07-11  9:11               ` Vojtech Pavlik
2001-07-11 15:05                 ` Rob Landley
2001-07-12  6:57                   ` Ville Herva
2001-07-10 21:24             ` VIA Southbridge bug (Was: Crash on boot (2.4.5)) Adam Sampson
2001-07-11  8:32               ` Ville Herva
2001-07-11  9:03             ` Eyal Lebedinsky
