linux-kernel.vger.kernel.org archive mirror
* sporadic "freezes" on amd64 (GA K8NF)
@ 2005-08-05 20:33 Jaco Kroon
From: Jaco Kroon @ 2005-08-05 20:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: techteam

Hello all,

I'm absolutely stumped with this one.  We are still having problems
deciding whether this is a software problem or a hardware problem.  This
particular box (specs lower down) just freezes up sporadically when in
Linux.

Normally it just stops responding entirely: one moment it's still
outputting and the next there is nothing.  Twice, though, we got a
kernel panic.  I took a picture of one, which can be found at
http://www.kroon.co.za/images/kernel_panic_amd64.jpg (apologies for the
quality - phones aren't good at taking them).  From this panic (and the
other, which I had no way of capturing at the time) it looks like a bug
somewhere in the hard drive access path.  The one pictured was on
reiserfs, the other on ext3.

Hardware specs:

2GB RAM
Gigabyte K8NF
AMD 3500+ processor
GeForce 6200 graphics card

We've tried at least three different distributions (Mandrake, SuSE and
Gentoo) with both ext3 and reiserfs as file systems.  The Mandrake and
SuSE installs were 32-bit, and we tried both 32-bit and 64-bit Gentoo.

I've tried various kernels (2.6.10, 2.6.11.8, 2.6.11.11, 2.6.12 and
2.6.12.3), all to no avail.  Unfortunately I no longer have the kernel
config that was in use when we captured the trace.  We are using the
sata_nv module for the SATA controller, though.

Now for the truly odd thing:  When we drop the RAM to 1GB it works fine.
So we suspected that something might be wrong with the RAM controller
and asked for 2 x 1GB instead of 4 x 512MB.  Apparently this crashed as
well.

And for those who want to ask: yes, we've left it running memtest for a
week, and we have tried different combinations of the 4 chips when going
down to 1GB (all the combinations we tried - about 10 - worked).  And
yes, all the burn-in tests (all of the ones on the Ultimate Boot CD) as
well as some burn-in tests from the suppliers (under Windows) passed.
We also ran some benchmarking tools on Windows, since the suppliers said
they'll swap the hardware out only if we can consistently crash Windows
(to quote: "It runs Windows - it performs within spec").  Needless to
say, we're not going back to them for future purchases.

And no, we are not using the binary nvidia module :).

Thanks in advance for any and all suggestions.

Jaco

PS:  A text-only version of the stack trace (minus a lot of numbers):
Call Trace:<IRQ> {as_remove_queued_request+288}{as_move_to_dispatch+342}
    {as_next_request+941}{elv_next_request+277}
    {scsi_request_fn+89}{blk_run_queue+40}
    {scsi_end_request+252}{scsi_io_completion+484}
    {sd_rw_intr+598}{scsi_softirq+53}
    {__do_softirq+83}{do_softirq+53}
    {irq_exit+76}{do_IRQ+71}
    {ret_from_intr+0} <EOI> {system_call+126}

Code: 83 79 88 01 75 09 e9 a7 00 00 00 48 8b 4f 10 48 85 c9 66 90
RIP <ffffffff{rb_erase+384} RSP <ffffffff804379d0>
CR2: 0000.0002e8
 <0>Kernel panic - not syncing: Aiee, killing interrupt handler!

* Re: sporadic "freezes" on amd64 (GA K8NF)
@ 2005-08-06  8:01 ` Jaco Kroon
From: Jaco Kroon @ 2005-08-06  8:01 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-kernel, techteam

Mike Galbraith wrote:
> At 10:33 PM 8/5/2005 +0200, you wrote:
> 
>> Hello all,
>>
>> I'm absolutely stumped with this one.  We are still having problems
>> deciding whether this is a software problem or a hardware problem.
> 
> Given the number of kernels it freezes under, I'd say hardware.

That is what we thought too, initially.  I'm still leaning towards this
option but we just don't know any more, considering that Windows runs
just fine, even under heavy workload.  Then again, compiling tends to be
some of the most rigorous hardware-stressing work there is (I used to
use compiling glibc as a test of how stable my old Pentium 90 was -
usually had to cross-compile it in the end).  Anyway, we're unable to
get Windows to compile for extended periods of time, mostly because we
simply do not have any projects available to us that are big enough.  We
can always try installing Cygwin and building one or another open source
project (MySQL for example).

>>   This
>> particular box (specs lower down) just freezes up sporadically when in
>> Linux.
> 
> You failed to describe the workload.

Of course I did.  Compiling the kernel.  It's probably the most reliable
way of crashing it.  make mrproper && make allyesconfig && make - give
it about 2 to 5 minutes and it's dead.  A couple of times we also got it
to die within about 10 seconds: KSYM was the only step that still had to
run, so we would type make, it would do a couple of checks (CHK I assume
is for check) and then the KSYM step, at which point it would just die.
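
A minimal sketch of how the build described above could be turned into
a repeatable stress loop.  It assumes it is run from the top of a kernel
source tree; BUILD_CMD, MAX_RUNS, LOG and the script name are
hypothetical names introduced for illustration, not anything from the
original report.

```shell
#!/bin/sh
# Sketch: repeat the build until it fails, logging each run so the last
# line written before a freeze shows how far the machine got.
# BUILD_CMD, MAX_RUNS and LOG are hypothetical knobs (assumptions).
BUILD_CMD=${BUILD_CMD:-"make mrproper && make allyesconfig && make"}
MAX_RUNS=${MAX_RUNS:-10}
LOG=${LOG:-$HOME/build-loop.log}

run_loop() {
    i=1
    while [ "$i" -le "$MAX_RUNS" ]; do
        echo "run $i started: $(date)" >> "$LOG"
        sync    # push the log to disk before the risky part
        if ! sh -c "$BUILD_CMD" >> "$LOG" 2>&1; then
            echo "run $i failed" >> "$LOG"
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $MAX_RUNS runs completed" >> "$LOG"
}

# Only start the loop when invoked as: sh build-loop.sh run
if [ "${1:-}" = "run" ]; then
    run_loop
fi
```

After a hard freeze, the last "run N started" line in the log gives a
rough time-to-failure for each configuration (for example 1GB versus
2GB of RAM).  Since the panic points at the disk path, the log itself
may lose its final lines.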

>> Normally it just stops responding entirely.  As in one moment it's still
>> outputting and the next there is nothing.  Then once, (twice actually),
>> we actually got a kernel panic, I've taken a picture which can be found
>> at http://www.kroon.co.za/images/kernel_panic_amd64.jpg (Apologies for
>> the quality - phones aren't good at taking them).
> 
> Try a serial console for capture.

I never did manage to get that right.  I've still got the cable from my
previous attempts, guess I'll be trying again.
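
A minimal sketch of the serial console capture suggested above.  It
assumes the crashing box has a working first serial port (ttyS0) with
serial console support in the kernel (CONFIG_SERIAL_8250_CONSOLE), a
null-modem cable to a second Linux machine, and 115200 baud.  The port
names, the speed and the log file name are all assumptions.

```shell
# On the crashing box: append this to the kernel command line in the
# boot loader; kernel messages then go to both consoles listed:
#     console=ttyS0,115200n8 console=tty0

# On the second machine: set the port to match, then log everything
# that arrives (8 data bits, no parity, 1 stop bit, no echo).
stty -F /dev/ttyS0 115200 cs8 -parenb -cstopb raw -echo
cat /dev/ttyS0 | tee panic.log
```

With this in place the full oops text, including the numbers left out
of a hand-typed trace, should end up in panic.log.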

>>   From this panic (and
>> the other which I had no way of capturing at the time) it looks like a
>> bug somewhere when accessing the hard drive.  The one here was on
>> reiserfs the other was on ext3.
> 
> 
> The first thing that comes to mind is cooling.  The next thing I'd
> suspect is the power supply.  After verifying that these are in fact not
> the problem, I'd try a different disk controller.

I doubt it's cooling - after a reset the BIOS reports temperatures below
40 degrees Celsius - so I'll try another power supply.  Although, I
suspect there already is a 400W unit in there.

Jaco
