linux-kernel.vger.kernel.org archive mirror
* LinuxBIOS + ASUS CUA + 2.4.5 works; with 2.4.6 locks up
@ 2001-09-18 22:22 Ronald G Minnich
  2001-09-18 22:54 ` Eric W. Biederman
  0 siblings, 1 reply; 6+ messages in thread
From: Ronald G Minnich @ 2001-09-18 22:22 UTC
  To: linux-kernel; +Cc: linuxbios

Here is the scenario. We have LinuxBIOS working fine with 2.4.5 on an ASUS
CUA mainboard (Acer M1631 TNT2 northbridge, m1535d southbridge). It boots
to multiuser and works fine. All versions of linux from 2.4.0 to 2.4.5
also work.

If we upgrade to 2.4.6 we see the 'Posix compliance by Unifix' (or
whatever) message and then ... that's it. The machine appears to lock up.
Testing with the ICE shows that it is repeatedly going to address
0xfefe0c0, or similar (it varies). Note that is 7 digits of hex, not 8:
it's not going after high PCI or BIOS memory, we think.

It does appear to get through creating the kernel_thread for init, and we
think it might be dying when it goes idle, but we're not sure.

We sometimes get into the kernel thread for init and can single step it
into pci setup. At some point however the machine will again lock up. The
last POST code is 97.

The 0xfefec00 address is suspicious. Is there any APIC (NOT ACPI, I mean
IO-APIC) change that came into 2.4.6? Is there any way the kernel could be
trying to call the BIOS for some reason (we have APM etc. OFF)? Does
anyone have a hint at what we could look at? FWIW, we have run this kernel
under linuxbios on other boxes. It only fails on the Acer, and only for
2.4.6 and later (we've tested up to 2.4.9).

Thanks in advance.

ron



* Re: LinuxBIOS + ASUS CUA + 2.4.5 works; with 2.4.6 locks up
  2001-09-18 22:22 LinuxBIOS + ASUS CUA + 2.4.5 works; with 2.4.6 locks up Ronald G Minnich
@ 2001-09-18 22:54 ` Eric W. Biederman
  2001-09-18 22:56   ` Ronald G Minnich
  0 siblings, 1 reply; 6+ messages in thread
From: Eric W. Biederman @ 2001-09-18 22:54 UTC
  To: Ronald G Minnich; +Cc: linux-kernel, linuxbios

Ronald G Minnich <rminnich@lanl.gov> writes:

> Here is the scenario. We have LinuxBIOS working fine with 2.4.5 on an ASUS
> CUA mainboard (Acer M1631 TNT2 northbridge, m1535d southbridge). It boots
> to multiuser and works fine. All versions of linux from 2.4.0 to 2.4.5
> also work.
> 
> If we upgrade to 2.4.6 we see the 'Posix compliance by Unifix' (or
> whatever) message and then ... that's it. The machine appears to lock up.
> Testing with the ICE shows that it is repeatedly going to address
> 0xfefe0c0, or similar (it varies). Note that is 7 digits of hex, not 8:
> it's not going after high PCI or BIOS memory, we think.
> 
> It does appear to get through creating the kernel_thread for init, and we
> think it might be dying when it goes idle, but we're not sure.

That could happen, though it sounds unlikely on a PIII.
 
> We sometimes get into the kernel thread for init and can single step it
> into pci setup. At some point however the machine will again lock up. The
> last POST code is 97.
> 
> The 0xfefec00 address is suspicious. Is there any APIC (NOT ACPI, I mean
> IO-APIC) change that came into 2.4.6? 

Yes, according to the changelog.  But APICs are generally mapped at much
higher addresses.

> Is there any way the kernel could be
> trying to call the BIOS for some reason (we have APM etc. OFF)? Does
> anyone have a hint at what we could look at? FWIW, we have run this kernel
> under linuxbios on other boxes. It only fails on the Acer, and only for
> 2.4.6 and later (we've tested up to 2.4.9).

It shouldn't be trying to call the BIOS.

Have you run memtest86?  Seriously, knowing that you don't have
bad memory or a bad memory setup would rule out all kinds of problems.

Hmm. 0xfefe0c0 is just below 256 megs (I assume that is what you have in your
machine).  The kernel allocates memory from the top down, at least it did last
time I looked.  I can't think of a reason it would jump there, but I
can see it having a variable allocated up there.

On the other hand, an address like 0xc0e0ef0f (the same bytes in reverse
order) is really wacky but completely inside the kernel address space.
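
The address arithmetic above can be checked with a short sketch. The
256 MiB RAM size is the guess made in this message. The APIC base
addresses and the 0xC0000000 kernel base are not stated in the thread;
they are the conventional i386 defaults and are assumptions here, as are
all the names in the code.

```python
import struct

MIB = 1 << 20

# Address reported by the ICE in the original message (7 hex digits).
ICE_ADDR = 0x0FEFE0C0

# Assumptions, not taken from the thread: the conventional default
# physical bases of the IO-APIC and local APIC on i386, and the default
# 3G/1G split where kernel virtual addresses start at 0xC0000000.
IOAPIC_DEFAULT_BASE = 0xFEC00000
LAPIC_DEFAULT_BASE = 0xFEE00000
KERNEL_VIRT_BASE = 0xC0000000


def byte_swap32(value: int) -> int:
    """Reverse the byte order of a 32-bit value."""
    return struct.unpack("<I", struct.pack(">I", value))[0]


def distance_below(addr: int, limit: int) -> int:
    """Bytes between addr and limit (positive if addr is below limit)."""
    return limit - addr


if __name__ == "__main__":
    top_of_ram = 256 * MIB  # assumed RAM size, as guessed in the message
    gap = distance_below(ICE_ADDR, top_of_ram)
    print(f"{ICE_ADDR:#010x} is {gap} bytes below 256 MiB")
    print(f"below the default IO-APIC base: {ICE_ADDR < IOAPIC_DEFAULT_BASE}")
    print(f"below the default local APIC base: {ICE_ADDR < LAPIC_DEFAULT_BASE}")
    swapped = byte_swap32(ICE_ADDR)
    print(f"byte-swapped: {swapped:#010x}")
    print(f"swapped value in kernel space: {swapped >= KERNEL_VIRT_BASE}")
```

Under those assumptions the address sits roughly 1 MiB below the 256 MiB
mark and far below either APIC window, while the byte-swapped value
lands above the kernel base.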

Eric


* Re: LinuxBIOS + ASUS CUA + 2.4.5 works; with 2.4.6 locks up
  2001-09-18 22:54 ` Eric W. Biederman
@ 2001-09-18 22:56   ` Ronald G Minnich
  2001-09-18 23:03     ` Eric W. Biederman
  0 siblings, 1 reply; 6+ messages in thread
From: Ronald G Minnich @ 2001-09-18 22:56 UTC
  To: Eric W. Biederman; +Cc: linux-kernel, linuxbios

We can run memtest86, but I thought that a fully booting kernel was a
pretty good memory test in itself.

I'll try that anyway.

ron



* Re: LinuxBIOS + ASUS CUA + 2.4.5 works; with 2.4.6 locks up
  2001-09-18 22:56   ` Ronald G Minnich
@ 2001-09-18 23:03     ` Eric W. Biederman
  2001-09-18 23:35       ` Mark Hahn
  0 siblings, 1 reply; 6+ messages in thread
From: Eric W. Biederman @ 2001-09-18 23:03 UTC
  To: Ronald G Minnich; +Cc: linux-kernel, linuxbios

Ronald G Minnich <rminnich@lanl.gov> writes:

> We can run the memtest, but I thought that a fully booting kernel was a
> pretty good one.

It is hard to call.  The most interesting case I know of is the VIA KT133
AMD bug.  I believe it is register 0x55 bit 7 that, when set, causes an
Athlon-optimized memcpy to crash the machine, but when clear it works.

PIII-optimized kernels worked fine.
 
> I'll try that anyway.

I don't expect a run of memtest86 to produce any problems but it just
feels like bad memory in the case you are describing.

Eric



* Re: LinuxBIOS + ASUS CUA + 2.4.5 works; with 2.4.6 locks up
  2001-09-18 23:03     ` Eric W. Biederman
@ 2001-09-18 23:35       ` Mark Hahn
  2001-09-18 23:43         ` Eric W. Biederman
  0 siblings, 1 reply; 6+ messages in thread
From: Mark Hahn @ 2001-09-18 23:35 UTC
  To: Eric W. Biederman; +Cc: Ronald G Minnich, linux-kernel

> It is hard to call.  The most interesting case I know of is the VIA kt133
> AMD bug.  I believe it is register 0x55 bit 7 that when set causes an
> athlon optimized memcpy to crash the machine, but when clear it works.

"causes" is a bit strong - there are plenty of machines where it
most definitely doesn't affect stability at all.  That is, there are KT133A
machines which use Arjan's fast_copy_page without any problem,
and yet have 0x55:7 set.  (My A7V133's is 0x89, for instance.)
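
As a quick check of the "0x55:7" notation, here is a minimal sketch that
tests bit 7 of the register value quoted above. The helper and constant
names are hypothetical, and the sketch only does arithmetic on the value
0x89; it does not read PCI configuration space.

```python
def bit_is_set(value: int, bit: int) -> bool:
    """Return True if the given bit (0 = least significant) is set."""
    return ((value >> bit) & 1) == 1


def clear_bit(value: int, bit: int) -> int:
    """Return an 8-bit value with the given bit cleared."""
    return value & ~(1 << bit) & 0xFF


# Register 0x55 value reported for the A7V133 in this message.
REG_0X55_VALUE = 0x89

if __name__ == "__main__":
    print(f"bit 7 set: {bit_is_set(REG_0X55_VALUE, 7)}")
    print(f"with bit 7 cleared: {clear_bit(REG_0X55_VALUE, 7):#04x}")
```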



* Re: LinuxBIOS + ASUS CUA + 2.4.5 works; with 2.4.6 locks up
  2001-09-18 23:35       ` Mark Hahn
@ 2001-09-18 23:43         ` Eric W. Biederman
  0 siblings, 0 replies; 6+ messages in thread
From: Eric W. Biederman @ 2001-09-18 23:43 UTC
  To: Mark Hahn; +Cc: linux-kernel

Mark Hahn <hahn@physics.mcmaster.ca> writes:

> > It is hard to call.  The most interesting case I know of is the VIA kt133
> > AMD bug.  I believe it is register 0x55 bit 7 that when set causes an
> > athlon optimized memcpy to crash the machine, but when clear it works.
> 
> "causes" is a bit strong - there are plenty of machines where it
> most definitely doesn't affect stability at all.  That is, there are KT133A
> machines which use Arjan's fast_copy_page without any problem,
> and yet have 0x55:7 set.  (My A7V133's is 0x89, for instance.)

Granted.  But it does seem to be the cause for the set of systems affected.
And it nicely illustrates the point that you can have very weird problems that
don't show up under normal circumstances.

In the KT133 case it feels like a problem initializing the cpu<->northbridge bus.

In Ron's case it is probably completely different.

Eric


