* 2.6: The hardware reports a non fatal, correctable incident occured on CPU 0.
@ 2004-01-09 16:48 lkml
  2004-01-09 17:35 ` Jesper Juhl
  2004-01-09 23:12 ` Eric
  0 siblings, 2 replies; 7+ messages in thread
From: lkml @ 2004-01-09 16:48 UTC (permalink / raw)
  To: linux-kernel

Hi!

I had some very scary issues today while playing with 2.6. The system was
booted and ran several times today; the longest uptime was about an hour.

But then shortly after having booted 2.6 I got syslog messages: 

The hardware reports a non fatal, correctable incident occured on CPU 0.

I shut down the machine. After this my Athlon XP 2200+ showed up as 1050MHz in
the BIOS, and indeed the bus frequency was set to 100 instead of 133 MHz (how
can an OS change the BIOS?!) - nevertheless the CPU should have shown up as
1500MHz. I set it back to 133 MHz, after which the machine no longer even
reached the BIOS but rebooted automatically before getting there. I turned
off the machine for some seconds - no change. I turned it off for a few
minutes and the BIOS showed up again - with 1050MHz. So I had to set the
freq back to 133 MHz a second time. I booted my 2.4.21 kernel, which seems to
run.

What the fuck is going on here?? As far as I figured out this has something to 
do with MCE (CONFIG_X86_MCE=y, CONFIG_X86_MCE_NONFATAL=y) (?).
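One way to confirm which MCE options a kernel was built with is to look for the symbols in its .config text. The following is only a minimal sketch: the function name `enabled_options` and the sample text are made up for illustration, and where the real config file lives on a given system is not stated in this thread.

```python
def enabled_options(config_text, names):
    """Return the subset of `names` that are set to 'y' in .config-style text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments such as "# CONFIG_FOO is not set".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key in names and value == "y":
            enabled.add(key)
    return enabled


# Illustrative sample only; the first two lines mirror the options quoted above.
sample = """\
CONFIG_X86_MCE=y
CONFIG_X86_MCE_NONFATAL=y
# CONFIG_X86_MCE_P4THERMAL is not set
"""

wanted = {"CONFIG_X86_MCE", "CONFIG_X86_MCE_NONFATAL", "CONFIG_X86_MCE_P4THERMAL"}
print(sorted(enabled_options(sample, wanted)))
```

With the sample above, only the two options set to `y` are reported; the commented-out "is not set" line is ignored.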

TIA
Timo



* Re: 2.6: The hardware reports a non fatal, correctable incident occured on CPU 0.
  2004-01-09 16:48 2.6: The hardware reports a non fatal, correctable incident occured on CPU 0 lkml
@ 2004-01-09 17:35 ` Jesper Juhl
       [not found]   ` <200401091856.16120.lkml@nitwit.de>
  2004-01-09 23:12 ` Eric
  1 sibling, 1 reply; 7+ messages in thread
From: Jesper Juhl @ 2004-01-09 17:35 UTC (permalink / raw)
  To: lkml; +Cc: linux-kernel


On Fri, 9 Jan 2004 lkml@nitwit.de wrote:

> Hi!
>
> I had some very scary issues today while playing with 2.6. The system was
> booted and ran several times today; the longest uptime was about an hour.
>
> But then shortly after having booted 2.6 I got syslog messages:
>
> The hardware reports a non fatal, correctable incident occured on CPU 0.
>
> I shut down the machine. After this my Athlon XP 2200+ showed up as 1050MHz in
> the BIOS, and indeed the bus frequency was set to 100 instead of 133 MHz (how
> can an OS change the BIOS?!)

Most likely it has nothing to do with the OS. Some BIOSes modify the FSB
speed and other settings as a way to provide a sort of "fail safe" boot
mode if a problem was detected.

The BIOS on my board will do that if the system fails to POST and I've
also seen it happen sometimes after a crash.

It's even documented in the motherboard manual that it will behave this
way when running in JumperFree mode (this is an ASUS A7M266 board btw).
The exact text from my motherboard manual is :

"Notes for JumperFree Mode
 System Hangup

 If your system crashes or hangs due to improper frequency settings, power
 OFF your system and restart. The system will start up in safe mode
 running at a DRAM-to-CPU frequency ratio of 3:3 and a bus speed of
 100MHz. You will then be led to BIOS setup to adjust the configurations."


-- Jesper Juhl



* Re: 2.6: The hardware reports a non fatal, correctable incident occured on CPU 0.
       [not found]   ` <200401091856.16120.lkml@nitwit.de>
@ 2004-01-09 18:10     ` Jesper Juhl
  0 siblings, 0 replies; 7+ messages in thread
From: Jesper Juhl @ 2004-01-09 18:10 UTC (permalink / raw)
  To: lkml; +Cc: linux-kernel


On Fri, 9 Jan 2004 lkml@nitwit.de wrote:

> On Friday 09 January 2004 18:35, Jesper Juhl wrote:
> > Most likely it has nothing to do with the OS. Some BIOSes modify the FSB
> > speed and other settings as a way to provide a sort of "fail safe" boot
> > mode if a problem was detected.
>
> So, in your opinion I really have hardware problems (which I haven't noticed
> so far and which also did not recur for 3.5h)?
>
All I'm saying is that I know for a fact that some BIOSes will do this
(set bus speed to 100) if they detect problems - I know mine does.

It's just one possibility. I don't actually /know/ what causes what you
experienced.
I guess it's possible that something the kernel did caused the BIOS to
think there was a problem even though there was not...
Or it could be something else entirely.
I don't know for sure. All I can do is suggest that maybe you should check
your motherboard manual for any hints on this behaviour and maybe try and
test your hardware just to be safe...

Other people probably have better advice for you.


-- Jesper Juhl



* Re: 2.6: The hardware reports a non fatal, correctable incident occured on CPU 0.
  2004-01-09 16:48 2.6: The hardware reports a non fatal, correctable incident occured on CPU 0 lkml
  2004-01-09 17:35 ` Jesper Juhl
@ 2004-01-09 23:12 ` Eric
  2004-01-09 23:30   ` Prakash K. Cheemplavam
  2004-01-10 17:16   ` lkml
  1 sibling, 2 replies; 7+ messages in thread
From: Eric @ 2004-01-09 23:12 UTC (permalink / raw)
  To: lkml, linux-kernel

On Friday 09 January 2004 10:48 am, lkml@nitwit.de wrote:
> Hi!
>
> I had some very scary issues today while playing with 2.6. The system was
> booted and ran several times today; the longest uptime was about an hour.
>
> But then shortly after having booted 2.6 I got syslog messages:
>
> The hardware reports a non fatal, correctable incident occured on CPU 0.
>
> I shut down the machine. After this my Athlon XP 2200+ showed up as 1050MHz
> in the BIOS, and indeed the bus frequency was set to 100 instead of 133 MHz
> (how can an OS change the BIOS?!) - nevertheless the CPU should have shown up
> as 1500MHz. I set it back to 133 MHz, after which the machine no longer even
> reached the BIOS but rebooted automatically before getting there. I turned
> off the machine for some seconds - no change. I turned it off for a few
> minutes and the BIOS showed up again - with 1050MHz. So I had to set the
> freq back to 133 MHz a second time. I booted my 2.4.21 kernel, which seems
> to run.
	Check your hardware: CPU/MOBO/RAM. Overheating? Bad RAM? Cheap mobo?
MCE should not be triggered under any circumstances unless it is a kernel
bug (RARE, I believe the MCE code is simple) or you REALLY have a hardware
problem. As said before, the BIOS is resetting your FSB to 100 as a fail-safe
because something bad happened.
	BTW, check your setup; an AMD 2200+ should run at 1.8 GHz, I believe. If you
are setting your FSB or multiplier too low, that might also be triggering a
problem. A quick Google search lists the AMD XP 2200+ as 1800 MHz.
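As a rough sanity check on the figures above: the core clock is the FSB clock times the CPU multiplier. The sketch below assumes a 13.5x multiplier for the XP 2200+ (not stated in this thread; it is the value that gives 1800 MHz on a nominal 133.33 MHz bus), and the function name is made up for illustration.

```python
def core_clock_mhz(fsb_mhz, multiplier):
    """Core clock in MHz: front-side bus clock times the CPU multiplier."""
    return fsb_mhz * multiplier


NOMINAL_133_FSB = 400 / 3   # the "133 MHz" bus is nominally 133.33 MHz
ASSUMED_MULTIPLIER = 13.5   # assumption, not taken from the thread

# Rated speed on the normal 133 MHz bus.
print(round(core_clock_mhz(NOMINAL_133_FSB, ASSUMED_MULTIPLIER)))  # 1800

# Same multiplier on the 100 MHz fail-safe bus.
print(round(core_clock_mhz(100, ASSUMED_MULTIPLIER)))  # 1350
```

Under that assumption a drop to a 100 MHz bus alone would give 1350 MHz, not the reported 1050 MHz, so the fail-safe mode may have changed more than the bus speed; the thread does not say.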

> What the fuck is going on here?? As far as I figured out this has something
> to do with MCE (CONFIG_X86_MCE=y, CONFIG_X86_MCE_NONFATAL=y) (?).
	Leave it enabled; it's a good thing to be told when you have bad hardware.
It's not a kernel problem, but a feature.
-------------------------
Eric Bambach
Eric at cisu dot net
-------------------------


* Re: 2.6: The hardware reports a non fatal, correctable incident occured on CPU 0.
  2004-01-09 23:12 ` Eric
@ 2004-01-09 23:30   ` Prakash K. Cheemplavam
  2004-01-10 17:16   ` lkml
  1 sibling, 0 replies; 7+ messages in thread
From: Prakash K. Cheemplavam @ 2004-01-09 23:30 UTC (permalink / raw)
  To: Eric; +Cc: lkml, linux-kernel

Eric wrote:
> On Friday 09 January 2004 10:48 am, lkml@nitwit.de wrote:
> 
>>Hi!
>>
>>I had some very scary issues today while playing with 2.6. The system was
>>booted and ran several times today; the longest uptime was about an hour.
>>
>>But then shortly after having booted 2.6 I got syslog messages:
>>
>>The hardware reports a non fatal, correctable incident occured on CPU 0.
>>
>>I shut down the machine. After this my Athlon XP 2200+ showed up as 1050MHz
>>in the BIOS, and indeed the bus frequency was set to 100 instead of 133 MHz
>>(how can an OS change the BIOS?!) - nevertheless the CPU should have shown up
>>as 1500MHz. I set it back to 133 MHz, after which the machine no longer even
>>reached the BIOS but rebooted automatically before getting there. I turned
>>off the machine for some seconds - no change. I turned it off for a few
>>minutes and the BIOS showed up again - with 1050MHz. So I had to set the
>>freq back to 133 MHz a second time. I booted my 2.4.21 kernel, which seems
>>to run.
> 
> 	Check your hardware: CPU/MOBO/RAM. Overheating? Bad RAM? Cheap mobo?
> MCE should not be triggered under any circumstances unless it is a kernel
> bug (RARE, I believe the MCE code is simple) or you REALLY have a hardware
> problem. As said before, the BIOS is resetting your FSB to 100 as a fail-safe
> because something bad happened.
> 	BTW, check your setup; an AMD 2200+ should run at 1.8 GHz, I believe. If you
> are setting your FSB or multiplier too low, that might also be triggering a
> problem. A quick Google search lists the AMD XP 2200+ as 1800 MHz.

Yes, I would also say that. With my Athlon XP 1700+ (1.466 GHz, FSB
133 MHz) clocked at 2.2GHz (FSB200) I get MCE errors, but not at 2.1GHz,
even though I can't find stability issues at 2.2GHz. Nevertheless I
run the system at 2.1GHz.
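The multipliers behind these clock figures are not stated, but they can be estimated by dividing the core clock by the bus clock. A minimal sketch, with a made-up helper name and the "133 MHz" bus taken as its nominal 133.33 MHz:

```python
def implied_multiplier(core_mhz, fsb_mhz):
    """Estimate the CPU multiplier from a core clock and a bus clock."""
    return core_mhz / fsb_mhz


# Stock XP 1700+: 1466 MHz on a nominal 133.33 MHz bus.
print(round(implied_multiplier(1466, 400 / 3), 1))  # 11.0

# Overclocked on a 200 MHz bus.
print(implied_multiplier(2200, 200))  # 11.0
print(implied_multiplier(2100, 200))  # 10.5
```

If these estimates hold, the difference between the setting that logs MCE errors and the one that does not is half a multiplier step on the same 200 MHz bus.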

Prakash


* Re: 2.6: The hardware reports a non fatal, correctable incident occured on CPU 0.
  2004-01-09 23:12 ` Eric
  2004-01-09 23:30   ` Prakash K. Cheemplavam
@ 2004-01-10 17:16   ` lkml
  2004-01-14  4:40     ` Dave Jones
  1 sibling, 1 reply; 7+ messages in thread
From: lkml @ 2004-01-10 17:16 UTC (permalink / raw)
  To: Eric, linux-kernel

On Saturday 10 January 2004 00:12, Eric wrote:
> 	Check your hardware: CPU/MOBO/RAM. Overheating? Bad RAM? Cheap mobo?
> MCE should not be triggered under any circumstances unless it is a kernel
> bug (RARE, I believe the MCE code is simple) or you REALLY have a hardware
> problem. As said before, the BIOS is resetting your FSB to 100 as a
> fail-safe because something bad happened.

Well, my system ran very stably before, and in the meantime it again runs
very stably on both 2.4.21 and Windows XP...

> 	BTW, check your setup; an AMD 2200+ should run at 1.8 GHz, I believe. If you

Yes.

> > What the fuck is going on here?? As far as I figured out this has
> > something to do with MCE (CONFIG_X86_MCE=y, CONFIG_X86_MCE_NONFATAL=y)
> > (?).
>
> 	Leave it enabled; it's a good thing to be told when you have bad hardware.
> It's not a kernel problem, but a feature.

Well, it is a good thing to tell me, but it's not a good thing to make my 
system auto-reset itself before reaching the BIOS afterwards...

timo



* Re: 2.6: The hardware reports a non fatal, correctable incident occured on CPU 0.
  2004-01-10 17:16   ` lkml
@ 2004-01-14  4:40     ` Dave Jones
  0 siblings, 0 replies; 7+ messages in thread
From: Dave Jones @ 2004-01-14  4:40 UTC (permalink / raw)
  To: lkml; +Cc: Eric, linux-kernel

On Sat, Jan 10, 2004 at 06:16:22PM +0100, lkml@nitwit.de wrote:

 > > 	Check your hardware: CPU/MOBO/RAM. Overheating? Bad RAM? Cheap mobo?
 > > MCE should not be triggered under any circumstances unless it is a kernel
 > > bug (RARE, I believe the MCE code is simple) or you REALLY have a hardware
 > > problem. As said before, the BIOS is resetting your FSB to 100 as a
 > > fail-safe because something bad happened.
 > 
 > Well, my system ran very stably before, and in the meantime it again runs
 > very stably on both 2.4.21 and Windows XP...

Neither of which checks for the presence of these errors.

 > > > What the fuck is going on here?? As far as I figured out this has
 > > > something to do with MCE (CONFIG_X86_MCE=y, CONFIG_X86_MCE_NONFATAL=y)
 > > > (?).
 > >
 > > 	Leave it enabled; it's a good thing to be told when you have bad hardware.
 > > It's not a kernel problem, but a feature.
 > 
 > Well, it is a good thing to tell me, but it's not a good thing to make my 
 > system auto-reset itself before reaching the BIOS afterwards...

The non-fatal MCE code doesn't do anything like that.  Any odd side-effects that you
observed were very likely due to whatever caused the MCE in the first place.


		Dave

