linux-kernel.vger.kernel.org archive mirror
* Machine Check Exception: 0...04
@ 2007-06-16 20:40 Mr. James W. Laferriere
  2007-06-17 12:08 ` Joachim Deguara
  2007-06-17 22:38 ` Mr. James W. Laferriere
  0 siblings, 2 replies; 4+ messages in thread
From: Mr. James W. Laferriere @ 2007-06-16 20:40 UTC
  To: Linux Kernel Maillist

 	Hello All ,  Does anyone know how to identify the cause of these(*) ?
 	Or know of any tools to help identify the cause ?
 	So far the machine checks only happen when I am running bonnie++ against
 	my software raid6 array .

 	I have done everything I know to do to attempt to ascertain what is
 	causing the machine checks .
 	i.e.:
  1)	memtest86+ for days ,  no errors .
  2)	cpuburnP6 ,  The tests run were 'cpuburnP6 E' & 'cpuburnP6 H' for ~60
 	minutes each .  All CPUs & HT were at 96-100% for 60+ minutes ,
 	no excessive heating or lockups .  In single user mode of course .
 	I know cpuburn is old but it can exercise the comms between L1 & CPU
 	and L1 & L2 -> CPU if done right .

(*)
CPU 5: Machine Check Exception: 0000000000000004
CPU 4: Machine Check Exception: 0000000000000004
Kernel panic - not syncing: Unable to continue
<system reboots>

root@(none):~ # uname -a
Linux (none) 2.6.21.5 #1 SMP Fri Jun 15 04:37:23 UTC 2007 i686 pentium4 i386 GNU/Linux

Complete serial console log of a boot to single user mode is here .

http://www.baby-dragons.com/test-2.6.21.5-mptscsi-4.00.10.00-2007006161326.log

 		Tia ,  JimL
-- 
+-----------------------------------------------------------------+
| James   W.   Laferriere | System   Techniques | Give me VMS     |
| Network        Engineer | 663  Beaumont  Blvd |  Give me Linux  |
| babydr@baby-dragons.com | Pacifica, CA. 94044 |   only  on  AXP |
+-----------------------------------------------------------------+



* Re: Machine Check Exception: 0...04
  2007-06-16 20:40 Machine Check Exception: 0...04 Mr. James W. Laferriere
@ 2007-06-17 12:08 ` Joachim Deguara
  2007-06-17 22:38 ` Mr. James W. Laferriere
  1 sibling, 0 replies; 4+ messages in thread
From: Joachim Deguara @ 2007-06-17 12:08 UTC
  To: Mr. James W. Laferriere; +Cc: Linux Kernel Maillist

On Saturday 16 June 2007 22:40:53 Mr. James W. Laferriere wrote:
>  	Hello All ,  Does anyone know how to identify the cause of these(*) ?
>  	Or know of any tools to help identify the cause ?
>  	So far the Machine checks only happen when I am running bonnie++ against
>  	my software raid6 array .

You should run mcelog to decode what the machine checks mean.  These do point
to a hardware problem.  Judging briefly from your description of when the
MCEs happen, I wonder if the power supply is getting maxed out driving all
of those disks.  Either way, use mcelog to find out what the MCEs are first.
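
For reference, the value on those lines can also be picked apart by hand.
A minimal sketch, assuming the number after "Machine Check Exception:" is
the raw IA32_MCG_STATUS register (that appears to be what the i386 P4
handler of this kernel era prints, but treat it as an assumption; nothing
in this thread confirms it).  The bit names are from the Intel SDM, and
decode_mcg_status is a hypothetical helper, not part of mcelog:

```python
# Bit positions of IA32_MCG_STATUS as documented in the Intel SDM.
# Assumption: the logged 64-bit value is this register.
MCG_STATUS_BITS = {
    0: "RIPV (restart IP valid)",
    1: "EIPV (error IP valid)",
    2: "MCIP (machine check in progress)",
}


def decode_mcg_status(value):
    """Return the names of the flags set in an MCG_STATUS value."""
    return [
        name
        for bit, name in sorted(MCG_STATUS_BITS.items())
        if value & (1 << bit)
    ]


# The value reported for CPU 4 and CPU 5 in the original mail.
print(decode_mcg_status(int("0000000000000004", 16)))
```

Under that assumption only MCIP is set.  RIPV and EIPV are both clear,
which would mean the saved instruction pointer is not guaranteed to point
at the instruction that triggered the error.  The per-bank status
registers carry the actual error details.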

-Joachim




* Re: Machine Check Exception: 0...04
  2007-06-16 20:40 Machine Check Exception: 0...04 Mr. James W. Laferriere
  2007-06-17 12:08 ` Joachim Deguara
@ 2007-06-17 22:38 ` Mr. James W. Laferriere
  2007-06-17 22:42   ` Jesper Juhl
  1 sibling, 1 reply; 4+ messages in thread
From: Mr. James W. Laferriere @ 2007-06-17 22:38 UTC
  To: Linux Kernel Maillist

 	Hello All ,  As a continuation .

On Sat, 16 Jun 2007, Mr. James W. Laferriere wrote:
> 	Hello All ,  Does anyone know how to identify the cause of these(*) ?
> 	Or know of any tools to help identify the cause ?
> 	So far the machine checks only happen when I am running bonnie++ against
> 	my software raid6 array .
>
> 	I have done everything I know to do to attempt to ascertain what is
> 	causing the machine checks .
> 	i.e.:
> 1)	memtest86+ for days ,  no errors .
> 2)	cpuburnP6 ,  The tests run were 'cpuburnP6 E' & 'cpuburnP6 H' for ~60
> 	minutes each .  All CPUs & HT were at 96-100% for 60+ minutes ,
> 	no excessive heating or lockups .  In single user mode of course .
> 	I know cpuburn is old but it can exercise the comms between L1 & CPU
> 	and L1 & L2 -> CPU if done right .
>
> (*)
> CPU 5: Machine Check Exception: 0000000000000004
> CPU 4: Machine Check Exception: 0000000000000004
> Kernel panic - not syncing: Unable to continue
> <system reboots>
>
> root@(none):~ # uname -a
> Linux (none) 2.6.21.5 #1 SMP Fri Jun 15 04:37:23 UTC 2007 i686 pentium4 i386 GNU/Linux
>
> Complete serial console log of a boot to single user mode is here .
>
> http://www.baby-dragons.com/test-2.6.21.5-mptscsi-4.00.10.00-2007006161326.log
>

 	Hopefully this is more useful information .  Again , it only happens
 	when I am doing HEAVY disk activity , i.e. bonnie++ .
 	I have not been able to find a tool to disassemble the EIP: portion of
 	the CPU 4: output .

eth2: after: tx_done_idx=125 free_idx=3 cmdsts=800005ea
CPU 5: Machine Check Exception: 0000000000000004
CPU 4: Machine Check Exception: 0000000000000004
CPU 4: EIP: c0100c72 EFLAGS: 00000246
         eax: 00000000 ebx: 00000000 ecx: 00000000 edx: 00000000
         esi: 00000000 edi: f7c58000 ebp: f7c59f68 esp: f7c59f5c
Kernel panic - not syncing: Unable to continue
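
One way to map an EIP like the one above to a kernel function is to look
it up in the System.map that was built with the running kernel.  A rough
sketch, assuming the usual "address type name" layout of System.map.  The
symbol names and addresses in SAMPLE_MAP are made up for illustration and
are NOT taken from this kernel:

```python
import bisect


def parse_system_map(lines):
    """Parse 'address type name' lines into a sorted list of (address, name)."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, addr):
    """Return (name, offset) for the last symbol at or below addr, or None."""
    addresses = [a for a, _ in symbols]
    i = bisect.bisect_right(addresses, addr) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    return name, addr - base


# Hypothetical entries; a real lookup would read the kernel's own System.map.
SAMPLE_MAP = [
    "c0100000 T hypothetical_func_a",
    "c0100c40 t hypothetical_func_b",
    "c0100ca0 T hypothetical_func_c",
]

symbols = parse_system_map(SAMPLE_MAP)
name, offset = resolve(symbols, 0xC0100C72)  # EIP from the CPU 4 line
print(f"{name}+0x{offset:x}")
```

For a real lookup, feed parse_system_map() the lines of the System.map
from the same build as the running kernel.  Once the function is known,
objdump -d on the matching vmlinux can show the instructions around that
offset.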

 		Tia ,  JimL


* Re: Machine Check Exception: 0...04
  2007-06-17 22:38 ` Mr. James W. Laferriere
@ 2007-06-17 22:42   ` Jesper Juhl
  0 siblings, 0 replies; 4+ messages in thread
From: Jesper Juhl @ 2007-06-17 22:42 UTC
  To: Mr. James W. Laferriere; +Cc: Linux Kernel Maillist

On 18/06/07, Mr. James W. Laferriere <babydr@baby-dragons.com> wrote:
>         Hello All ,  As a continuation .
>
> On Sat, 16 Jun 2007, Mr. James W. Laferriere wrote:
> >       Hello All ,  Does anyone know how to identify the cause of these(*) ?
> >       Or know of any tools to help identify the cause ?
> >       So far the machine checks only happen when I am running bonnie++ against
> >       my software raid6 array .
> >
> >       I have done everything I know to do to attempt to ascertain what is
> >       causing the machine checks .
> >       i.e.:
> > 1)    memtest86+ for days ,  no errors .
> > 2)    cpuburnP6 ,  The tests run were 'cpuburnP6 E' & 'cpuburnP6 H' for ~60
> >       minutes each .  All CPUs & HT were at 96-100% for 60+ minutes ,
> >       no excessive heating or lockups .  In single user mode of course .
> >       I know cpuburn is old but it can exercise the comms between L1 & CPU
> >       and L1 & L2 -> CPU if done right .
> >
> > (*)
> > CPU 5: Machine Check Exception: 0000000000000004
> > CPU 4: Machine Check Exception: 0000000000000004
> > Kernel panic - not syncing: Unable to continue
> > <system reboots>
> >

An MCE is an error reported by the hardware. It is most likely not a
software problem, so there is not much kernel people can do about it.

Google for "parsemce.c" to find a program that'll decode most MCEs for you.

You may also want to contact your hardware vendor to get an exact
explanation for the error.
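
To give an idea of what such a decoder works on, here is a minimal
sketch of splitting a per-bank IA32_MCi_STATUS value into its
architectural fields, using the bit layout from the Intel SDM.  This is
not how parsemce is implemented, decode_bank_status is a hypothetical
helper, and the sample value is invented since no bank status appears in
this thread:

```python
# Architectural flag bits of IA32_MCi_STATUS (Intel SDM).
BANK_FLAGS = [
    (63, "VAL"),    # register contents are valid
    (62, "OVER"),   # an earlier error was overwritten
    (61, "UC"),     # error was not corrected by hardware
    (60, "EN"),     # error reporting was enabled for this error
    (59, "MISCV"),  # the MISC register holds extra information
    (58, "ADDRV"),  # the ADDR register holds the error address
    (57, "PCC"),    # processor context may be corrupt
]


def decode_bank_status(status):
    """Split a 64-bit bank status value into flags and error codes."""
    return {
        "flags": [name for bit, name in BANK_FLAGS if (status >> bit) & 1],
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_error_code": status & 0xFFFF,
    }


# Invented example value, for illustration only.
print(decode_bank_status(0xB200000000000136))
```

The meaning of the MCA error code and the model-specific code depends on
the CPU model, which is where the vendor's documentation comes in.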

-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

