* [U-Boot] AMCC 405EX Trap
@ 2009-04-29 19:45 Jonathan Haws
  2009-04-29 23:37 ` Grant Erickson
  0 siblings, 1 reply; 14+ messages in thread
From: Jonathan Haws @ 2009-04-29 19:45 UTC (permalink / raw)
  To: u-boot

All,

I am experiencing a machine check on a custom AMCC 405EX PPC board.  Our board is based on the AMCC Kilauea evaluation board.  We have a few of these boards that are up and running, but I am trying to track down a machine check error on a couple of them.

My question for you is this:  when the registers are printed to the console, there is one called TRAP.  I want to know how/where/when and with what data that gets populated.  I have read through the AMCC manuals a couple of times trying to find it and have searched through the U-Boot code to no avail.  All I know is that there is a data type "struct pt_regs*" that contains all that data, but nowhere can I find where it is populated.

Below is the console output.  The line "!!!! PAUSE !!!!" was inserted by me after I copied the text from the console to remind me of the ~20 second pause that occurs at that point.

I am hoping that someone can point me to the bit definitions for whatever register is being displayed in TRAP.  From there, I think I can trace the problem back to the specific piece of hardware and get it fixed.

Thanks!

Jonathan




U-Boot 1.3.4 (Apr 28 2009 - 16:10:06)

CPU:   AMCC PowerPC 405EX Rev. C at 400 MHz (PLB=200, OPB=100, EBC=100 MHz)
       Security support
       Bootstrap Option C - Boot ROM Location EBC (16 bits)
       16 kB I-Cache 16 kB D-Cache
Board: SDLPPC - RT PPC405EX Board
I2C:   ready
DRAM:  256 MB
Reserving 16384k for kernel logbuffer at 0fffb000
Top of RAM usable for U-Boot at: 0fffb000
Reserving 306k for U-Boot at: 0ffae000
Reserving 1040k for malloc() at: 0feaa000
Reserving 124 Bytes for Board Info at: 0fea9f84
Reserving 64 Bytes for Global Data at: 0fea9f44
Stack Pointer at: 0fea9f28
New Stack Pointer is: 0fea9f28
 !!!! PAUSE !!!!
Now running in RAM - U-Boot at: 0ffae000
 -> Initializing logBuff pointers...
 -> Calling post_output_backlog()...
 -> Calling post_reloc()...
 -> Sync'ing CPU...
 -> Setting up trap handlers...
Bus Fault @ 0x00000000, fixup 0x00000000
Machine Check Exception.
Caused by (from msr): regs 0fea9de8 Instruction Synchronous Machine Check exception
NIP: 00000000 XER: 20000000 LR: 0FFB071C REGS: 0fea9de8 TRAP: 0200 DEAR: 00000000
MSR: 00000000 EE: 0 PR: 0 FP: 0 ME: 0 IR/DR: 00

GPR00: 0FFB039C 0FEA9ED8 0FEA9F44 0FFAE000 0FFB3D6C 00000001 00000001 00021000
GPR08: 00000600 00002098 17D78400 00000002 2BA7DEF3 FFFFFFFF 0FFF1D00 1000E000
GPR16: 775DF377 FFFFF6FF FFFFFFFF FF5FF7FF FFFFFFFF FFFFFFFF FFDFFFFF FF7FFFFF
GPR24: 0000A000 0FEA9F44 0FFAE000 0FEA9F84 0FEAE000 0FEA9F84 0FFF1ED8 00000000
Call backtrace:
0FFB3D64 0FFB06A4
machine check
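
A note on reading the dump above: in U-Boot's PowerPC exception code, TRAP does not appear to be a hardware register. The exception entry code stores the offset of the exception vector that was taken into the saved register frame (struct pt_regs), so "TRAP: 0200" corresponds to the machine check vector at offset 0x200, which agrees with the "Machine Check Exception" message. The sketch below is a host-side Python helper, not part of U-Boot. It parses the two register lines and names the vector. The vector table is a partial list assumed from the classic PowerPC 4xx layout, and the names parse_dump and describe are made up for illustration; the 405EX user's manual remains the authoritative reference.

```python
import re

# Partial vector-offset table (assumption: classic PowerPC 4xx layout).
VECTORS = {
    0x0100: "Critical input",
    0x0200: "Machine check",
    0x0300: "Data storage",
    0x0400: "Instruction storage",
    0x0500: "External interrupt",
    0x0600: "Alignment",
    0x0700: "Program",
    0x0C00: "System call",
}

# MSR bit masks for the fields shown in the dump (EE, PR, FP, ME, IR, DR).
MSR_BITS = {
    "EE": 0x8000, "PR": 0x4000, "FP": 0x2000,
    "ME": 0x1000, "IR": 0x0020, "DR": 0x0010,
}

DUMP = """\
NIP: 00000000 XER: 20000000 LR: 0FFB071C REGS: 0fea9de8 TRAP: 0200 DEAR: 00000000
MSR: 00000000 EE: 0 PR: 0 FP: 0 ME: 0 IR/DR: 00
"""


def parse_dump(text):
    """Return {name: value} for every 'NAME: <4-8 hex digits>' pair in text."""
    pairs = re.findall(r"\b([A-Z]+): ([0-9A-Fa-f]{4,8})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def describe(regs):
    """Return (vector name, {MSR flag: 0 or 1}) for a parsed dump."""
    trap = regs["TRAP"]
    msr = regs["MSR"]
    flags = {name: int(bool(msr & mask)) for name, mask in MSR_BITS.items()}
    return VECTORS.get(trap, "unknown vector 0x%04X" % trap), flags


if __name__ == "__main__":
    name, flags = describe(parse_dump(DUMP))
    print("TRAP vector:", name)
    print("MSR flags:", flags)
```

Because TRAP only says which vector was taken, it cannot by itself point at a failing piece of hardware; the machine-check and bus-error status registers are the place to look for that.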

--
Jonathan R. Haws
Electrical Engineering
Space Dynamics Laboratory
 
Jonathan.Haws at sdl.usu.edu
(435)797-4629


* [U-Boot] AMCC 405EX Trap
  2009-04-29 19:45 [U-Boot] AMCC 405EX Trap Jonathan Haws
@ 2009-04-29 23:37 ` Grant Erickson
  2009-04-30 14:34   ` Jonathan Haws
  0 siblings, 1 reply; 14+ messages in thread
From: Grant Erickson @ 2009-04-29 23:37 UTC (permalink / raw)
  To: u-boot

On 4/29/09 12:45 PM, Jonathan Haws wrote:
> I am experiencing a machine check on a custom AMCC 405EX PPC board.  Our board
> is based on the AMCC Kilauea evaluation board.  We have a few of these boards
> that are up and running, but I am trying to track down a machine check error
> on a couple of them.
> 
> My question for you is this:  when the registers are printed to the console,
> there is one called TRAP.  I want to know how/where/when and with what data
> that gets populated.  I have read through the AMCC manuals a couple of times
> trying to find it and have searched through the U-Boot code to no avail.  All
> I know is that there is a data type "struct pt_regs*" that contains all that
> data, but nowhere can I find where it is populated.
> 
> Below is the console output.  The line "!!!! PAUSE !!!!" was inserted by me
> after I copied the text from the console to remind me of the ~20 second pause
> that occurs at that point.
> 
> I am hoping that someone can point me to the bit definitions for whatever
> register is being displayed in TRAP.  From there, I think I can trace the
> problem back to the specific piece of hardware and get it fixed.

Jonathan:

Typically machine checks such as this are latent and are more about
something that happened earlier during bootstrap and initialization rather
than something that happened at the time the machine check was actually
realized. This is because up until that point, exceptions have not been
enabled.

The first thing to check is your u-boot board configuration file. Are all
EBC settings correct? Are all SDRAM settings correct? Are you using the
right addresses and chip selects for data cache bootstrapping?

Beyond that, it might be useful to single step with your BDI/GDB (or other
debugger) from start.S forward, watching key exception registers after every
step.

To assist with such debugging, I defined the following macro in my .gdbinit
file to dump relevant registers after every single step:

    .gdbinit:
        define dumpexcregs
            monitor rd msr
            monitor rd esr
            monitor rd dear
            monitor rd srr0
            monitor rd srr1
            monitor rd srr2
            monitor rd srr3
            monitor rd mcsr
            monitor rd mcar
            monitor rd mcsrr0
            monitor rd mcsrr1
            monitor rd ebc_besr0
            monitor rd ebc_besr1
            monitor rd sdram_besr0
            monitor rd sdram_besr0
            monitor rd sdram_bearl
            monitor rd sdram_bearh
        end

Regards,

Grant Erickson


* [U-Boot] AMCC 405EX Trap
  2009-04-29 23:37 ` Grant Erickson
@ 2009-04-30 14:34   ` Jonathan Haws
  2009-05-04  7:53     ` Stefan Roese
  0 siblings, 1 reply; 14+ messages in thread
From: Jonathan Haws @ 2009-04-30 14:34 UTC (permalink / raw)
  To: u-boot

Grant,

Thanks for the reply.

I am certain that it is a hardware failure that is causing the machine check because I can use the exact same binary on another (identical) board and have it boot just fine.  That tells me that all the EBC and SDRAM settings are correct; and that I am using the right addresses and chip selects for the data cache.

Currently I am leaning toward an SDRAM problem because I get about a 20 second pause when U-Boot tries to relocate to RAM.

Again, thanks for the reply.

Jonathan




* [U-Boot] AMCC 405EX Trap
  2009-04-30 14:34   ` Jonathan Haws
@ 2009-05-04  7:53     ` Stefan Roese
  2009-05-04 14:43       ` Jonathan Haws
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Roese @ 2009-05-04  7:53 UTC (permalink / raw)
  To: u-boot

On Thursday 30 April 2009, Jonathan Haws wrote:
> I am certain that it is a hardware failure that is causing the machine
> check because I can use the exact same binary on another (identical) board
> and have it boot just fine.  That tells me that all the EBC and SDRAM
> settings are correct;

From my experience you can't be sure that the SDRAM settings are "correct" at
this stage.

> and that I am using the right addresses and chip 
> selects for the data cache.
>
> Currently I am leaning toward an SDRAM problem because I get about a 20
> second pause when U-Boot tries to relocate to RAM.

Yes, I'm pretty sure that you have some SDRAM related problems. Either 
configuration is non-optimal, or even (perhaps more unlikely) a hardware 
problem. I suggest that you re-check the DDR2 autocalibration (method A & B).

Best regards,
Stefan

=====================================================================
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: +49-8142-66989-0 Fax: +49-8142-66989-80  Email: office at denx.de
=====================================================================


* [U-Boot] AMCC 405EX Trap
  2009-05-04  7:53     ` Stefan Roese
@ 2009-05-04 14:43       ` Jonathan Haws
  2009-05-04 14:53         ` Grant Erickson
                           ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Jonathan Haws @ 2009-05-04 14:43 UTC (permalink / raw)
  To: u-boot

> On Thursday 30 April 2009, Jonathan Haws wrote:
> > I am certain that it is a hardware failure that is causing the machine
> > check because I can use the exact same binary on another (identical) board
> > and have it boot just fine.  That tells me that all the EBC and SDRAM
> > settings are correct;
> 
> From my experience you can't be sure that the SDRAM settings are "correct" at
> this stage.

Would that be the case on our other 6 boards then?  We have 6 boards that are up and running with the exact same U-Boot binary file.  If there was a problem with the SDRAM settings on one board, would not the other board show the same symptoms?  That is the reason why I have not dug deeper into the SDRAM initialization.  However, I will take your advice and do so, because if there is a problem there, then the other boards may be experiencing problems, just not to the extent that this one is.

> 
> > and that I am using the right addresses and chip
> > selects for the data cache.
> >
> > Currently I am leaning toward an SDRAM problem because I get about a 20
> > second pause when U-Boot tries to relocate to RAM.
> 
> Yes, I'm pretty sure that you have some SDRAM related problems. Either
> configuration is non-optimal, or even (perhaps more unlikely) a hardware
> problem. I suggest that you re-check the DDR2 autocalibration (method A &
> B).
> 

Thanks for confirming my initial hunch - problem lies in SDRAM.  I will let you know what I find - whether it is a hardware or software problem.

One thing I may mention is that we had the SDRAM chips re-balled before they were mounted on the board.  Maybe something went wrong during that process on these chips on the problem board - who knows.

Anyway, once I get this resolved I will post the solution.

Thanks all!

Jonathan


* [U-Boot] AMCC 405EX Trap
  2009-05-04 14:43       ` Jonathan Haws
@ 2009-05-04 14:53         ` Grant Erickson
  2009-05-04 14:56         ` Stefan Roese
  2009-05-04 14:58         ` Jerry Van Baren
  2 siblings, 0 replies; 14+ messages in thread
From: Grant Erickson @ 2009-05-04 14:53 UTC (permalink / raw)
  To: u-boot

On 5/4/09 7:43 AM, Jonathan Haws wrote:
>> On Thursday 30 April 2009, Jonathan Haws wrote:
>>> I am certain that it is a hardware failure that is causing the machine
>>> check because I can use the exact same binary on another (identical)
>>> board and have it boot just fine.  That tells me that all the EBC and SDRAM
>>> settings are correct;
>> 
>> From my experience you can't be sure that the SDRAM settings are "correct" at
>> this stage.
> 
> Would that be the case on our other 6 boards then?  We have 6 boards that are
> up and running with the exact same U-Boot binary file.  If there was a problem
> with the SDRAM settings on one board, would not the other board show the same
> symptoms?  That is the reason why I have not dug deeper into the SDRAM
> initialization.  However, I will take your advice and do so, because if there
> is a problem there, then the other boards may be experiencing problems, just
> not to the extent that this one is.

Have these additional six boards passed four corners testing with an
intensive and exhaustive memory diagnostic?

Regards,

Grant


* [U-Boot] AMCC 405EX Trap
  2009-05-04 14:43       ` Jonathan Haws
  2009-05-04 14:53         ` Grant Erickson
@ 2009-05-04 14:56         ` Stefan Roese
  2009-05-04 15:01           ` Jonathan Haws
  2009-05-04 14:58         ` Jerry Van Baren
  2 siblings, 1 reply; 14+ messages in thread
From: Stefan Roese @ 2009-05-04 14:56 UTC (permalink / raw)
  To: u-boot

On Monday 04 May 2009, Jonathan Haws wrote:
> > From my experience you can't be sure that the SDRAM settings are "correct"
> > at this stage.
>
> Would that be the case on our other 6 boards then?  We have 6 boards that
> are up and running with the exact same U-Boot binary file.  If there was a
> problem with the SDRAM settings on one board, would not the other board
> show the same symptoms?

Not necessarily. Some SDRAM-related problems show up only very rarely or only
under specific conditions (temperature, component differences, etc.).
This might explain why some boards show no problems and others do.

> That is the reason why I have not dug deeper into 
> the SDRAM initialization.  However, I will take your advice and do so,
> because if there is a problem there, then the other boards may be
> experiencing problems, just not to the extent that this one is.
>
> > > and that I am using the right addresses and chip
> > > selects for the data cache.
> > >
> > > Currently I am leaning toward an SDRAM problem because I get about a 20
> > > second pause when U-Boot tries to relocate to RAM.
> >
> > Yes, I'm pretty sure that you have some SDRAM related problems. Either
> > configuration is non-optimal, or even (perhaps more unlikely) a hardware
> > problem. I suggest that you re-check the DDR2 autocalibration (method A &
> > B).
>
> Thanks for confirming my initial hunch - problem lies in SDRAM.  I will let
> you know what I find - whether it is a hardware or software problem.
>
> One thing I may mention is that we had the SDRAM chips re-balled before
> they were mounted on the board.  Maybe something went wrong during that
> process on these chips on the problem board - who knows.

I see. This could be a problem.

I suggest that you run some stress tests in a conditioning cabinet to see if 
the other boards don't show any problems.

Best regards,
Stefan



* [U-Boot] AMCC 405EX Trap
  2009-05-04 14:43       ` Jonathan Haws
  2009-05-04 14:53         ` Grant Erickson
  2009-05-04 14:56         ` Stefan Roese
@ 2009-05-04 14:58         ` Jerry Van Baren
  2009-05-04 15:05           ` Jonathan Haws
  2 siblings, 1 reply; 14+ messages in thread
From: Jerry Van Baren @ 2009-05-04 14:58 UTC (permalink / raw)
  To: u-boot

Jonathan Haws wrote:
>> On Thursday 30 April 2009, Jonathan Haws wrote:
>>> I am certain that it is a hardware failure that is causing the machine
>>> check because I can use the exact same binary on another (identical) board
>>> and have it boot just fine.  That tells me that all the EBC and SDRAM
>>> settings are correct;
>> From my experience you can't be sure that the SDRAM settings are "correct" at
>> this stage.
> 
> Would that be the case on our other 6 boards then?  We have 6 boards
> that are up and running with the exact same U-Boot binary file.  If
> there was a problem with the SDRAM settings on one board, would not
> the other board show the same symptoms?  That is the reason why I
> have not dug deeper into the SDRAM initialization.  However, I will
> take your advice and do so, because if there is a problem there, then
> the other boards may be experiencing problems, just not to the extent
> that this one is.
> 
>>> and that I am using the right addresses and chip
>>> selects for the data cache.
>>>
>>> Currently I am leaning toward an SDRAM problem because I get about a 20
>>> second pause when U-Boot tries to relocate to RAM.
>> Yes, I'm pretty sure that you have some SDRAM related problems. Either
>> configuration is non-optimal, or even (perhaps more unlikely) a hardware
>> problem. I suggest that you re-check the DDR2 autocalibration (method A &
>> B).
>>
> 
> Thanks for confirming my initial hunch - problem lies in SDRAM.  I
> will let you know what I find - whether it is a hardware or software
> problem.
> 
> One thing I may mention is that we had the SDRAM chips re-balled
> before they were mounted on the board.  Maybe something went wrong
> during that process on these chips on the problem board - who knows.
> 
> Anyway, once I get this resolved I will post the solution.
> 
> Thanks all!
> 
> Jonathan

1) Six boards work, one board fails.
2) SDRAM chips re-balled on the failing board.
3) SDRAM failing.

That sounds like a hardware/assembly problem to me.  My bet is a solder
problem.  Have you x-rayed (or can you x-ray) the chips to verify that the
SDRAM soldering is OK?

Best regards,
gvb


* [U-Boot] AMCC 405EX Trap
  2009-05-04 14:56         ` Stefan Roese
@ 2009-05-04 15:01           ` Jonathan Haws
  2009-05-04 15:08             ` Stefan Roese
  0 siblings, 1 reply; 14+ messages in thread
From: Jonathan Haws @ 2009-05-04 15:01 UTC (permalink / raw)
  To: u-boot

> I suggest that you run some stress tests in a conditioning cabinet to see
> if the other boards don't show any problems.

That is a good idea.  I haven't thought of performing those tests.  Are there specific tests I can enable in the U-Boot environment for that?  

We have been using a couple of these boards extensively and in some pretty loaded configurations.  For example, one board has been used as a data capture system to capture gigabytes of data over a network connection.  That uses RAM extensively before it actually writes it out to disk.  However, we have not run any sort of extensive memory diagnostics to check all parts of RAM.  That is next on my list.

Thanks!

Jonathan


* [U-Boot] AMCC 405EX Trap
  2009-05-04 14:58         ` Jerry Van Baren
@ 2009-05-04 15:05           ` Jonathan Haws
  0 siblings, 0 replies; 14+ messages in thread
From: Jonathan Haws @ 2009-05-04 15:05 UTC (permalink / raw)
  To: u-boot


> 1) Six boards work, one board fails.
> 2) SDRAM chips re-balled on the failing board.
> 3) SDRAM failing.
> 
> That sounds like a hardware/assembly problem to me.  My bet is a solder
> problem.  Have you x-rayed (or can you x-ray) the chips to verify that the
> SDRAM soldering is OK?

That was my initial hunch simply because of the 6 working boards.  Our hardware designer took the board in for x-ray this morning to see if there is an issue there.  If that is the problem, then we are set - though I still plan to run some extensive diagnostics on the memory just to be sure.

Good to hear that someone has the same hunch as I did!

Jonathan


* [U-Boot] AMCC 405EX Trap
  2009-05-04 15:01           ` Jonathan Haws
@ 2009-05-04 15:08             ` Stefan Roese
  2009-05-04 15:12               ` Grant Erickson
  2009-05-04 18:19               ` Wolfgang Denk
  0 siblings, 2 replies; 14+ messages in thread
From: Stefan Roese @ 2009-05-04 15:08 UTC (permalink / raw)
  To: u-boot

On Monday 04 May 2009, Jonathan Haws wrote:
> > I suggest that you run some stress tests in a conditioning cabinet to see
> > if the other boards don't show any problems.
>
> That is a good idea.  I haven't thought of performing those tests.  Are
> there specific tests I can enable in the U-Boot environment for that?

Perhaps the memory tests from the POST infrastructure. But from my experience 
a realworld application running under Linux is a good test. For example 
compiling a Linux kernel in a loop. Perhaps mounted via NFS. Something like 
this should fail at some time when SDRAM related problems exist.

> We have been using a couple of these boards extensively and in some pretty
> loaded configurations.  For example, one board has been used as a data
> capture system to capture gigabytes of data over a network connection. 
> That uses RAM extensively before it actually writes it out to disk. 

That's good. Which OS was used here? Linux?

But Jerry's note about x-raying the problematic board is a good idea.

Best regards,
Stefan



* [U-Boot] AMCC 405EX Trap
  2009-05-04 15:08             ` Stefan Roese
@ 2009-05-04 15:12               ` Grant Erickson
  2009-05-04 15:19                 ` Jonathan Haws
  2009-05-04 18:19               ` Wolfgang Denk
  1 sibling, 1 reply; 14+ messages in thread
From: Grant Erickson @ 2009-05-04 15:12 UTC (permalink / raw)
  To: u-boot

On 5/4/09 8:08 AM, Stefan Roese wrote:
> On Monday 04 May 2009, Jonathan Haws wrote:
>>> I suggest that you run some stress tests in a conditioning cabinet to see
>>> if the other boards don't show any problems.
>> 
>> That is a good idea.  I haven't thought of performing those tests.  Are
>> there specific tests I can enable in the U-Boot environment for that?
> 
> Perhaps the memory tests from the POST infrastructure. But from my experience
> a realworld application running under Linux is a good test. For example
> compiling a Linux kernel in a loop. Perhaps mounted via NFS. Something like
> this should fail at some time when SDRAM related problems exist.

Agreed that real world application tests can be sufficiently abusive to
surface problems.

However, a side benefit of a non-application, exhaustive diagnostic is the
attendant reporting that goes with such a test that can identify particular
data patterns or addresses that fail giving better insight into the true
nature of the problem.
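
As a rough illustration of that kind of reporting, here is a minimal Python simulation, not a U-Boot test and not code from this thread. It models a RAM with an optional stuck-low data bit or a dead (always 0) address line and shows how a walking-ones data test and a simple power-of-two address test name the failing line. FaultyRam, walking_ones and address_lines are hypothetical names, the address test only catches address lines stuck low, and its result is only meaningful once the data-bus test passes.

```python
class FaultyRam:
    """Simulated 32-bit-wide RAM with optional injected wiring faults."""

    def __init__(self, words, stuck_low_bit=None, dead_addr_bit=None):
        self.cells = [0] * words
        self.stuck_low_bit = stuck_low_bit    # data bit that always reads 0
        self.dead_addr_bit = dead_addr_bit    # address bit that is always 0

    def _index(self, addr):
        if self.dead_addr_bit is not None:
            addr &= ~(1 << self.dead_addr_bit)
        return addr

    def write(self, addr, value):
        self.cells[self._index(addr)] = value & 0xFFFFFFFF

    def read(self, addr):
        value = self.cells[self._index(addr)]
        if self.stuck_low_bit is not None:
            value &= ~(1 << self.stuck_low_bit)
        return value


def walking_ones(ram, addr=0, width=32):
    """Return the data bit numbers that do not read back as written."""
    bad = []
    for bit in range(width):
        pattern = 1 << bit
        ram.write(addr, pattern)
        if ram.read(addr) != pattern:
            bad.append(bit)
    return bad


def address_lines(ram, words):
    """Return address bits whose power-of-two offset aliases onto word 0.

    `words` must be a power of two. Run walking_ones() first: a bad data
    bit would make every comparison here fail.
    """
    marker, anti = 0xAAAAAAAA, 0x55555555
    bad = []
    for bit in range(words.bit_length() - 1):
        ram.write(0, marker)
        ram.write(1 << bit, anti)        # lands on word 0 if this line is dead
        if ram.read(0) != marker:
            bad.append(bit)
    return bad


if __name__ == "__main__":
    print("bad data bits:", walking_ones(FaultyRam(1024, stuck_low_bit=5)))
    print("bad address bits:", address_lines(FaultyRam(1024, dead_addr_bit=3), 1024))
```

A report that names one data or address line is the kind of evidence that can be compared against the ball-out of a specific SDRAM device when looking for a solder fault.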

Regards,

Grant


* [U-Boot] AMCC 405EX Trap
  2009-05-04 15:12               ` Grant Erickson
@ 2009-05-04 15:19                 ` Jonathan Haws
  0 siblings, 0 replies; 14+ messages in thread
From: Jonathan Haws @ 2009-05-04 15:19 UTC (permalink / raw)
  To: u-boot

> On 5/4/09 8:08 AM, Stefan Roese wrote:
> > On Monday 04 May 2009, Jonathan Haws wrote:
> >>> I suggest that you run some stress tests in a conditioning cabinet to
> >>> see if the other boards don't show any problems.
> >>
> >> That is a good idea.  I haven't thought of performing those tests.  Are
> >> there specific tests I can enable in the U-Boot environment for that?
> >
> > Perhaps the memory tests from the POST infrastructure. But from my
> > experience a realworld application running under Linux is a good test.
> > For example compiling a Linux kernel in a loop. Perhaps mounted via NFS.
> > Something like this should fail at some time when SDRAM related problems
> > exist.
> 
> Agreed that real world application tests can be sufficiently abusive to
> surface problems.
> 
> However, a side benefit of a non-application, exhaustive diagnostic is the
> attendant reporting that goes with such a test that can identify particular
> data patterns or addresses that fail giving better insight into the true
> nature of the problem.

I agree with Grant on this point.  If the x-rays do not show anything, then I believe that there is something in the chips that is causing the problem - which a memory diagnostic would show well.  And if these chips are having issues and the others on the other boards are from the same lot, then there could be issues not cropping up on the other boards.

Also, Stefan, to answer your question about OS - we are using VxWorks as the OS.

Thanks again!

Jonathan


* [U-Boot] AMCC 405EX Trap
  2009-05-04 15:08             ` Stefan Roese
  2009-05-04 15:12               ` Grant Erickson
@ 2009-05-04 18:19               ` Wolfgang Denk
  1 sibling, 0 replies; 14+ messages in thread
From: Wolfgang Denk @ 2009-05-04 18:19 UTC (permalink / raw)
  To: u-boot

In message <200905041708.59991.sr@denx.de> Stefan Roese wrote:
> On Monday 04 May 2009, Jonathan Haws wrote:
...
> > That is a good idea.  I haven't thought of performing those tests.  Are
> > there specific tests I can enable in the U-Boot environment for that?
> 
> Perhaps the memory tests from the POST infrastructure. But from my experience 
> a realworld application running under Linux is a good test. For example 
> compiling a Linux kernel in a loop. Perhaps mounted via NFS. Something like 
> this should fail at some time when SDRAM related problems exist.

Stefan is right. The memory tests in U-Boot all boil down to plain
read-/write-cycles on the bus. This is nothing compared to the stress
you put on the memory system when you have back-to-back burst mode
accesses. To get these, you need a combination of cache flushes (such
as in an OS when it is context-switching), cache loads (like
instruction fetches when lots of different code is being executed),
and DMA (like when you have heavy network traffic or another active
DMA device). Booting Linux with the root file system over NFS is the
easiest and one of the most reliable stress tests I know of.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
The following statement is not true.  The previous statement is true.
