* imx6 silent memory corruption
From: Nikolay Dimitrov @ 2015-01-22 21:25 UTC
  To: meta-freescale

Hi folks,

I'm observing behavior on my system that I think is silent memory
corruption. My setup is an imx6d with the 3.10.17-1.0.0ga kernel on
daisy. Here's a simple script that should never fail:


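#!/bin/sh
# Abort on the first failed verification.
set -e

# Re-verify the same file 100 times; the hash should match every time.
for i in $(seq 1 100); do
    echo "Test $i"
    sha256sum -c test.dat.sha256
done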


test.dat contains 64MB of random data, and test.dat.sha256 contains its
SHA256 hash.

The files are copied to /tmp and the test is run there, to avoid unwanted
interactions with most of the filesystem drivers and storage devices.
What actually happens is that sooner or later the hash verification
fails, which should not happen (and doesn't happen on my x64
workstation). When the verification fails, there are no errors in
the system log.
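
For reference, the test files described above could be prepared along
these lines. This is a minimal sketch, not the commands actually used:
the helper name make_test_data is hypothetical, and it assumes /tmp is
RAM-backed (tmpfs), which the description suggests but does not state.

```shell
#!/bin/sh
# make_test_data DIR SIZE_MB  (hypothetical helper, for illustration only)
# Creates DIR/test.dat filled with random data and DIR/test.dat.sha256.
make_test_data() {
    dir=$1
    size_mb=$2
    # Random data is incompressible and exercises varied bit patterns.
    # bs is given in bytes (1 MiB) to avoid relying on size suffixes.
    dd if=/dev/urandom of="$dir/test.dat" bs=1048576 count="$size_mb" 2>/dev/null
    # Record the checksum with a relative file name so that
    # "sha256sum -c test.dat.sha256" works when run from inside DIR.
    (cd "$dir" && sha256sum test.dat > test.dat.sha256)
}
```

For the setup described here, the call would be `make_test_data /tmp 64`,
followed by running the verification loop from inside /tmp.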

I also ran a similar test that compares the contents of 2 copies of the
same file in a loop, and again the comparison fails after some
iterations, with neither system log errors nor oopses.
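
The comparison test can be sketched as follows. The use of cmp and the
helper name compare_loop are assumptions; the message does not say which
tool was used for the comparison.

```shell
#!/bin/sh
# compare_loop FILE_A FILE_B ITERATIONS  (hypothetical helper)
# Compares two files repeatedly and stops at the first mismatch.
compare_loop() {
    a=$1
    b=$2
    n=$3
    i=1
    while [ "$i" -le "$n" ]; do
        # cmp -s prints nothing; a nonzero exit status means the files
        # differ or could not be read.
        if ! cmp -s "$a" "$b"; then
            echo "Mismatch on iteration $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "All $n comparisons matched"
}
```

Example use: `cp test.dat copy.dat && compare_loop test.dat copy.dat 100`.
Since both files stay unchanged, any reported mismatch means the data
changed somewhere between memory and the CPU.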

I would appreciate any ideas about what could be wrong with this
setup, and I'd also be happy to hear suggestions for similar
simple tests of system reliability.

Regards,
Nikolay


* Re: imx6 silent memory corruption
From: Fabio Estevam @ 2015-01-22 22:25 UTC
  To: Nikolay Dimitrov; +Cc: meta-freescale

On Thu, Jan 22, 2015 at 7:25 PM, Nikolay Dimitrov <picmaster@mail.bg> wrote:

> I would appreciate any ideas about what could be wrong with this
> setup, and I'd also be happy to hear suggestions for similar
> simple tests of system reliability.

Maybe you could try running the 'memtester' utility and see how your
board behaves.

Regards,

Fabio Estevam


* Re: imx6 silent memory corruption
From: Nikolay Dimitrov @ 2015-01-23 21:11 UTC
  To: Fabio Estevam; +Cc: meta-freescale

Hi Fabio,

On 01/23/2015 12:25 AM, Fabio Estevam wrote:
> Maybe you could try running the 'memtester' utility and see how your
> board behaves.

Thanks for the idea. I ran the tool and it also reports errors, but
this happens rarely (just like the hash test), and I'm still looking for
a way to reproduce the issue easily. Here's an example of a memory error:


# memtester 64M 100
memtester version 4.1.3 (32-bit)
Copyright (C) 2010 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffff000
want 64MB (67108864 bytes)
got  64MB (67108864 bytes), trying mlock ...locked.
Loop 1/100:
   Stuck Address       : ok
   Random Value        : ok
   Compare XOR         : FAILURE: 0xc3909006 != 0xc3909007 at offset 0x00291fac.
   Compare SUB         : ok
   Compare MUL         : ok
   Compare DIV         : ok
   Compare OR          : ok
   Compare AND         : ok
   Sequential Increment: ok
   Solid Bits          : ok
   Block Sequential    : ok
   Checkerboard        : ok
   Bit Spread          : ok
   Bit Flip            : ok
   Walking Ones        : ok
   Walking Zeroes      : ok


Memtester can run for hours without finding an issue, yet sometimes it
reports a memory error after running for only several minutes.
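
Because the failures are this rare, one way to catch them unattended is
to repeat a test until it first fails. The following is a minimal
sketch: run_until_fail is a hypothetical helper, and it assumes the
test command signals failure through a nonzero exit status (memtester
is expected to do so, but this is worth confirming on the build in use).

```shell
#!/bin/sh
# run_until_fail MAX_RUNS COMMAND [ARGS...]  (hypothetical helper)
# Repeats COMMAND until it exits nonzero or MAX_RUNS is reached.
run_until_fail() {
    max=$1
    shift
    run=1
    while [ "$run" -le "$max" ]; do
        # Capture stdout and stderr so the failing run's output is kept.
        if ! out=$("$@" 2>&1); then
            echo "Run $run failed:"
            echo "$out"
            return 1
        fi
        run=$((run + 1))
    done
    echo "No failure in $max runs"
}
```

Example use: `run_until_fail 1000 memtester 64M 1`, which would stop at
the first failing pass and print that pass's output.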

I found another tool, stressapptest
(http://stressapptest.googlecode.com/svn/trunk/), which also seems to
trigger the issue. Here's another example of a memory error:


# ./stressapptest --no_timestamps --printsec 60 -M 64 -s 300
Log: Commandline - ./stressapptest --no_timestamps --printsec 60 -M 64 
-s 300
Stats: SAT revision 1.0.7_autoconf, 32 bit binary
Log: picmaster @ riotboard on Fri Jan 23 20:48:49 EET 2015 from open 
source release
Log: 1 nodes, 2 cpus.
Log: Defaulting to 2 copy threads
Log: Flooring memory allocation to multiple of 4: 64MB
Log: Prefer plain malloc memory allocation.
Log: Using mmap() allocation at 0x72430000.
Stats: Starting SAT, 64M, 300 seconds
Log: region number 1 exceeds region count 1
Log: Region mask: 0x1
Log: Seconds remaining: 240
Log: Seconds remaining: 180
Report Error: miscompare : DIMM Unknown : 1 : 134s
Hardware Error: miscompare on CPU 1(0x2) at 0x74e93040(0x33f0d040:DIMM 
Unknown): read:0xaaaaaaaaaaaaaa8a, reread:0xaaaaaaaaaaaaaa8a 
expected:0xaaaaaaaaaaaaaaaa
Report Error: miscompare : DIMM Unknown : 1 : 136s
Hardware Error: miscompare on CPU 0(0x1) at 0x75528710(0x32270710:DIMM 
Unknown): read:0xffffffbfffffffbe, reread:0xffffffbfffffffbe 
expected:0xffffffbfffffffbf
Log: Seconds remaining: 120
Log: Seconds remaining: 60
Report Error: miscompare : DIMM Unknown : 1 : 266s
Hardware Error: miscompare on CPU 0(0x1) at 0x74b979d0(0x358ae9d0:DIMM 
Unknown): read:0x0000001000000000, reread:0x0000001000000000 
expected:0x0000001000000010
Report Error: miscompare : DIMM Unknown : 1 : 274s
Hardware Error: miscompare on CPU 0(0x1) at 0x73b4cfd0(0x35e8afd0:DIMM 
Unknown): read:0x0000001000000000, reread:0x0000001000000000 
expected:0x0000001000000010
Log: Thread 1 found 3 hardware incidents
Log: Thread 2 found 1 hardware incidents
Stats: Found 4 hardware incidents
Stats: Completed: 256346.00M in 300.03s 854.40MB/s, with 4 hardware 
incidents, 0 errors
Stats: Memory Copy: 256346.00M at 854.46MB/s
Stats: File Copy: 0.00M at 0.00MB/s
Stats: Net Copy: 0.00M at 0.00MB/s
Stats: Data Check: 0.00M at 0.00MB/s
Stats: Invert Data: 0.00M at 0.00MB/s
Stats: Disk: 0.00M at 0.00MB/s

Status: FAIL - test discovered HW problems
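
A quick way to characterize the miscompares logged above is to XOR each
expected value with the value actually read and count the differing
bits. The sketch below does that with shell arithmetic. flipped_bits is
a hypothetical helper; because the upper 32 bits match in every
stressapptest report, only the lower halves are passed in.

```shell
#!/bin/sh
# flipped_bits VALUE_A VALUE_B  (hypothetical helper)
# Prints the XOR mask of two 32-bit values and the number of set bits in it.
flipped_bits() {
    x=$(( $1 ^ $2 ))
    mask=$x
    count=0
    # Count the set bits by testing and shifting out the lowest bit.
    while [ "$x" -ne 0 ]; do
        count=$(( count + (x & 1) ))
        x=$(( x >> 1 ))
    done
    printf '0x%08x %d\n' "$mask" "$count"
}

flipped_bits 0xc3909007 0xc3909006   # memtester pair:       0x00000001 1
flipped_bits 0xaaaaaaaa 0xaaaaaa8a   # stressapptest, 134s:  0x00000020 1
flipped_bits 0xffffffbf 0xffffffbe   # stressapptest, 136s:  0x00000001 1
flipped_bits 0x00000010 0x00000000   # stressapptest, 266s and 274s: 0x00000010 1
```

Each pair differs in exactly one bit. That describes the pattern in
these logs; on its own it does not identify the cause.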


I plan to rerun the FSL DDR stress test to see whether it
detects issues with my DDR memory. My board uses a DDR3 SO-DIMM, and I'm
also thinking of trying another SO-DIMM module to see whether it makes
any difference.

Thanks for the ideas so far. This is a major problem for me so I need
to resolve it before doing anything else on this board.

Kind regards,
Nikolay


* Re: imx6 silent memory corruption
From: Doug Schwanke @ 2015-01-26 14:40 UTC
  To: Nikolay Dimitrov, Fabio Estevam; +Cc: meta-freescale

> -----Original Message-----
> From: meta-freescale-bounces@yoctoproject.org [mailto:meta-freescale-
> bounces@yoctoproject.org] On Behalf Of Nikolay Dimitrov
> Sent: Friday, January 23, 2015 3:11 PM
> To: Fabio Estevam
> Cc: meta-freescale@yoctoproject.org
> Subject: Re: [meta-freescale] imx6 silent memory corruption
> 
> I plan to rerun the FSL DDR stress test to see whether it
> detects issues with my DDR memory. My board uses a DDR3 SO-DIMM, and I'm
> also thinking of trying another SO-DIMM module to see whether it makes
> any difference.
> 
> Thanks for the ideas so far. This is a major problem for me so I need
> to resolve it before doing anything else on this board.
>

Have you read ERR005198 in the Chip Errata for the i.MX 6Dual/6Quad?
http://cache.freescale.com/files/32bit/doc/errata/IMX6DQCE.pdf

-Doug Schwanke



* Re: imx6 silent memory corruption
From: Nikolay Dimitrov @ 2015-01-27  8:40 UTC
  To: Doug Schwanke; +Cc: meta-freescale

Hi Doug,

On 01/26/2015 04:40 PM, Doug Schwanke wrote:
> Have you read ERR005198 in the Chip Errata for the i.MX 6Dual/6Quad?
> http://cache.freescale.com/files/32bit/doc/errata/IMX6DQCE.pdf

The issue is observed even when PL310 is disabled in the kernel
configuration.

Regards,
Nikolay


* Re: imx6 silent memory corruption
From: Nikolay Dimitrov @ 2015-01-27 16:27 UTC
  To: meta-freescale

Hi guys,

Just to share my progress so far: my board passed 500+ iterations of
DDR stress testing with the FSL tool (this bloody thing took 2 days to
complete!). I also tried the U-Boot integrated memory tests (both quick
and alt), and they didn't show any issues. So I can say that at least
the hardware and DDR settings seem to be OK.

In addition, I tried disabling SMP and L2 & L1 cache support in my
kernel to see what happens, but I can still observe the corruption. I
found that there's an embedded memory tester in the kernel, but only
for x86 (CONFIG_MEMTEST), so it's not useful for ARM.

I started to suspect that my 2 cross-build machines (x64 Debian stable)
might be producing broken kernels, so I did a test cross-build on Ubuntu
14.04, and even did a native build on riotboard, but the results are the
same: the issue can still be observed.

During boot and operation I sometimes see kernel oopses (as I remember,
most of them related to execve and filesystem reads), and sometimes apps
segfault. Kernel and/or toolchain issues are also possible, but I
checked that Daisy has the gcc-4.8+ PR58854 patch, so that's not very
likely, especially if other Yocto users don't report such issues.

A possible next step is to build the original FSL 3.10.17 version
without my local patches, and maybe even to build a minimal 3.18+
mainline kernel to see how it behaves in the same test case. Someone
could even joke that it's time to leave the dusty old Daisy and move
forward with Dizzy and 3.10.53.

Regards,
Nikolay


* Re: imx6 silent memory corruption
From: Otavio Salvador @ 2015-01-27 17:00 UTC
  To: Nikolay Dimitrov; +Cc: meta-freescale

On Tue, Jan 27, 2015 at 2:27 PM, Nikolay Dimitrov <picmaster@mail.bg> wrote:
> A possible next step is to build the original FSL 3.10.17 version
> without my local patches, and maybe even to build a minimal 3.18+
> mainline kernel to see how it behaves in the same test case. Someone
> could even joke that it's time to leave the dusty old Daisy and move
> forward with Dizzy and 3.10.53.

I think Dizzy is very solid. We have been using it with several
customers and have been very satisfied.


-- 
Otavio Salvador                             O.S. Systems
http://www.ossystems.com.br        http://code.ossystems.com.br
Mobile: +55 (53) 9981-7854            Mobile: +1 (347) 903-9750


* Re: imx6 silent memory corruption
From: Gonzalez, Alex @ 2015-01-27 17:40 UTC
  To: Otavio Salvador, Nikolay Dimitrov; +Cc: meta-freescale

Nikolay,

Assuming this is a custom board, if you are using different memory or with a different configuration from the reference designs, you may need to change the memory calibration to suit the hardware. Freescale support can provide an application note that explains how to perform the calibration over a representative sample of boards.

Alex


* Re: imx6 silent memory corruption
From: Nikolay Dimitrov @ 2015-01-27 20:23 UTC
  To: Gonzalez, Alex; +Cc: meta-freescale, Otavio Salvador

Hi Alex,

On 01/27/2015 07:40 PM, Gonzalez, Alex wrote:
> Nikolay,
>
> Assuming this is a custom board, if you are using different memory or with a different configuration from the reference designs, you may need to change the memory calibration to suit the hardware. Freescale support can provide an application note that explains how to perform the calibration over a representative sample of boards.

Yes, this is a custom board. I've already done the DDR3 calibration for
this design and validated it with 48h of testing with the Freescale DDR
stress test tool (available on the IMX Community site).

Kind regards,
Nikolay


* Re: imx6 silent memory corruption
From: Eric Bénard @ 2015-01-27 20:51 UTC
  To: Nikolay Dimitrov; +Cc: meta-freescale, Otavio Salvador

Hi Nikolay,

On Tue, 27 Jan 2015 22:23:15 +0200,
Nikolay Dimitrov <picmaster@mail.bg> wrote:
> Yes, this is a custom board. I've already done the DDR3 calibration for
> this design and validated it with 48h of testing with the Freescale DDR
> stress test tool (available on the IMX Community site).
>
Are you sure your power supplies are stable under load (not only the
memory supply)?

best regards
Eric


* Re: imx6 silent memory corruption
From: Nikolay Dimitrov @ 2015-01-27 22:35 UTC
  To: Eric Bénard; +Cc: meta-freescale, Otavio Salvador

Hi Eric,

On 01/27/2015 10:51 PM, Eric Bénard wrote:
> Are you sure your power supplies are stable under load (not only the
> memory supply)?

Hmm, very good question. I haven't attached a scope to the voltage
rails, but that can be done easily. What would you consider an
acceptable voltage noise amplitude on the CPU rails?

Kind regards,
Nikolay

PS: FYI - my board draws about 1-1.2A at 12V, and it's supplied by an
Instek PSP-2010, which is capable of delivering up to 10A (but the
protection is usually configured for 2A as a safety precaution). The
on-board SMPS is an MMPF0100, which supplies all the digital subsystems.


* Re: imx6 silent memory corruption
From: Eric Bénard @ 2015-01-27 22:59 UTC
  To: Nikolay Dimitrov; +Cc: meta-freescale, Otavio Salvador

Hi Nikolay,

On Wed, 28 Jan 2015 00:35:30 +0200,
Nikolay Dimitrov <picmaster@mail.bg> wrote:
> Hmm, very good question. I haven't attached a scope to the voltage
> rails, but that can be done easily. What would you consider an
> acceptable voltage noise amplitude on the CPU rails?
>
As small as possible, with no spikes under load, measured with the
probe as close as possible to the CPU and with a good GND reference.

Best regards
Eric


* Re: imx6 silent memory corruption
From: Nikolay Dimitrov @ 2015-01-28 16:18 UTC
  To: meta-freescale; +Cc: Otavio Salvador

Hi guys,

Here is the result of a quick build of 3.10.53 without any patches:

------------[ cut here ]------------
WARNING: at fs/inode.c:237 __destroy_inode+0x114/0x11c()
Modules linked in:
CPU: 0 PID: 245 Comm: udevd Not tainted 3.10.53-84287-ge133fbc #1
[<80013b00>] (unwind_backtrace+0x0/0xf4) from [<80011524>] (show_stack+0x10/0x14)
[<80011524>] (show_stack+0x10/0x14) from [<8002c538>] (warn_slowpath_common+0x54/0x6c)
[<8002c538>] (warn_slowpath_common+0x54/0x6c) from [<8002c5ec>] (warn_slowpath_null+0x1c/0x24)
[<8002c5ec>] (warn_slowpath_null+0x1c/0x24) from [<800dc920>] (__destroy_inode+0x114/0x11c)
[<800dc920>] (__destroy_inode+0x114/0x11c) from [<800dcd18>] (destroy_inode+0x1c/0x54)
[<800dcd18>] (destroy_inode+0x1c/0x54) from [<800d9420>] (d_kill+0xe8/0x120)
[<800d9420>] (d_kill+0xe8/0x120) from [<800d953c>] (dput+0xe4/0x1d4)
[<800d953c>] (dput+0xe4/0x1d4) from [<800c8340>] (__fput+0x110/0x1fc)
[<800c8340>] (__fput+0x110/0x1fc) from [<80046e90>] (task_work_run+0xb0/0xe8)
[<80046e90>] (task_work_run+0xb0/0xe8) from [<80011164>] (do_work_pending+0x98/0x9c)
[<80011164>] (do_work_pending+0x98/0x9c) from [<8000e0c0>] (work_pending+0xc/0x20)
---[ end trace f76af4a5875883da ]---

Maybe now the proper question is: is anyone else using nfsroot without 
issues?

Regards,
Nikolay


* Re: imx6 silent memory corruption
From: Fabio Estevam @ 2015-01-28 16:40 UTC
  To: Nikolay Dimitrov; +Cc: meta-freescale, Otavio Salvador

On Wed, Jan 28, 2015 at 2:18 PM, Nikolay Dimitrov <picmaster@mail.bg> wrote:

> Maybe now the proper question is: is anyone else using nfsroot without
> issues?

I don't see such a problem on an mx6qsabresd booting from NFS.

Regards,

Fabio Estevam


* Re: imx6 silent memory corruption
From: Nikolay Dimitrov @ 2015-02-14 12:12 UTC
  To: Eric Bénard; +Cc: meta-freescale, Otavio Salvador

Hi Eric,

On 01/28/2015 12:59 AM, Eric Bénard wrote:
>>> Are you sure your power supplies are stable under load (not only the
>>> memory supply)?
>>
>> Hmm, very good question. I haven't attached a scope to the voltage
>> rails, but that can be done easily. What would you consider an
>> acceptable voltage noise amplitude on the CPU rails?
>>
> As small as possible, with no spikes under load, measured with the
> probe as close as possible to the CPU and with a good GND reference.

Thanks for the remark. It turned out that the board indeed had high 
power supply noise, which was causing this instability.

Regards,
Nikolay

