* Testing a device using mtd_stresstest
@ 2011-01-31 12:12 David Peverley
  2011-02-06 14:24 ` Artem Bityutskiy
  0 siblings, 1 reply; 12+ messages in thread
From: David Peverley @ 2011-01-31 12:12 UTC (permalink / raw)
  To: linux-mtd

Hi all,

I've got a new board that has been experiencing occasional issues that
look to be MTD related, so I've been running the tests under
drivers/mtd/tests from Linux 2.6.31.12 to see if I can trigger
failures. It's also worth mentioning that I've built our kernel with
CONFIG_MTD_NAND_VERIFY_WRITE enabled. The flash we're using is a
Micron MT29F2G08AADWP; see the datasheet:
    http://download.micron.com/pdf/datasheets/flash/nand/2_4_8gb_nand_m49a.pdf
(Same part but a different operating voltage.)
The only caveat is that this new board differs from our previous
board in that the R/B line is now fixed high instead of being
connected to a GPIO, and a 25 µs chip_delay has been specified.

I was wondering if anyone has experienced anything similar and
might be able to nudge me in the right direction on this one:

Question 1: mtd_subpagetest fails, which I suspect it should, as the
device doesn't support sub-pages. I googled around and found a
reference suggesting that I should add NAND_NO_SUBPAGE_WRITE to the
options. I tried this and it made no difference. Out of curiosity I
grepped through drivers/mtd and found that *no* drivers actually use
this bit anyway...! Is it reasonable to ignore this, or ought I to
address it? Should I set the flag and expect it to have an effect?
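For reference, the effect NAND_NO_SUBPAGE_WRITE is meant to have can be modelled in userspace. This is a sketch based on my reading of the 2.6.3x nand_scan_tail() logic, in which SLC chips get a sub-page shift of 1 for 2 ECC steps and 2 for 4 to 16 steps. The function name and the MODEL_NO_SUBPAGE_WRITE bit are hypothetical stand-ins, not kernel API. With 2 KiB pages and 256-byte software ECC steps, a flag that takes effect should raise the sub-page size from 512 bytes to the full page.

```c
#include <assert.h>

/* Hypothetical local stand-in for the NAND_NO_SUBPAGE_WRITE option bit;
 * the value is arbitrary and not taken from the kernel headers. */
#define MODEL_NO_SUBPAGE_WRITE 0x1u

/*
 * Userspace model (an assumption, based on 2.6.3x nand_scan_tail()) of
 * how the sub-page size is derived: unless sub-page writes are disabled
 * or the chip is MLC, the page is split according to the ECC step count.
 */
unsigned int model_subpage_size(unsigned int writesize, unsigned int ecc_size,
                                unsigned int options, int is_mlc)
{
    unsigned int shift = 0;

    assert(ecc_size != 0 && writesize >= ecc_size);
    if (!(options & MODEL_NO_SUBPAGE_WRITE) && !is_mlc) {
        switch (writesize / ecc_size) {  /* number of ECC steps per page */
        case 2:
            shift = 1;
            break;
        case 4:
        case 8:
        case 16:
            shift = 2;
            break;
        default:
            break;  /* other step counts: no sub-page writes */
        }
    }
    return writesize >> shift;
}
```

For example, model_subpage_size(2048, 256, 0, 0) gives 512, while passing MODEL_NO_SUBPAGE_WRITE (or marking the chip as MLC) gives the full 2048-byte page.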

Question 2: mtd_stresstest fails after anywhere between 1,000
and 200,000 operations. I'm certain this is a Bad Sign. It fails in
nand_base.c:nand_write_page() in the verification step enabled by
CONFIG_MTD_NAND_VERIFY_WRITE. When I tested this on our previous board
(which ostensibly works fine), it failed the stress test after 2.6M
operations instead. Should I expect never to see a stress test
failure, or is an occasional verify failure reasonably expected?
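The verify step referred to here amounts to "program the page, read it back, compare". The sketch below models that idea in userspace against a hypothetical in-memory page; struct sim_page and both functions are illustrative names, not MTD API. The injected bit flip stands in for whatever corrupts the data on the real board. The -EIO return mirrors, as far as I can tell, what nand_write_page() returns when the read-back data does not match.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE_BYTES 2048  /* 2 KiB data area, as on the MT29F2G08 */

/* Hypothetical in-memory page standing in for the flash array. A
 * non-zero flip_mask simulates bits that do not program correctly. */
struct sim_page {
    uint8_t cells[PAGE_SIZE_BYTES];  /* 0xFF when erased */
    uint8_t flip_mask;
};

/* NAND programming can only clear bits (1 -> 0), hence the AND. */
static void sim_program(struct sim_page *p, const uint8_t *buf)
{
    for (size_t i = 0; i < PAGE_SIZE_BYTES; i++)
        p->cells[i] &= buf[i];
    p->cells[0] ^= p->flip_mask;  /* inject the simulated fault */
}

/* Program the page, read it back and compare, returning -EIO on any
 * mismatch: the same idea as the CONFIG_MTD_NAND_VERIFY_WRITE step. */
int write_page_verified(struct sim_page *p, const uint8_t *buf)
{
    assert(p != NULL && buf != NULL);
    sim_program(p, buf);
    return memcmp(p->cells, buf, PAGE_SIZE_BYTES) ? -EIO : 0;
}
```

A verify failure therefore only says that the data read back differs from the data written. It cannot by itself tell a bad cell from a timing or bus problem on the read-back path.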

Thanks for any suggestions!

~Pev


* Re: Testing a device using mtd_stresstest
  2011-01-31 12:12 Testing a device using mtd_stresstest David Peverley
@ 2011-02-06 14:24 ` Artem Bityutskiy
  2011-02-07 13:44   ` David Peverley
  0 siblings, 1 reply; 12+ messages in thread
From: Artem Bityutskiy @ 2011-02-06 14:24 UTC (permalink / raw)
  To: David Peverley; +Cc: linux-mtd

Hi,

On Mon, 2011-01-31 at 12:12 +0000, David Peverley wrote:
> Question 1 : The  mtd_subpagetest (which I suspect should fail as the
> device doesn't support sub-pages). I googled around and found a
> reference that maybe I should add NAND_NO_SUBPAGE_WRITE to the
> options. I tried this and it made no difference. Out of curiosity I
> grepped through drivers/mtd and found that *no* drivers actully use
> this bit anyway...! Is it reasonable to ignore this or ought I address
> it? Should I set the flag and expect it to have an effect?

MTD code is currently broken and CONFIG_MTD_NAND_VERIFY_WRITE causes
errors when sub-pages are used. You should either disable this
configuration option or fix MTD. We have this in our FAQ:

http://www.linux-mtd.infradead.org/faq/ubi.html#L_subpage_verify_fail


> Question 2 : The mtd_stresstest test fails after anywhere between 1000
> and 200,000 operations. I'm certain this is a Bad Sign. It fails in
> nand_base.c:nand_write_page() in the verification step enabled by
> MTD_NAND_VERIFY_WRITE. When I tested this on our previous board (that
> ostensibly works fine) it failed the stress test after 2.6M operations
> instead. Should I be expecting to never see a failure of the stress
> test or is an occasional verify failure reasonably expected?

Yes, the test is expected never to fail. You should dig in and
understand why it is failing.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)


* Re: Testing a device using mtd_stresstest
  2011-02-06 14:24 ` Artem Bityutskiy
@ 2011-02-07 13:44   ` David Peverley
  2011-02-07 14:01     ` Andrew Murray
                       ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: David Peverley @ 2011-02-07 13:44 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd

Hi Artem,

Many thanks for the response!

> MTD code is currently broken and CONFIG_MTD_NAND_VERIFY_WRITE causes
> errors when sub-pages are used. You should either disable this
> configuration option or fix MTD. We have this in our FAQ:
>
> http://www.linux-mtd.infradead.org/faq/ubi.html#L_subpage_verify_fail
I'm not sure what the implication of this is; I understand that this
will cause the subpage test to fail with CONFIG_MTD_NAND_VERIFY_WRITE
enabled. However, I had discounted the FAQ as we use YAFFS2 and not
UBIFS. Given that, should I still disable the write verify? At the
moment I'm inclined to leave it enabled, as it seems to be regularly
catching failures that should not occur, such as the stress-test
failures noted.

We've also noticed that every so often we see "uncorrectable error:"
messages from nand_ecc.c. Do you have any suggestions as to where to
start investigating? So far I can't find a pattern to the occurrences
or a reliable way to reproduce them.

Thanks again!

~Pev


* Re: Testing a device using mtd_stresstest
  2011-02-07 13:44   ` David Peverley
@ 2011-02-07 14:01     ` Andrew Murray
  2011-02-07 14:39     ` Arno Steffen
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Andrew Murray @ 2011-02-07 14:01 UTC (permalink / raw)
  To: David Peverley; +Cc: linux-mtd, dedekind1

On 7 February 2011 13:44, David Peverley <pev@sketchymonkey.com> wrote:
> We've also noticed that every so often we see "uncorrectable error:"
> messages from nand_ecc.c - do you have any suggestions as to where to
> start investigating here? So far I can't find a pattern to occurrences
> or a regular way to reproduce.

Are you using hardware ECC?

We switched from SW ECC to HW ECC and discovered that YAFFS2 was
writing over the ECC in the OOB. It may be worth checking for any such
overlap in OOB use between the kernel MTD layer/YAFFS2 and whatever
boot loaders you may have.

We worked around this issue by using in-band tags in our YAFFS2 image.

Thanks,

Andrew Murray


* Re: Testing a device using mtd_stresstest
  2011-02-07 13:44   ` David Peverley
  2011-02-07 14:01     ` Andrew Murray
@ 2011-02-07 14:39     ` Arno Steffen
  2011-02-10 15:24       ` David Peverley
       [not found]     ` <AANLkTi=UgsLq3=7ma9MPJJBVxtNvyr=ThtLPy8qzC3Bk@mail.gmail.com>
                       ` (2 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Arno Steffen @ 2011-02-07 14:39 UTC (permalink / raw)
  To: David Peverley; +Cc: linux-mtd, dedekind1

I made some of the same observations. In particular, I dug around the subpage issue.
As setting the option failed, I patched nand_base.c to set this bit.
At least the test doesn't fail anymore, but this is far from a perfect solution.

As for the uncorrectable errors: I struggled with them for months until I
found that there was a patch (for OMAP).
I assume it is for TI OMAP only, but I don't know what processor you use.

There are still some other issues with JFFS2 that I reported. It seems
nobody here cares about them.
Artem has fixed one of the reported bugs in UBIFS, but this doesn't
help me much.
JFFS2 seems to be without support, as far as I can see.

Best regards
Arno


* Re: Testing a device using mtd_stresstest
       [not found]     ` <AANLkTi=UgsLq3=7ma9MPJJBVxtNvyr=ThtLPy8qzC3Bk@mail.gmail.com>
@ 2011-02-10 12:30       ` David Peverley
  0 siblings, 0 replies; 12+ messages in thread
From: David Peverley @ 2011-02-10 12:30 UTC (permalink / raw)
  To: Andrew Murray; +Cc: linux-mtd, dedekind1

Hi Andy!

Actually, we're using SW ECC, so I'd assumed that we should be free of
this particular issue. In our configuration MTD controls the OOB area,
reserving the first two bytes for bad-block marking and the last 24
bytes for ECC. The remaining 38 bytes are in oobfree for YAFFS to use.
As I understand it, the YAFFS packed tags struct stored there is 12
bytes, so I *think* it's OK?
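The layout described above, and the kind of overlap Andrew warns about, can be checked mechanically. This is a minimal sketch that assumes the offsets given here: bad-block marker at bytes 0-1, ECC in the last 24 bytes of a 64-byte OOB, and 38 free bytes from offset 2. It also takes the 12-byte tag size stated above at face value. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define OOB_SIZE 64  /* 64-byte spare area of a 2 KiB-page device */

/* A byte range [offset, offset + length) within the OOB area. */
struct oob_region {
    unsigned int offset;
    unsigned int length;
};

/* True if two OOB regions share at least one byte. */
bool oob_overlap(struct oob_region a, struct oob_region b)
{
    return a.offset < b.offset + b.length && b.offset < a.offset + a.length;
}

/* True if `len` bytes of tags placed at the start of the free area fit
 * inside it without touching the bad-block marker or the ECC bytes. */
bool tags_fit(struct oob_region bbm, struct oob_region ecc,
              struct oob_region free_area, unsigned int len)
{
    struct oob_region tags = { free_area.offset, len };

    assert(free_area.offset + free_area.length <= OOB_SIZE);
    return len <= free_area.length &&
           !oob_overlap(tags, bbm) && !oob_overlap(tags, ecc);
}
```

With bbm = {0, 2}, ecc = {40, 24} and free = {2, 38}, 12 bytes of tags fit. If a hardware ECC layout instead put ECC bytes at offset 2, the same tags would collide with them, which is the clash Andrew describes.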

Cheers,

~Pev


* Re: Testing a device using mtd_stresstest
  2011-02-07 14:39     ` Arno Steffen
@ 2011-02-10 15:24       ` David Peverley
  2011-02-18 10:59         ` Arno Steffen
  0 siblings, 1 reply; 12+ messages in thread
From: David Peverley @ 2011-02-10 15:24 UTC (permalink / raw)
  To: Arno Steffen; +Cc: linux-mtd, dedekind1

Hi Arno,

Is the patch you refer to the addition of dmb() to nand_command_lp()
that I found discussed on TI's E2E board?
  http://e2e.ti.com/support/embedded/f/354/p/56710/234039.aspx

Digging around, I managed to find someone's Git commit with some description at:
  http://arago-project.org/git/people/?p=sriram/ti-psp-omap.git;a=commitdiff;h=76319aa1a321c4b5981e412bf489cfb617186c2f

  "When using delay loop for wait states, need to ascertain that
   the write to OMAP HW register is reflected befor the delay
   loop starts. This patch adds a dmb() instruction to this effect.
   Without this fix, NAND read failures reported with mtd_oobtests."

That's really interesting, as it's not completely dissimilar to the
idea behind the call to gpio_nand_dosync() in the GPIO NAND driver
(mtd/nand/gpio.c): in both cases, writes that should take effect in
hardware are not occurring synchronously, which is what we'd want in
these cases:
  "Make sure the GPIO state changes occur in-order with writes to NAND
   memory region.
   Needed on PXA due to bus-reordering within the SoC itself (see section on
   I/O ordering in PXA manual (section 2.3, p35)"

This was discussed in more detail here:
  http://patchwork.ozlabs.org/patch/3260/
and here:
  http://patchwork.ozlabs.org/patch/3738/

Interestingly, in the former link the approach of using a generic
memory barrier was mooted, but the verdict was that it wasn't the
right mechanism to enforce this. Additionally, the author of the
driver I'm debugging has added a udelay(2) at the position equivalent
to the first call to gpio_nand_dosync() in the GPIO driver, with a
comment noting that the GPIOs "seemed a bit slow and was causing the
signal to not be set"... Also, we're using a delay loop (chip_delay)
as R/B isn't plumbed in, so all in all I'm wondering if we're
observing something similar. I suspect I need to spend a while reading
through the (PC302) datasheet!

As far as I can tell, the GPIO NAND driver is only used by the
Compulab ARMCORE with a PXA255 CPU, so the manual in question can be
found at:
  http://www.xscale-freak.com/XSDoc/PXA255/27869302.pdf
where, indeed, section 2.3 covers I/O ordering.
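The hazard the OMAP commit message describes can be illustrated with a toy timing model. This is only a sketch under stated assumptions. All times and parameter names are hypothetical. A posted write is assumed to reach the chip post_latency µs after it is issued. The barrier is assumed to do what the commit message says it is for, namely hold off the delay loop until the write has been reflected in hardware. With R/B not connected, the read issued after chip_delay is only safe if the chip has finished by then.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy timing model; all times in microseconds, all values hypothetical.
 * The CPU issues a command write at t = 0. Without a barrier the write
 * may sit in a write buffer for post_latency before the NAND chip sees
 * it, yet the fixed chip_delay countdown starts immediately. With a
 * barrier the countdown is assumed to start only once the write lands.
 */
bool read_after_delay_is_safe(unsigned int post_latency,
                              unsigned int chip_delay,
                              unsigned int t_ready, bool barrier)
{
    unsigned int cmd_arrival = post_latency;          /* chip sees command */
    unsigned int delay_start = barrier ? cmd_arrival : 0;
    unsigned int read_time = delay_start + chip_delay; /* CPU reads data */
    unsigned int ready_time = cmd_arrival + t_ready;   /* chip has finished */

    return read_time >= ready_time;
}
```

In this model a 25 µs chip_delay against a 25 µs busy time with 2 µs of posting latency is too short without the barrier and just long enough with it. A fixed delay with no R/B line therefore leaves no room for writes that reach the chip late.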

Cheers,

~Pev


* Re: Testing a device using mtd_stresstest
       [not found]     ` <AANLkTimJFv1Uy2c70ewPUHYH58rQHT=VsDa3ioU9hJZh@mail.gmail.com>
@ 2011-02-11 14:25       ` David Peverley
  2011-02-11 15:16         ` Artem Bityutskiy
  0 siblings, 1 reply; 12+ messages in thread
From: David Peverley @ 2011-02-11 14:25 UTC (permalink / raw)
  To: Karl Beldan; +Cc: linux-mtd, dedekind1

Hi Karl,

Sometimes... When they are, I've manually marked the blocks as bad and
restarted the test run :-D I had wondered whether it was blocks
naturally failing with use, but some of these blocks haven't been (OK,
SHOULDN'T have been!) erased anywhere near 100,000 times, so I'm
suspicious about this. When I looked into it, I noted that
mtd_stresstest.ko never marks blocks as bad, so this is potentially
something that could occur. However, I notice that nandtest.c in
mtd-utils *does* support marking bad blocks during testing. Should I
consider using it instead? I'm not sure what the relationship between
these test tools is...?

Cheers,

~Pev

On 11 February 2011 13:58, Karl Beldan <karl.beldan@gmail.com> wrote:
> On Mon, Feb 7, 2011 at 2:44 PM, David Peverley <pev@sketchymonkey.com>
> wrote:
>> We've also noticed that every so often we see "uncorrectable error:"
>> messages from nand_ecc.c - do you have any suggestions as to where to
>> start investigating here? So far I can't find a pattern to occurrences
>> or a regular way to reproduce.
>>
>
> By any chance, wouldn't those "uncorrectable errors" happen to be within the
> same pages/blocks each time?
> --
> Karl


* Re: Testing a device using mtd_stresstest
  2011-02-07 13:44   ` David Peverley
                       ` (3 preceding siblings ...)
       [not found]     ` <AANLkTimJFv1Uy2c70ewPUHYH58rQHT=VsDa3ioU9hJZh@mail.gmail.com>
@ 2011-02-11 15:12     ` Artem Bityutskiy
  4 siblings, 0 replies; 12+ messages in thread
From: Artem Bityutskiy @ 2011-02-11 15:12 UTC (permalink / raw)
  To: David Peverley; +Cc: linux-mtd

On Mon, 2011-02-07 at 13:44 +0000, David Peverley wrote:
> I'm not sure what the implication of this is ; I understand that this
> will cause the subpage test to fail with CONFIG_MTD_NAND_VERIFY_WRITE
> enabled. However, the FAQ I had discounted as we use YAFFS2 and not
> UBIFS. Given that should I still disable the write verify? At the
> moment I'm inclined to leave it enabled as it seems to be regularly
> catching failures that should not occur, such as the stress-test
> failures noted.

I do not know YAFFS, ask Charles, but I _think_ YAFFS does not use
sub-pages, so you can have that option enabled.

For sure, if you do not use sub-pages and it catches problems - have it
enabled and nail the problems down.

> We've also noticed that every so often we see "uncorrectable error:"
> messages from nand_ecc.c - do you have any suggestions as to where to
> start investigating here? So far I can't find a pattern to occurrences
> or a regular way to reproduce.

Not really - this can be incorrect timings or bad HW. I cannot give you
good suggestions.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)


* Re: Testing a device using mtd_stresstest
  2011-02-11 14:25       ` David Peverley
@ 2011-02-11 15:16         ` Artem Bityutskiy
  2011-02-11 16:42           ` David Peverley
  0 siblings, 1 reply; 12+ messages in thread
From: Artem Bityutskiy @ 2011-02-11 15:16 UTC (permalink / raw)
  To: David Peverley; +Cc: Karl Beldan, linux-mtd

On Fri, 2011-02-11 at 14:25 +0000, David Peverley wrote:
> Sometimes... When they are I've manually marked blocks as bad and
> re-started the test run :-D I had wondered about whether it was blocks
> naturally failing with use, but some of these blocks haven't been (Ok,
> SHOULDNT have been!) erased anywhere near 100,000 times so I'm
> suspicious about this. When I looked into it I noted that the
> mtd_stresstest.ko doesn't mark blocks as bad ever so this is
> potentially something that would occur.

Yeah, I think the tests should not do this; they should just test and
report issues to you.

>  However, nandtest.c in
> mtd-utils I notice *does* support marking of bad blocks during
> testing. Should I consider using this instead? I'm not sure what the
> relationship between these test tools is...? 

These are tools written by different people at different times. The kernel
MTD tests were written by Nokia guys, and I think the tests are more or
less consistent in how they behave.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)


* Re: Testing a device using mtd_stresstest
  2011-02-11 15:16         ` Artem Bityutskiy
@ 2011-02-11 16:42           ` David Peverley
  0 siblings, 0 replies; 12+ messages in thread
From: David Peverley @ 2011-02-11 16:42 UTC (permalink / raw)
  To: dedekind1; +Cc: Karl Beldan, linux-mtd

Hi Artem,

Thanks for the useful feedback!

> I do not know YAFFS, ask Charles, but I _think_ YAFFS does not use
> sub-pages, so you han have that option enabled.
Yep, I posted to the YAFFS list; interestingly, the write verify
failure found a case in YAFFS where a write failure wasn't checked and
(I believe) a checkpoint write erroneously completes as a result...!
Having said that from

> For sure, if you do not use sub-pages and it catches problems - have it
> enabled and nail the problems down.
Sure, that makes sense, although I'm having fun trying to
differentiate the issues: I get both the write verify failures and the
"uncorrectable errors", and there's not necessarily a direct
correlation between occurrences, so my gut tells me they're two separate
issues...!

> Yeah, I think the tests should not do this, they should just test and
> report you issues.
OK, so to summarise my understanding so far, a failure during the
stress test is likely to be one of two things: either a bad block
developing, which is neither unexpected nor an actual failure of the
test case per se, or Something Else Bad, which is not expected and IS
a test failure. The only way I can see to differentiate between these
two situations is via statistics, i.e. if a block is repeatedly failing,
it's likely bad; if random blocks are failing separately, it's probably
something else warranting investigation. Is that correct?

If so, would it be more useful to adapt the (kernel) stress test so that
it doesn't abort the test run on a failure, but instead keeps a tally
of the blocks in which failures have occurred and runs to completion?
Does that sound like a beneficial change? I'm not sure what strategy
is used for deciding whether a block is bad, but nandtest.c from
mtd-utils simply marks a block bad if an erase or write fails at all,
so this would hopefully give more useful feedback from the stress test.
Aborting the test on what could be a normal bad block seems a little
misleading, although I'm admittedly an unusually ardent fan of tests
being unambiguous... :-)
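The tally-and-continue idea proposed above could look roughly like this. It is a userspace sketch, not a patch to mtd_stresstest. The names, MAX_EBS and the repeat threshold are all assumptions for illustration. MAX_EBS is 2048 because that is what a 2 Gbit part with 128 KiB erase blocks gives.

```c
#include <assert.h>

#define MAX_EBS 2048  /* assumed: 2 Gbit / 128 KiB erase blocks */

/* Hypothetical per-run failure record, zero-initialised before the run. */
struct fail_tally {
    unsigned int count[MAX_EBS];  /* failures seen per eraseblock */
    unsigned int total;           /* failures seen overall */
};

/* Record a failure in eraseblock `eb` instead of aborting the run. */
void tally_failure(struct fail_tally *t, unsigned int eb)
{
    assert(eb < MAX_EBS);
    t->count[eb]++;
    t->total++;
}

/*
 * Summarise at the end of the run: blocks that failed at least
 * `threshold` times look like developing bad blocks, while blocks that
 * failed fewer times hint at a problem elsewhere worth investigating.
 */
void tally_summary(const struct fail_tally *t, unsigned int threshold,
                   unsigned int *suspect_bad, unsigned int *scattered)
{
    assert(threshold > 0);
    *suspect_bad = 0;
    *scattered = 0;
    for (unsigned int eb = 0; eb < MAX_EBS; eb++) {
        if (t->count[eb] >= threshold)
            (*suspect_bad)++;
        else if (t->count[eb] > 0)
            (*scattered)++;
    }
}
```

For example, three failures in block 5 plus single failures in blocks 100 and 900, summarised with a threshold of 3, report one suspect bad block and two scattered ones. That matches the "statistics" distinction described above, with the threshold left as a tuning choice.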

Thanks!

~Pev


* Re: Testing a device using mtd_stresstest
  2011-02-10 15:24       ` David Peverley
@ 2011-02-18 10:59         ` Arno Steffen
  0 siblings, 0 replies; 12+ messages in thread
From: Arno Steffen @ 2011-02-18 10:59 UTC (permalink / raw)
  To: David Peverley; +Cc: linux-mtd, dedekind1

2011/2/10 David Peverley <pev@sketchymonkey.com>:
> Hi Arno,
>
> Is the patch you refer to the addition of dmb() to nand_command_lp()
> that I found discussed on TI's E2E board? :
>  http://e2e.ti.com/support/embedded/f/354/p/56710/234039.aspx
>
> Digging around I managed to find someones GIT commit with some description at :
>  http://arago-project.org/git/people/?p=sriram/ti-psp-omap.git;a=commitdiff;h=76319aa1a321c4b5981e412bf489cfb617186c2f
>

Yep, that's exactly what helped me. Before that patch it was a nightmare.

I still have one issue with JFFS2, and I don't know which of the
behaviours is right or wrong.
A freshly generated JFFS2 image ends with data at, let's say, 0x140 (it's an
almost empty filesystem). The rest of the partition is 0xff. If I make changes
in the filesystem, it writes blocks of 0x800 bytes, and the unused data
are set to 0x00.
And then I get a message from the mtd module that there is empty flash
between 0x140 and 0x800. As the kernel always writes blocks of 0x800 bytes,
this message does not occur for the blocks it writes itself.
So should my mkfs.jffs2 pad from 0x140 to 0x800 with 0x00 to work
properly? Or does my kernel/mtd driver misbehave?
I am curious whether you know what is right or wrong.


end of thread, other threads:[~2011-02-18 10:59 UTC | newest]

Thread overview: 12+ messages
2011-01-31 12:12 Testing a device using mtd_stresstest David Peverley
2011-02-06 14:24 ` Artem Bityutskiy
2011-02-07 13:44   ` David Peverley
2011-02-07 14:01     ` Andrew Murray
2011-02-07 14:39     ` Arno Steffen
2011-02-10 15:24       ` David Peverley
2011-02-18 10:59         ` Arno Steffen
     [not found]     ` <AANLkTi=UgsLq3=7ma9MPJJBVxtNvyr=ThtLPy8qzC3Bk@mail.gmail.com>
2011-02-10 12:30       ` David Peverley
     [not found]     ` <AANLkTimJFv1Uy2c70ewPUHYH58rQHT=VsDa3ioU9hJZh@mail.gmail.com>
2011-02-11 14:25       ` David Peverley
2011-02-11 15:16         ` Artem Bityutskiy
2011-02-11 16:42           ` David Peverley
2011-02-11 15:12     ` Artem Bityutskiy
