* [YAFFS] bad block management policy
From: peterlingoal @ 2012-08-09 10:45 UTC (permalink / raw)
  To: linux-mtd

Hi,

I am using the YAFFS2 filesystem, and some NANDs have hundreds or even
thousands of blocks (out of 4K) marked bad. After checking, I found that
YAFFS2 marks a block bad if three fixable ECC errors happen
within that block. My questions are:

1. I am using two Micron NAND chips; one requires a minimum of 1-bit ECC
while the other requires 4-bit. Bit flips (although all fixable) seem
to happen quite often on both types. Is this expected behavior?
2. The Micron error management doc says to mark a block bad only when
a program or erase operation fails, and does not mention reads. So is
it safe to remove this ECC error counter? Will that lead to unfixable
errors?

thanks,
Peter


* Re: [YAFFS] bad block management policy
From: Peter Barada @ 2012-08-09 14:24 UTC (permalink / raw)
  To: linux-mtd

On 08/09/2012 06:45 AM, peterlingoal wrote:
> Hi,
>
> I am using the YAFFS2 filesystem, and some NANDs have hundreds or even
> thousands of blocks (out of 4K) marked bad. After checking, I found that
> YAFFS2 marks a block bad if three fixable ECC errors happen
> within that block. My questions are:
>
> 1. I am using two Micron NAND chips; one requires a minimum of 1-bit ECC
> while the other requires 4-bit. Bit flips (although all fixable) seem
> to happen quite often on both types. Is this expected behavior?
> 2. The Micron error management doc says to mark a block bad only when
> a program or erase operation fails, and does not mention reads. So is
> it safe to remove this ECC error counter? Will that lead to unfixable
> errors?
The "strike count" is used to predict when a block has been programmed
enough times that it is close to failure (where programmed data read
back contains uncorrectable bit errors).

This worked fine for the larger-geometry SLC devices, which didn't show
correctable ECC errors until a block was very near its end of life.
However, newer small-geometry SLC/MLC devices require stronger ECC to
keep the same UBER (uncorrectable bit error rate) as previous-generation
devices.  Unfortunately, this means that more correctable errors will be
seen long before the block is near its end of life.

You could modify YAFFS to ignore -EUCLEAN returns from MTD, which would
prevent YAFFS from marking blocks bad prematurely, but then there would
be no way to predict when a block is about to wear out and return
uncorrectable errors (-EBADMSG).
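
To make the trade-off concrete, here is a minimal sketch of a strike-count
policy driven by the MTD read return code. It is not YAFFS code: the names
handle_read_result, struct block_info, MAX_STRIKES and the count_strikes
switch are hypothetical, and retiring a block on -EBADMSG is an assumption
made for illustration. Only the meaning of -EUCLEAN (corrected bitflips)
and -EBADMSG (uncorrectable), and the limit of three, come from the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Fallbacks for non-Linux hosts; these are the Linux errno values. */
#ifndef EUCLEAN
#define EUCLEAN 117
#endif
#ifndef EBADMSG
#define EBADMSG 74
#endif

/* From the thread: three correctable ECC errors retire a block. */
#define MAX_STRIKES 3

enum block_action { BLOCK_OK, BLOCK_STRIKE, BLOCK_RETIRE };

/* Hypothetical per-block bookkeeping. */
struct block_info {
    unsigned strikes;
};

/*
 * Decide what to do with a block after a read returned `ret`.
 * count_strikes != 0: correctable errors count towards retirement.
 * count_strikes == 0: -EUCLEAN is ignored, so blocks are never retired
 *                     early, but wear is only noticed at -EBADMSG.
 */
static enum block_action handle_read_result(struct block_info *blk, int ret,
                                            int count_strikes)
{
    assert(blk != NULL);

    if (ret == -EBADMSG)
        return BLOCK_RETIRE;          /* data was not correctable */

    if (ret == -EUCLEAN) {
        if (!count_strikes)
            return BLOCK_OK;          /* corrected; treat as a clean read */
        if (++blk->strikes >= MAX_STRIKES)
            return BLOCK_RETIRE;      /* third strike */
        return BLOCK_STRIKE;
    }

    return BLOCK_OK;
}
```

With count_strikes set to 0, the only path to BLOCK_RETIRE is -EBADMSG,
which is the loss of early warning described above.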


-- 
Peter Barada
peter.barada@gmail.com


* Re: [YAFFS] bad block management policy
From: peterlingoal @ 2012-08-09 14:55 UTC (permalink / raw)
  To: Peter Barada; +Cc: linux-mtd

Yes, this is also my understanding and what I'm worried about. On one
hand we don't want uncorrectable errors; on the other hand we
cannot let the entire NAND get marked bad before the end of its life. I
wonder how other file systems (or any raw MTD layer implementation)
handle the new small-geometry chips.

Does the number of bit errors indicate how close a block is to an
uncorrectable error, e.g. 3 bit flips on a chip requiring a
minimum of 4-bit ECC?

regards,
Peter




* Re: [YAFFS] bad block management policy
From: Artem Bityutskiy @ 2012-08-24 11:44 UTC (permalink / raw)
  To: Peter Barada; +Cc: linux-mtd


On Thu, 2012-08-09 at 10:24 -0400, Peter Barada wrote:
> You could modify YAFFS to ignore -EUCLEAN returns from MTD which will
> prevent YAFFS from marking blocks bad prematurely, but then there is no
> way to predict when a block is about to wear out and return
> uncorrectable errors (-EBADMSG).

We recently improved this area, and now you can also set the bitflip
threshold in MTD so that MTD won't return -EUCLEAN unless the number
of bits flipped is larger than the threshold. The idea is that UBI
scrubs eraseblocks (moves data to different ones) in case of bitflips.
But 1-bit flips happen so often on some devices that it is wiser to
ignore them. So now the driver can set the level at which MTD will
start returning -EUCLEAN. Below that flip level, it will return 0.

So this also means that if you set the threshold = ECC strength, you
should never even get -EUCLEAN. Not sure it is a good idea, but it is a
possibility as well.
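
The threshold rule can be modelled in a few lines. This is a sketch under
assumptions, not the MTD source: read_status and its parameters are
hypothetical names, and the comparison operator is a guess. The text above
says "larger than the threshold" in one place and "below that flip level"
in another; the sketch uses >=, so check the read path in your kernel's MTD
core before relying on the boundary case.

```c
#include <assert.h>
#include <errno.h>

/* Fallback for non-Linux hosts; this is the Linux errno value. */
#ifndef EUCLEAN
#define EUCLEAN 117
#endif

/*
 * Hypothetical model of the status a read reports after ECC correction.
 * max_bitflips:      most bitflips corrected in any single ECC step
 * ecc_strength:      bits the ECC can correct per step (0 = no ECC)
 * bitflip_threshold: level at which corrected reads are reported
 */
static int read_status(unsigned max_bitflips, unsigned ecc_strength,
                       unsigned bitflip_threshold)
{
    /* More flips than the ECC can fix would be -EBADMSG; out of scope. */
    assert(max_bitflips <= ecc_strength);

    if (ecc_strength == 0)
        return 0;                     /* nothing can have been corrected */

    /* Assumed comparison: report once the threshold is reached. */
    return max_bitflips >= bitflip_threshold ? -EUCLEAN : 0;
}
```

Under this assumed >= comparison, a threshold equal to the ECC strength
still reports reads that needed the full correction capability, and only a
threshold above the ECC strength silences -EUCLEAN entirely. With a strict
> comparison, threshold = ECC strength would never report, as described
above.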

-- 
Best Regards,
Artem Bityutskiy

