* MLC NAND: all 0xff after erase?
@ 2012-07-11  0:36 Brian Norris
  2012-07-11  6:41 ` Richard Genoud
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Brian Norris @ 2012-07-11  0:36 UTC (permalink / raw)
  To: linux-mtd
  Cc: Mike Dunn, Artem Bityutskiy, Richard Weinberger, Kevin Cernekee,
	Jim Quinlan, Al Viro, Joel Reardon, David Woodhouse,
	Shmulik Ladkani

Hello all,

I've seen some issues with MLC NAND where I might erase a block,
read it back, and receive a few bitflips such that the data is not
entirely 0xff (e.g., a few bytes may be 0xfe, 0x7f, 0xf7, etc.).
However, I also notice that UBI, UBIFS, and YAFFS2 all make the
assumption that an erased page/block will be totally pristine: all
0xff. This brings me to my main question:

Can someone find an example MLC NAND datasheet that guarantees reading
an erased page will yield all 0xff data?

On first read, in fact, an example MLC datasheet doesn't even define
what "clearing the contents of a block" means. From the datasheet [1]:

  "Erase operations are used to clear the contents of a block in the
NAND Flash array to prepare its pages for program operations."

This doesn't guarantee what happens when performing READ_PAGE on the
cleared block.

Now, if we truly cannot rely on fully-0xff after erase, then this
suggests a need for change in several different layers. A few comments
and proposals:

* A few bitflips on an otherwise-0xff page should not be treated as
unhandled corruption, as long as the erase operation did not return an
error status. Such a page should still be usable, provided the driver
has sufficient ECC capability.

* My NAND controller's HW ECC flags these erased-page bitflips as
uncorrectable errors, as there was no ECC written to the page and the
read data does not match the 0xff special "erased" case. I assume most
other ECC mechanisms would treat this similarly [2]. In such cases, I
suspect that the best the driver can do is to return the raw data
(0xff with flips) and an ECC error message.

* UBI and other FS layers need to distinguish between: (a) 0xff
cleanly-erased, (b) 0xff with bitflips, and (c) true ECC errors. In
the end, we may treat (a) and (b) the same (as erased pages), but the
problem is distinguishing between (b) and (c). This may require
modifications to:

  - the MTD API, to provide explicit notification of erased blocks.
For instance, we might introduce a new return code for mtd_read() that
represents (b); when MTD detects an ECC error, we check for all 0xff
with a threshold of bitflips, then return either -EBADMSG or a special
ERASED code.

  - UBI/UBIFS/other-FS's, to utilize the new ERASED return code that
can be checked before checking for all-0xff. Either the ERASED code or
all-0xff data would be considered "erased".
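
A minimal sketch of the check proposed in the list above, to be run only
after the ECC engine has already reported an uncorrectable error. The
names MTD_ERASED_SKETCH and classify_ecc_failure() are hypothetical and
are not part of the existing MTD API; the threshold is passed in by the
caller and is an assumption here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical return code for case (b); not an existing MTD error code. */
#define MTD_ERASED_SKETCH 1

/* Count zero bits, i.e. bits flipped away from the erased (all-ones) state. */
static unsigned int count_zero_bits(const uint8_t *buf, size_t len)
{
	unsigned int zeros = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		uint8_t v = (uint8_t)~buf[i];	/* set bits mark flipped cells */

		while (v) {
			v &= (uint8_t)(v - 1);	/* clear the lowest set bit */
			zeros++;
		}
	}
	return zeros;
}

/*
 * Returns MTD_ERASED_SKETCH if the buffer is 0xff with at most max_flips
 * bitflips (case b), or -EBADMSG for a true ECC error (case c).
 */
static int classify_ecc_failure(const uint8_t *buf, size_t len,
				unsigned int max_flips)
{
	assert(buf != NULL);
	if (count_zero_bits(buf, len) <= max_flips)
		return MTD_ERASED_SKETCH;
	return -EBADMSG;
}
```

This counts flips over the whole buffer; it does not account for how the
flips are distributed across ECC steps.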

Any comments are welcome, especially regarding my first question.

Thanks,
Brian

[1] Datasheet for Micron MT29F32G08CBABA
[2] Counterexample: it seems NAND_ECC_SOFT actually corrects a single
bitflip in 0xff data. Tested with nandsim and `nandwrite -n -o`.

* Re: MLC NAND: all 0xff after erase?
  2012-07-11  0:36 MLC NAND: all 0xff after erase? Brian Norris
@ 2012-07-11  6:41 ` Richard Genoud
  2012-07-13 21:22   ` Brian Norris
  2012-07-11 16:46 ` Mike Dunn
  2012-07-11 17:43 ` Ivan Djelic
  2 siblings, 1 reply; 10+ messages in thread
From: Richard Genoud @ 2012-07-11  6:41 UTC (permalink / raw)
  To: Brian Norris
  Cc: Mike Dunn, Artem Bityutskiy, Richard Weinberger, Kevin Cernekee,
	Jim Quinlan, linux-mtd, Al Viro, Joel Reardon, David Woodhouse,
	Shmulik Ladkani

2012/7/11 Brian Norris <computersforpeace@gmail.com>:
> Hello all,
>
> I've seen some issues with MLC NAND and where I might erase a block,
> read it back, and receive a few bitflips such that the data is not
> entirely 0xff (i.e., a few bytes may be 0xfe, 0x7f, 0xf7, etc.).
If you read the same block a few ms later, is the data the same, is
it all 0xff, or something else?

> [2] Counterexample: it seems NAND_ECC_SOFT actually corrects a single
> bitflip in 0xff data. Tested with nandsim and `nandwrite -n -o`.
If I recall correctly, that is because the soft ECC of an all-0xff page
is itself all 0xff, whereas most (maybe all) hardware ECCs of an
all-0xff page are not.


Richard.

-- 
for me, ck means con kolivas and not calvin klein... does it mean I'm a geek ?

* Re: MLC NAND: all 0xff after erase?
  2012-07-11  0:36 MLC NAND: all 0xff after erase? Brian Norris
  2012-07-11  6:41 ` Richard Genoud
@ 2012-07-11 16:46 ` Mike Dunn
  2012-07-11 17:43 ` Ivan Djelic
  2 siblings, 0 replies; 10+ messages in thread
From: Mike Dunn @ 2012-07-11 16:46 UTC (permalink / raw)
  To: Brian Norris
  Cc: Artem Bityutskiy, Richard Weinberger, Kevin Cernekee,
	Jim Quinlan, linux-mtd, Al Viro, Joel Reardon, ivan.djelic,
	David Woodhouse, Shmulik Ladkani


> * My NAND controller's HW ECC flags these erased-page bitflips as
> uncorrectable errors, as there was no ECC written to the page and the
> read data does not match the 0xff special "erased" case. I assume most


Ditto for the newer diskonchips...


> other ECC mechanisms would treat this similarly [2]. In such cases, I
> suspect that the best the driver can do is to return the raw data
> (0xff with flips) and an ECC error message.
> 
> * UBI and other FS layers need to distinguish between: (a) 0xff
> cleanly-erased, (b) 0xff with bitflips, and (c) true ECC errors. In
> the end, we may treat (a) and (b) the same (as erased pages), but the
> problem is distinguishing between (b) and (c).


To distinguish between (b) and (c), the docg4 driver dedicates an unused oob
byte as a "page written" flag.  This byte is cleared (written as zero) whenever
the page is written.  When uncorrectable bitflip errors occur, if more than half
of the bits in this flag are set, the page is assumed to be blank, but with
bitflips.  In this case the driver quietly ignores the ecc error, but returns
the blank page data with the bitflips.  Not a complete solution, and also
specific to one particular driver, but food for thought.  This scheme was
actually suggested by Ivan Djelic.
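
As a rough illustration of the majority vote described above (not the
actual docg4 code), the check can be sketched as follows. The function
names and the idea of passing the marker position as a parameter are
assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Count the set bits in one byte. */
static unsigned int popcount8(uint8_t v)
{
	unsigned int n = 0;

	while (v) {
		v &= (uint8_t)(v - 1);	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * The marker reads 0xff on a never-written page and 0x00 on a written
 * one, each possibly with a few bitflips. Returns true when the ECC
 * error should be ignored because more than half of the marker bits
 * are still set, i.e. the page is assumed blank.
 * marker_pos is a hypothetical offset of the flag byte inside the OOB.
 */
static bool ignore_ecc_error(bool ecc_failed, const uint8_t *oob,
			     size_t oob_len, size_t marker_pos)
{
	assert(marker_pos < oob_len);
	return ecc_failed && popcount8(oob[marker_pos]) > 4;
}
```

A marker with exactly four bits set is treated as "written" here, which
is the conservative choice since the ECC error is then still reported.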

Mike

* Re: MLC NAND: all 0xff after erase?
  2012-07-11  0:36 MLC NAND: all 0xff after erase? Brian Norris
  2012-07-11  6:41 ` Richard Genoud
  2012-07-11 16:46 ` Mike Dunn
@ 2012-07-11 17:43 ` Ivan Djelic
  2012-08-07 10:11   ` Calvin Johnson
                     ` (2 more replies)
  2 siblings, 3 replies; 10+ messages in thread
From: Ivan Djelic @ 2012-07-11 17:43 UTC (permalink / raw)
  To: Brian Norris
  Cc: Mike Dunn, Artem Bityutskiy, Richard Weinberger, Kevin Cernekee,
	Jim Quinlan, linux-mtd, Al Viro, Joel Reardon, David Woodhouse,
	Shmulik Ladkani

On Wed, Jul 11, 2012 at 01:36:51AM +0100, Brian Norris wrote:
> Hello all,
> 
> I've seen some issues with MLC NAND and where I might erase a block,
> read it back, and receive a few bitflips such that the data is not
> entirely 0xff (i.e., a few bytes may be 0xfe, 0x7f, 0xf7, etc.).
> However, I also notice that UBI, UBIFS, and YAFFS2 all make the
> assumption that an erased page/block will be totally pristine: all
> 0xff. This brings me to my main question:
> 
> Can someone find an example MLC NAND datasheet that guarantees reading
> an erased page will yield all 0xff data?

Hello Brian,

Due to the nature of NAND bitflips, I cannot see how a NAND datasheet
could guarantee such a thing (what would be the duration of a "0xff
guarantee" anyway?). In practice, bitflips already appear on 34nm SLC
devices, on blocks that have just been erased; hence I am not surprised
by your own findings on MLC devices.

See below for additional comments,

(...)
> Now, if we truly cannot rely on fully-0xff after erase, then this
> suggests a need for change in several different layers. A few comments
> and proposals:
> 
> * A few bitflips on an otherwise-0xff page should not be treated as
> unhandled corruption, as long as the erase operation did not return an
> error status. Such a page should be still usable, provided the driver
> has sufficient ECC capability.

Agreed.
 
> * My NAND controller's HW ECC flags these erased-page bitflips as
> uncorrectable errors, as there was no ECC written to the page and the
> read data does not match the 0xff special "erased" case. I assume most
> other ECC mechanisms would treat this similarly [2]. In such cases, I
> suspect that the best the driver can do is to return the raw data
> (0xff with flips) and an ECC error message.

So I guess you need to compare the _entire_ page with 0xff in your special
"erased" case ?

In my experience, the most efficient way of telling if a page has been
programmed or not is to explicitly write some kind of "programming marker"
(typically a zero byte) along with ECC in the OOB area.
This method is used by some manufacturers to implement internal ECC, when
the HW ECC bytes are such that an erased page does not have a valid ECC.
It also makes it possible to distinguish your (b) and (c) cases.

In some cases, this marker is impossible to implement because the HW controller
does not allow writing a custom byte in the OOB area; or because it would
break backward compatibility with existing devices.

Alternatively, if your controller allows it, you can xor HW-generated bytes
with a well-chosen polynomial before writing them and after reading them,
such that an erased page now has a valid ECC.
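
A minimal sketch of this xor trick, under loud assumptions: toy_hw_ecc()
is only a stand-in checksum, NOT a real ECC and not any controller's
algorithm, and all names are hypothetical. The mask is the ECC of an
all-0xff step xored with 0xff, so that an erased step (data and ECC
bytes all 0xff) looks valid once the mask is applied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ECC_BYTES 2

/* Stand-in for the controller's ECC generator: a toy checksum only. */
static void toy_hw_ecc(const uint8_t *data, size_t len,
		       uint8_t ecc[ECC_BYTES])
{
	uint8_t x = 0, s = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		x ^= data[i];
		s = (uint8_t)(s + data[i]);
	}
	ecc[0] = x;
	ecc[1] = s;
}

/* mask = ECC(all-0xff data) ^ 0xff, computed once per ECC step size. */
static void compute_erased_mask(size_t step_len, uint8_t mask[ECC_BYTES])
{
	uint8_t blank[512];
	uint8_t ecc[ECC_BYTES];
	size_t i;

	assert(step_len <= sizeof(blank));
	for (i = 0; i < step_len; i++)
		blank[i] = 0xff;
	toy_hw_ecc(blank, step_len, ecc);
	for (i = 0; i < ECC_BYTES; i++)
		mask[i] = (uint8_t)(ecc[i] ^ 0xff);
}

/*
 * Apply the mask in place. The same call is used before writing the ECC
 * bytes to OOB and after reading them back, since xor is its own inverse.
 */
static void xor_ecc_mask(uint8_t ecc[ECC_BYTES],
			 const uint8_t mask[ECC_BYTES])
{
	size_t i;

	for (i = 0; i < ECC_BYTES; i++)
		ecc[i] ^= mask[i];
}
```

With the mask applied, the bytes stored for an all-0xff step are 0xff,
which is exactly what an erased OOB area contains.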

> * UBI and other FS layers need to distinguish between: (a) 0xff
> cleanly-erased, (b) 0xff with bitflips, and (c) true ECC errors. In
> the end, we may treat (a) and (b) the same (as erased pages), but the
> problem is distinguishing between (b) and (c). This may require
> modifications to:
> 
>   - the MTD API, to provide explicit notification of erased blocks.
> For instance, we might introduce a new return code for mtd_read() that
> represents (b); when MTD detects an ECC error, we check for all 0xff
> with a threshold of bitflips, then return either -EBADMSG or a special
> ERASED code.

If some MTD generic code is in charge of analyzing bitflips in empty pages,
and deciding if the number of bitflips is "acceptable" (i.e. will be correctable
by HW ecc), then it will need to know not only the ecc strength, but also
the layout of ecc blocks inside pages: having all bitflips located in the same
ecc block is not the same as having scattered bitflips. As of now this ecc block
layout is not exposed by drivers.

Maybe this "erased-page bitflip concealment" would be better managed inside
drivers that are not able to correct erased pages ? (this is what we do on
one of our platforms -- the rest are able to correct bitflips on erased pages).

Or, conversely, we could decide that erased pages are simply not ecc-protected
(which is the actual truth with many drivers), can contain anything (including
bitflips), and should be signalled as erased and dealt with in upper layers...

This is not as crazy as it may seem, because recent devices require that you
keep track of each write/erase operation and use out-of-place updates, to
guarantee power-failure robustness. In such a context, you don't rely much on
the contents of erased blocks, but track their state in a different way...

Just my 2c,

Best regards,
--
Ivan

* Re: MLC NAND: all 0xff after erase?
  2012-07-11  6:41 ` Richard Genoud
@ 2012-07-13 21:22   ` Brian Norris
  0 siblings, 0 replies; 10+ messages in thread
From: Brian Norris @ 2012-07-13 21:22 UTC (permalink / raw)
  To: Richard Genoud
  Cc: Mike Dunn, Artem Bityutskiy, Richard Weinberger, Kevin Cernekee,
	Jim Quinlan, linux-mtd, Al Viro, Joel Reardon, David Woodhouse,
	Shmulik Ladkani

On Tue, Jul 10, 2012 at 11:41 PM, Richard Genoud
<richard.genoud@gmail.com> wrote:
> 2012/7/11 Brian Norris <computersforpeace@gmail.com>:
>> Hello all,
>>
>> I've seen some issues with MLC NAND and where I might erase a block,
>> read it back, and receive a few bitflips such that the data is not
>> entirely 0xff (i.e., a few bytes may be 0xfe, 0x7f, 0xf7, etc.).
>
> If you read the same block a few ms later, is the data the same or is
> it all 0xff or.. ?

Generally, the data is the same.

Brian

* Re: MLC NAND: all 0xff after erase?
  2012-07-11 17:43 ` Ivan Djelic
@ 2012-08-07 10:11   ` Calvin Johnson
       [not found]   ` <CAEhpT-UMk0hiDaKAwn8GtS0B3HHRudbSUSHpZ68-i+LmMNH-=A@mail.gmail.com>
  2012-08-17  9:51   ` Artem Bityutskiy
  2 siblings, 0 replies; 10+ messages in thread
From: Calvin Johnson @ 2012-08-07 10:11 UTC (permalink / raw)
  To: Ivan Djelic
  Cc: Mike Dunn, Artem Bityutskiy, Richard Weinberger, Kevin Cernekee,
	Jim Quinlan, linux-mtd, Al Viro, Joel Reardon, Brian Norris,
	David Woodhouse, Shmulik Ladkani

Hello Ivan,

>
> >>Due to the nature of NAND bitflips, I cannot see how a NAND datasheet
> >> could guarantee such a thing (what would be the duration of a "0xff
> >> guarantee" anyway ?). In practice, bitflips do appear already on 34nm SLC
> >> devices, on blocks that have just been erased; hence I am not surprised
> >> by your own findings on MLC devices.
>

Are these bit flips occurring due to power fluctuations while
performing program/erase as mentioned in
http://www.linux-mtd.infradead.org/doc/ubifs.html#L_unstable_bits ?

If that is the case, I am observing a different problem with a MLC NAND flash.

I wrote 4K bytes of data and read back the same 4K several times. The
page size is 4K. I am NOT performing multiple erase/program cycles.
Please note that I am reading back the same data, which sometimes
matches exactly what was written and sometimes does not, showing bit
flips at random locations. I am not using any ECC to correct the bit
errors yet; that will of course be added later, but for now I'm trying
to understand this problem.

Is this behavior expected in MLC NANDs? Are there any reference
documents or links that discuss this in more detail?

I have read about read disturb errors, but as I understand it, a read
disturb error is permanent
(http://www.klabs.org/richcontent/MemoryContent/nvmt_symp/nvmts_2002/docs/12/12_dan_p.pdf):
----------------------------------------------------------------------------------------------
Read Disturb Errors
The read disturb effect causes a page read operation to induce a
permanent, bit value change in one of the read bits. In BLC flash
technology based on a 0.16μ manufacturing process, the typical read
disturb error rate is on the order of 1 bit error per 10^6 repetitive
reads of the page containing the bit.
Although MLC cells are more prone to such errors, the effect in actual
measurements is less severe than in program disturb errors. The
measured rate is on the order of 1 bit error per approximately 10^5
repetitive reads of the page.
----------------------------------------------------------------------------------------------

Thanks,
Calvin

* Re: MLC NAND: all 0xff after erase?
       [not found]   ` <CAEhpT-UMk0hiDaKAwn8GtS0B3HHRudbSUSHpZ68-i+LmMNH-=A@mail.gmail.com>
@ 2012-08-07 20:00     ` Ivan Djelic
  2012-08-09  3:50       ` Calvin Johnson
  0 siblings, 1 reply; 10+ messages in thread
From: Ivan Djelic @ 2012-08-07 20:00 UTC (permalink / raw)
  To: Calvin Johnson
  Cc: Mike Dunn, Artem Bityutskiy, Richard Weinberger, Kevin Cernekee,
	Jim Quinlan, linux-mtd, Al Viro, Joel Reardon, peter.barada,
	Brian Norris, David Woodhouse, Shmulik Ladkani

On Tue, Aug 07, 2012 at 11:08:04AM +0100, Calvin Johnson wrote:
> Hello Ivan,
> 
> 
> >>Due to the nature of NAND bitflips, I cannot see how a NAND datasheet
> >> could guarantee such a thing (what would be the duration of a "0xff
> >> guarantee" anyway ?). In practice, bitflips do appear already on 34nm SLC
> >> devices, on blocks that have just been erased; hence I am not surprised
> >> by your own findings on MLC devices.
> 
> 
> Are these bit flips occurring due to power fluctuations while performing program/erase as mentioned in http://www.linux-mtd.infradead.org/doc/ubifs.html#L_unstable_bits ?

Hello Calvin,

No, the bitflips I was referring to are not caused by an interrupted erase or program operation.
They just appear when reading back an erased block. They sometimes exhibit a specific pattern: the same bit column is flipped on multiple
pages in the same block.
 
> If that is the case, I am observing a different problem with a MLC NAND flash.
> 
> I wrote 4K bytes of data and read back the same 4K several times. The page size is 4K. I am NOT performing multiple erase/programs. Please note that I am reading back the same data which sometimes matches exactly what was written and sometimes does not, showing bit flips at random locations.  I am not using any ECC to correct the bit errors, which of course will be done later as I'm trying to understand this problem.

Is the number of observed errors always within the ECC strength recommended for this device?

 
> Is this behavior expected in MLC NANDs? Is there any reference document/links which discuss more about this?
> 
> I have read about  read disturb errors but as I understand it is a permanent error.(http://www.klabs.org/richcontent/MemoryContent/nvmt_symp/nvmts_2002/docs/12/12_dan_p.pdf )
> ----------------------------------------------------------------------------------------------
> Read Disturb Errors
> The read disturb effect causes a page read operation to induce a permanent, bit value change in one of the read bits. In BLC flash technology based on a 0.16μ manufacturing process, the typical read disturb error rate is on the order of 1 bit error per 10^6 repetitive reads of the page containing the bit.
> Although MLC cells are more prone to such errors, the effect in actual measurements is less severe than in program disturb errors. The measured rate is on the order of 1 bit error per approximately 10^5 repetitive reads of the page.
> 
> ----------------------------------------------------------------------------------------------

I don't have much experience with raw MLC devices, but at least I can share a few interesting docs:

* An interesting article about SLC and MLC technology: http://www.eetimes.com/design/memory-design/4390427/-SLC-vs-MLC--Which-works-best-for-high-reliability-applications-

* An overview of NAND technology from Micron: http://www.google.com/url?sa=t&rct=j&q=nand%20mlc%20erratic%20read%20errors&source=web&cd=1&ved=0CE4QFjAA&url=http%3A%2F%2Fdownload.micron.com%2Fpdf%2Fpresentations%2Fevents%2Fflash_mem_summit_jcooke_inconvenient_truths_nand.pdf&ei=MHEhUImCLcGH0AXPzYG4Aw&usg=AFQjCNGfW2BXUfAt9zLU0Nlc5WSYooZwrA&cad=rja

* A highly technical book about NAND technology, including bit error mechanisms: http://books.google.com/books?id=vaq11vKwo_kC&printsec=frontcover&dq=Inside+NAND+Flash&source=bl&ots=UIULMnmFv3&sig=pqLo7iQ2HXmLvkxcWcSWgpiqEoc&hl=en&sa=X&ei=iXAhUPDRNMHS0QXgmIDACA&ved=0CDUQ6AEwAA

* A very interesting presentation from Intel: http://www.stanford.edu/class/ee380/Abstracts/081112-Fazio-slides.pdf

My guess is that the erratic/transient bitflips that you are observing are not uncommon on MLC devices...

BR,
--
Ivan

* Re: MLC NAND: all 0xff after erase?
  2012-08-07 20:00     ` Ivan Djelic
@ 2012-08-09  3:50       ` Calvin Johnson
  0 siblings, 0 replies; 10+ messages in thread
From: Calvin Johnson @ 2012-08-09  3:50 UTC (permalink / raw)
  To: Ivan Djelic
  Cc: Mike Dunn, Artem Bityutskiy, Richard Weinberger, Kevin Cernekee,
	Jim Quinlan, linux-mtd, Al Viro, Joel Reardon, peter.barada,
	Brian Norris, David Woodhouse, Shmulik Ladkani

On Wed, Aug 8, 2012 at 1:30 AM, Ivan Djelic <ivan.djelic@parrot.com> wrote:
> On Tue, Aug 07, 2012 at 11:08:04AM +0100, Calvin Johnson wrote:
>> Hello Ivan,
>>
>>
>> >>Due to the nature of NAND bitflips, I cannot see how a NAND datasheet
>> >> could guarantee such a thing (what would be the duration of a "0xff
>> >> guarantee" anyway ?). In practice, bitflips do appear already on 34nm SLC
>> >> devices, on blocks that have just been erased; hence I am not surprised
>> >> by your own findings on MLC devices.
>>
>>
>> Are these bit flips occurring due to power fluctuations while performing program/erase as mentioned in http://www.linux-mtd.infradead.org/doc/ubifs.html#L_unstable_bits ?
>
> Hello Calvin,
>
> No, the bitflips I was referring to are not caused by an interrupted erase or program operation.
> They just appear when reading back an erased block. They sometimes exhibit a specific pattern: the same bit column is flipped on multiple
> pages in the same block.
>

Thanks a lot Ivan. From some experts, I got some more info about the
reason behind this behaviour. I'm sharing them below.

-------------------------------------------------------------------------------------------------------------------------------------------------------------
The memory array is composed of strings of cells and numerous rows of
parallel selection lines.  When the array is being read these lines
are energized.  Granted, it is a low voltage.  But, as you know, any
electrical potential difference introduces the possibility of electron
migration.  At the current geometry technology, we’re only storing
about 100 electrons on a gate (That’s not a spec’ed count).  If you
multiply even a small possibility times 16,000,000,000 bits and a
large number of reads, it’s not unreasonable to expect an occasional
bit shift.
-------------------------------------------------------------------------------------------------------------------------------------------------------------
It is due to the fact that some bits are not well inside the
distribution of the programmed cells.
Some bits are not "substituted" because they can be corrected by the
ECC, but when programmed they are at the edges of the distribution and
can be read differently.
Obviously these “not screened” bits can be managed and corrected using
the specified ECC.
-------------------------------------------------------------------------------------------------------------------------------------------------------------

best regards,
Calvin

* Re: MLC NAND: all 0xff after erase?
  2012-07-11 17:43 ` Ivan Djelic
  2012-08-07 10:11   ` Calvin Johnson
       [not found]   ` <CAEhpT-UMk0hiDaKAwn8GtS0B3HHRudbSUSHpZ68-i+LmMNH-=A@mail.gmail.com>
@ 2012-08-17  9:51   ` Artem Bityutskiy
  2012-08-17 13:54     ` Matthieu CASTET
  2 siblings, 1 reply; 10+ messages in thread
From: Artem Bityutskiy @ 2012-08-17  9:51 UTC (permalink / raw)
  To: Ivan Djelic
  Cc: Mike Dunn, Richard Weinberger, Kevin Cernekee, Jim Quinlan,
	linux-mtd, Al Viro, Joel Reardon, Brian Norris, David Woodhouse,
	Shmulik Ladkani

On Wed, 2012-07-11 at 19:43 +0200, Ivan Djelic wrote:
> Or, conversely, we could decide that erased pages are simply not
> ecc-protected
> (which is the actual truth with many drivers), can contain anything
> (including
> bitflips), and should be signalled as erased and dealt with in upper
> layers... 

I did not investigate this in detail, but I believe UBI and UBIFS
can be changed so that they allow for a number of bit-flips. There are
only a few places (maybe even 2 - one in UBI and one in UBIFS) which
check if the area contains all 0xFFs. I do not see any obstacles to
improving this and implementing a smarter function which would take a
buffer, its length, ECC step size, and max allowable bit-flips as
parameters, and check if the page is empty. This could even be an MTD
helper, something like 'mtd_area_is_empty()'.
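
A possible shape for such a helper, as a sketch only: the name comes
from the suggestion above, but the signature and the per-step policy
(every ECC step must stay within the limit) are assumptions, and no such
function exists in MTD at the time of writing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Count zero bits, i.e. bits that differ from the erased 0xff pattern. */
static unsigned int zero_bits(const uint8_t *buf, size_t len)
{
	unsigned int zeros = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		uint8_t v = (uint8_t)~buf[i];

		while (v) {
			v &= (uint8_t)(v - 1);	/* clear the lowest set bit */
			zeros++;
		}
	}
	return zeros;
}

/*
 * Hypothetical helper: returns true if every ECC step of the buffer
 * contains at most max_bitflips zero bits. Checking per step matters
 * because flips concentrated in one step are harder to correct than
 * the same number of flips scattered over several steps.
 */
static bool mtd_area_is_empty(const uint8_t *buf, size_t len,
			      size_t ecc_step, unsigned int max_bitflips)
{
	size_t off;

	assert(ecc_step > 0 && len % ecc_step == 0);
	for (off = 0; off < len; off += ecc_step)
		if (zero_bits(buf + off, ecc_step) > max_bitflips)
			return false;
	return true;
}
```

For example, with 512-byte steps and a limit of 2, a 1024-byte area with
two flips in each step counts as empty, while three flips in a single
step do not.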

I think in UBI we only verify if an area is empty in the debugging code,
to make sure we never write over older data. Should be easily fixable.

In UBIFS probably in scanning/recovery code we need to find where free
space starts, probably in a couple of places. E.g., if we are scanning
the journal, and then hit a corrupted node, we want to know if it is the
last node or not. We check this by looking if the empty space starts
after the corruption - in the next min. I/O unit (NAND page), taking
into account the node length.

So I think we only need someone brave enough to implement this.

-- 
Best Regards,
Artem Bityutskiy

* Re: MLC NAND: all 0xff after erase?
  2012-08-17  9:51   ` Artem Bityutskiy
@ 2012-08-17 13:54     ` Matthieu CASTET
  0 siblings, 0 replies; 10+ messages in thread
From: Matthieu CASTET @ 2012-08-17 13:54 UTC (permalink / raw)
  To: dedekind1
  Cc: Mike Dunn, Richard Weinberger, Kevin Cernekee, Jim Quinlan,
	linux-mtd, Al Viro, Joel Reardon, Ivan Djelic, Brian Norris,
	David Woodhouse, Shmulik Ladkani

Artem Bityutskiy wrote:
> On Wed, 2012-07-11 at 19:43 +0200, Ivan Djelic wrote:
>> Or, conversely, we could decide that erased pages are simply not
>> ecc-protected
>> (which is the actual truth with many drivers), can contain anything
>> (including
>> bitflips), and should be signalled as erased and dealt with in upper
>> layers... 
> 
> I did not not investigate this in details, but I believe UBI and UBIFS
> can be changed and they can allow for a number of bit-flips. There are
> only few places (may be even 2 - one in UBI and one in UBIFS) which
> check if the area contains all 0xFFs. I do not see any obstacles
> improving this and implement a smarter functions which would take a
> buffer, it's length, ecc step size, and max allowable bit-flips as a
> parameter, and check if the page is empty. This could even be an MTD
> helper, something like 'mtd_area_is_empty()'.
> 
> I think in UBI we only verify if an area is empty in the debugging code,
> to make sure we never write over older data. Should be easily fixable.
AFAIK, we also check it when we do bit scrubbing on a dynamic volume. We need to
guess the written data size to compute the CRC on it and not on the whole LEB
(because the rest can be written later, which would invalidate the CRC).
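
To illustrate the "guess the written data size" step, here is a sketch
that tolerates bitflips in the trailing empty space. It is not the
existing UBI code; the function name, the per-I/O-unit threshold, and
the parameters are all assumptions for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count zero bits, i.e. bits that differ from the erased 0xff pattern. */
static unsigned int zero_bits(const uint8_t *buf, size_t len)
{
	unsigned int zeros = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		uint8_t v = (uint8_t)~buf[i];

		while (v) {
			v &= (uint8_t)(v - 1);	/* clear the lowest set bit */
			zeros++;
		}
	}
	return zeros;
}

/*
 * Hypothetical helper: walk backwards over min. I/O units and drop
 * trailing units that look erased (at most max_bitflips zero bits).
 * Returns the length of the data assumed to be written, which is
 * always a multiple of min_io.
 */
static size_t calc_data_len_sketch(const uint8_t *buf, size_t len,
				   size_t min_io, unsigned int max_bitflips)
{
	size_t end = len;

	assert(min_io > 0 && len % min_io == 0);
	while (end > 0 &&
	       zero_bits(buf + end - min_io, min_io) <= max_bitflips)
		end -= min_io;
	return end;
}
```

One caveat of this approach: a unit that was really written with data
that is almost entirely 0xff would be mistaken for empty space.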

Matthieu

Thread overview: 10+ messages
2012-07-11  0:36 MLC NAND: all 0xff after erase? Brian Norris
2012-07-11  6:41 ` Richard Genoud
2012-07-13 21:22   ` Brian Norris
2012-07-11 16:46 ` Mike Dunn
2012-07-11 17:43 ` Ivan Djelic
2012-08-07 10:11   ` Calvin Johnson
     [not found]   ` <CAEhpT-UMk0hiDaKAwn8GtS0B3HHRudbSUSHpZ68-i+LmMNH-=A@mail.gmail.com>
2012-08-07 20:00     ` Ivan Djelic
2012-08-09  3:50       ` Calvin Johnson
2012-08-17  9:51   ` Artem Bityutskiy
2012-08-17 13:54     ` Matthieu CASTET
