* dangerous NAND_BBT_SCANBYTE1AND6
@ 2011-04-21 15:52 Matthieu CASTET
  2011-04-21 17:10 ` Ivan Djelic
  2011-04-21 17:33 ` Brian Norris
  0 siblings, 2 replies; 17+ messages in thread
From: Matthieu CASTET @ 2011-04-21 15:52 UTC (permalink / raw)
  To: linux-mtd, Brian Norris

Hi,

I believe the NAND_BBT_SCANBYTE1AND6 behavior is very dangerous.
We have an ST flash where the ECC was put on bytes 5 and 6.
With the new kernel, all blocks are bad.

Is this option really needed?
The ST datasheet says [1]. We already check the first Word.
Why do we need to check the 6th Byte?


Matthieu

PS: the code checks the 1st, 2nd, 6th and 7th Bytes, so it checks too many bytes.


[1]
The devices are supplied with all the locations inside valid blocks erased
(FFh). The Bad Block Information is written prior to shipping. Any block, where
the 1st and 6th Bytes, or 1st Word, in the spare area of the 1st page, does not
contain FFh, is a Bad Block.


* Re: dangerous NAND_BBT_SCANBYTE1AND6
  2011-04-21 15:52 dangerous NAND_BBT_SCANBYTE1AND6 Matthieu CASTET
@ 2011-04-21 17:10 ` Ivan Djelic
  2011-04-22  4:50   ` Brian Norris
  2011-04-22  8:23   ` Artem Bityutskiy
  2011-04-21 17:33 ` Brian Norris
  1 sibling, 2 replies; 17+ messages in thread
From: Ivan Djelic @ 2011-04-21 17:10 UTC (permalink / raw)
  To: Matthieu CASTET; +Cc: linux-mtd, Brian Norris

On Thu, Apr 21, 2011 at 04:52:59PM +0100, Matthieu Castet wrote:
> Hi,
> 
> I believe NAND_BBT_SCANBYTE1AND6 behavior is very dangerous.
> We have a ST flash where ecc where but on bit 5 and 6.
> With new kernel all block are bad.
> 
> Is this option is really needed ?
> ST datasheet say [1]. We already check the first Word.
> Why do we need to check the 6th Byte ?

I agree with Matthieu, NAND_BBT_SCANBYTE1AND6 code also seems wrong to me.

Old small page nand devices used to have their bad block marker in 6th byte of
the spare area of the first page.

ST datasheet says that factory bad blocks will have _both_ bytes cleared
(1st and 6th); I guess this was done to allow choosing which marker to check
(but I may be wrong). Maybe to be compatible with large page marker location
scheme (again, just guessing).

Option NAND_BBT_SCANBYTE1AND6 code was introduced in commit
58373ff0afff4cc8ac40608872995f4d87eb72ec; but the commit message does not
clearly explain why both markers should be checked.

My understanding of bad block markers is (please correct me if I am wrong):
small page => check 6th byte of spare area of first page
large page, non-ONFI => check first word of spare area of first page
ONFI => see ONFI spec
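
As a rough illustration of the mapping above, here is a minimal C sketch. The
names bbm_class, bbm_offset, bbm_len and block_is_bad are hypothetical and are
not taken from the MTD code. It assumes that a marker is any value other than
0xff, treats the large page "first word" as two bytes, and leaves the ONFI
case out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device classes for the first two cases in the list above. */
enum bbm_class { BBM_SMALL_PAGE, BBM_LARGE_PAGE };

/* Zero-based offset of the marker in the spare area of the first page. */
static size_t bbm_offset(enum bbm_class cls)
{
	return cls == BBM_SMALL_PAGE ? 5 : 0;	/* 6th byte vs. first word */
}

/* Marker length in bytes: one byte for small page, one word for large page. */
static size_t bbm_len(enum bbm_class cls)
{
	return cls == BBM_SMALL_PAGE ? 1 : 2;
}

/* oob points at the spare area of the first page of the block. */
static bool block_is_bad(const uint8_t *oob, enum bbm_class cls)
{
	size_t off = bbm_offset(cls);
	size_t i;

	assert(oob != NULL);
	for (i = 0; i < bbm_len(cls); i++)
		if (oob[off + i] != 0xff)
			return true;
	return false;
}
```

With this split, ECC data stored in the 6th byte of a large page device would
not affect the result, because only the first word is inspected there.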

Ivan


* Re: dangerous NAND_BBT_SCANBYTE1AND6
  2011-04-21 15:52 dangerous NAND_BBT_SCANBYTE1AND6 Matthieu CASTET
  2011-04-21 17:10 ` Ivan Djelic
@ 2011-04-21 17:33 ` Brian Norris
  2011-04-22  9:02   ` Matthieu CASTET
  1 sibling, 1 reply; 17+ messages in thread
From: Brian Norris @ 2011-04-21 17:33 UTC (permalink / raw)
  To: Matthieu CASTET; +Cc: linux-mtd, Brian Norris

Hi

On 4/21/2011 8:52 AM, Matthieu CASTET wrote:
> I believe NAND_BBT_SCANBYTE1AND6 behavior is very dangerous.
> We have a ST flash where ecc where but on bit 5 and 6.
> With new kernel all block are bad.
>
> Is this option is really needed ?
> ST datasheet say [1]. We already check the first Word.
> Why do we need to check the 6th Byte ?
>
>
> Matthieu
>
> PS : the code check 1st, 2nd, 6th, 7th Bytes. So it check too much bytes.
>
>
> [1]
> The devices are supplied with all the locations inside valid blocks erased
> (FFh). The Bad
> Block Information is written prior to shipping. Any block, where the 1st and 6th
> Bytes, or 1st
> Word, in the spare area of the 1st page, does not contain FFh, is a Bad Block.

I've tried my best to verify that any modifications I have made to bad 
block scanning comply with the data sheets, but I very well could have 
made mistakes (especially since there are so many different types of 
scanning patterns, and very few manufacturers are actually being 
consistent with these things).

That being said, I believe that the data sheet you quoted has some answer:
"Any block, where the 1st and 6th Bytes, or 1st Word, in the spare area 
of the 1st page, does not contain FFh, is a Bad Block."
AFAICT, this description means that x8 buswidth devices must scan bytes 
1 and 6 while x16 devices only need to scan the first word. So I bet 
your device is actually an x8 device and so the 1st/6th byte pattern is 
correct. I think the fact that this conflicts with your ECC patterns is 
something you must deal with.

 > PS : the code check 1st, 2nd, 6th, 7th Bytes. So it check too much bytes.

I've seen this before. This may be incorrect. Are you sure it's not 1st, 
2nd, 5th, 6th though? I believe the "2-byte scans" were chosen before to 
keep from having to differentiate between x8/x16 buses. Perhaps this 
should be changed. (volunteers?)

While we're on the subject: do people use x16 buses on NAND anymore?

Brian


* Re: dangerous NAND_BBT_SCANBYTE1AND6
  2011-04-21 17:10 ` Ivan Djelic
@ 2011-04-22  4:50   ` Brian Norris
  2011-04-22  8:23   ` Artem Bityutskiy
  1 sibling, 0 replies; 17+ messages in thread
From: Brian Norris @ 2011-04-22  4:50 UTC (permalink / raw)
  To: Ivan Djelic; +Cc: Brian Norris, linux-mtd, Matthieu CASTET

Hi Ivan,

(FYI, please use my @gmail.com, not my @broadcom address.)

I can't say I know everything about the intentions and history of all 
statements in various NAND flash data sheets, but I have read many of 
them and will try to explain my view. Of course, I may be wrong.

On 4/21/2011 10:10 AM, Ivan Djelic wrote:
> Old small page nand devices used to have their bad block marker in 6th byte of
> the spare area of the first page.

Correct

> ST datasheet says that factory bad blocks will have _both_ bytes cleared
> (1st and 6th); I guess this was done to allow choosing which marker to check
> (but I may be wrong). Maybe to be compatible with large page marker location
> scheme (again, just guessing).

The actual statement is one of these two (pulled from various ST and 
Numonyx sheets):

"Any block, where the 1st and 6th bytes or the 1st word in the spare 
area of the 1st page, does not contain FFh, is a bad block."

"Any block, where the 1st and 6th bytes, in the spare area of the first
page, does not contain FFh is a bad block."

Strictly speaking, neither of these "sentences" uses correct grammar, as 
the commas are placed arbitrarily. Most importantly, though, I don't 
think they make clear the following:

1) Does the manufacturer guarantee that BOTH bytes are non-FFh?
2) Does the manufacturer guarantee that the combined bytes ("1st and 
6th") contain a non-FFh byte?

I understood it as the latter, and so decided the scan needed both bytes 
(perhaps one byte was written successfully but not the other). However, 
your argument for choice (1) ("this was done to allow choosing which 
marker to check") makes just as much sense (or more) to me.
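
To make the difference between the two readings concrete, here is a small C
sketch. The function names are hypothetical and this is not the nand_bbt.c
code; it only assumes that oob points at the spare area of the first page and
that an unmarked byte reads 0xff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Reading (1): a factory bad block has BOTH bytes non-FFh, so looking at
 * the 1st byte alone is enough to find it.
 */
static bool bad_reading_1(const uint8_t *oob)
{
	assert(oob != NULL);
	return oob[0] != 0xff;
}

/*
 * Reading (2): only one of the two bytes is guaranteed to be non-FFh, so
 * the scan has to treat either byte as a marker.
 */
static bool bad_reading_2(const uint8_t *oob)
{
	assert(oob != NULL);
	return oob[0] != 0xff || oob[5] != 0xff;
}
```

The two readings agree on an erased block and on a block with both bytes
cleared. They differ when only the 6th byte is non-FFh, which is exactly what
a good block looks like if an ECC layout stores data there.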

In trying to decide why I came to conclude choice (2) and not (1), I 
recall that some Hynix and Samsung parts explicitly declare that the 
first OR second page may be used, in case the first page is bad. I may 
have subconsciously applied this 1st/2nd page concept to the 1st/6th 
bytes logic.

> My understanding of bad block markers is (please correct me if I am wrong):
> small page =>  check 6th byte of spare area of first page
> large page, non-ONFI =>  check first word of spare area of first page
> ONFI =>  see ONFI spec

Unfortunately, small page, large page, and ONFI are 3 classifications 
that oversimplify bad block markers.

Some people (especially Samsung and Hynix, but even some Micron) got 
creative. Some of their chips use:
1st or 2nd page
the last page
the 1st or last page
the last or (last - 2)th page

And of course, there's the controversial 1st/6th byte usage - that I'm 
still not clear on. Some of these scanning patterns are rare, but they 
do exist.

Sorry for any confusion, but I guess it's better late than never for 
this sort of discussion...

Brian


* Re: dangerous NAND_BBT_SCANBYTE1AND6
  2011-04-21 17:10 ` Ivan Djelic
  2011-04-22  4:50   ` Brian Norris
@ 2011-04-22  8:23   ` Artem Bityutskiy
  2011-04-22  8:53     ` Matthieu CASTET
  1 sibling, 1 reply; 17+ messages in thread
From: Artem Bityutskiy @ 2011-04-22  8:23 UTC (permalink / raw)
  To: Ivan Djelic; +Cc: linux-mtd, Brian Norris, Matthieu CASTET

On Thu, 2011-04-21 at 19:10 +0200, Ivan Djelic wrote:
> On Thu, Apr 21, 2011 at 04:52:59PM +0100, Matthieu Castet wrote:
> > Hi,
> > 
> > I believe NAND_BBT_SCANBYTE1AND6 behavior is very dangerous.
> > We have a ST flash where ecc where but on bit 5 and 6.
> > With new kernel all block are bad.
> > 
> > Is this option is really needed ?
> > ST datasheet say [1]. We already check the first Word.
> > Why do we need to check the 6th Byte ?
> 
> I agree with Matthieu, NAND_BBT_SCANBYTE1AND6 code also seems wrong to me.

This just means that we need a better way for drivers to inform the
generic code about how exactly blocks are marked as bad. Probably
drivers could describe this with a data structure, and sometimes even
provide a "is_block_bad()" function.

The options seem to be not enough.
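
One possible shape for such a description, as a hedged sketch only: struct
bbm_desc and bbm_check are hypothetical names, not an existing MTD interface.
The sketch assumes a marker is any byte other than 0xff and that the caller
has already read the spare area of the relevant page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-driver description of how bad blocks are marked. */
struct bbm_desc {
	const size_t *offsets;	/* zero-based marker offsets in the spare area */
	size_t noffsets;	/* number of entries in offsets */
	/* Optional override for chips that need custom logic; may be NULL. */
	bool (*is_block_bad)(const uint8_t *oob, size_t ooblen);
};

/* Generic check: use the driver callback if present, else the offset list. */
static bool bbm_check(const struct bbm_desc *d, const uint8_t *oob,
		      size_t ooblen)
{
	size_t i;

	assert(d != NULL && oob != NULL);
	if (d->is_block_bad)
		return d->is_block_bad(oob, ooblen);
	for (i = 0; i < d->noffsets; i++) {
		size_t off = d->offsets[i];

		/* Offsets beyond the spare area are ignored. */
		if (off < ooblen && oob[off] != 0xff)
			return true;
	}
	return false;
}
```

A driver whose ECC layout occupies the 6th byte would then list only offset 0,
while a driver that really needs both bytes would list offsets 0 and 5.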

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)


* Re: dangerous NAND_BBT_SCANBYTE1AND6
  2011-04-22  8:23   ` Artem Bityutskiy
@ 2011-04-22  8:53     ` Matthieu CASTET
  2011-04-22  9:28       ` Artem Bityutskiy
  0 siblings, 1 reply; 17+ messages in thread
From: Matthieu CASTET @ 2011-04-22  8:53 UTC (permalink / raw)
  To: dedekind1; +Cc: Ivan Djelic, linux-mtd, Brian Norris

Artem Bityutskiy wrote:
> On Thu, 2011-04-21 at 19:10 +0200, Ivan Djelic wrote:
>> On Thu, Apr 21, 2011 at 04:52:59PM +0100, Matthieu Castet wrote:
>>> Hi,
>>>
>>> I believe NAND_BBT_SCANBYTE1AND6 behavior is very dangerous.
>>> We have a ST flash where ecc where but on bit 5 and 6.
>>> With new kernel all block are bad.
>>>
>>> Is this option is really needed ?
>>> ST datasheet say [1]. We already check the first Word.
>>> Why do we need to check the 6th Byte ?
>> I agree with Matthieu, NAND_BBT_SCANBYTE1AND6 code also seems wrong to me.
> 
> This just means that we need a better way for drivers to inform the
> generic code about how exactly blocks are marked as bad. Probably
> drivers could describe this with a data structure, and sometimes even
> provide a "is_block_bad()" function.
> 
> The options seem to be not enough.
> 
I think we should also unify bad block scanning.

In the current code, bad block scanning can be done by:
- chip->block_bad (default nand_block_bad)
- nand_isbad_bbt


Why doesn't nand_isbad_bbt call chip->block_bad, instead of implementing its
own scanning code (scan_block_full or scan_block_fast)?

This is bad because chip->block_bad can be modified by a driver, but
nand_isbad_bbt won't use it.

Also, nand_block_bad and nand_isbad_bbt don't use the same scanning pattern.


Matthieu


* Re: dangerous NAND_BBT_SCANBYTE1AND6
  2011-04-21 17:33 ` Brian Norris
@ 2011-04-22  9:02   ` Matthieu CASTET
  2011-04-26  7:30     ` Ricard Wanderlof
  0 siblings, 1 reply; 17+ messages in thread
From: Matthieu CASTET @ 2011-04-22  9:02 UTC (permalink / raw)
  To: Brian Norris; +Cc: linux-mtd, Brian Norris

Hi,

Brian Norris wrote:
> Hi
> 
> On 4/21/2011 8:52 AM, Matthieu CASTET wrote:
>>
>> [1]
>> The devices are supplied with all the locations inside valid blocks erased
>> (FFh). The Bad
>> Block Information is written prior to shipping. Any block, where the 1st and 6th
>> Bytes, or 1st
>> Word, in the spare area of the 1st page, does not contain FFh, is a Bad Block.
> 
> I've tried my best to verify that any modifications I have made to bad 
> block scanning comply with the data sheets, but I very well could have 
> made mistakes (especially since there are so many different types of 
> scanning patterns, and very few manufacturers are actually being 
> consistent with these things).
Did you ask the manufacturers for clarification?

> 
> That being said, I believe that the data sheet you quoted has some answer:
> "Any block, where the 1st and 6th Bytes, or 1st Word, in the spare area 
> of the 1st page, does not contain FFh, is a Bad Block."
> AFAICT, this description means that x8 buswidth devices must scan bytes 
> 1 and 6 while x16 devices only need to scan the first word. 
Did you see a real case where byte 6 was not 0xff but byte 1 was 0xff?


> So I bet 
> your device is actually an x8 device and so the 1st/6th byte pattern is 
> correct. I think the fact that this conflicts with your ECC patterns is 
> something you must deal with.
I don't agree, that's a big mtd regression. If you update your kernel on such
flash, you brick it.


Matthieu


* Re: dangerous NAND_BBT_SCANBYTE1AND6
  2011-04-22  8:53     ` Matthieu CASTET
@ 2011-04-22  9:28       ` Artem Bityutskiy
  0 siblings, 0 replies; 17+ messages in thread
From: Artem Bityutskiy @ 2011-04-22  9:28 UTC (permalink / raw)
  To: Matthieu CASTET; +Cc: Ivan Djelic, linux-mtd, Brian Norris

On Fri, 2011-04-22 at 10:53 +0200, Matthieu CASTET wrote:
> Artem Bityutskiy a écrit :
> > On Thu, 2011-04-21 at 19:10 +0200, Ivan Djelic wrote:
> >> On Thu, Apr 21, 2011 at 04:52:59PM +0100, Matthieu Castet wrote:
> >>> Hi,
> >>>
> >>> I believe NAND_BBT_SCANBYTE1AND6 behavior is very dangerous.
> >>> We have a ST flash where ecc where but on bit 5 and 6.
> >>> With new kernel all block are bad.
> >>>
> >>> Is this option is really needed ?
> >>> ST datasheet say [1]. We already check the first Word.
> >>> Why do we need to check the 6th Byte ?
> >> I agree with Matthieu, NAND_BBT_SCANBYTE1AND6 code also seems wrong to me.
> > 
> > This just means that we need a better way for drivers to inform the
> > generic code about how exactly blocks are marked as bad. Probably
> > drivers could describe this with a data structure, and sometimes even
> > provide a "is_block_bad()" function.
> > 
> > The options seem to be not enough.
> > 
> I think we should also unify bad block scanning.

Sure, just do this in small incremental steps: send small incremental
patches with a nice description (and tested). The point is, you should
not wait for someone else to fix this for you; I do not think that will
happen. One more thing: if you are using MTD and are interested in its
stability, review other people's patches that touch your areas of
interest :-)

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)


* Re: dangerous NAND_BBT_SCANBYTE1AND6
  2011-04-22  9:02   ` Matthieu CASTET
@ 2011-04-26  7:30     ` Ricard Wanderlof
  2011-05-24  1:09       ` Brian Norris
  0 siblings, 1 reply; 17+ messages in thread
From: Ricard Wanderlof @ 2011-04-26  7:30 UTC (permalink / raw)
  To: Matthieu CASTET; +Cc: Brian Norris, linux-mtd, Brian Norris


On Fri, 22 Apr 2011, Matthieu CASTET wrote:

>> So I bet
>> your device is actually an x8 device and so the 1st/6th byte pattern is
>> correct. I think the fact that this conflicts with your ECC patterns is
>> something you must deal with.
> I don't agree, that's a big mtd regression. If you update your kernel on such
> flash, you brick it.

I agree, even if the behavior may have been incorrect in the past, we 
should think very carefully about changing this for exactly this reason.

/Ricard
-- 
Ricard Wolf Wanderlöf                           ricardw(at)axis.com
Axis Communications AB, Lund, Sweden            www.axis.com
Phone +46 46 272 2016                           Fax +46 46 13 61 30


* Re: dangerous NAND_BBT_SCANBYTE1AND6
  2011-04-26  7:30     ` Ricard Wanderlof
@ 2011-05-24  1:09       ` Brian Norris
  2011-05-25 16:41         ` Ivan Djelic
  0 siblings, 1 reply; 17+ messages in thread
From: Brian Norris @ 2011-05-24  1:09 UTC (permalink / raw)
  To: Ricard Wanderlof
  Cc: Ivan Djelic, linux-mtd, Matthieu CASTET, Artem Bityutskiy

Hi,

Sorry this thread has been sitting for a long time. I've been very
busy and haven't had time for MTD stuff.

From Matthieu:
"Did you ask some clarification to manufacturers ?"

No, unfortunately, I did not. I didn't realize the "regression" issues
at the time, so I didn't think to look further than my interpretation
of the datasheets.

On Tue, Apr 26, 2011 at 12:30 AM, Ricard Wanderlof
<ricard.wanderlof@axis.com> wrote:
>
> On Fri, 22 Apr 2011, Matthieu CASTET wrote:
>
>>> So I bet
>>> your device is actually an x8 device and so the 1st/6th byte pattern is
>>> correct. I think the fact that this conflicts with your ECC patterns is
>>> something you must deal with.
>>
>> I don't agree, that's a big mtd regression. If you update your kernel on such
>> flash, you brick it.
>
> I agree, even if the behavior may have been incorrect in the past, we should think very carefully about changing this for exactly this reason.

Right, I see how this could be a problem. So for a resolution, I'd ask
for suggestions on which of the following seems best:
1) Completely revert the SCANBYTE1AND6 change
2) Remove the option from nand_get_flash_type(), still allowing
drivers to enable the scan option themselves
3) Have nand_get_flash_type() use ECC layout information to decide to
scan bytes 1+6 or just byte 1 only
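
Option 3 could look roughly like the following C sketch. It is only an
illustration: struct ecc_layout_sketch is a simplified, hypothetical stand-in
for the driver's ECC layout (a count plus a list of spare area positions), and
pick_marker_offsets is not an existing function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical view of an ECC layout. */
struct ecc_layout_sketch {
	size_t eccbytes;	/* number of valid entries in eccpos */
	const uint32_t *eccpos;	/* zero-based spare area offsets used by ECC */
};

/* Return true if the ECC layout stores a byte at the given offset. */
static bool ecc_uses_offset(const struct ecc_layout_sketch *l, uint32_t off)
{
	size_t i;

	assert(l != NULL);
	for (i = 0; i < l->eccbytes; i++)
		if (l->eccpos[i] == off)
			return true;
	return false;
}

/*
 * Fill out[] with the marker offsets to scan and return how many there are:
 * always the 1st byte, plus the 6th byte only if ECC does not live there.
 */
static size_t pick_marker_offsets(const struct ecc_layout_sketch *l,
				  uint32_t out[2])
{
	size_t n = 0;

	out[n++] = 0;
	if (!ecc_uses_offset(l, 5))
		out[n++] = 5;
	return n;
}
```

The drawback is that the scan pattern then depends on the driver's layout
choice rather than on the chip alone.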

Regarding correctness:
As far as I can tell, no one has found a definitive answer on the
manufacturer intention, right? I'm now leaning toward the intention
that software only needs to scan *either* byte 1 *or* byte 6, but I
don't know for sure.

Thanks,
Brian


* Re: dangerous NAND_BBT_SCANBYTE1AND6
  2011-05-24  1:09       ` Brian Norris
@ 2011-05-25 16:41         ` Ivan Djelic
  2011-05-25 18:04           ` Atlant Schmidt
  2011-05-26  7:07           ` Ricard Wanderlof
  0 siblings, 2 replies; 17+ messages in thread
From: Ivan Djelic @ 2011-05-25 16:41 UTC (permalink / raw)
  To: Brian Norris
  Cc: linux-mtd, Ricard Wanderlof, Matthieu Castet, Artem Bityutskiy

On Tue, May 24, 2011 at 02:09:10AM +0100, Brian Norris wrote:
> >>> So I bet
> >>> your device is actually an x8 device and so the 1st/6th byte pattern is
> >>> correct. I think the fact that this conflicts with your ECC patterns is
> >>> something you must deal with.
> >>
> >> I don't agree, that's a big mtd regression. If you update your kernel on such
> >> flash, you brick it.
> >
> > I agree, even if the behavior may have been incorrect in the past, we should think very carefully about changing this for exactly this reason.
> 
> Right, I see how this could be a problem. So for a resolution, I'd ask
> for suggestions on which of the following seems best:
> 1) Completely revert the SCANBYTE1AND6 change
> 2) Remove the option from nand_get_flash_type(), still allowing
> drivers to enable the scan option themselves
> 3) Have nand_get_flash_type() use ECC layout information to decide to
> scan bytes 1+6 or just byte 1 only
> 
> Regarding correctness:
> As far as I can tell, no one has found a definitive answer on the
> manufacturer intention, right? I'm now leaning toward the intention
> that software only needs to scan *either* byte 1 *or* byte 6, but I
> don't know for sure.

Hello Brian,

Here is a relevant excerpt from a 2004 STM application note (AN1819):

  RECOGNIZING BAD BLOCKS
  The devices are supplied with all the locations inside valid blocks
  erased (FFh). The Bad Block Information is written prior to shipping.
  For 528 Byte/256 Word Page (NANDxxx-A) devices, any block where the
  6th Byte/ 1st Word in the spare area of the 1st page does not contain
  FFh is a Bad Block.  For 2112 Byte/1056 Word Page devices, any block,
  where the 1st and 6th Bytes, or 1st Word, in the spare area of the 1st
  page, does not contain FFh, is a Bad Block.

If we check only the 1st byte, we just need to make sure that there is no
possibility of having a good erased block with:
- 1st byte == bad block marker (usually 0x00)
and
- 6th byte == 0xff

I believe this is unlikely; or rather, it _was_ totally unlikely in 2004 when
the application note was written.

Therefore, I think we can safely use only the 1st marker byte to detect factory
bad blocks in that case (STM large page); the manufacturer simply guarantees
that both markers are written when a factory bad block is marked. It does not
require you to check both bytes.

<digression>
The above note is probably not applicable to recent devices. Because bitflips
are much more likely to appear, saying that a specific byte marks a bad block
if it "does not contain FFh" is not realistic. ONFI 2.2 clarifies the issue and
states that a marker is 0x00, not just a byte that "does not contain FFh".
And recent Micron devices do not store markers in flash; they just return 0x00
for any byte read in a bad block (instead of the real data), using an internal
bad block table.
</digression>

I suggest we revert the SCANBYTE1AND6 change, because:
- it breaks existing ecc layouts
- factory bad blocks in relevant STM nands can be detected without checking the
  6th byte

Best Regards,

Ivan


* RE: dangerous NAND_BBT_SCANBYTE1AND6
  2011-05-25 16:41         ` Ivan Djelic
@ 2011-05-25 18:04           ` Atlant Schmidt
  2011-05-25 18:31             ` Ivan Djelic
  2011-05-26  7:07           ` Ricard Wanderlof
  1 sibling, 1 reply; 17+ messages in thread
From: Atlant Schmidt @ 2011-05-25 18:04 UTC (permalink / raw)
  To: 'Ivan Djelic', Brian Norris
  Cc: Ricard Wanderlof, linux-mtd, Matthieu Castet, Artem Bityutskiy

Ivan:

> <digression> ...
> And recent Micron devices do not store markers in flash; they just
> return 0x00 for any byte read in a bad block (instead of the real
> data), using an internal bad block table.
> </digression>

  Does this mean that it is impossible to mark additional
  bad blocks in these devices as blocks go hard-bad during
  use? Or do commands exist to extend the internal bad
  block table? (And do our MTD drivers know how to do that?)

                            Atlant



* Re: dangerous NAND_BBT_SCANBYTE1AND6
  2011-05-25 18:04           ` Atlant Schmidt
@ 2011-05-25 18:31             ` Ivan Djelic
  2011-05-26  7:09               ` Ricard Wanderlof
  0 siblings, 1 reply; 17+ messages in thread
From: Ivan Djelic @ 2011-05-25 18:31 UTC (permalink / raw)
  To: Atlant Schmidt
  Cc: Ricard Wanderlof, Brian Norris, linux-mtd, Matthieu Castet,
	Artem Bityutskiy

On Wed, May 25, 2011 at 07:04:40PM +0100, Atlant Schmidt wrote:
> Ivan:
> 
> > <digression> ...
> > And recent Micron devices do not store markers in flash; they just
> > return 0x00 for any byte read in a bad block (instead of the real
> > data), using an internal bad block table.
> > </digression>
> 
>   Does this mean that it is impossible to mark additional
>   bad blocks in these devices as blocks go hard-bad during
>   use? Or do commands exist to extend the internal bad
>   block table? (And do our MTD drivers know how to do that?)
> 

Note that the usual bad block detection still works on those Micron devices.
They just do not store markers in flash.

You can still mark a block gone bad either by writing your own marker into the
block or (better) in a separate BBT. The internal Micron table is hard-wired
and only used to shortcut access to factory bad blocks AFAIK.

Regards,

Ivan


* Re: dangerous NAND_BBT_SCANBYTE1AND6
  2011-05-25 16:41         ` Ivan Djelic
  2011-05-25 18:04           ` Atlant Schmidt
@ 2011-05-26  7:07           ` Ricard Wanderlof
  2011-05-26  7:57             ` Ivan Djelic
  1 sibling, 1 reply; 17+ messages in thread
From: Ricard Wanderlof @ 2011-05-26  7:07 UTC (permalink / raw)
  To: Ivan Djelic
  Cc: Ricard Wanderlöf, Brian Norris, linux-mtd, Matthieu Castet,
	Artem Bityutskiy


On Wed, 25 May 2011, Ivan Djelic wrote:

> Here is a relevant excerpt from a 2004 STM application note (AN1819):
>
>  RECOGNIZING BAD BLOCKS
>  The devices are supplied with all the locations inside valid blocks
>  erased (FFh). The Bad Block Information is written prior to shipping.
>  For 528 Byte/256 Word Page (NANDxxx-A) devices, any block where the
>  6th Byte/ 1st Word in the spare area of the 1st page does not contain
>  FFh is a Bad Block.  For 2112 Byte/1056 Word Page devices, any block,
>  where the 1st and 6th Bytes, or 1st Word, in the spare area of the 1st
>  page, does not contain FFh, is a Bad Block.
> ...
> <digression>
> The above note is probably not applicable to recent devices. Because bitflips
> are much more likely to appear, saying that a specific byte marks a bad block
> if it "does not contain FFh" it not realistic. ONFI 2.2 clarifies the issue and
> states that a marker is 0x00, not just a byte that "does not contain FFh".
> And recent Micron devices do not store markers in flash; they just return 0x00
> for any byte read in a bad block (instead of the real data), using an internal
> bad block table.
> </digression>

I'm probably wrong here, but I just want to throw this thought into the 
pot: I always thought that the reason for the 'not contain FFh' phrasing 
was that there could be something physically wrong in a bad block so that 
some bits could not be programmed to 0. Saying 'not contain FFh' would be 
a way of saying 'we try to set all bytes to 0 but if for some reason some 
bits are stuck at 1 still treat a non-FFh word as a bad block marker'.

This of course does not harmonize with ONFI 2.2. It's just me trying to 
read between the lines of the specs.

/Ricard
--
Ricard Wolf Wanderlöf                           ricardw(at)axis.com
Axis Communications AB, Lund, Sweden            www.axis.com
Phone +46 46 272 2016                           Fax +46 46 13 61 30


* Re: dangerous NAND_BBT_SCANBYTE1AND6
  2011-05-25 18:31             ` Ivan Djelic
@ 2011-05-26  7:09               ` Ricard Wanderlof
  2011-05-26  7:58                 ` Ivan Djelic
  0 siblings, 1 reply; 17+ messages in thread
From: Ricard Wanderlof @ 2011-05-26  7:09 UTC (permalink / raw)
  To: Ivan Djelic
  Cc: Artem Bityutskiy, Matthieu Castet, Ricard Wanderlöf,
	linux-mtd, Atlant Schmidt, Brian Norris


On Wed, 25 May 2011, Ivan Djelic wrote:

>>> <digression> ...
>>> And recent Micron devices do not store markers in flash; they just
>>> return 0x00 for any byte read in a bad block (instead of the real
>>> data), using an internal bad block table.
>>> </digression>
>> ...
> Note that the usual bad block detection still works on those Micron devices.
> They just do not store markers in flash.
>
> You can still mark a block gone bad either by writing your own marker into the
> block or (better) in a separate BBT. The internal Micron table is hard-wired
> and only used to shortcut access to factory bad blocks AFAIK.

Does this also mean that if you for some reason screw up and mark lots of 
(good) blocks as bad, you can just erase all blocks in the flash; the 
factory-bad ones will refuse to be erased thanks to the on-chip bbt?

/Ricard
-- 
Ricard Wolf Wanderlöf                           ricardw(at)axis.com
Axis Communications AB, Lund, Sweden            www.axis.com
Phone +46 46 272 2016                           Fax +46 46 13 61 30


* Re: dangerous NAND_BBT_SCANBYTE1AND6
  2011-05-26  7:07           ` Ricard Wanderlof
@ 2011-05-26  7:57             ` Ivan Djelic
  0 siblings, 0 replies; 17+ messages in thread
From: Ivan Djelic @ 2011-05-26  7:57 UTC (permalink / raw)
  To: Ricard Wanderlof
  Cc: Ricard Wanderlöf, Brian Norris, linux-mtd, Matthieu Castet,
	Artem Bityutskiy

On Thu, May 26, 2011 at 08:07:36AM +0100, Ricard Wanderlof wrote:
> > ...
> > <digression>
> > The above note is probably not applicable to recent devices. Because bitflips
> > are much more likely to appear, saying that a specific byte marks a bad block
> > if it "does not contain FFh" is not realistic. ONFI 2.2 clarifies the issue and
> > states that a marker is 0x00, not just a byte that "does not contain FFh".
> > And recent Micron devices do not store markers in flash; they just return 0x00
> > for any byte read in a bad block (instead of the real data), using an internal
> > bad block table.
> > </digression>
> 
> I'm probably wrong here, but I just want to throw this thought into the 
> pot: I always thought that the reason for the 'not contain FFh' phrasing 
> was that there could be something physically wrong in a bad block so that 
> some bits could not be programmed to 0. Saying 'not contain FFh' would be 
> a way of saying 'we try to set all bytes to 0 but if for some reason some 
> bits are stuck at 1 still treat a non-FFh word as a bad block marker'.

I agree. The safest way to check a bb marker is probably to count the number of
set bits, and compare it to a threshold; instead of just comparing it to 0xff or
0x00.
But even if you stick to a simple comparison with 0x00 or 0xff on old nand
devices, I guess the probability that a bit is stuck in the marker is very low
and would maybe result in a few spurious bad blocks in a large set of devices.
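
A minimal sketch of the bit-counting check described above, in C. The names
count_set_bits(), marker_is_bad() and BB_MARKER_MIN_SET_BITS are hypothetical
and the threshold value is an arbitrary assumption for illustration; none of
this is taken from the kernel's bad block scanning code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical threshold (assumption): a marker byte with fewer than this
 * many bits set to 1 is treated as a bad block marker.
 */
#define BB_MARKER_MIN_SET_BITS 4

/* Count the bits set to 1 in one marker byte. */
static unsigned int count_set_bits(uint8_t byte)
{
	unsigned int n = 0;

	while (byte) {
		byte &= (uint8_t)(byte - 1);	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Decide whether a marker byte denotes a bad block by comparing its number
 * of set bits to a threshold, instead of comparing the byte to 0xff or 0x00.
 */
static bool marker_is_bad(uint8_t marker, unsigned int min_set_bits)
{
	return count_set_bits(marker) < min_set_bits;
}
```

With this threshold, a good block whose marker reads 0xfe because of a single
bitflip is still treated as good, and a bad block whose marker reads 0x01
because one bit is stuck at 1 is still treated as bad.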

Ivan


* Re: dangerous NAND_BBT_SCANBYTE1AND6
  2011-05-26  7:09               ` Ricard Wanderlof
@ 2011-05-26  7:58                 ` Ivan Djelic
  0 siblings, 0 replies; 17+ messages in thread
From: Ivan Djelic @ 2011-05-26  7:58 UTC (permalink / raw)
  To: Ricard Wanderlof
  Cc: Artem Bityutskiy, Matthieu Castet, Ricard Wanderlöf,
	linux-mtd, Atlant Schmidt, Brian Norris

On Thu, May 26, 2011 at 08:09:15AM +0100, Ricard Wanderlof wrote:
> 
> On Wed, 25 May 2011, Ivan Djelic wrote:
> 
> >>> <digression> ...
> >>> And recent Micron devices do not store markers in flash; they just
> >>> return 0x00 for any byte read in a bad block (instead of the real
> >>> data), using an internal bad block table.
> >>> </digression>
> >> ...
> > Note that the usual bad block detection still works on those Micron devices.
> > They just do not store markers in flash.
> >
> > You can still mark a block gone bad either by writing your own marker into the
> > block or (better) in a separate BBT. The internal Micron table is hard-wired
> > and only used to shortcut access to factory bad blocks AFAIK.
> 
> Does this also mean that if you for some reason screw up and mark lots of 
> (good) blocks as bad, you can just erase all blocks in the flash; the 
> factory-bad ones will refuse to be erased thanks to the on-chip bbt?

Exactly.
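
A rough sketch of that recovery, in C, run against a simulated chip rather
than real hardware. It assumes the behaviour described in this thread: after
all blocks have been erased, good blocks read back 0xff and factory-bad blocks
still read back 0x00. struct sim_chip, sim_read_marker() and
rescan_after_erase() are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: return the marker byte of a block's first page. */
typedef uint8_t (*read_marker_fn)(size_t block, void *ctx);

/* Simulated chip (assumption): one flag per block, nonzero = factory bad. */
struct sim_chip {
	const uint8_t *factory_bad;
};

/*
 * Simulated read after a full erase: factory-bad blocks return 0x00 from
 * the chip's internal table, all other blocks return the erased value 0xff.
 */
static uint8_t sim_read_marker(size_t block, void *ctx)
{
	const struct sim_chip *chip = ctx;

	return chip->factory_bad[block] ? 0x00 : 0xFF;
}

/*
 * Rebuild a bad block map after erasing every block: any block whose marker
 * does not read back as 0xff is recorded as bad. Returns the number of bad
 * blocks found.
 */
static size_t rescan_after_erase(read_marker_fn read_marker, void *ctx,
				 size_t nblocks, uint8_t *is_bad)
{
	size_t bad = 0;

	for (size_t i = 0; i < nblocks; i++) {
		is_bad[i] = (read_marker(i, ctx) != 0xFF);
		bad += is_bad[i];
	}
	return bad;
}
```

Only the factory-bad blocks end up in the rebuilt map. Blocks that went bad in
the field are not recovered this way, which is one reason to keep them in a
separate BBT.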

Ivan


end of thread, other threads:[~2011-05-26  7:59 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
2011-04-21 15:52 dangerous NAND_BBT_SCANBYTE1AND6 Matthieu CASTET
2011-04-21 17:10 ` Ivan Djelic
2011-04-22  4:50   ` Brian Norris
2011-04-22  8:23   ` Artem Bityutskiy
2011-04-22  8:53     ` Matthieu CASTET
2011-04-22  9:28       ` Artem Bityutskiy
2011-04-21 17:33 ` Brian Norris
2011-04-22  9:02   ` Matthieu CASTET
2011-04-26  7:30     ` Ricard Wanderlof
2011-05-24  1:09       ` Brian Norris
2011-05-25 16:41         ` Ivan Djelic
2011-05-25 18:04           ` Atlant Schmidt
2011-05-25 18:31             ` Ivan Djelic
2011-05-26  7:09               ` Ricard Wanderlof
2011-05-26  7:58                 ` Ivan Djelic
2011-05-26  7:07           ` Ricard Wanderlof
2011-05-26  7:57             ` Ivan Djelic
