All of lore.kernel.org
 help / color / mirror / Atom feed
* [QUESTION] MLC NAND and ECC over OOB area
@ 2012-10-10 17:31 Charles Hardin
  2012-10-10 18:13 ` Ivan Djelic
  0 siblings, 1 reply; 5+ messages in thread
From: Charles Hardin @ 2012-10-10 17:31 UTC (permalink / raw)
  To: linux-mtd; +Cc: Mike Dunn, Jeremy Fitzhardinge

All,

So, in working with the latest linux MTD driver updates to support BCH ECC and other requirements for MLC NAND parts, I noticed something that was a bit odd from the data sheets and a few conversations with some Flash Hardware engineers that bit errors are expected in the OOB area as well.

For instance, on a MT29F16G08CBACA - the error correction is stated in the data sheet to be 24-bit over 1080 bytes (which I thought was a typo and should have been 1024).

This makes sense once you get the stats of the NAND part that is 

Page Size: 4096
OOB: 224
Total: 4096 + 224 = 4320
From the data sheet: 1080 * 4 = 4320

After testing the NAND I did find bit errors in the OOB (the bad block marker pattern in particular) so ECC is required over the "mtd->writesize + mtd->oobsize".

This is not supported in the latest mtd drivers in the 3.6 or 3.5 because of some assumptions in nand_base.c like…

int nand_scan_tail(struct mtd_info *mtd)
{
	… snip snip snip …

        /*
         * Set the number of read / write steps for one page depending on ECC
         * mode.
         */
        chip->ecc.steps = mtd->writesize / chip->ecc.size;
        if (chip->ecc.steps * chip->ecc.size != mtd->writesize) {
                pr_warn("Invalid ECC parameters\n");
                BUG();
        }
        chip->ecc.total = chip->ecc.steps * chip->ecc.bytes;

	… snip snip snip …
}

This assumption that the ecc is only over the "write size" might permeate more of the code, so I really have two questions
- Is anyone already working on handling ECC including the OOB as well?
- Is the expectation that this is going to have to be flagged as specific to MLC or a generic relayout of the ECC when using BCH over the entire data set instead of the write payload alone?

Regards,
Charles Hardin

--
This email is probably not that important

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [QUESTION] MLC NAND and ECC over OOB area
  2012-10-10 17:31 [QUESTION] MLC NAND and ECC over OOB area Charles Hardin
@ 2012-10-10 18:13 ` Ivan Djelic
  2012-10-11 18:22   ` Charles Hardin
  0 siblings, 1 reply; 5+ messages in thread
From: Ivan Djelic @ 2012-10-10 18:13 UTC (permalink / raw)
  To: Charles Hardin; +Cc: Mike Dunn, linux-mtd, Jeremy Fitzhardinge

On Wed, Oct 10, 2012 at 06:31:08PM +0100, Charles Hardin wrote:
> All,
> 
> So, in working with the latest linux MTD driver updates to support BCH ECC and other requirements for MLC NAND parts, I noticed something that was a bit odd from the data sheets and a few conversations with some Flash Hardware engineers that bit errors are expected in the OOB area as well.
> 
> For instance, on a MT29F16G08CBACA - the error correction is stated in the data sheet to be 24-bit over 1080 bytes (which I thought was a typo and should have been 1024).
> 
> This makes sense once you get the stats of the NAND part that is 
> 
> Page Size: 4096
> OOB: 224
> Total: 4096 + 224 = 4320
> From the data sheet: 1080 * 4 = 4320
> 
> After testing the NAND I did find bit errors in the OOB (the bad block marker pattern in particular) so ECC is required over the "mtd->writesize + mtd->oobsize".
> 
> This is not supported in the latest mtd drivers in the 3.6 or 3.5 because of some assumptions in nand_base.c like…
> 
> int nand_scan_tail(struct mtd_info *mtd)
> {
> 	… snip snip snip …
> 
>         /*
>          * Set the number of read / write steps for one page depending on ECC
>          * mode.
>          */
>         chip->ecc.steps = mtd->writesize / chip->ecc.size;
>         if (chip->ecc.steps * chip->ecc.size != mtd->writesize) {
>                 pr_warn("Invalid ECC parameters\n");
>                 BUG();
>         }
>         chip->ecc.total = chip->ecc.steps * chip->ecc.bytes;
> 
> 	… snip snip snip …
> }
> 
> This assumption that the ecc is only over the "write size" might permeate more of the code, so I really have two questions
> - Is anyone already working on handling ECC including the OOB as well?
> - Is the expectation that this is going to have to be flagged as specific to MLC or a generic relayout of the ECC when using BCH over the entire data set instead of the write payload alone?

Hi Charles,

The OOB area is primarily used to store:
- a bad block marker
- ECC bytes

The ECC is generally designed to protect the data area, and the ECC bytes themselves (data+ECC bytes form a long codeword).
The bad block marker normally has two useful values only: 0xff or 0x00, therefore it can be successfully read even with
several bitflips, by looking at its Hamming weight (the number of 1s in the byte).

Now, if you want to store additional stuff (metadata) in the OOB area, _then_ you will need an ECC that covers data + metadata.
This is possible with some drivers; this can also be done with the software BCH library (with a patch).

But the general trend is to avoid using the OOB area for that, because you never know how much space the next NAND generation
will require for ECC, and you'll be in trouble once your metadata does not fit anymore in the remaining space.

BR,
-- 
Ivan

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [QUESTION] MLC NAND and ECC over OOB area
  2012-10-10 18:13 ` Ivan Djelic
@ 2012-10-11 18:22   ` Charles Hardin
  2012-10-11 19:02     ` Charles Hardin
  2012-10-12  7:57     ` Matthieu CASTET
  0 siblings, 2 replies; 5+ messages in thread
From: Charles Hardin @ 2012-10-11 18:22 UTC (permalink / raw)
  To: Ivan Djelic; +Cc: Mike Dunn, linux-mtd, Jeremy Fitzhardinge

On Oct 10, 2012, at 11:13 AM, Ivan Djelic wrote:

> 
> Hi Charles,
> 
> The OOB area is primarily used to store:
> - a bad block marker
> - ECC bytes
> 
> The ECC is generally designed to protect the data area, and the ECC bytes themselves (data+ECC bytes form a long codeword).
> The bad block marker normally has two useful values only: 0xff or 0x00, therefore it can be successfully read even with
> several bitflips, by looking at its Hamming weight (the number of 1s in the byte).
> 
> Now, if you want to store additional stuff (metadata) in the OOB area, _then_ you will need an ECC that covers data + metadata.
> This is possible with some drivers; this can also be done with the software BCH library (with a patch).
> 
> But the general trend is to avoid using the OOB area for that, because you never know how much space the next NAND generation
> will require for ECC, and you'll be in trouble once your metadata does not fit anymore in the remaining space.
> 
> BR,
> -- 
> Ivan


Ok, that seems reasonable - and, following this line of thought - then the expectation would be to change the nand_bbt.c code to check for the "weight of a bad block pattern" instead of looking for an exact match on the pattern.

For example the following would change:

/**
 * check_short_pattern - [GENERIC] check if a pattern is in the buffer
 * @buf: the buffer to search
 * @td: search pattern descriptor
 *
 * Check for a pattern at the given place. Used to search bad block tables and
 * good / bad block identifiers. Same as check_pattern, but no optional empty
 * check.
 */
static int check_short_pattern(uint8_t *buf, struct nand_bbt_descr *td)
{
        int i;
        uint8_t *p = buf;

        /* Compare the pattern */
        for (i = 0; i < td->len; i++) {
                if (p[td->offs + i] != td->pattern[i])
                        return -1;
        }
        return 0;
}

to something like the pseudo code below:

static in check_short_pattern(uint8_t *buf, struct nand_bbt_descr *td)
{
	int i;
	int weight = 0;
	int tweight = 0;
	uint8_t *p = buf;

	/* Compute and compare the weight */
	for (i = 0; i < td->len; i++) {
		uint8_t w;

		w = p[td->offs+i];
		w = (w & 0x55) + ((w >> 1) & 0x55);
		w = (w & 0x33) + ((w >> 2) & 0x33);
		w = (w & 0x0f) + ((w >> 4) & 0x0f);
		weight += w;

		w = td->pattern[i];
		w = (w & 0x55) + ((w >> 1) & 0x55);
		w = (w & 0x33) + ((w >> 2) & 0x33);
		w = (w & 0x0f) + ((w >> 4) & 0x0f);
		tweight += w;
	}

	/* if the weight of the buf is more then half the weight of the pattern */
	if (weight > (tweight / 2))
		return 0;

	return -1;
}

This is a not an overly complex concept for nand_bbt code path, but it would change the exact match done currently to a heuristic match done by weight. My guess is that this behavior change will be done by the driver itself and override the nand_scan_bbt functions. However, I don't see an interface to plugin a specific BBT implementation in the underneath the nand_scan_bbt and nand_update_bbt functions currently in the hand drivers.

Regards,
Charles

--
Bits go in, bytes go out

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [QUESTION] MLC NAND and ECC over OOB area
  2012-10-11 18:22   ` Charles Hardin
@ 2012-10-11 19:02     ` Charles Hardin
  2012-10-12  7:57     ` Matthieu CASTET
  1 sibling, 0 replies; 5+ messages in thread
From: Charles Hardin @ 2012-10-11 19:02 UTC (permalink / raw)
  To: Ivan Djelic; +Cc: Mike Dunn, linux-mtd, Jeremy Fitzhardinge


On Oct 11, 2012, at 11:22 AM, Charles Hardin wrote:

> Ok, that seems reasonable - and, following this line of thought - then the expectation would be to change the nand_bbt.c code to check for the "weight of a bad block pattern" instead of looking for an exact match on the pattern.
> 
> For example the following would change:
> 
> /**
> * check_short_pattern - [GENERIC] check if a pattern is in the buffer
> * @buf: the buffer to search
> * @td: search pattern descriptor
> *
> * Check for a pattern at the given place. Used to search bad block tables and
> * good / bad block identifiers. Same as check_pattern, but no optional empty
> * check.
> */
> static int check_short_pattern(uint8_t *buf, struct nand_bbt_descr *td)
> {
>        int i;
>        uint8_t *p = buf;
> 
>        /* Compare the pattern */
>        for (i = 0; i < td->len; i++) {
>                if (p[td->offs + i] != td->pattern[i])
>                        return -1;
>        }
>        return 0;
> }
> 
> to something like the pseudo code below:
> 
> static in check_short_pattern(uint8_t *buf, struct nand_bbt_descr *td)
> {
> 	int i;
> 	int weight = 0;
> 	int tweight = 0;
> 	uint8_t *p = buf;
> 
> 	/* Compute and compare the weight */
> 	for (i = 0; i < td->len; i++) {
> 		uint8_t w;
> 
> 		w = p[td->offs+i];
> 		w = (w & 0x55) + ((w >> 1) & 0x55);
> 		w = (w & 0x33) + ((w >> 2) & 0x33);
> 		w = (w & 0x0f) + ((w >> 4) & 0x0f);
> 		weight += w;
> 
> 		w = td->pattern[i];
> 		w = (w & 0x55) + ((w >> 1) & 0x55);
> 		w = (w & 0x33) + ((w >> 2) & 0x33);
> 		w = (w & 0x0f) + ((w >> 4) & 0x0f);
> 		tweight += w;
> 	}
> 
> 	/* if the weight of the buf is more then half the weight of the pattern */
> 	if (weight > (tweight / 2))
> 		return 0;
> 
> 	return -1;
> }
> 
> This is a not an overly complex concept for nand_bbt code path, but it would change the exact match done currently to a heuristic match done by weight. My guess is that this behavior change will be done by the driver itself and override the nand_scan_bbt functions. However, I don't see an interface to plugin a specific BBT implementation in the underneath the nand_scan_bbt and nand_update_bbt functions currently in the hand drivers.
> 
> Regards,
> Charles


Oops, wrong call path - I just looked into nand_base.c and on the nand_block_bad function, I was looking at the generation of the in core memory bbt - I'll poke around a bit more now on the proper code path to construct a better world view.

Regards,
Charles

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [QUESTION] MLC NAND and ECC over OOB area
  2012-10-11 18:22   ` Charles Hardin
  2012-10-11 19:02     ` Charles Hardin
@ 2012-10-12  7:57     ` Matthieu CASTET
  1 sibling, 0 replies; 5+ messages in thread
From: Matthieu CASTET @ 2012-10-12  7:57 UTC (permalink / raw)
  To: Charles Hardin; +Cc: linux-mtd, Ivan Djelic, Mike Dunn, Jeremy Fitzhardinge

Charles Hardin a écrit :
> On Oct 10, 2012, at 11:13 AM, Ivan Djelic wrote:
> 
>> Hi Charles,
>>
>> The OOB area is primarily used to store:
>> - a bad block marker
>> - ECC bytes
>>
>> The ECC is generally designed to protect the data area, and the ECC bytes themselves (data+ECC bytes form a long codeword).
>> The bad block marker normally has two useful values only: 0xff or 0x00, therefore it can be successfully read even with
>> several bitflips, by looking at its Hamming weight (the number of 1s in the byte).
>>
>> Now, if you want to store additional stuff (metadata) in the OOB area, _then_ you will need an ECC that covers data + metadata.
>> This is possible with some drivers; this can also be done with the software BCH library (with a patch).
>>
>> But the general trend is to avoid using the OOB area for that, because you never know how much space the next NAND generation
>> will require for ECC, and you'll be in trouble once your metadata does not fit anymore in the remaining space.
>>
>> BR,
>> -- 
>> Ivan
> 
> 
> Ok, that seems reasonable - and, following this line of thought - then the expectation would be to change the nand_bbt.c code to check for the "weight of a bad block pattern" instead of looking for an exact match on the pattern.
> 
They were some work on this :
http://thread.gmane.org/gmane.linux.drivers.mtd/42243follow d

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2012-10-12  7:57 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-10-10 17:31 [QUESTION] MLC NAND and ECC over OOB area Charles Hardin
2012-10-10 18:13 ` Ivan Djelic
2012-10-11 18:22   ` Charles Hardin
2012-10-11 19:02     ` Charles Hardin
2012-10-12  7:57     ` Matthieu CASTET

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.