All of lore.kernel.org
 help / color / mirror / Atom feed
* On-die ECC support
@ 2017-01-09 21:37 Peter Rosin
  2017-01-09 21:46 ` Richard Weinberger
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Rosin @ 2017-01-09 21:37 UTC (permalink / raw)
  To: Richard Weinberger, linux-mtd

Hi!

AFAICT, on-die ECC support [1] (continues in [2]) was never upstreamed
and the patches no longer applies. I wonder if someone has done the
work to forward port them to recent kernels?

Cheers,
peda

[1] [RFC] On-die ECC support
    http://lists.infradead.org/pipermail/linux-mtd/2015-March/058458.html

    [PATCH 1/3] mtd: nand: Add on-die ECC support
    http://lists.infradead.org/pipermail/linux-mtd/2015-March/058459.html

    [PATCH 2/3] mtd: nand: Add support for raw access when using on-die ECC
    http://lists.infradead.org/pipermail/linux-mtd/2015-March/058457.html

    [PATCH 3/3] mtd: nand: Wire up on-die ECC support
    http://lists.infradead.org/pipermail/linux-mtd/2015-March/058456.html

[2] http://lists.infradead.org/pipermail/linux-mtd/2015-April/058846.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: On-die ECC support
  2017-01-09 21:37 On-die ECC support Peter Rosin
@ 2017-01-09 21:46 ` Richard Weinberger
  2017-01-10 14:54   ` Peter Rosin
  0 siblings, 1 reply; 12+ messages in thread
From: Richard Weinberger @ 2017-01-09 21:46 UTC (permalink / raw)
  To: Peter Rosin, linux-mtd

Peter,

Am 09.01.2017 um 22:37 schrieb Peter Rosin:
> Hi!
> 
> AFAICT, on-die ECC support [1] (continues in [2]) was never upstreamed
> and the patches no longer applies. I wonder if someone has done the
> work to forward port them to recent kernels?

Nope.

Supporting On-die ECC in a proper way is with the current MTD NAND code
not possible.
I stopped working on it since I have no longer access to working
On-die ECC hardware and On-die ECC is a dead horse.
It is slow and no longer available for recent NANDs.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: On-die ECC support
  2017-01-09 21:46 ` Richard Weinberger
@ 2017-01-10 14:54   ` Peter Rosin
  2017-01-10 15:30     ` Richard Weinberger
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Rosin @ 2017-01-10 14:54 UTC (permalink / raw)
  To: Richard Weinberger, linux-mtd

On 2017-01-09 22:46, Richard Weinberger wrote:
> Peter,
> 
> Am 09.01.2017 um 22:37 schrieb Peter Rosin:
>> Hi!
>>
>> AFAICT, on-die ECC support [1] (continues in [2]) was never upstreamed
>> and the patches no longer applies. I wonder if someone has done the
>> work to forward port them to recent kernels?
> 
> Nope.
> 
> Supporting On-die ECC in a proper way is with the current MTD NAND code
> not possible.

Hmm, has something fundamental changed since the patches were proposed
that makes it impossible? Or is it just the same as it was a couple of
years ago, but different mechanics (changes to layout handling etc)?

> I stopped working on it since I have no longer access to working
> On-die ECC hardware and On-die ECC is a dead horse.

Maybe so, but here I am with a desire to use a new kernel with old
hardware...

> It is slow and no longer available for recent NANDs.

Cheers,
peda

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: On-die ECC support
  2017-01-10 14:54   ` Peter Rosin
@ 2017-01-10 15:30     ` Richard Weinberger
  2017-01-10 15:43       ` Boris Brezillon
  0 siblings, 1 reply; 12+ messages in thread
From: Richard Weinberger @ 2017-01-10 15:30 UTC (permalink / raw)
  To: Peter Rosin; +Cc: linux-mtd, Boris Brezillon

Peter,

Am 10.01.2017 um 15:54 schrieb Peter Rosin:
> On 2017-01-09 22:46, Richard Weinberger wrote:
>> Peter,
>>
>> Am 09.01.2017 um 22:37 schrieb Peter Rosin:
>>> Hi!
>>>
>>> AFAICT, on-die ECC support [1] (continues in [2]) was never upstreamed
>>> and the patches no longer applies. I wonder if someone has done the
>>> work to forward port them to recent kernels?
>>
>> Nope.
>>
>> Supporting On-die ECC in a proper way is with the current MTD NAND code
>> not possible.
> 
> Hmm, has something fundamental changed since the patches were proposed
> that makes it impossible? Or is it just the same as it was a couple of
> years ago, but different mechanics (changes to layout handling etc)?

No, it turned out to be very complicated to support On-Die for all
possible NAND controllers.
Drivers can use existing helper functions or provide their own functions.
But for On-Die ECC you'd have to hook all of them.
That's why most On-Die ECC patches touch also a lot of driver code instead
only the NAND framework.

Of course it is doable and not rocket science this but it needs more work.
AFAIK also Boris thought about that while re-factoring the code.

>> I stopped working on it since I have no longer access to working
>> On-die ECC hardware and On-die ECC is a dead horse.
> 
> Maybe so, but here I am with a desire to use a new kernel with old
> hardware...

If you want to continue the work on the patches, you're very welcome.
Please see also the patches on BENAND support.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: On-die ECC support
  2017-01-10 15:30     ` Richard Weinberger
@ 2017-01-10 15:43       ` Boris Brezillon
  0 siblings, 0 replies; 12+ messages in thread
From: Boris Brezillon @ 2017-01-10 15:43 UTC (permalink / raw)
  To: Peter Rosin; +Cc: Richard Weinberger, linux-mtd

Hi Peter,

On Tue, 10 Jan 2017 16:30:01 +0100
Richard Weinberger <richard@nod.at> wrote:

> Peter,
> 
> Am 10.01.2017 um 15:54 schrieb Peter Rosin:
> > On 2017-01-09 22:46, Richard Weinberger wrote:  
> >> Peter,
> >>
> >> Am 09.01.2017 um 22:37 schrieb Peter Rosin:  
> >>> Hi!
> >>>
> >>> AFAICT, on-die ECC support [1] (continues in [2]) was never upstreamed
> >>> and the patches no longer applies. I wonder if someone has done the
> >>> work to forward port them to recent kernels?  
> >>
> >> Nope.
> >>
> >> Supporting On-die ECC in a proper way is with the current MTD NAND code
> >> not possible.  
> > 
> > Hmm, has something fundamental changed since the patches were proposed
> > that makes it impossible? Or is it just the same as it was a couple of
> > years ago, but different mechanics (changes to layout handling etc)?  
> 
> No, it turned out to be very complicated to support On-Die for all
> possible NAND controllers.
> Drivers can use existing helper functions or provide their own functions.
> But for On-Die ECC you'd have to hook all of them.
> That's why most On-Die ECC patches touch also a lot of driver code instead
> only the NAND framework.
> 
> Of course it is doable and not rocket science this but it needs more work.
> AFAIK also Boris thought about that while re-factoring the code.

Well, we have 2 problems here:
1/ On-die ECC is usually vendor specific and each vendor implements it
   with its own set of private cmd sequence. We thus need to provide
   generic helpers and ask vendor drivers to implement them correctly.
   Even if this is not enough, this series [1] should help address this
   problem.
2/ NAND controller drivers are free to support only a minimum set of
   cmds (basically, read/program and erase), and the core has
   currently no way to know what's supported by a NAND controller.
   That's IMO the biggest problem here. We need to improve the NAND
   framework to let NAND controller expose what they really support so
   that a decision can be taken at the core level regarding on-die ECC.
   We also need to dissociate the ECC/NAND controller concepts, which
   are usually tightly linked together in NAND controller drivers
   (with no way to disable the ECC engined embedded in the controller).

> 
> >> I stopped working on it since I have no longer access to working
> >> On-die ECC hardware and On-die ECC is a dead horse.  
> > 
> > Maybe so, but here I am with a desire to use a new kernel with old
> > hardware...  
> 
> If you want to continue the work on the patches, you're very welcome.
> Please see also the patches on BENAND support.

Yep, that's really the kind of feature that can help us improve the
NAND framework, so, as Richard said, I'd be happy to help anyone
interested in working on this feature.

Thanks,

Boris

[1]https://lkml.org/lkml/2017/1/9/270

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: on-die ECC support
  2013-03-22 21:14   ` David Mosberger-Tang
@ 2013-03-23 19:27     ` Peter Horton
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Horton @ 2013-03-23 19:27 UTC (permalink / raw)
  To: David Mosberger-Tang; +Cc: linux-mtd

On 22/03/2013 21:14, David Mosberger-Tang wrote:
>
> I ended up going with a brutally simple approach: read the page and if
> it gets the REWRITE status bit, read it again with on-die ECC turned
> off, then do a bitdiff of the data read (and the ECC bytes read) to
> count the number of bitflips.  While perhaps brutish, this approach
> has the advantage of only relying on documented features.  As long as
> the layout described by ecc->layout is accurate, this should work
> reliably.  Thus, Micron can screw around with the ECC encoding as they
> wish, as long as the layout doesn't change, we're fine.
>

I prefer this solution to ours, much simpler and probably faster too.

> Note that the patch below doesn't check for bitflips in the OOB areas
> that Micron calls "User metadata I".  We don't care about those bits
> since our file system doesn't use OOB at all.
>
> Do you have any recommendation on a good value for bitflip_threshold?
> 1 is obviously too small and 4 feels a bit like playing with fire, no?
>

IIRC we reprted only when there were three or more bit errors in a page. 
This would allow two bits with fixed states per page.

P.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: on-die ECC support
  2013-03-22  8:21 ` on-die " Peter Horton
@ 2013-03-22 21:14   ` David Mosberger-Tang
  2013-03-23 19:27     ` Peter Horton
  0 siblings, 1 reply; 12+ messages in thread
From: David Mosberger-Tang @ 2013-03-22 21:14 UTC (permalink / raw)
  To: Peter Horton; +Cc: linux-mtd

[-- Attachment #1: Type: text/plain, Size: 2173 bytes --]

Wow, how did Ivan figure the ECC bit layouts?  I stared at it for a
while last night and then said: no way!

I ended up going with a brutally simple approach: read the page and if
it gets the REWRITE status bit, read it again with on-die ECC turned
off, then do a bitdiff of the data read (and the ECC bytes read) to
count the number of bitflips.  While perhaps brutish, this approach
has the advantage of only relying on documented features.  As long as
the layout described by ecc->layout is accurate, this should work
reliably.  Thus, Micron can screw around with the ECC encoding as they
wish, as long as the layout doesn't change, we're fine.

Note that the patch below doesn't check for bitflips in the OOB areas
that Micron calls "User metadata I".  We don't care about those bits
since our file system doesn't use OOB at all.

Do you have any recommendation on a good value for bitflip_threshold?
1 is obviously too small and 4 feels a bit like playing with fire, no?

  --david

On Fri, Mar 22, 2013 at 2:21 AM, Peter Horton <phorton@bitbox.co.uk> wrote:
> On 21/03/2013 20:00, David Mosberger-Tang wrote:
>>
>>
>> I missed your email until now.  You said:
>>
>>   > Be careful using the NAND status register to indicate bit flips. The
>>   > chips we were using set the status bit for any correction and this can
>>   > result in very high error counts. Some of the pages had bits which are
>>   > permanently 0 or 1 meaning they might need correction every read. We
>>   > worked around this by re-reading any page that the chip flagged as
>>   > corrected and using the software BCH to work out the actual number of
>>   > bits in error.
>>
>> Ouch.  I think you are right: it looks to me that the chip sets the
>> REWRITE_RECOMMENDED bit
>> for any bit flips.  Is the Linux BCH code compatible with the Micron code?
>>
>> Yes, using software BCH to figure out actual number of bitflips would
>> be the right approach then.
>> Is there a patch for that somewhere?
>
>
> Not a patch but the driver file is here :-
>
> http://files.bitbox.co.uk/public/nand-on-die-ecc/10CUL494_nand_2.6.37.c
>
> Credit for the Micron ECC layout goes to Ivan Djelic IIRC.
>
> P.
>

[-- Attachment #2: bitflip-detect.diff.txt --]
[-- Type: text/plain, Size: 6363 bytes --]

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index fa85dd5..43d8944 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -1167,7 +1167,7 @@ static int nand_read_page_swecc(struct mtd_info *mtd, struct nand_chip *chip,
  * @bufpoi: buffer to store read data
  */
 static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip,
-			uint32_t data_offs, uint32_t readlen, uint8_t *bufpoi)
+			uint32_t data_offs, uint32_t readlen, uint8_t *bufpoi, int page)
 {
 	int start_step, end_step, num_steps;
 	uint32_t *eccpos = chip->ecc.layout->eccpos;
@@ -1265,9 +1265,74 @@ set_on_die_ecc (struct mtd_info *mtd, struct nand_chip *chip, int on)
 	return (*chip->onfi_set_features)(mtd, chip, 0x90, data);
 }
 
-static int check_read_status_on_die (struct mtd_info *mtd, struct nand_chip *chip)
+/*
+ * Return the number of bits that differ between buffers SRC1 and
+ * SRC2, both of which are LEN bytes long.
+ *
+ * This code could be optimized for, but it only gets called on pages
+ * with bitflips and compared to the cost of migrating an eraseblock,
+ * the execution time here is trivial...
+ */
+static int
+bitdiff (const void *s1, const void *s2, size_t len)
 {
-	unsigned int max_bitflips = 0;
+	const uint8_t *src1 = s1, *src2 = s2;
+	int count = 0, i;
+
+	for (i = 0; i < len; ++i)
+		count += hweight8 (*src1++ ^ *src2++);
+	return count;
+}
+
+static int
+check_for_bitflips (struct mtd_info *mtd, struct nand_chip *chip, int page)
+{
+	int flips = 0, max_bitflips = 0, i, j, read_size;
+	uint8_t *chkbuf, *rawbuf, *chkoob, *rawoob;
+	uint32_t *eccpos;
+
+	chkbuf = chip->buffers->chkbuf;
+	rawbuf = chip->buffers->rawbuf;
+	read_size = mtd->writesize + mtd->oobsize;
+
+	/* Read entire page w/OOB area with on-die ECC on: */
+	chip->read_buf(mtd, chkbuf, read_size);
+
+	/* Re-read page with on-die ECC off: */
+	set_on_die_ecc(mtd, chip, 0);
+	{
+		chip->cmdfunc(mtd, NAND_CMD_READ0, 0x00, page);
+		chip->read_buf(mtd, rawbuf, read_size);
+	}
+	set_on_die_ecc(mtd, chip, 1);
+
+	chkoob = chkbuf + mtd->writesize;
+	rawoob = rawbuf + mtd->writesize;
+	eccpos = chip->ecc.layout->eccpos;
+	for (i = 0; i < chip->ecc.steps; ++i) {
+		/* Count bit flips in the actual data area: */
+		flips = bitdiff (chkbuf, rawbuf, chip->ecc.size);
+		/* Count bit flips in the ECC bytes: */
+		for (j = 0; j < chip->ecc.bytes; ++j) {
+			flips += hweight8(chkoob[*eccpos] ^ rawoob[*eccpos]);
+			++eccpos;
+		}
+		if (flips > 0)
+			mtd->ecc_stats.corrected += flips;
+		max_bitflips = max_t(int, max_bitflips, flips);
+		chkbuf += chip->ecc.size;
+		rawbuf += chip->ecc.size;
+	}
+
+	/* Re-issue the READ command for the actual data read that follows.  */
+	chip->cmdfunc(mtd, NAND_CMD_READ0, 0x00, page);
+
+	return max_bitflips;
+}
+
+static int check_read_status_on_die (struct mtd_info *mtd, struct nand_chip *chip, int page)
+{
+	int max_bitflips = 0;
 	uint8_t status;
 
 	/* Check ECC status of page just transferred into NAND's page buffer: */
@@ -1285,12 +1350,15 @@ static int check_read_status_on_die (struct mtd_info *mtd, struct nand_chip *chi
 		mtd->ecc_stats.failed++;
 	else if (status & NAND_STATUS_REWRITE) {
 		/*
-		 * Bit flip(s); we don't know how many, but since a rewrite is recommended
-		 * we should assume ecc.strength bitflips so that the higher-level software
-		 * is going to do something about it.
+		 * The Micron chips turn on the REWRITE status bit for
+		 * ANY bit flips.  Some pages have stuck bits, so we
+		 * don't want to migrate a block just because of
+		 * single bit errors because otherwise, that block
+		 * would effectively become unusable.  So, work out in
+		 * software what the max number of flipped bits is for
+		 * all subpages in a page:
 		 */
-		mtd->ecc_stats.corrected += chip->ecc.strength;
-		max_bitflips = chip->ecc.strength;
+		max_bitflips = check_for_bitflips (mtd, chip, page);
 	}
 	return max_bitflips;
 }
@@ -1304,7 +1372,7 @@ static int check_read_status_on_die (struct mtd_info *mtd, struct nand_chip *chi
  * @bufpoi: buffer to store read data
  */
 static int nand_read_subpage_on_die(struct mtd_info *mtd, struct nand_chip *chip,
-			uint32_t data_offs, uint32_t readlen, uint8_t *bufpoi)
+			uint32_t data_offs, uint32_t readlen, uint8_t *bufpoi, int page)
 {
 	int start_step, end_step, num_steps, ret;
 	int data_col_addr;
@@ -1324,7 +1392,7 @@ static int nand_read_subpage_on_die(struct mtd_info *mtd, struct nand_chip *chip
 
 	failed = mtd->ecc_stats.failed;
 
-	ret = check_read_status_on_die (mtd, chip);
+	ret = check_read_status_on_die (mtd, chip, page);
 	if (ret < 0 || mtd->ecc_stats.failed != failed) {
 		memset (p, 0, datafrag_len);
 		return ret;
@@ -1355,7 +1423,7 @@ static int nand_read_page_on_die(struct mtd_info *mtd, struct nand_chip *chip,
 
 	failed = mtd->ecc_stats.failed;
 
-	ret = check_read_status_on_die (mtd, chip);
+	ret = check_read_status_on_die (mtd, chip, page);
 	if (ret < 0 || mtd->ecc_stats.failed != failed) {
 		memset (buf, 0, mtd->writesize);
 		if (oob_required)
@@ -1639,7 +1707,7 @@ static int nand_do_read_ops(struct mtd_info *mtd, loff_t from,
 			} else if (!aligned && NAND_HAS_SUBPAGE_READ(chip) &&
 				 !oob)
 				ret = chip->ecc.read_subpage(mtd, chip,
-							col, bytes, bufpoi);
+							col, bytes, bufpoi, page);
 			else
 				ret = chip->ecc.read_page(mtd, chip, bufpoi,
 							  oob_required, page);
diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h
index 53e99c7..f086dd3 100644
--- a/include/linux/mtd/nand.h
+++ b/include/linux/mtd/nand.h
@@ -379,7 +379,7 @@ struct nand_ecc_ctrl {
 	int (*read_page)(struct mtd_info *mtd, struct nand_chip *chip,
 			uint8_t *buf, int oob_required, int page);
 	int (*read_subpage)(struct mtd_info *mtd, struct nand_chip *chip,
-			uint32_t offs, uint32_t len, uint8_t *buf);
+			uint32_t offs, uint32_t len, uint8_t *buf, int page);
 	int (*write_page)(struct mtd_info *mtd, struct nand_chip *chip,
 			const uint8_t *buf, int oob_required);
 	int (*write_oob_raw)(struct mtd_info *mtd, struct nand_chip *chip,
@@ -404,6 +404,8 @@ struct nand_buffers {
 	uint8_t	ecccalc[NAND_MAX_OOBSIZE];
 	uint8_t	ecccode[NAND_MAX_OOBSIZE];
 	uint8_t databuf[NAND_MAX_PAGESIZE + NAND_MAX_OOBSIZE];
+	uint8_t chkbuf[NAND_MAX_PAGESIZE + NAND_MAX_OOBSIZE];
+	uint8_t rawbuf[NAND_MAX_PAGESIZE + NAND_MAX_OOBSIZE];
 };
 
 /**

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: on-die ECC support
       [not found] <CACwUX0M-04vCTkA8WwNR9si=7N-xuYOTy6aWO7ty8xa1+S3rUw@mail.gmail.com>
@ 2013-03-22  8:21 ` Peter Horton
  2013-03-22 21:14   ` David Mosberger-Tang
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Horton @ 2013-03-22  8:21 UTC (permalink / raw)
  To: David Mosberger-Tang; +Cc: linux-mtd

On 21/03/2013 20:00, David Mosberger-Tang wrote:
>
> I missed your email until now.  You said:
>
>   > Be careful using the NAND status register to indicate bit flips. The
>   > chips we were using set the status bit for any correction and this can
>   > result in very high error counts. Some of the pages had bits which are
>   > permanently 0 or 1 meaning they might need correction every read. We
>   > worked around this by re-reading any page that the chip flagged as
>   > corrected and using the software BCH to work out the actual number of
>   > bits in error.
>
> Ouch.  I think you are right: it looks to me that the chip sets the
> REWRITE_RECOMMENDED bit
> for any bit flips.  Is the Linux BCH code compatible with the Micron code?
>
> Yes, using software BCH to figure out actual number of bitflips would
> be the right approach then.
> Is there a patch for that somewhere?

Not a patch but the driver file is here :-

http://files.bitbox.co.uk/public/nand-on-die-ecc/10CUL494_nand_2.6.37.c

Credit for the Micron ECC layout goes to Ivan Djelic IIRC.

P.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: on-die ECC support
  2013-03-15 16:08 David Mosberger-Tang
  2013-03-15 16:17 ` David Mosberger-Tang
  2013-03-15 17:58 ` Peter Horton
@ 2013-03-20 17:46 ` David Mosberger-Tang
  2 siblings, 0 replies; 12+ messages in thread
From: David Mosberger-Tang @ 2013-03-20 17:46 UTC (permalink / raw)
  To: linux-mtd

[-- Attachment #1: Type: text/plain, Size: 2126 bytes --]

Here is an updated patch that seems to work well so far.  Some comments:

- nand_oob_64_on_die is probably correct for Micron MT29F4G part only.  Perhaps
 there is a way to auto-detect this based on ONFI parameters?
- the handling of the raw ops is a bit ugly, since we have to
temporarily turn off
 the on-die ECC (and then turn it back on); this can't be done in the
ecc callbacks
 because those usually get called after the main command (e.g., READ) has been
 issued already

The patch is relative to kernel 3.7.0.  I'm providing it mostly in
case someone else needs the
functionality.  Feedback is always welcome, of course.  Hopefully at
least some of this can eventually
make it into main-line.

At the very least I'd like to see a warning added to the kernel which
kicks in when someone is
trying to run an ECC algorithm that is too weak (e.g., NAND_ECC_SOFT
when 4-bit ECC is
required).  Looks like that could be done easily based on ONFI parameters.

  --david

On Fri, Mar 15, 2013 at 10:08 AM, David Mosberger-Tang
<dmosberger@gmail.com> wrote:
> We need to be able to support 4-bit-correcting ECC with a
> micro-controller that doesn't have hardware-support for that.  We are
> planning to use the on-die ECC controller supported by Micron flash
> chips.  I suppose we could use the BCH swecc support in the Linux
> kernel, but I'm concerned about the performance implication of that
> and we'd also have to add BCH ecc to our bootloader, which would mean
> more development and testing.
>
> For backwards-compatibility, we need to be able to support a
> subpage-size of 512 bytes.  From the Micron data sheets, it's not
> clear to me whether the on-die controller really supports reading 512
> byte subpages but the attached patch certainly *seems* to work fine so
> far.
>
> I'd love to get some feedback as to whether I'm on the right track
> here or whether I should pursue a different path.  The patch does the
> following:
>
>  - Add NAND_ECC_HW_ON_DIE ecc mode
>  - Add nand_read_subpage_raw() to read a subpage without worrying about ECC
>  - Hack atmel_nand.c to use on-die ECC.
>
> Thanks,
>
>   --david

[-- Attachment #2: nand-on-die-ecc.diff.txt --]
[-- Type: text/plain, Size: 17608 bytes --]

commit afee2a1b319943d449f09e087369537469870b67
Author: David Mosberger <davidm@egauge.net>
Date:   Wed Mar 20 11:19:59 2013 -0600

    [MTD] Add support for NAND flash with on-die ECC controllers.
    
    * drivers/mtd/nand/nand_base.c (nand_oob_64_on_die): New struct for
    		ECC layout of Micron MT29F4G16ABADAWP.
    	(set_column): Properly handle READID, GET_FEATURES, and SET_FEATURES
    		commands.  The column is not a word address for those commands
    		and always single byte.
    	(nand_command): Use set_column() to set column in lieu of open code.
    	(nand_command_lp): Likewise.
    	(set_on_die_ecc): New function.
    	(check_read_status_on_die): Likewise.
    	(nand_read_subpage_on_die): Likewise.
    	(nand_read_page_on_die): Likewise.
    	(nand_do_read_ops): Temporarily disable on-die ECC for raw operations.
    	(nand_do_read_oob): Likewise.
    	(nand_write_page): Likewise.
    	(nand_do_write_oob): Likewise.
    	(nand_onfi_set_features): Make it work on 16-bit devices: parameters
    		need to be read in byte-mode even on 16-bit devices.
    	(nand_onfi_get_features): Likewise.
    	(nand_flash_detect_onfi): Likewise.
    	(nand_get_flash_type): Always try to detect ONFI, even if chip
    		is already known.  Otherwise, chip->onfi_get_features and
    		chip->onfi_set_features won't work on chips that do support
    		ONFI.
    	(nand_scan_tail): Check if chip has on-die ECC enabled (either by
    		hardware or by bootloader) and, if so, force on-die ECC mode.
    		Handle NAND_ECC_HW_ON_DIE.
    		Enable sub-page reads for on-die ECC as well.
    * drivers/of/of_mtd.c (nand_ecc_modes): Add NAND_ECC_HW_ON_DIE/hw_on_die.
    * include/linux/mtd/nand.h (NAND_STATUS_REWRITE): New macro.
    	(nand_ecc_modes_t): Add NAND_ECC_HW_ON_DIE.

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index 1a03b7f..fa85dd5 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -79,6 +79,20 @@ static struct nand_ecclayout nand_oob_64 = {
 		 .length = 38} }
 };
 
+static struct nand_ecclayout nand_oob_64_on_die = {
+	.eccbytes = 32,
+	.eccpos = {
+		    8,  9, 10, 11, 12, 13, 14, 15,
+		   24, 25, 26, 27, 28, 29, 30, 31,
+		   40, 41, 42, 43, 44, 45, 46, 47,
+		   56, 57, 58, 59, 60, 61, 62, 63},
+	.oobfree = {
+		{.offset =  4,	 .length = 4},
+		{.offset = 20,	 .length = 4},
+		{.offset = 36,	 .length = 4},
+		{.offset = 52,	 .length = 4}}
+};
+
 static struct nand_ecclayout nand_oob_128 = {
 	.eccbytes = 48,
 	.eccpos = {
@@ -509,6 +523,32 @@ void nand_wait_ready(struct mtd_info *mtd)
 }
 EXPORT_SYMBOL_GPL(nand_wait_ready);
 
+static int
+set_column (struct mtd_info *mtd, struct nand_chip *chip,
+	    unsigned int command, int column,
+	    unsigned int column_width, int ctrl)
+{
+	switch (command) {
+	case NAND_CMD_READID:
+	case NAND_CMD_GET_FEATURES:
+	case NAND_CMD_SET_FEATURES:
+		chip->cmd_ctrl(mtd, column, ctrl);
+		ctrl &= ~NAND_CTRL_CHANGE;
+		break;
+
+	default:
+		/* Adjust columns for 16 bit buswidth */
+		if (chip->options & NAND_BUSWIDTH_16)
+			column >>= 1;
+		chip->cmd_ctrl(mtd, column, ctrl);
+		ctrl &= ~NAND_CTRL_CHANGE;
+		if (column_width > 8)
+			chip->cmd_ctrl(mtd, column >> 8, ctrl);
+		break;
+	}
+	return ctrl;
+}
+
 /**
  * nand_command - [DEFAULT] Send command to NAND device
  * @mtd: MTD device structure
@@ -548,13 +588,8 @@ static void nand_command(struct mtd_info *mtd, unsigned int command,
 	/* Address cycle, when necessary */
 	ctrl = NAND_CTRL_ALE | NAND_CTRL_CHANGE;
 	/* Serially input address */
-	if (column != -1) {
-		/* Adjust columns for 16 bit buswidth */
-		if (chip->options & NAND_BUSWIDTH_16)
-			column >>= 1;
-		chip->cmd_ctrl(mtd, column, ctrl);
-		ctrl &= ~NAND_CTRL_CHANGE;
-	}
+	if (column != -1)
+		ctrl = set_column(mtd, chip, command, column, 8, ctrl);
 	if (page_addr != -1) {
 		chip->cmd_ctrl(mtd, page_addr, ctrl);
 		ctrl &= ~NAND_CTRL_CHANGE;
@@ -640,14 +675,9 @@ static void nand_command_lp(struct mtd_info *mtd, unsigned int command,
 		int ctrl = NAND_CTRL_CHANGE | NAND_NCE | NAND_ALE;
 
 		/* Serially input address */
-		if (column != -1) {
-			/* Adjust columns for 16 bit buswidth */
-			if (chip->options & NAND_BUSWIDTH_16)
-				column >>= 1;
-			chip->cmd_ctrl(mtd, column, ctrl);
-			ctrl &= ~NAND_CTRL_CHANGE;
-			chip->cmd_ctrl(mtd, column >> 8, ctrl);
-		}
+		if (column != -1)
+			ctrl = set_column (mtd, chip, command, column, 16,
+					   ctrl);
 		if (page_addr != -1) {
 			chip->cmd_ctrl(mtd, page_addr, ctrl);
 			chip->cmd_ctrl(mtd, page_addr >> 8,
@@ -1221,6 +1251,124 @@ static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip,
 	return max_bitflips;
 }
 
+static int
+set_on_die_ecc (struct mtd_info *mtd, struct nand_chip *chip, int on)
+{
+	u8 data[ONFI_SUBFEATURE_PARAM_LEN] = { 0, };
+
+	if (chip->ecc.mode != NAND_ECC_HW_ON_DIE)
+		return 0;
+
+	if (on)
+		data[0] = 0x8;
+
+	return (*chip->onfi_set_features)(mtd, chip, 0x90, data);
+}
+
+static int check_read_status_on_die (struct mtd_info *mtd, struct nand_chip *chip)
+{
+	unsigned int max_bitflips = 0;
+	uint8_t status;
+
+	/* Check ECC status of page just transferred into NAND's page buffer: */
+	chip->cmdfunc(mtd, NAND_CMD_STATUS, -1, -1);
+	status = chip->read_byte(mtd);
+
+	/* Switch back to data reading: */
+	chip->cmd_ctrl(mtd, NAND_CMD_READ0,
+		       NAND_NCE | NAND_CLE | NAND_CTRL_CHANGE);
+	chip->cmd_ctrl(mtd, NAND_CMD_NONE,
+		       NAND_NCE | NAND_CTRL_CHANGE);
+
+	if (status & NAND_STATUS_FAIL)
+		/* Page has invalid ECC. */
+		mtd->ecc_stats.failed++;
+	else if (status & NAND_STATUS_REWRITE) {
+		/*
+		 * Bit flip(s); we don't know how many, but since a rewrite is recommended
+		 * we should assume ecc.strength bitflips so that the higher-level software
+		 * is going to do something about it.
+		 */
+		mtd->ecc_stats.corrected += chip->ecc.strength;
+		max_bitflips = chip->ecc.strength;
+	}
+	return max_bitflips;
+}
+
+/**
+ * nand_read_subpage_on_die - [REPLACEABLE] raw sub-page read function
+ * @mtd: mtd info structure
+ * @chip: nand chip info structure
+ * @data_offs: offset of requested data within the page
+ * @readlen: data length
+ * @bufpoi: buffer to store read data
+ */
+static int nand_read_subpage_on_die(struct mtd_info *mtd, struct nand_chip *chip,
+			uint32_t data_offs, uint32_t readlen, uint8_t *bufpoi)
+{
+	int start_step, end_step, num_steps, ret;
+	int data_col_addr;
+	int datafrag_len;
+	uint32_t failed;
+	uint8_t *p;
+
+	/* Column address within the page aligned to ECC size */
+	start_step = data_offs / chip->ecc.size;
+	end_step = (data_offs + readlen - 1) / chip->ecc.size;
+	num_steps = end_step - start_step + 1;
+
+	/* Data size aligned to ECC ecc.size */
+	datafrag_len = num_steps * chip->ecc.size;
+	data_col_addr = start_step * chip->ecc.size;
+	p = bufpoi + data_col_addr;
+
+	failed = mtd->ecc_stats.failed;
+
+	ret = check_read_status_on_die (mtd, chip);
+	if (ret < 0 || mtd->ecc_stats.failed != failed) {
+		memset (p, 0, datafrag_len);
+		return ret;
+	}
+
+	/* If we read not a page aligned data */
+	if (data_col_addr != 0)
+		chip->cmdfunc(mtd, NAND_CMD_RNDOUT, data_col_addr, -1);
+
+	chip->read_buf(mtd, p, datafrag_len);
+
+	return ret;
+}
+
+/**
+ * nand_read_page_on_die - [INTERN] read raw page data without ecc
+ * @mtd: mtd info structure
+ * @chip: nand chip info structure
+ * @buf: buffer to store read data
+ * @oob_required: caller requires OOB data read to chip->oob_poi
+ * @page: page number to read
+ */
+static int nand_read_page_on_die(struct mtd_info *mtd, struct nand_chip *chip,
+				 uint8_t *buf, int oob_required, int page)
+{
+	uint32_t failed;
+	int ret;
+
+	failed = mtd->ecc_stats.failed;
+
+	ret = check_read_status_on_die (mtd, chip);
+	if (ret < 0 || mtd->ecc_stats.failed != failed) {
+		memset (buf, 0, mtd->writesize);
+		if (oob_required)
+			memset (chip->oob_poi, 0, mtd->oobsize);
+		return ret;
+	}
+
+	chip->read_buf(mtd, buf, mtd->writesize);
+	if (oob_required)
+		chip->read_buf(mtd, chip->oob_poi, mtd->oobsize);
+	return ret;
+}
+
 /**
  * nand_read_page_hwecc - [REPLACEABLE] hardware ECC based page read function
  * @mtd: mtd info structure
@@ -1474,17 +1622,21 @@ static int nand_do_read_ops(struct mtd_info *mtd, loff_t from,
 		if (realpage != chip->pagebuf || oob) {
 			bufpoi = aligned ? buf : chip->buffers->databuf;
 
+			if (unlikely(ops->mode == MTD_OPS_RAW))
+			    set_on_die_ecc(mtd, chip, 0);
+
 			chip->cmdfunc(mtd, NAND_CMD_READ0, 0x00, page);
 
 			/*
 			 * Now read the page into the buffer.  Absent an error,
 			 * the read methods return max bitflips per ecc step.
 			 */
-			if (unlikely(ops->mode == MTD_OPS_RAW))
+			if (unlikely(ops->mode == MTD_OPS_RAW)) {
 				ret = chip->ecc.read_page_raw(mtd, chip, bufpoi,
 							      oob_required,
 							      page);
-			else if (!aligned && NAND_HAS_SUBPAGE_READ(chip) &&
+				set_on_die_ecc(mtd, chip, 1);
+			} else if (!aligned && NAND_HAS_SUBPAGE_READ(chip) &&
 				 !oob)
 				ret = chip->ecc.read_subpage(mtd, chip,
 							col, bytes, bufpoi);
@@ -1778,9 +1930,11 @@ static int nand_do_read_oob(struct mtd_info *mtd, loff_t from,
 	page = realpage & chip->pagemask;
 
 	while (1) {
-		if (ops->mode == MTD_OPS_RAW)
+		if (ops->mode == MTD_OPS_RAW) {
+			set_on_die_ecc(mtd, chip, 0);
 			ret = chip->ecc.read_oob_raw(mtd, chip, page);
-		else
+			set_on_die_ecc(mtd, chip, 1);
+		} else
 			ret = chip->ecc.read_oob(mtd, chip, page);
 
 		if (ret < 0)
@@ -2045,6 +2199,9 @@ static int nand_write_page(struct mtd_info *mtd, struct nand_chip *chip,
 {
 	int status;
 
+	if (unlikely(raw))
+		set_on_die_ecc(mtd, chip, 0);
+
 	chip->cmdfunc(mtd, NAND_CMD_SEQIN, 0x00, page);
 
 	if (unlikely(raw))
@@ -2052,8 +2209,11 @@ static int nand_write_page(struct mtd_info *mtd, struct nand_chip *chip,
 	else
 		status = chip->ecc.write_page(mtd, chip, buf, oob_required);
 
-	if (status < 0)
+	if (status < 0) {
+		if (unlikely(raw))
+			set_on_die_ecc(mtd, chip, 1);
 		return status;
+	}
 
 	/*
 	 * Cached progamming disabled for now. Not sure if it's worth the
@@ -2073,11 +2233,16 @@ static int nand_write_page(struct mtd_info *mtd, struct nand_chip *chip,
 			status = chip->errstat(mtd, chip, FL_WRITING, status,
 					       page);
 
+		if (unlikely(raw))
+			set_on_die_ecc(mtd, chip, 1);
+
 		if (status & NAND_STATUS_FAIL)
 			return -EIO;
 	} else {
 		chip->cmdfunc(mtd, NAND_CMD_CACHEDPROG, -1, -1);
 		status = chip->waitfunc(mtd, chip);
+		if (unlikely(raw))
+			set_on_die_ecc(mtd, chip, 1);
 	}
 
 	return 0;
@@ -2386,9 +2551,11 @@ static int nand_do_write_oob(struct mtd_info *mtd, loff_t to,
 
 	nand_fill_oob(mtd, ops->oobbuf, ops->ooblen, ops);
 
-	if (ops->mode == MTD_OPS_RAW)
+	if (ops->mode == MTD_OPS_RAW) {
+		set_on_die_ecc(mtd, chip, 0);
 		status = chip->ecc.write_oob_raw(mtd, chip, page & chip->pagemask);
-	else
+		set_on_die_ecc(mtd, chip, 1);
+	} else
 		status = chip->ecc.write_oob(mtd, chip, page & chip->pagemask);
 
 	if (status)
@@ -2709,13 +2876,27 @@ static int nand_block_markbad(struct mtd_info *mtd, loff_t ofs)
 static int nand_onfi_set_features(struct mtd_info *mtd, struct nand_chip *chip,
 			int addr, uint8_t *subfeature_param)
 {
-	int status;
+	uint16_t buf[ONFI_SUBFEATURE_PARAM_LEN];
+	size_t len = ONFI_SUBFEATURE_PARAM_LEN;
+	int status, i;
 
 	if (!chip->onfi_version)
 		return -EINVAL;
 
 	chip->cmdfunc(mtd, NAND_CMD_SET_FEATURES, addr, -1);
-	chip->write_buf(mtd, subfeature_param, ONFI_SUBFEATURE_PARAM_LEN);
+	if (chip->options & NAND_BUSWIDTH_16) {
+		/*
+		 * ONFI says parameters are always transferred on the
+		 * lower 8-bits of the databus.  Since there is no
+		 * chip->write_byte callback, we have to convert
+		 * subfeature_param to 16-bit data.
+		 */
+		for (i = 0; i < ONFI_SUBFEATURE_PARAM_LEN; ++i)
+			buf[i] = subfeature_param[i];
+		subfeature_param = (uint8_t *) buf;
+		len = sizeof (buf);
+	}
+	chip->write_buf(mtd, subfeature_param, len);
 	status = chip->waitfunc(mtd, chip);
 	if (status & NAND_STATUS_FAIL)
 		return -EIO;
@@ -2732,6 +2913,8 @@ static int nand_onfi_set_features(struct mtd_info *mtd, struct nand_chip *chip,
 static int nand_onfi_get_features(struct mtd_info *mtd, struct nand_chip *chip,
 			int addr, uint8_t *subfeature_param)
 {
+	int i;
+
 	if (!chip->onfi_version)
 		return -EINVAL;
 
@@ -2739,7 +2922,13 @@ static int nand_onfi_get_features(struct mtd_info *mtd, struct nand_chip *chip,
 	memset(subfeature_param, 0, ONFI_SUBFEATURE_PARAM_LEN);
 
 	chip->cmdfunc(mtd, NAND_CMD_GET_FEATURES, addr, -1);
-	chip->read_buf(mtd, subfeature_param, ONFI_SUBFEATURE_PARAM_LEN);
+	/*
+	 * ONFI says parameters are always transferred on the
+	 * lower 8-bits of the databus.  Use read_byte() since
+	 * that works even on 16-bit devices.
+	 */
+	for (i = 0; i < ONFI_SUBFEATURE_PARAM_LEN; ++i)
+	  subfeature_param[i] = chip->read_byte(mtd);
 	return 0;
 }
 
@@ -2846,10 +3035,10 @@ static int nand_flash_detect_onfi(struct mtd_info *mtd, struct nand_chip *chip,
 					int *busw)
 {
 	struct nand_onfi_params *p = &chip->onfi_params;
-	int i;
+	uint8_t *bp;
+	int i, j;
 	int val;
 
-	/* Try ONFI for unknown chip or LP */
 	chip->cmdfunc(mtd, NAND_CMD_READID, 0x20, -1);
 	if (chip->read_byte(mtd) != 'O' || chip->read_byte(mtd) != 'N' ||
 		chip->read_byte(mtd) != 'F' || chip->read_byte(mtd) != 'I')
@@ -2857,7 +3046,10 @@ static int nand_flash_detect_onfi(struct mtd_info *mtd, struct nand_chip *chip,
 
 	chip->cmdfunc(mtd, NAND_CMD_PARAM, 0, -1);
 	for (i = 0; i < 3; i++) {
-		chip->read_buf(mtd, (uint8_t *)p, sizeof(*p));
+		/* Must read with 8-bit transfers even on 16-bit devices: */
+		bp = (uint8_t *)p;
+		for (j = 0; j < sizeof (*p); ++j)
+			*bp++ = chip->read_byte(mtd);
 		if (onfi_crc16(ONFI_CRC_BASE, (uint8_t *)p, 254) ==
 				le16_to_cpu(p->crc)) {
 			pr_info("ONFI param page %d valid\n", i);
@@ -3202,11 +3394,9 @@ static struct nand_flash_dev *nand_get_flash_type(struct mtd_info *mtd,
 			break;
 
 	chip->onfi_version = 0;
-	if (!type->name || !type->pagesize) {
-		/* Check is chip is ONFI compliant */
-		if (nand_flash_detect_onfi(mtd, chip, &busw))
-			goto ident_done;
-	}
+	/* Check is chip is ONFI compliant */
+	if (nand_flash_detect_onfi(mtd, chip, &busw))
+		goto ident_done;
 
 	if (!type->name)
 		return ERR_PTR(-ENODEV);
@@ -3362,6 +3552,7 @@ EXPORT_SYMBOL(nand_scan_ident);
 int nand_scan_tail(struct mtd_info *mtd)
 {
 	int i;
+	u8 features[ONFI_SUBFEATURE_PARAM_LEN];
 	struct nand_chip *chip = mtd->priv;
 
 	/* New bad blocks should be marked in OOB, flash-based BBT, or both */
@@ -3409,6 +3600,16 @@ int nand_scan_tail(struct mtd_info *mtd)
 	if (!chip->onfi_get_features)
 		chip->onfi_get_features = nand_onfi_get_features;
 
+	if ((*chip->onfi_get_features)(mtd, chip, 0x90, features) >= 0) {
+		if (features[0] & 0x08) {
+			/*
+			 * If the chip has on-die ECC enabled, we kind of have to use it.
+			 */
+			chip->ecc.mode = NAND_ECC_HW_ON_DIE;
+			pr_info("NAND device: Using on-die ECC\n");
+		}
+	}
+
 	/*
 	 * Check ECC mode, default to software if 3byte/512byte hardware ECC is
 	 * selected and we have 256 byte pagesize fallback to software ECC
@@ -3530,6 +3731,25 @@ int nand_scan_tail(struct mtd_info *mtd)
 			chip->ecc.bytes * 8 / fls(8 * chip->ecc.size);
 		break;
 
+	case NAND_ECC_HW_ON_DIE:
+		/* nand_bbt attempts to put Bbt marker at offset 8 in
+		   oob, which is used for ECC by Micron
+		   MT29F4G16ABADAWP, for example.  Fixed by not using
+		   OOB for BBT marker.  */
+		chip->bbt_options |= NAND_BBT_NO_OOB;
+		chip->ecc.layout = &nand_oob_64_on_die;
+		chip->ecc.read_page = nand_read_page_on_die;
+		chip->ecc.read_subpage = nand_read_subpage_on_die;
+		chip->ecc.write_page = nand_write_page_raw;
+		chip->ecc.read_oob = nand_read_oob_std;
+		chip->ecc.read_page_raw = nand_read_page_raw;
+		chip->ecc.write_page_raw = nand_write_page_raw;
+		chip->ecc.write_oob = nand_write_oob_std;
+		chip->ecc.size = 512;
+		chip->ecc.bytes = 8;
+		chip->ecc.strength = 4;
+		break;
+
 	case NAND_ECC_NONE:
 		pr_warn("NAND_ECC_NONE selected by board driver. "
 			   "This is not recommended!\n");
@@ -3602,8 +3822,10 @@ int nand_scan_tail(struct mtd_info *mtd)
 	/* Invalidate the pagebuffer reference */
 	chip->pagebuf = -1;
 
-	/* Large page NAND with SOFT_ECC should support subpage reads */
-	if ((chip->ecc.mode == NAND_ECC_SOFT) && (chip->page_shift > 9))
+	/* Large page NAND with SOFT_ECC or on-die ECC should support subpage reads */
+	if (((chip->ecc.mode == NAND_ECC_SOFT)
+	     || (chip->ecc.mode == NAND_ECC_HW_ON_DIE))
+	    && (chip->page_shift > 9))
 		chip->options |= NAND_SUBPAGE_READ;
 
 	/* Fill in remaining MTD driver data */
diff --git a/drivers/of/of_mtd.c b/drivers/of/of_mtd.c
index a27ec94..79ea026 100644
--- a/drivers/of/of_mtd.c
+++ b/drivers/of/of_mtd.c
@@ -23,6 +23,7 @@ static const char *nand_ecc_modes[] = {
 	[NAND_ECC_HW_SYNDROME]	= "hw_syndrome",
 	[NAND_ECC_HW_OOB_FIRST]	= "hw_oob_first",
 	[NAND_ECC_SOFT_BCH]	= "soft_bch",
+	[NAND_ECC_HW_ON_DIE]	= "hw_on_die",
 };
 
 /**
diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h
index 24e9159..53e99c7 100644
--- a/include/linux/mtd/nand.h
+++ b/include/linux/mtd/nand.h
@@ -129,6 +129,7 @@ extern int nand_unlock(struct mtd_info *mtd, loff_t ofs, uint64_t len);
 /* Status bits */
 #define NAND_STATUS_FAIL	0x01
 #define NAND_STATUS_FAIL_N1	0x02
+#define NAND_STATUS_REWRITE	0x08
 #define NAND_STATUS_TRUE_READY	0x20
 #define NAND_STATUS_READY	0x40
 #define NAND_STATUS_WP		0x80
@@ -143,6 +144,7 @@ typedef enum {
 	NAND_ECC_HW_SYNDROME,
 	NAND_ECC_HW_OOB_FIRST,
 	NAND_ECC_SOFT_BCH,
+	NAND_ECC_HW_ON_DIE,
 } nand_ecc_modes_t;
 
 /*

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: on-die ECC support
  2013-03-15 16:08 David Mosberger-Tang
  2013-03-15 16:17 ` David Mosberger-Tang
@ 2013-03-15 17:58 ` Peter Horton
  2013-03-20 17:46 ` David Mosberger-Tang
  2 siblings, 0 replies; 12+ messages in thread
From: Peter Horton @ 2013-03-15 17:58 UTC (permalink / raw)
  To: linux-mtd

On 15/03/2013 16:08, David Mosberger-Tang wrote:
> We need to be able to support 4-bit-correcting ECC with a
> micro-controller that doesn't have hardware-support for that.  We are
> planning to use the on-die ECC controller supported by Micron flash
> chips.  I suppose we could use the BCH swecc support in the Linux
> kernel, but I'm concerned about the performance implication of that
> and we'd also have to add BCH ecc to our bootloader, which would mean
> more development and testing.
>

Be careful using the NAND status register to indicate bit flips. The 
chips we were using set the status bit for any correction and this can 
result in very high error counts. Some of the pages had bits which are 
permanently 0 or 1 meaning they might need correction every read. We 
worked around this by re-reading any page that the chip flagged as 
corrected and using the software BCH to work out the actual number of 
bits in error.

P.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: on-die ECC support
  2013-03-15 16:08 David Mosberger-Tang
@ 2013-03-15 16:17 ` David Mosberger-Tang
  2013-03-15 17:58 ` Peter Horton
  2013-03-20 17:46 ` David Mosberger-Tang
  2 siblings, 0 replies; 12+ messages in thread
From: David Mosberger-Tang @ 2013-03-15 16:17 UTC (permalink / raw)
  To: linux-mtd

[-- Attachment #1: Type: text/plain, Size: 1199 bytes --]

Hopefully this attachment comes through...

On Fri, Mar 15, 2013 at 10:08 AM, David Mosberger-Tang
<dmosberger@gmail.com> wrote:
> We need to be able to support 4-bit-correcting ECC with a
> micro-controller that doesn't have hardware-support for that.  We are
> planning to use the on-die ECC controller supported by Micron flash
> chips.  I suppose we could use the BCH swecc support in the Linux
> kernel, but I'm concerned about the performance implication of that
> and we'd also have to add BCH ecc to our bootloader, which would mean
> more development and testing.
>
> For backwards-compatibility, we need to be able to support a
> subpage-size of 512 bytes.  From the Micron data sheets, it's not
> clear to me whether the on-die controller really supports reading 512
> byte subpages but the attached patch certainly *seems* to work fine so
> far.
>
> I'd love to get some feedback as to whether I'm on the right track
> here or whether I should pursue a different path.  The patch does the
> following:
>
>  - Add NAND_ECC_HW_ON_DIE ecc mode
>  - Add nand_read_subpage_raw() to read a subpage without worrying about ECC
>  - Hack atmel_nand.c to use on-die ECC.
>
> Thanks,
>
>   --david

[-- Attachment #2: on-die-ecc.diff.txt --]
[-- Type: text/plain, Size: 5001 bytes --]

diff --git a/drivers/mtd/nand/atmel_nand.c b/drivers/mtd/nand/atmel_nand.c
index 92623ac..8043fde 100644
--- a/drivers/mtd/nand/atmel_nand.c
+++ b/drivers/mtd/nand/atmel_nand.c
@@ -1462,7 +1462,12 @@ static int __init atmel_nand_probe(struct platform_device *pdev)
 	}
 
 	nand_chip->ecc.mode = host->board.ecc_mode;
+nand_chip->ecc.mode = NAND_ECC_HW_ON_DIE;	// XXX
+nand_chip->ecc.size = 512;	// XXX
+nand_chip->ecc.strength = 4;	// XXX
+printk("nand_chip->ecc.mode=%d\n", nand_chip->ecc.mode);
 	nand_chip->chip_delay = 20;		/* 20us command delay time */
+	nand_chip->chip_delay = 70;	// XXX t_R_ECC
 
 	if (host->board.bus_width_16)	/* 16-bit bus width */
 		nand_chip->options |= NAND_BUSWIDTH_16;
diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index 1a03b7f..88ee536 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -711,6 +711,29 @@ static void nand_command_lp(struct mtd_info *mtd, unsigned int command,
 		chip->cmd_ctrl(mtd, NAND_CMD_NONE,
 			       NAND_NCE | NAND_CTRL_CHANGE);
 
+		if (1) {
+			u8 status;
+
+			udelay(chip->chip_delay);
+
+			chip->cmdfunc(mtd, NAND_CMD_STATUS, -1, -1);
+			status = chip->read_byte(mtd);
+
+			if (status & 0x01) {
+				printk("ERROR: on-die ECC read failure\n");
+			} else if (status & 0x08) {
+				// bit flip
+				printk("ERROR: on-die ECC rewrite recommended\n"
+);
+			}
+
+			chip->cmd_ctrl(mtd, NAND_CMD_READ0,
+				       NAND_NCE | NAND_CLE | NAND_CTRL_CHANGE);
+			chip->cmd_ctrl(mtd, NAND_CMD_NONE,
+				       NAND_NCE | NAND_CTRL_CHANGE);
+			return;
+		}
+
 		/* This applies to read commands */
 	default:
 		/*
@@ -1222,6 +1245,42 @@ static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip,
 }
 
 /**
+ * nand_read_subpage_raw - [REPLACEABLE] raw sub-page read function
+ * @mtd: mtd info structure
+ * @chip: nand chip info structure
+ * @data_offs: offset of requested data within the page
+ * @readlen: data length
+ * @bufpoi: buffer to store read data
+ */
+static int nand_read_subpage_raw(struct mtd_info *mtd, struct nand_chip *chip,
+			uint32_t data_offs, uint32_t readlen, uint8_t *bufpoi)
+{
+	int start_step, end_step, num_steps;
+	uint8_t *p;
+	int data_col_addr;
+	int datafrag_len;
+	unsigned int max_bitflips = 0;
+
+	/* Column address within the page aligned to ECC size */
+	start_step = data_offs / chip->ecc.size;
+	end_step = (data_offs + readlen - 1) / chip->ecc.size;
+	num_steps = end_step - start_step + 1;
+
+	/* Data size aligned to ECC ecc.size */
+	datafrag_len = num_steps * chip->ecc.size;
+
+	data_col_addr = start_step * chip->ecc.size;
+	/* If we read not a page aligned data */
+	if (data_col_addr != 0)
+		chip->cmdfunc(mtd, NAND_CMD_RNDOUT, data_col_addr, -1);
+
+	p = bufpoi + data_col_addr;
+	chip->read_buf(mtd, p, datafrag_len);
+
+	return max_bitflips;
+}
+
+/**
  * nand_read_page_hwecc - [REPLACEABLE] hardware ECC based page read function
  * @mtd: mtd info structure
  * @chip: nand chip info structure
@@ -3530,6 +3589,21 @@ int nand_scan_tail(struct mtd_info *mtd)
 			chip->ecc.bytes * 8 / fls(8 * chip->ecc.size);
 		break;
 
+	case NAND_ECC_HW_ON_DIE:
+		chip->ecc.read_page = nand_read_page_raw;
+		chip->ecc.read_subpage = nand_read_subpage_raw;
+		chip->ecc.write_page = nand_write_page_raw;
+		chip->ecc.read_oob = nand_read_oob_std;
+		chip->ecc.read_page_raw = nand_read_page_raw;
+		chip->ecc.write_page_raw = nand_write_page_raw;
+		chip->ecc.write_oob = nand_write_oob_std;
+		chip->ecc.bytes = 0;
+/*
+		chip->ecc.size = mtd->writesize;
+		chip->ecc.strength = 0;
+*/
+		break;
+
 	case NAND_ECC_NONE:
 		pr_warn("NAND_ECC_NONE selected by board driver. "
 			   "This is not recommended!\n");
@@ -3591,6 +3665,7 @@ int nand_scan_tail(struct mtd_info *mtd)
 			break;
 		}
 	}
+mtd->subpage_sft = 2;	// XXX
 	chip->subpagesize = mtd->writesize >> mtd->subpage_sft;
 
 	/* Initialize state */
@@ -3603,7 +3678,9 @@ int nand_scan_tail(struct mtd_info *mtd)
 	chip->pagebuf = -1;
 
 	/* Large page NAND with SOFT_ECC should support subpage reads */
-	if ((chip->ecc.mode == NAND_ECC_SOFT) && (chip->page_shift > 9))
+	if (((chip->ecc.mode == NAND_ECC_SOFT)
+	     || (chip->ecc.mode == NAND_ECC_HW_ON_DIE))
+	    && (chip->page_shift > 9))
 		chip->options |= NAND_SUBPAGE_READ;
 
 	/* Fill in remaining MTD driver data */
diff --git a/drivers/of/of_mtd.c b/drivers/of/of_mtd.c
index a27ec94..79ea026 100644
--- a/drivers/of/of_mtd.c
+++ b/drivers/of/of_mtd.c
@@ -23,6 +23,7 @@ static const char *nand_ecc_modes[] = {
 	[NAND_ECC_HW_SYNDROME]	= "hw_syndrome",
 	[NAND_ECC_HW_OOB_FIRST]	= "hw_oob_first",
 	[NAND_ECC_SOFT_BCH]	= "soft_bch",
+	[NAND_ECC_HW_ON_DIE]	= "hw_on_die",
 };
 
 /**
diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h
index 24e9159..2c62b2a 100644
--- a/include/linux/mtd/nand.h
+++ b/include/linux/mtd/nand.h
@@ -143,6 +143,7 @@ typedef enum {
 	NAND_ECC_HW_SYNDROME,
 	NAND_ECC_HW_OOB_FIRST,
 	NAND_ECC_SOFT_BCH,
+	NAND_ECC_HW_ON_DIE,
 } nand_ecc_modes_t;
 
 /*

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* on-die ECC support
@ 2013-03-15 16:08 David Mosberger-Tang
  2013-03-15 16:17 ` David Mosberger-Tang
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: David Mosberger-Tang @ 2013-03-15 16:08 UTC (permalink / raw)
  To: linux-mtd

[-- Attachment #1: Type: text/plain, Size: 1025 bytes --]

We need to be able to support 4-bit-correcting ECC with a
micro-controller that doesn't have hardware-support for that.  We are
planning to use the on-die ECC controller supported by Micron flash
chips.  I suppose we could use the BCH swecc support in the Linux
kernel, but I'm concerned about the performance implication of that
and we'd also have to add BCH ecc to our bootloader, which would mean
more development and testing.

For backwards-compatibility, we need to be able to support a
subpage-size of 512 bytes.  From the Micron data sheets, it's not
clear to me whether the on-die controller really supports reading 512
byte subpages but the attached patch certainly *seems* to work fine so
far.

I'd love to get some feedback as to whether I'm on the right track
here or whether I should pursue a different path.  The patch does the
following:

 - Add NAND_ECC_HW_ON_DIE ecc mode
 - Add nand_read_subpage_raw() to read a subpage without worrying about ECC
 - Hack atmel_nand.c to use on-die ECC.

Thanks,

  --david

[-- Attachment #2: on-die-ecc.diff --]
[-- Type: application/octet-stream, Size: 5001 bytes --]

diff --git a/drivers/mtd/nand/atmel_nand.c b/drivers/mtd/nand/atmel_nand.c
index 92623ac..8043fde 100644
--- a/drivers/mtd/nand/atmel_nand.c
+++ b/drivers/mtd/nand/atmel_nand.c
@@ -1462,7 +1462,12 @@ static int __init atmel_nand_probe(struct platform_device *pdev)
 	}
 
 	nand_chip->ecc.mode = host->board.ecc_mode;
+nand_chip->ecc.mode = NAND_ECC_HW_ON_DIE;	// XXX
+nand_chip->ecc.size = 512;	// XXX
+nand_chip->ecc.strength = 4;	// XXX
+printk("nand_chip->ecc.mode=%d\n", nand_chip->ecc.mode);
 	nand_chip->chip_delay = 20;		/* 20us command delay time */
+	nand_chip->chip_delay = 70;	// XXX t_R_ECC
 
 	if (host->board.bus_width_16)	/* 16-bit bus width */
 		nand_chip->options |= NAND_BUSWIDTH_16;
diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index 1a03b7f..88ee536 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -711,6 +711,29 @@ static void nand_command_lp(struct mtd_info *mtd, unsigned int command,
 		chip->cmd_ctrl(mtd, NAND_CMD_NONE,
 			       NAND_NCE | NAND_CTRL_CHANGE);
 
+		if (1) {
+			u8 status;
+
+			udelay(chip->chip_delay);
+
+			chip->cmdfunc(mtd, NAND_CMD_STATUS, -1, -1);
+			status = chip->read_byte(mtd);
+
+			if (status & 0x01) {
+				printk("ERROR: on-die ECC read failure\n");
+			} else if (status & 0x08) {
+				// bit flip
+				printk("ERROR: on-die ECC rewrite recommended\n"
+);
+			}
+
+			chip->cmd_ctrl(mtd, NAND_CMD_READ0,
+				       NAND_NCE | NAND_CLE | NAND_CTRL_CHANGE);
+			chip->cmd_ctrl(mtd, NAND_CMD_NONE,
+				       NAND_NCE | NAND_CTRL_CHANGE);
+			return;
+		}
+
 		/* This applies to read commands */
 	default:
 		/*
@@ -1222,6 +1245,42 @@ static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip,
 }
 
 /**
+ * nand_read_subpage_raw - [REPLACEABLE] raw sub-page read function
+ * @mtd: mtd info structure
+ * @chip: nand chip info structure
+ * @data_offs: offset of requested data within the page
+ * @readlen: data length
+ * @bufpoi: buffer to store read data
+ */
+static int nand_read_subpage_raw(struct mtd_info *mtd, struct nand_chip *chip,
+			uint32_t data_offs, uint32_t readlen, uint8_t *bufpoi)
+{
+	int start_step, end_step, num_steps;
+	uint8_t *p;
+	int data_col_addr;
+	int datafrag_len;
+	unsigned int max_bitflips = 0;
+
+	/* Column address within the page aligned to ECC size */
+	start_step = data_offs / chip->ecc.size;
+	end_step = (data_offs + readlen - 1) / chip->ecc.size;
+	num_steps = end_step - start_step + 1;
+
+	/* Data size aligned to ECC ecc.size */
+	datafrag_len = num_steps * chip->ecc.size;
+
+	data_col_addr = start_step * chip->ecc.size;
+	/* If we read not a page aligned data */
+	if (data_col_addr != 0)
+		chip->cmdfunc(mtd, NAND_CMD_RNDOUT, data_col_addr, -1);
+
+	p = bufpoi + data_col_addr;
+	chip->read_buf(mtd, p, datafrag_len);
+
+	return max_bitflips;
+}
+
+/**
  * nand_read_page_hwecc - [REPLACEABLE] hardware ECC based page read function
  * @mtd: mtd info structure
  * @chip: nand chip info structure
@@ -3530,6 +3589,21 @@ int nand_scan_tail(struct mtd_info *mtd)
 			chip->ecc.bytes * 8 / fls(8 * chip->ecc.size);
 		break;
 
+	case NAND_ECC_HW_ON_DIE:
+		chip->ecc.read_page = nand_read_page_raw;
+		chip->ecc.read_subpage = nand_read_subpage_raw;
+		chip->ecc.write_page = nand_write_page_raw;
+		chip->ecc.read_oob = nand_read_oob_std;
+		chip->ecc.read_page_raw = nand_read_page_raw;
+		chip->ecc.write_page_raw = nand_write_page_raw;
+		chip->ecc.write_oob = nand_write_oob_std;
+		chip->ecc.bytes = 0;
+/*
+		chip->ecc.size = mtd->writesize;
+		chip->ecc.strength = 0;
+*/
+		break;
+
 	case NAND_ECC_NONE:
 		pr_warn("NAND_ECC_NONE selected by board driver. "
 			   "This is not recommended!\n");
@@ -3591,6 +3665,7 @@ int nand_scan_tail(struct mtd_info *mtd)
 			break;
 		}
 	}
+mtd->subpage_sft = 2;	// XXX
 	chip->subpagesize = mtd->writesize >> mtd->subpage_sft;
 
 	/* Initialize state */
@@ -3603,7 +3678,9 @@ int nand_scan_tail(struct mtd_info *mtd)
 	chip->pagebuf = -1;
 
 	/* Large page NAND with SOFT_ECC should support subpage reads */
-	if ((chip->ecc.mode == NAND_ECC_SOFT) && (chip->page_shift > 9))
+	if (((chip->ecc.mode == NAND_ECC_SOFT)
+	     || (chip->ecc.mode == NAND_ECC_HW_ON_DIE))
+	    && (chip->page_shift > 9))
 		chip->options |= NAND_SUBPAGE_READ;
 
 	/* Fill in remaining MTD driver data */
diff --git a/drivers/of/of_mtd.c b/drivers/of/of_mtd.c
index a27ec94..79ea026 100644
--- a/drivers/of/of_mtd.c
+++ b/drivers/of/of_mtd.c
@@ -23,6 +23,7 @@ static const char *nand_ecc_modes[] = {
 	[NAND_ECC_HW_SYNDROME]	= "hw_syndrome",
 	[NAND_ECC_HW_OOB_FIRST]	= "hw_oob_first",
 	[NAND_ECC_SOFT_BCH]	= "soft_bch",
+	[NAND_ECC_HW_ON_DIE]	= "hw_on_die",
 };
 
 /**
diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h
index 24e9159..2c62b2a 100644
--- a/include/linux/mtd/nand.h
+++ b/include/linux/mtd/nand.h
@@ -143,6 +143,7 @@ typedef enum {
 	NAND_ECC_HW_SYNDROME,
 	NAND_ECC_HW_OOB_FIRST,
 	NAND_ECC_SOFT_BCH,
+	NAND_ECC_HW_ON_DIE,
 } nand_ecc_modes_t;
 
 /*

^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-01-10 15:44 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-09 21:37 On-die ECC support Peter Rosin
2017-01-09 21:46 ` Richard Weinberger
2017-01-10 14:54   ` Peter Rosin
2017-01-10 15:30     ` Richard Weinberger
2017-01-10 15:43       ` Boris Brezillon
     [not found] <CACwUX0M-04vCTkA8WwNR9si=7N-xuYOTy6aWO7ty8xa1+S3rUw@mail.gmail.com>
2013-03-22  8:21 ` on-die " Peter Horton
2013-03-22 21:14   ` David Mosberger-Tang
2013-03-23 19:27     ` Peter Horton
  -- strict thread matches above, loose matches on Subject: below --
2013-03-15 16:08 David Mosberger-Tang
2013-03-15 16:17 ` David Mosberger-Tang
2013-03-15 17:58 ` Peter Horton
2013-03-20 17:46 ` David Mosberger-Tang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.