All of lore.kernel.org
 help / color / mirror / Atom feed
* [V3, 1/2] mtd: brcmnand: Add check for erased page bitflips
@ 2016-06-09 21:17 Kamal Dasu
  2016-06-09 21:17 ` [V3, 2/2] mtd: brcmnand: Detect sticky ucorr ecc error on dma reads Kamal Dasu
  2016-06-13 20:07 ` [V3, 1/2] mtd: brcmnand: Add check for erased page bitflips Boris Brezillon
  0 siblings, 2 replies; 6+ messages in thread
From: Kamal Dasu @ 2016-06-09 21:17 UTC (permalink / raw)
  To: linux-mtd, computersforpeace, boris.brezillon
  Cc: f.fainelli, bcm-kernel-feedback-list, Kamal Dasu

Check for erased page bitflips in a page. And if well within
threshold return data as all 0xff. Apply sw check for controller
version < 7.2. Controller vesion >= 7.2 has hw support.

Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
---
V3 changes
 Return the max bitflips from a sector within a page
V2 changes 
 Added use of nand_check_erased_ecc_chunk
 Restrict change to older controller < 7.2
---

 drivers/mtd/nand/brcmnand/brcmnand.c | 62 ++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c
index b76ad7c..7ee9617 100644
--- a/drivers/mtd/nand/brcmnand/brcmnand.c
+++ b/drivers/mtd/nand/brcmnand/brcmnand.c
@@ -1545,6 +1545,56 @@ static int brcmnand_read_by_pio(struct mtd_info *mtd, struct nand_chip *chip,
 	return ret;
 }
 
+/*
+ * Check a page to see if it is erased (w/ bitflips) after an uncorrectable ECC
+ * error
+ *
+ * Because the HW ECC signals an ECC error if an erase paged has even a single
+ * bitflip, we must check each ECC error to see if it is actually an erased
+ * page with bitflips, not a truly corrupted page.
+ *
+ * On a real error, return a negative error code (-EBADMSG for ECC error), and
+ * buf will contain raw data.
+ * Otherwise, buf gets filled with 0xffs and return the maximum number of
+ * bitflips-per-ECC-sector to the caller.
+ *
+ */
+static int brcmstb_nand_verify_erased_page(struct mtd_info *mtd,
+		  struct nand_chip *chip, void *buf, u64 addr)
+{
+	int i, sas;
+	void *oob = chip->oob_poi;
+	int bitflips = 0;
+	int page = addr >> chip->page_shift;
+	int ret;
+
+	if (!buf) {
+		buf = chip->buffers->databuf;
+		/* Invalidate page cache */
+		chip->pagebuf = -1;
+	}
+
+	sas = mtd->oobsize / chip->ecc.steps;
+
+	/* read without ecc for verification */
+	chip->cmdfunc(mtd, NAND_CMD_READ0, 0x00, page);
+	ret = chip->ecc.read_page_raw(mtd, chip, buf, true, page);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < chip->ecc.steps; i++, oob += sas) {
+		ret = nand_check_erased_ecc_chunk(buf, chip->ecc.size,
+						  oob, sas, NULL, 0,
+						  chip->ecc.strength);
+		if (ret < 0)
+			return ret;
+
+		bitflips = max(bitflips, ret);
+	}
+
+	return bitflips;
+}
+
 static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
 			 u64 addr, unsigned int trans, u32 *buf, u8 *oob)
 {
@@ -1575,6 +1625,18 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
 	}
 
 	if (mtd_is_eccerr(err)) {
+		/*
+		 * Controller version 7.2 has hw encoder to detect erased page
+		 * bitflips, apply sw verification for older controllers only
+		 */
+		if (ctrl->nand_version < 0x0702) {
+			err = brcmstb_nand_verify_erased_page(mtd, chip, buf,
+							      addr);
+			/* erased page bitflips corrected */
+			if (err > 0)
+				return err;
+		}
+
 		dev_dbg(ctrl->dev, "uncorrectable error at 0x%llx\n",
 			(unsigned long long)err_addr);
 		mtd->ecc_stats.failed++;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [V3, 2/2] mtd: brcmnand: Detect sticky ucorr ecc error on dma reads
  2016-06-09 21:17 [V3, 1/2] mtd: brcmnand: Add check for erased page bitflips Kamal Dasu
@ 2016-06-09 21:17 ` Kamal Dasu
  2016-07-11  5:04   ` Brian Norris
  2016-06-13 20:07 ` [V3, 1/2] mtd: brcmnand: Add check for erased page bitflips Boris Brezillon
  1 sibling, 1 reply; 6+ messages in thread
From: Kamal Dasu @ 2016-06-09 21:17 UTC (permalink / raw)
  To: linux-mtd, computersforpeace, boris.brezillon
  Cc: f.fainelli, bcm-kernel-feedback-list, Kamal Dasu

This change provides a fix for controller bug where nand
controller could have a possible sticky error after a PIO
followed by a DMA read. The fix retries a read if we see
a uncorr_ecc after read to detect such sticky errors.
The fix applies to only controller version 7.0 and 7.1.

Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
---
V3 changes 
 none
V2 Changes 
 Restrict controller bug workaround to version 7.0 and 7.1
---
 drivers/mtd/nand/brcmnand/brcmnand.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c
index 7ee9617..1156495 100644
--- a/drivers/mtd/nand/brcmnand/brcmnand.c
+++ b/drivers/mtd/nand/brcmnand/brcmnand.c
@@ -1602,9 +1602,11 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
 	struct brcmnand_controller *ctrl = host->ctrl;
 	u64 err_addr = 0;
 	int err;
+	static bool retry = true;
 
 	dev_dbg(ctrl->dev, "read %llx -> %p\n", (unsigned long long)addr, buf);
 
+try_dmaread:
 	brcmnand_write_reg(ctrl, BRCMNAND_UNCORR_COUNT, 0);
 
 	if (has_flash_dma(ctrl) && !oob && flash_dma_buf_ok(buf)) {
@@ -1626,6 +1628,22 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
 
 	if (mtd_is_eccerr(err)) {
 		/*
+		 * On controller version and 7.0, 7.1 , DMA read after a
+		 * prior PIO read that reported uncorrectable error,
+		 * the DMA engine captures this error following DMA read
+		 * cleared only on subsequent DMA read, so just retry once
+		 * to clear a possible false error reported for current DMA
+		 * read
+		 */
+		if ((ctrl->nand_version == 0x0700) ||
+		    (ctrl->nand_version == 0x0701)) {
+			if (retry) {
+				retry = false;
+				goto try_dmaread;
+			}
+		}
+
+		/*
 		 * Controller version 7.2 has hw encoder to detect erased page
 		 * bitflips, apply sw verification for older controllers only
 		 */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [V3, 1/2] mtd: brcmnand: Add check for erased page bitflips
  2016-06-09 21:17 [V3, 1/2] mtd: brcmnand: Add check for erased page bitflips Kamal Dasu
  2016-06-09 21:17 ` [V3, 2/2] mtd: brcmnand: Detect sticky ucorr ecc error on dma reads Kamal Dasu
@ 2016-06-13 20:07 ` Boris Brezillon
  1 sibling, 0 replies; 6+ messages in thread
From: Boris Brezillon @ 2016-06-13 20:07 UTC (permalink / raw)
  To: Kamal Dasu
  Cc: linux-mtd, computersforpeace, f.fainelli, bcm-kernel-feedback-list

On Thu,  9 Jun 2016 17:17:54 -0400
Kamal Dasu <kdasu.kdev@gmail.com> wrote:

> Check for erased page bitflips in a page. And if well within
> threshold return data as all 0xff. Apply sw check for controller
> version < 7.2. Controller vesion >= 7.2 has hw support.
> 
> Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>

Applied both.

Thanks,

Boris

> ---
> V3 changes
>  Return the max bitflips from a sector within a page
> V2 changes 
>  Added use of nand_check_erased_ecc_chunk
>  Restrict change to older controller < 7.2
> ---
> 
>  drivers/mtd/nand/brcmnand/brcmnand.c | 62 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 62 insertions(+)
> 
> diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c
> index b76ad7c..7ee9617 100644
> --- a/drivers/mtd/nand/brcmnand/brcmnand.c
> +++ b/drivers/mtd/nand/brcmnand/brcmnand.c
> @@ -1545,6 +1545,56 @@ static int brcmnand_read_by_pio(struct mtd_info *mtd, struct nand_chip *chip,
>  	return ret;
>  }
>  
> +/*
> + * Check a page to see if it is erased (w/ bitflips) after an uncorrectable ECC
> + * error
> + *
> + * Because the HW ECC signals an ECC error if an erase paged has even a single
> + * bitflip, we must check each ECC error to see if it is actually an erased
> + * page with bitflips, not a truly corrupted page.
> + *
> + * On a real error, return a negative error code (-EBADMSG for ECC error), and
> + * buf will contain raw data.
> + * Otherwise, buf gets filled with 0xffs and return the maximum number of
> + * bitflips-per-ECC-sector to the caller.
> + *
> + */
> +static int brcmstb_nand_verify_erased_page(struct mtd_info *mtd,
> +		  struct nand_chip *chip, void *buf, u64 addr)
> +{
> +	int i, sas;
> +	void *oob = chip->oob_poi;
> +	int bitflips = 0;
> +	int page = addr >> chip->page_shift;
> +	int ret;
> +
> +	if (!buf) {
> +		buf = chip->buffers->databuf;
> +		/* Invalidate page cache */
> +		chip->pagebuf = -1;
> +	}
> +
> +	sas = mtd->oobsize / chip->ecc.steps;
> +
> +	/* read without ecc for verification */
> +	chip->cmdfunc(mtd, NAND_CMD_READ0, 0x00, page);
> +	ret = chip->ecc.read_page_raw(mtd, chip, buf, true, page);
> +	if (ret)
> +		return ret;
> +
> +	for (i = 0; i < chip->ecc.steps; i++, oob += sas) {
> +		ret = nand_check_erased_ecc_chunk(buf, chip->ecc.size,
> +						  oob, sas, NULL, 0,
> +						  chip->ecc.strength);
> +		if (ret < 0)
> +			return ret;
> +
> +		bitflips = max(bitflips, ret);
> +	}
> +
> +	return bitflips;
> +}
> +
>  static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
>  			 u64 addr, unsigned int trans, u32 *buf, u8 *oob)
>  {
> @@ -1575,6 +1625,18 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
>  	}
>  
>  	if (mtd_is_eccerr(err)) {
> +		/*
> +		 * Controller version 7.2 has hw encoder to detect erased page
> +		 * bitflips, apply sw verification for older controllers only
> +		 */
> +		if (ctrl->nand_version < 0x0702) {
> +			err = brcmstb_nand_verify_erased_page(mtd, chip, buf,
> +							      addr);
> +			/* erased page bitflips corrected */
> +			if (err > 0)
> +				return err;
> +		}
> +
>  		dev_dbg(ctrl->dev, "uncorrectable error at 0x%llx\n",
>  			(unsigned long long)err_addr);
>  		mtd->ecc_stats.failed++;



-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [V3, 2/2] mtd: brcmnand: Detect sticky ucorr ecc error on dma reads
  2016-06-09 21:17 ` [V3, 2/2] mtd: brcmnand: Detect sticky ucorr ecc error on dma reads Kamal Dasu
@ 2016-07-11  5:04   ` Brian Norris
  2016-07-11  6:49     ` Boris Brezillon
  0 siblings, 1 reply; 6+ messages in thread
From: Brian Norris @ 2016-07-11  5:04 UTC (permalink / raw)
  To: Kamal Dasu
  Cc: linux-mtd, boris.brezillon, f.fainelli, bcm-kernel-feedback-list

Hi,

Looking through Boris's pull request, I noticed an issue:

On Thu, Jun 09, 2016 at 05:17:55PM -0400, Kamal Dasu wrote:
> This change provides a fix for controller bug where nand
> controller could have a possible sticky error after a PIO
> followed by a DMA read. The fix retries a read if we see
> a uncorr_ecc after read to detect such sticky errors.
> The fix applies to only controller version 7.0 and 7.1.
> 
> Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
> ---
> V3 changes 
>  none
> V2 Changes 
>  Restrict controller bug workaround to version 7.0 and 7.1
> ---
>  drivers/mtd/nand/brcmnand/brcmnand.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c
> index 7ee9617..1156495 100644
> --- a/drivers/mtd/nand/brcmnand/brcmnand.c
> +++ b/drivers/mtd/nand/brcmnand/brcmnand.c
> @@ -1602,9 +1602,11 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
>  	struct brcmnand_controller *ctrl = host->ctrl;
>  	u64 err_addr = 0;
>  	int err;
> +	static bool retry = true;

Is this intentionally static? That means your retry will only be
performed exactly once *ever*. Probably not what you expected?

Boris, if this is indeed unintended, would you rather fix this up in the
original patch and send me a new pull request? Or have my apply the
trivial fixup (i.e., s/static//) as a separate patch on top?

Brian

>  
>  	dev_dbg(ctrl->dev, "read %llx -> %p\n", (unsigned long long)addr, buf);
>  
> +try_dmaread:
>  	brcmnand_write_reg(ctrl, BRCMNAND_UNCORR_COUNT, 0);
>  
>  	if (has_flash_dma(ctrl) && !oob && flash_dma_buf_ok(buf)) {
> @@ -1626,6 +1628,22 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
>  
>  	if (mtd_is_eccerr(err)) {
>  		/*
> +		 * On controller version and 7.0, 7.1 , DMA read after a
> +		 * prior PIO read that reported uncorrectable error,
> +		 * the DMA engine captures this error following DMA read
> +		 * cleared only on subsequent DMA read, so just retry once
> +		 * to clear a possible false error reported for current DMA
> +		 * read
> +		 */
> +		if ((ctrl->nand_version == 0x0700) ||
> +		    (ctrl->nand_version == 0x0701)) {
> +			if (retry) {
> +				retry = false;
> +				goto try_dmaread;
> +			}
> +		}
> +
> +		/*
>  		 * Controller version 7.2 has hw encoder to detect erased page
>  		 * bitflips, apply sw verification for older controllers only
>  		 */
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [V3, 2/2] mtd: brcmnand: Detect sticky ucorr ecc error on dma reads
  2016-07-11  5:04   ` Brian Norris
@ 2016-07-11  6:49     ` Boris Brezillon
  2016-07-11 14:52       ` Kamal Dasu
  0 siblings, 1 reply; 6+ messages in thread
From: Boris Brezillon @ 2016-07-11  6:49 UTC (permalink / raw)
  To: Brian Norris; +Cc: Kamal Dasu, linux-mtd, f.fainelli, bcm-kernel-feedback-list

On Sun, 10 Jul 2016 22:04:52 -0700
Brian Norris <computersforpeace@gmail.com> wrote:

> Hi,
> 
> Looking through Boris's pull request, I noticed an issue:
> 
> On Thu, Jun 09, 2016 at 05:17:55PM -0400, Kamal Dasu wrote:
> > This change provides a fix for controller bug where nand
> > controller could have a possible sticky error after a PIO
> > followed by a DMA read. The fix retries a read if we see
> > a uncorr_ecc after read to detect such sticky errors.
> > The fix applies to only controller version 7.0 and 7.1.
> > 
> > Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
> > ---
> > V3 changes 
> >  none
> > V2 Changes 
> >  Restrict controller bug workaround to version 7.0 and 7.1
> > ---
> >  drivers/mtd/nand/brcmnand/brcmnand.c | 18 ++++++++++++++++++
> >  1 file changed, 18 insertions(+)
> > 
> > diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c
> > index 7ee9617..1156495 100644
> > --- a/drivers/mtd/nand/brcmnand/brcmnand.c
> > +++ b/drivers/mtd/nand/brcmnand/brcmnand.c
> > @@ -1602,9 +1602,11 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
> >  	struct brcmnand_controller *ctrl = host->ctrl;
> >  	u64 err_addr = 0;
> >  	int err;
> > +	static bool retry = true;  
> 
> Is this intentionally static? That means your retry will only be
> performed exactly once *ever*. Probably not what you expected?
> 
> Boris, if this is indeed unintended, would you rather fix this up in the
> original patch and send me a new pull request?

I'll fix the initial commit and send a new PR once Kamal has confirmed
this variable should not be static (which is also my opinion).

> Or have my apply the
> trivial fixup (i.e., s/static//) as a separate patch on top?

Thanks,

Boris

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [V3, 2/2] mtd: brcmnand: Detect sticky ucorr ecc error on dma reads
  2016-07-11  6:49     ` Boris Brezillon
@ 2016-07-11 14:52       ` Kamal Dasu
  0 siblings, 0 replies; 6+ messages in thread
From: Kamal Dasu @ 2016-07-11 14:52 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Brian Norris, Kamal Dasu, linux-mtd, Florian Fainelli,
	bcm-kernel-feedback-list

It's ok to fix it up.

Kamal

On Mon, Jul 11, 2016 at 2:49 AM, Boris Brezillon
<boris.brezillon@free-electrons.com> wrote:
> On Sun, 10 Jul 2016 22:04:52 -0700
> Brian Norris <computersforpeace@gmail.com> wrote:
>
>> Hi,
>>
>> Looking through Boris's pull request, I noticed an issue:
>>
>> On Thu, Jun 09, 2016 at 05:17:55PM -0400, Kamal Dasu wrote:
>> > This change provides a fix for controller bug where nand
>> > controller could have a possible sticky error after a PIO
>> > followed by a DMA read. The fix retries a read if we see
>> > a uncorr_ecc after read to detect such sticky errors.
>> > The fix applies to only controller version 7.0 and 7.1.
>> >
>> > Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
>> > ---
>> > V3 changes
>> >  none
>> > V2 Changes
>> >  Restrict controller bug workaround to version 7.0 and 7.1
>> > ---
>> >  drivers/mtd/nand/brcmnand/brcmnand.c | 18 ++++++++++++++++++
>> >  1 file changed, 18 insertions(+)
>> >
>> > diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c
>> > index 7ee9617..1156495 100644
>> > --- a/drivers/mtd/nand/brcmnand/brcmnand.c
>> > +++ b/drivers/mtd/nand/brcmnand/brcmnand.c
>> > @@ -1602,9 +1602,11 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
>> >     struct brcmnand_controller *ctrl = host->ctrl;
>> >     u64 err_addr = 0;
>> >     int err;
>> > +   static bool retry = true;
>>
>> Is this intentionally static? That means your retry will only be
>> performed exactly once *ever*. Probably not what you expected?
>>
>> Boris, if this is indeed unintended, would you rather fix this up in the
>> original patch and send me a new pull request?
>
> I'll fix the initial commit and send a new PR once Kamal has confirmed
> this variable should not be static (which is also my opinion).
>
>> Or have my apply the
>> trivial fixup (i.e., s/static//) as a separate patch on top?
>
> Thanks,
>
> Boris



-- 
Kamal Dasu
Principal Engineer | Broadcom Ltd. 200 Brickstone Sq Andover | 978-719-1405

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-07-11 14:53 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-09 21:17 [V3, 1/2] mtd: brcmnand: Add check for erased page bitflips Kamal Dasu
2016-06-09 21:17 ` [V3, 2/2] mtd: brcmnand: Detect sticky ucorr ecc error on dma reads Kamal Dasu
2016-07-11  5:04   ` Brian Norris
2016-07-11  6:49     ` Boris Brezillon
2016-07-11 14:52       ` Kamal Dasu
2016-06-13 20:07 ` [V3, 1/2] mtd: brcmnand: Add check for erased page bitflips Boris Brezillon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.