[PATCH 1/2] mtd: brcmnand: Add check for erased page bitflips

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 1/2] mtd: brcmnand: Add check for erased page bitflips
@ 2016-04-29 20:21 Kamal Dasu
  2016-04-29 20:21 ` [PATCH 2/2] mtd: brcmnand: Detect sticky ucorr ecc error on dma reads Kamal Dasu
  2016-05-30  8:42 ` [PATCH 1/2] mtd: brcmnand: Add check for erased page bitflips Boris Brezillon
  0 siblings, 2 replies; 12+ messages in thread
From: Kamal Dasu @ 2016-04-29 20:21 UTC (permalink / raw)
  To: linux-mtd, computersforpeace
  Cc: f.fainelli, bcm-kernel-feedback-list, Kamal Dasu

Check for erased page bitflips in a page. And if well within
threshold return data as all 0xff.

Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
---
 drivers/mtd/nand/brcmnand/brcmnand.c | 83 +++++++++++++++++++++++++++++++++---
 1 file changed, 78 insertions(+), 5 deletions(-)

diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c
index e052839..29a9abd 100644
--- a/drivers/mtd/nand/brcmnand/brcmnand.c
+++ b/drivers/mtd/nand/brcmnand/brcmnand.c
@@ -1490,6 +1490,64 @@ static int brcmnand_read_by_pio(struct mtd_info *mtd, struct nand_chip *chip,
 	return ret;
 }
 
+/*
+ * Check a page to see if it is erased (w/ bitflips) after an uncorrectable ECC
+ * error
+ *
+ * Because the HW ECC signals an ECC error if an erase paged has even a single
+ * bitflip, we must check each ECC error to see if it is actually an erased
+ * page with bitflips, not a truly corrupted page.
+ *
+ * On a real error, return a negative error code (-EBADMSG for ECC error), and
+ * buf will contain raw data.
+ * Otherwise, fill buf with 0xff and return the maximum number of
+ * bitflips-per-ECC-sector to the caller.
+ *
+ */
+static int brcmstb_nand_verify_erased_page(struct mtd_info *mtd,
+		  struct nand_chip *chip, void *buf, u64 addr)
+{
+	int i, sas, oob_nbits, data_nbits;
+	void *oob = chip->oob_poi;
+	unsigned int max_bitflips = 0;
+	int page = addr >> chip->page_shift;
+	int ret;
+
+	if (!buf) {
+		buf = chip->buffers->databuf;
+		/* Invalidate page cache */
+		chip->pagebuf = -1;
+	}
+
+	sas = mtd->oobsize / chip->ecc.steps;
+	oob_nbits = sas << 3;
+	data_nbits = chip->ecc.size << 3;
+
+	/* read without ecc for verification */
+	chip->cmdfunc(mtd, NAND_CMD_READ0, 0x00, page);
+	ret = chip->ecc.read_page_raw(mtd, chip, buf, true, page);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < chip->ecc.steps; i++, oob += sas) {
+		unsigned int bitflips = 0;
+
+		bitflips += oob_nbits - bitmap_weight(oob, oob_nbits);
+		bitflips += data_nbits - bitmap_weight(buf, data_nbits);
+
+		buf += chip->ecc.size;
+		addr += chip->ecc.size;
+
+		/* Too many bitflips */
+		if (bitflips > chip->ecc.strength)
+			return -EBADMSG;
+
+		max_bitflips = max(max_bitflips, bitflips);
+	}
+
+	return max_bitflips;
+}
+
 static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
 			 u64 addr, unsigned int trans, u32 *buf, u8 *oob)
 {
@@ -1520,11 +1578,26 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
 	}
 
 	if (mtd_is_eccerr(err)) {
-		dev_dbg(ctrl->dev, "uncorrectable error at 0x%llx\n",
-			(unsigned long long)err_addr);
-		mtd->ecc_stats.failed++;
-		/* NAND layer expects zero on ECC errors */
-		return 0;
+		int ret;
+
+		ret = brcmstb_nand_verify_erased_page(mtd, chip, buf, addr);
+		if (ret < 0) {
+			dev_dbg(ctrl->dev, "uncorrectable error at 0x%llx\n",
+				(unsigned long long)err_addr);
+			mtd->ecc_stats.failed++;
+			/* NAND layer expects zero on ECC errors */
+			return 0;
+		} else {
+			if (buf)
+				memset(buf, 0xff, FC_BYTES * trans);
+			if (oob)
+				memset(oob, 0xff, mtd->oobsize);
+
+			dev_info(&host->pdev->dev,
+					"corrected %d bitflips in blank page at 0x%llx\n",
+					ret, (unsigned long long)addr);
+			return ret;
+		}
 	}
 
 	if (mtd_is_bitflip(err)) {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] mtd: brcmnand: Detect sticky ucorr ecc error on dma reads
  2016-04-29 20:21 [PATCH 1/2] mtd: brcmnand: Add check for erased page bitflips Kamal Dasu
@ 2016-04-29 20:21 ` Kamal Dasu
  2016-05-30  8:50   ` Boris Brezillon
  2016-05-30  8:42 ` [PATCH 1/2] mtd: brcmnand: Add check for erased page bitflips Boris Brezillon
  1 sibling, 1 reply; 12+ messages in thread
From: Kamal Dasu @ 2016-04-29 20:21 UTC (permalink / raw)
  To: linux-mtd, computersforpeace
  Cc: f.fainelli, bcm-kernel-feedback-list, Kamal Dasu

This change provides a fix for controller bug where nand
controller could have a possible sticky error after a PIO
followed by a DMA read. The fix retries a read if we see
a uncorr_ecc after read to detect such sticky errors.

Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
---
 drivers/mtd/nand/brcmnand/brcmnand.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c
index 29a9abd..13c7784 100644
--- a/drivers/mtd/nand/brcmnand/brcmnand.c
+++ b/drivers/mtd/nand/brcmnand/brcmnand.c
@@ -1555,9 +1555,11 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
 	struct brcmnand_controller *ctrl = host->ctrl;
 	u64 err_addr = 0;
 	int err;
+	bool retry = true;
 
 	dev_dbg(ctrl->dev, "read %llx -> %p\n", (unsigned long long)addr, buf);
 
+try_dmaread:
 	brcmnand_write_reg(ctrl, BRCMNAND_UNCORR_COUNT, 0);
 
 	if (has_flash_dma(ctrl) && !oob && flash_dma_buf_ok(buf)) {
@@ -1579,7 +1581,18 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
 
 	if (mtd_is_eccerr(err)) {
 		int ret;
-
+		/*
+		 * On controller version >=7.0 if we are doing a DMA read
+		 * after a prior PIO read that reported uncorrectable error,
+		 * the DMA engine captures this error following DMA read
+		 * cleared only on subsequent DMA read, so just retry once
+		 * to clear a possible false error reported for current DMA
+		 * read
+		 */
+		if ((ctrl->nand_version >= 0x0700) && retry) {
+			retry = false;
+			goto try_dmaread;
+		}
 		ret = brcmstb_nand_verify_erased_page(mtd, chip, buf, addr);
 		if (ret < 0) {
 			dev_dbg(ctrl->dev, "uncorrectable error at 0x%llx\n",
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] mtd: brcmnand: Add check for erased page bitflips
  2016-04-29 20:21 [PATCH 1/2] mtd: brcmnand: Add check for erased page bitflips Kamal Dasu
  2016-04-29 20:21 ` [PATCH 2/2] mtd: brcmnand: Detect sticky ucorr ecc error on dma reads Kamal Dasu
@ 2016-05-30  8:42 ` Boris Brezillon
  2016-06-01 16:46   ` Kamal Dasu
  1 sibling, 1 reply; 12+ messages in thread
From: Boris Brezillon @ 2016-05-30  8:42 UTC (permalink / raw)
  To: Kamal Dasu
  Cc: linux-mtd, computersforpeace, f.fainelli, bcm-kernel-feedback-list

Hi Kamal,

On Fri, 29 Apr 2016 16:21:24 -0400
Kamal Dasu <kdasu.kdev@gmail.com> wrote:

> Check for erased page bitflips in a page. And if well within
> threshold return data as all 0xff.
> 
> Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
> ---
>  drivers/mtd/nand/brcmnand/brcmnand.c | 83 +++++++++++++++++++++++++++++++++---
>  1 file changed, 78 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c
> index e052839..29a9abd 100644
> --- a/drivers/mtd/nand/brcmnand/brcmnand.c
> +++ b/drivers/mtd/nand/brcmnand/brcmnand.c
> @@ -1490,6 +1490,64 @@ static int brcmnand_read_by_pio(struct mtd_info *mtd, struct nand_chip *chip,
>  	return ret;
>  }
>  
> +/*
> + * Check a page to see if it is erased (w/ bitflips) after an uncorrectable ECC
> + * error
> + *
> + * Because the HW ECC signals an ECC error if an erase paged has even a single
> + * bitflip, we must check each ECC error to see if it is actually an erased
> + * page with bitflips, not a truly corrupted page.
> + *
> + * On a real error, return a negative error code (-EBADMSG for ECC error), and
> + * buf will contain raw data.
> + * Otherwise, fill buf with 0xff and return the maximum number of
> + * bitflips-per-ECC-sector to the caller.
> + *
> + */
> +static int brcmstb_nand_verify_erased_page(struct mtd_info *mtd,
> +		  struct nand_chip *chip, void *buf, u64 addr)
> +{
> +	int i, sas, oob_nbits, data_nbits;
> +	void *oob = chip->oob_poi;
> +	unsigned int max_bitflips = 0;
> +	int page = addr >> chip->page_shift;
> +	int ret;
> +
> +	if (!buf) {
> +		buf = chip->buffers->databuf;
> +		/* Invalidate page cache */
> +		chip->pagebuf = -1;
> +	}
> +
> +	sas = mtd->oobsize / chip->ecc.steps;
> +	oob_nbits = sas << 3;
> +	data_nbits = chip->ecc.size << 3;
> +
> +	/* read without ecc for verification */
> +	chip->cmdfunc(mtd, NAND_CMD_READ0, 0x00, page);
> +	ret = chip->ecc.read_page_raw(mtd, chip, buf, true, page);

Do you really need to read the whole page in raw mode? Usually, only
reading the OOB sections is enough.

> +	if (ret)
> +		return ret;
> +
> +	for (i = 0; i < chip->ecc.steps; i++, oob += sas) {
> +		unsigned int bitflips = 0;
> +
> +		bitflips += oob_nbits - bitmap_weight(oob, oob_nbits);
> +		bitflips += data_nbits - bitmap_weight(buf, data_nbits);
> +
> +		buf += chip->ecc.size;
> +		addr += chip->ecc.size;

You seem to duplicate nand_check_erased_ecc_chunk() here. Do you have a
good reason for doing that?

Regards,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] mtd: brcmnand: Detect sticky ucorr ecc error on dma reads
  2016-04-29 20:21 ` [PATCH 2/2] mtd: brcmnand: Detect sticky ucorr ecc error on dma reads Kamal Dasu
@ 2016-05-30  8:50   ` Boris Brezillon
  2016-06-01 16:50     ` Kamal Dasu
  0 siblings, 1 reply; 12+ messages in thread
From: Boris Brezillon @ 2016-05-30  8:50 UTC (permalink / raw)
  To: Kamal Dasu
  Cc: linux-mtd, computersforpeace, f.fainelli, bcm-kernel-feedback-list

On Fri, 29 Apr 2016 16:21:25 -0400
Kamal Dasu <kdasu.kdev@gmail.com> wrote:

> This change provides a fix for controller bug where nand
> controller could have a possible sticky error after a PIO
> followed by a DMA read. The fix retries a read if we see
> a uncorr_ecc after read to detect such sticky errors.
> 
> Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
> ---
>  drivers/mtd/nand/brcmnand/brcmnand.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c
> index 29a9abd..13c7784 100644
> --- a/drivers/mtd/nand/brcmnand/brcmnand.c
> +++ b/drivers/mtd/nand/brcmnand/brcmnand.c
> @@ -1555,9 +1555,11 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
>  	struct brcmnand_controller *ctrl = host->ctrl;
>  	u64 err_addr = 0;
>  	int err;
> +	bool retry = true;
>  
>  	dev_dbg(ctrl->dev, "read %llx -> %p\n", (unsigned long long)addr, buf);
>  
> +try_dmaread:
>  	brcmnand_write_reg(ctrl, BRCMNAND_UNCORR_COUNT, 0);
>  
>  	if (has_flash_dma(ctrl) && !oob && flash_dma_buf_ok(buf)) {
> @@ -1579,7 +1581,18 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
>  
>  	if (mtd_is_eccerr(err)) {
>  		int ret;
> -
> +		/*
> +		 * On controller version >=7.0 if we are doing a DMA read
> +		 * after a prior PIO read that reported uncorrectable error,
> +		 * the DMA engine captures this error following DMA read
> +		 * cleared only on subsequent DMA read, so just retry once
> +		 * to clear a possible false error reported for current DMA
> +		 * read
> +		 */

Hm, shouldn't this BRCMNAND_UNCORR_COUNT bit be cleared just after
doing the PIO/DMA read instead of doing it before executing a new read?
This would solve your problem without the need for this extra retry, or
am I missing something?

> +		if ((ctrl->nand_version >= 0x0700) && retry) {
> +			retry = false;
> +			goto try_dmaread;
> +		}
>  		ret = brcmstb_nand_verify_erased_page(mtd, chip, buf, addr);
>  		if (ret < 0) {
>  			dev_dbg(ctrl->dev, "uncorrectable error at 0x%llx\n",



-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] mtd: brcmnand: Add check for erased page bitflips
  2016-05-30  8:42 ` [PATCH 1/2] mtd: brcmnand: Add check for erased page bitflips Boris Brezillon
@ 2016-06-01 16:46   ` Kamal Dasu
  2016-06-01 17:14     ` Brian Norris
  0 siblings, 1 reply; 12+ messages in thread
From: Kamal Dasu @ 2016-06-01 16:46 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Kamal Dasu, linux-mtd, Brian Norris, Florian Fainelli,
	bcm-kernel-feedback-list

Boris,



On Mon, May 30, 2016 at 4:42 AM, Boris Brezillon
<boris.brezillon@free-electrons.com> wrote:
> Hi Kamal,
>
> On Fri, 29 Apr 2016 16:21:24 -0400
> Kamal Dasu <kdasu.kdev@gmail.com> wrote:
>
>> Check for erased page bitflips in a page. And if well within
>> threshold return data as all 0xff.
>>
>> Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
>> ---
>>  drivers/mtd/nand/brcmnand/brcmnand.c | 83 +++++++++++++++++++++++++++++++++---
>>  1 file changed, 78 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c
>> index e052839..29a9abd 100644
>> --- a/drivers/mtd/nand/brcmnand/brcmnand.c
>> +++ b/drivers/mtd/nand/brcmnand/brcmnand.c
>> @@ -1490,6 +1490,64 @@ static int brcmnand_read_by_pio(struct mtd_info *mtd, struct nand_chip *chip,
>>       return ret;
>>  }
>>
>> +/*
>> + * Check a page to see if it is erased (w/ bitflips) after an uncorrectable ECC
>> + * error
>> + *
>> + * Because the HW ECC signals an ECC error if an erase paged has even a single
>> + * bitflip, we must check each ECC error to see if it is actually an erased
>> + * page with bitflips, not a truly corrupted page.
>> + *
>> + * On a real error, return a negative error code (-EBADMSG for ECC error), and
>> + * buf will contain raw data.
>> + * Otherwise, fill buf with 0xff and return the maximum number of
>> + * bitflips-per-ECC-sector to the caller.
>> + *
>> + */
>> +static int brcmstb_nand_verify_erased_page(struct mtd_info *mtd,
>> +               struct nand_chip *chip, void *buf, u64 addr)
>> +{
>> +     int i, sas, oob_nbits, data_nbits;
>> +     void *oob = chip->oob_poi;
>> +     unsigned int max_bitflips = 0;
>> +     int page = addr >> chip->page_shift;
>> +     int ret;
>> +
>> +     if (!buf) {
>> +             buf = chip->buffers->databuf;
>> +             /* Invalidate page cache */
>> +             chip->pagebuf = -1;
>> +     }
>> +
>> +     sas = mtd->oobsize / chip->ecc.steps;
>> +     oob_nbits = sas << 3;
>> +     data_nbits = chip->ecc.size << 3;
>> +
>> +     /* read without ecc for verification */
>> +     chip->cmdfunc(mtd, NAND_CMD_READ0, 0x00, page);
>> +     ret = chip->ecc.read_page_raw(mtd, chip, buf, true, page);
>
> Do you really need to read the whole page in raw mode? Usually, only
> reading the OOB sections is enough.
>

Just so that the HW ecc algo does not kick in we read the page in raw
mode. OOB registers are filled as part of this read. Also just the
data might have bit flips and OOB might be all FFs, we still need to
read  page data. Generally we read again to make sure that the hw ecc
algo did not change the data after calculations in the original  read
where it reported uncorrectable error.  I will double check this.

>> +     if (ret)
>> +             return ret;
>> +
>> +     for (i = 0; i < chip->ecc.steps; i++, oob += sas) {
>> +             unsigned int bitflips = 0;
>> +
>> +             bitflips += oob_nbits - bitmap_weight(oob, oob_nbits);
>> +             bitflips += data_nbits - bitmap_weight(buf, data_nbits);
>> +
>> +             buf += chip->ecc.size;
>> +             addr += chip->ecc.size;
>
> You seem to duplicate nand_check_erased_ecc_chunk() here. Do you have a
> good reason for doing that?
>

Hmmm I see what you are saying. Let me try setting the
NAND_ECC_GENERIC_ERASED_CHECK option and see if we can get away
without having to read raw.  I will have to test and make sure on
uncorrectable error the hw leaves the return page data buffers and oob
buffers in raw state.

If that works as expected I will get rid of this duplication and send
a revised change which shall make use of the
NAND_ECC_GENERIC_ERASED_CHECK option.

Thanks
Kamal

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] mtd: brcmnand: Detect sticky ucorr ecc error on dma reads
  2016-05-30  8:50   ` Boris Brezillon
@ 2016-06-01 16:50     ` Kamal Dasu
  2016-06-01 17:20       ` Boris Brezillon
  2016-06-01 20:37       ` Boris Brezillon
  0 siblings, 2 replies; 12+ messages in thread
From: Kamal Dasu @ 2016-06-01 16:50 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Kamal Dasu, linux-mtd, Brian Norris, Florian Fainelli,
	bcm-kernel-feedback-list

Boris,

On Mon, May 30, 2016 at 4:50 AM, Boris Brezillon
<boris.brezillon@free-electrons.com> wrote:
> On Fri, 29 Apr 2016 16:21:25 -0400
> Kamal Dasu <kdasu.kdev@gmail.com> wrote:
>
>> This change provides a fix for controller bug where nand
>> controller could have a possible sticky error after a PIO
>> followed by a DMA read. The fix retries a read if we see
>> a uncorr_ecc after read to detect such sticky errors.
>>
>> Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
>> ---
>>  drivers/mtd/nand/brcmnand/brcmnand.c | 15 ++++++++++++++-
>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c
>> index 29a9abd..13c7784 100644
>> --- a/drivers/mtd/nand/brcmnand/brcmnand.c
>> +++ b/drivers/mtd/nand/brcmnand/brcmnand.c
>> @@ -1555,9 +1555,11 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
>>       struct brcmnand_controller *ctrl = host->ctrl;
>>       u64 err_addr = 0;
>>       int err;
>> +     bool retry = true;
>>
>>       dev_dbg(ctrl->dev, "read %llx -> %p\n", (unsigned long long)addr, buf);
>>
>> +try_dmaread:
>>       brcmnand_write_reg(ctrl, BRCMNAND_UNCORR_COUNT, 0);
>>
>>       if (has_flash_dma(ctrl) && !oob && flash_dma_buf_ok(buf)) {
>> @@ -1579,7 +1581,18 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
>>
>>       if (mtd_is_eccerr(err)) {
>>               int ret;
>> -
>> +             /*
>> +              * On controller version >=7.0 if we are doing a DMA read
>> +              * after a prior PIO read that reported uncorrectable error,
>> +              * the DMA engine captures this error following DMA read
>> +              * cleared only on subsequent DMA read, so just retry once
>> +              * to clear a possible false error reported for current DMA
>> +              * read
>> +              */
>
> Hm, shouldn't this BRCMNAND_UNCORR_COUNT bit be cleared just after
> doing the PIO/DMA read instead of doing it before executing a new read?
> This would solve your problem without the need for this extra retry, or
> am I missing something?
>

Clearing the count registers or the intr registers does not clear the
condition. Only a clean read (a page that does not have errors) clears
the condition. So if this was a false error  ( page is really clean)
and we read again, it will clear the condition.



Thanks
Kamal

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] mtd: brcmnand: Add check for erased page bitflips
  2016-06-01 16:46   ` Kamal Dasu
@ 2016-06-01 17:14     ` Brian Norris
  2016-06-01 17:22       ` Boris Brezillon
  0 siblings, 1 reply; 12+ messages in thread
From: Brian Norris @ 2016-06-01 17:14 UTC (permalink / raw)
  To: Kamal Dasu
  Cc: Boris Brezillon, Kamal Dasu, linux-mtd, Florian Fainelli,
	bcm-kernel-feedback-list

On Wed, Jun 01, 2016 at 12:46:18PM -0400, Kamal Dasu wrote:
> On Mon, May 30, 2016 at 4:42 AM, Boris Brezillon
> <boris.brezillon@free-electrons.com> wrote:
> > On Fri, 29 Apr 2016 16:21:24 -0400
> > Kamal Dasu <kdasu.kdev@gmail.com> wrote:

> >> +     if (ret)
> >> +             return ret;
> >> +
> >> +     for (i = 0; i < chip->ecc.steps; i++, oob += sas) {
> >> +             unsigned int bitflips = 0;
> >> +
> >> +             bitflips += oob_nbits - bitmap_weight(oob, oob_nbits);
> >> +             bitflips += data_nbits - bitmap_weight(buf, data_nbits);
> >> +
> >> +             buf += chip->ecc.size;
> >> +             addr += chip->ecc.size;
> >
> > You seem to duplicate nand_check_erased_ecc_chunk() here. Do you have a
> > good reason for doing that?
> >
> 
> Hmmm I see what you are saying. Let me try setting the
> NAND_ECC_GENERIC_ERASED_CHECK option and see if we can get away
> without having to read raw.  I will have to test and make sure on
> uncorrectable error the hw leaves the return page data buffers and oob
> buffers in raw state.

I'm quite sure you can't make use of NAND_ECC_GENERIC_ERASED_CHECK
unless you do a substantial rewrite; brcmnand doesn't use any of the
nand_base ecc.read_{page,subpage} callbacks.

> If that works as expected I will get rid of this duplication and send
> a revised change which shall make use of the
> NAND_ECC_GENERIC_ERASED_CHECK option.

I suspect he was just suggesting calling the
nand_check_erased_ecc_chunk() helper instead of doing your own
bitmap_weight() calls.

Brian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] mtd: brcmnand: Detect sticky ucorr ecc error on dma reads
  2016-06-01 16:50     ` Kamal Dasu
@ 2016-06-01 17:20       ` Boris Brezillon
  2016-06-01 20:37       ` Boris Brezillon
  1 sibling, 0 replies; 12+ messages in thread
From: Boris Brezillon @ 2016-06-01 17:20 UTC (permalink / raw)
  To: Kamal Dasu
  Cc: Kamal Dasu, linux-mtd, Brian Norris, Florian Fainelli,
	bcm-kernel-feedback-list

On Wed, 1 Jun 2016 12:50:56 -0400
Kamal Dasu <kamal.dasu@broadcom.com> wrote:

> Boris,
> 
> On Mon, May 30, 2016 at 4:50 AM, Boris Brezillon
> <boris.brezillon@free-electrons.com> wrote:
> > On Fri, 29 Apr 2016 16:21:25 -0400
> > Kamal Dasu <kdasu.kdev@gmail.com> wrote:
> >  
> >> This change provides a fix for controller bug where nand
> >> controller could have a possible sticky error after a PIO
> >> followed by a DMA read. The fix retries a read if we see
> >> a uncorr_ecc after read to detect such sticky errors.
> >>
> >> Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
> >> ---
> >>  drivers/mtd/nand/brcmnand/brcmnand.c | 15 ++++++++++++++-
> >>  1 file changed, 14 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c
> >> index 29a9abd..13c7784 100644
> >> --- a/drivers/mtd/nand/brcmnand/brcmnand.c
> >> +++ b/drivers/mtd/nand/brcmnand/brcmnand.c
> >> @@ -1555,9 +1555,11 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
> >>       struct brcmnand_controller *ctrl = host->ctrl;
> >>       u64 err_addr = 0;
> >>       int err;
> >> +     bool retry = true;
> >>
> >>       dev_dbg(ctrl->dev, "read %llx -> %p\n", (unsigned long long)addr, buf);
> >>
> >> +try_dmaread:
> >>       brcmnand_write_reg(ctrl, BRCMNAND_UNCORR_COUNT, 0);
> >>
> >>       if (has_flash_dma(ctrl) && !oob && flash_dma_buf_ok(buf)) {
> >> @@ -1579,7 +1581,18 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
> >>
> >>       if (mtd_is_eccerr(err)) {
> >>               int ret;
> >> -
> >> +             /*
> >> +              * On controller version >=7.0 if we are doing a DMA read
> >> +              * after a prior PIO read that reported uncorrectable error,
> >> +              * the DMA engine captures this error following DMA read
> >> +              * cleared only on subsequent DMA read, so just retry once
> >> +              * to clear a possible false error reported for current DMA
> >> +              * read
> >> +              */  
> >
> > Hm, shouldn't this BRCMNAND_UNCORR_COUNT bit be cleared just after
> > doing the PIO/DMA read instead of doing it before executing a new read?
> > This would solve your problem without the need for this extra retry, or
> > am I missing something?
> >  
> 
> Clearing the count registers or the intr registers does not clear the
> condition. Only a clean read (a page that does not have errors) clears
> the condition. So if this was a false error  ( page is really clean)
> and we read again, it will clear the condition.

This sounds like an expensive workaround, but you know the IP better
than I do, so I'll trust you ;).


-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] mtd: brcmnand: Add check for erased page bitflips
  2016-06-01 17:14     ` Brian Norris
@ 2016-06-01 17:22       ` Boris Brezillon
  2016-06-01 17:27         ` Kamal Dasu
  0 siblings, 1 reply; 12+ messages in thread
From: Boris Brezillon @ 2016-06-01 17:22 UTC (permalink / raw)
  To: Brian Norris
  Cc: Kamal Dasu, Kamal Dasu, linux-mtd, Florian Fainelli,
	bcm-kernel-feedback-list

On Wed, 1 Jun 2016 10:14:40 -0700
Brian Norris <computersforpeace@gmail.com> wrote:

> On Wed, Jun 01, 2016 at 12:46:18PM -0400, Kamal Dasu wrote:
> > On Mon, May 30, 2016 at 4:42 AM, Boris Brezillon
> > <boris.brezillon@free-electrons.com> wrote:  
> > > On Fri, 29 Apr 2016 16:21:24 -0400
> > > Kamal Dasu <kdasu.kdev@gmail.com> wrote:  
> 
> > >> +     if (ret)
> > >> +             return ret;
> > >> +
> > >> +     for (i = 0; i < chip->ecc.steps; i++, oob += sas) {
> > >> +             unsigned int bitflips = 0;
> > >> +
> > >> +             bitflips += oob_nbits - bitmap_weight(oob, oob_nbits);
> > >> +             bitflips += data_nbits - bitmap_weight(buf, data_nbits);
> > >> +
> > >> +             buf += chip->ecc.size;
> > >> +             addr += chip->ecc.size;  
> > >
> > > You seem to duplicate nand_check_erased_ecc_chunk() here. Do you have a
> > > good reason for doing that?
> > >  
> > 
> > Hmmm I see what you are saying. Let me try setting the
> > NAND_ECC_GENERIC_ERASED_CHECK option and see if we can get away
> > without having to read raw.  I will have to test and make sure on
> > uncorrectable error the hw leaves the return page data buffers and oob
> > buffers in raw state.  
> 
> I'm quite sure you can't make use of NAND_ECC_GENERIC_ERASED_CHECK
> unless you do a substantial rewrite; brcmnand doesn't use any of the
> nand_base ecc.read_{page,subpage} callbacks.
> 
> > If that works as expected I will get rid of this duplication and send
> > a revised change which shall make use of the
> > NAND_ECC_GENERIC_ERASED_CHECK option.  
> 
> I suspect he was just suggesting calling the
> nand_check_erased_ecc_chunk() helper instead of doing your own
> bitmap_weight() calls.

Exactly.


-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] mtd: brcmnand: Add check for erased page bitflips
  2016-06-01 17:22       ` Boris Brezillon
@ 2016-06-01 17:27         ` Kamal Dasu
  0 siblings, 0 replies; 12+ messages in thread
From: Kamal Dasu @ 2016-06-01 17:27 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Brian Norris, Kamal Dasu, linux-mtd, Florian Fainelli,
	bcm-kernel-feedback-list

Boris,

>> I'm quite sure you can't make use of NAND_ECC_GENERIC_ERASED_CHECK
>> unless you do a substantial rewrite; brcmnand doesn't use any of the
>> nand_base ecc.read_{page,subpage} callbacks.
>>
>> > If that works as expected I will get rid of this duplication and send
>> > a revised change which shall make use of the
>> > NAND_ECC_GENERIC_ERASED_CHECK option.
>>
>> I suspect he was just suggesting calling the
>> nand_check_erased_ecc_chunk() helper instead of doing your own
>> bitmap_weight() calls.
>
> Exactly.
>
>

Ok got it. Yes I did realize that using the option is not straight
forward as far as brcmnand driver is concerned.

Thanks
Kamal

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] mtd: brcmnand: Detect sticky ucorr ecc error on dma reads
  2016-06-01 16:50     ` Kamal Dasu
  2016-06-01 17:20       ` Boris Brezillon
@ 2016-06-01 20:37       ` Boris Brezillon
  2016-06-02 18:55         ` Kamal Dasu
  1 sibling, 1 reply; 12+ messages in thread
From: Boris Brezillon @ 2016-06-01 20:37 UTC (permalink / raw)
  To: Kamal Dasu
  Cc: Kamal Dasu, linux-mtd, Brian Norris, Florian Fainelli,
	bcm-kernel-feedback-list

On Wed, 1 Jun 2016 12:50:56 -0400
Kamal Dasu <kamal.dasu@broadcom.com> wrote:

> Boris,
> 
> On Mon, May 30, 2016 at 4:50 AM, Boris Brezillon
> <boris.brezillon@free-electrons.com> wrote:
> > On Fri, 29 Apr 2016 16:21:25 -0400
> > Kamal Dasu <kdasu.kdev@gmail.com> wrote:
> >  
> >> This change provides a fix for controller bug where nand
> >> controller could have a possible sticky error after a PIO
> >> followed by a DMA read. The fix retries a read if we see
> >> a uncorr_ecc after read to detect such sticky errors.
> >>
> >> Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
> >> ---
> >>  drivers/mtd/nand/brcmnand/brcmnand.c | 15 ++++++++++++++-
> >>  1 file changed, 14 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c
> >> index 29a9abd..13c7784 100644
> >> --- a/drivers/mtd/nand/brcmnand/brcmnand.c
> >> +++ b/drivers/mtd/nand/brcmnand/brcmnand.c
> >> @@ -1555,9 +1555,11 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
> >>       struct brcmnand_controller *ctrl = host->ctrl;
> >>       u64 err_addr = 0;
> >>       int err;
> >> +     bool retry = true;
> >>
> >>       dev_dbg(ctrl->dev, "read %llx -> %p\n", (unsigned long long)addr, buf);
> >>
> >> +try_dmaread:
> >>       brcmnand_write_reg(ctrl, BRCMNAND_UNCORR_COUNT, 0);
> >>
> >>       if (has_flash_dma(ctrl) && !oob && flash_dma_buf_ok(buf)) {
> >> @@ -1579,7 +1581,18 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
> >>
> >>       if (mtd_is_eccerr(err)) {
> >>               int ret;
> >> -
> >> +             /*
> >> +              * On controller version >=7.0 if we are doing a DMA read
> >> +              * after a prior PIO read that reported uncorrectable error,
> >> +              * the DMA engine captures this error following DMA read
> >> +              * cleared only on subsequent DMA read, so just retry once
> >> +              * to clear a possible false error reported for current DMA
> >> +              * read
> >> +              */  
> >
> > Hm, shouldn't this BRCMNAND_UNCORR_COUNT bit be cleared just after
> > doing the PIO/DMA read instead of doing it before executing a new read?
> > This would solve your problem without the need for this extra retry, or
> > am I missing something?
> >  
> 
> Clearing the count registers or the intr registers does not clear the
> condition. Only a clean read (a page that does not have errors) clears
> the condition. So if this was a false error  ( page is really clean)
> and we read again, it will clear the condition.
> 

Can you try that patch [1], just to make sure the extra read cycle is
really necessary?

[1]http://code.bulix.org/uzb7m8-99997

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] mtd: brcmnand: Detect sticky ucorr ecc error on dma reads
  2016-06-01 20:37       ` Boris Brezillon
@ 2016-06-02 18:55         ` Kamal Dasu
  0 siblings, 0 replies; 12+ messages in thread
From: Kamal Dasu @ 2016-06-02 18:55 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Kamal Dasu, linux-mtd, Brian Norris, Florian Fainelli,
	bcm-kernel-feedback-list

Yes have tried this, but its the controller internals that causes this
condition.

Kamal

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-06-02 18:56 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-29 20:21 [PATCH 1/2] mtd: brcmnand: Add check for erased page bitflips Kamal Dasu
2016-04-29 20:21 ` [PATCH 2/2] mtd: brcmnand: Detect sticky ucorr ecc error on dma reads Kamal Dasu
2016-05-30  8:50   ` Boris Brezillon
2016-06-01 16:50     ` Kamal Dasu
2016-06-01 17:20       ` Boris Brezillon
2016-06-01 20:37       ` Boris Brezillon
2016-06-02 18:55         ` Kamal Dasu
2016-05-30  8:42 ` [PATCH 1/2] mtd: brcmnand: Add check for erased page bitflips Boris Brezillon
2016-06-01 16:46   ` Kamal Dasu
2016-06-01 17:14     ` Brian Norris
2016-06-01 17:22       ` Boris Brezillon
2016-06-01 17:27         ` Kamal Dasu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.