* [PATCH] mtd: rawnand: micron: Fix support for on-die ECC
@ 2018-05-03  7:49 Boris Brezillon
  2018-05-04  9:58 ` Miquel Raynal
  0 siblings, 1 reply; 5+ messages in thread
From: Boris Brezillon @ 2018-05-03  7:49 UTC
  To: Boris Brezillon, Richard Weinberger, Miquel Raynal, linux-mtd
  Cc: David Woodhouse, Brian Norris, Marek Vasut, Cyrille Pitchen,
	stable, Thomas Petazzoni, Bean Huo, Peter Pan

It looks like the NAND_STATUS_FAIL bit is sticky after an ECC failure,
which leads all READ operations following the failing one to report
an ECC failure. Reset the chip to clear the NAND_STATUS_FAIL bit.

Note that this behavior is not documented in the datasheet, but resetting
the chip is the only solution we found to fix the problem.

Fixes: 9748e1d87573 ("mtd: nand: add support for Micron on-die ECC")
Cc: <stable@vger.kernel.org>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Bean Huo <beanhuo@micron.com>
Cc: Peter Pan <peterpandong@micron.com>
---
Peter, Bean,

Can you confirm this behavior, or ask someone in Micron who can confirm
it? Also, if a RESET is actually needed, it would be good to update the
datasheet accordingly. And if that's not the case, can you explain why
the NAND_STATUS_FAIL bit is stuck and how to clear it (I tried a 0x00
command, A.K.A. READ STATUS EXIT, but it does not clear this bit, ERASE
and PROGRAM seem to clear the bit, but that's clearly not the kind of
operation I can do when the user asks for a READ)?

Thanks,

Boris
---
 drivers/mtd/nand/raw/nand_micron.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/mtd/nand/raw/nand_micron.c b/drivers/mtd/nand/raw/nand_micron.c
index 0af45b134c0c..a915f568f6a3 100644
--- a/drivers/mtd/nand/raw/nand_micron.c
+++ b/drivers/mtd/nand/raw/nand_micron.c
@@ -153,6 +153,23 @@ micron_nand_read_page_on_die_ecc(struct mtd_info *mtd, struct nand_chip *chip,
 		ret = nand_read_data_op(chip, chip->oob_poi, mtd->oobsize,
 					false);
 
+	/*
+	 * Looks like the NAND_STATUS_FAIL bit is sticky after an ECC failure,
+	 * which leads all READ operations following the failing one to report
+	 * an ECC failure.
+	 * Reset the chip to clear it.
+	 *
+	 * Note that this behavior is not documented in the datasheet, but
+	 * resetting the chip is the only solution we found to clear this bit.
+	 */
+	if (status & NAND_STATUS_FAIL) {
+		int cs = page >> (chip->chip_shift - chip->page_shift);
+
+		chip->select_chip(mtd, -1);
+		nand_reset(chip, cs);
+		chip->select_chip(mtd, cs);
+	}
+
 out:
 	micron_nand_on_die_ecc_setup(chip, false);
 
-- 
2.14.1
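
The chip-select computation in the hunk above can be illustrated in isolation. This is only a sketch outside the kernel: page_to_cs() is a hypothetical helper name, and it assumes chip_shift is log2 of the per-die size in bytes and page_shift is log2 of the page size in bytes, which is how the patch appears to use those fields.

```c
#include <stdint.h>

/*
 * Hypothetical standalone helper (not a kernel function): derive the die
 * (chip-select) index from an absolute page number, mirroring the
 * expression used in the patch.
 *
 * Assumption: chip_shift is log2 of the per-die size in bytes and
 * page_shift is log2 of the page size in bytes.
 */
static inline unsigned int page_to_cs(uint32_t page, unsigned int chip_shift,
				      unsigned int page_shift)
{
	/* chip_shift - page_shift is log2 of the number of pages per die. */
	return page >> (chip_shift - page_shift);
}
```

For example, with 1 GiB dies (chip_shift = 30) and 4 KiB pages (page_shift = 12), each die holds 2^18 = 262144 pages, so page 262143 still belongs to die 0 and page 262144 is the first page of die 1.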


* Re: [PATCH] mtd: rawnand: micron: Fix support for on-die ECC
  2018-05-03  7:49 [PATCH] mtd: rawnand: micron: Fix support for on-die ECC Boris Brezillon
@ 2018-05-04  9:58 ` Miquel Raynal
  2018-05-08 21:12   ` Boris Brezillon
  0 siblings, 1 reply; 5+ messages in thread
From: Miquel Raynal @ 2018-05-04  9:58 UTC
  To: Boris Brezillon
  Cc: Richard Weinberger, linux-mtd, David Woodhouse, Brian Norris,
	Marek Vasut, Cyrille Pitchen, stable, Thomas Petazzoni, Bean Huo,
	Peter Pan

Hi Boris,

On Thu,  3 May 2018 09:49:08 +0200, Boris Brezillon
<boris.brezillon@bootlin.com> wrote:

> It looks like the NAND_STATUS_FAIL bit is sticky after an ECC failure,
> which leads all READ operations following the failing one to report
> an ECC failure. Reset the chip to clear the NAND_STATUS_FAIL bit.
> 
> Note that this behavior is not documented in the datasheet, but resetting
> the chip is the only solution we found to fix the problem.
> 
> Fixes: 9748e1d87573 ("mtd: nand: add support for Micron on-die ECC")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
> Cc: Bean Huo <beanhuo@micron.com>
> Cc: Peter Pan <peterpandong@micron.com>
> ---

Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>


* Re: [PATCH] mtd: rawnand: micron: Fix support for on-die ECC
  2018-05-04  9:58 ` Miquel Raynal
@ 2018-05-08 21:12   ` Boris Brezillon
  2018-05-10  6:46     ` Boris Brezillon
  0 siblings, 1 reply; 5+ messages in thread
From: Boris Brezillon @ 2018-05-08 21:12 UTC
  To: Miquel Raynal
  Cc: Richard Weinberger, stable, Peter Pan, Marek Vasut, linux-mtd,
	Thomas Petazzoni, Cyrille Pitchen, Brian Norris, David Woodhouse,
	Bean Huo

On Fri, 4 May 2018 11:58:35 +0200
Miquel Raynal <miquel.raynal@bootlin.com> wrote:

> Hi Boris,
> 
> On Thu,  3 May 2018 09:49:08 +0200, Boris Brezillon
> <boris.brezillon@bootlin.com> wrote:
> 
> > It looks like the NAND_STATUS_FAIL bit is sticky after an ECC failure,
> > which leads all READ operations following the failing one to report
> > an ECC failure. Reset the chip to clear the NAND_STATUS_FAIL bit.
> > 
> > Note that this behavior is not documented in the datasheet, but resetting
> > the chip is the only solution we found to fix the problem.
> > 
> > Fixes: 9748e1d87573 ("mtd: nand: add support for Micron on-die ECC")
> > Cc: <stable@vger.kernel.org>
> > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
> > Cc: Bean Huo <beanhuo@micron.com>
> > Cc: Peter Pan <peterpandong@micron.com>
> > ---  
> 
> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>

Queued to mtd/master.


* Re: [PATCH] mtd: rawnand: micron: Fix support for on-die ECC
  2018-05-08 21:12   ` Boris Brezillon
@ 2018-05-10  6:46     ` Boris Brezillon
  0 siblings, 0 replies; 5+ messages in thread
From: Boris Brezillon @ 2018-05-10  6:46 UTC
  To: Miquel Raynal
  Cc: Richard Weinberger, stable, Marek Vasut, linux-mtd,
	Thomas Petazzoni, Cyrille Pitchen, Bean Huo, Brian Norris,
	David Woodhouse, Peter Pan

On Tue, 8 May 2018 23:12:59 +0200
Boris Brezillon <boris.brezillon@bootlin.com> wrote:

> On Fri, 4 May 2018 11:58:35 +0200
> Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> 
> > Hi Boris,
> > 
> > On Thu,  3 May 2018 09:49:08 +0200, Boris Brezillon
> > <boris.brezillon@bootlin.com> wrote:
> >   
> > > It looks like the NAND_STATUS_FAIL bit is sticky after an ECC failure,
> > > which leads all READ operations following the failing one to report
> > > an ECC failure. Reset the chip to clear the NAND_STATUS_FAIL bit.
> > > 
> > > Note that this behavior is not documented in the datasheet, but resetting
> > > the chip is the only solution we found to fix the problem.
> > > 
> > > Fixes: 9748e1d87573 ("mtd: nand: add support for Micron on-die ECC")
> > > Cc: <stable@vger.kernel.org>
> > > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > > Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
> > > Cc: Bean Huo <beanhuo@micron.com>
> > > Cc: Peter Pan <peterpandong@micron.com>
> > > ---    
> > 
> > Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>  
> 
> Queued to mtd/master.

I'm dropping this patch because I'm no longer sure this is the correct
way to fix the bug. It seems that nand_set_features_op() checks the
FAIL bit, while the ONFI spec clearly says that the FAIL bit is only
valid after a PROGRAM, ERASE or READ-with-on-die-ECC-enabled op. That
might explain why ->set_features() fails with -EIO after an ECC failure
(apparently Micron only clears the FAIL bit when launching a PROGRAM,
ERASE or READ-with-on-die-ECC-enabled op, not on a SET_FEATURES op).
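
The rule described above (only trust the FAIL bit after operations that are supposed to update it) can be sketched as follows. This is not the kernel implementation: the enum, the helper names and the STATUS_FAIL macro are hypothetical, and the set of operations for which the bit is treated as valid is taken directly from the paragraph above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: same bit position as the kernel's NAND_STATUS_FAIL (bit 0). */
#define STATUS_FAIL 0x01

/* Hypothetical classification of the operation that was just issued. */
enum nand_op_kind {
	OP_PROGRAM,
	OP_ERASE,
	OP_READ_ON_DIE_ECC,
	OP_SET_FEATURES,
};

/* The FAIL bit is only meaningful after ops that are expected to update it. */
static bool fail_bit_is_valid(enum nand_op_kind op)
{
	switch (op) {
	case OP_PROGRAM:
	case OP_ERASE:
	case OP_READ_ON_DIE_ECC:
		return true;
	default:
		return false;
	}
}

/* Report a failure only when the FAIL bit is both valid and set. */
static bool op_failed(enum nand_op_kind op, uint8_t status)
{
	return fail_bit_is_valid(op) && (status & STATUS_FAIL) != 0;
}
```

With such a check, a stale FAIL bit left behind by an earlier ECC failure would be ignored after SET_FEATURES, while a FAIL bit seen after a READ with on-die ECC enabled would still be reported.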


* RE:  [PATCH] mtd: rawnand: micron: Fix support for on-die ECC
@ 2018-05-21 22:17 Bean Huo (beanhuo)
  0 siblings, 0 replies; 5+ messages in thread
From: Bean Huo (beanhuo) @ 2018-05-21 22:17 UTC
  To: Boris Brezillon, Richard Weinberger, Miquel Raynal, linux-mtd
  Cc: David Woodhouse, Brian Norris, Marek Vasut, Cyrille Pitchen,
	stable, Thomas Petazzoni,
	Peter Pan 潘栋 (peterpandong)

Hi Boris,

Sorry for the late reply; I was on a long vacation.

Here is how the status register (SR) should behave:

The status register is updated after each array operation and can be
cleared with a reset command.

After a read operation, status register bit0 reports the ECC status of
that read until a different array operation is performed
(erase/program/read) or a reset occurs.

Status register bit1 reports the status of the operation before the
last one. So this bit can report a fail (value 1) even if the very last
operation was successful (bit0=0 bit1=1).

//beanhuo
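
The bit0/bit1 semantics described above can be sketched as a small decoder. This is an illustration only: the macro, struct and function names are hypothetical, and the interpretation of each bit follows the description in this message rather than a datasheet.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the two status bits discussed in this message. */
#define SR_FAIL_LAST (1u << 0) /* bit0: status of the most recent operation */
#define SR_FAIL_PREV (1u << 1) /* bit1: status of the operation before it */

struct sr_result {
	bool last_failed;     /* bit0 set */
	bool previous_failed; /* bit1 set */
};

/* Split a raw status register value into the two failure indications. */
static struct sr_result decode_sr(uint8_t sr)
{
	struct sr_result res = {
		.last_failed = (sr & SR_FAIL_LAST) != 0,
		.previous_failed = (sr & SR_FAIL_PREV) != 0,
	};

	return res;
}
```

In the case mentioned above (bit0=0 bit1=1, i.e. a raw value of 0x02 in the low bits), decode_sr() reports that the last operation succeeded while the one before it failed, so a caller that only cares about the latest read should look at bit0 alone.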

>
>---
>Peter, Bean,
>
>Can you confirm this behavior, or ask someone in Micron who can confirm it?
>Also, if a RESET is actually needed, it would be good to update the datasheet
>accordingly. And if that's not the case, can you explain why the
>NAND_STATUS_FAIL bit is stuck and how to clear it (I tried a 0x00 command,
>A.K.A. READ STATUS EXIT, but it does not clear this bit, ERASE and PROGRAM
>seem to clear the bit, but that's clearly not the kind of operation I can do
>when the user asks for a READ)?
>
>Thanks,
>
>Boris

