All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/5] mtd: nand: properly handle bitflips in erased pages
@ 2015-12-30 19:32 Boris Brezillon
  2015-12-30 19:32 ` [PATCH v4 1/5] mtd: nand: return consistent error codes in ecc.correct() implementations Boris Brezillon
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Boris Brezillon @ 2015-12-30 19:32 UTC (permalink / raw)
  To: David Woodhouse, Brian Norris, linux-mtd
  Cc: Franklin S Cooper Jr.,
	Maxim Levitsky, Nicolas Ferre, Jean-Christophe Plagniol-Villard,
	Alexandre Belloni, linux-kernel, Boris Brezillon

Hi,

This patch series aims at providing a common logic to check for bitflips
in erased pages.

Currently each driver is implementing its own logic to check for bitflips
in erased pages. Not only this create code duplication, but most of these
implementations are incorrect.
Here are a few aspects that are often left aside in those implementations:
1/ they do not check OOB bytes when checking for the ff pattern, which
   means they can consider a page as empty while the MTD user actually
   wanted to write almost ff with a few bits to zero
2/ they check for the ff pattern on the whole page, while ECC actually
   works on smaller chunks (usually 512 or 1024 bytes chunks)
3/ they use random bitflip thresholds to decide whether a page/chunk is
   erased or not. IMO this threshold should be set to ECC strength (or
   at least something correlated to this parameter)

The approach taken in this series is to provide two helper functions to
check for bitflips in erased pages. Each driver that needs to check for
such cases can then call the nand_check_erased_ecc_chunk() function, and
rely on the common logic to decide whether a page is erased or not.

While Brian suggested a few times to make this detection automatic for
all drivers that set a specific flag (NAND_CHECK_ERASED_BITFLIPS?), here
is a few reasons I think this is not such a good idea:
1/ some (a lot of) drivers do not properly implement the raw access
   functions, and since we need to check for raw data and OOB bytes this
   makes the automatic detection unusable for most drivers unless they
   decide to correctly implement those methods (which would be a good
   thing BTW).
2/ as a I said earlier, this check should be made at the ECC chunk level
   and not at the page level. This spots two problems: some (a lot of)
   drivers do not properly specify the ecc layout information, and even
   if the ecc layout is correctly defined, there is no way to attach ECC
   bytes to a specific ECC chunk.
3/ the last aspect is the perf penalty incured by this test. Automatically
   doing that at the NAND core level implies reading the whole page again
   in raw mode, while with the helper function approach, drivers supporting
   access at the ECC chunk level can read only the faulty chunk in raw
   mode.

Regarding the bitflips threshold at which an erased pages is considered as
faulty, I have assigned it to ECC strength. As mentioned by Andrea, using
ECC strength might cause some trouble, because if you already have some
bitflips in an erased page, programming it might generate even more of
them.
In the other hand, shouldn't that be checked after (or before) programming
a page. I mean, UBI is already capable of detecting pages which are over
the configured bitflips_threshold and move data around when it detects
such pages.
If we check data after writing a page we wouldn't have to bother about
setting a weaker value for the "bitflips in erased page" case.
Another thing in favor of the ECC strength value for this "bitflips in
erased page" threshold value: if the ECC engine is generating 0xff ECC
bytes when the page is empty, then it will be able to fix ECC strength
bitflips without complaining, so why should we use different value when
we detect bitflips using the pattern match approach?

Best Regards,

Boris

Changes since v3:
- drop already applied patches
- make the generic "bitflips in erased pages" check as an opt-in flag
- split driver changes to ease review
- addressed Brian's comments

Changes since v2:
- improve nand_check_erased_buf() implementation
- keep nand_check_erased_buf() private to nand_base.c
- patch existing ecc.correct() implementations to return consistent error
  codes
- make the 'erased check' optional
- remove some custom implementations of the 'erased check'

Changes since v1:
- fix the nand_check_erased_buf() function
- mark the bitflips > bitflips_threshold condition as unlikely
- add missing memsets in nand_check_erased_ecc_chunk()


Boris Brezillon (5):
  mtd: nand: return consistent error codes in ecc.correct()
    implementations
  mtd: nand: use nand_check_erased_ecc_chunk in default ECC read
    functions
  mtd: nand: davinci: remove custom 'erased check' implementation
  mtd: nand: diskonchip: remove custom 'erased check' implementation
  mtd: nand: jz4740: remove custom 'erased check' implementation

 drivers/mtd/nand/atmel_nand.c   |  2 +-
 drivers/mtd/nand/bf5xx_nand.c   | 20 +++++++++++-----
 drivers/mtd/nand/davinci_nand.c | 15 ++++--------
 drivers/mtd/nand/diskonchip.c   | 37 ++--------------------------
 drivers/mtd/nand/jz4740_nand.c  | 22 ++---------------
 drivers/mtd/nand/mxc_nand.c     |  4 ++--
 drivers/mtd/nand/nand_base.c    | 53 +++++++++++++++++++++++++++++++++++------
 drivers/mtd/nand/nand_bch.c     |  2 +-
 drivers/mtd/nand/nand_ecc.c     |  2 +-
 drivers/mtd/nand/omap2.c        |  6 ++---
 drivers/mtd/nand/r852.c         |  4 ++--
 include/linux/mtd/nand.h        | 18 +++++++++++++-
 include/linux/mtd/nand_bch.h    |  2 +-
 13 files changed, 96 insertions(+), 91 deletions(-)

-- 
2.1.4


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v4 1/5] mtd: nand: return consistent error codes in ecc.correct() implementations
  2015-12-30 19:32 [PATCH v4 0/5] mtd: nand: properly handle bitflips in erased pages Boris Brezillon
@ 2015-12-30 19:32 ` Boris Brezillon
  2016-01-07  2:48   ` Brian Norris
  2015-12-30 19:32 ` [PATCH v4 2/5] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions Boris Brezillon
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Boris Brezillon @ 2015-12-30 19:32 UTC (permalink / raw)
  To: David Woodhouse, Brian Norris, linux-mtd
  Cc: Franklin S Cooper Jr.,
	Maxim Levitsky, Nicolas Ferre, Jean-Christophe Plagniol-Villard,
	Alexandre Belloni, linux-kernel, Boris Brezillon

The error code returned by the ecc.correct() are not consistent over the
all implementations.

Document the expected behavior in include/linux/mtd/nand.h and fix
offending implementations.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
---
 drivers/mtd/nand/atmel_nand.c   |  2 +-
 drivers/mtd/nand/bf5xx_nand.c   | 20 ++++++++++++++------
 drivers/mtd/nand/davinci_nand.c |  6 +++---
 drivers/mtd/nand/jz4740_nand.c  |  4 ++--
 drivers/mtd/nand/mxc_nand.c     |  4 ++--
 drivers/mtd/nand/nand_bch.c     |  2 +-
 drivers/mtd/nand/nand_ecc.c     |  2 +-
 drivers/mtd/nand/omap2.c        |  6 +++---
 drivers/mtd/nand/r852.c         |  4 ++--
 include/linux/mtd/nand.h        |  8 +++++++-
 include/linux/mtd/nand_bch.h    |  2 +-
 11 files changed, 37 insertions(+), 23 deletions(-)

diff --git a/drivers/mtd/nand/atmel_nand.c b/drivers/mtd/nand/atmel_nand.c
index 18c4e14..b216bf5 100644
--- a/drivers/mtd/nand/atmel_nand.c
+++ b/drivers/mtd/nand/atmel_nand.c
@@ -1445,7 +1445,7 @@ static int atmel_nand_correct(struct mtd_info *mtd, u_char *dat,
 		 * We can't correct so many errors */
 		dev_dbg(host->dev, "atmel_nand : multiple errors detected."
 				" Unable to correct.\n");
-		return -EIO;
+		return -EBADMSG;
 	}
 
 	/* if there's a single bit error : we can correct it */
diff --git a/drivers/mtd/nand/bf5xx_nand.c b/drivers/mtd/nand/bf5xx_nand.c
index 9514e13..89d9414 100644
--- a/drivers/mtd/nand/bf5xx_nand.c
+++ b/drivers/mtd/nand/bf5xx_nand.c
@@ -252,7 +252,7 @@ static int bf5xx_nand_correct_data_256(struct mtd_info *mtd, u_char *dat,
 	 */
 	if (hweight32(syndrome[0]) == 1) {
 		dev_err(info->device, "ECC data was incorrect!\n");
-		return 1;
+		return -EBADMSG;
 	}
 
 	syndrome[1] = (calced & 0x7FF) ^ (stored & 0x7FF);
@@ -285,7 +285,7 @@ static int bf5xx_nand_correct_data_256(struct mtd_info *mtd, u_char *dat,
 		data = data ^ (0x1 << failing_bit);
 		*(dat + failing_byte) = data;
 
-		return 0;
+		return 1;
 	}
 
 	/*
@@ -298,26 +298,34 @@ static int bf5xx_nand_correct_data_256(struct mtd_info *mtd, u_char *dat,
 	dev_err(info->device,
 		"Please discard data, mark bad block\n");
 
-	return 1;
+	return -EBADMSG;
 }
 
 static int bf5xx_nand_correct_data(struct mtd_info *mtd, u_char *dat,
 					u_char *read_ecc, u_char *calc_ecc)
 {
 	struct nand_chip *chip = mtd_to_nand(mtd);
-	int ret;
+	int ret, bitflips = 0;
 
 	ret = bf5xx_nand_correct_data_256(mtd, dat, read_ecc, calc_ecc);
+	if (ret < 0)
+		return ret;
+
+	bitflips = ret;
 
 	/* If ecc size is 512, correct second 256 bytes */
 	if (chip->ecc.size == 512) {
 		dat += 256;
 		read_ecc += 3;
 		calc_ecc += 3;
-		ret |= bf5xx_nand_correct_data_256(mtd, dat, read_ecc, calc_ecc);
+		ret = bf5xx_nand_correct_data_256(mtd, dat, read_ecc, calc_ecc);
+		if (ret < 0)
+			return ret;
+
+		bitflips += ret;
 	}
 
-	return ret;
+	return bitflips;
 }
 
 static void bf5xx_nand_enable_hwecc(struct mtd_info *mtd, int mode)
diff --git a/drivers/mtd/nand/davinci_nand.c b/drivers/mtd/nand/davinci_nand.c
index 3b49fe8..ddb73c3 100644
--- a/drivers/mtd/nand/davinci_nand.c
+++ b/drivers/mtd/nand/davinci_nand.c
@@ -207,7 +207,7 @@ static int nand_davinci_correct_1bit(struct mtd_info *mtd, u_char *dat,
 				dat[diff >> (12 + 3)] ^= BIT((diff >> 12) & 7);
 				return 1;
 			} else {
-				return -1;
+				return -EBADMSG;
 			}
 		} else if (!(diff & (diff - 1))) {
 			/* Single bit ECC error in the ECC itself,
@@ -215,7 +215,7 @@ static int nand_davinci_correct_1bit(struct mtd_info *mtd, u_char *dat,
 			return 1;
 		} else {
 			/* Uncorrectable error */
-			return -1;
+			return -EBADMSG;
 		}
 
 	}
@@ -391,7 +391,7 @@ compare:
 			return 0;
 		case 1:		/* five or more errors detected */
 			davinci_nand_readl(info, NAND_ERR_ERRVAL1_OFFSET);
-			return -EIO;
+			return -EBADMSG;
 		case 2:		/* error addresses computed */
 		case 3:
 			num_errors = 1 + ((fsr >> 16) & 0x03);
diff --git a/drivers/mtd/nand/jz4740_nand.c b/drivers/mtd/nand/jz4740_nand.c
index a2363d3..adccae3 100644
--- a/drivers/mtd/nand/jz4740_nand.c
+++ b/drivers/mtd/nand/jz4740_nand.c
@@ -254,7 +254,7 @@ static int jz_nand_correct_ecc_rs(struct mtd_info *mtd, uint8_t *dat,
 	} while (!(status & JZ_NAND_STATUS_DEC_FINISH) && --timeout);
 
 	if (timeout == 0)
-	    return -1;
+		return -ETIMEDOUT;
 
 	reg = readl(nand->base + JZ_REG_NAND_ECC_CTRL);
 	reg &= ~JZ_NAND_ECC_CTRL_ENABLE;
@@ -262,7 +262,7 @@ static int jz_nand_correct_ecc_rs(struct mtd_info *mtd, uint8_t *dat,
 
 	if (status & JZ_NAND_STATUS_ERROR) {
 		if (status & JZ_NAND_STATUS_UNCOR_ERROR)
-			return -1;
+			return -EBADMSG;
 
 		error_count = (status & JZ_NAND_STATUS_ERR_COUNT) >> 29;
 
diff --git a/drivers/mtd/nand/mxc_nand.c b/drivers/mtd/nand/mxc_nand.c
index 9540099..66e56bb 100644
--- a/drivers/mtd/nand/mxc_nand.c
+++ b/drivers/mtd/nand/mxc_nand.c
@@ -674,7 +674,7 @@ static int mxc_nand_correct_data_v1(struct mtd_info *mtd, u_char *dat,
 
 	if (((ecc_status & 0x3) == 2) || ((ecc_status >> 2) == 2)) {
 		pr_debug("MXC_NAND: HWECC uncorrectable 2-bit ECC error\n");
-		return -1;
+		return -EBADMSG;
 	}
 
 	return 0;
@@ -701,7 +701,7 @@ static int mxc_nand_correct_data_v2_v3(struct mtd_info *mtd, u_char *dat,
 		err = ecc_stat & ecc_bit_mask;
 		if (err > err_limit) {
 			printk(KERN_WARNING "UnCorrectable RS-ECC Error\n");
-			return -1;
+			return -EBADMSG;
 		} else {
 			ret += err;
 		}
diff --git a/drivers/mtd/nand/nand_bch.c b/drivers/mtd/nand/nand_bch.c
index e5758d8..a87c1b6 100644
--- a/drivers/mtd/nand/nand_bch.c
+++ b/drivers/mtd/nand/nand_bch.c
@@ -98,7 +98,7 @@ int nand_bch_correct_data(struct mtd_info *mtd, unsigned char *buf,
 		}
 	} else if (count < 0) {
 		printk(KERN_ERR "ecc unrecoverable error\n");
-		count = -1;
+		count = -EBADMSG;
 	}
 	return count;
 }
diff --git a/drivers/mtd/nand/nand_ecc.c b/drivers/mtd/nand/nand_ecc.c
index e612985..d1770b0 100644
--- a/drivers/mtd/nand/nand_ecc.c
+++ b/drivers/mtd/nand/nand_ecc.c
@@ -507,7 +507,7 @@ int __nand_correct_data(unsigned char *buf,
 		return 1;	/* error in ECC data; no action needed */
 
 	pr_err("%s: uncorrectable ECC error\n", __func__);
-	return -1;
+	return -EBADMSG;
 }
 EXPORT_SYMBOL(__nand_correct_data);
 
diff --git a/drivers/mtd/nand/omap2.c b/drivers/mtd/nand/omap2.c
index e9cbbc6..c553f78 100644
--- a/drivers/mtd/nand/omap2.c
+++ b/drivers/mtd/nand/omap2.c
@@ -826,12 +826,12 @@ static int omap_compare_ecc(u8 *ecc_data1,	/* read from NAND memory */
 	case 1:
 		/* Uncorrectable error */
 		pr_debug("ECC UNCORRECTED_ERROR 1\n");
-		return -1;
+		return -EBADMSG;
 
 	case 11:
 		/* UN-Correctable error */
 		pr_debug("ECC UNCORRECTED_ERROR B\n");
-		return -1;
+		return -EBADMSG;
 
 	case 12:
 		/* Correctable error */
@@ -861,7 +861,7 @@ static int omap_compare_ecc(u8 *ecc_data1,	/* read from NAND memory */
 				return 0;
 		}
 		pr_debug("UNCORRECTED_ERROR default\n");
-		return -1;
+		return -EBADMSG;
 	}
 }
 
diff --git a/drivers/mtd/nand/r852.c b/drivers/mtd/nand/r852.c
index cb0bf09..5b15f2f 100644
--- a/drivers/mtd/nand/r852.c
+++ b/drivers/mtd/nand/r852.c
@@ -477,7 +477,7 @@ static int r852_ecc_correct(struct mtd_info *mtd, uint8_t *dat,
 
 	if (dev->dma_error) {
 		dev->dma_error = 0;
-		return -1;
+		return -EIO;
 	}
 
 	r852_write_reg(dev, R852_CTL, dev->ctlreg | R852_CTL_ECC_ACCESS);
@@ -491,7 +491,7 @@ static int r852_ecc_correct(struct mtd_info *mtd, uint8_t *dat,
 		/* ecc uncorrectable error */
 		if (ecc_status & R852_ECC_FAIL) {
 			dbg("ecc: unrecoverable error, in half %d", i);
-			error = -1;
+			error = -EBADMSG;
 			goto exit;
 		}
 
diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h
index 3e92be1..5189581 100644
--- a/include/linux/mtd/nand.h
+++ b/include/linux/mtd/nand.h
@@ -456,7 +456,13 @@ struct nand_hw_control {
  * @hwctl:	function to control hardware ECC generator. Must only
  *		be provided if an hardware ECC is available
  * @calculate:	function for ECC calculation or readback from ECC hardware
- * @correct:	function for ECC correction, matching to ECC generator (sw/hw)
+ * @correct:	function for ECC correction, matching to ECC generator (sw/hw).
+ *		Should return a positive number representing the number of
+ *		corrected bitflips, -EBADMSG if the number of bitflips exceed
+ *		ECC strength, or any other error code if the error is not
+ *		directly related to correction.
+ *		If -EBADMSG is returned the input buffers should be left
+ *		untouched.
  * @read_page_raw:	function to read a raw page without ECC. This function
  *			should hide the specific layout used by the ECC
  *			controller and always return contiguous in-band and
diff --git a/include/linux/mtd/nand_bch.h b/include/linux/mtd/nand_bch.h
index 74acf53..fb0bc34 100644
--- a/include/linux/mtd/nand_bch.h
+++ b/include/linux/mtd/nand_bch.h
@@ -55,7 +55,7 @@ static inline int
 nand_bch_correct_data(struct mtd_info *mtd, unsigned char *buf,
 		      unsigned char *read_ecc, unsigned char *calc_ecc)
 {
-	return -1;
+	return -ENOTSUPP;
 }
 
 static inline struct nand_bch_control *
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 2/5] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions
  2015-12-30 19:32 [PATCH v4 0/5] mtd: nand: properly handle bitflips in erased pages Boris Brezillon
  2015-12-30 19:32 ` [PATCH v4 1/5] mtd: nand: return consistent error codes in ecc.correct() implementations Boris Brezillon
@ 2015-12-30 19:32 ` Boris Brezillon
  2016-01-07  2:50   ` Brian Norris
  2015-12-30 19:32 ` [PATCH v4 3/5] mtd: nand: davinci: remove custom 'erased check' implementation Boris Brezillon
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Boris Brezillon @ 2015-12-30 19:32 UTC (permalink / raw)
  To: David Woodhouse, Brian Norris, linux-mtd
  Cc: Franklin S Cooper Jr.,
	Maxim Levitsky, Nicolas Ferre, Jean-Christophe Plagniol-Villard,
	Alexandre Belloni, linux-kernel, Boris Brezillon

The default NAND read functions are relying on the underlying controller
driver to correct bitflips, but some of those controllers cannot properly
fix bitflips in erased pages.
Check for bitflips in erased pages in default core functions if the driver
delegated the this check by setting the NAND_ECC_GENERIC_ERASED_CHECK flag.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
---
 drivers/mtd/nand/nand_base.c | 53 ++++++++++++++++++++++++++++++++++++++------
 include/linux/mtd/nand.h     | 10 +++++++++
 2 files changed, 56 insertions(+), 7 deletions(-)

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index 928081b..909a1d4 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -1426,6 +1426,16 @@ static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip,
 
 		stat = chip->ecc.correct(mtd, p,
 			&chip->buffers->ecccode[i], &chip->buffers->ecccalc[i]);
+		if (stat == -EBADMSG &&
+		    (chip->ecc.options & NAND_ECC_GENERIC_ERASED_CHECK)) {
+			/* check for empty pages with bitflips */
+			stat = nand_check_erased_ecc_chunk(p, chip->ecc.size,
+						&chip->buffers->ecccode[i],
+						chip->ecc.bytes,
+						NULL, 0,
+						chip->ecc.strength);
+		}
+
 		if (stat < 0) {
 			mtd->ecc_stats.failed++;
 		} else {
@@ -1475,6 +1485,15 @@ static int nand_read_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip,
 		int stat;
 
 		stat = chip->ecc.correct(mtd, p, &ecc_code[i], &ecc_calc[i]);
+		if (stat == -EBADMSG &&
+		    (chip->ecc.options & NAND_ECC_GENERIC_ERASED_CHECK)) {
+			/* check for empty pages with bitflips */
+			stat = nand_check_erased_ecc_chunk(p, eccsize,
+						&ecc_code[i], eccbytes,
+						NULL, 0,
+						chip->ecc.strength);
+		}
+
 		if (stat < 0) {
 			mtd->ecc_stats.failed++;
 		} else {
@@ -1527,6 +1546,15 @@ static int nand_read_page_hwecc_oob_first(struct mtd_info *mtd,
 		chip->ecc.calculate(mtd, p, &ecc_calc[i]);
 
 		stat = chip->ecc.correct(mtd, p, &ecc_code[i], NULL);
+		if (stat == -EBADMSG &&
+		    (chip->ecc.options & NAND_ECC_GENERIC_ERASED_CHECK)) {
+			/* check for empty pages with bitflips */
+			stat = nand_check_erased_ecc_chunk(p, eccsize,
+						&ecc_code[i], eccbytes,
+						NULL, 0,
+						chip->ecc.strength);
+		}
+
 		if (stat < 0) {
 			mtd->ecc_stats.failed++;
 		} else {
@@ -1554,6 +1582,7 @@ static int nand_read_page_syndrome(struct mtd_info *mtd, struct nand_chip *chip,
 	int i, eccsize = chip->ecc.size;
 	int eccbytes = chip->ecc.bytes;
 	int eccsteps = chip->ecc.steps;
+	int eccpadbytes = eccbytes + chip->ecc.prepad + chip->ecc.postpad;
 	uint8_t *p = buf;
 	uint8_t *oob = chip->oob_poi;
 	unsigned int max_bitflips = 0;
@@ -1573,19 +1602,29 @@ static int nand_read_page_syndrome(struct mtd_info *mtd, struct nand_chip *chip,
 		chip->read_buf(mtd, oob, eccbytes);
 		stat = chip->ecc.correct(mtd, p, oob, NULL);
 
-		if (stat < 0) {
-			mtd->ecc_stats.failed++;
-		} else {
-			mtd->ecc_stats.corrected += stat;
-			max_bitflips = max_t(unsigned int, max_bitflips, stat);
-		}
-
 		oob += eccbytes;
 
 		if (chip->ecc.postpad) {
 			chip->read_buf(mtd, oob, chip->ecc.postpad);
 			oob += chip->ecc.postpad;
 		}
+
+		if (stat == -EBADMSG &&
+		    (chip->ecc.options & NAND_ECC_GENERIC_ERASED_CHECK)) {
+			/* check for empty pages with bitflips */
+			stat = nand_check_erased_ecc_chunk(p, chip->ecc.size,
+							   oob - eccpadbytes,
+							   eccpadbytes,
+							   NULL, 0,
+							   chip->ecc.strength);
+		}
+
+		if (stat < 0) {
+			mtd->ecc_stats.failed++;
+		} else {
+			mtd->ecc_stats.corrected += stat;
+			max_bitflips = max_t(unsigned int, max_bitflips, stat);
+		}
 	}
 
 	/* Calculate remaining oob bytes */
diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h
index 5189581..86487db 100644
--- a/include/linux/mtd/nand.h
+++ b/include/linux/mtd/nand.h
@@ -129,6 +129,14 @@ typedef enum {
 /* Enable Hardware ECC before syndrome is read back from flash */
 #define NAND_ECC_READSYN	2
 
+/*
+ * Enable generic NAND 'page erased' check. This check is only done when
+ * ecc.correct() returns -EBADMSG.
+ * Set this flag if your implementation does not fix bitflips in erased
+ * pages and you want to rely on the default implementation.
+ */
+#define NAND_ECC_GENERIC_ERASED_CHECK	BIT(0)
+
 /* Bit mask for flags passed to do_nand_read_ecc */
 #define NAND_GET_DEVICE		0x80
 
@@ -451,6 +459,7 @@ struct nand_hw_control {
  * @total:	total number of ECC bytes per page
  * @prepad:	padding information for syndrome based ECC generators
  * @postpad:	padding information for syndrome based ECC generators
+ * @options:	ECC specific options (see NAND_ECC_XXX flags defined above)
  * @layout:	ECC layout control struct pointer
  * @priv:	pointer to private ECC control data
  * @hwctl:	function to control hardware ECC generator. Must only
@@ -500,6 +509,7 @@ struct nand_ecc_ctrl {
 	int strength;
 	int prepad;
 	int postpad;
+	unsigned int options;
 	struct nand_ecclayout	*layout;
 	void *priv;
 	void (*hwctl)(struct mtd_info *mtd, int mode);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 3/5] mtd: nand: davinci: remove custom 'erased check' implementation
  2015-12-30 19:32 [PATCH v4 0/5] mtd: nand: properly handle bitflips in erased pages Boris Brezillon
  2015-12-30 19:32 ` [PATCH v4 1/5] mtd: nand: return consistent error codes in ecc.correct() implementations Boris Brezillon
  2015-12-30 19:32 ` [PATCH v4 2/5] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions Boris Brezillon
@ 2015-12-30 19:32 ` Boris Brezillon
  2016-01-07  2:55   ` Brian Norris
  2015-12-30 19:32 ` [PATCH v4 4/5] mtd: nand: diskonchip: " Boris Brezillon
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Boris Brezillon @ 2015-12-30 19:32 UTC (permalink / raw)
  To: David Woodhouse, Brian Norris, linux-mtd
  Cc: Franklin S Cooper Jr.,
	Maxim Levitsky, Nicolas Ferre, Jean-Christophe Plagniol-Villard,
	Alexandre Belloni, linux-kernel, Boris Brezillon

The davinci drivers is manually checking for 'erased pages' while
correcting ECC bytes.
This logic can now done by the core infrastructure, and can thus be removed
from this drivers.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
---
 drivers/mtd/nand/davinci_nand.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/mtd/nand/davinci_nand.c b/drivers/mtd/nand/davinci_nand.c
index ddb73c3..8cb821b 100644
--- a/drivers/mtd/nand/davinci_nand.c
+++ b/drivers/mtd/nand/davinci_nand.c
@@ -317,14 +317,6 @@ static int nand_davinci_correct_4bit(struct mtd_info *mtd,
 	unsigned num_errors, corrected;
 	unsigned long timeo;
 
-	/* All bytes 0xff?  It's an erased page; ignore its ECC. */
-	for (i = 0; i < 10; i++) {
-		if (ecc_code[i] != 0xff)
-			goto compare;
-	}
-	return 0;
-
-compare:
 	/* Unpack ten bytes into eight 10 bit values.  We know we're
 	 * little-endian, and use type punning for less shifting/masking.
 	 */
@@ -749,6 +741,7 @@ static int nand_davinci_probe(struct platform_device *pdev)
 			info->chip.ecc.correct = nand_davinci_correct_4bit;
 			info->chip.ecc.hwctl = nand_davinci_hwctl_4bit;
 			info->chip.ecc.bytes = 10;
+			info->chip.ecc.options = NAND_ECC_GENERIC_ERASED_CHECK;
 		} else {
 			info->chip.ecc.calculate = nand_davinci_calculate_1bit;
 			info->chip.ecc.correct = nand_davinci_correct_1bit;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 4/5] mtd: nand: diskonchip: remove custom 'erased check' implementation
  2015-12-30 19:32 [PATCH v4 0/5] mtd: nand: properly handle bitflips in erased pages Boris Brezillon
                   ` (2 preceding siblings ...)
  2015-12-30 19:32 ` [PATCH v4 3/5] mtd: nand: davinci: remove custom 'erased check' implementation Boris Brezillon
@ 2015-12-30 19:32 ` Boris Brezillon
  2015-12-30 19:39   ` [PATCH v5 " Boris Brezillon
  2015-12-30 19:32 ` [PATCH v4 5/5] mtd: nand: jz4740: " Boris Brezillon
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Boris Brezillon @ 2015-12-30 19:32 UTC (permalink / raw)
  To: David Woodhouse, Brian Norris, linux-mtd
  Cc: Franklin S Cooper Jr.,
	Maxim Levitsky, Nicolas Ferre, Jean-Christophe Plagniol-Villard,
	Alexandre Belloni, linux-kernel, Boris Brezillon

The diskonchip drivers is manually checking for 'erased pages' while
correcting ECC bytes.
This logic can now done by the core infrastructure, and can thus be removed
from this drivers.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
---
 drivers/mtd/nand/diskonchip.c | 37 ++-----------------------------------
 1 file changed, 2 insertions(+), 35 deletions(-)

diff --git a/drivers/mtd/nand/diskonchip.c b/drivers/mtd/nand/diskonchip.c
index a5c0466..a6927bf 100644
--- a/drivers/mtd/nand/diskonchip.c
+++ b/drivers/mtd/nand/diskonchip.c
@@ -74,10 +74,6 @@ struct doc_priv {
 	int (*late_init)(struct mtd_info *mtd);
 };
 
-/* This is the syndrome computed by the HW ecc generator upon reading an empty
-   page, one with all 0xff for data and stored ecc code. */
-static u_char empty_read_syndrome[6] = { 0x26, 0xff, 0x6d, 0x47, 0x73, 0x7a };
-
 /* This is the ecc value computed by the HW ecc generator upon writing an empty
    page, one with all 0xff for data. */
 static u_char empty_write_ecc[6] = { 0x4b, 0x00, 0xe2, 0x0e, 0x93, 0xf7 };
@@ -912,7 +908,6 @@ static int doc200x_correct_data(struct mtd_info *mtd, u_char *dat,
 	void __iomem *docptr = doc->virtadr;
 	uint8_t calc_ecc[6];
 	volatile u_char dummy;
-	int emptymatch = 1;
 
 	/* flush the pipeline */
 	if (DoC_is_2000(doc)) {
@@ -936,37 +931,9 @@ static int doc200x_correct_data(struct mtd_info *mtd, u_char *dat,
 				calc_ecc[i] = ReadDOC_(docptr, DoC_Mplus_ECCSyndrome0 + i);
 			else
 				calc_ecc[i] = ReadDOC_(docptr, DoC_ECCSyndrome0 + i);
-			if (calc_ecc[i] != empty_read_syndrome[i])
-				emptymatch = 0;
-		}
-		/* If emptymatch=1, the read syndrome is consistent with an
-		   all-0xff data and stored ecc block.  Check the stored ecc. */
-		if (emptymatch) {
-			for (i = 0; i < 6; i++) {
-				if (read_ecc[i] == 0xff)
-					continue;
-				emptymatch = 0;
-				break;
-			}
 		}
-		/* If emptymatch still =1, check the data block. */
-		if (emptymatch) {
-			/* Note: this somewhat expensive test should not be triggered
-			   often.  It could be optimized away by examining the data in
-			   the readbuf routine, and remembering the result. */
-			for (i = 0; i < 512; i++) {
-				if (dat[i] == 0xff)
-					continue;
-				emptymatch = 0;
-				break;
-			}
-		}
-		/* If emptymatch still =1, this is almost certainly a freshly-
-		   erased block, in which case the ECC will not come out right.
-		   We'll suppress the error and tell the caller everything's
-		   OK.  Because it is. */
-		if (!emptymatch)
-			ret = doc_ecc_decode(rs_decoder, dat, calc_ecc);
+
+		ret = doc_ecc_decode(rs_decoder, dat, calc_ecc);
 		if (ret > 0)
 			printk(KERN_ERR "doc200x_correct_data corrected %d errors\n", ret);
 	}
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 5/5] mtd: nand: jz4740: remove custom 'erased check' implementation
  2015-12-30 19:32 [PATCH v4 0/5] mtd: nand: properly handle bitflips in erased pages Boris Brezillon
                   ` (3 preceding siblings ...)
  2015-12-30 19:32 ` [PATCH v4 4/5] mtd: nand: diskonchip: " Boris Brezillon
@ 2015-12-30 19:32 ` Boris Brezillon
  2015-12-30 19:41   ` [PATCH v5 " Boris Brezillon
  2015-12-30 23:34 ` [PATCH v4 0/5] mtd: nand: properly handle bitflips in erased pages Franklin S Cooper Jr.
  2016-01-07  3:02 ` Brian Norris
  6 siblings, 1 reply; 13+ messages in thread
From: Boris Brezillon @ 2015-12-30 19:32 UTC (permalink / raw)
  To: David Woodhouse, Brian Norris, linux-mtd
  Cc: Franklin S Cooper Jr.,
	Maxim Levitsky, Nicolas Ferre, Jean-Christophe Plagniol-Villard,
	Alexandre Belloni, linux-kernel, Boris Brezillon

The jz4740 drivers is manually checking for 'erased pages' while
correcting ECC bytes.
This logic can now done by the core infrastructure, and can thus be removed
from this drivers.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
---
 drivers/mtd/nand/jz4740_nand.c | 18 ------------------
 1 file changed, 18 deletions(-)

diff --git a/drivers/mtd/nand/jz4740_nand.c b/drivers/mtd/nand/jz4740_nand.c
index adccae3..f934060 100644
--- a/drivers/mtd/nand/jz4740_nand.c
+++ b/drivers/mtd/nand/jz4740_nand.c
@@ -224,24 +224,6 @@ static int jz_nand_correct_ecc_rs(struct mtd_info *mtd, uint8_t *dat,
 	uint32_t t;
 	unsigned int timeout = 1000;
 
-	t = read_ecc[0];
-
-	if (t == 0xff) {
-		for (i = 1; i < 9; ++i)
-			t &= read_ecc[i];
-
-		t &= dat[0];
-		t &= dat[nand->chip.ecc.size / 2];
-		t &= dat[nand->chip.ecc.size - 1];
-
-		if (t == 0xff) {
-			for (i = 1; i < nand->chip.ecc.size - 1; ++i)
-				t &= dat[i];
-			if (t == 0xff)
-				return 0;
-		}
-	}
-
 	for (i = 0; i < 9; ++i)
 		writeb(read_ecc[i], nand->base + JZ_REG_NAND_PAR0 + i);
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v5 4/5] mtd: nand: diskonchip: remove custom 'erased check' implementation
  2015-12-30 19:32 ` [PATCH v4 4/5] mtd: nand: diskonchip: " Boris Brezillon
@ 2015-12-30 19:39   ` Boris Brezillon
  0 siblings, 0 replies; 13+ messages in thread
From: Boris Brezillon @ 2015-12-30 19:39 UTC (permalink / raw)
  To: David Woodhouse, Brian Norris, linux-mtd
  Cc: Franklin S Cooper Jr.,
	Maxim Levitsky, Nicolas Ferre, Jean-Christophe Plagniol-Villard,
	Alexandre Belloni, linux-kernel, Boris Brezillon

The diskonchip drivers is manually checking for 'erased pages' while
correcting ECC bytes.
This logic can now done by the core infrastructure, and can thus be removed
from this drivers.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
---
Changes since v4:
- add missing NAND_ECC_GENERIC_ERASED_CHECK flag

 drivers/mtd/nand/diskonchip.c | 38 +++-----------------------------------
 1 file changed, 3 insertions(+), 35 deletions(-)

diff --git a/drivers/mtd/nand/diskonchip.c b/drivers/mtd/nand/diskonchip.c
index a5c0466..4f4aa35 100644
--- a/drivers/mtd/nand/diskonchip.c
+++ b/drivers/mtd/nand/diskonchip.c
@@ -74,10 +74,6 @@ struct doc_priv {
 	int (*late_init)(struct mtd_info *mtd);
 };
 
-/* This is the syndrome computed by the HW ecc generator upon reading an empty
-   page, one with all 0xff for data and stored ecc code. */
-static u_char empty_read_syndrome[6] = { 0x26, 0xff, 0x6d, 0x47, 0x73, 0x7a };
-
 /* This is the ecc value computed by the HW ecc generator upon writing an empty
    page, one with all 0xff for data. */
 static u_char empty_write_ecc[6] = { 0x4b, 0x00, 0xe2, 0x0e, 0x93, 0xf7 };
@@ -912,7 +908,6 @@ static int doc200x_correct_data(struct mtd_info *mtd, u_char *dat,
 	void __iomem *docptr = doc->virtadr;
 	uint8_t calc_ecc[6];
 	volatile u_char dummy;
-	int emptymatch = 1;
 
 	/* flush the pipeline */
 	if (DoC_is_2000(doc)) {
@@ -936,37 +931,9 @@ static int doc200x_correct_data(struct mtd_info *mtd, u_char *dat,
 				calc_ecc[i] = ReadDOC_(docptr, DoC_Mplus_ECCSyndrome0 + i);
 			else
 				calc_ecc[i] = ReadDOC_(docptr, DoC_ECCSyndrome0 + i);
-			if (calc_ecc[i] != empty_read_syndrome[i])
-				emptymatch = 0;
-		}
-		/* If emptymatch=1, the read syndrome is consistent with an
-		   all-0xff data and stored ecc block.  Check the stored ecc. */
-		if (emptymatch) {
-			for (i = 0; i < 6; i++) {
-				if (read_ecc[i] == 0xff)
-					continue;
-				emptymatch = 0;
-				break;
-			}
 		}
-		/* If emptymatch still =1, check the data block. */
-		if (emptymatch) {
-			/* Note: this somewhat expensive test should not be triggered
-			   often.  It could be optimized away by examining the data in
-			   the readbuf routine, and remembering the result. */
-			for (i = 0; i < 512; i++) {
-				if (dat[i] == 0xff)
-					continue;
-				emptymatch = 0;
-				break;
-			}
-		}
-		/* If emptymatch still =1, this is almost certainly a freshly-
-		   erased block, in which case the ECC will not come out right.
-		   We'll suppress the error and tell the caller everything's
-		   OK.  Because it is. */
-		if (!emptymatch)
-			ret = doc_ecc_decode(rs_decoder, dat, calc_ecc);
+
+		ret = doc_ecc_decode(rs_decoder, dat, calc_ecc);
 		if (ret > 0)
 			printk(KERN_ERR "doc200x_correct_data corrected %d errors\n", ret);
 	}
@@ -1586,6 +1553,7 @@ static int __init doc_probe(unsigned long physadr)
 	nand->ecc.size		= 512;
 	nand->ecc.bytes		= 6;
 	nand->ecc.strength	= 2;
+	nand->ecc.options	= NAND_ECC_GENERIC_ERASED_CHECK;
 	nand->bbt_options	= NAND_BBT_USE_FLASH;
 	/* Skip the automatic BBT scan so we can run it manually */
 	nand->options		|= NAND_SKIP_BBTSCAN;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v5 5/5] mtd: nand: jz4740: remove custom 'erased check' implementation
  2015-12-30 19:32 ` [PATCH v4 5/5] mtd: nand: jz4740: " Boris Brezillon
@ 2015-12-30 19:41   ` Boris Brezillon
  0 siblings, 0 replies; 13+ messages in thread
From: Boris Brezillon @ 2015-12-30 19:41 UTC (permalink / raw)
  To: David Woodhouse, Brian Norris, linux-mtd
  Cc: Franklin S Cooper Jr.,
	Maxim Levitsky, Nicolas Ferre, Jean-Christophe Plagniol-Villard,
	Alexandre Belloni, linux-kernel, Boris Brezillon

The jz4740 drivers is manually checking for 'erased pages' while
correcting ECC bytes.
This logic can now done by the core infrastructure, and can thus be removed
from this drivers.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
---
Changes since v4:
- add missing NAND_ECC_GENERIC_ERASED_CHECK flag

 drivers/mtd/nand/jz4740_nand.c | 19 +------------------
 1 file changed, 1 insertion(+), 18 deletions(-)

diff --git a/drivers/mtd/nand/jz4740_nand.c b/drivers/mtd/nand/jz4740_nand.c
index adccae3..b19d2a9 100644
--- a/drivers/mtd/nand/jz4740_nand.c
+++ b/drivers/mtd/nand/jz4740_nand.c
@@ -224,24 +224,6 @@ static int jz_nand_correct_ecc_rs(struct mtd_info *mtd, uint8_t *dat,
 	uint32_t t;
 	unsigned int timeout = 1000;
 
-	t = read_ecc[0];
-
-	if (t == 0xff) {
-		for (i = 1; i < 9; ++i)
-			t &= read_ecc[i];
-
-		t &= dat[0];
-		t &= dat[nand->chip.ecc.size / 2];
-		t &= dat[nand->chip.ecc.size - 1];
-
-		if (t == 0xff) {
-			for (i = 1; i < nand->chip.ecc.size - 1; ++i)
-				t &= dat[i];
-			if (t == 0xff)
-				return 0;
-		}
-	}
-
 	for (i = 0; i < 9; ++i)
 		writeb(read_ecc[i], nand->base + JZ_REG_NAND_PAR0 + i);
 
@@ -443,6 +425,7 @@ static int jz_nand_probe(struct platform_device *pdev)
 	chip->ecc.size		= 512;
 	chip->ecc.bytes		= 9;
 	chip->ecc.strength	= 4;
+	chip->ecc.options	= NAND_ECC_GENERIC_ERASED_CHECK;
 
 	if (pdata)
 		chip->ecc.layout = pdata->ecc_layout;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 0/5] mtd: nand: properly handle bitflips in erased pages
  2015-12-30 19:32 [PATCH v4 0/5] mtd: nand: properly handle bitflips in erased pages Boris Brezillon
                   ` (4 preceding siblings ...)
  2015-12-30 19:32 ` [PATCH v4 5/5] mtd: nand: jz4740: " Boris Brezillon
@ 2015-12-30 23:34 ` Franklin S Cooper Jr.
  2016-01-07  3:02 ` Brian Norris
  6 siblings, 0 replies; 13+ messages in thread
From: Franklin S Cooper Jr. @ 2015-12-30 23:34 UTC (permalink / raw)
  To: Boris Brezillon, David Woodhouse, Brian Norris, linux-mtd
  Cc: Maxim Levitsky, Nicolas Ferre, Jean-Christophe Plagniol-Villard,
	Alexandre Belloni, linux-kernel



On 12/30/2015 01:32 PM, Boris Brezillon wrote:
> Hi,
>
> This patch series aims at providing a common logic to check for bitflips
> in erased pages.
>
> Currently each driver is implementing its own logic to check for bitflips
> in erased pages. Not only this create code duplication, but most of these
> implementations are incorrect.
> Here are a few aspects that are often left aside in those implementations:
> 1/ they do not check OOB bytes when checking for the ff pattern, which
>    means they can consider a page as empty while the MTD user actually
>    wanted to write almost ff with a few bits to zero
> 2/ they check for the ff pattern on the whole page, while ECC actually
>    works on smaller chunks (usually 512 or 1024 bytes chunks)
> 3/ they use random bitflip thresholds to decide whether a page/chunk is
>    erased or not. IMO this threshold should be set to ECC strength (or
>    at least something correlated to this parameter)
>
> The approach taken in this series is to provide two helper functions to
> check for bitflips in erased pages. Each driver that needs to check for
> such cases can then call the nand_check_erased_ecc_chunk() function, and
> rely on the common logic to decide whether a page is erased or not.
>
> While Brian suggested a few times to make this detection automatic for
> all drivers that set a specific flag (NAND_CHECK_ERASED_BITFLIPS?), here
> is a few reasons I think this is not such a good idea:
> 1/ some (a lot of) drivers do not properly implement the raw access
>    functions, and since we need to check for raw data and OOB bytes this
>    makes the automatic detection unusable for most drivers unless they
>    decide to correctly implement those methods (which would be a good
>    thing BTW).
> 2/ as a I said earlier, this check should be made at the ECC chunk level
>    and not at the page level. This spots two problems: some (a lot of)
>    drivers do not properly specify the ecc layout information, and even
>    if the ecc layout is correctly defined, there is no way to attach ECC
>    bytes to a specific ECC chunk.
> 3/ the last aspect is the perf penalty incured by this test. Automatically
>    doing that at the NAND core level implies reading the whole page again
>    in raw mode, while with the helper function approach, drivers supporting
>    access at the ECC chunk level can read only the faulty chunk in raw
>    mode.
>
> Regarding the bitflips threshold at which an erased pages is considered as
> faulty, I have assigned it to ECC strength. As mentioned by Andrea, using
> ECC strength might cause some trouble, because if you already have some
> bitflips in an erased page, programming it might generate even more of
> them.
> In the other hand, shouldn't that be checked after (or before) programming
> a page. I mean, UBI is already capable of detecting pages which are over
> the configured bitflips_threshold and move data around when it detects
> such pages.
> If we check data after writing a page we wouldn't have to bother about
> setting a weaker value for the "bitflips in erased page" case.
> Another thing in favor of the ECC strength value for this "bitflips in
> erased page" threshold value: if the ECC engine is generating 0xff ECC
> bytes when the page is empty, then it will be able to fix ECC strength
> bitflips without complaining, so why should we use different value when
> we detect bitflips using the pattern match approach?
>
> Best Regards,
>
> Boris
>
> Changes since v3:
> - drop already applied patches
> - make the generic "bitflips in erased pages" check as an opt-in flag
> - split driver changes to ease review
> - addressed Brian's comments
>
> Changes since v2:
> - improve nand_check_erased_buf() implementation
> - keep nand_check_erased_buf() private to nand_base.c
> - patch existing ecc.correct() implementations to return consistent error
>   codes
> - make the 'erased check' optional
> - remove some custom implementations of the 'erased check'
>
> Changes since v1:
> - fix the nand_check_erased_buf() function
> - mark the bitflips > bitflips_threshold condition as unlikely
> - add missing memsets in nand_check_erased_ecc_chunk()
>
>
> Boris Brezillon (5):
>   mtd: nand: return consistent error codes in ecc.correct()
>     implementations
>   mtd: nand: use nand_check_erased_ecc_chunk in default ECC read
>     functions
>   mtd: nand: davinci: remove custom 'erased check' implementation
>   mtd: nand: diskonchip: remove custom 'erased check' implementation
>   mtd: nand: jz4740: remove custom 'erased check' implementation
>
>  drivers/mtd/nand/atmel_nand.c   |  2 +-
>  drivers/mtd/nand/bf5xx_nand.c   | 20 +++++++++++-----
>  drivers/mtd/nand/davinci_nand.c | 15 ++++--------
>  drivers/mtd/nand/diskonchip.c   | 37 ++--------------------------
>  drivers/mtd/nand/jz4740_nand.c  | 22 ++---------------
>  drivers/mtd/nand/mxc_nand.c     |  4 ++--
>  drivers/mtd/nand/nand_base.c    | 53 +++++++++++++++++++++++++++++++++++------
>  drivers/mtd/nand/nand_bch.c     |  2 +-
>  drivers/mtd/nand/nand_ecc.c     |  2 +-
>  drivers/mtd/nand/omap2.c        |  6 ++---
>  drivers/mtd/nand/r852.c         |  4 ++--
>  include/linux/mtd/nand.h        | 18 +++++++++++++-
>  include/linux/mtd/nand_bch.h    |  2 +-
>  13 files changed, 96 insertions(+), 91 deletions(-)
>

Validated this patchset on TI K2E evm.

Tested-by: Franklin S Cooper Jr. <fcooper@ti.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 1/5] mtd: nand: return consistent error codes in ecc.correct() implementations
  2015-12-30 19:32 ` [PATCH v4 1/5] mtd: nand: return consistent error codes in ecc.correct() implementations Boris Brezillon
@ 2016-01-07  2:48   ` Brian Norris
  0 siblings, 0 replies; 13+ messages in thread
From: Brian Norris @ 2016-01-07  2:48 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: David Woodhouse, linux-mtd, Franklin S Cooper Jr.,
	Maxim Levitsky, Nicolas Ferre, Jean-Christophe Plagniol-Villard,
	Alexandre Belloni, linux-kernel

On Wed, Dec 30, 2015 at 08:32:03PM +0100, Boris Brezillon wrote:
> The error code returned by the ecc.correct() are not consistent over the
> all implementations.
> 
> Document the expected behavior in include/linux/mtd/nand.h and fix
> offending implementations.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
> ---
>  drivers/mtd/nand/atmel_nand.c   |  2 +-
>  drivers/mtd/nand/bf5xx_nand.c   | 20 ++++++++++++++------
>  drivers/mtd/nand/davinci_nand.c |  6 +++---
>  drivers/mtd/nand/jz4740_nand.c  |  4 ++--
>  drivers/mtd/nand/mxc_nand.c     |  4 ++--
>  drivers/mtd/nand/nand_bch.c     |  2 +-
>  drivers/mtd/nand/nand_ecc.c     |  2 +-
>  drivers/mtd/nand/omap2.c        |  6 +++---
>  drivers/mtd/nand/r852.c         |  4 ++--
>  include/linux/mtd/nand.h        |  8 +++++++-
>  include/linux/mtd/nand_bch.h    |  2 +-
>  11 files changed, 37 insertions(+), 23 deletions(-)

Pushed patch 1, with added commentary about the bf5xx_nand bugfix.

Brian

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 2/5] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions
  2015-12-30 19:32 ` [PATCH v4 2/5] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions Boris Brezillon
@ 2016-01-07  2:50   ` Brian Norris
  0 siblings, 0 replies; 13+ messages in thread
From: Brian Norris @ 2016-01-07  2:50 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: David Woodhouse, linux-mtd, Franklin S Cooper Jr.,
	Maxim Levitsky, Nicolas Ferre, Jean-Christophe Plagniol-Villard,
	Alexandre Belloni, linux-kernel

On Wed, Dec 30, 2015 at 08:32:04PM +0100, Boris Brezillon wrote:
> The default NAND read functions are relying on the underlying controller
> driver to correct bitflips, but some of those controllers cannot properly
> fix bitflips in erased pages.
> Check for bitflips in erased pages in default core functions if the driver
> delegated the this check by setting the NAND_ECC_GENERIC_ERASED_CHECK flag.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

Pushed to l2-mtd.git

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 3/5] mtd: nand: davinci: remove custom 'erased check' implementation
  2015-12-30 19:32 ` [PATCH v4 3/5] mtd: nand: davinci: remove custom 'erased check' implementation Boris Brezillon
@ 2016-01-07  2:55   ` Brian Norris
  0 siblings, 0 replies; 13+ messages in thread
From: Brian Norris @ 2016-01-07  2:55 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: David Woodhouse, linux-mtd, Franklin S Cooper Jr.,
	Maxim Levitsky, Nicolas Ferre, Jean-Christophe Plagniol-Villard,
	Alexandre Belloni, linux-kernel

On Wed, Dec 30, 2015 at 08:32:05PM +0100, Boris Brezillon wrote:
> The davinci drivers is manually checking for 'erased pages' while
> correcting ECC bytes.
> This logic can now done by the core infrastructure, and can thus be removed
> from this drivers.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

Pushed patch 3 to l2-mtd.git

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 0/5] mtd: nand: properly handle bitflips in erased pages
  2015-12-30 19:32 [PATCH v4 0/5] mtd: nand: properly handle bitflips in erased pages Boris Brezillon
                   ` (5 preceding siblings ...)
  2015-12-30 23:34 ` [PATCH v4 0/5] mtd: nand: properly handle bitflips in erased pages Franklin S Cooper Jr.
@ 2016-01-07  3:02 ` Brian Norris
  6 siblings, 0 replies; 13+ messages in thread
From: Brian Norris @ 2016-01-07  3:02 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: David Woodhouse, linux-mtd, Franklin S Cooper Jr.,
	Maxim Levitsky, Nicolas Ferre, Jean-Christophe Plagniol-Villard,
	Alexandre Belloni, linux-kernel

Hi Boris,

On Wed, Dec 30, 2015 at 08:32:02PM +0100, Boris Brezillon wrote:
> Hi,
> 
> This patch series aims at providing a common logic to check for bitflips
> in erased pages.
> 
> Currently each driver is implementing its own logic to check for bitflips
> in erased pages. Not only this create code duplication, but most of these
> implementations are incorrect.
> Here are a few aspects that are often left aside in those implementations:
> 1/ they do not check OOB bytes when checking for the ff pattern, which
>    means they can consider a page as empty while the MTD user actually
>    wanted to write almost ff with a few bits to zero
> 2/ they check for the ff pattern on the whole page, while ECC actually
>    works on smaller chunks (usually 512 or 1024 bytes chunks)
> 3/ they use random bitflip thresholds to decide whether a page/chunk is
>    erased or not. IMO this threshold should be set to ECC strength (or
>    at least something correlated to this parameter)
> 
> The approach taken in this series is to provide two helper functions to
> check for bitflips in erased pages. Each driver that needs to check for
> such cases can then call the nand_check_erased_ecc_chunk() function, and
> rely on the common logic to decide whether a page is erased or not.
> 
> While Brian suggested a few times to make this detection automatic for
> all drivers that set a specific flag (NAND_CHECK_ERASED_BITFLIPS?), here
> is a few reasons I think this is not such a good idea:
> 1/ some (a lot of) drivers do not properly implement the raw access
>    functions, and since we need to check for raw data and OOB bytes this
>    makes the automatic detection unusable for most drivers unless they
>    decide to correctly implement those methods (which would be a good
>    thing BTW).
> 2/ as a I said earlier, this check should be made at the ECC chunk level
>    and not at the page level. This spots two problems: some (a lot of)
>    drivers do not properly specify the ecc layout information, and even
>    if the ecc layout is correctly defined, there is no way to attach ECC
>    bytes to a specific ECC chunk.
> 3/ the last aspect is the perf penalty incured by this test. Automatically
>    doing that at the NAND core level implies reading the whole page again
>    in raw mode, while with the helper function approach, drivers supporting
>    access at the ECC chunk level can read only the faulty chunk in raw
>    mode.
> 
> Regarding the bitflips threshold at which an erased pages is considered as
> faulty, I have assigned it to ECC strength. As mentioned by Andrea, using
> ECC strength might cause some trouble, because if you already have some
> bitflips in an erased page, programming it might generate even more of
> them.
> In the other hand, shouldn't that be checked after (or before) programming
> a page. I mean, UBI is already capable of detecting pages which are over
> the configured bitflips_threshold and move data around when it detects
> such pages.
> If we check data after writing a page we wouldn't have to bother about
> setting a weaker value for the "bitflips in erased page" case.
> Another thing in favor of the ECC strength value for this "bitflips in
> erased page" threshold value: if the ECC engine is generating 0xff ECC
> bytes when the page is empty, then it will be able to fix ECC strength
> bitflips without complaining, so why should we use different value when
> we detect bitflips using the pattern match approach?

Thanks for the full description here. I agree with most, if not all of
this.

> Best Regards,
> 
> Boris
> 
> Changes since v3:
> - drop already applied patches
> - make the generic "bitflips in erased pages" check as an opt-in flag
> - split driver changes to ease review
> - addressed Brian's comments
> 
> Changes since v2:
> - improve nand_check_erased_buf() implementation
> - keep nand_check_erased_buf() private to nand_base.c
> - patch existing ecc.correct() implementations to return consistent error
>   codes
> - make the 'erased check' optional
> - remove some custom implementations of the 'erased check'
> 
> Changes since v1:
> - fix the nand_check_erased_buf() function
> - mark the bitflips > bitflips_threshold condition as unlikely
> - add missing memsets in nand_check_erased_ecc_chunk()
> 
> 
> Boris Brezillon (5):
>   mtd: nand: return consistent error codes in ecc.correct()
>     implementations
>   mtd: nand: use nand_check_erased_ecc_chunk in default ECC read
>     functions
>   mtd: nand: davinci: remove custom 'erased check' implementation
>   mtd: nand: diskonchip: remove custom 'erased check' implementation
>   mtd: nand: jz4740: remove custom 'erased check' implementation

Pushed the rest of the series. If someone ends up reviewing/testing the
last 2 (I'm not counting on it), then we can revisit them.

Thanks,
Brian

>  drivers/mtd/nand/atmel_nand.c   |  2 +-
>  drivers/mtd/nand/bf5xx_nand.c   | 20 +++++++++++-----
>  drivers/mtd/nand/davinci_nand.c | 15 ++++--------
>  drivers/mtd/nand/diskonchip.c   | 37 ++--------------------------
>  drivers/mtd/nand/jz4740_nand.c  | 22 ++---------------
>  drivers/mtd/nand/mxc_nand.c     |  4 ++--
>  drivers/mtd/nand/nand_base.c    | 53 +++++++++++++++++++++++++++++++++++------
>  drivers/mtd/nand/nand_bch.c     |  2 +-
>  drivers/mtd/nand/nand_ecc.c     |  2 +-
>  drivers/mtd/nand/omap2.c        |  6 ++---
>  drivers/mtd/nand/r852.c         |  4 ++--
>  include/linux/mtd/nand.h        | 18 +++++++++++++-
>  include/linux/mtd/nand_bch.h    |  2 +-
>  13 files changed, 96 insertions(+), 91 deletions(-)
> 
> -- 
> 2.1.4
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-01-07  3:02 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-30 19:32 [PATCH v4 0/5] mtd: nand: properly handle bitflips in erased pages Boris Brezillon
2015-12-30 19:32 ` [PATCH v4 1/5] mtd: nand: return consistent error codes in ecc.correct() implementations Boris Brezillon
2016-01-07  2:48   ` Brian Norris
2015-12-30 19:32 ` [PATCH v4 2/5] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions Boris Brezillon
2016-01-07  2:50   ` Brian Norris
2015-12-30 19:32 ` [PATCH v4 3/5] mtd: nand: davinci: remove custom 'erased check' implementation Boris Brezillon
2016-01-07  2:55   ` Brian Norris
2015-12-30 19:32 ` [PATCH v4 4/5] mtd: nand: diskonchip: " Boris Brezillon
2015-12-30 19:39   ` [PATCH v5 " Boris Brezillon
2015-12-30 19:32 ` [PATCH v4 5/5] mtd: nand: jz4740: " Boris Brezillon
2015-12-30 19:41   ` [PATCH v5 " Boris Brezillon
2015-12-30 23:34 ` [PATCH v4 0/5] mtd: nand: properly handle bitflips in erased pages Franklin S Cooper Jr.
2016-01-07  3:02 ` Brian Norris

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.