All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes
@ 2014-01-17  7:13 Pekon Gupta
  2014-01-17  7:13 ` [PATCH v7 1/6] mtd: nand: omap: add field to indicate current ecc-scheme in 'struct omap_nand_info' Pekon Gupta
                   ` (7 more replies)
  0 siblings, 8 replies; 12+ messages in thread
From: Pekon Gupta @ 2014-01-17  7:13 UTC (permalink / raw)
  To: Brian Norris, Artem Bityutskiy
  Cc: Stefan Roese, linux-mtd, Pekon Gupta, Felipe Balbi


*changes v6 -> v7*
[PATCH 1/6] <no change>
[PATCH 2/6] <no change>
[PATCH 3/6] <dropped older version, refer commit message>
 - erased-page detection done based on count of zero-bits in OOB & DATA
 - incorporated some feedbacks from Brian Norris <computersforpeace@gmail.com>
	http://lists.infradead.org/pipermail/linux-mtd/2014-January/051533.html
	http://lists.infradead.org/pipermail/linux-mtd/2014-January/051534.html
[PATCH 4/6] <new> remove redundant bit-flip counting for erased-page
[PATCH 5/6] <minor cleanup>
[PATCH 6/6] <minor cleanup>
(As there are functional changes in this version, so not attaching
 Tested-by: Stefan Roese <sr@denx.de>)


*changes v5 -> v6*
[PATCH 1/6] <no change>
[PATCH 2/6] introduced variable 'actual_eccbytes' to omit reserved byte-position
[PATCH 3/6] split from [PATCH v5 3/5] fix erased-page bit-flip correction
[PATCH 4/6] split from [PATCH v5 3/5] fix erased-page detection
[PATCH 5/6] <minor cleanup>
[PATCH 6/6] <no change>


*changes v4 -> v5*
This patch series was split version of earlier patch:
http://lists.infradead.org/pipermail/linux-mtd/2013-November/050241.html

chip->ecc.correct() is used for detecting and correcting bit-flips during read
operations. In OMAP NAND driver different ecc-schemes have different callbacks:
 - omap_correct_data()		for HAM1_HW ecc-schemes (Untouched)
 - nand_bch_correct_data()	for BCHx_HW_DETECTION_SW ecc-schemes (Untouched)
 - omap_elm_correct_data()	for BCHx_HW ecc-schemes

This patch-series fixes following issues in omap_elm_correct_data():
(1) Dependency on a specific reserved byte-position in OOB area
    to differentiates between erased-pages v/s programmed-pages.
    Problem: reserved byte-position cannot be accomodated in all ecc-schemes
    Problem: reserved byte-position can itself be subjected upto 8 bit-flips
             causing the 0xff to become 0x00, causing page to be
             mis-recognized as erased-page.

(2) Bit-flips in erased-pages are detected by comparing each byte of Data & OOB
    with 0xff in check_erased_page().
    Problem: This is causes performance penalty when erased-pages are checked.

(3) Current code is not scalable for future ECC schemes due to presence of 
    tweaks for BCH4_ECC and BCH8_ECC at multiple places.

(4) Currently, bit-flips are evaluated and fixed even when ELM reports them as
    un-correctable bit-flips, this should not happen as 'number-of-error' field
    in ELM_LOCATION_STATUS becomes invalid when un-correctable flag is set.

(5) Driver should return with error-code = '-EBADMSG' when
     uncorrectable bit-flip is detected
     bit-flip outside valid Data and OOB region is detected


Pekon Gupta (6):
  mtd: nand: omap: add field to indicate current ecc-scheme in 'struct
    omap_nand_info'
  mtd: nand: omap: ecc.correct: omap_elm_correct_data: rename ambiguous
    variable 'eccsize' and 'ecc_vector_size'
  mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix erased-page
    detection for BCHx_HW ECC schemes
  mtd: nand: omap: ecc.correct: omap_elm_correct_data: remove redundant
    bit-flip counting for erased-page
  mtd: nand: omap: ecc.correct: omap_elm_correct_data: cleanup for
    future enhancements
  mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix
    programmed-page bit-flip correction logic

 drivers/mtd/nand/omap2.c | 265 +++++++++++++++++++++++------------------------
 1 file changed, 127 insertions(+), 138 deletions(-)

-- 
1.8.1

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v7 1/6] mtd: nand: omap: add field to indicate current ecc-scheme in 'struct omap_nand_info'
  2014-01-17  7:13 [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes Pekon Gupta
@ 2014-01-17  7:13 ` Pekon Gupta
  2014-01-17  7:13 ` [PATCH v7 2/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: rename ambiguous variable 'eccsize' and 'ecc_vector_size' Pekon Gupta
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Pekon Gupta @ 2014-01-17  7:13 UTC (permalink / raw)
  To: Brian Norris, Artem Bityutskiy
  Cc: Stefan Roese, linux-mtd, Pekon Gupta, Felipe Balbi

Information of currently selected ECC scheme 'enum omap_ecc ecc_opt' should
available outside platform-data, so that single nand_chip->ecc callback can
support multiple ecc-scheme configurations.

Signed-off-by: Pekon Gupta <pekon@ti.com>
---
 drivers/mtd/nand/omap2.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/mtd/nand/omap2.c b/drivers/mtd/nand/omap2.c
index 768e7cf..8fb64d5 100644
--- a/drivers/mtd/nand/omap2.c
+++ b/drivers/mtd/nand/omap2.c
@@ -160,6 +160,7 @@ struct omap_nand_info {
 	int				gpmc_cs;
 	unsigned long			phys_base;
 	unsigned long			mem_size;
+	enum omap_ecc			ecc_opt;
 	struct completion		comp;
 	struct dma_chan			*dma;
 	int				gpmc_irq_fifo;
@@ -1657,6 +1658,7 @@ static int omap_nand_probe(struct platform_device *pdev)
 	info->gpmc_cs		= pdata->cs;
 	info->reg		= pdata->reg;
 	info->of_node		= pdata->of_node;
+	info->ecc_opt		= pdata->ecc_opt;
 	mtd			= &info->mtd;
 	mtd->priv		= &info->nand;
 	mtd->name		= dev_name(&pdev->dev);
@@ -1812,7 +1814,7 @@ static int omap_nand_probe(struct platform_device *pdev)
 	/* populate MTD interface based on ECC scheme */
 	nand_chip->ecc.layout	= &omap_oobinfo;
 	ecclayout		= &omap_oobinfo;
-	switch (pdata->ecc_opt) {
+	switch (info->ecc_opt) {
 	case OMAP_ECC_HAM1_CODE_HW:
 		pr_info("nand: using OMAP_ECC_HAM1_CODE_HW\n");
 		nand_chip->ecc.mode             = NAND_ECC_HW;
-- 
1.8.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v7 2/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: rename ambiguous variable 'eccsize' and 'ecc_vector_size'
  2014-01-17  7:13 [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes Pekon Gupta
  2014-01-17  7:13 ` [PATCH v7 1/6] mtd: nand: omap: add field to indicate current ecc-scheme in 'struct omap_nand_info' Pekon Gupta
@ 2014-01-17  7:13 ` Pekon Gupta
  2014-01-17  7:13 ` [PATCH v7 3/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix erased-page detection for BCHx_HW ECC schemes Pekon Gupta
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Pekon Gupta @ 2014-01-17  7:13 UTC (permalink / raw)
  To: Brian Norris, Artem Bityutskiy
  Cc: Stefan Roese, linux-mtd, Pekon Gupta, Felipe Balbi

renaming following variables as they cause confusion due to resemblence to
another similar field in 'struct nand_ecc_ctrl'  (nand_chip->ecc.size).
  ecc_vector_size = info->nand.ecc.bytes  ---> eccbytes = info->nand.ecc.bytes;
  eccsize = ecc_vector_size - 1;          ---> actual_eccbytes - 1;

Signed-off-by: Pekon Gupta <pekon@ti.com>
---
 drivers/mtd/nand/omap2.c | 39 +++++++++++++++++++++++----------------
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/drivers/mtd/nand/omap2.c b/drivers/mtd/nand/omap2.c
index 8fb64d5..2c73389 100644
--- a/drivers/mtd/nand/omap2.c
+++ b/drivers/mtd/nand/omap2.c
@@ -1359,9 +1359,10 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
 {
 	struct omap_nand_info *info = container_of(mtd, struct omap_nand_info,
 			mtd);
+	int eccbytes	= info->nand.ecc.bytes;
 	int eccsteps = info->nand.ecc.steps;
 	int i , j, stat = 0;
-	int eccsize, eccflag, ecc_vector_size;
+	int eccflag, actual_eccbytes;
 	struct elm_errorvec err_vec[ERROR_VECTOR_MAX];
 	u_char *ecc_vec = calc_ecc;
 	u_char *spare_ecc = read_ecc;
@@ -1369,6 +1370,20 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
 	enum bch_ecc type;
 	bool is_error_reported = false;
 
+	switch (info->ecc_opt) {
+	case OMAP_ECC_BCH4_CODE_HW:
+		/* omit  7th ECC byte reserved for ROM code compatibility */
+		actual_eccbytes = eccbytes - 1;
+		break;
+	case OMAP_ECC_BCH8_CODE_HW:
+		/* omit 14th ECC byte reserved for ROM code compatibility */
+		actual_eccbytes = eccbytes - 1;
+		break;
+	default:
+		pr_err("invalid driver configuration\n");
+		return -EINVAL;
+	}
+
 	/* Initialize elm error vector to zero */
 	memset(err_vec, 0, sizeof(err_vec));
 
@@ -1380,14 +1395,6 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
 		erased_ecc_vec = bch4_vector;
 	}
 
-	ecc_vector_size = info->nand.ecc.bytes;
-
-	/*
-	 * Remove extra byte padding for BCH8 RBL
-	 * compatibility and erased page handling
-	 */
-	eccsize = ecc_vector_size - 1;
-
 	for (i = 0; i < eccsteps ; i++) {
 		eccflag = 0;	/* initialize eccflag */
 
@@ -1395,8 +1402,7 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
 		 * Check any error reported,
 		 * In case of error, non zero ecc reported.
 		 */
-
-		for (j = 0; (j < eccsize); j++) {
+		for (j = 0; j < actual_eccbytes; j++) {
 			if (calc_ecc[j] != 0) {
 				eccflag = 1; /* non zero ecc, error present */
 				break;
@@ -1421,7 +1427,7 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
 			 * zeros are more than threshold erased page, either
 			 * case page reported as uncorrectable.
 			 */
-			if (hweight8(~read_ecc[eccsize]) >= threshold) {
+			if (hweight8(~read_ecc[actual_eccbytes]) >= threshold) {
 				/*
 				 * Update elm error vector as
 				 * data area is programmed
@@ -1433,7 +1439,8 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
 				int bitflip_count;
 				u_char *buf = &data[info->nand.ecc.size * i];
 
-				if (memcmp(calc_ecc, erased_ecc_vec, eccsize)) {
+				if (memcmp(calc_ecc, erased_ecc_vec,
+							 actual_eccbytes)) {
 					bitflip_count = erased_sector_bitflips(
 							buf, read_ecc, info);
 
@@ -1446,8 +1453,8 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
 		}
 
 		/* Update the ecc vector */
-		calc_ecc += ecc_vector_size;
-		read_ecc += ecc_vector_size;
+		calc_ecc += eccbytes;
+		read_ecc += eccbytes;
 	}
 
 	/* Check if any error reported */
@@ -1496,7 +1503,7 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
 
 		/* Update page data with sector size */
 		data += info->nand.ecc.size;
-		spare_ecc += ecc_vector_size;
+		spare_ecc += eccbytes;
 	}
 
 	for (i = 0; i < eccsteps; i++)
-- 
1.8.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v7 3/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix erased-page detection for BCHx_HW ECC schemes
  2014-01-17  7:13 [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes Pekon Gupta
  2014-01-17  7:13 ` [PATCH v7 1/6] mtd: nand: omap: add field to indicate current ecc-scheme in 'struct omap_nand_info' Pekon Gupta
  2014-01-17  7:13 ` [PATCH v7 2/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: rename ambiguous variable 'eccsize' and 'ecc_vector_size' Pekon Gupta
@ 2014-01-17  7:13 ` Pekon Gupta
  2014-02-11 21:34   ` Brian Norris
  2014-01-17  7:13 ` [PATCH v7 4/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: remove redundant bit-flip counting for erased-page Pekon Gupta
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 12+ messages in thread
From: Pekon Gupta @ 2014-01-17  7:13 UTC (permalink / raw)
  To: Brian Norris, Artem Bityutskiy
  Cc: Stefan Roese, linux-mtd, Pekon Gupta, Felipe Balbi

As erased-pages do not have ECC stored in their OOB area, so they need to be
seperated out from programmed-pages, before doing BCH ECC correction.

In current implementation of omap_elm_correct_data() which does ECC correction
for BCHx ECC schemes, this erased-pages are detected based on specific marker
byte (reserved as 0x00) in ecc-layout.
However, this approach has some limitation like;
 1) All ecc-scheme layouts do not have such Reserved byte marker to
    differentiate between erased-page v/s programmed-page. Thus this is a
    customized solution.
 2) Reserved marker byte can itself be subjected to bit-flips causing
    erased-page to be misunderstood as programmed-page.

This patch removes dependency on any marker byte in ecc-layout, to differentiate
between erased-page v/s programeed-page. Instead a page is considered erased if
 (a) all(OOB)  == 0xff, .i.e., number of zero-bits in read_ecc[] == 0
 (b) number of zero-bits in (Data + OOB) is less than ecc-strength

This patch also adds a generic function count_zero_bits(), to find number of
bits which are '0' in given buffer. This function is optimized for comparing
read-data with 0xff.

Signed-off-by: Pekon Gupta <pekon@ti.com>
---
 drivers/mtd/nand/omap2.c | 88 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 71 insertions(+), 17 deletions(-)

diff --git a/drivers/mtd/nand/omap2.c b/drivers/mtd/nand/omap2.c
index 2c73389..f833fbb 100644
--- a/drivers/mtd/nand/omap2.c
+++ b/drivers/mtd/nand/omap2.c
@@ -1331,6 +1331,30 @@ static int erased_sector_bitflips(u_char *data, u_char *oob,
 }
 
 /**
+ * count_zero_bits - count number of bit-flips in given buffer
+ * @buf:	pointer to buffer
+ * @length:	buffer length
+ * @max_bitflips: maximum number of correctable bit-flips (ecc.strength)
+ * $bitflip_count: pointer to store bit-flip count
+ *
+ * Counts number of bit-flips in given buffer, returns with error
+ * as soon as count exceeds max_bitflips limit.
+ */
+static int count_zero_bits(u_char *buf, int length,
+					 int max_bitflips, int *bitflip_count)
+{	int i;
+
+	for (i = 0; i < length; i++) {
+		if (unlikely(buf[i] != 0xff)) {
+			*bitflip_count += hweight8(~buf[i]);
+			if (unlikely(*bitflip_count > max_bitflips))
+				return -EBADMSG;
+		}
+	}
+	return 0;
+}
+
+/**
  * omap_elm_correct_data - corrects page data area in case error reported
  * @mtd:	MTD device structure
  * @data:	page data
@@ -1359,14 +1383,21 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
 {
 	struct omap_nand_info *info = container_of(mtd, struct omap_nand_info,
 			mtd);
-	int eccbytes	= info->nand.ecc.bytes;
-	int eccsteps = info->nand.ecc.steps;
+	struct nand_ecc_ctrl *ecc = &info->nand.ecc;
+	int eccstrength		= ecc->strength;
+	int eccsize		= ecc->size;
+	int eccbytes		= ecc->bytes;
+	int eccsteps		= ecc->steps;
 	int i , j, stat = 0;
 	int eccflag, actual_eccbytes;
 	struct elm_errorvec err_vec[ERROR_VECTOR_MAX];
 	u_char *ecc_vec = calc_ecc;
 	u_char *spare_ecc = read_ecc;
 	u_char *erased_ecc_vec;
+	u_char *buf;
+	int bitflip_count;
+	int err;
+	bool page_is_erased;
 	enum bch_ecc type;
 	bool is_error_reported = false;
 
@@ -1410,24 +1441,47 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
 		}
 
 		if (eccflag == 1) {
-			/*
-			 * Set threshold to minimum of 4, half of ecc.strength/2
-			 * to allow max bit flip in byte to 4
-			 */
-			unsigned int threshold = min_t(unsigned int, 4,
-					info->nand.ecc.strength / 2);
+			bitflip_count = 0;
+			/* count zero-bits in OOB region */
+			err = count_zero_bits(read_ecc, eccbytes,
+						 eccstrength, &bitflip_count);
+			if (err) {
+				/*
+				 * number of zero-bits in OOB > ecc-strength
+				 * either un-correctable or programmed-page
+				 */
+				page_is_erased = false;
+			} else if (bitflip_count == 0) {
+				/* OOB == all(0xff) */
+				page_is_erased = true;
+			} else {
+				/*
+				 * OOB has some bits as '0' but
+				 * number of zero-bits in OOB < ecc-strength.
+				 * It may be erased-page with bit-flips so,
+				 * further checks are needed for confirmation
+				 */
+				/* count zero-bits in DATA region */
+				buf = &data[eccsize * i];
+				err = count_zero_bits(buf, eccsize,
+						 eccstrength, &bitflip_count);
+				if (err) {
+					/*
+					 * total number of zero-bits in OOB
+					 * and DATA exceeds ecc-strength
+					 */
+					page_is_erased = false;
+				} else {
+					/* number of zero-bits < ecc-strength */
+					page_is_erased = true;
+				}
+			}
 
 			/*
-			 * Check data area is programmed by counting
-			 * number of 0's at fixed offset in spare area.
-			 * Checking count of 0's against threshold.
-			 * In case programmed page expects at least threshold
-			 * zeros in byte.
-			 * If zeros are less than threshold for programmed page/
-			 * zeros are more than threshold erased page, either
-			 * case page reported as uncorrectable.
+			 * erased-pages do not have ECC stored in OOB area,
+			 * so they need to be handled separately.
 			 */
-			if (hweight8(~read_ecc[actual_eccbytes]) >= threshold) {
+			if (!page_is_erased) {
 				/*
 				 * Update elm error vector as
 				 * data area is programmed
-- 
1.8.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v7 4/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: remove redundant bit-flip counting for erased-page
  2014-01-17  7:13 [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes Pekon Gupta
                   ` (2 preceding siblings ...)
  2014-01-17  7:13 ` [PATCH v7 3/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix erased-page detection for BCHx_HW ECC schemes Pekon Gupta
@ 2014-01-17  7:13 ` Pekon Gupta
  2014-01-17  7:13 ` [PATCH v7 5/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: cleanup for future enhancements Pekon Gupta
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Pekon Gupta @ 2014-01-17  7:13 UTC (permalink / raw)
  To: Brian Norris, Artem Bityutskiy
  Cc: Stefan Roese, linux-mtd, Pekon Gupta, Felipe Balbi

This patch:
 - Removes erased_sector_bitflips() as counting of bit-flips is handled by
   count_zero_bits() in more optimized way.
   And count of bit-flips in both OOB and DATA region is anyways required for
   correct identification of erased-pages.

 - read-buffer is pre-filled with 0xff when an erased-page is detected, as
   erased-page can only contain 0xff bytes.

 - Removes static arrays bch8_vector[] and bch4_vector, which stores
   ECC signature with all(0xff) bytes.

Signed-off-by: Pekon Gupta <pekon@ti.com>
---
 drivers/mtd/nand/omap2.c | 88 +++++-------------------------------------------
 1 file changed, 9 insertions(+), 79 deletions(-)

diff --git a/drivers/mtd/nand/omap2.c b/drivers/mtd/nand/omap2.c
index f833fbb..78b7106 100644
--- a/drivers/mtd/nand/omap2.c
+++ b/drivers/mtd/nand/omap2.c
@@ -141,11 +141,6 @@
 
 #define BADBLOCK_MARKER_LENGTH		2
 
-#ifdef CONFIG_MTD_NAND_OMAP_BCH
-static u_char bch8_vector[] = {0xf3, 0xdb, 0x14, 0x16, 0x8b, 0xd2, 0xbe, 0xcc,
-	0xac, 0x6b, 0xff, 0x99, 0x7b};
-static u_char bch4_vector[] = {0x00, 0x6b, 0x31, 0xdd, 0x41, 0xbc, 0x10};
-#endif
 
 /* oob info generated runtime depending on ecc algorithm and layout selected */
 static struct nand_ecclayout omap_oobinfo;
@@ -1292,45 +1287,6 @@ static int omap3_calculate_ecc_bch(struct mtd_info *mtd, const u_char *dat,
 }
 
 /**
- * erased_sector_bitflips - count bit flips
- * @data:	data sector buffer
- * @oob:	oob buffer
- * @info:	omap_nand_info
- *
- * Check the bit flips in erased page falls below correctable level.
- * If falls below, report the page as erased with correctable bit
- * flip, else report as uncorrectable page.
- */
-static int erased_sector_bitflips(u_char *data, u_char *oob,
-		struct omap_nand_info *info)
-{
-	int flip_bits = 0, i;
-
-	for (i = 0; i < info->nand.ecc.size; i++) {
-		flip_bits += hweight8(~data[i]);
-		if (flip_bits > info->nand.ecc.strength)
-			return 0;
-	}
-
-	for (i = 0; i < info->nand.ecc.bytes - 1; i++) {
-		flip_bits += hweight8(~oob[i]);
-		if (flip_bits > info->nand.ecc.strength)
-			return 0;
-	}
-
-	/*
-	 * Bit flips falls in correctable level.
-	 * Fill data area with 0xFF
-	 */
-	if (flip_bits) {
-		memset(data, 0xFF, info->nand.ecc.size);
-		memset(oob, 0xFF, info->nand.ecc.bytes);
-	}
-
-	return flip_bits;
-}
-
-/**
  * count_zero_bits - count number of bit-flips in given buffer
  * @buf:	pointer to buffer
  * @length:	buffer length
@@ -1359,24 +1315,10 @@ static int count_zero_bits(u_char *buf, int length,
  * @mtd:	MTD device structure
  * @data:	page data
  * @read_ecc:	ecc read from nand flash
- * @calc_ecc:	ecc read from HW ECC registers
- *
- * Calculated ecc vector reported as zero in case of non-error pages.
- * In case of error/erased pages non-zero error vector is reported.
- * In case of non-zero ecc vector, check read_ecc at fixed offset
- * (x = 13/7 in case of BCH8/4 == 0) to find page programmed or not.
- * To handle bit flips in this data, count the number of 0's in
- * read_ecc[x] and check if it greater than 4. If it is less, it is
- * programmed page, else erased page.
- *
- * 1. If page is erased, check with standard ecc vector (ecc vector
- * for erased page to find any bit flip). If check fails, bit flip
- * is present in erased page. Count the bit flips in erased page and
- * if it falls under correctable level, report page with 0xFF and
- * update the correctable bit information.
- * 2. If error is reported on programmed page, update elm error
- * vector and correct the page with ELM error correction routine.
- *
+ * @calc_ecc:	ecc calculated after reading Data and OOB regions from flash
+ *		calc_ecc would be non-zero only in following cases:
+ *		- its a programmed-page with bit-flips
+ *		- its an erased-page (may or may not have bit-flips)
  */
 static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
 				u_char *read_ecc, u_char *calc_ecc)
@@ -1393,7 +1335,6 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
 	struct elm_errorvec err_vec[ERROR_VECTOR_MAX];
 	u_char *ecc_vec = calc_ecc;
 	u_char *spare_ecc = read_ecc;
-	u_char *erased_ecc_vec;
 	u_char *buf;
 	int bitflip_count;
 	int err;
@@ -1420,10 +1361,8 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
 
 	if (info->nand.ecc.strength == BCH8_MAX_ERROR) {
 		type = BCH8_ECC;
-		erased_ecc_vec = bch8_vector;
 	} else {
 		type = BCH4_ECC;
-		erased_ecc_vec = bch4_vector;
 	}
 
 	for (i = 0; i < eccsteps ; i++) {
@@ -1489,20 +1428,11 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
 				err_vec[i].error_reported = true;
 				is_error_reported = true;
 			} else {
-				/* Error reported in erased page */
-				int bitflip_count;
-				u_char *buf = &data[info->nand.ecc.size * i];
-
-				if (memcmp(calc_ecc, erased_ecc_vec,
-							 actual_eccbytes)) {
-					bitflip_count = erased_sector_bitflips(
-							buf, read_ecc, info);
-
-					if (bitflip_count)
-						stat += bitflip_count;
-					else
-						return -EINVAL;
-				}
+				/* report bit-flips found in erased-page */
+				stat += bitflip_count;
+				/* return clean read-buffer */
+				buf = &data[eccsize * i];
+				memset(buf, 0xff, eccsize);
 			}
 		}
 
-- 
1.8.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v7 5/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: cleanup for future enhancements
  2014-01-17  7:13 [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes Pekon Gupta
                   ` (3 preceding siblings ...)
  2014-01-17  7:13 ` [PATCH v7 4/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: remove redundant bit-flip counting for erased-page Pekon Gupta
@ 2014-01-17  7:13 ` Pekon Gupta
  2014-01-17  7:13 ` [PATCH v7 6/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix programmed-page bit-flip correction logic Pekon Gupta
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Pekon Gupta @ 2014-01-17  7:13 UTC (permalink / raw)
  To: Brian Norris, Artem Bityutskiy
  Cc: Stefan Roese, linux-mtd, Pekon Gupta, Felipe Balbi

Current omap_elm_correct_data() code is not scalable for future ecc-schemes
due to presence of tweaks and hard-coded macros for BCH4_ECC and BCH8_ECC
ecc-schemes at multiple places.

This patch:
 - replaces 'ecc_opt' with '(info->nand.ecc.strength == BCH8_MAX_ERROR)
   used to differentiate between BCH8_HW and BCH4_SW
 - replaces macros (defining magic number for specific ecc-scheme) with
   generic variables
 - removes dependency on macros defined in elm.h (like BCHx_ECC_OOB_BYTES)

Signed-off-by: Pekon Gupta <pekon@ti.com>
---
 drivers/mtd/nand/omap2.c | 38 +++++++++++++-------------------------
 1 file changed, 13 insertions(+), 25 deletions(-)

diff --git a/drivers/mtd/nand/omap2.c b/drivers/mtd/nand/omap2.c
index 78b7106..d0d9682 100644
--- a/drivers/mtd/nand/omap2.c
+++ b/drivers/mtd/nand/omap2.c
@@ -118,14 +118,9 @@
 
 #define OMAP24XX_DMA_GPMC		4
 
-#define BCH8_MAX_ERROR		8	/* upto 8 bit correctable */
-#define BCH4_MAX_ERROR		4	/* upto 4 bit correctable */
-
 #define SECTOR_BYTES		512
 /* 4 bit padding to make byte aligned, 56 = 52 + 4 */
 #define BCH4_BIT_PAD		4
-#define BCH8_ECC_MAX		((SECTOR_BYTES + BCH8_ECC_OOB_BYTES) * 8)
-#define BCH4_ECC_MAX		((SECTOR_BYTES + BCH4_ECC_OOB_BYTES) * 8)
 
 /* GPMC ecc engine settings for read */
 #define BCH_WRAPMODE_1		1	/* BCH wrap mode 1 */
@@ -1338,8 +1333,8 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
 	u_char *buf;
 	int bitflip_count;
 	int err;
+	u32 bit_pos, byte_pos, error_max, pos;
 	bool page_is_erased;
-	enum bch_ecc type;
 	bool is_error_reported = false;
 
 	switch (info->ecc_opt) {
@@ -1359,12 +1354,6 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
 	/* Initialize elm error vector to zero */
 	memset(err_vec, 0, sizeof(err_vec));
 
-	if (info->nand.ecc.strength == BCH8_MAX_ERROR) {
-		type = BCH8_ECC;
-	} else {
-		type = BCH4_ECC;
-	}
-
 	for (i = 0; i < eccsteps ; i++) {
 		eccflag = 0;	/* initialize eccflag */
 
@@ -1451,20 +1440,19 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
 	for (i = 0; i < eccsteps; i++) {
 		if (err_vec[i].error_reported) {
 			for (j = 0; j < err_vec[i].error_count; j++) {
-				u32 bit_pos, byte_pos, error_max, pos;
-
-				if (type == BCH8_ECC)
-					error_max = BCH8_ECC_MAX;
-				else
-					error_max = BCH4_ECC_MAX;
-
-				if (info->nand.ecc.strength == BCH8_MAX_ERROR)
-					pos = err_vec[i].error_loc[j];
-				else
-					/* Add 4 to take care 4 bit padding */
+				switch (info->ecc_opt) {
+				case OMAP_ECC_BCH4_CODE_HW:
+					/* Add 4 bits to take care of padding */
 					pos = err_vec[i].error_loc[j] +
 						BCH4_BIT_PAD;
-
+					break;
+				case OMAP_ECC_BCH8_CODE_HW:
+					pos = err_vec[i].error_loc[j];
+					break;
+				default:
+					return -EINVAL;
+				}
+				error_max = (eccsize + actual_eccbytes) * 8;
 				/* Calculate bit position of error */
 				bit_pos = pos % 8;
 
@@ -1486,7 +1474,7 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
 		stat += err_vec[i].error_count;
 
 		/* Update page data with sector size */
-		data += info->nand.ecc.size;
+		data += eccsize;
 		spare_ecc += eccbytes;
 	}
 
-- 
1.8.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v7 6/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix programmed-page bit-flip correction logic
  2014-01-17  7:13 [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes Pekon Gupta
                   ` (4 preceding siblings ...)
  2014-01-17  7:13 ` [PATCH v7 5/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: cleanup for future enhancements Pekon Gupta
@ 2014-01-17  7:13 ` Pekon Gupta
  2014-01-29 13:18 ` [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes Stefan Roese
  2014-02-04 11:37 ` Gupta, Pekon
  7 siblings, 0 replies; 12+ messages in thread
From: Pekon Gupta @ 2014-01-17  7:13 UTC (permalink / raw)
  To: Brian Norris, Artem Bityutskiy
  Cc: Stefan Roese, linux-mtd, Pekon Gupta, Felipe Balbi

This patch updates following checks when bit-flips are detected by ELM:

 - Do not evaluate bit-flips when un-correctable bit-flips is reported by ELM,
   because as per [1] when ELM reports an un-correctable bit-flips,
   'number of error' field in its ELM_LOCATION_STATUS register is also invalid.

 - Return with error-code '-EBADMSG' on detection of un-correctable bit-flip.

 - Return with error-code '-EBADMSG' when bit-flips position is outside current
   Sector and OOB area.

[1] ELM IP spec Table-25 ELM_LOCATION_STATUS Register.
    ELM_LOCATION_STATUS[8] = ECC_CORRECTABLE: Error location process exit status
        0x0: ECC error location process failed.
             Number of errors and error locations are invalid.
        0x1: all errors were successfully located.
             Number of errors and error locations are valid.

Signed-off-by: Pekon Gupta <pekon@ti.com>
---
 drivers/mtd/nand/omap2.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/drivers/mtd/nand/omap2.c b/drivers/mtd/nand/omap2.c
index d0d9682..ada278e 100644
--- a/drivers/mtd/nand/omap2.c
+++ b/drivers/mtd/nand/omap2.c
@@ -1437,8 +1437,12 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
 	/* Decode BCH error using ELM module */
 	elm_decode_bch_error_page(info->elm_dev, ecc_vec, err_vec);
 
+	err = 0;
 	for (i = 0; i < eccsteps; i++) {
-		if (err_vec[i].error_reported) {
+		if (err_vec[i].error_uncorrectable) {
+			pr_err("nand: uncorrectable bit-flips found\n");
+			err = -EBADMSG;
+		} else if (err_vec[i].error_reported) {
 			for (j = 0; j < err_vec[i].error_count; j++) {
 				switch (info->ecc_opt) {
 				case OMAP_ECC_BCH4_CODE_HW:
@@ -1460,13 +1464,22 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
 				byte_pos = (error_max - pos - 1) / 8;
 
 				if (pos < error_max) {
-					if (byte_pos < 512)
+					if (byte_pos < 512) {
+						pr_debug("bitflip@data %d:%d\n",
+							 byte_pos, bit_pos);
 						data[byte_pos] ^= 1 << bit_pos;
-					else
+					} else {
+						pr_debug("bitflip@oob %d:%d\n",
+							 (byte_pos - 512),
+							 bit_pos);
 						spare_ecc[byte_pos - 512] ^=
 							1 << bit_pos;
+					}
+				} else {
+					pr_err("invalid bit-flip @ %d:%d\n",
+							 byte_pos, bit_pos);
+					err = -EBADMSG;
 				}
-				/* else, not interested to correct ecc */
 			}
 		}
 
@@ -1478,12 +1491,7 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
 		spare_ecc += eccbytes;
 	}
 
-	for (i = 0; i < eccsteps; i++)
-		/* Return error if uncorrectable error present */
-		if (err_vec[i].error_uncorrectable)
-			return -EINVAL;
-
-	return stat;
+	return (err) ? err : stat;
 }
 
 /**
-- 
1.8.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes
  2014-01-17  7:13 [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes Pekon Gupta
                   ` (5 preceding siblings ...)
  2014-01-17  7:13 ` [PATCH v7 6/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix programmed-page bit-flip correction logic Pekon Gupta
@ 2014-01-29 13:18 ` Stefan Roese
  2014-02-04 11:37 ` Gupta, Pekon
  7 siblings, 0 replies; 12+ messages in thread
From: Stefan Roese @ 2014-01-29 13:18 UTC (permalink / raw)
  To: Pekon Gupta, Brian Norris, Artem Bityutskiy; +Cc: linux-mtd, Felipe Balbi

On 17.01.2014 08:13, Pekon Gupta wrote:
> *changes v6 -> v7*
> [PATCH 1/6] <no change>
> [PATCH 2/6] <no change>
> [PATCH 3/6] <dropped older version, refer commit message>
>   - erased-page detection done based on count of zero-bits in OOB & DATA
>   - incorporated some feedbacks from Brian Norris <computersforpeace@gmail.com>
> 	http://lists.infradead.org/pipermail/linux-mtd/2014-January/051533.html
> 	http://lists.infradead.org/pipermail/linux-mtd/2014-January/051534.html
> [PATCH 4/6] <new> remove redundant bit-flip counting for erased-page
> [PATCH 5/6] <minor cleanup>
> [PATCH 6/6] <minor cleanup>
> (As there are functional changes in this version, so not attaching
>   Tested-by: Stefan Roese <sr@denx.de>)

Thanks, Pekon. This still fixes some problems we have seen on an AM335x 
board with ECC errors on UBI/UBIFS mounting. So for the whole patch series:

Tested-by: Stefan Roese <sr@denx.de>

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes
  2014-01-17  7:13 [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes Pekon Gupta
                   ` (6 preceding siblings ...)
  2014-01-29 13:18 ` [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes Stefan Roese
@ 2014-02-04 11:37 ` Gupta, Pekon
  7 siblings, 0 replies; 12+ messages in thread
From: Gupta, Pekon @ 2014-02-04 11:37 UTC (permalink / raw)
  To: Brian Norris, Artem Bityutskiy; +Cc: Stefan Roese, linux-mtd, Balbi, Felipe

Hi Brian,

>From: Gupta, Pekon
>*changes v6 -> v7*
>[PATCH 1/6] <no change>
>[PATCH 2/6] <no change>
>[PATCH 3/6] <dropped older version, refer commit message>
> - erased-page detection done based on count of zero-bits in OOB & DATA
> - incorporated some feedbacks from Brian Norris <computersforpeace@gmail.com>
>	http://lists.infradead.org/pipermail/linux-mtd/2014-January/051533.html
>	http://lists.infradead.org/pipermail/linux-mtd/2014-January/051534.html
>[PATCH 4/6] <new> remove redundant bit-flip counting for erased-page
>[PATCH 5/6] <minor cleanup>
>[PATCH 6/6] <minor cleanup>
>
Request you to please review this one, as it would keep me going for other
Clean-ups and optimizations on top of this series.

with regards, pekon

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v7 3/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix erased-page detection for BCHx_HW ECC schemes
  2014-01-17  7:13 ` [PATCH v7 3/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix erased-page detection for BCHx_HW ECC schemes Pekon Gupta
@ 2014-02-11 21:34   ` Brian Norris
  2014-02-12 10:31     ` Gupta, Pekon
  0 siblings, 1 reply; 12+ messages in thread
From: Brian Norris @ 2014-02-11 21:34 UTC (permalink / raw)
  To: Pekon Gupta; +Cc: Stefan Roese, linux-mtd, Felipe Balbi, Artem Bityutskiy

Hi Pekon,

On Fri, Jan 17, 2014 at 12:43:14PM +0530, Pekon Gupta wrote:
> As erased-pages do not have ECC stored in their OOB area, so they need to be
> seperated out from programmed-pages, before doing BCH ECC correction.
> 
> In current implementation of omap_elm_correct_data() which does ECC correction
> for BCHx ECC schemes, this erased-pages are detected based on specific marker
> byte (reserved as 0x00) in ecc-layout.
> However, this approach has some limitation like;
>  1) All ecc-scheme layouts do not have such Reserved byte marker to
>     differentiate between erased-page v/s programmed-page. Thus this is a
>     customized solution.
>  2) Reserved marker byte can itself be subjected to bit-flips causing
>     erased-page to be misunderstood as programmed-page.
> 
> This patch removes dependency on any marker byte in ecc-layout, to differentiate
> between erased-page v/s programeed-page. Instead a page is considered erased if
>  (a) all(OOB)  == 0xff, .i.e., number of zero-bits in read_ecc[] == 0
>  (b) number of zero-bits in (Data + OOB) is less than ecc-strength
> 
> This patch also adds a generic function count_zero_bits(), to find number of
> bits which are '0' in given buffer. This function is optimized for comparing
> read-data with 0xff.
> 
> Signed-off-by: Pekon Gupta <pekon@ti.com>
> ---
>  drivers/mtd/nand/omap2.c | 88 ++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 71 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/mtd/nand/omap2.c b/drivers/mtd/nand/omap2.c
> index 2c73389..f833fbb 100644
> --- a/drivers/mtd/nand/omap2.c
> +++ b/drivers/mtd/nand/omap2.c
> @@ -1331,6 +1331,30 @@ static int erased_sector_bitflips(u_char *data, u_char *oob,
>  }
>  
>  /**
> + * count_zero_bits - count number of bit-flips in given buffer
> + * @buf:	pointer to buffer
> + * @length:	buffer length
> + * @max_bitflips: maximum number of correctable bit-flips (ecc.strength)
> + * $bitflip_count: pointer to store bit-flip count
> + *
> + * Counts number of bit-flips in given buffer, returns with error
> + * as soon as count exceeds max_bitflips limit.
> + */
> +static int count_zero_bits(u_char *buf, int length,
> +					 int max_bitflips, int *bitflip_count)
> +{	int i;
> +
> +	for (i = 0; i < length; i++) {
> +		if (unlikely(buf[i] != 0xff)) {
> +			*bitflip_count += hweight8(~buf[i]);
> +			if (unlikely(*bitflip_count > max_bitflips))
> +				return -EBADMSG;
> +		}
> +	}
> +	return 0;
> +}
> +
> +/**
>   * omap_elm_correct_data - corrects page data area in case error reported
>   * @mtd:	MTD device structure
>   * @data:	page data
> @@ -1359,14 +1383,21 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
>  {
>  	struct omap_nand_info *info = container_of(mtd, struct omap_nand_info,
>  			mtd);
> -	int eccbytes	= info->nand.ecc.bytes;
> -	int eccsteps = info->nand.ecc.steps;
> +	struct nand_ecc_ctrl *ecc = &info->nand.ecc;
> +	int eccstrength		= ecc->strength;
> +	int eccsize		= ecc->size;
> +	int eccbytes		= ecc->bytes;
> +	int eccsteps		= ecc->steps;

When I recommended adding the 'ecc' variable (as you did), I was
suggesting you drop the next 4 variables. You don't need them when you
can (with just as few characters) refer to ecc->field directly and
easily. Stick with one or ther other -- the 4 local variables or the 1
ecc pointer -- but you don't need both.

>  	int i , j, stat = 0;
>  	int eccflag, actual_eccbytes;
>  	struct elm_errorvec err_vec[ERROR_VECTOR_MAX];
>  	u_char *ecc_vec = calc_ecc;
>  	u_char *spare_ecc = read_ecc;
>  	u_char *erased_ecc_vec;
> +	u_char *buf;
> +	int bitflip_count;
> +	int err;
> +	bool page_is_erased;
>  	enum bch_ecc type;
>  	bool is_error_reported = false;
>  
> @@ -1410,24 +1441,47 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
>  		}
>  
>  		if (eccflag == 1) {
> -			/*
> -			 * Set threshold to minimum of 4, half of ecc.strength/2
> -			 * to allow max bit flip in byte to 4
> -			 */
> -			unsigned int threshold = min_t(unsigned int, 4,
> -					info->nand.ecc.strength / 2);
> +			bitflip_count = 0;
> +			/* count zero-bits in OOB region */
> +			err = count_zero_bits(read_ecc, eccbytes,
> +						 eccstrength, &bitflip_count);
> +			if (err) {
> +				/*
> +				 * number of zero-bits in OOB > ecc-strength
> +				 * either un-correctable or programmed-page
> +				 */
> +				page_is_erased = false;
> +			} else if (bitflip_count == 0) {
> +				/* OOB == all(0xff) */
> +				page_is_erased = true;

This else-if is still incorrect. You cannot assume that just because the
OOB is all 0xff that the page is erased.

> +			} else {
> +				/*
> +				 * OOB has some bits as '0' but
> +				 * number of zero-bits in OOB < ecc-strength.
> +				 * It may be erased-page with bit-flips so,
> +				 * further checks are needed for confirmation
> +				 */
> +				/* count zero-bits in DATA region */
> +				buf = &data[eccsize * i];
> +				err = count_zero_bits(buf, eccsize,
> +						 eccstrength, &bitflip_count);
> +				if (err) {
> +					/*
> +					 * total number of zero-bits in OOB
> +					 * and DATA exceeds ecc-strength
> +					 */
> +					page_is_erased = false;
> +				} else {
> +					/* number of zero-bits < ecc-strength */
> +					page_is_erased = true;
> +				}
> +			}

This whole block (where you call count_zero_bits() twice) is a
convoluted and buggy way of just doing the following, AFAIU:

	page_is_erased = !count_zero_bits(read_ecc, ecc->bytes,
				ecc->strength, &bitflip_count) &&
			 !count_zero_bits(&data[ecc->size * i], ecc->size,
			 	ecc->strength, &bitflip_count);

You could modify that to your liking and add a comment, but it's much
more concise than your version, and from what I can tell, it's actually
correct.

>  
>  			/*
> -			 * Check data area is programmed by counting
> -			 * number of 0's at fixed offset in spare area.
> -			 * Checking count of 0's against threshold.
> -			 * In case programmed page expects at least threshold
> -			 * zeros in byte.
> -			 * If zeros are less than threshold for programmed page/
> -			 * zeros are more than threshold erased page, either
> -			 * case page reported as uncorrectable.
> +			 * erased-pages do not have ECC stored in OOB area,
> +			 * so they need to be handled separately.
>  			 */
> -			if (hweight8(~read_ecc[actual_eccbytes]) >= threshold) {
> +			if (!page_is_erased) {
>  				/*
>  				 * Update elm error vector as
>  				 * data area is programmed

Brian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH v7 3/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix erased-page detection for BCHx_HW ECC schemes
  2014-02-11 21:34   ` Brian Norris
@ 2014-02-12 10:31     ` Gupta, Pekon
  2014-03-07 19:11       ` Brian Norris
  0 siblings, 1 reply; 12+ messages in thread
From: Gupta, Pekon @ 2014-02-12 10:31 UTC (permalink / raw)
  To: Brian Norris; +Cc: Stefan Roese, linux-mtd, Balbi, Felipe, Artem Bityutskiy

Hi Brian,

>From: Brian Norris [mailto:computersforpeace@gmail.com]
>>On Fri, Jan 17, 2014 at 12:43:14PM +0530, Pekon Gupta wrote:
>> As erased-pages do not have ECC stored in their OOB area, so they need to be
>> seperated out from programmed-pages, before doing BCH ECC correction.
>>
>> In current implementation of omap_elm_correct_data() which does ECC correction
>> for BCHx ECC schemes, this erased-pages are detected based on specific marker
>> byte (reserved as 0x00) in ecc-layout.
>> However, this approach has some limitation like;
>>  1) All ecc-scheme layouts do not have such Reserved byte marker to
>>     differentiate between erased-page v/s programmed-page. Thus this is a
>>     customized solution.
>>  2) Reserved marker byte can itself be subjected to bit-flips causing
>>     erased-page to be misunderstood as programmed-page.
>>
>> This patch removes dependency on any marker byte in ecc-layout, to differentiate
>> between erased-page v/s programeed-page. Instead a page is considered erased if
>>  (a) all(OOB)  == 0xff, .i.e., number of zero-bits in read_ecc[] == 0
>>  (b) number of zero-bits in (Data + OOB) is less than ecc-strength
>>
>> This patch also adds a generic function count_zero_bits(), to find number of
>> bits which are '0' in given buffer. This function is optimized for comparing
>> read-data with 0xff.
>>
>> Signed-off-by: Pekon Gupta <pekon@ti.com>
>> ---

[...]

>> +/**
>>   * omap_elm_correct_data - corrects page data area in case error reported
>>   * @mtd:	MTD device structure
>>   * @data:	page data
>> @@ -1359,14 +1383,21 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
>>  {
>>  	struct omap_nand_info *info = container_of(mtd, struct omap_nand_info,
>>  			mtd);
>> -	int eccbytes	= info->nand.ecc.bytes;
>> -	int eccsteps = info->nand.ecc.steps;
>> +	struct nand_ecc_ctrl *ecc = &info->nand.ecc;
>> +	int eccstrength		= ecc->strength;
>> +	int eccsize		= ecc->size;
>> +	int eccbytes		= ecc->bytes;
>> +	int eccsteps		= ecc->steps;
>
>When I recommended adding the 'ecc' variable (as you did), I was
>suggesting you drop the next 4 variables. You don't need them when you
>can (with just as few characters) refer to ecc->field directly and
>easily. Stick with one or ther other -- the 4 local variables or the 1
>ecc pointer -- but you don't need both.
>
Oh sorry.. I misunderstood that. I'll fix this in next version.


>> @@ -1410,24 +1441,47 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
>>  		}
>>
>>  		if (eccflag == 1) {
>> -			/*
>> -			 * Set threshold to minimum of 4, half of ecc.strength/2
>> -			 * to allow max bit flip in byte to 4
>> -			 */
>> -			unsigned int threshold = min_t(unsigned int, 4,
>> -					info->nand.ecc.strength / 2);
>> +			bitflip_count = 0;
>> +			/* count zero-bits in OOB region */
>> +			err = count_zero_bits(read_ecc, eccbytes,
>> +						 eccstrength, &bitflip_count);
>> +			if (err) {
>> +				/*
>> +				 * number of zero-bits in OOB > ecc-strength
>> +				 * either un-correctable or programmed-page
>> +				 */
>> +				page_is_erased = false;
>> +			} else if (bitflip_count == 0) {
>> +				/* OOB == all(0xff) */
>> +				page_is_erased = true;
>
>This else-if is still incorrect. You cannot assume that just because the
>OOB is all 0xff that the page is erased.
>
Now this part is most important, where I would like to get more clarity
and feedback before I proceed.
----------------------------
if OOB of an page is == all(0xff)
(a) As per general NAND spec, chip->write_buf() _only_ fills the internal buffers of
 NAND device's with information to be written. Actual write to NAND array cells
 happen only after 'NAND_CMD_PAGEPROG' or 'NAND_CMD_CACHEDPROG' is issued.
 Thus both Main-area and OOB-area are programmed simultaneously inside NAND device.
 if there is any disruption (like power-cut) in on-going  NAND_CMD_PAGEPROG
 cycle, then the data should be assumed to be garbage, and it may have un-stable bits.

(b) if OOB==all(0xff) then there is _no_ ECC stored in OOB to check the read_data
  Hence driver cannot distinguish between genuine data and bit-flips.

Now based on (a) & (b), it's safe to assume that:
if (OOB == all 0xff)
	/* page has no-data | garbage-data*/
Agree ?
(This is where I call it page_is_erased==true, Though I could have used better
variable name as page_has_no_valid_data = true).
----------------------------

And also, driver should return 'total number zero-bits in both Main-area + OOB'
if (0 < 'total number of zero-bits in both OOB + DATA region' < mtd->bitflip_threshold)
	return -EUCLEAN; 
else
	return -EBADMSG;

** However there is a bug in current patch version which I just figured out.
I don't include the number of zero-bits of Main-area, if OOB==all(0xff).
 +			} else if (bitflip_count == 0) {
 +				/* OOB == all(0xff) */
 +				page_is_erased = true;
 +  /* missing */		bitflip_count += count_zero_bits(buf, eccsize,
 +					  		eccstrength, &bitflip_count);
------------------------

>> +			} else {
>> +				/*
>> +				 * OOB has some bits as '0' but
>> +				 * number of zero-bits in OOB < ecc-strength.
>> +				 * It may be erased-page with bit-flips so,
>> +				 * further checks are needed for confirmation
>> +				 */
>> +				/* count zero-bits in DATA region */
>> +				buf = &data[eccsize * i];
>> +				err = count_zero_bits(buf, eccsize,
>> +						 eccstrength, &bitflip_count);
>> +				if (err) {
>> +					/*
>> +					 * total number of zero-bits in OOB
>> +					 * and DATA exceeds ecc-strength
>> +					 */
>> +					page_is_erased = false;
>> +				} else {
>> +					/* number of zero-bits < ecc-strength */
>> +					page_is_erased = true;
>> +				}
>> +			}
>
>This whole block (where you call count_zero_bits() twice) is a
>convoluted and buggy way of just doing the following, AFAIU:
>
>	page_is_erased = !count_zero_bits(read_ecc, ecc->bytes,
>				ecc->strength, &bitflip_count) &&
>			 !count_zero_bits(&data[ecc->size * i], ecc->size,
>			 	ecc->strength, &bitflip_count);
>
>You could modify that to your liking and add a comment, but it's much
>more concise than your version, and from what I can tell, it's actually
>correct.
>
Firstly, In you above statement    's/&&/||'
because as per above statement, if count_zero_bits(oob) == 0, then
my patch assumes 'page_has_no_valid_data = true'.

page_is_erased = !count_zero_bits(read_ecc, ecc->bytes,
				ecc->strength, &bitflip_count) ||
			 !count_zero_bits(&data[ecc->size * i], ecc->size,
			 	ecc->strength, &bitflip_count);


Secondly, You are missing the that here NAND driver is relaxing the limit of
number-of-acceptable bit-flips in an erased-page, because some MLC and 
Toshiba SLC NANDs show bit-flips almost on every second already erased-page.
>> +				if (err) {
>> +					/*
>> +					 * total number of zero-bits in OOB
>> +					 * and DATA exceeds ecc-strength
>> +					 */
>> +					page_is_erased = false;
>> +				} else {
>> +					/* number of zero-bits < ecc-strength */
>> +					page_is_erased = true;
>> +				}

In above patch I can replace ecc->strength with mtd->bitflip_threshold
to make it more user controllable. As bitflip_threshold is configurable via
/sys/class/mtd/mtdX/bitflip_threshold
What is your opinion ?


This was also the basis of my question to Artem [1]

 [1] http://lists.infradead.org/pipermail/linux-mtd/2014-January/051556.html

With regards, pekon

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v7 3/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix erased-page detection for BCHx_HW ECC schemes
  2014-02-12 10:31     ` Gupta, Pekon
@ 2014-03-07 19:11       ` Brian Norris
  0 siblings, 0 replies; 12+ messages in thread
From: Brian Norris @ 2014-03-07 19:11 UTC (permalink / raw)
  To: Gupta, Pekon; +Cc: Stefan Roese, linux-mtd, Balbi, Felipe, Artem Bityutskiy

Hi Pekon,

On Wed, Feb 12, 2014 at 10:31:02AM +0000, Pekon Gupta wrote:
> >From: Brian Norris [mailto:computersforpeace@gmail.com]
> >>On Fri, Jan 17, 2014 at 12:43:14PM +0530, Pekon Gupta wrote:
> >> This patch removes dependency on any marker byte in ecc-layout, to differentiate
> >> between erased-page v/s programeed-page. Instead a page is considered erased if
> >>  (a) all(OOB)  == 0xff, .i.e., number of zero-bits in read_ecc[] == 0
> >>  (b) number of zero-bits in (Data + OOB) is less than ecc-strength
[...]
> >> @@ -1410,24 +1441,47 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
> >>  		}
> >>
> >>  		if (eccflag == 1) {
> >> -			/*
> >> -			 * Set threshold to minimum of 4, half of ecc.strength/2
> >> -			 * to allow max bit flip in byte to 4
> >> -			 */
> >> -			unsigned int threshold = min_t(unsigned int, 4,
> >> -					info->nand.ecc.strength / 2);
> >> +			bitflip_count = 0;
> >> +			/* count zero-bits in OOB region */
> >> +			err = count_zero_bits(read_ecc, eccbytes,
> >> +						 eccstrength, &bitflip_count);
> >> +			if (err) {
> >> +				/*
> >> +				 * number of zero-bits in OOB > ecc-strength
> >> +				 * either un-correctable or programmed-page
> >> +				 */
> >> +				page_is_erased = false;
> >> +			} else if (bitflip_count == 0) {
> >> +				/* OOB == all(0xff) */
> >> +				page_is_erased = true;
> >
> >This else-if is still incorrect. You cannot assume that just because the
> >OOB is all 0xff that the page is erased.
> >
> Now this part is most important, where I would like to get more clarity
> and feedback before I proceed.
> ----------------------------
> if OOB of an page is == all(0xff)
> (a) As per general NAND spec, chip->write_buf() _only_ fills the internal buffers of
>  NAND device's with information to be written. Actual write to NAND array cells
>  happen only after 'NAND_CMD_PAGEPROG' or 'NAND_CMD_CACHEDPROG' is issued.

I'm not really sure why you bring up chip->write_buf(); not all
implementations actually use this. But anyway...

>  Thus both Main-area and OOB-area are programmed simultaneously inside NAND device.
>  if there is any disruption (like power-cut) in on-going  NAND_CMD_PAGEPROG
>  cycle, then the data should be assumed to be garbage, and it may have un-stable bits.

OK. But this is not at all what we're trying to check; we are checking
*only* whether this particular page reads mostly-0xff, due to regular
bitflips on an otherwise unprogrammed page. The "assumed garbage"
conclusion is really irrelevant here.

> (b) if OOB==all(0xff) then there is _no_ ECC stored in OOB to check the read_data

I disagree. Can you guarantee that there is *no* data pattern in which
the matching ECC syndrome bytes are all 0xff?

>   Hence driver cannot distinguish between genuine data and bit-flips.
> 
> Now based on (a) & (b), it's safe to assume that:
> if (OOB == all 0xff)
> 	/* page has no-data | garbage-data*/
> Agree ?

No. But more importantly, why do you want to make this conclusion?
Aren't we *only* trying to make a decision of "is this page
mostly-0xff"?

Or more directly: you are using the above conclusion ("page has garbage
data") to then erase the buffer entirely. This is totally, 100% bogus
because the data area might have junk (partially-programmed page?) but
the OOB could be all 0xff -- and you are now lying to the upper layers,
saying the page is cleanly 0xff!!!

I believe one of the two of us has a very dire misunderstanding here.
Please help me decide which of us this is :)

> (This is where I call it page_is_erased==true, Though I could have used better
> variable name as page_has_no_valid_data = true).
> ----------------------------
> 
> And also, driver should return 'total number zero-bits in both Main-area + OOB'
> if (0 < 'total number of zero-bits in both OOB + DATA region' < mtd->bitflip_threshold)
> 	return -EUCLEAN; 
> else
> 	return -EBADMSG;
> 
> ** However there is a bug in current patch version which I just figured out.
> I don't include the number of zero-bits of Main-area, if OOB==all(0xff).
>  +			} else if (bitflip_count == 0) {
>  +				/* OOB == all(0xff) */
>  +				page_is_erased = true;
>  +  /* missing */		bitflip_count += count_zero_bits(buf, eccsize,
>  +					  		eccstrength, &bitflip_count);
> ------------------------
> 
> >> +			} else {
> >> +				/*
> >> +				 * OOB has some bits as '0' but
> >> +				 * number of zero-bits in OOB < ecc-strength.
> >> +				 * It may be erased-page with bit-flips so,
> >> +				 * further checks are needed for confirmation
> >> +				 */
> >> +				/* count zero-bits in DATA region */
> >> +				buf = &data[eccsize * i];
> >> +				err = count_zero_bits(buf, eccsize,
> >> +						 eccstrength, &bitflip_count);
> >> +				if (err) {
> >> +					/*
> >> +					 * total number of zero-bits in OOB
> >> +					 * and DATA exceeds ecc-strength
> >> +					 */
> >> +					page_is_erased = false;
> >> +				} else {
> >> +					/* number of zero-bits < ecc-strength */
> >> +					page_is_erased = true;
> >> +				}
> >> +			}
> >
> >This whole block (where you call count_zero_bits() twice) is a
> >convoluted and buggy way of just doing the following, AFAIU:
> >
> >	page_is_erased = !count_zero_bits(read_ecc, ecc->bytes,
> >				ecc->strength, &bitflip_count) &&
> >			 !count_zero_bits(&data[ecc->size * i], ecc->size,
> >			 	ecc->strength, &bitflip_count);
> >
> >You could modify that to your liking and add a comment, but it's much
> >more concise than your version, and from what I can tell, it's actually
> >correct.
> >
> Firstly, In you above statement    's/&&/||'

No. I wrote it exactly as I meant it. We need to check both the data
area and the OOB, and if either cause us to exceed the ECC strength,
then the page is not erased.

> because as per above statement, if count_zero_bits(oob) == 0, then
> my patch assumes 'page_has_no_valid_data = true'.

And that assumption is 100% wrong, AIUI.

> page_is_erased = !count_zero_bits(read_ecc, ecc->bytes,
> 				ecc->strength, &bitflip_count) ||
> 			 !count_zero_bits(&data[ecc->size * i], ecc->size,
> 			 	ecc->strength, &bitflip_count);
> 
> 
> Secondly, You are missing the that here NAND driver is relaxing the limit of
> number-of-acceptable bit-flips in an erased-page, because some MLC and 
> Toshiba SLC NANDs show bit-flips almost on every second already erased-page.

No, I'm not missing this. The MTD/NAND layers already take care of the
"number-of-acceptable bit-flips" using the mtd->bitflip_threshold
parameter.

> >> +				if (err) {
> >> +					/*
> >> +					 * total number of zero-bits in OOB
> >> +					 * and DATA exceeds ecc-strength
> >> +					 */
> >> +					page_is_erased = false;
> >> +				} else {
> >> +					/* number of zero-bits < ecc-strength */
> >> +					page_is_erased = true;
> >> +				}
> 
> In above patch I can replace ecc->strength with mtd->bitflip_threshold
> to make it more user controllable. As bitflip_threshold is configurable via
> /sys/class/mtd/mtdX/bitflip_threshold
> What is your opinion ?

The hardware driver should not be comparing with 'bitflip_threshold'.
Comparing against ecc_strength is correct, as we are simulating the
current ECC correction strength.

Brian

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-03-07 19:12 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-17  7:13 [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes Pekon Gupta
2014-01-17  7:13 ` [PATCH v7 1/6] mtd: nand: omap: add field to indicate current ecc-scheme in 'struct omap_nand_info' Pekon Gupta
2014-01-17  7:13 ` [PATCH v7 2/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: rename ambiguous variable 'eccsize' and 'ecc_vector_size' Pekon Gupta
2014-01-17  7:13 ` [PATCH v7 3/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix erased-page detection for BCHx_HW ECC schemes Pekon Gupta
2014-02-11 21:34   ` Brian Norris
2014-02-12 10:31     ` Gupta, Pekon
2014-03-07 19:11       ` Brian Norris
2014-01-17  7:13 ` [PATCH v7 4/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: remove redundant bit-flip counting for erased-page Pekon Gupta
2014-01-17  7:13 ` [PATCH v7 5/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: cleanup for future enhancements Pekon Gupta
2014-01-17  7:13 ` [PATCH v7 6/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix programmed-page bit-flip correction logic Pekon Gupta
2014-01-29 13:18 ` [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes Stefan Roese
2014-02-04 11:37 ` Gupta, Pekon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.