All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v7] mtd: gpmi: Deal with bitflips in erased regions
@ 2014-01-03 21:27 Elie De Brauwer
  2014-01-03 21:27 ` Elie De Brauwer
  2014-01-09  7:11 ` [PATCH] mtd: gpmi: fix the bitflips for erased page Huang Shijie
  0 siblings, 2 replies; 7+ messages in thread
From: Elie De Brauwer @ 2014-01-03 21:27 UTC (permalink / raw)
  To: b32955, dwmw2, dedekind1, computersforpeace, shijie8
  Cc: Elie De Brauwer, linux-mtd

Hello all,

This is v7 of my patch which attempts to correct bitflips in erased regions.
Tests on hardware have shown that the ALLONES bit used in earlier versions
to distinguish between fast/slow path is not behaving as expected when 
also using the erase threshold. This patch takes a step back and always 
performs the correction when an erased region is detected.

Feedback is welcomed.

Elie De Brauwer (1):
  mtd: gpmi: Deal with bitflips in erased regions

 drivers/mtd/nand/gpmi-nand/bch-regs.h  |  1 +
 drivers/mtd/nand/gpmi-nand/gpmi-lib.c  |  9 +++++++
 drivers/mtd/nand/gpmi-nand/gpmi-nand.c | 43 +++++++++++++++++++++++++++++++---
 3 files changed, 50 insertions(+), 3 deletions(-)

-- 
1.8.5.2

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v7] mtd: gpmi: Deal with bitflips in erased regions
  2014-01-03 21:27 [PATCH v7] mtd: gpmi: Deal with bitflips in erased regions Elie De Brauwer
@ 2014-01-03 21:27 ` Elie De Brauwer
  2014-01-04 12:50   ` Huang Shijie
  2014-01-07  2:13   ` Huang Shijie
  2014-01-09  7:11 ` [PATCH] mtd: gpmi: fix the bitflips for erased page Huang Shijie
  1 sibling, 2 replies; 7+ messages in thread
From: Elie De Brauwer @ 2014-01-03 21:27 UTC (permalink / raw)
  To: b32955, dwmw2, dedekind1, computersforpeace, shijie8
  Cc: Elie De Brauwer, linux-mtd

The BCH block typically used with a GPMI block on an i.MX28/i.MX6 is only
able to correct bitflips on data actually streamed through the block.
When erasing a block the data does not stream through the BCH block
and therefore no ECC data is written to the NAND chip. This causes
gpmi_ecc_read_page to return failure as soon as a single non-1-bit is
found in an erased page. Typically causing problems at higher levels
(ubifs corrupted empty space warnings). This problem was also observed
when using SLC NAND devices.

This patch configures the BCH block to mark a block as 'erased' if
not too much bitflips are found (by setting the erase threshold). A
consequence of this is that whenever an erased page is read, the
number of bitflips will be counted and corrected in software,
allowing the upper layers to take proper actions.

Signed-off-by: Elie De Brauwer <eliedebrauwer@gmail.com>
Acked-by: Peter Korsgaard <peter@korsgaard.com>
Acked-by: Huang Shijie <b32955@freescale.com>
---
 drivers/mtd/nand/gpmi-nand/bch-regs.h  |  1 +
 drivers/mtd/nand/gpmi-nand/gpmi-lib.c  |  9 +++++++
 drivers/mtd/nand/gpmi-nand/gpmi-nand.c | 43 +++++++++++++++++++++++++++++++---
 3 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/drivers/mtd/nand/gpmi-nand/bch-regs.h b/drivers/mtd/nand/gpmi-nand/bch-regs.h
index 588f537..b2104de 100644
--- a/drivers/mtd/nand/gpmi-nand/bch-regs.h
+++ b/drivers/mtd/nand/gpmi-nand/bch-regs.h
@@ -31,6 +31,7 @@
 
 #define HW_BCH_STATUS0				0x00000010
 #define HW_BCH_MODE				0x00000020
+#define BM_BCH_MODE_ERASE_THRESHOLD_MASK	0xff
 #define HW_BCH_ENCODEPTR			0x00000030
 #define HW_BCH_DATAPTR				0x00000040
 #define HW_BCH_METAPTR				0x00000050
diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-lib.c b/drivers/mtd/nand/gpmi-nand/gpmi-lib.c
index dd1df60..b6a7aa8 100644
--- a/drivers/mtd/nand/gpmi-nand/gpmi-lib.c
+++ b/drivers/mtd/nand/gpmi-nand/gpmi-lib.c
@@ -256,6 +256,7 @@ int bch_set_geometry(struct gpmi_nand_data *this)
 	unsigned int ecc_strength;
 	unsigned int page_size;
 	unsigned int gf_len;
+	unsigned int erase_threshold;
 	int ret;
 
 	if (common_nfc_set_geometry(this))
@@ -298,6 +299,14 @@ int bch_set_geometry(struct gpmi_nand_data *this)
 			| BF_BCH_FLASH0LAYOUT1_DATAN_SIZE(block_size, this),
 			r->bch_regs + HW_BCH_FLASH0LAYOUT1);
 
+	/* Set the tolerance for bitflips when reading erased blocks. */
+	erase_threshold = gf_len / 2;
+	if (erase_threshold > bch_geo->ecc_strength)
+		erase_threshold = bch_geo->ecc_strength;
+
+	writel(erase_threshold & BM_BCH_MODE_ERASE_THRESHOLD_MASK,
+		r->bch_regs + HW_BCH_MODE);
+
 	/* Set *all* chip selects to use layout 0. */
 	writel(0, r->bch_regs + HW_BCH_LAYOUTSELECT);
 
diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
index e2f5820..eac6714 100644
--- a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
+++ b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
@@ -958,6 +958,30 @@ static void block_mark_swapping(struct gpmi_nand_data *this,
 	p[1] = (p[1] & mask) | (from_oob >> (8 - bit));
 }
 
+/*
+ * Count the number of 0 bits in a supposed to be
+ * erased region and correct them. Return the number
+ * of bitflips or zero when the region was correct.
+ */
+static unsigned int erased_sector_bitflips(unsigned char *data,
+					unsigned int chunk,
+					struct bch_geometry *geo)
+{
+	unsigned int flip_bits = 0;
+	int i;
+	int base = geo->ecc_chunk_size * chunk;
+
+	/* Count bitflips */
+	for (i = 0; i < geo->ecc_chunk_size; i++)
+		flip_bits += hweight8(~data[base + i]);
+
+	/* Correct bitflips by 0xFF'ing this chunk. */
+	if (flip_bits)
+		memset(&data[base], 0xFF, geo->ecc_chunk_size);
+
+	return flip_bits;
+}
+
 static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
 				uint8_t *buf, int oob_required, int page)
 {
@@ -969,6 +993,7 @@ static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
 	dma_addr_t    auxiliary_phys;
 	unsigned int  i;
 	unsigned char *status;
+	unsigned int  flips;
 	unsigned int  max_bitflips = 0;
 	int           ret;
 
@@ -1003,15 +1028,27 @@ static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
 	status = auxiliary_virt + nfc_geo->auxiliary_status_offset;
 
 	for (i = 0; i < nfc_geo->ecc_chunk_count; i++, status++) {
-		if ((*status == STATUS_GOOD) || (*status == STATUS_ERASED))
+		if (*status == STATUS_GOOD)
 			continue;
 
 		if (*status == STATUS_UNCORRECTABLE) {
 			mtd->ecc_stats.failed++;
 			continue;
 		}
-		mtd->ecc_stats.corrected += *status;
-		max_bitflips = max_t(unsigned int, max_bitflips, *status);
+
+		/* 
+		 * The number of bitflips are either counted in software
+		 * in case of an erased chunk or otherwise reported by
+		 * the BCH block.
+		 */
+		if (*status == STATUS_ERASED)
+			flips = erased_sector_bitflips(payload_virt, i,
+							       nfc_geo);
+		else
+			flips = *status;
+
+		mtd->ecc_stats.corrected += flips;
+		max_bitflips = max_t(unsigned int, max_bitflips, flips);
 	}
 
 	if (oob_required) {
-- 
1.8.5.2

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v7] mtd: gpmi: Deal with bitflips in erased regions
  2014-01-03 21:27 ` Elie De Brauwer
@ 2014-01-04 12:50   ` Huang Shijie
  2014-01-07  2:13   ` Huang Shijie
  1 sibling, 0 replies; 7+ messages in thread
From: Huang Shijie @ 2014-01-04 12:50 UTC (permalink / raw)
  To: Elie De Brauwer; +Cc: b32955, computersforpeace, dwmw2, linux-mtd, dedekind1

On Fri, Jan 03, 2014 at 10:27:19PM +0100, Elie De Brauwer wrote:
> The BCH block typically used with a GPMI block on an i.MX28/i.MX6 is only
> able to correct bitflips on data actually streamed through the block.
> When erasing a block the data does not stream through the BCH block
> and therefore no ECC data is written to the NAND chip. This causes
> gpmi_ecc_read_page to return failure as soon as a single non-1-bit is
> found in an erased page. Typically causing problems at higher levels
> (ubifs corrupted empty space warnings). This problem was also observed
> when using SLC NAND devices.
> 
> This patch configures the BCH block to mark a block as 'erased' if
> not too much bitflips are found (by setting the erase threshold). A
> consequence of this is that whenever an erased page is read, the
> number of bitflips will be counted and corrected in software,
> allowing the upper layers to take proper actions.
> 
> Signed-off-by: Elie De Brauwer <eliedebrauwer@gmail.com>
> Acked-by: Peter Korsgaard <peter@korsgaard.com>
> Acked-by: Huang Shijie <b32955@freescale.com>

I am okay with this patch.

thanks
Huang Shijie

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v7] mtd: gpmi: Deal with bitflips in erased regions
  2014-01-03 21:27 ` Elie De Brauwer
  2014-01-04 12:50   ` Huang Shijie
@ 2014-01-07  2:13   ` Huang Shijie
  1 sibling, 0 replies; 7+ messages in thread
From: Huang Shijie @ 2014-01-07  2:13 UTC (permalink / raw)
  To: Elie De Brauwer; +Cc: shijie8, computersforpeace, dwmw2, linux-mtd, dedekind1

On Fri, Jan 03, 2014 at 10:27:19PM +0100, Elie De Brauwer wrote:
> 
> @@ -1003,15 +1028,27 @@ static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
>  	status = auxiliary_virt + nfc_geo->auxiliary_status_offset;
>  
>  	for (i = 0; i < nfc_geo->ecc_chunk_count; i++, status++) {
> -		if ((*status == STATUS_GOOD) || (*status == STATUS_ERASED))
> +		if (*status == STATUS_GOOD)
>  			continue;
>  
>  		if (*status == STATUS_UNCORRECTABLE) {
>  			mtd->ecc_stats.failed++;
>  			continue;
>  		}
> -		mtd->ecc_stats.corrected += *status;
> -		max_bitflips = max_t(unsigned int, max_bitflips, *status);
> +
> +		/* 

Hi Brian:
  Please do not forget to remove the whitespace at the end of the above line
  when you merge this patch.

thanks
Huang Shijie


> +		 * The number of bitflips are either counted in software
> +		 * in case of an erased chunk or otherwise reported by
> +		 * the BCH block.
> +		 */
> +		if (*status == STATUS_ERASED)
> +			flips = erased_sector_bitflips(payload_virt, i,
> +							       nfc_geo);
> +		else
> +			flips = *status;
> +

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] mtd: gpmi: fix the bitflips for erased page
  2014-01-03 21:27 [PATCH v7] mtd: gpmi: Deal with bitflips in erased regions Elie De Brauwer
  2014-01-03 21:27 ` Elie De Brauwer
@ 2014-01-09  7:11 ` Huang Shijie
  2014-01-09 20:08   ` Bill Pringlemeir
  1 sibling, 1 reply; 7+ messages in thread
From: Huang Shijie @ 2014-01-09  7:11 UTC (permalink / raw)
  To: dwmw2; +Cc: linux-mtd, computersforpeace, eliedebrauwer, Huang Shijie

We may meet the bitflips in reading an erased page(contains all 0xFF),
this may causes the UBIFS corrupt, please see the log from Elie:

-----------------------------------------------------------------
[    3.831323] UBI warning: ubi_io_read: error -74 (ECC error) while reading 16384 bytes from PEB 443:245760, read only 16384 bytes, retry
[    3.845026] UBI warning: ubi_io_read: error -74 (ECC error) while reading 16384 bytes from PEB 443:245760, read only 16384 bytes, retry
[    3.858710] UBI warning: ubi_io_read: error -74 (ECC error) while reading 16384 bytes from PEB 443:245760, read only 16384 bytes, retry
[    3.872408] UBI error: ubi_io_read: error -74 (ECC error) while reading 16384 bytes from PEB 443:245760, read 16384 bytes
...
[    4.011529] UBIFS error (pid 36): ubifs_recover_leb: corrupt empty space LEB 27:237568, corruption starts at 9815
[    4.021897] UBIFS error (pid 36): ubifs_scanned_corruption: corruption at LEB 27:247383
[    4.030000] UBIFS error (pid 36): ubifs_scanned_corruption: first 6569 bytes from LEB 27:247383
-----------------------------------------------------------------

This patch does a check for the uncorrectable failure in the following steps:

   [0] set the threshold.
       The threshold is set based on the truth:
         "A single 0 bit will lead to gf_len(13 or 14) bits 0 after the BCH
          do the ECC."

       For the sake of safe, we will set the threshold with half the gf_len, and
       do not make it bigger the ECC strength.

   [1] count the bitflips of the current ECC chunk, assume it is N.

   [2] if the (N < threshold) is true, we continue to read out the page with
       ECC disabled. and we count the bitflips again, assume it is N2.

   [3] if the (N2 < threshold) is true again, we can regard this is a erased
       page.

   [4] if the [3] fails, we can regard this is a page filled with the '0xFF'
       data.

Reported-by: Elie De Brauwer <eliedebrauwer@gmail.com>
Signed-off-by: Huang Shijie <b32955@freescale.com>
---
Hi Elie:
   could you please test this patch.

   thx.
---
 drivers/mtd/nand/gpmi-nand/gpmi-nand.c |   44 ++++++++++++++++++++++++++++++++
 1 files changed, 44 insertions(+), 0 deletions(-)

diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
index e2f5820..e39443e 100644
--- a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
+++ b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
@@ -958,6 +958,45 @@ static void block_mark_swapping(struct gpmi_nand_data *this,
 	p[1] = (p[1] & mask) | (from_oob >> (8 - bit));
 }
 
+static bool gpmi_erased_check(struct gpmi_nand_data *this,
+			unsigned char *data, unsigned int chunk, int page)
+{
+	struct nand_chip *chip = &this->nand;
+	struct mtd_info	*mtd = &this->mtd;
+	struct bch_geometry *geo = &this->bch_geometry;
+	int base = geo->ecc_chunk_size * chunk;
+	unsigned int flip_bits = 0;
+	uint8_t *buf = this->data_buffer_dma;
+	unsigned int threshold;
+	int i;
+
+	threshold = geo->gf_len / 2;
+	if (threshold > geo->ecc_strength)
+		threshold = geo->ecc_strength;
+
+	/* Count bitflips */
+	for (i = 0; i < geo->ecc_chunk_size; i++)
+		flip_bits += hweight8(~data[base + i]);
+
+	if (flip_bits < threshold) {
+		dev_dbg(this->dev, "check for the erased page:%d, chunk:%d\n",
+				page, chunk);
+		flip_bits = 0;
+		memset(buf, 0, geo->payload_size);
+
+		/* read out the page without ECC enabled, and check it again. */
+		chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page);
+		chip->read_buf(mtd, buf, geo->payload_size);
+
+		for (i = 0; i < geo->payload_size; i++)
+			flip_bits += hweight8(~buf[i]);
+
+		if (flip_bits < threshold)
+			return true;
+	}
+	return false;
+}
+
 static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
 				uint8_t *buf, int oob_required, int page)
 {
@@ -1007,6 +1046,11 @@ static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
 			continue;
 
 		if (*status == STATUS_UNCORRECTABLE) {
+			if (gpmi_erased_check(this, payload_virt, i, page)) {
+				memset(payload_virt, 0xff,
+						nfc_geo->payload_size);
+				break;
+			}
 			mtd->ecc_stats.failed++;
 			continue;
 		}
-- 
1.7.2.rc3

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] mtd: gpmi: fix the bitflips for erased page
  2014-01-09  7:11 ` [PATCH] mtd: gpmi: fix the bitflips for erased page Huang Shijie
@ 2014-01-09 20:08   ` Bill Pringlemeir
  2014-01-10  1:53     ` Huang Shijie
  0 siblings, 1 reply; 7+ messages in thread
From: Bill Pringlemeir @ 2014-01-09 20:08 UTC (permalink / raw)
  To: Huang Shijie; +Cc: linux-mtd

On  9 Jan 2014, b32955 at freescale.com wrote:

> ---
> drivers/mtd/nand/gpmi-nand/gpmi-nand.c | 44 ++++++++++++++++++++++++++++++++
> 1 files changed, 44 insertions(+), 0 deletions(-)

[snip]

> +	if (flip_bits < threshold) {

Isn't it "if (flip_bits <= threshold)"?

> +		dev_dbg(this->dev, "check for the erased page:%d, chunk:%d\n",
> +				page, chunk);
> +		flip_bits = 0;
> +		memset(buf, 0, geo->payload_size);
> +
> +		/* read out the page without ECC enabled, and check it again. */
> +		chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page);
> +		chip->read_buf(mtd, buf, geo->payload_size);
> +
> +		for (i = 0; i < geo->payload_size; i++)
> +			flip_bits += hweight8(~buf[i]);
> +
> +		if (flip_bits < threshold)

Perhaps the same?  At least 'strength' means the number of correctable
bit flips; or maybe you have multiple buffers?  It is just the boundary
of when an erase page is about to go bad, which might be hard to see
through testing?  I have the same issue with the Vybrid NFC and ECC, so
I maybe copying your code.

Thanks,
Bill Pringlemeir.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mtd: gpmi: fix the bitflips for erased page
  2014-01-09 20:08   ` Bill Pringlemeir
@ 2014-01-10  1:53     ` Huang Shijie
  0 siblings, 0 replies; 7+ messages in thread
From: Huang Shijie @ 2014-01-10  1:53 UTC (permalink / raw)
  To: Bill Pringlemeir; +Cc: linux-mtd

On Thu, Jan 09, 2014 at 03:08:08PM -0500, Bill Pringlemeir wrote:
> On  9 Jan 2014, b32955 at freescale.com wrote:
> 
> > ---
> > drivers/mtd/nand/gpmi-nand/gpmi-nand.c | 44 ++++++++++++++++++++++++++++++++
> > 1 files changed, 44 insertions(+), 0 deletions(-)
> 
> [snip]
> 
> > +	if (flip_bits < threshold) {
> 
> Isn't it "if (flip_bits <= threshold)"?
thanks. 
I agree the (flip_bits <= threshold) is better.

Huang Shijie

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2014-01-10  2:28 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-03 21:27 [PATCH v7] mtd: gpmi: Deal with bitflips in erased regions Elie De Brauwer
2014-01-03 21:27 ` Elie De Brauwer
2014-01-04 12:50   ` Huang Shijie
2014-01-07  2:13   ` Huang Shijie
2014-01-09  7:11 ` [PATCH] mtd: gpmi: fix the bitflips for erased page Huang Shijie
2014-01-09 20:08   ` Bill Pringlemeir
2014-01-10  1:53     ` Huang Shijie

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.