All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mtd: gpmi: properly handle bitflips in erased pages
@ 2014-12-01 19:12 ` Boris Brezillon
  0 siblings, 0 replies; 18+ messages in thread
From: Boris Brezillon @ 2014-12-01 19:12 UTC (permalink / raw)
  To: David Woodhouse, Brian Norris, linux-mtd
  Cc: Huang Shijie, linux-kernel, linux-arm-kernel, Mike Voytovich,
	Roy Lee, Boris Brezillon

The hardware ECC engine embedded in freescale SoCs can fix a few bitflips
in written pages.
In the other hand, erased pages are filled with ones, and, as the ECC
engine does not generate all one ECC bits for this pattern, it considers
an empty page as uncorrectable (too many bitflips to be  corrected).

To address this, the NAND controller test for empty page pattern (page
filled with 0xff) before enabling the ECC engine.
While this work in most cases, it fails when at least one bit flip occurred
in an erased page (because the controller consider it as not empty/erased).

This patch read the NAND page in raw mode if the controller has returned
an UNCORRECTABLE status, and, if the page looks like an empty page with a
few bitflips (less than the ECC strength), fixes those bitflips instead of
returning an error.

Reported-by: Elie De Brauwer <eliedebrauwer@gmail.com>
Signed-off-by: Huang Shijie <shijie8@gmail.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
---
Hi Brian, Huang,

I'm reviving the 'fix bitflips in erased pages' patch (I've experienced such
bitflips on an SLC NAND).

I've mixed both of your implementations to make this patch, hence the SoB tags
(maybe I should even keep one of you in author...).

Brian, I really like the idea of having a generic implementation for this
feature (using read_page_raw) as you suggested here [1], but this implies
having a temporary buffer to store the page read in raw mode and keep the page
read in normal (HW ECC engine eanbled) mode, and I'm not sure we want to
allocate more buffers than we already have.

Huang, I remember you wanted to limit the bitflips strength on erased pages
compared to the normal ECC strength.
This version just consider it can fix up to ECC strength bits, but I can change
that to something else (ecc_strength / 2 ?).

Feel free to suggest any alternative, to this implementation.

Best Regards,

Boris

[1]http://article.gmane.org/gmane.linux.drivers.mtd/52183/raw

 drivers/mtd/nand/gpmi-nand/gpmi-nand.c | 108 ++++++++++++++++++++++++++++++++-
 1 file changed, 106 insertions(+), 2 deletions(-)

diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
index 4f3851a..647b28b 100644
--- a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
+++ b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
@@ -991,6 +991,69 @@ static void block_mark_swapping(struct gpmi_nand_data *this,
 	p[1] = (p[1] & mask) | (from_oob >> (8 - bit));
 }
 
+/*
+ * Check a page to see if it is erased (w/ bitflips) after an uncorrectable ECC
+ * error
+ *
+ * On a real error, return a negative error code (-EBADMSG for ECC error).
+ * Otherwise, fill chunk with 0xff and return the number of corrected
+ * bitflips.
+ *
+ */
+static int gpmi_verify_erased_page(struct gpmi_nand_data *this,
+				   unsigned int chunk)
+{
+	struct bch_geometry *nfc_geo = &this->bch_geometry;
+	int eccsize = nfc_geo->ecc_chunk_size;
+	int eccbits = nfc_geo->ecc_strength * nfc_geo->gf_len;
+	unsigned int bitflips = 0;
+	unsigned char *data = this->raw_buffer;
+	unsigned int threshold = nfc_geo->ecc_strength;
+	int data_nbits, bit_off;
+	int i;
+
+	data_nbits = (eccsize * 8) + eccbits;
+	bit_off = chunk * data_nbits;
+
+	/* First chunk also embeds metadata */
+	if (!chunk)
+		data_nbits += (nfc_geo->metadata_size * 8);
+	else
+		bit_off += (nfc_geo->metadata_size * 8);
+
+	data = this->raw_buffer + (bit_off % 8);
+	if (bit_off % 8) {
+		bitflips += hweight8(~(*data | GENMASK(bit_off - 1, 0)));
+		if (bitflips > threshold)
+			return -EBADMSG;
+
+		data_nbits -= bit_off;
+	}
+
+	for (i = 0; i < data_nbits / 8; i++) {
+		bitflips += hweight8(~data[i]);
+		if (bitflips > threshold)
+			return -EBADMSG;
+	}
+
+	data_nbits %= 8;
+	data += data_nbits / 8;
+	if (data_nbits) {
+		bitflips += hweight8(~(*data | GENMASK(7, data_nbits)));
+		if (bitflips > threshold)
+			return -EBADMSG;
+	}
+
+	memset(this->payload_virt + (chunk * eccsize), 0xff, eccsize);
+
+	if (!chunk)
+		memset(this->auxiliary_virt, 0xff, nfc_geo->metadata_size);
+
+	pr_debug("correcting %u bitflips in erased page\n", bitflips);
+
+	return bitflips;
+}
+
 static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
 				uint8_t *buf, int oob_required, int page)
 {
@@ -1036,13 +1099,54 @@ static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
 	status = auxiliary_virt + nfc_geo->auxiliary_status_offset;
 
 	for (i = 0; i < nfc_geo->ecc_chunk_count; i++, status++) {
+		bool raw_read_done = false;
+
 		if ((*status == STATUS_GOOD) || (*status == STATUS_ERASED))
 			continue;
 
 		if (*status == STATUS_UNCORRECTABLE) {
-			mtd->ecc_stats.failed++;
-			continue;
+			if (!raw_read_done) {
+				bool direct_dma_map_ok;
+
+				/*
+				 * Reading a page will override the current
+				 * direct_dma_map_ok field.
+				 * Save direct_dma_map_ok and restore it after
+				 * raw page read has completed.
+				 */
+				direct_dma_map_ok = this->direct_dma_map_ok;
+				chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page);
+				chip->read_buf(mtd, this->raw_buffer,
+					       mtd->writesize + mtd->oobsize);
+				this->direct_dma_map_ok = direct_dma_map_ok;
+
+				/*
+				 * Swap bad block marker if needed.
+				 */
+				if (this->swap_block_mark) {
+					u8 *raw_buf = this->raw_buffer;
+					u8 swap = raw_buf[0];
+
+					raw_buf[0] = raw_buf[mtd->writesize];
+					raw_buf[mtd->writesize] = swap;
+				}
+
+				raw_read_done = true;
+			}
+
+			/*
+			 * Analyse the current chunk to handle the "bitflips in
+			 * erased page" case.
+			 */
+			ret = gpmi_verify_erased_page(this, i);
+			if (ret < 0) {
+				mtd->ecc_stats.failed++;
+				continue;
+			}
+
+			*status = ret;
 		}
+
 		mtd->ecc_stats.corrected += *status;
 		max_bitflips = max_t(unsigned int, max_bitflips, *status);
 	}
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH] mtd: gpmi: properly handle bitflips in erased pages
@ 2014-12-01 19:12 ` Boris Brezillon
  0 siblings, 0 replies; 18+ messages in thread
From: Boris Brezillon @ 2014-12-01 19:12 UTC (permalink / raw)
  To: David Woodhouse, Brian Norris, linux-mtd
  Cc: Boris Brezillon, Mike Voytovich, linux-kernel, Roy Lee,
	Huang Shijie, linux-arm-kernel

The hardware ECC engine embedded in freescale SoCs can fix a few bitflips
in written pages.
In the other hand, erased pages are filled with ones, and, as the ECC
engine does not generate all one ECC bits for this pattern, it considers
an empty page as uncorrectable (too many bitflips to be  corrected).

To address this, the NAND controller test for empty page pattern (page
filled with 0xff) before enabling the ECC engine.
While this work in most cases, it fails when at least one bit flip occurred
in an erased page (because the controller consider it as not empty/erased).

This patch read the NAND page in raw mode if the controller has returned
an UNCORRECTABLE status, and, if the page looks like an empty page with a
few bitflips (less than the ECC strength), fixes those bitflips instead of
returning an error.

Reported-by: Elie De Brauwer <eliedebrauwer@gmail.com>
Signed-off-by: Huang Shijie <shijie8@gmail.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
---
Hi Brian, Huang,

I'm reviving the 'fix bitflips in erased pages' patch (I've experienced such
bitflips on an SLC NAND).

I've mixed both of your implementations to make this patch, hence the SoB tags
(maybe I should even keep one of you in author...).

Brian, I really like the idea of having a generic implementation for this
feature (using read_page_raw) as you suggested here [1], but this implies
having a temporary buffer to store the page read in raw mode and keep the page
read in normal (HW ECC engine eanbled) mode, and I'm not sure we want to
allocate more buffers than we already have.

Huang, I remember you wanted to limit the bitflips strength on erased pages
compared to the normal ECC strength.
This version just consider it can fix up to ECC strength bits, but I can change
that to something else (ecc_strength / 2 ?).

Feel free to suggest any alternative, to this implementation.

Best Regards,

Boris

[1]http://article.gmane.org/gmane.linux.drivers.mtd/52183/raw

 drivers/mtd/nand/gpmi-nand/gpmi-nand.c | 108 ++++++++++++++++++++++++++++++++-
 1 file changed, 106 insertions(+), 2 deletions(-)

diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
index 4f3851a..647b28b 100644
--- a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
+++ b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
@@ -991,6 +991,69 @@ static void block_mark_swapping(struct gpmi_nand_data *this,
 	p[1] = (p[1] & mask) | (from_oob >> (8 - bit));
 }
 
+/*
+ * Check a page to see if it is erased (w/ bitflips) after an uncorrectable ECC
+ * error
+ *
+ * On a real error, return a negative error code (-EBADMSG for ECC error).
+ * Otherwise, fill chunk with 0xff and return the number of corrected
+ * bitflips.
+ *
+ */
+static int gpmi_verify_erased_page(struct gpmi_nand_data *this,
+				   unsigned int chunk)
+{
+	struct bch_geometry *nfc_geo = &this->bch_geometry;
+	int eccsize = nfc_geo->ecc_chunk_size;
+	int eccbits = nfc_geo->ecc_strength * nfc_geo->gf_len;
+	unsigned int bitflips = 0;
+	unsigned char *data = this->raw_buffer;
+	unsigned int threshold = nfc_geo->ecc_strength;
+	int data_nbits, bit_off;
+	int i;
+
+	data_nbits = (eccsize * 8) + eccbits;
+	bit_off = chunk * data_nbits;
+
+	/* First chunk also embeds metadata */
+	if (!chunk)
+		data_nbits += (nfc_geo->metadata_size * 8);
+	else
+		bit_off += (nfc_geo->metadata_size * 8);
+
+	data = this->raw_buffer + (bit_off % 8);
+	if (bit_off % 8) {
+		bitflips += hweight8(~(*data | GENMASK(bit_off - 1, 0)));
+		if (bitflips > threshold)
+			return -EBADMSG;
+
+		data_nbits -= bit_off;
+	}
+
+	for (i = 0; i < data_nbits / 8; i++) {
+		bitflips += hweight8(~data[i]);
+		if (bitflips > threshold)
+			return -EBADMSG;
+	}
+
+	data_nbits %= 8;
+	data += data_nbits / 8;
+	if (data_nbits) {
+		bitflips += hweight8(~(*data | GENMASK(7, data_nbits)));
+		if (bitflips > threshold)
+			return -EBADMSG;
+	}
+
+	memset(this->payload_virt + (chunk * eccsize), 0xff, eccsize);
+
+	if (!chunk)
+		memset(this->auxiliary_virt, 0xff, nfc_geo->metadata_size);
+
+	pr_debug("correcting %u bitflips in erased page\n", bitflips);
+
+	return bitflips;
+}
+
 static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
 				uint8_t *buf, int oob_required, int page)
 {
@@ -1036,13 +1099,54 @@ static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
 	status = auxiliary_virt + nfc_geo->auxiliary_status_offset;
 
 	for (i = 0; i < nfc_geo->ecc_chunk_count; i++, status++) {
+		bool raw_read_done = false;
+
 		if ((*status == STATUS_GOOD) || (*status == STATUS_ERASED))
 			continue;
 
 		if (*status == STATUS_UNCORRECTABLE) {
-			mtd->ecc_stats.failed++;
-			continue;
+			if (!raw_read_done) {
+				bool direct_dma_map_ok;
+
+				/*
+				 * Reading a page will override the current
+				 * direct_dma_map_ok field.
+				 * Save direct_dma_map_ok and restore it after
+				 * raw page read has completed.
+				 */
+				direct_dma_map_ok = this->direct_dma_map_ok;
+				chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page);
+				chip->read_buf(mtd, this->raw_buffer,
+					       mtd->writesize + mtd->oobsize);
+				this->direct_dma_map_ok = direct_dma_map_ok;
+
+				/*
+				 * Swap bad block marker if needed.
+				 */
+				if (this->swap_block_mark) {
+					u8 *raw_buf = this->raw_buffer;
+					u8 swap = raw_buf[0];
+
+					raw_buf[0] = raw_buf[mtd->writesize];
+					raw_buf[mtd->writesize] = swap;
+				}
+
+				raw_read_done = true;
+			}
+
+			/*
+			 * Analyse the current chunk to handle the "bitflips in
+			 * erased page" case.
+			 */
+			ret = gpmi_verify_erased_page(this, i);
+			if (ret < 0) {
+				mtd->ecc_stats.failed++;
+				continue;
+			}
+
+			*status = ret;
 		}
+
 		mtd->ecc_stats.corrected += *status;
 		max_bitflips = max_t(unsigned int, max_bitflips, *status);
 	}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH] mtd: gpmi: properly handle bitflips in erased pages
@ 2014-12-01 19:12 ` Boris Brezillon
  0 siblings, 0 replies; 18+ messages in thread
From: Boris Brezillon @ 2014-12-01 19:12 UTC (permalink / raw)
  To: linux-arm-kernel

The hardware ECC engine embedded in freescale SoCs can fix a few bitflips
in written pages.
In the other hand, erased pages are filled with ones, and, as the ECC
engine does not generate all one ECC bits for this pattern, it considers
an empty page as uncorrectable (too many bitflips to be  corrected).

To address this, the NAND controller test for empty page pattern (page
filled with 0xff) before enabling the ECC engine.
While this work in most cases, it fails when at least one bit flip occurred
in an erased page (because the controller consider it as not empty/erased).

This patch read the NAND page in raw mode if the controller has returned
an UNCORRECTABLE status, and, if the page looks like an empty page with a
few bitflips (less than the ECC strength), fixes those bitflips instead of
returning an error.

Reported-by: Elie De Brauwer <eliedebrauwer@gmail.com>
Signed-off-by: Huang Shijie <shijie8@gmail.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
---
Hi Brian, Huang,

I'm reviving the 'fix bitflips in erased pages' patch (I've experienced such
bitflips on an SLC NAND).

I've mixed both of your implementations to make this patch, hence the SoB tags
(maybe I should even keep one of you in author...).

Brian, I really like the idea of having a generic implementation for this
feature (using read_page_raw) as you suggested here [1], but this implies
having a temporary buffer to store the page read in raw mode and keep the page
read in normal (HW ECC engine eanbled) mode, and I'm not sure we want to
allocate more buffers than we already have.

Huang, I remember you wanted to limit the bitflips strength on erased pages
compared to the normal ECC strength.
This version just consider it can fix up to ECC strength bits, but I can change
that to something else (ecc_strength / 2 ?).

Feel free to suggest any alternative, to this implementation.

Best Regards,

Boris

[1]http://article.gmane.org/gmane.linux.drivers.mtd/52183/raw

 drivers/mtd/nand/gpmi-nand/gpmi-nand.c | 108 ++++++++++++++++++++++++++++++++-
 1 file changed, 106 insertions(+), 2 deletions(-)

diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
index 4f3851a..647b28b 100644
--- a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
+++ b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
@@ -991,6 +991,69 @@ static void block_mark_swapping(struct gpmi_nand_data *this,
 	p[1] = (p[1] & mask) | (from_oob >> (8 - bit));
 }
 
+/*
+ * Check a page to see if it is erased (w/ bitflips) after an uncorrectable ECC
+ * error
+ *
+ * On a real error, return a negative error code (-EBADMSG for ECC error).
+ * Otherwise, fill chunk with 0xff and return the number of corrected
+ * bitflips.
+ *
+ */
+static int gpmi_verify_erased_page(struct gpmi_nand_data *this,
+				   unsigned int chunk)
+{
+	struct bch_geometry *nfc_geo = &this->bch_geometry;
+	int eccsize = nfc_geo->ecc_chunk_size;
+	int eccbits = nfc_geo->ecc_strength * nfc_geo->gf_len;
+	unsigned int bitflips = 0;
+	unsigned char *data = this->raw_buffer;
+	unsigned int threshold = nfc_geo->ecc_strength;
+	int data_nbits, bit_off;
+	int i;
+
+	data_nbits = (eccsize * 8) + eccbits;
+	bit_off = chunk * data_nbits;
+
+	/* First chunk also embeds metadata */
+	if (!chunk)
+		data_nbits += (nfc_geo->metadata_size * 8);
+	else
+		bit_off += (nfc_geo->metadata_size * 8);
+
+	data = this->raw_buffer + (bit_off % 8);
+	if (bit_off % 8) {
+		bitflips += hweight8(~(*data | GENMASK(bit_off - 1, 0)));
+		if (bitflips > threshold)
+			return -EBADMSG;
+
+		data_nbits -= bit_off;
+	}
+
+	for (i = 0; i < data_nbits / 8; i++) {
+		bitflips += hweight8(~data[i]);
+		if (bitflips > threshold)
+			return -EBADMSG;
+	}
+
+	data_nbits %= 8;
+	data += data_nbits / 8;
+	if (data_nbits) {
+		bitflips += hweight8(~(*data | GENMASK(7, data_nbits)));
+		if (bitflips > threshold)
+			return -EBADMSG;
+	}
+
+	memset(this->payload_virt + (chunk * eccsize), 0xff, eccsize);
+
+	if (!chunk)
+		memset(this->auxiliary_virt, 0xff, nfc_geo->metadata_size);
+
+	pr_debug("correcting %u bitflips in erased page\n", bitflips);
+
+	return bitflips;
+}
+
 static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
 				uint8_t *buf, int oob_required, int page)
 {
@@ -1036,13 +1099,54 @@ static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
 	status = auxiliary_virt + nfc_geo->auxiliary_status_offset;
 
 	for (i = 0; i < nfc_geo->ecc_chunk_count; i++, status++) {
+		bool raw_read_done = false;
+
 		if ((*status == STATUS_GOOD) || (*status == STATUS_ERASED))
 			continue;
 
 		if (*status == STATUS_UNCORRECTABLE) {
-			mtd->ecc_stats.failed++;
-			continue;
+			if (!raw_read_done) {
+				bool direct_dma_map_ok;
+
+				/*
+				 * Reading a page will override the current
+				 * direct_dma_map_ok field.
+				 * Save direct_dma_map_ok and restore it after
+				 * raw page read has completed.
+				 */
+				direct_dma_map_ok = this->direct_dma_map_ok;
+				chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page);
+				chip->read_buf(mtd, this->raw_buffer,
+					       mtd->writesize + mtd->oobsize);
+				this->direct_dma_map_ok = direct_dma_map_ok;
+
+				/*
+				 * Swap bad block marker if needed.
+				 */
+				if (this->swap_block_mark) {
+					u8 *raw_buf = this->raw_buffer;
+					u8 swap = raw_buf[0];
+
+					raw_buf[0] = raw_buf[mtd->writesize];
+					raw_buf[mtd->writesize] = swap;
+				}
+
+				raw_read_done = true;
+			}
+
+			/*
+			 * Analyse the current chunk to handle the "bitflips in
+			 * erased page" case.
+			 */
+			ret = gpmi_verify_erased_page(this, i);
+			if (ret < 0) {
+				mtd->ecc_stats.failed++;
+				continue;
+			}
+
+			*status = ret;
 		}
+
 		mtd->ecc_stats.corrected += *status;
 		max_bitflips = max_t(unsigned int, max_bitflips, *status);
 	}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] mtd: gpmi: properly handle bitflips in erased pages
  2014-12-01 19:12 ` Boris Brezillon
  (?)
@ 2014-12-01 19:41   ` Brian Norris
  -1 siblings, 0 replies; 18+ messages in thread
From: Brian Norris @ 2014-12-01 19:41 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: David Woodhouse, linux-mtd, Huang Shijie, linux-kernel,
	linux-arm-kernel, Mike Voytovich, Roy Lee

Hi Boris,

On Mon, Dec 01, 2014 at 08:12:39PM +0100, Boris Brezillon wrote:
> Brian, I really like the idea of having a generic implementation for this
> feature (using read_page_raw) as you suggested here [1], but this implies
> having a temporary buffer to store the page read in raw mode and keep the page
> read in normal (HW ECC engine eanbled) mode, and I'm not sure we want to
> allocate more buffers than we already have.

Why does this require an additional buffer? If we've already noticed an
ECC error, we're expected to return raw data anyway, so what's the
problem with clobbering the original data with a raw version of the
data?

I really *really* do not want to merge another one-off attempt at
"solving" this problem in the low-level driver, if at all possible.

Brian

P.S. Sorry for not following up on my last attempt of a generic
solution. I don't think I got enough constructive feedback, other than
"this does not work for me", so I didn't pursue it further. If we have
the attention of more than one driver developer here (i.e., at least you
and me), then I'd really like to finish this off for good (?).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mtd: gpmi: properly handle bitflips in erased pages
@ 2014-12-01 19:41   ` Brian Norris
  0 siblings, 0 replies; 18+ messages in thread
From: Brian Norris @ 2014-12-01 19:41 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Huang Shijie, linux-kernel, linux-mtd, Roy Lee, Mike Voytovich,
	David Woodhouse, linux-arm-kernel

Hi Boris,

On Mon, Dec 01, 2014 at 08:12:39PM +0100, Boris Brezillon wrote:
> Brian, I really like the idea of having a generic implementation for this
> feature (using read_page_raw) as you suggested here [1], but this implies
> having a temporary buffer to store the page read in raw mode and keep the page
> read in normal (HW ECC engine eanbled) mode, and I'm not sure we want to
> allocate more buffers than we already have.

Why does this require an additional buffer? If we've already noticed an
ECC error, we're expected to return raw data anyway, so what's the
problem with clobbering the original data with a raw version of the
data?

I really *really* do not want to merge another one-off attempt at
"solving" this problem in the low-level driver, if at all possible.

Brian

P.S. Sorry for not following up on my last attempt of a generic
solution. I don't think I got enough constructive feedback, other than
"this does not work for me", so I didn't pursue it further. If we have
the attention of more than one driver developer here (i.e., at least you
and me), then I'd really like to finish this off for good (?).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] mtd: gpmi: properly handle bitflips in erased pages
@ 2014-12-01 19:41   ` Brian Norris
  0 siblings, 0 replies; 18+ messages in thread
From: Brian Norris @ 2014-12-01 19:41 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Boris,

On Mon, Dec 01, 2014 at 08:12:39PM +0100, Boris Brezillon wrote:
> Brian, I really like the idea of having a generic implementation for this
> feature (using read_page_raw) as you suggested here [1], but this implies
> having a temporary buffer to store the page read in raw mode and keep the page
> read in normal (HW ECC engine eanbled) mode, and I'm not sure we want to
> allocate more buffers than we already have.

Why does this require an additional buffer? If we've already noticed an
ECC error, we're expected to return raw data anyway, so what's the
problem with clobbering the original data with a raw version of the
data?

I really *really* do not want to merge another one-off attempt at
"solving" this problem in the low-level driver, if at all possible.

Brian

P.S. Sorry for not following up on my last attempt of a generic
solution. I don't think I got enough constructive feedback, other than
"this does not work for me", so I didn't pursue it further. If we have
the attention of more than one driver developer here (i.e., at least you
and me), then I'd really like to finish this off for good (?).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mtd: gpmi: properly handle bitflips in erased pages
  2014-12-01 19:41   ` Brian Norris
  (?)
@ 2014-12-01 20:18     ` Boris Brezillon
  -1 siblings, 0 replies; 18+ messages in thread
From: Boris Brezillon @ 2014-12-01 20:18 UTC (permalink / raw)
  To: Brian Norris
  Cc: David Woodhouse, linux-mtd, Huang Shijie, linux-kernel,
	linux-arm-kernel, Mike Voytovich, Roy Lee

Hi Brian,

On Mon, 1 Dec 2014 11:41:39 -0800
Brian Norris <computersforpeace@gmail.com> wrote:

> Hi Boris,
> 
> On Mon, Dec 01, 2014 at 08:12:39PM +0100, Boris Brezillon wrote:
> > Brian, I really like the idea of having a generic implementation for this
> > feature (using read_page_raw) as you suggested here [1], but this implies
> > having a temporary buffer to store the page read in raw mode and keep the page
> > read in normal (HW ECC engine eanbled) mode, and I'm not sure we want to
> > allocate more buffers than we already have.
> 
> Why does this require an additional buffer? If we've already noticed an
> ECC error, we're expected to return raw data anyway, so what's the
> problem with clobbering the original data with a raw version of the
> data?

Well in the GPMI particular case (and more generally all NAND
controllers which do not support subpage write) this is true, but if you
can do subpage write, then you might have a bit flip in a specific
chunk which is still empty, while other chunks are written and are
expecting standard ECC correction.
In this case you want to keep the 3 chunks with  standard ECC correction
and only one in raw mode with 'erased page bitflips' fixed.

> 
> I really *really* do not want to merge another one-off attempt at
> "solving" this problem in the low-level driver, if at all possible.

We're definitely on the same page, this is why I asked some feedback. 

> 
> Brian
> 
> P.S. Sorry for not following up on my last attempt of a generic
> solution. I don't think I got enough constructive feedback, other than
> "this does not work for me", so I didn't pursue it further. If we have
> the attention of more than one driver developer here (i.e., at least you
> and me), then I'd really like to finish this off for good (?).

No problem, I can help you achieve this goal ;-).

Regards,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mtd: gpmi: properly handle bitflips in erased pages
@ 2014-12-01 20:18     ` Boris Brezillon
  0 siblings, 0 replies; 18+ messages in thread
From: Boris Brezillon @ 2014-12-01 20:18 UTC (permalink / raw)
  To: Brian Norris
  Cc: Huang Shijie, linux-kernel, linux-mtd, Roy Lee, Mike Voytovich,
	David Woodhouse, linux-arm-kernel

Hi Brian,

On Mon, 1 Dec 2014 11:41:39 -0800
Brian Norris <computersforpeace@gmail.com> wrote:

> Hi Boris,
> 
> On Mon, Dec 01, 2014 at 08:12:39PM +0100, Boris Brezillon wrote:
> > Brian, I really like the idea of having a generic implementation for this
> > feature (using read_page_raw) as you suggested here [1], but this implies
> > having a temporary buffer to store the page read in raw mode and keep the page
> > read in normal (HW ECC engine eanbled) mode, and I'm not sure we want to
> > allocate more buffers than we already have.
> 
> Why does this require an additional buffer? If we've already noticed an
> ECC error, we're expected to return raw data anyway, so what's the
> problem with clobbering the original data with a raw version of the
> data?

Well in the GPMI particular case (and more generally all NAND
controllers which do not support subpage write) this is true, but if you
can do subpage write, then you might have a bit flip in a specific
chunk which is still empty, while other chunks are written and are
expecting standard ECC correction.
In this case you want to keep the 3 chunks with  standard ECC correction
and only one in raw mode with 'erased page bitflips' fixed.

> 
> I really *really* do not want to merge another one-off attempt at
> "solving" this problem in the low-level driver, if at all possible.

We're definitely on the same page, this is why I asked some feedback. 

> 
> Brian
> 
> P.S. Sorry for not following up on my last attempt of a generic
> solution. I don't think I got enough constructive feedback, other than
> "this does not work for me", so I didn't pursue it further. If we have
> the attention of more than one driver developer here (i.e., at least you
> and me), then I'd really like to finish this off for good (?).

No problem, I can help you achieve this goal ;-).

Regards,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] mtd: gpmi: properly handle bitflips in erased pages
@ 2014-12-01 20:18     ` Boris Brezillon
  0 siblings, 0 replies; 18+ messages in thread
From: Boris Brezillon @ 2014-12-01 20:18 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Brian,

On Mon, 1 Dec 2014 11:41:39 -0800
Brian Norris <computersforpeace@gmail.com> wrote:

> Hi Boris,
> 
> On Mon, Dec 01, 2014 at 08:12:39PM +0100, Boris Brezillon wrote:
> > Brian, I really like the idea of having a generic implementation for this
> > feature (using read_page_raw) as you suggested here [1], but this implies
> > having a temporary buffer to store the page read in raw mode and keep the page
> > read in normal (HW ECC engine eanbled) mode, and I'm not sure we want to
> > allocate more buffers than we already have.
> 
> Why does this require an additional buffer? If we've already noticed an
> ECC error, we're expected to return raw data anyway, so what's the
> problem with clobbering the original data with a raw version of the
> data?

Well in the GPMI particular case (and more generally all NAND
controllers which do not support subpage write) this is true, but if you
can do subpage write, then you might have a bit flip in a specific
chunk which is still empty, while other chunks are written and are
expecting standard ECC correction.
In this case you want to keep the 3 chunks with  standard ECC correction
and only one in raw mode with 'erased page bitflips' fixed.

> 
> I really *really* do not want to merge another one-off attempt at
> "solving" this problem in the low-level driver, if at all possible.

We're definitely on the same page, this is why I asked some feedback. 

> 
> Brian
> 
> P.S. Sorry for not following up on my last attempt of a generic
> solution. I don't think I got enough constructive feedback, other than
> "this does not work for me", so I didn't pursue it further. If we have
> the attention of more than one driver developer here (i.e., at least you
> and me), then I'd really like to finish this off for good (?).

No problem, I can help you achieve this goal ;-).

Regards,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mtd: gpmi: properly handle bitflips in erased pages
  2014-12-01 20:18     ` Boris Brezillon
  (?)
@ 2014-12-01 23:37       ` Brian Norris
  -1 siblings, 0 replies; 18+ messages in thread
From: Brian Norris @ 2014-12-01 23:37 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: David Woodhouse, linux-mtd, Huang Shijie, linux-kernel,
	linux-arm-kernel, Mike Voytovich, Roy Lee

On Mon, Dec 01, 2014 at 09:18:18PM +0100, Boris Brezillon wrote:
> On Mon, 1 Dec 2014 11:41:39 -0800 Brian Norris <computersforpeace@gmail.com> wrote:
> > On Mon, Dec 01, 2014 at 08:12:39PM +0100, Boris Brezillon wrote:
> > > Brian, I really like the idea of having a generic implementation for this
> > > feature (using read_page_raw) as you suggested here [1], but this implies
> > > having a temporary buffer to store the page read in raw mode and keep the page
> > > read in normal (HW ECC engine eanbled) mode, and I'm not sure we want to
> > > allocate more buffers than we already have.
> > 
> > Why does this require an additional buffer? If we've already noticed an
> > ECC error, we're expected to return raw data anyway, so what's the
> > problem with clobbering the original data with a raw version of the
> > data?
> 
> Well in the GPMI particular case (and more generally all NAND
> controllers which do not support subpage write) this is true, but if you
> can do subpage write, then you might have a bit flip in a specific
> chunk which is still empty, while other chunks are written and are
> expecting standard ECC correction.
> In this case you want to keep the 3 chunks with  standard ECC correction
> and only one in raw mode with 'erased page bitflips' fixed.

So the problem's not really with subpage write, exactly; the problem is
for drivers that support subpage write, we don't have a way to perform a
raw subpage read without touching the other subpages.

Brian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mtd: gpmi: properly handle bitflips in erased pages
@ 2014-12-01 23:37       ` Brian Norris
  0 siblings, 0 replies; 18+ messages in thread
From: Brian Norris @ 2014-12-01 23:37 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Huang Shijie, linux-kernel, linux-mtd, Roy Lee, Mike Voytovich,
	David Woodhouse, linux-arm-kernel

On Mon, Dec 01, 2014 at 09:18:18PM +0100, Boris Brezillon wrote:
> On Mon, 1 Dec 2014 11:41:39 -0800 Brian Norris <computersforpeace@gmail.com> wrote:
> > On Mon, Dec 01, 2014 at 08:12:39PM +0100, Boris Brezillon wrote:
> > > Brian, I really like the idea of having a generic implementation for this
> > > feature (using read_page_raw) as you suggested here [1], but this implies
> > > having a temporary buffer to store the page read in raw mode and keep the page
> > > read in normal (HW ECC engine eanbled) mode, and I'm not sure we want to
> > > allocate more buffers than we already have.
> > 
> > Why does this require an additional buffer? If we've already noticed an
> > ECC error, we're expected to return raw data anyway, so what's the
> > problem with clobbering the original data with a raw version of the
> > data?
> 
> Well in the GPMI particular case (and more generally all NAND
> controllers which do not support subpage write) this is true, but if you
> can do subpage write, then you might have a bit flip in a specific
> chunk which is still empty, while other chunks are written and are
> expecting standard ECC correction.
> In this case you want to keep the 3 chunks with  standard ECC correction
> and only one in raw mode with 'erased page bitflips' fixed.

So the problem's not really with subpage write, exactly; the problem is
for drivers that support subpage write, we don't have a way to perform a
raw subpage read without touching the other subpages.

Brian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] mtd: gpmi: properly handle bitflips in erased pages
@ 2014-12-01 23:37       ` Brian Norris
  0 siblings, 0 replies; 18+ messages in thread
From: Brian Norris @ 2014-12-01 23:37 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Dec 01, 2014 at 09:18:18PM +0100, Boris Brezillon wrote:
> On Mon, 1 Dec 2014 11:41:39 -0800 Brian Norris <computersforpeace@gmail.com> wrote:
> > On Mon, Dec 01, 2014 at 08:12:39PM +0100, Boris Brezillon wrote:
> > > Brian, I really like the idea of having a generic implementation for this
> > > feature (using read_page_raw) as you suggested here [1], but this implies
> > > having a temporary buffer to store the page read in raw mode and keep the page
> > > read in normal (HW ECC engine eanbled) mode, and I'm not sure we want to
> > > allocate more buffers than we already have.
> > 
> > Why does this require an additional buffer? If we've already noticed an
> > ECC error, we're expected to return raw data anyway, so what's the
> > problem with clobbering the original data with a raw version of the
> > data?
> 
> Well in the GPMI particular case (and more generally all NAND
> controllers which do not support subpage write) this is true, but if you
> can do subpage write, then you might have a bit flip in a specific
> chunk which is still empty, while other chunks are written and are
> expecting standard ECC correction.
> In this case you want to keep the 3 chunks with  standard ECC correction
> and only one in raw mode with 'erased page bitflips' fixed.

So the problem's not really with subpage write, exactly; the problem is
for drivers that support subpage write, we don't have a way to perform a
raw subpage read without touching the other subpages.

Brian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mtd: gpmi: properly handle bitflips in erased pages
  2014-12-01 23:37       ` Brian Norris
  (?)
@ 2014-12-02  8:28         ` Boris Brezillon
  -1 siblings, 0 replies; 18+ messages in thread
From: Boris Brezillon @ 2014-12-02  8:28 UTC (permalink / raw)
  To: Brian Norris
  Cc: David Woodhouse, linux-mtd, Huang Shijie, linux-kernel,
	linux-arm-kernel, Mike Voytovich, Roy Lee

On Mon, 1 Dec 2014 15:37:48 -0800
Brian Norris <computersforpeace@gmail.com> wrote:

> On Mon, Dec 01, 2014 at 09:18:18PM +0100, Boris Brezillon wrote:
> > On Mon, 1 Dec 2014 11:41:39 -0800 Brian Norris <computersforpeace@gmail.com> wrote:
> > > On Mon, Dec 01, 2014 at 08:12:39PM +0100, Boris Brezillon wrote:
> > > > Brian, I really like the idea of having a generic implementation for this
> > > > feature (using read_page_raw) as you suggested here [1], but this implies
> > > > having a temporary buffer to store the page read in raw mode and keep the page
> > > > read in normal (HW ECC engine eanbled) mode, and I'm not sure we want to
> > > > allocate more buffers than we already have.
> > > 
> > > Why does this require an additional buffer? If we've already noticed an
> > > ECC error, we're expected to return raw data anyway, so what's the
> > > problem with clobbering the original data with a raw version of the
> > > data?
> > 
> > Well in the GPMI particular case (and more generally all NAND
> > controllers which do not support subpage write) this is true, but if you
> > can do subpage write, then you might have a bit flip in a specific
> > chunk which is still empty, while other chunks are written and are
> > expecting standard ECC correction.
> > In this case you want to keep the 3 chunks with  standard ECC correction
> > and only one in raw mode with 'erased page bitflips' fixed.
> 
> So the problem's not really with subpage write, exactly; the problem is
> for drivers that support subpage write, we don't have a way to perform a
> raw subpage read without touching the other subpages.

Yes, that's what I was trying to explain :-), and the only solution I
see to address that is to have 2 buffers and then pick the most
appropriate data for a given chunk.
Do you think we should focus on support for "non subpage write"
controllers first, and then find an alternative for these controllers
if someone really needs it ?

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mtd: gpmi: properly handle bitflips in erased pages
@ 2014-12-02  8:28         ` Boris Brezillon
  0 siblings, 0 replies; 18+ messages in thread
From: Boris Brezillon @ 2014-12-02  8:28 UTC (permalink / raw)
  To: Brian Norris
  Cc: Huang Shijie, linux-kernel, linux-mtd, Roy Lee, Mike Voytovich,
	David Woodhouse, linux-arm-kernel

On Mon, 1 Dec 2014 15:37:48 -0800
Brian Norris <computersforpeace@gmail.com> wrote:

> On Mon, Dec 01, 2014 at 09:18:18PM +0100, Boris Brezillon wrote:
> > On Mon, 1 Dec 2014 11:41:39 -0800 Brian Norris <computersforpeace@gmail.com> wrote:
> > > On Mon, Dec 01, 2014 at 08:12:39PM +0100, Boris Brezillon wrote:
> > > > Brian, I really like the idea of having a generic implementation for this
> > > > feature (using read_page_raw) as you suggested here [1], but this implies
> > > > having a temporary buffer to store the page read in raw mode and keep the page
> > > > read in normal (HW ECC engine eanbled) mode, and I'm not sure we want to
> > > > allocate more buffers than we already have.
> > > 
> > > Why does this require an additional buffer? If we've already noticed an
> > > ECC error, we're expected to return raw data anyway, so what's the
> > > problem with clobbering the original data with a raw version of the
> > > data?
> > 
> > Well in the GPMI particular case (and more generally all NAND
> > controllers which do not support subpage write) this is true, but if you
> > can do subpage write, then you might have a bit flip in a specific
> > chunk which is still empty, while other chunks are written and are
> > expecting standard ECC correction.
> > In this case you want to keep the 3 chunks with  standard ECC correction
> > and only one in raw mode with 'erased page bitflips' fixed.
> 
> So the problem's not really with subpage write, exactly; the problem is
> for drivers that support subpage write, we don't have a way to perform a
> raw subpage read without touching the other subpages.

Yes, that's what I was trying to explain :-), and the only solution I
see to address that is to have 2 buffers and then pick the most
appropriate data for a given chunk.
Do you think we should focus on support for "non subpage write"
controllers first, and then find an alternative for these controllers
if someone really needs it ?

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] mtd: gpmi: properly handle bitflips in erased pages
@ 2014-12-02  8:28         ` Boris Brezillon
  0 siblings, 0 replies; 18+ messages in thread
From: Boris Brezillon @ 2014-12-02  8:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 1 Dec 2014 15:37:48 -0800
Brian Norris <computersforpeace@gmail.com> wrote:

> On Mon, Dec 01, 2014 at 09:18:18PM +0100, Boris Brezillon wrote:
> > On Mon, 1 Dec 2014 11:41:39 -0800 Brian Norris <computersforpeace@gmail.com> wrote:
> > > On Mon, Dec 01, 2014 at 08:12:39PM +0100, Boris Brezillon wrote:
> > > > Brian, I really like the idea of having a generic implementation for this
> > > > feature (using read_page_raw) as you suggested here [1], but this implies
> > > > having a temporary buffer to store the page read in raw mode and keep the page
> > > > read in normal (HW ECC engine eanbled) mode, and I'm not sure we want to
> > > > allocate more buffers than we already have.
> > > 
> > > Why does this require an additional buffer? If we've already noticed an
> > > ECC error, we're expected to return raw data anyway, so what's the
> > > problem with clobbering the original data with a raw version of the
> > > data?
> > 
> > Well in the GPMI particular case (and more generally all NAND
> > controllers which do not support subpage write) this is true, but if you
> > can do subpage write, then you might have a bit flip in a specific
> > chunk which is still empty, while other chunks are written and are
> > expecting standard ECC correction.
> > In this case you want to keep the 3 chunks with  standard ECC correction
> > and only one in raw mode with 'erased page bitflips' fixed.
> 
> So the problem's not really with subpage write, exactly; the problem is
> for drivers that support subpage write, we don't have a way to perform a
> raw subpage read without touching the other subpages.

Yes, that's what I was trying to explain :-), and the only solution I
see to address that is to have 2 buffers and then pick the most
appropriate data for a given chunk.
Do you think we should focus on support for "non subpage write"
controllers first, and then find an alternative for these controllers
if someone really needs it ?

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mtd: gpmi: properly handle bitflips in erased pages
  2014-12-02  8:28         ` Boris Brezillon
  (?)
@ 2014-12-02 18:22           ` Brian Norris
  -1 siblings, 0 replies; 18+ messages in thread
From: Brian Norris @ 2014-12-02 18:22 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: David Woodhouse, linux-mtd, Huang Shijie, linux-kernel,
	linux-arm-kernel, Mike Voytovich, Roy Lee

On Tue, Dec 02, 2014 at 09:28:58AM +0100, Boris Brezillon wrote:
> On Mon, 1 Dec 2014 15:37:48 -0800 Brian Norris <computersforpeace@gmail.com> wrote:
> > On Mon, Dec 01, 2014 at 09:18:18PM +0100, Boris Brezillon wrote:
> > > On Mon, 1 Dec 2014 11:41:39 -0800 Brian Norris <computersforpeace@gmail.com> wrote:
> > > > On Mon, Dec 01, 2014 at 08:12:39PM +0100, Boris Brezillon wrote:
> > > > > Brian, I really like the idea of having a generic implementation for this
> > > > > feature (using read_page_raw) as you suggested here [1], but this implies
> > > > > having a temporary buffer to store the page read in raw mode and keep the page
> > > > > read in normal (HW ECC engine eanbled) mode, and I'm not sure we want to
> > > > > allocate more buffers than we already have.
> > > > 
> > > > Why does this require an additional buffer? If we've already noticed an
> > > > ECC error, we're expected to return raw data anyway, so what's the
> > > > problem with clobbering the original data with a raw version of the
> > > > data?
> > > 
> > > Well in the GPMI particular case (and more generally all NAND
> > > controllers which do not support subpage write) this is true, but if you
> > > can do subpage write, then you might have a bit flip in a specific
> > > chunk which is still empty, while other chunks are written and are
> > > expecting standard ECC correction.
> > > In this case you want to keep the 3 chunks with  standard ECC correction
> > > and only one in raw mode with 'erased page bitflips' fixed.
> > 
> > So the problem's not really with subpage write, exactly; the problem is
> > for drivers that support subpage write, we don't have a way to perform a
> > raw subpage read without touching the other subpages.
> 
> Yes, that's what I was trying to explain :-), and the only solution I
> see to address that is to have 2 buffers and then pick the most
> appropriate data for a given chunk.

We actually sort of have two buffers already in nand_do_read_ops();
ops->databuf and chip->buffers->databuf. The former can be pretty small,
but we could technically copy in any data that is "correct" to
ops->databuf, and then clobber chip->buffers->databuf with raw data.

But this may be more work than it's worth.

> Do you think we should focus on support for "non subpage write"
> controllers first, and then find an alternative for these controllers
> if someone really needs it ?

I think that may be alright. It doesn't look trivial to try to do an
erased subpage check on the subpage-programmed case anyway, at least in
generic code. We'd have to further understand what the OOB-per-subpage
partitioning is, and that information isn't currently in our ecclayout.

Brian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mtd: gpmi: properly handle bitflips in erased pages
@ 2014-12-02 18:22           ` Brian Norris
  0 siblings, 0 replies; 18+ messages in thread
From: Brian Norris @ 2014-12-02 18:22 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Huang Shijie, linux-kernel, linux-mtd, Roy Lee, Mike Voytovich,
	David Woodhouse, linux-arm-kernel

On Tue, Dec 02, 2014 at 09:28:58AM +0100, Boris Brezillon wrote:
> On Mon, 1 Dec 2014 15:37:48 -0800 Brian Norris <computersforpeace@gmail.com> wrote:
> > On Mon, Dec 01, 2014 at 09:18:18PM +0100, Boris Brezillon wrote:
> > > On Mon, 1 Dec 2014 11:41:39 -0800 Brian Norris <computersforpeace@gmail.com> wrote:
> > > > On Mon, Dec 01, 2014 at 08:12:39PM +0100, Boris Brezillon wrote:
> > > > > Brian, I really like the idea of having a generic implementation for this
> > > > > feature (using read_page_raw) as you suggested here [1], but this implies
> > > > > having a temporary buffer to store the page read in raw mode and keep the page
> > > > > read in normal (HW ECC engine eanbled) mode, and I'm not sure we want to
> > > > > allocate more buffers than we already have.
> > > > 
> > > > Why does this require an additional buffer? If we've already noticed an
> > > > ECC error, we're expected to return raw data anyway, so what's the
> > > > problem with clobbering the original data with a raw version of the
> > > > data?
> > > 
> > > Well in the GPMI particular case (and more generally all NAND
> > > controllers which do not support subpage write) this is true, but if you
> > > can do subpage write, then you might have a bit flip in a specific
> > > chunk which is still empty, while other chunks are written and are
> > > expecting standard ECC correction.
> > > In this case you want to keep the 3 chunks with  standard ECC correction
> > > and only one in raw mode with 'erased page bitflips' fixed.
> > 
> > So the problem's not really with subpage write, exactly; the problem is
> > for drivers that support subpage write, we don't have a way to perform a
> > raw subpage read without touching the other subpages.
> 
> Yes, that's what I was trying to explain :-), and the only solution I
> see to address that is to have 2 buffers and then pick the most
> appropriate data for a given chunk.

We actually sort of have two buffers already in nand_do_read_ops();
ops->databuf and chip->buffers->databuf. The former can be pretty small,
but we could technically copy in any data that is "correct" to
ops->databuf, and then clobber chip->buffers->databuf with raw data.

But this may be more work than it's worth.

> Do you think we should focus on support for "non subpage write"
> controllers first, and then find an alternative for these controllers
> if someone really needs it ?

I think that may be alright. It doesn't look trivial to try to do an
erased subpage check on the subpage-programmed case anyway, at least in
generic code. We'd have to further understand what the OOB-per-subpage
partitioning is, and that information isn't currently in our ecclayout.

Brian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] mtd: gpmi: properly handle bitflips in erased pages
@ 2014-12-02 18:22           ` Brian Norris
  0 siblings, 0 replies; 18+ messages in thread
From: Brian Norris @ 2014-12-02 18:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Dec 02, 2014 at 09:28:58AM +0100, Boris Brezillon wrote:
> On Mon, 1 Dec 2014 15:37:48 -0800 Brian Norris <computersforpeace@gmail.com> wrote:
> > On Mon, Dec 01, 2014 at 09:18:18PM +0100, Boris Brezillon wrote:
> > > On Mon, 1 Dec 2014 11:41:39 -0800 Brian Norris <computersforpeace@gmail.com> wrote:
> > > > On Mon, Dec 01, 2014 at 08:12:39PM +0100, Boris Brezillon wrote:
> > > > > Brian, I really like the idea of having a generic implementation for this
> > > > > feature (using read_page_raw) as you suggested here [1], but this implies
> > > > > having a temporary buffer to store the page read in raw mode and keep the page
> > > > > read in normal (HW ECC engine eanbled) mode, and I'm not sure we want to
> > > > > allocate more buffers than we already have.
> > > > 
> > > > Why does this require an additional buffer? If we've already noticed an
> > > > ECC error, we're expected to return raw data anyway, so what's the
> > > > problem with clobbering the original data with a raw version of the
> > > > data?
> > > 
> > > Well in the GPMI particular case (and more generally all NAND
> > > controllers which do not support subpage write) this is true, but if you
> > > can do subpage write, then you might have a bit flip in a specific
> > > chunk which is still empty, while other chunks are written and are
> > > expecting standard ECC correction.
> > > In this case you want to keep the 3 chunks with  standard ECC correction
> > > and only one in raw mode with 'erased page bitflips' fixed.
> > 
> > So the problem's not really with subpage write, exactly; the problem is
> > for drivers that support subpage write, we don't have a way to perform a
> > raw subpage read without touching the other subpages.
> 
> Yes, that's what I was trying to explain :-), and the only solution I
> see to address that is to have 2 buffers and then pick the most
> appropriate data for a given chunk.

We actually sort of have two buffers already in nand_do_read_ops();
ops->databuf and chip->buffers->databuf. The former can be pretty small,
but we could technically copy in any data that is "correct" to
ops->databuf, and then clobber chip->buffers->databuf with raw data.

But this may be more work than it's worth.

> Do you think we should focus on support for "non subpage write"
> controllers first, and then find an alternative for these controllers
> if someone really needs it ?

I think that may be alright. It doesn't look trivial to try to do an
erased subpage check on the subpage-programmed case anyway, at least in
generic code. We'd have to further understand what the OOB-per-subpage
partitioning is, and that information isn't currently in our ecclayout.

Brian

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2014-12-02 18:22 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-01 19:12 [PATCH] mtd: gpmi: properly handle bitflips in erased pages Boris Brezillon
2014-12-01 19:12 ` Boris Brezillon
2014-12-01 19:12 ` Boris Brezillon
2014-12-01 19:41 ` Brian Norris
2014-12-01 19:41   ` Brian Norris
2014-12-01 19:41   ` Brian Norris
2014-12-01 20:18   ` Boris Brezillon
2014-12-01 20:18     ` Boris Brezillon
2014-12-01 20:18     ` Boris Brezillon
2014-12-01 23:37     ` Brian Norris
2014-12-01 23:37       ` Brian Norris
2014-12-01 23:37       ` Brian Norris
2014-12-02  8:28       ` Boris Brezillon
2014-12-02  8:28         ` Boris Brezillon
2014-12-02  8:28         ` Boris Brezillon
2014-12-02 18:22         ` Brian Norris
2014-12-02 18:22           ` Brian Norris
2014-12-02 18:22           ` Brian Norris

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.