[RFC PATCH 0/2] mtd: nand: properly handle bitflips in erased pages

All of lore.kernel.org
 help / color / mirror / Atom feed

* [RFC PATCH 0/2] mtd: nand: properly handle bitflips in erased pages
@ 2015-07-30 17:34 Boris Brezillon
  2015-07-30 17:34 ` [RFC PATCH 1/2] mtd: nand: add nand_check_erased helper functions Boris Brezillon
  2015-07-30 17:34 ` [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions Boris Brezillon
  0 siblings, 2 replies; 26+ messages in thread
From: Boris Brezillon @ 2015-07-30 17:34 UTC (permalink / raw)
  To: David Woodhouse, Brian Norris, linux-mtd; +Cc: linux-kernel, Boris Brezillon

Hi Brian,

This is the patch series I promised a while ago.
The implementation should be generic enough to be used by controller
drivers.
The only problem I can see is for the GPMI controller which might produce
non byte aligned ECC data, and the nand_check_erased helpers are only
working with byte aligned buffers.

Best Regards,

Boris

Boris Brezillon (2):
  mtd: nand: add nand_check_erased helper functions
  mtd: nand: use nand_check_erased_ecc_chunk in default ECC read
    functions

 drivers/mtd/nand/nand_base.c | 167 ++++++++++++++++++++++++++++++++++++++++---
 include/linux/mtd/nand.h     |   8 +++
 2 files changed, 167 insertions(+), 8 deletions(-)

-- 
1.9.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC PATCH 1/2] mtd: nand: add nand_check_erased helper functions
  2015-07-30 17:34 [RFC PATCH 0/2] mtd: nand: properly handle bitflips in erased pages Boris Brezillon
@ 2015-07-30 17:34 ` Boris Brezillon
  2015-07-31  7:10   ` Boris Brezillon
  2015-07-30 17:34 ` [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions Boris Brezillon
  1 sibling, 1 reply; 26+ messages in thread
From: Boris Brezillon @ 2015-07-30 17:34 UTC (permalink / raw)
  To: David Woodhouse, Brian Norris, linux-mtd; +Cc: linux-kernel, Boris Brezillon

Add two helper functions to help NAND controller drivers test whether a
specific NAND region is erased or not.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
---
 drivers/mtd/nand/nand_base.c | 104 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/mtd/nand.h     |   8 ++++
 2 files changed, 112 insertions(+)

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index ceb68ca..1542ea7 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -1101,6 +1101,110 @@ out:
 EXPORT_SYMBOL(nand_lock);
 
 /**
+ * nand_check_erased_buf - check if a buffer contains (almost) only 0xff data
+ * @buf: buffer to test
+ * @len: buffer length
+ * @bitflips_threshold:maximum number of bitflips
+ *
+ * Check if a buffer contains only 0xff, which means the underlying region
+ * has been erased and is ready to be programmed.
+ * The bitflips_threshold specify the maximum number of bitflips before
+ * considering the region is not erased.
+ * Note: The logic of this function has been extracted from the memweight
+ * implementation, except that nand_check_erased_buf function exit before
+ * testing the whole buffer if the number of bitflips exceed the
+ * bitflips_threshold value.
+ *
+ * Returns a positive number of bitflips or -ERROR_CODE.
+ */
+int nand_check_erased_buf(void *buf, int len, int bitflips_threshold)
+{
+	const unsigned char *bitmap = buf;
+	int bitflips = 0;
+	int weight;
+	int longs;
+
+	for (; len && ((unsigned long)bitmap) % sizeof(long);
+	     len--, bitmap++) {
+		weight = hweight8(*bitmap);
+
+		bitflips += sizeof(u8) - weight;
+		if (bitflips > bitflips_threshold)
+			return -EINVAL;
+	}
+
+
+	for (longs = len / sizeof(long); longs;
+	     longs--, bitmap += sizeof(long)) {
+		BUG_ON(longs >= INT_MAX / BITS_PER_LONG);
+		weight = hweight_long(*((unsigned long *)bitmap));
+
+		bitflips += sizeof(long) - weight;
+		if (bitflips > bitflips_threshold)
+			return -EINVAL;
+	}
+
+	len %= sizeof(long);
+
+	for (; len > 0; len--, bitmap++) {
+		weight = hweight8(*bitmap);
+		bitflips += sizeof(u8) - weight;
+	}
+
+	return bitflips;
+}
+EXPORT_SYMBOL(nand_check_erased_buf);
+
+/**
+ * nand_check_erased_ecc_chunk - check if an ECC chunk contains (almost) only
+ *				 0xff data
+ * @data: data buffer to test
+ * @datalen: data length
+ * @ecc: ECC buffer
+ * @ecclen: ECC length
+ * @extraoob: extra OOB buffer
+ * @extraooblen: extra OOB length
+ * @bitflips_threshold: maximum number of bitflips
+ *
+ * Check if a data buffer and its associated ECC and OOB data contains only
+ * 0xff pattern, which means the underlying region has been erased and is
+ * ready to be programmed.
+ * The bitflips_threshold specify the maximum number of bitflips before
+ * considering the region as not erased.
+ *
+ * Returns a positive number of bitflips or -ERROR_CODE.
+ */
+int nand_check_erased_ecc_chunk(void *data, int datalen,
+				void *ecc, int ecclen,
+				void *extraoob, int extraooblen,
+				int bitflips_threshold)
+{
+	int bitflips = 0;
+	int ret;
+
+	ret = nand_check_erased_buf(data, datalen, bitflips_threshold);
+	if (ret < 0)
+		return ret;
+
+	bitflips += ret;
+	bitflips_threshold -= ret;
+
+	ret = nand_check_erased_buf(ecc, ecclen, bitflips_threshold);
+	if (ret < 0)
+		return ret;
+
+	bitflips += ret;
+	bitflips_threshold -= ret;
+
+	ret = nand_check_erased_buf(extraoob, extraooblen, bitflips_threshold);
+	if (ret < 0)
+		return ret;
+
+	return bitflips + ret;
+}
+EXPORT_SYMBOL(nand_check_erased_ecc_chunk);
+
+/**
  * nand_read_page_raw - [INTERN] read raw page data without ecc
  * @mtd: mtd info structure
  * @chip: nand chip info structure
diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h
index 272f429..ae06a07 100644
--- a/include/linux/mtd/nand.h
+++ b/include/linux/mtd/nand.h
@@ -1030,4 +1030,12 @@ struct nand_sdr_timings {
 
 /* get timing characteristics from ONFI timing mode. */
 const struct nand_sdr_timings *onfi_async_timing_mode_to_sdr_timings(int mode);
+
+int nand_check_erased_buf(void *data, int datalen,
+			  int threshold);
+
+int nand_check_erased_ecc_chunk(void *data, int datalen,
+				void *ecc, int ecclen,
+				void *extraoob, int extraooblen,
+				int threshold);
 #endif /* __LINUX_MTD_NAND_H */
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions
  2015-07-30 17:34 [RFC PATCH 0/2] mtd: nand: properly handle bitflips in erased pages Boris Brezillon
  2015-07-30 17:34 ` [RFC PATCH 1/2] mtd: nand: add nand_check_erased helper functions Boris Brezillon
@ 2015-07-30 17:34 ` Boris Brezillon
  2015-07-31 10:07   ` Andrea Scian
  1 sibling, 1 reply; 26+ messages in thread
From: Boris Brezillon @ 2015-07-30 17:34 UTC (permalink / raw)
  To: David Woodhouse, Brian Norris, linux-mtd; +Cc: linux-kernel, Boris Brezillon

The default NAND read functions are relying on an underlying controller
to correct bitflips, but some of those controller cannot properly fix
bitflips in erased pages.
In case of ECC failures, check if the page of subpage is empty before
reporting an ECC failure.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
---
 drivers/mtd/nand/nand_base.c | 63 ++++++++++++++++++++++++++++++++++++++------
 1 file changed, 55 insertions(+), 8 deletions(-)

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index 1542ea7..d527c73 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -1396,6 +1396,19 @@ static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip,
 		stat = chip->ecc.correct(mtd, p,
 			&chip->buffers->ecccode[i], &chip->buffers->ecccalc[i]);
 		if (stat < 0) {
+			/* check for empty pages with bitflips */
+			int col = (int)(p - bufpoi);
+
+			chip->cmdfunc(mtd, NAND_CMD_RNDOUT, col, -1);
+			chip->read_buf(mtd, p, chip->ecc.size);
+			stat = nand_check_erased_ecc_chunk(p, chip->ecc.size,
+						&chip->buffers->ecccode[i],
+						chip->ecc.bytes,
+						NULL, 0,
+						chip->ecc.strength);
+		}
+
+		if (stat < 0) {
 			mtd->ecc_stats.failed++;
 		} else {
 			mtd->ecc_stats.corrected += stat;
@@ -1445,6 +1458,16 @@ static int nand_read_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip,
 
 		stat = chip->ecc.correct(mtd, p, &ecc_code[i], &ecc_calc[i]);
 		if (stat < 0) {
+			/* check for empty pages with bitflips */
+			chip->cmdfunc(mtd, NAND_CMD_RNDOUT, i, -1);
+			chip->read_buf(mtd, p, eccsize);
+			stat = nand_check_erased_ecc_chunk(p, eccsize,
+						&ecc_code[i], eccbytes,
+						NULL, 0,
+						chip->ecc.strength);
+		}
+
+		if (stat < 0) {
 			mtd->ecc_stats.failed++;
 		} else {
 			mtd->ecc_stats.corrected += stat;
@@ -1495,7 +1518,18 @@ static int nand_read_page_hwecc_oob_first(struct mtd_info *mtd,
 		chip->read_buf(mtd, p, eccsize);
 		chip->ecc.calculate(mtd, p, &ecc_calc[i]);
 
-		stat = chip->ecc.correct(mtd, p, &ecc_code[i], NULL);
+		stat = chip->ecc.correct(mtd, p, &ecc_code[i], &ecc_calc[i]);
+		if (stat < 0) {
+			/* check for empty pages with bitflips */
+			chip->cmdfunc(mtd, NAND_CMD_RNDOUT,
+				      i + mtd->oobsize, -1);
+			chip->read_buf(mtd, p, eccsize);
+			stat = nand_check_erased_ecc_chunk(p, eccsize,
+						&ecc_code[i], eccbytes,
+						NULL, 0,
+						chip->ecc.strength);
+		}
+
 		if (stat < 0) {
 			mtd->ecc_stats.failed++;
 		} else {
@@ -1523,6 +1557,8 @@ static int nand_read_page_syndrome(struct mtd_info *mtd, struct nand_chip *chip,
 	int i, eccsize = chip->ecc.size;
 	int eccbytes = chip->ecc.bytes;
 	int eccsteps = chip->ecc.steps;
+	int eccstepsize = eccsize + eccbytes + chip->ecc.prepad +
+			  chip->ecc.postpad;
 	uint8_t *p = buf;
 	uint8_t *oob = chip->oob_poi;
 	unsigned int max_bitflips = 0;
@@ -1542,19 +1578,30 @@ static int nand_read_page_syndrome(struct mtd_info *mtd, struct nand_chip *chip,
 		chip->read_buf(mtd, oob, eccbytes);
 		stat = chip->ecc.correct(mtd, p, oob, NULL);
 
-		if (stat < 0) {
-			mtd->ecc_stats.failed++;
-		} else {
-			mtd->ecc_stats.corrected += stat;
-			max_bitflips = max_t(unsigned int, max_bitflips, stat);
-		}
-
 		oob += eccbytes;
 
 		if (chip->ecc.postpad) {
 			chip->read_buf(mtd, oob, chip->ecc.postpad);
 			oob += chip->ecc.postpad;
 		}
+
+		if (stat < 0) {
+			/* check for empty pages with bitflips */
+			chip->cmdfunc(mtd, NAND_CMD_RNDOUT, i * eccstepsize, -1);
+			chip->read_buf(mtd, p, chip->ecc.size);
+			stat = nand_check_erased_ecc_chunk(p, chip->ecc.size,
+							   oob - eccstepsize,
+							   eccstepsize,
+							   NULL, 0,
+							   chip->ecc.strength);
+		}
+
+		if (stat < 0) {
+			mtd->ecc_stats.failed++;
+		} else {
+			mtd->ecc_stats.corrected += stat;
+			max_bitflips = max_t(unsigned int, max_bitflips, stat);
+		}
 	}
 
 	/* Calculate remaining oob bytes */
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 1/2] mtd: nand: add nand_check_erased helper functions
  2015-07-30 17:34 ` [RFC PATCH 1/2] mtd: nand: add nand_check_erased helper functions Boris Brezillon
@ 2015-07-31  7:10   ` Boris Brezillon
  2015-07-31 10:06     ` Andrea Scian
  2015-08-04 15:42     ` Andrea Scian
  0 siblings, 2 replies; 26+ messages in thread
From: Boris Brezillon @ 2015-07-31  7:10 UTC (permalink / raw)
  To: Boris Brezillon; +Cc: David Woodhouse, Brian Norris, linux-mtd, linux-kernel

On Thu, 30 Jul 2015 19:34:53 +0200
Boris Brezillon <boris.brezillon@free-electrons.com> wrote:

> Add two helper functions to help NAND controller drivers test whether a
> specific NAND region is erased or not.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
> ---
>  drivers/mtd/nand/nand_base.c | 104 +++++++++++++++++++++++++++++++++++++++++++
>  include/linux/mtd/nand.h     |   8 ++++
>  2 files changed, 112 insertions(+)
> 
> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> index ceb68ca..1542ea7 100644
> --- a/drivers/mtd/nand/nand_base.c
> +++ b/drivers/mtd/nand/nand_base.c
> @@ -1101,6 +1101,110 @@ out:
>  EXPORT_SYMBOL(nand_lock);
>  
>  /**
> + * nand_check_erased_buf - check if a buffer contains (almost) only 0xff data
> + * @buf: buffer to test
> + * @len: buffer length
> + * @bitflips_threshold:maximum number of bitflips
> + *
> + * Check if a buffer contains only 0xff, which means the underlying region
> + * has been erased and is ready to be programmed.
> + * The bitflips_threshold specify the maximum number of bitflips before
> + * considering the region is not erased.
> + * Note: The logic of this function has been extracted from the memweight
> + * implementation, except that nand_check_erased_buf function exit before
> + * testing the whole buffer if the number of bitflips exceed the
> + * bitflips_threshold value.
> + *
> + * Returns a positive number of bitflips or -ERROR_CODE.
> + */
> +int nand_check_erased_buf(void *buf, int len, int bitflips_threshold)
> +{
> +	const unsigned char *bitmap = buf;
> +	int bitflips = 0;
> +	int weight;
> +	int longs;
> +
> +	for (; len && ((unsigned long)bitmap) % sizeof(long);
> +	     len--, bitmap++) {
> +		weight = hweight8(*bitmap);
> +
> +		bitflips += sizeof(u8) - weight;
> +		if (bitflips > bitflips_threshold)
> +			return -EINVAL;
> +	}
> +
> +
> +	for (longs = len / sizeof(long); longs;
> +	     longs--, bitmap += sizeof(long)) {
> +		BUG_ON(longs >= INT_MAX / BITS_PER_LONG);
> +		weight = hweight_long(*((unsigned long *)bitmap));
> +
> +		bitflips += sizeof(long) - weight;
> +		if (bitflips > bitflips_threshold)
> +			return -EINVAL;
> +	}
> +
> +	len %= sizeof(long);
> +
> +	for (; len > 0; len--, bitmap++) {
> +		weight = hweight8(*bitmap);
> +		bitflips += sizeof(u8) - weight;
> +	}
> +
> +	return bitflips;
> +}
> +EXPORT_SYMBOL(nand_check_erased_buf);
> +
> +/**
> + * nand_check_erased_ecc_chunk - check if an ECC chunk contains (almost) only
> + *				 0xff data
> + * @data: data buffer to test
> + * @datalen: data length
> + * @ecc: ECC buffer
> + * @ecclen: ECC length
> + * @extraoob: extra OOB buffer
> + * @extraooblen: extra OOB length
> + * @bitflips_threshold: maximum number of bitflips
> + *
> + * Check if a data buffer and its associated ECC and OOB data contains only
> + * 0xff pattern, which means the underlying region has been erased and is
> + * ready to be programmed.
> + * The bitflips_threshold specify the maximum number of bitflips before
> + * considering the region as not erased.
> + *
> + * Returns a positive number of bitflips or -ERROR_CODE.
> + */
> +int nand_check_erased_ecc_chunk(void *data, int datalen,
> +				void *ecc, int ecclen,
> +				void *extraoob, int extraooblen,
> +				int bitflips_threshold)
> +{
> +	int bitflips = 0;
> +	int ret;
> +
> +	ret = nand_check_erased_buf(data, datalen, bitflips_threshold);
> +	if (ret < 0)
> +		return ret;
> +
> +	bitflips += ret;
> +	bitflips_threshold -= ret;
> +
> +	ret = nand_check_erased_buf(ecc, ecclen, bitflips_threshold);
> +	if (ret < 0)
> +		return ret;
> +
> +	bitflips += ret;
> +	bitflips_threshold -= ret;
> +
> +	ret = nand_check_erased_buf(extraoob, extraooblen, bitflips_threshold);
> +	if (ret < 0)
> +		return ret;
> +

Forgot the memset operations here:

	memset(data, 0xff, datalen);
	memset(ecc, 0xff, ecclen);
	memset(extraoob, 0xff, extraooblen);

> +	return bitflips + ret;
> +}
> +EXPORT_SYMBOL(nand_check_erased_ecc_chunk);
> +
> +/**
>   * nand_read_page_raw - [INTERN] read raw page data without ecc
>   * @mtd: mtd info structure
>   * @chip: nand chip info structure
> diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h
> index 272f429..ae06a07 100644
> --- a/include/linux/mtd/nand.h
> +++ b/include/linux/mtd/nand.h
> @@ -1030,4 +1030,12 @@ struct nand_sdr_timings {
>  
>  /* get timing characteristics from ONFI timing mode. */
>  const struct nand_sdr_timings *onfi_async_timing_mode_to_sdr_timings(int mode);
> +
> +int nand_check_erased_buf(void *data, int datalen,
> +			  int threshold);
> +
> +int nand_check_erased_ecc_chunk(void *data, int datalen,
> +				void *ecc, int ecclen,
> +				void *extraoob, int extraooblen,
> +				int threshold);
>  #endif /* __LINUX_MTD_NAND_H */



-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 1/2] mtd: nand: add nand_check_erased helper functions
  2015-07-31  7:10   ` Boris Brezillon
@ 2015-07-31 10:06     ` Andrea Scian
  2015-07-31 10:21       ` Boris Brezillon
  2015-08-04 15:42     ` Andrea Scian
  1 sibling, 1 reply; 26+ messages in thread
From: Andrea Scian @ 2015-07-31 10:06 UTC (permalink / raw)
  To: Boris Brezillon; +Cc: linux-mtd, David Woodhouse, Brian Norris, linux-kernel


Dear Boris,

thanks for pointing this out again.

I'm on the same topic too, using iMX6 (I'll try to test you patch on the 
next days, if I found some spare time, unfortunately I got a 3.10 
kernel, so I think the patch will not apply cleanly :-( ).

See my comment below (and on the next mail too)

Il 31/07/2015 09:10, Boris Brezillon ha scritto:
> On Thu, 30 Jul 2015 19:34:53 +0200
> Boris Brezillon <boris.brezillon@free-electrons.com> wrote:
>
>> Add two helper functions to help NAND controller drivers test whether a
>> specific NAND region is erased or not.
>>
>> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
>> ---
>>   drivers/mtd/nand/nand_base.c | 104 +++++++++++++++++++++++++++++++++++++++++++
>>   include/linux/mtd/nand.h     |   8 ++++
>>   2 files changed, 112 insertions(+)
>>
>> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
>> index ceb68ca..1542ea7 100644
>> --- a/drivers/mtd/nand/nand_base.c
>> +++ b/drivers/mtd/nand/nand_base.c
>> @@ -1101,6 +1101,110 @@ out:
>>   EXPORT_SYMBOL(nand_lock);
>>
>>   /**
>> + * nand_check_erased_buf - check if a buffer contains (almost) only 0xff data
>> + * @buf: buffer to test
>> + * @len: buffer length
>> + * @bitflips_threshold:maximum number of bitflips
>> + *
>> + * Check if a buffer contains only 0xff, which means the underlying region
>> + * has been erased and is ready to be programmed.
>> + * The bitflips_threshold specify the maximum number of bitflips before
>> + * considering the region is not erased.
>> + * Note: The logic of this function has been extracted from the memweight
>> + * implementation, except that nand_check_erased_buf function exit before
>> + * testing the whole buffer if the number of bitflips exceed the
>> + * bitflips_threshold value.
>> + *
>> + * Returns a positive number of bitflips or -ERROR_CODE.
>> + */
>> +int nand_check_erased_buf(void *buf, int len, int bitflips_threshold)
>> +{
>> +	const unsigned char *bitmap = buf;
>> +	int bitflips = 0;
>> +	int weight;
>> +	int longs;
>> +
>> +	for (; len && ((unsigned long)bitmap) % sizeof(long);
>> +	     len--, bitmap++) {
>> +		weight = hweight8(*bitmap);
>> +
>> +		bitflips += sizeof(u8) - weight;
>> +		if (bitflips > bitflips_threshold)
>> +			return -EINVAL;

I think it's better to do something like:

if (UNLIKELY(bitflips > bitflips_threshold))
	return -EINVAL;

isn't it? :-)
(the same for the other if)


>> +	}
>> +
>> +
>> +	for (longs = len / sizeof(long); longs;
>> +	     longs--, bitmap += sizeof(long)) {
>> +		BUG_ON(longs >= INT_MAX / BITS_PER_LONG);
>> +		weight = hweight_long(*((unsigned long *)bitmap));
>> +
>> +		bitflips += sizeof(long) - weight;
>> +		if (bitflips > bitflips_threshold)
>> +			return -EINVAL;
>> +	}
>> +
>> +	len %= sizeof(long);
>> +
>> +	for (; len > 0; len--, bitmap++) {
>> +		weight = hweight8(*bitmap);
>> +		bitflips += sizeof(u8) - weight;
>> +	}
>> +
>> +	return bitflips;
>> +}
>> +EXPORT_SYMBOL(nand_check_erased_buf);
>> +
>> +/**
>> + * nand_check_erased_ecc_chunk - check if an ECC chunk contains (almost) only
>> + *				 0xff data
>> + * @data: data buffer to test
>> + * @datalen: data length
>> + * @ecc: ECC buffer
>> + * @ecclen: ECC length
>> + * @extraoob: extra OOB buffer
>> + * @extraooblen: extra OOB length
>> + * @bitflips_threshold: maximum number of bitflips
>> + *
>> + * Check if a data buffer and its associated ECC and OOB data contains only
>> + * 0xff pattern, which means the underlying region has been erased and is
>> + * ready to be programmed.
>> + * The bitflips_threshold specify the maximum number of bitflips before
>> + * considering the region as not erased.
>> + *
>> + * Returns a positive number of bitflips or -ERROR_CODE.
>> + */
>> +int nand_check_erased_ecc_chunk(void *data, int datalen,
>> +				void *ecc, int ecclen,
>> +				void *extraoob, int extraooblen,
>> +				int bitflips_threshold)
>> +{
>> +	int bitflips = 0;
>> +	int ret;
>> +
>> +	ret = nand_check_erased_buf(data, datalen, bitflips_threshold);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	bitflips += ret;
>> +	bitflips_threshold -= ret;
>> +
>> +	ret = nand_check_erased_buf(ecc, ecclen, bitflips_threshold);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	bitflips += ret;
>> +	bitflips_threshold -= ret;
>> +
>> +	ret = nand_check_erased_buf(extraoob, extraooblen, bitflips_threshold);
>> +	if (ret < 0)
>> +		return ret;
>> +
>
> Forgot the memset operations here:
>
> 	memset(data, 0xff, datalen);
> 	memset(ecc, 0xff, ecclen);
> 	memset(extraoob, 0xff, extraooblen);

Yes, you're right.. I did the same mistake on my first implementation 
too ;-)

As additional optimization you may also check if the lower layer already 
did the check for you (e.g. if you have an iMXQP as we saw in latest 
days), but I think it's a minor one, because you'll face this situation 
very very unlikely.

-- 

Andrea SCIAN

DAVE Embedded Systems

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions
  2015-07-30 17:34 ` [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions Boris Brezillon
@ 2015-07-31 10:07   ` Andrea Scian
  2015-07-31 10:32     ` Boris Brezillon
  0 siblings, 1 reply; 26+ messages in thread
From: Andrea Scian @ 2015-07-31 10:07 UTC (permalink / raw)
  To: Boris Brezillon; +Cc: linux-mtd, David Woodhouse, Brian Norris, linux-kernel


Dear Boris,


Il 30/07/2015 19:34, Boris Brezillon ha scritto:
> The default NAND read functions are relying on an underlying controller
> to correct bitflips, but some of those controller cannot properly fix
> bitflips in erased pages.
> In case of ECC failures, check if the page of subpage is empty before
> reporting an ECC failure.

I'm still wondering if chip->ecc.strength is the right threshold.

Did you see my comments here [1]? WDYT?

Maybe we can have this discussion in a separate thread, if you want ;-)

Kind Regards,

-- 

Andrea SCIAN

DAVE Embedded Systems

[1] http://lists.infradead.org/pipermail/linux-mtd/2015-July/060520.html

>
> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
> ---
>   drivers/mtd/nand/nand_base.c | 63 ++++++++++++++++++++++++++++++++++++++------
>   1 file changed, 55 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> index 1542ea7..d527c73 100644
> --- a/drivers/mtd/nand/nand_base.c
> +++ b/drivers/mtd/nand/nand_base.c
> @@ -1396,6 +1396,19 @@ static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip,
>   		stat = chip->ecc.correct(mtd, p,
>   			&chip->buffers->ecccode[i], &chip->buffers->ecccalc[i]);
>   		if (stat < 0) {
> +			/* check for empty pages with bitflips */
> +			int col = (int)(p - bufpoi);
> +
> +			chip->cmdfunc(mtd, NAND_CMD_RNDOUT, col, -1);
> +			chip->read_buf(mtd, p, chip->ecc.size);
> +			stat = nand_check_erased_ecc_chunk(p, chip->ecc.size,
> +						&chip->buffers->ecccode[i],
> +						chip->ecc.bytes,
> +						NULL, 0,
> +						chip->ecc.strength);
> +		}
> +
> +		if (stat < 0) {
>   			mtd->ecc_stats.failed++;
>   		} else {
>   			mtd->ecc_stats.corrected += stat;
> @@ -1445,6 +1458,16 @@ static int nand_read_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip,
>
>   		stat = chip->ecc.correct(mtd, p, &ecc_code[i], &ecc_calc[i]);
>   		if (stat < 0) {
> +			/* check for empty pages with bitflips */
> +			chip->cmdfunc(mtd, NAND_CMD_RNDOUT, i, -1);
> +			chip->read_buf(mtd, p, eccsize);
> +			stat = nand_check_erased_ecc_chunk(p, eccsize,
> +						&ecc_code[i], eccbytes,
> +						NULL, 0,
> +						chip->ecc.strength);
> +		}
> +
> +		if (stat < 0) {
>   			mtd->ecc_stats.failed++;
>   		} else {
>   			mtd->ecc_stats.corrected += stat;
> @@ -1495,7 +1518,18 @@ static int nand_read_page_hwecc_oob_first(struct mtd_info *mtd,
>   		chip->read_buf(mtd, p, eccsize);
>   		chip->ecc.calculate(mtd, p, &ecc_calc[i]);
>
> -		stat = chip->ecc.correct(mtd, p, &ecc_code[i], NULL);
> +		stat = chip->ecc.correct(mtd, p, &ecc_code[i], &ecc_calc[i]);
> +		if (stat < 0) {
> +			/* check for empty pages with bitflips */
> +			chip->cmdfunc(mtd, NAND_CMD_RNDOUT,
> +				      i + mtd->oobsize, -1);
> +			chip->read_buf(mtd, p, eccsize);
> +			stat = nand_check_erased_ecc_chunk(p, eccsize,
> +						&ecc_code[i], eccbytes,
> +						NULL, 0,
> +						chip->ecc.strength);
> +		}
> +
>   		if (stat < 0) {
>   			mtd->ecc_stats.failed++;
>   		} else {
> @@ -1523,6 +1557,8 @@ static int nand_read_page_syndrome(struct mtd_info *mtd, struct nand_chip *chip,
>   	int i, eccsize = chip->ecc.size;
>   	int eccbytes = chip->ecc.bytes;
>   	int eccsteps = chip->ecc.steps;
> +	int eccstepsize = eccsize + eccbytes + chip->ecc.prepad +
> +			  chip->ecc.postpad;
>   	uint8_t *p = buf;
>   	uint8_t *oob = chip->oob_poi;
>   	unsigned int max_bitflips = 0;
> @@ -1542,19 +1578,30 @@ static int nand_read_page_syndrome(struct mtd_info *mtd, struct nand_chip *chip,
>   		chip->read_buf(mtd, oob, eccbytes);
>   		stat = chip->ecc.correct(mtd, p, oob, NULL);
>
> -		if (stat < 0) {
> -			mtd->ecc_stats.failed++;
> -		} else {
> -			mtd->ecc_stats.corrected += stat;
> -			max_bitflips = max_t(unsigned int, max_bitflips, stat);
> -		}
> -
>   		oob += eccbytes;
>
>   		if (chip->ecc.postpad) {
>   			chip->read_buf(mtd, oob, chip->ecc.postpad);
>   			oob += chip->ecc.postpad;
>   		}
> +
> +		if (stat < 0) {
> +			/* check for empty pages with bitflips */
> +			chip->cmdfunc(mtd, NAND_CMD_RNDOUT, i * eccstepsize, -1);
> +			chip->read_buf(mtd, p, chip->ecc.size);
> +			stat = nand_check_erased_ecc_chunk(p, chip->ecc.size,
> +							   oob - eccstepsize,
> +							   eccstepsize,
> +							   NULL, 0,
> +							   chip->ecc.strength);
> +		}
> +
> +		if (stat < 0) {
> +			mtd->ecc_stats.failed++;
> +		} else {
> +			mtd->ecc_stats.corrected += stat;
> +			max_bitflips = max_t(unsigned int, max_bitflips, stat);
> +		}
>   	}
>
>   	/* Calculate remaining oob bytes */
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 1/2] mtd: nand: add nand_check_erased helper functions
  2015-07-31 10:06     ` Andrea Scian
@ 2015-07-31 10:21       ` Boris Brezillon
  2015-07-31 13:29         ` Andrea Scian
  0 siblings, 1 reply; 26+ messages in thread
From: Boris Brezillon @ 2015-07-31 10:21 UTC (permalink / raw)
  To: Andrea Scian; +Cc: linux-mtd, David Woodhouse, Brian Norris, linux-kernel

Hi Andrea,

On Fri, 31 Jul 2015 12:06:32 +0200
Andrea Scian <rnd4@dave-tech.it> wrote:

> 
> Dear Boris,
> 
> thanks for pointing this out again.
> 
> I'm on the same topic too, using iMX6 (I'll try to test you patch on the 
> next days, if I found some spare time, unfortunately I got a 3.10 
> kernel, so I think the patch will not apply cleanly :-( ).
> 
> See my comment below (and on the next mail too)
> 
> Il 31/07/2015 09:10, Boris Brezillon ha scritto:
> > On Thu, 30 Jul 2015 19:34:53 +0200
> > Boris Brezillon <boris.brezillon@free-electrons.com> wrote:
> >
> >> Add two helper functions to help NAND controller drivers test whether a
> >> specific NAND region is erased or not.
> >>
> >> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
> >> ---
> >>   drivers/mtd/nand/nand_base.c | 104 +++++++++++++++++++++++++++++++++++++++++++
> >>   include/linux/mtd/nand.h     |   8 ++++
> >>   2 files changed, 112 insertions(+)
> >>
> >> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> >> index ceb68ca..1542ea7 100644
> >> --- a/drivers/mtd/nand/nand_base.c
> >> +++ b/drivers/mtd/nand/nand_base.c
> >> @@ -1101,6 +1101,110 @@ out:
> >>   EXPORT_SYMBOL(nand_lock);
> >>
> >>   /**
> >> + * nand_check_erased_buf - check if a buffer contains (almost) only 0xff data
> >> + * @buf: buffer to test
> >> + * @len: buffer length
> >> + * @bitflips_threshold:maximum number of bitflips
> >> + *
> >> + * Check if a buffer contains only 0xff, which means the underlying region
> >> + * has been erased and is ready to be programmed.
> >> + * The bitflips_threshold specify the maximum number of bitflips before
> >> + * considering the region is not erased.
> >> + * Note: The logic of this function has been extracted from the memweight
> >> + * implementation, except that nand_check_erased_buf function exit before
> >> + * testing the whole buffer if the number of bitflips exceed the
> >> + * bitflips_threshold value.
> >> + *
> >> + * Returns a positive number of bitflips or -ERROR_CODE.
> >> + */
> >> +int nand_check_erased_buf(void *buf, int len, int bitflips_threshold)
> >> +{
> >> +	const unsigned char *bitmap = buf;
> >> +	int bitflips = 0;
> >> +	int weight;
> >> +	int longs;
> >> +
> >> +	for (; len && ((unsigned long)bitmap) % sizeof(long);
> >> +	     len--, bitmap++) {
> >> +		weight = hweight8(*bitmap);
> >> +
> >> +		bitflips += sizeof(u8) - weight;
> >> +		if (bitflips > bitflips_threshold)
> >> +			return -EINVAL;
> 
> I think it's better to do something like:
> 
> if (UNLIKELY(bitflips > bitflips_threshold))
> 	return -EINVAL;
> 
> isn't it? :-)
> (the same for the other if)

Maybe, or maybe not. It depends on whether you expect to have a lot of
corrupted pages or a lot of blank pages with bitflips ;-).
Anyway, I'm not opposed to this change.

> 
> 
> >> +	}
> >> +
> >> +
> >> +	for (longs = len / sizeof(long); longs;
> >> +	     longs--, bitmap += sizeof(long)) {
> >> +		BUG_ON(longs >= INT_MAX / BITS_PER_LONG);
> >> +		weight = hweight_long(*((unsigned long *)bitmap));
> >> +
> >> +		bitflips += sizeof(long) - weight;
> >> +		if (bitflips > bitflips_threshold)
> >> +			return -EINVAL;
> >> +	}
> >> +
> >> +	len %= sizeof(long);
> >> +
> >> +	for (; len > 0; len--, bitmap++) {
> >> +		weight = hweight8(*bitmap);
> >> +		bitflips += sizeof(u8) - weight;
> >> +	}
> >> +
> >> +	return bitflips;
> >> +}
> >> +EXPORT_SYMBOL(nand_check_erased_buf);
> >> +
> >> +/**
> >> + * nand_check_erased_ecc_chunk - check if an ECC chunk contains (almost) only
> >> + *				 0xff data
> >> + * @data: data buffer to test
> >> + * @datalen: data length
> >> + * @ecc: ECC buffer
> >> + * @ecclen: ECC length
> >> + * @extraoob: extra OOB buffer
> >> + * @extraooblen: extra OOB length
> >> + * @bitflips_threshold: maximum number of bitflips
> >> + *
> >> + * Check if a data buffer and its associated ECC and OOB data contains only
> >> + * 0xff pattern, which means the underlying region has been erased and is
> >> + * ready to be programmed.
> >> + * The bitflips_threshold specify the maximum number of bitflips before
> >> + * considering the region as not erased.
> >> + *
> >> + * Returns a positive number of bitflips or -ERROR_CODE.
> >> + */
> >> +int nand_check_erased_ecc_chunk(void *data, int datalen,
> >> +				void *ecc, int ecclen,
> >> +				void *extraoob, int extraooblen,
> >> +				int bitflips_threshold)
> >> +{
> >> +	int bitflips = 0;
> >> +	int ret;
> >> +
> >> +	ret = nand_check_erased_buf(data, datalen, bitflips_threshold);
> >> +	if (ret < 0)
> >> +		return ret;
> >> +
> >> +	bitflips += ret;
> >> +	bitflips_threshold -= ret;
> >> +
> >> +	ret = nand_check_erased_buf(ecc, ecclen, bitflips_threshold);
> >> +	if (ret < 0)
> >> +		return ret;
> >> +
> >> +	bitflips += ret;
> >> +	bitflips_threshold -= ret;
> >> +
> >> +	ret = nand_check_erased_buf(extraoob, extraooblen, bitflips_threshold);
> >> +	if (ret < 0)
> >> +		return ret;
> >> +
> >
> > Forgot the memset operations here:
> >
> > 	memset(data, 0xff, datalen);
> > 	memset(ecc, 0xff, ecclen);
> > 	memset(extraoob, 0xff, extraooblen);
> 
> Yes, you're right.. I did the same mistake on my first implementation 
> too ;-)

Hehe.

> 
> As additional optimization you may also check if the lower layer already 
> did the check for you (e.g. if you have an iMXQP as we saw in latest 
> days), but I think it's a minor one, because you'll face this situation 
> very very unlikely.

If the hardware is capable of doing such test (I mean counting the
number of bits to one and considering the page as erased under a given
limit of bitflips), there's a lot of chance it will implement its own
ecc_read_page function, and will never use this helper.

Best Regards,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions
  2015-07-31 10:07   ` Andrea Scian
@ 2015-07-31 10:32     ` Boris Brezillon
  2015-07-31 13:40       ` Andrea Scian
  0 siblings, 1 reply; 26+ messages in thread
From: Boris Brezillon @ 2015-07-31 10:32 UTC (permalink / raw)
  To: Andrea Scian
  Cc: linux-mtd, David Woodhouse, Brian Norris, linux-kernel, Han Xu

Hi Andrea,

Adding Han in Cc.

On Fri, 31 Jul 2015 12:07:21 +0200
Andrea Scian <rnd4@dave-tech.it> wrote:

> 
> Dear Boris,
> 
> 
> Il 30/07/2015 19:34, Boris Brezillon ha scritto:
> > The default NAND read functions are relying on an underlying controller
> > to correct bitflips, but some of those controller cannot properly fix
> > bitflips in erased pages.
> > In case of ECC failures, check if the page of subpage is empty before
> > reporting an ECC failure.
> 
> I'm still wondering if chip->ecc.strength is the right threshold.
> 
> Did you see my comments here [1]? WDYT?

Yes I've read it, and decided to go for ecc->strength as a first
step (I'm more interested in discussing the approach than the threshold
value right now ;-)).

Anyway, as you pointed out in the thread, writing data on an erased
page already containing some bitflips might generate even more
bitflips, so using a different threshold for the erased page check
makes sense. This threshold should definitely be correlated to the ECC
strength, but how, that's the question.

How about taking a rather conservative value like 10% of the specified
ECC strength, and see how it goes.

> 
> Maybe we can have this discussion in a separate thread, if you want ;-)

No, I think we should keep discussing it in this thread.

Thanks,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 1/2] mtd: nand: add nand_check_erased helper functions
  2015-07-31 10:21       ` Boris Brezillon
@ 2015-07-31 13:29         ` Andrea Scian
  2015-07-31 13:58           ` Boris Brezillon
  0 siblings, 1 reply; 26+ messages in thread
From: Andrea Scian @ 2015-07-31 13:29 UTC (permalink / raw)
  To: Boris Brezillon; +Cc: linux-mtd, David Woodhouse, Brian Norris, linux-kernel


Boris,

Il 31/07/2015 12:21, Boris Brezillon ha scritto:
> Hi Andrea,
>
> On Fri, 31 Jul 2015 12:06:32 +0200
> Andrea Scian <rnd4@dave-tech.it> wrote:
>
>>
>> Dear Boris,
>>
>> thanks for pointing this out again.
>>
>> I'm on the same topic too, using iMX6 (I'll try to test you patch on the
>> next days, if I found some spare time, unfortunately I got a 3.10
>> kernel, so I think the patch will not apply cleanly :-( ).
>>
>> See my comment below (and on the next mail too)
>>
>> Il 31/07/2015 09:10, Boris Brezillon ha scritto:
>>> On Thu, 30 Jul 2015 19:34:53 +0200
>>> Boris Brezillon <boris.brezillon@free-electrons.com> wrote:
>>>
>>>> Add two helper functions to help NAND controller drivers test whether a
>>>> specific NAND region is erased or not.
>>>>
>>>> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
>>>> ---
>>>>    drivers/mtd/nand/nand_base.c | 104 +++++++++++++++++++++++++++++++++++++++++++
>>>>    include/linux/mtd/nand.h     |   8 ++++
>>>>    2 files changed, 112 insertions(+)
>>>>
>>>> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
>>>> index ceb68ca..1542ea7 100644
>>>> --- a/drivers/mtd/nand/nand_base.c
>>>> +++ b/drivers/mtd/nand/nand_base.c
>>>> @@ -1101,6 +1101,110 @@ out:
>>>>    EXPORT_SYMBOL(nand_lock);
>>>>
>>>>    /**
>>>> + * nand_check_erased_buf - check if a buffer contains (almost) only 0xff data
>>>> + * @buf: buffer to test
>>>> + * @len: buffer length
>>>> + * @bitflips_threshold:maximum number of bitflips
>>>> + *
>>>> + * Check if a buffer contains only 0xff, which means the underlying region
>>>> + * has been erased and is ready to be programmed.
>>>> + * The bitflips_threshold specify the maximum number of bitflips before
>>>> + * considering the region is not erased.
>>>> + * Note: The logic of this function has been extracted from the memweight
>>>> + * implementation, except that nand_check_erased_buf function exit before
>>>> + * testing the whole buffer if the number of bitflips exceed the
>>>> + * bitflips_threshold value.
>>>> + *
>>>> + * Returns a positive number of bitflips or -ERROR_CODE.
>>>> + */
>>>> +int nand_check_erased_buf(void *buf, int len, int bitflips_threshold)
>>>> +{
>>>> +	const unsigned char *bitmap = buf;
>>>> +	int bitflips = 0;
>>>> +	int weight;
>>>> +	int longs;
>>>> +
>>>> +	for (; len && ((unsigned long)bitmap) % sizeof(long);
>>>> +	     len--, bitmap++) {
>>>> +		weight = hweight8(*bitmap);
>>>> +
>>>> +		bitflips += sizeof(u8) - weight;
>>>> +		if (bitflips > bitflips_threshold)
>>>> +			return -EINVAL;
>>
>> I think it's better to do something like:
>>
>> if (UNLIKELY(bitflips > bitflips_threshold))
>> 	return -EINVAL;
>>
>> isn't it? :-)
>> (the same for the other if)
>
> Maybe, or maybe not. It depends on whether you expect to have a lot of
> corrupted pages or a lot of blank pages with bitflips ;-).
> Anyway, I'm not opposed to this change.

I think that everything implemented inside the MTD stack 
(NAND/MTD/UBI/UBIFS) should lead to a "working" solid state device, that 
do not show any uncorrectable bitflips.
Uncorrectable pages, IMO, should happens, on stable systems, only in 
some rare case, because it means that you loss some data (or power 
during erase/write. Any other case?).

What is more frequent is that bitflips > mtd->bitflip_threshold (by 
default DIV_ROUND_UP(mtd->ecc_strength * 3, 4)), which should avoid 
bitflips > ecc_strength

>>
>>
>>>> +	}
>>>> +
>>>> +
>>>> +	for (longs = len / sizeof(long); longs;
>>>> +	     longs--, bitmap += sizeof(long)) {
>>>> +		BUG_ON(longs >= INT_MAX / BITS_PER_LONG);
>>>> +		weight = hweight_long(*((unsigned long *)bitmap));
>>>> +
>>>> +		bitflips += sizeof(long) - weight;
>>>> +		if (bitflips > bitflips_threshold)
>>>> +			return -EINVAL;
>>>> +	}
>>>> +
>>>> +	len %= sizeof(long);
>>>> +
>>>> +	for (; len > 0; len--, bitmap++) {
>>>> +		weight = hweight8(*bitmap);
>>>> +		bitflips += sizeof(u8) - weight;
>>>> +	}
>>>> +
>>>> +	return bitflips;
>>>> +}
>>>> +EXPORT_SYMBOL(nand_check_erased_buf);
>>>> +
>>>> +/**
>>>> + * nand_check_erased_ecc_chunk - check if an ECC chunk contains (almost) only
>>>> + *				 0xff data
>>>> + * @data: data buffer to test
>>>> + * @datalen: data length
>>>> + * @ecc: ECC buffer
>>>> + * @ecclen: ECC length
>>>> + * @extraoob: extra OOB buffer
>>>> + * @extraooblen: extra OOB length
>>>> + * @bitflips_threshold: maximum number of bitflips
>>>> + *
>>>> + * Check if a data buffer and its associated ECC and OOB data contains only
>>>> + * 0xff pattern, which means the underlying region has been erased and is
>>>> + * ready to be programmed.
>>>> + * The bitflips_threshold specify the maximum number of bitflips before
>>>> + * considering the region as not erased.
>>>> + *
>>>> + * Returns a positive number of bitflips or -ERROR_CODE.
>>>> + */
>>>> +int nand_check_erased_ecc_chunk(void *data, int datalen,
>>>> +				void *ecc, int ecclen,
>>>> +				void *extraoob, int extraooblen,
>>>> +				int bitflips_threshold)
>>>> +{
>>>> +	int bitflips = 0;
>>>> +	int ret;
>>>> +
>>>> +	ret = nand_check_erased_buf(data, datalen, bitflips_threshold);
>>>> +	if (ret < 0)
>>>> +		return ret;
>>>> +
>>>> +	bitflips += ret;
>>>> +	bitflips_threshold -= ret;
>>>> +
>>>> +	ret = nand_check_erased_buf(ecc, ecclen, bitflips_threshold);
>>>> +	if (ret < 0)
>>>> +		return ret;
>>>> +
>>>> +	bitflips += ret;
>>>> +	bitflips_threshold -= ret;
>>>> +
>>>> +	ret = nand_check_erased_buf(extraoob, extraooblen, bitflips_threshold);
>>>> +	if (ret < 0)
>>>> +		return ret;
>>>> +
>>>
>>> Forgot the memset operations here:
>>>
>>> 	memset(data, 0xff, datalen);
>>> 	memset(ecc, 0xff, ecclen);
>>> 	memset(extraoob, 0xff, extraooblen);
>>
>> Yes, you're right.. I did the same mistake on my first implementation
>> too ;-)
>
> Hehe.
>
>>
>> As additional optimization you may also check if the lower layer already
>> did the check for you (e.g. if you have an iMXQP as we saw in latest
>> days), but I think it's a minor one, because you'll face this situation
>> very very unlikely.
>
> If the hardware is capable of doing such test (I mean counting the
> number of bits to one and considering the page as erased under a given
> limit of bitflips), there's a lot of chance it will implement its own
> ecc_read_page function, and will never use this helper.
>

Ops.. I misunderstand your patch. I think it was something similar to 
what Brian already proposed some time ago [1].
IIUC Brial solution works, out of the box, even with the ones that 
override read_page callback, as I think most of current nand controller 
do (please correct me if I'm wrong).
If we want to add erased block check to omap2.c, atmel_nand.c, 
sh_flctl.c we have to modify them all.

I'm really not the right one to make such a decision ;-) but I think you 
already thought about it and can tell me the pros and cons of your patch 
vs the Brian's one.

What I understand up until now, is that Brian solution does not fit into 
all weird stuff that we find in single NAND controller implementation 
and this is where your solution come in handy. Am I wrong?

Kind Regards,

-- 

Andrea SCIAN

DAVE Embedded Systems

[1] http://article.gmane.org/gmane.linux.drivers.mtd/52216

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions
  2015-07-31 10:32     ` Boris Brezillon
@ 2015-07-31 13:40       ` Andrea Scian
  2015-07-31 14:10         ` Boris Brezillon
  0 siblings, 1 reply; 26+ messages in thread
From: Andrea Scian @ 2015-07-31 13:40 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: linux-mtd, David Woodhouse, Brian Norris, linux-kernel, Han Xu


Boris,

Il 31/07/2015 12:32, Boris Brezillon ha scritto:
> Hi Andrea,
>
> Adding Han in Cc.
>
> On Fri, 31 Jul 2015 12:07:21 +0200
> Andrea Scian <rnd4@dave-tech.it> wrote:
>
>>
>> Dear Boris,
>>
>>
>> Il 30/07/2015 19:34, Boris Brezillon ha scritto:
>>> The default NAND read functions are relying on an underlying controller
>>> to correct bitflips, but some of those controller cannot properly fix
>>> bitflips in erased pages.
>>> In case of ECC failures, check if the page of subpage is empty before
>>> reporting an ECC failure.
>>
>> I'm still wondering if chip->ecc.strength is the right threshold.
>>
>> Did you see my comments here [1]? WDYT?
>
> Yes I've read it, and decided to go for ecc->strength as a first
> step (I'm more interested in discussing the approach than the threshold
> value right now ;-)).

I perfectly understand, that's the reason why I ask if you want to move 
to another thread ;-)

> Anyway, as you pointed out in the thread, writing data on an erased
> page already containing some bitflips might generate even more
> bitflips, so using a different threshold for the erased page check
> makes sense. This threshold should definitely be correlated to the ECC
> strength, but how, that's the question.
>
> How about taking a rather conservative value like 10% of the specified
> ECC strength, and see how it goes.

Yes, I think that there's no real way to get the right value, other than 
feedbacks from on-field testing with various devices.

I'm also thinking about changing how a NAND page is written on the 
device, now that we know that even erased page may have (too many!) 
bitflips if they has not been so-freshly erased.

Read on NAND device is lot's faster that write, so maybe we can:

a) read the page before write it, check for bitflips on erased area and 
write it only if it fit our threshold

b) read the page after write it and check if the bitflips are lower that 
a give value

In this way:
- we can use ecc_strength as read threshold, because it fits all the 
other NAND read

- we can use "something a bit lower than" mtd->bitflip_threshold on 
read-before-write or read-after-write. If we don't do so the block will 
be scrubbed next time we read it again (if we are lucky.. if we are 
unlucky the block will have bitflip > ecc_strength!): IOW we did a write 
that will trigger another erase/write cycle.

Am I misunderstanding something?


>> Maybe we can have this discussion in a separate thread, if you want ;-)
>
> No, I think we should keep discussing it in this thread.

OK

KR,

-- 

Andrea SCIAN

DAVE Embedded Systems

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 1/2] mtd: nand: add nand_check_erased helper functions
  2015-07-31 13:29         ` Andrea Scian
@ 2015-07-31 13:58           ` Boris Brezillon
  0 siblings, 0 replies; 26+ messages in thread
From: Boris Brezillon @ 2015-07-31 13:58 UTC (permalink / raw)
  To: Andrea Scian; +Cc: linux-mtd, David Woodhouse, Brian Norris, linux-kernel

On Fri, 31 Jul 2015 15:29:21 +0200
Andrea Scian <rnd4@dave-tech.it> wrote:

> >>>> +int nand_check_erased_buf(void *buf, int len, int bitflips_threshold)
> >>>> +{
> >>>> +	const unsigned char *bitmap = buf;
> >>>> +	int bitflips = 0;
> >>>> +	int weight;
> >>>> +	int longs;
> >>>> +
> >>>> +	for (; len && ((unsigned long)bitmap) % sizeof(long);
> >>>> +	     len--, bitmap++) {
> >>>> +		weight = hweight8(*bitmap);
> >>>> +
> >>>> +		bitflips += sizeof(u8) - weight;
> >>>> +		if (bitflips > bitflips_threshold)
> >>>> +			return -EINVAL;
> >>
> >> I think it's better to do something like:
> >>
> >> if (UNLIKELY(bitflips > bitflips_threshold))
> >> 	return -EINVAL;
> >>
> >> isn't it? :-)
> >> (the same for the other if)
> >
> > Maybe, or maybe not. It depends on whether you expect to have a lot of
> > corrupted pages or a lot of blank pages with bitflips ;-).
> > Anyway, I'm not opposed to this change.
> 
> I think that everything implemented inside the MTD stack 
> (NAND/MTD/UBI/UBIFS) should lead to a "working" solid state device, that 
> do not show any uncorrectable bitflips.
> Uncorrectable pages, IMO, should happens, on stable systems, only in 
> some rare case, because it means that you loss some data (or power 
> during erase/write. Any other case?).
> 
> What is more frequent is that bitflips > mtd->bitflip_threshold (by 
> default DIV_ROUND_UP(mtd->ecc_strength * 3, 4)), which should avoid 
> bitflips > ecc_strength

Yes, you're probably right. I'll add the unlikely keyword in the next
version.


> >>
> >> As additional optimization you may also check if the lower layer already
> >> did the check for you (e.g. if you have an iMXQP as we saw in latest
> >> days), but I think it's a minor one, because you'll face this situation
> >> very very unlikely.
> >
> > If the hardware is capable of doing such test (I mean counting the
> > number of bits to one and considering the page as erased under a given
> > limit of bitflips), there's a lot of chance it will implement its own
> > ecc_read_page function, and will never use this helper.
> >
> 
> Ops.. I misunderstand your patch. I think it was something similar to 
> what Brian already proposed some time ago [1].
> IIUC Brial solution works, out of the box, even with the ones that 
> override read_page callback, as I think most of current nand controller 
> do (please correct me if I'm wrong).
> If we want to add erased block check to omap2.c, atmel_nand.c, 
> sh_flctl.c we have to modify them all.
> 
> I'm really not the right one to make such a decision ;-) but I think you 
> already thought about it and can tell me the pros and cons of your patch 
> vs the Brian's one.
> 
> What I understand up until now, is that Brian solution does not fit into 
> all weird stuff that we find in single NAND controller implementation 
> and this is where your solution come in handy. Am I wrong?

That's exactly why I'm proposing it as helper functions instead of
trying to apply the erased page check for everybody.
Here are a few missing things to make the test applicable to everybody:
- some controllers are not implementing the read_page_raw function and
  thus checking ECC bytes is impossible
- some of them are not able to retrieve OOB bytes (or are only able to
  retrieve a small portion of it)
- some controllers do not properly define the ECC layout, which makes
  it impossible to check which ECC byte is assigned to which ECC chunk.
- some ECC controllers (like the GPMI one) do not align ECC data on a
  byte, which renders the erased check even more complicated

The other reason to not enforce this test for everybody is that some
controllers generate 0xff ECC bytes for 0xff data (the sama5d4 NAND
controller does), which means they are fully capable of correcting
bitflips in erased pages by their own, and if they report a read
failure, there's no need to check for an empty page, this is a real
failure.

By only providing helper functions, we let the NAND controller
drivers decide how to check it, but we're still providing common
functions instead of duplicating the same code in all drivers.

Best Regards,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions
  2015-07-31 13:40       ` Andrea Scian
@ 2015-07-31 14:10         ` Boris Brezillon
  2015-07-31 16:19           ` Andrea Scian
  0 siblings, 1 reply; 26+ messages in thread
From: Boris Brezillon @ 2015-07-31 14:10 UTC (permalink / raw)
  To: Andrea Scian
  Cc: linux-mtd, David Woodhouse, Brian Norris, linux-kernel, Han Xu

On Fri, 31 Jul 2015 15:40:13 +0200
Andrea Scian <rnd4@dave-tech.it> wrote:

> 
> Boris,
> 
> Il 31/07/2015 12:32, Boris Brezillon ha scritto:
> > Hi Andrea,
> >
> > Adding Han in Cc.
> >
> > On Fri, 31 Jul 2015 12:07:21 +0200
> > Andrea Scian <rnd4@dave-tech.it> wrote:
> >
> >>
> >> Dear Boris,
> >>
> >>
> >> Il 30/07/2015 19:34, Boris Brezillon ha scritto:
> >>> The default NAND read functions are relying on an underlying controller
> >>> to correct bitflips, but some of those controller cannot properly fix
> >>> bitflips in erased pages.
> >>> In case of ECC failures, check if the page of subpage is empty before
> >>> reporting an ECC failure.
> >>
> >> I'm still wondering if chip->ecc.strength is the right threshold.
> >>
> >> Did you see my comments here [1]? WDYT?
> >
> > Yes I've read it, and decided to go for ecc->strength as a first
> > step (I'm more interested in discussing the approach than the threshold
> > value right now ;-)).
> 
> I perfectly understand, that's the reason why I ask if you want to move 
> to another thread ;-)
> 
> > Anyway, as you pointed out in the thread, writing data on an erased
> > page already containing some bitflips might generate even more
> > bitflips, so using a different threshold for the erased page check
> > makes sense. This threshold should definitely be correlated to the ECC
> > strength, but how, that's the question.
> >
> > How about taking a rather conservative value like 10% of the specified
> > ECC strength, and see how it goes.
> 
> Yes, I think that there's no real way to get the right value, other than 
> feedbacks from on-field testing with various devices.
> 
> I'm also thinking about changing how a NAND page is written on the 
> device, now that we know that even erased page may have (too many!) 
> bitflips if they has not been so-freshly erased.
> 
> Read on NAND device is lot's faster that write, so maybe we can:
> 
> a) read the page before write it, check for bitflips on erased area and 
> write it only if it fit our threshold
> 
> b) read the page after write it and check if the bitflips are lower that 
> a give value
> 
> In this way:
> - we can use ecc_strength as read threshold, because it fits all the 
> other NAND read
> 
> - we can use "something a bit lower than" mtd->bitflip_threshold on 
> read-before-write or read-after-write. If we don't do so the block will 
> be scrubbed next time we read it again (if we are lucky.. if we are 
> unlucky the block will have bitflip > ecc_strength!): IOW we did a write 
> that will trigger another erase/write cycle.
> 
> Am I misunderstanding something?

Nope, but this implies doing an extra read after each write :-/

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions
  2015-07-31 14:10         ` Boris Brezillon
@ 2015-07-31 16:19           ` Andrea Scian
  2015-07-31 16:27             ` Boris Brezillon
  0 siblings, 1 reply; 26+ messages in thread
From: Andrea Scian @ 2015-07-31 16:19 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: linux-mtd, David Woodhouse, Brian Norris, linux-kernel, Han Xu

Il 31/07/2015 16:10, Boris Brezillon ha scritto:
> On Fri, 31 Jul 2015 15:40:13 +0200
> Andrea Scian <rnd4@dave-tech.it> wrote:
>
>>
>> Boris,
>>
>> Il 31/07/2015 12:32, Boris Brezillon ha scritto:
>>> Hi Andrea,
>>>
>>> Adding Han in Cc.
>>>
>>> On Fri, 31 Jul 2015 12:07:21 +0200
>>> Andrea Scian <rnd4@dave-tech.it> wrote:
>>>
>>>>
>>>> Dear Boris,
>>>>
>>>>
>>>> Il 30/07/2015 19:34, Boris Brezillon ha scritto:
>>>>> The default NAND read functions are relying on an underlying controller
>>>>> to correct bitflips, but some of those controller cannot properly fix
>>>>> bitflips in erased pages.
>>>>> In case of ECC failures, check if the page of subpage is empty before
>>>>> reporting an ECC failure.
>>>>
>>>> I'm still wondering if chip->ecc.strength is the right threshold.
>>>>
>>>> Did you see my comments here [1]? WDYT?
>>>
>>> Yes I've read it, and decided to go for ecc->strength as a first
>>> step (I'm more interested in discussing the approach than the threshold
>>> value right now ;-)).
>>
>> I perfectly understand, that's the reason why I ask if you want to move
>> to another thread ;-)
>>
>>> Anyway, as you pointed out in the thread, writing data on an erased
>>> page already containing some bitflips might generate even more
>>> bitflips, so using a different threshold for the erased page check
>>> makes sense. This threshold should definitely be correlated to the ECC
>>> strength, but how, that's the question.
>>>
>>> How about taking a rather conservative value like 10% of the specified
>>> ECC strength, and see how it goes.
>>
>> Yes, I think that there's no real way to get the right value, other than
>> feedbacks from on-field testing with various devices.
>>
>> I'm also thinking about changing how a NAND page is written on the
>> device, now that we know that even erased page may have (too many!)
>> bitflips if they has not been so-freshly erased.
>>
>> Read on NAND device is lot's faster that write, so maybe we can:
>>
>> a) read the page before write it, check for bitflips on erased area and
>> write it only if it fit our threshold
>>
>> b) read the page after write it and check if the bitflips are lower that
>> a give value
>>
>> In this way:
>> - we can use ecc_strength as read threshold, because it fits all the
>> other NAND read
>>
>> - we can use "something a bit lower than" mtd->bitflip_threshold on
>> read-before-write or read-after-write. If we don't do so the block will
>> be scrubbed next time we read it again (if we are lucky.. if we are
>> unlucky the block will have bitflip > ecc_strength!): IOW we did a write
>> that will trigger another erase/write cycle.
>>
>> Am I misunderstanding something?
>
> Nope, but this implies doing an extra read after each write :-/
>

Let's wait what the others says about this, but I would like to put some 
numbers in it.

My micron MLC device says
- read page max 75 uS
- write page typ 1300uS, max 2600uS

If we implement read-before-write (which is, IMO, the best approach), in 
the worst overhead we have is 1375uS vs 1300uS, which is ~6%.
Please note that, if you read a page that "is not suitable" for write, 
you avoid the write time, schedule it for scrubbing, and use another 
free page.

Probably I'm a bit optimistic because we also need to take in account 
other latencies (DMA setup, ECC engine, buffer copies and so on) but 
it's a starting point ;-)

KR,

-- 

Andrea SCIAN

DAVE Embedded Systems

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions
  2015-07-31 16:19           ` Andrea Scian
@ 2015-07-31 16:27             ` Boris Brezillon
  2015-08-03 11:16               ` Andrea Scian
  0 siblings, 1 reply; 26+ messages in thread
From: Boris Brezillon @ 2015-07-31 16:27 UTC (permalink / raw)
  To: Andrea Scian
  Cc: linux-mtd, David Woodhouse, Brian Norris, linux-kernel, Han Xu

On Fri, 31 Jul 2015 18:19:30 +0200
Andrea Scian <rnd4@dave-tech.it> wrote:

> Il 31/07/2015 16:10, Boris Brezillon ha scritto:
> > On Fri, 31 Jul 2015 15:40:13 +0200
> > Andrea Scian <rnd4@dave-tech.it> wrote:
> >
> >>
> >> Boris,
> >>
> >> Il 31/07/2015 12:32, Boris Brezillon ha scritto:
> >>> Hi Andrea,
> >>>
> >>> Adding Han in Cc.
> >>>
> >>> On Fri, 31 Jul 2015 12:07:21 +0200
> >>> Andrea Scian <rnd4@dave-tech.it> wrote:
> >>>
> >>>>
> >>>> Dear Boris,
> >>>>
> >>>>
> >>>> Il 30/07/2015 19:34, Boris Brezillon ha scritto:
> >>>>> The default NAND read functions are relying on an underlying controller
> >>>>> to correct bitflips, but some of those controller cannot properly fix
> >>>>> bitflips in erased pages.
> >>>>> In case of ECC failures, check if the page of subpage is empty before
> >>>>> reporting an ECC failure.
> >>>>
> >>>> I'm still wondering if chip->ecc.strength is the right threshold.
> >>>>
> >>>> Did you see my comments here [1]? WDYT?
> >>>
> >>> Yes I've read it, and decided to go for ecc->strength as a first
> >>> step (I'm more interested in discussing the approach than the threshold
> >>> value right now ;-)).
> >>
> >> I perfectly understand, that's the reason why I ask if you want to move
> >> to another thread ;-)
> >>
> >>> Anyway, as you pointed out in the thread, writing data on an erased
> >>> page already containing some bitflips might generate even more
> >>> bitflips, so using a different threshold for the erased page check
> >>> makes sense. This threshold should definitely be correlated to the ECC
> >>> strength, but how, that's the question.
> >>>
> >>> How about taking a rather conservative value like 10% of the specified
> >>> ECC strength, and see how it goes.
> >>
> >> Yes, I think that there's no real way to get the right value, other than
> >> feedbacks from on-field testing with various devices.
> >>
> >> I'm also thinking about changing how a NAND page is written on the
> >> device, now that we know that even erased page may have (too many!)
> >> bitflips if they has not been so-freshly erased.
> >>
> >> Read on NAND device is lot's faster that write, so maybe we can:
> >>
> >> a) read the page before write it, check for bitflips on erased area and
> >> write it only if it fit our threshold
> >>
> >> b) read the page after write it and check if the bitflips are lower that
> >> a give value
> >>
> >> In this way:
> >> - we can use ecc_strength as read threshold, because it fits all the
> >> other NAND read
> >>
> >> - we can use "something a bit lower than" mtd->bitflip_threshold on
> >> read-before-write or read-after-write. If we don't do so the block will
> >> be scrubbed next time we read it again (if we are lucky.. if we are
> >> unlucky the block will have bitflip > ecc_strength!): IOW we did a write
> >> that will trigger another erase/write cycle.
> >>
> >> Am I misunderstanding something?
> >
> > Nope, but this implies doing an extra read after each write :-/
> >
> 
> Let's wait what the others says about this, but I would like to put some 
> numbers in it.

Sure.

> 
> My micron MLC device says
> - read page max 75 uS
> - write page typ 1300uS, max 2600uS
> 
> If we implement read-before-write (which is, IMO, the best approach), in 
> the worst overhead we have is 1375uS vs 1300uS, which is ~6%.
> Please note that, if you read a page that "is not suitable" for write, 
> you avoid the write time, schedule it for scrubbing, and use another 
> free page.

Indeed, that's not such a big overhead.

> 
> Probably I'm a bit optimistic because we also need to take in account 
> other latencies (DMA setup, ECC engine, buffer copies and so on) but 
> it's a starting point ;-)

Yep, if you test it, could you provide some speed test results (with
and without this solution).

And I wonder if we shouldn't do it the other way around, write then
read-back and check the content.
Of course this implies doing the extra write even when the erased page
contains too many bitflips, but at least your sure that the data you
put in the page were correct at that time.


-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions
  2015-07-31 16:27             ` Boris Brezillon
@ 2015-08-03 11:16               ` Andrea Scian
  2015-08-03 12:42                 ` Boris Brezillon
  0 siblings, 1 reply; 26+ messages in thread
From: Andrea Scian @ 2015-08-03 11:16 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: linux-mtd, David Woodhouse, Brian Norris, linux-kernel, Han Xu


Dear Boris,

Il 31/07/2015 18:27, Boris Brezillon ha scritto:
> On Fri, 31 Jul 2015 18:19:30 +0200
> Andrea Scian <rnd4@dave-tech.it> wrote:
>
>> Il 31/07/2015 16:10, Boris Brezillon ha scritto:
>>> On Fri, 31 Jul 2015 15:40:13 +0200
>>> Andrea Scian <rnd4@dave-tech.it> wrote:
>>>
>>>> Boris,
>>>>
>>>> Il 31/07/2015 12:32, Boris Brezillon ha scritto:
>>>>> Hi Andrea,
>>>>>
>>>>> Adding Han in Cc.
>>>>>
>>>>> On Fri, 31 Jul 2015 12:07:21 +0200
>>>>> Andrea Scian <rnd4@dave-tech.it> wrote:
>>>>>
>>>>>> Dear Boris,
>>>>>>
>>>>>>
>>>>>> Il 30/07/2015 19:34, Boris Brezillon ha scritto:
>>>>>>> The default NAND read functions are relying on an underlying controller
>>>>>>> to correct bitflips, but some of those controller cannot properly fix
>>>>>>> bitflips in erased pages.
>>>>>>> In case of ECC failures, check if the page of subpage is empty before
>>>>>>> reporting an ECC failure.
>>>>>> I'm still wondering if chip->ecc.strength is the right threshold.
>>>>>>
>>>>>> Did you see my comments here [1]? WDYT?
>>>>> Yes I've read it, and decided to go for ecc->strength as a first
>>>>> step (I'm more interested in discussing the approach than the threshold
>>>>> value right now ;-)).
>>>> I perfectly understand, that's the reason why I ask if you want to move
>>>> to another thread ;-)
>>>>
>>>>> Anyway, as you pointed out in the thread, writing data on an erased
>>>>> page already containing some bitflips might generate even more
>>>>> bitflips, so using a different threshold for the erased page check
>>>>> makes sense. This threshold should definitely be correlated to the ECC
>>>>> strength, but how, that's the question.
>>>>>
>>>>> How about taking a rather conservative value like 10% of the specified
>>>>> ECC strength, and see how it goes.
>>>> Yes, I think that there's no real way to get the right value, other than
>>>> feedbacks from on-field testing with various devices.
>>>>
>>>> I'm also thinking about changing how a NAND page is written on the
>>>> device, now that we know that even erased page may have (too many!)
>>>> bitflips if they has not been so-freshly erased.
>>>>
>>>> Read on NAND device is lot's faster that write, so maybe we can:
>>>>
>>>> a) read the page before write it, check for bitflips on erased area and
>>>> write it only if it fit our threshold
>>>>
>>>> b) read the page after write it and check if the bitflips are lower that
>>>> a give value
>>>>
>>>> In this way:
>>>> - we can use ecc_strength as read threshold, because it fits all the
>>>> other NAND read
>>>>
>>>> - we can use "something a bit lower than" mtd->bitflip_threshold on
>>>> read-before-write or read-after-write. If we don't do so the block will
>>>> be scrubbed next time we read it again (if we are lucky.. if we are
>>>> unlucky the block will have bitflip > ecc_strength!): IOW we did a write
>>>> that will trigger another erase/write cycle.
>>>>
>>>> Am I misunderstanding something?
>>> Nope, but this implies doing an extra read after each write :-/
>>>
>> Let's wait what the others says about this, but I would like to put some
>> numbers in it.
> Sure.
>
>> My micron MLC device says
>> - read page max 75 uS
>> - write page typ 1300uS, max 2600uS
>>
>> If we implement read-before-write (which is, IMO, the best approach), in
>> the worst overhead we have is 1375uS vs 1300uS, which is ~6%.
>> Please note that, if you read a page that "is not suitable" for write,
>> you avoid the write time, schedule it for scrubbing, and use another
>> free page.
> Indeed, that's not such a big overhead.
>
>> Probably I'm a bit optimistic because we also need to take in account
>> other latencies (DMA setup, ECC engine, buffer copies and so on) but
>> it's a starting point ;-)
> Yep, if you test it, could you provide some speed test results (with
> and without this solution).

I think I can find some time to do some performance tests on real hardware.
Can you please help me in finding:
- which benchmark to use (currently I'm using bonnie++ on UBIFS, maybe I 
can you just mtd_speedtest)
- where to implement those read

For the second point I think we can implement it a UBI or MTD level.
I think the former will allow us to easily schedule scrubbing and choose 
another block to issue the write to. However I don't really know how to 
implement it (I don't really know so much about the UBI code).

The latter, at least for me, is easier to implement: I think I can find 
the place to add the page read on my own, anyway any clue is welcome ;-)
But I think it will be harder (or impossible) to choose where to issue 
the write, unless UBI already do so when it saw an MTD write failure.

> And I wonder if we shouldn't do it the other way around, write then
> read-back and check the content.
> Of course this implies doing the extra write even when the erased page
> contains too many bitflips, but at least your sure that the data you
> put in the page were correct at that time.
>

You're right, I think this is something that once we can find inside the 
MTD code (something like "check NAND written page" kconfig option) but I 
cannot find this option anymore on latest kernel.

You're approach will also have another advantage, currently nearly all 
platform will use software implementation for ECC check on erased 
blocks, while nearly all of them has hardware ECC once programmed. Using 
hardware ECC will remove CPU load and, maybe, be faster than call 
nand_check_erased_ecc_chunk()
I also think that the situation of having failure on write is very 
unlikely, unless you have a very "used" NAND device or you're not using 
Richard's bitrot check. So we'll have a performance impact (issuing 
another write) only when it's really needed.

Kind Regards,

-- 

Andrea SCIAN

DAVE Embedded Systems


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions
  2015-08-03 11:16               ` Andrea Scian
@ 2015-08-03 12:42                 ` Boris Brezillon
  2015-08-03 13:39                   ` Andrea Scian
  0 siblings, 1 reply; 26+ messages in thread
From: Boris Brezillon @ 2015-08-03 12:42 UTC (permalink / raw)
  To: Andrea Scian
  Cc: linux-mtd, David Woodhouse, Brian Norris, linux-kernel, Han Xu,
	Richard Weinberger, Artem Bityutskiy

Adding Artem and Richard in the loop.

On Mon, 3 Aug 2015 13:16:02 +0200
Andrea Scian <rnd4@dave-tech.it> wrote:

> 
> Dear Boris,
> 
> Il 31/07/2015 18:27, Boris Brezillon ha scritto:
> > On Fri, 31 Jul 2015 18:19:30 +0200
> > Andrea Scian <rnd4@dave-tech.it> wrote:
> >
> >> Il 31/07/2015 16:10, Boris Brezillon ha scritto:
> >>> On Fri, 31 Jul 2015 15:40:13 +0200
> >>> Andrea Scian <rnd4@dave-tech.it> wrote:
> >>>
> >>>> Boris,
> >>>>
> >>>> Il 31/07/2015 12:32, Boris Brezillon ha scritto:
> >>>>> Hi Andrea,
> >>>>>
> >>>>> Adding Han in Cc.
> >>>>>
> >>>>> On Fri, 31 Jul 2015 12:07:21 +0200
> >>>>> Andrea Scian <rnd4@dave-tech.it> wrote:
> >>>>>
> >>>>>> Dear Boris,
> >>>>>>
> >>>>>>
> >>>>>> Il 30/07/2015 19:34, Boris Brezillon ha scritto:
> >>>>>>> The default NAND read functions are relying on an underlying controller
> >>>>>>> to correct bitflips, but some of those controller cannot properly fix
> >>>>>>> bitflips in erased pages.
> >>>>>>> In case of ECC failures, check if the page of subpage is empty before
> >>>>>>> reporting an ECC failure.
> >>>>>> I'm still wondering if chip->ecc.strength is the right threshold.
> >>>>>>
> >>>>>> Did you see my comments here [1]? WDYT?
> >>>>> Yes I've read it, and decided to go for ecc->strength as a first
> >>>>> step (I'm more interested in discussing the approach than the threshold
> >>>>> value right now ;-)).
> >>>> I perfectly understand, that's the reason why I ask if you want to move
> >>>> to another thread ;-)
> >>>>
> >>>>> Anyway, as you pointed out in the thread, writing data on an erased
> >>>>> page already containing some bitflips might generate even more
> >>>>> bitflips, so using a different threshold for the erased page check
> >>>>> makes sense. This threshold should definitely be correlated to the ECC
> >>>>> strength, but how, that's the question.
> >>>>>
> >>>>> How about taking a rather conservative value like 10% of the specified
> >>>>> ECC strength, and see how it goes.
> >>>> Yes, I think that there's no real way to get the right value, other than
> >>>> feedbacks from on-field testing with various devices.
> >>>>
> >>>> I'm also thinking about changing how a NAND page is written on the
> >>>> device, now that we know that even erased page may have (too many!)
> >>>> bitflips if they has not been so-freshly erased.
> >>>>
> >>>> Read on NAND device is lot's faster that write, so maybe we can:
> >>>>
> >>>> a) read the page before write it, check for bitflips on erased area and
> >>>> write it only if it fit our threshold
> >>>>
> >>>> b) read the page after write it and check if the bitflips are lower that
> >>>> a give value
> >>>>
> >>>> In this way:
> >>>> - we can use ecc_strength as read threshold, because it fits all the
> >>>> other NAND read
> >>>>
> >>>> - we can use "something a bit lower than" mtd->bitflip_threshold on
> >>>> read-before-write or read-after-write. If we don't do so the block will
> >>>> be scrubbed next time we read it again (if we are lucky.. if we are
> >>>> unlucky the block will have bitflip > ecc_strength!): IOW we did a write
> >>>> that will trigger another erase/write cycle.
> >>>>
> >>>> Am I misunderstanding something?
> >>> Nope, but this implies doing an extra read after each write :-/
> >>>
> >> Let's wait what the others says about this, but I would like to put some
> >> numbers in it.
> > Sure.
> >
> >> My micron MLC device says
> >> - read page max 75 uS
> >> - write page typ 1300uS, max 2600uS
> >>
> >> If we implement read-before-write (which is, IMO, the best approach), in
> >> the worst overhead we have is 1375uS vs 1300uS, which is ~6%.
> >> Please note that, if you read a page that "is not suitable" for write,
> >> you avoid the write time, schedule it for scrubbing, and use another
> >> free page.
> > Indeed, that's not such a big overhead.
> >
> >> Probably I'm a bit optimistic because we also need to take in account
> >> other latencies (DMA setup, ECC engine, buffer copies and so on) but
> >> it's a starting point ;-)
> > Yep, if you test it, could you provide some speed test results (with
> > and without this solution).
> 
> I think I can find some time to do some performance tests on real hardware.
> Can you please help me in finding:
> - which benchmark to use (currently I'm using bonnie++ on UBIFS, maybe I 
> can you just mtd_speedtest)
> - where to implement those read

I think the test should be done at the UBI layer if we want to check
the real impact of the additional read sequence, but given the answer I
gave to your other question I'm not sure this is relevant anymore ;-).

> 
> For the second point I think we can implement it a UBI or MTD level.
> I think the former will allow us to easily schedule scrubbing and choose 
> another block to issue the write to. However I don't really know how to 
> implement it (I don't really know so much about the UBI code).
> 
> The latter, at least for me, is easier to implement: I think I can find 
> the place to add the page read on my own, anyway any clue is welcome ;-)
> But I think it will be harder (or impossible) to choose where to issue 
> the write, unless UBI already do so when it saw an MTD write failure.
> 
> > And I wonder if we shouldn't do it the other way around, write then
> > read-back and check the content.
> > Of course this implies doing the extra write even when the erased page
> > contains too many bitflips, but at least your sure that the data you
> > put in the page were correct at that time.
> >
> 
> You're right, I think this is something that once we can find inside the 
> MTD code (something like "check NAND written page" kconfig option) but I 
> cannot find this option anymore on latest kernel.
> 
> You're approach will also have another advantage, currently nearly all 
> platform will use software implementation for ECC check on erased 
> blocks, while nearly all of them has hardware ECC once programmed. Using 
> hardware ECC will remove CPU load and, maybe, be faster than call 
> nand_check_erased_ecc_chunk()
> I also think that the situation of having failure on write is very 
> unlikely, unless you have a very "used" NAND device or you're not using 
> Richard's bitrot check. So we'll have a performance impact (issuing 
> another write) only when it's really needed.

I didn't check before suggesting that, but it seems that the UBI layer
is already doing this check for you [1], so if you're using UBI/UBIFS
you shouldn't worry about bitflips in erased pages: if there is any,
and their presence impact the write result, they should be detected.
AFAICT, the only thing that is not checked is whether the number of
bitflips after a write exceed the bitflips threshold or not, and I
guess this can be added.

Best Regards,

Boris

[1]http://lxr.free-electrons.com/source/drivers/mtd/ubi/io.c#L294

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions
  2015-08-03 12:42                 ` Boris Brezillon
@ 2015-08-03 13:39                   ` Andrea Scian
  2015-08-03 19:32                     ` Richard Weinberger
  0 siblings, 1 reply; 26+ messages in thread
From: Andrea Scian @ 2015-08-03 13:39 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: linux-mtd, David Woodhouse, Brian Norris, linux-kernel, Han Xu,
	Richard Weinberger, Artem Bityutskiy

Il 03/08/2015 14:42, Boris Brezillon ha scritto:
> Adding Artem and Richard in the loop.

thanks ;-)

>
> On Mon, 3 Aug 2015 13:16:02 +0200
> Andrea Scian <rnd4@dave-tech.it> wrote:
>
>>
>> Dear Boris,
>>
>> Il 31/07/2015 18:27, Boris Brezillon ha scritto:
>>> On Fri, 31 Jul 2015 18:19:30 +0200
>>> Andrea Scian <rnd4@dave-tech.it> wrote:
>>>
>>>> Il 31/07/2015 16:10, Boris Brezillon ha scritto:
>>>>> On Fri, 31 Jul 2015 15:40:13 +0200
>>>>> Andrea Scian <rnd4@dave-tech.it> wrote:
>>>>>
>>>>>> Boris,
>>>>>>
>>>>>> Il 31/07/2015 12:32, Boris Brezillon ha scritto:
>>>>>>> Hi Andrea,
>>>>>>>
>>>>>>> Adding Han in Cc.
>>>>>>>
>>>>>>> On Fri, 31 Jul 2015 12:07:21 +0200
>>>>>>> Andrea Scian <rnd4@dave-tech.it> wrote:
>>>>>>>
>>>>>>>> Dear Boris,
>>>>>>>>
>>>>>>>>
>>>>>>>> Il 30/07/2015 19:34, Boris Brezillon ha scritto:
>>>>>>>>> The default NAND read functions are relying on an underlying controller
>>>>>>>>> to correct bitflips, but some of those controller cannot properly fix
>>>>>>>>> bitflips in erased pages.
>>>>>>>>> In case of ECC failures, check if the page of subpage is empty before
>>>>>>>>> reporting an ECC failure.
>>>>>>>> I'm still wondering if chip->ecc.strength is the right threshold.
>>>>>>>>
>>>>>>>> Did you see my comments here [1]? WDYT?
>>>>>>> Yes I've read it, and decided to go for ecc->strength as a first
>>>>>>> step (I'm more interested in discussing the approach than the threshold
>>>>>>> value right now ;-)).
>>>>>> I perfectly understand, that's the reason why I ask if you want to move
>>>>>> to another thread ;-)
>>>>>>
>>>>>>> Anyway, as you pointed out in the thread, writing data on an erased
>>>>>>> page already containing some bitflips might generate even more
>>>>>>> bitflips, so using a different threshold for the erased page check
>>>>>>> makes sense. This threshold should definitely be correlated to the ECC
>>>>>>> strength, but how, that's the question.
>>>>>>>
>>>>>>> How about taking a rather conservative value like 10% of the specified
>>>>>>> ECC strength, and see how it goes.
>>>>>> Yes, I think that there's no real way to get the right value, other than
>>>>>> feedbacks from on-field testing with various devices.
>>>>>>
>>>>>> I'm also thinking about changing how a NAND page is written on the
>>>>>> device, now that we know that even erased page may have (too many!)
>>>>>> bitflips if they has not been so-freshly erased.
>>>>>>
>>>>>> Read on NAND device is lot's faster that write, so maybe we can:
>>>>>>
>>>>>> a) read the page before write it, check for bitflips on erased area and
>>>>>> write it only if it fit our threshold
>>>>>>
>>>>>> b) read the page after write it and check if the bitflips are lower that
>>>>>> a give value
>>>>>>
>>>>>> In this way:
>>>>>> - we can use ecc_strength as read threshold, because it fits all the
>>>>>> other NAND read
>>>>>>
>>>>>> - we can use "something a bit lower than" mtd->bitflip_threshold on
>>>>>> read-before-write or read-after-write. If we don't do so the block will
>>>>>> be scrubbed next time we read it again (if we are lucky.. if we are
>>>>>> unlucky the block will have bitflip > ecc_strength!): IOW we did a write
>>>>>> that will trigger another erase/write cycle.
>>>>>>
>>>>>> Am I misunderstanding something?
>>>>> Nope, but this implies doing an extra read after each write :-/
>>>>>
>>>> Let's wait what the others says about this, but I would like to put some
>>>> numbers in it.
>>> Sure.
>>>
>>>> My micron MLC device says
>>>> - read page max 75 uS
>>>> - write page typ 1300uS, max 2600uS
>>>>
>>>> If we implement read-before-write (which is, IMO, the best approach), in
>>>> the worst overhead we have is 1375uS vs 1300uS, which is ~6%.
>>>> Please note that, if you read a page that "is not suitable" for write,
>>>> you avoid the write time, schedule it for scrubbing, and use another
>>>> free page.
>>> Indeed, that's not such a big overhead.
>>>
>>>> Probably I'm a bit optimistic because we also need to take in account
>>>> other latencies (DMA setup, ECC engine, buffer copies and so on) but
>>>> it's a starting point ;-)
>>> Yep, if you test it, could you provide some speed test results (with
>>> and without this solution).
>>
>> I think I can find some time to do some performance tests on real hardware.
>> Can you please help me in finding:
>> - which benchmark to use (currently I'm using bonnie++ on UBIFS, maybe I
>> can you just mtd_speedtest)
>> - where to implement those read
>
> I think the test should be done at the UBI layer if we want to check
> the real impact of the additional read sequence, but given the answer I
> gave to your other question I'm not sure this is relevant anymore ;-).
>
>>
>> For the second point I think we can implement it a UBI or MTD level.
>> I think the former will allow us to easily schedule scrubbing and choose
>> another block to issue the write to. However I don't really know how to
>> implement it (I don't really know so much about the UBI code).
>>
>> The latter, at least for me, is easier to implement: I think I can find
>> the place to add the page read on my own, anyway any clue is welcome ;-)
>> But I think it will be harder (or impossible) to choose where to issue
>> the write, unless UBI already do so when it saw an MTD write failure.
>>
>>> And I wonder if we shouldn't do it the other way around, write then
>>> read-back and check the content.
>>> Of course this implies doing the extra write even when the erased page
>>> contains too many bitflips, but at least your sure that the data you
>>> put in the page were correct at that time.
>>>
>>
>> You're right, I think this is something that once we can find inside the
>> MTD code (something like "check NAND written page" kconfig option) but I
>> cannot find this option anymore on latest kernel.
>>
>> You're approach will also have another advantage, currently nearly all
>> platform will use software implementation for ECC check on erased
>> blocks, while nearly all of them has hardware ECC once programmed. Using
>> hardware ECC will remove CPU load and, maybe, be faster than call
>> nand_check_erased_ecc_chunk()
>> I also think that the situation of having failure on write is very
>> unlikely, unless you have a very "used" NAND device or you're not using
>> Richard's bitrot check. So we'll have a performance impact (issuing
>> another write) only when it's really needed.
>
> I didn't check before suggesting that, but it seems that the UBI layer
> is already doing this check for you [1], so if you're using UBI/UBIFS
> you shouldn't worry about bitflips in erased pages: if there is any,
> and their presence impact the write result, they should be detected.
> AFAICT, the only thing that is not checked is whether the number of
> bitflips after a write exceed the bitflips threshold or not, and I
> guess this can be added.

IIUC this is a runtime debug check

if (!ubi_dbg_chk_io(ubi))
	....

And thus is disabled by default.
Anyway, this answer my missing "check NAND written page" question above ;-)

IIUC (again) /sys/kernel/debug/ubi/ubi0/chk_io enable this and many 
other runtime check, so it's only partially useful to have a first raw 
approach to performance impact.

And, yes, those functions only check for valid data, not about bitflip 
counts.

Kind Regards,

-- 

Andrea SCIAN

DAVE Embedded Systems

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions
  2015-08-03 13:39                   ` Andrea Scian
@ 2015-08-03 19:32                     ` Richard Weinberger
  2015-08-04  7:02                       ` Andrea Scian
  0 siblings, 1 reply; 26+ messages in thread
From: Richard Weinberger @ 2015-08-03 19:32 UTC (permalink / raw)
  To: Andrea Scian, Boris Brezillon
  Cc: linux-mtd, David Woodhouse, Brian Norris, linux-kernel, Han Xu,
	Artem Bityutskiy

Am 03.08.2015 um 15:39 schrieb Andrea Scian:
>>> I think I can find some time to do some performance tests on real hardware.
>>> Can you please help me in finding:
>>> - which benchmark to use (currently I'm using bonnie++ on UBIFS, maybe I
>>> can you just mtd_speedtest)
>>> - where to implement those read
>>
>> I think the test should be done at the UBI layer if we want to check
>> the real impact of the additional read sequence, but given the answer I
>> gave to your other question I'm not sure this is relevant anymore ;-).

I'm not sure whether introducing a read-before-write check is the best solution.
At least we need hard numbers for slow/old SLC NANDs too.

We has such checks already and got rid of them.
commit 657f28f8811c92724db10d18bbbec70d540147d6
Author: Huang Shijie <shijie8@gmail.com>
Date:   Tue Aug 14 22:38:45 2012 -0400

    mtd: kill MTD_NAND_VERIFY_WRITE


Although the goal of 657f28f8 was something else.

In general I don't think putting much MTD/ECC logic into UBI is the way to go.
UBI is a layer above MTD and MTD should do as much as possible wrt. ECC.

>>>
>>> For the second point I think we can implement it a UBI or MTD level.
>>> I think the former will allow us to easily schedule scrubbing and choose
>>> another block to issue the write to. However I don't really know how to
>>> implement it (I don't really know so much about the UBI code).

Implementing this is not much work.
I've done such hacks for various customers to hunt down hardware issues.

>> I didn't check before suggesting that, but it seems that the UBI layer
>> is already doing this check for you [1], so if you're using UBI/UBIFS
>> you shouldn't worry about bitflips in erased pages: if there is any,
>> and their presence impact the write result, they should be detected.
>> AFAICT, the only thing that is not checked is whether the number of
>> bitflips after a write exceed the bitflips threshold or not, and I
>> guess this can be added.
> 
> IIUC this is a runtime debug check
> 
> if (!ubi_dbg_chk_io(ubi))
>     ....
> 
> And thus is disabled by default.

That's correct.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions
  2015-08-03 19:32                     ` Richard Weinberger
@ 2015-08-04  7:02                       ` Andrea Scian
  2015-08-04  7:21                         ` Richard Weinberger
  0 siblings, 1 reply; 26+ messages in thread
From: Andrea Scian @ 2015-08-04  7:02 UTC (permalink / raw)
  To: Richard Weinberger, Boris Brezillon
  Cc: linux-mtd, David Woodhouse, Brian Norris, linux-kernel, Han Xu,
	Artem Bityutskiy


Richard,

Il 03/08/2015 21:32, Richard Weinberger ha scritto:
> Am 03.08.2015 um 15:39 schrieb Andrea Scian:
>>>> I think I can find some time to do some performance tests on real hardware.
>>>> Can you please help me in finding:
>>>> - which benchmark to use (currently I'm using bonnie++ on UBIFS, maybe I
>>>> can you just mtd_speedtest)
>>>> - where to implement those read
>>>
>>> I think the test should be done at the UBI layer if we want to check
>>> the real impact of the additional read sequence, but given the answer I
>>> gave to your other question I'm not sure this is relevant anymore ;-).
>
> I'm not sure whether introducing a read-before-write check is the best solution.
> At least we need hard numbers for slow/old SLC NANDs too.

We can enable the feature only for MLC, AFAIK it has not been required 
for old SLC ;-)

Anyway, maybe I can do some performance test if you point me to the 
right userspace tool to use.
As I already say I'm using bonnie++ to stress the device, more from a 
stability than from performance point of view.
I'm also used to run mtd_speedtest but this may be useless if we put 
some code inside the upper layers.

> We has such checks already and got rid of them.
> commit 657f28f8811c92724db10d18bbbec70d540147d6
> Author: Huang Shijie <shijie8@gmail.com>
> Date:   Tue Aug 14 22:38:45 2012 -0400
>
>      mtd: kill MTD_NAND_VERIFY_WRITE
>
>
> Although the goal of 657f28f8 was something else.

Understood, thanks for point this out

>
> In general I don't think putting much MTD/ECC logic into UBI is the way to go.
> UBI is a layer above MTD and MTD should do as much as possible wrt. ECC.
>
>>>>
>>>> For the second point I think we can implement it a UBI or MTD level.
>>>> I think the former will allow us to easily schedule scrubbing and choose
>>>> another block to issue the write to. However I don't really know how to
>>>> implement it (I don't really know so much about the UBI code).
>
> Implementing this is not much work.
> I've done such hacks for various customers to hunt down hardware issues.
>
>>> I didn't check before suggesting that, but it seems that the UBI layer
>>> is already doing this check for you [1], so if you're using UBI/UBIFS
>>> you shouldn't worry about bitflips in erased pages: if there is any,
>>> and their presence impact the write result, they should be detected.
>>> AFAICT, the only thing that is not checked is whether the number of
>>> bitflips after a write exceed the bitflips threshold or not, and I
>>> guess this can be added.
>>
>> IIUC this is a runtime debug check
>>
>> if (!ubi_dbg_chk_io(ubi))
>>      ....
>>
>> And thus is disabled by default.
>
> That's correct.

Thanks.
In your opinion, enabling chk_io is correct to rough estimate the 
overhead or does it enable too much checks?

TIA,

-- 

Andrea SCIAN

DAVE Embedded Systems

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions
  2015-08-04  7:02                       ` Andrea Scian
@ 2015-08-04  7:21                         ` Richard Weinberger
  2015-08-06  4:28                           ` Andrea Scian
  2015-08-06  9:19                           ` Boris Brezillon
  0 siblings, 2 replies; 26+ messages in thread
From: Richard Weinberger @ 2015-08-04  7:21 UTC (permalink / raw)
  To: Andrea Scian, Boris Brezillon
  Cc: linux-mtd, David Woodhouse, Brian Norris, linux-kernel, Han Xu,
	Artem Bityutskiy

Andrea,

Am 04.08.2015 um 09:02 schrieb Andrea Scian:
>> I'm not sure whether introducing a read-before-write check is the best solution.
>> At least we need hard numbers for slow/old SLC NANDs too.
> 
> We can enable the feature only for MLC, AFAIK it has not been required for old SLC ;-)

I think this needs more discussion.

Boris, Brian, will you be at Embedded Linux Conference Europe in Dublin?
Maybe we can discuss these issues (data retention, ff-checks, etc...) in person and figure out where to address them.
I really want to avoid ad-hoc solutions. :)

> Thanks.
> In your opinion, enabling chk_io is correct to rough estimate the overhead or does it enable too much checks?

You mean the other checks bedside of self_check_write()? You can comment them out for your tests.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 1/2] mtd: nand: add nand_check_erased helper functions
  2015-07-31  7:10   ` Boris Brezillon
  2015-07-31 10:06     ` Andrea Scian
@ 2015-08-04 15:42     ` Andrea Scian
  2015-08-04 16:27         ` Boris Brezillon
  1 sibling, 1 reply; 26+ messages in thread
From: Andrea Scian @ 2015-08-04 15:42 UTC (permalink / raw)
  To: linux-mtd, Boris Brezillon; +Cc: David Woodhouse, Brian Norris, linux-kernel


Boris,

sorry for the later review.
I'm trying to run your patch on our systems. See below

Il 31/07/2015 09:10, Boris Brezillon ha scritto:
> On Thu, 30 Jul 2015 19:34:53 +0200
> Boris Brezillon <boris.brezillon@free-electrons.com> wrote:
>
>> Add two helper functions to help NAND controller drivers test whether a
>> specific NAND region is erased or not.
>>
>> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
>> ---
>>   drivers/mtd/nand/nand_base.c | 104 +++++++++++++++++++++++++++++++++++++++++++
>>   include/linux/mtd/nand.h     |   8 ++++
>>   2 files changed, 112 insertions(+)
>>
>> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
>> index ceb68ca..1542ea7 100644
>> --- a/drivers/mtd/nand/nand_base.c
>> +++ b/drivers/mtd/nand/nand_base.c
>> @@ -1101,6 +1101,110 @@ out:
>>   EXPORT_SYMBOL(nand_lock);
>>
>>   /**
>> + * nand_check_erased_buf - check if a buffer contains (almost) only 0xff data
>> + * @buf: buffer to test
>> + * @len: buffer length
>> + * @bitflips_threshold:maximum number of bitflips
>> + *
>> + * Check if a buffer contains only 0xff, which means the underlying region
>> + * has been erased and is ready to be programmed.
>> + * The bitflips_threshold specify the maximum number of bitflips before
>> + * considering the region is not erased.
>> + * Note: The logic of this function has been extracted from the memweight
>> + * implementation, except that nand_check_erased_buf function exit before
>> + * testing the whole buffer if the number of bitflips exceed the
>> + * bitflips_threshold value.
>> + *
>> + * Returns a positive number of bitflips or -ERROR_CODE.
>> + */
>> +int nand_check_erased_buf(void *buf, int len, int bitflips_threshold)
>> +{
>> +	const unsigned char *bitmap = buf;
>> +	int bitflips = 0;
>> +	int weight;
>> +	int longs;
>> +
>> +	for (; len && ((unsigned long)bitmap) % sizeof(long);
>> +	     len--, bitmap++) {
>> +		weight = hweight8(*bitmap);
>> +
>> +		bitflips += sizeof(u8) - weight;

I think the above like should be

bitflips += sizeof(u8)*BITS_PER_BYTE - weight;


>> +		if (bitflips > bitflips_threshold)
>> +			return -EINVAL;
>> +	}
>> +
>> +
>> +	for (longs = len / sizeof(long); longs;
>> +	     longs--, bitmap += sizeof(long)) {
>> +		BUG_ON(longs >= INT_MAX / BITS_PER_LONG);
>> +		weight = hweight_long(*((unsigned long *)bitmap));
>> +
>> +		bitflips += sizeof(long) - weight;

as above:

bitflips += sizeof(long)*BITS_PER_BYTE - weight;

>> +		if (bitflips > bitflips_threshold)
>> +			return -EINVAL;
>> +	}
>> +
>> +	len %= sizeof(long);
>> +
>> +	for (; len > 0; len--, bitmap++) {
>> +		weight = hweight8(*bitmap);
>> +		bitflips += sizeof(u8) - weight;


as above:

bitflips += sizeof(u8)*BITS_PER_BYTE - weight;


>> +	}
>> +
>> +	return bitflips;
>> +}
>> +EXPORT_SYMBOL(nand_check_erased_buf);
>> +
>> +/**
>> + * nand_check_erased_ecc_chunk - check if an ECC chunk contains (almost) only
>> + *				 0xff data
>> + * @data: data buffer to test
>> + * @datalen: data length
>> + * @ecc: ECC buffer
>> + * @ecclen: ECC length
>> + * @extraoob: extra OOB buffer
>> + * @extraooblen: extra OOB length
>> + * @bitflips_threshold: maximum number of bitflips
>> + *
>> + * Check if a data buffer and its associated ECC and OOB data contains only
>> + * 0xff pattern, which means the underlying region has been erased and is
>> + * ready to be programmed.
>> + * The bitflips_threshold specify the maximum number of bitflips before
>> + * considering the region as not erased.
>> + *
>> + * Returns a positive number of bitflips or -ERROR_CODE.
>> + */
>> +int nand_check_erased_ecc_chunk(void *data, int datalen,
>> +				void *ecc, int ecclen,
>> +				void *extraoob, int extraooblen,
>> +				int bitflips_threshold)
>> +{
>> +	int bitflips = 0;
>> +	int ret;
>> +
>> +	ret = nand_check_erased_buf(data, datalen, bitflips_threshold);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	bitflips += ret;
>> +	bitflips_threshold -= ret;
>> +
>> +	ret = nand_check_erased_buf(ecc, ecclen, bitflips_threshold);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	bitflips += ret;
>> +	bitflips_threshold -= ret;
>> +
>> +	ret = nand_check_erased_buf(extraoob, extraooblen, bitflips_threshold);
>> +	if (ret < 0)
>> +		return ret;
>> +
>
> Forgot the memset operations here:
>
> 	memset(data, 0xff, datalen);
> 	memset(ecc, 0xff, ecclen);
> 	memset(extraoob, 0xff, extraooblen);
>
>> +	return bitflips + ret;
>> +}
>> +EXPORT_SYMBOL(nand_check_erased_ecc_chunk);
>> +
>> +/**
>>    * nand_read_page_raw - [INTERN] read raw page data without ecc
>>    * @mtd: mtd info structure
>>    * @chip: nand chip info structure
>> diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h
>> index 272f429..ae06a07 100644
>> --- a/include/linux/mtd/nand.h
>> +++ b/include/linux/mtd/nand.h
>> @@ -1030,4 +1030,12 @@ struct nand_sdr_timings {
>>
>>   /* get timing characteristics from ONFI timing mode. */
>>   const struct nand_sdr_timings *onfi_async_timing_mode_to_sdr_timings(int mode);
>> +
>> +int nand_check_erased_buf(void *data, int datalen,
>> +			  int threshold);
>> +
>> +int nand_check_erased_ecc_chunk(void *data, int datalen,
>> +				void *ecc, int ecclen,
>> +				void *extraoob, int extraooblen,
>> +				int threshold);
>>   #endif /* __LINUX_MTD_NAND_H */
>
>
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 1/2] mtd: nand: add nand_check_erased helper functions
  2015-08-04 15:42     ` Andrea Scian
@ 2015-08-04 16:27         ` Boris Brezillon
  0 siblings, 0 replies; 26+ messages in thread
From: Boris Brezillon @ 2015-08-04 16:27 UTC (permalink / raw)
  To: Andrea Scian, linux-mtd; +Cc: David Woodhouse, Brian Norris, linux-kernel

Hi Andrea

Le 4 août 2015 17:42:43 GMT+02:00, Andrea Scian <rnd4@dave-tech.it> a écrit :
>
>>> +
>>> +		bitflips += sizeof(long) - weight;
>
>as above:
>
>bitflips += sizeof(long)*BITS_PER_BYTE - weight;

Indeed, or we could just replace sizeof(u8) by BITS_PER_BYTE and sizeof(long) by BITS_PER_LONG.

I'll fix that in the next version.

Note that I didn't test the series on a real platform and won't be able to do so for the next two weeks, so you might find other inconsistencies/bugs.

Thanks for testing it.

Best regards,

Boris

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 1/2] mtd: nand: add nand_check_erased helper functions
@ 2015-08-04 16:27         ` Boris Brezillon
  0 siblings, 0 replies; 26+ messages in thread
From: Boris Brezillon @ 2015-08-04 16:27 UTC (permalink / raw)
  To: Andrea Scian, linux-mtd; +Cc: David Woodhouse, Brian Norris, linux-kernel

Hi Andrea

Le 4 août 2015 17:42:43 GMT+02:00, Andrea Scian <rnd4@dave-tech.it> a écrit :
>
>>> +
>>> +		bitflips += sizeof(long) - weight;
>
>as above:
>
>bitflips += sizeof(long)*BITS_PER_BYTE - weight;

Indeed, or we could just replace sizeof(u8) by BITS_PER_BYTE and sizeof(long) by BITS_PER_LONG.

I'll fix that in the next version.

Note that I didn't test the series on a real platform and won't be able to do so for the next two weeks, so you might find other inconsistencies/bugs.

Thanks for testing it.

Best regards,

Boris

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions
  2015-08-04  7:21                         ` Richard Weinberger
@ 2015-08-06  4:28                           ` Andrea Scian
  2015-08-06  9:19                           ` Boris Brezillon
  1 sibling, 0 replies; 26+ messages in thread
From: Andrea Scian @ 2015-08-06  4:28 UTC (permalink / raw)
  To: Richard Weinberger, Boris Brezillon
  Cc: linux-mtd, David Woodhouse, Brian Norris, linux-kernel, Han Xu,
	Artem Bityutskiy

Il 04/08/2015 09:21, Richard Weinberger ha scritto:
> Andrea,
>
> Am 04.08.2015 um 09:02 schrieb Andrea Scian:
>>> I'm not sure whether introducing a read-before-write check is the best solution.
>>> At least we need hard numbers for slow/old SLC NANDs too.
>>
>> We can enable the feature only for MLC, AFAIK it has not been required for old SLC ;-)
>
> I think this needs more discussion.
>
> Boris, Brian, will you be at Embedded Linux Conference Europe in Dublin?
> Maybe we can discuss these issues (data retention, ff-checks, etc...) in person and
 > figure out where to address them.
> I really want to avoid ad-hoc solutions. :)

Maybe I'll be at ELCE this year too
I'll be glad to meet all of you in person and participate to this 
discussion. :)

It will be nice if also some silicon vendor would like to participate. I 
know that someone from micron is actively following us on this ML, but I 
don't really know if there's someone here in Europe. :)

>
>> Thanks.
>> In your opinion, enabling chk_io is correct to rough estimate the overhead
 >> or does it enable too much checks?
>
> You mean the other checks bedside of self_check_write()? You can comment them out
 > for your tests.
>
> Thanks,
> //richard
>

Kind Regards,

-- 

Andrea SCIAN

DAVE Embedded Systems

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions
  2015-08-04  7:21                         ` Richard Weinberger
  2015-08-06  4:28                           ` Andrea Scian
@ 2015-08-06  9:19                           ` Boris Brezillon
  2015-08-06  9:42                             ` Richard Weinberger
  1 sibling, 1 reply; 26+ messages in thread
From: Boris Brezillon @ 2015-08-06  9:19 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Andrea Scian, linux-mtd, David Woodhouse, Brian Norris,
	linux-kernel, Han Xu, Artem Bityutskiy

Hi Richard,

On Tue, 4 Aug 2015 09:21:27 +0200
Richard Weinberger <richard@nod.at> wrote:

> Andrea,
> 
> Am 04.08.2015 um 09:02 schrieb Andrea Scian:
> >> I'm not sure whether introducing a read-before-write check is the best solution.
> >> At least we need hard numbers for slow/old SLC NANDs too.
> > 
> > We can enable the feature only for MLC, AFAIK it has not been required for old SLC ;-)
> 
> I think this needs more discussion.
> 
> Boris, Brian, will you be at Embedded Linux Conference Europe in Dublin?
> Maybe we can discuss these issues (data retention, ff-checks, etc...) in person and figure out where to address them.
> I really want to avoid ad-hoc solutions. :)

I'll be at ELCE and I'd be happy to discuss all these NAND/MTD/UBI
related issues with you, though I think we should keep discussing
those problems on the ML too.

Best Regards,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions
  2015-08-06  9:19                           ` Boris Brezillon
@ 2015-08-06  9:42                             ` Richard Weinberger
  0 siblings, 0 replies; 26+ messages in thread
From: Richard Weinberger @ 2015-08-06  9:42 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Andrea Scian, linux-mtd, David Woodhouse, Brian Norris,
	linux-kernel, Han Xu, Artem Bityutskiy



Am 06.08.2015 um 11:19 schrieb Boris Brezillon:
>> I think this needs more discussion.
>>
>> Boris, Brian, will you be at Embedded Linux Conference Europe in Dublin?
>> Maybe we can discuss these issues (data retention, ff-checks, etc...) in person and figure out where to address them.
>> I really want to avoid ad-hoc solutions. :)
> 
> I'll be at ELCE and I'd be happy to discuss all these NAND/MTD/UBI
> related issues with you, though I think we should keep discussing
> those problems on the ML too.

Sure, I did not recommend a secret meeting. ;-)
But often a face to face conversation is better for brain storming.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2015-08-06  9:42 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-30 17:34 [RFC PATCH 0/2] mtd: nand: properly handle bitflips in erased pages Boris Brezillon
2015-07-30 17:34 ` [RFC PATCH 1/2] mtd: nand: add nand_check_erased helper functions Boris Brezillon
2015-07-31  7:10   ` Boris Brezillon
2015-07-31 10:06     ` Andrea Scian
2015-07-31 10:21       ` Boris Brezillon
2015-07-31 13:29         ` Andrea Scian
2015-07-31 13:58           ` Boris Brezillon
2015-08-04 15:42     ` Andrea Scian
2015-08-04 16:27       ` Boris Brezillon
2015-08-04 16:27         ` Boris Brezillon
2015-07-30 17:34 ` [RFC PATCH 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions Boris Brezillon
2015-07-31 10:07   ` Andrea Scian
2015-07-31 10:32     ` Boris Brezillon
2015-07-31 13:40       ` Andrea Scian
2015-07-31 14:10         ` Boris Brezillon
2015-07-31 16:19           ` Andrea Scian
2015-07-31 16:27             ` Boris Brezillon
2015-08-03 11:16               ` Andrea Scian
2015-08-03 12:42                 ` Boris Brezillon
2015-08-03 13:39                   ` Andrea Scian
2015-08-03 19:32                     ` Richard Weinberger
2015-08-04  7:02                       ` Andrea Scian
2015-08-04  7:21                         ` Richard Weinberger
2015-08-06  4:28                           ` Andrea Scian
2015-08-06  9:19                           ` Boris Brezillon
2015-08-06  9:42                             ` Richard Weinberger
     [not found] <mailman.4514.1438332781.1758.linux-mtd@lists.infradead.org>
     [not found] <mailman.4457.1438277726.1758.linux-mtd@lists.infradead.org>

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.