All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] mtd: nand: properly handle bitflips in erased pages
@ 2015-08-24  9:47 Boris Brezillon
  2015-08-24  9:47 ` [PATCH v2 1/2] mtd: nand: add nand_check_erased helper functions Boris Brezillon
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Boris Brezillon @ 2015-08-24  9:47 UTC (permalink / raw)
  To: David Woodhouse, Brian Norris, linux-mtd
  Cc: Andrea Scian, Richard Weinberger, Boris Brezillon

Hello,

This patch series aims at providing a common logic to check for bitflips
in erased pages.

Currently each driver is implementing its own logic to check for bitflips
in erased pages. Not only this create code duplication, but most of these
implementations are incorrect.
Here are a few aspects that are often left aside in those implementations:
1/ they do not check OOB bytes when checking for the ff pattern, which
   means they can consider a page as empty while the MTD user actually
   wanted to write almost ff with a few bits to zero
2/ they check for the ff pattern on the whole page, while ECC actually
   works on smaller chunks (usually 512 or 1024 bytes chunks)
3/ they use random bitflip thresholds to decide whether a page/chunk is
   erased or not. IMO this threshold should be set to ECC strength (or
   at least something correlated to this parameter)

The approach taken in this series is to provide two helper functions to
check for bitflips in erased pages. Each driver that needs to check for
such cases can then call the nand_check_erased_ecc_chunk() function, and
rely on the common logic to decide whether a page is erased or not.

While Brian suggested a few times to make this detection automatic for
all drivers that set a specific flag (NAND_CHECK_ERASED_BITFLIPS?), here
is a few reasons I think this is not such a good idea:
1/ some (a lot of) drivers do not properly implement the raw access
   functions, and since we need to check for raw data and OOB bytes this
   makes the automatic detection unusable for most drivers unless they
   decide to correctly implement those methods (which would be a good
   thing BTW).
2/ as a I said earlier, this check should be made at the ECC chunk level
   and not at the page level. This spots two problems: some (a lot of)
   drivers do not properly specify the ecc layout information, and even
   if the ecc layout is correctly defined, there is no way to attach ECC
   bytes to a specific ECC chunk.
3/ the last aspect is the perf penalty incured by this test. Automatically
   doing that at the NAND core level implies reading the whole page again
   in raw mode, while with the helper function approach, drivers supporting
   access at the ECC chunk level can read only the faulty chunk in raw
   mode.

Regarding the bitflips threshold at which an erased pages is considered as
faulty, I have assigned it to ECC strength. As mentioned by Andrea, using
ECC strength might cause some trouble, because if you already have some
bitflips in an erased page, programming it might generate even more of
them.
In the other hand, shouldn't that be checked after (or before) programming
a page. I mean, UBI is already capable of detecting pages which are over
the configured bitflips_threshold and move data around when it detects
such pages.
If we check data after writing a page we wouldn't have to bother about
setting a weaker value for the "bitflips in erased page" case.
Another thing in favor of the ECC strength value for this "bitflips in
erased page" threshold value: if the ECC engine is generating 0xff ECC
bytes when the page is empty, then it will be able to fix ECC strength
bitflips without complaining, so why should we use different value when
we detect bitflips using the pattern match approach?

Best Regards,

Boris

Changes since v1:
- fix the nand_check_erased_buf() function
- mark the bitflips > bitflips_threshold condition as unlikely
- add missing memsets in nand_check_erased_ecc_chunk()

Boris Brezillon (2):
  mtd: nand: add nand_check_erased helper functions
  mtd: nand: use nand_check_erased_ecc_chunk in default ECC read
    functions

 drivers/mtd/nand/nand_base.c | 170 +++++++++++++++++++++++++++++++++++++++++--
 include/linux/mtd/nand.h     |   8 ++
 2 files changed, 171 insertions(+), 7 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/2] mtd: nand: add nand_check_erased helper functions
  2015-08-24  9:47 [PATCH v2 0/2] mtd: nand: properly handle bitflips in erased pages Boris Brezillon
@ 2015-08-24  9:47 ` Boris Brezillon
  2015-09-02 18:41   ` Brian Norris
  2015-08-24  9:47 ` [PATCH v2 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions Boris Brezillon
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Boris Brezillon @ 2015-08-24  9:47 UTC (permalink / raw)
  To: David Woodhouse, Brian Norris, linux-mtd
  Cc: Andrea Scian, Richard Weinberger, Boris Brezillon

Add two helper functions to help NAND controller drivers test whether a
specific NAND region is erased or not.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
---
 drivers/mtd/nand/nand_base.c | 108 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/mtd/nand.h     |   8 ++++
 2 files changed, 116 insertions(+)

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index ceb68ca..4d2ef65 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -1101,6 +1101,114 @@ out:
 EXPORT_SYMBOL(nand_lock);
 
 /**
+ * nand_check_erased_buf - check if a buffer contains (almost) only 0xff data
+ * @buf: buffer to test
+ * @len: buffer length
+ * @bitflips_threshold: maximum number of bitflips
+ *
+ * Check if a buffer contains only 0xff, which means the underlying region
+ * has been erased and is ready to be programmed.
+ * The bitflips_threshold specify the maximum number of bitflips before
+ * considering the region is not erased.
+ * Note: The logic of this function has been extracted from the memweight
+ * implementation, except that nand_check_erased_buf function exit before
+ * testing the whole buffer if the number of bitflips exceed the
+ * bitflips_threshold value.
+ *
+ * Returns a positive number of bitflips or -ERROR_CODE.
+ */
+int nand_check_erased_buf(void *buf, int len, int bitflips_threshold)
+{
+	const unsigned char *bitmap = buf;
+	int bitflips = 0;
+	int weight;
+	int longs;
+
+	for (; len && ((unsigned long)bitmap) % sizeof(long);
+	     len--, bitmap++) {
+		weight = hweight8(*bitmap);
+		bitflips += BITS_PER_BYTE - weight;
+		if (unlikely(bitflips > bitflips_threshold))
+			return -EINVAL;
+	}
+
+
+	for (longs = len / sizeof(long); longs;
+	     longs--, bitmap += sizeof(long)) {
+		BUG_ON(longs >= INT_MAX / BITS_PER_LONG);
+		weight = hweight_long(*((unsigned long *)bitmap));
+		bitflips += BITS_PER_LONG - weight;
+		if (unlikely(bitflips > bitflips_threshold))
+			return -EINVAL;
+	}
+
+	len %= sizeof(long);
+
+	for (; len > 0; len--, bitmap++) {
+		weight = hweight8(*bitmap);
+		bitflips += BITS_PER_BYTE - weight;
+		if (unlikely(bitflips > bitflips_threshold))
+			return -EINVAL;
+	}
+
+	return bitflips;
+}
+EXPORT_SYMBOL(nand_check_erased_buf);
+
+/**
+ * nand_check_erased_ecc_chunk - check if an ECC chunk contains (almost) only
+ *				 0xff data
+ * @data: data buffer to test
+ * @datalen: data length
+ * @ecc: ECC buffer
+ * @ecclen: ECC length
+ * @extraoob: extra OOB buffer
+ * @extraooblen: extra OOB length
+ * @bitflips_threshold: maximum number of bitflips
+ *
+ * Check if a data buffer and its associated ECC and OOB data contains only
+ * 0xff pattern, which means the underlying region has been erased and is
+ * ready to be programmed.
+ * The bitflips_threshold specify the maximum number of bitflips before
+ * considering the region as not erased.
+ *
+ * Returns a positive number of bitflips or -ERROR_CODE.
+ */
+int nand_check_erased_ecc_chunk(void *data, int datalen,
+				void *ecc, int ecclen,
+				void *extraoob, int extraooblen,
+				int bitflips_threshold)
+{
+	int bitflips = 0;
+	int ret;
+
+	ret = nand_check_erased_buf(data, datalen, bitflips_threshold);
+	if (ret < 0)
+		return ret;
+
+	bitflips += ret;
+	bitflips_threshold -= ret;
+
+	ret = nand_check_erased_buf(ecc, ecclen, bitflips_threshold);
+	if (ret < 0)
+		return ret;
+
+	bitflips += ret;
+	bitflips_threshold -= ret;
+
+	ret = nand_check_erased_buf(extraoob, extraooblen, bitflips_threshold);
+	if (ret < 0)
+		return ret;
+
+	memset(data, 0xff, datalen);
+	memset(ecc, 0xff, ecclen);
+	memset(extraoob, 0xff, extraooblen);
+
+	return bitflips + ret;
+}
+EXPORT_SYMBOL(nand_check_erased_ecc_chunk);
+
+/**
  * nand_read_page_raw - [INTERN] read raw page data without ecc
  * @mtd: mtd info structure
  * @chip: nand chip info structure
diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h
index 272f429..ae06a07 100644
--- a/include/linux/mtd/nand.h
+++ b/include/linux/mtd/nand.h
@@ -1030,4 +1030,12 @@ struct nand_sdr_timings {
 
 /* get timing characteristics from ONFI timing mode. */
 const struct nand_sdr_timings *onfi_async_timing_mode_to_sdr_timings(int mode);
+
+int nand_check_erased_buf(void *data, int datalen,
+			  int threshold);
+
+int nand_check_erased_ecc_chunk(void *data, int datalen,
+				void *ecc, int ecclen,
+				void *extraoob, int extraooblen,
+				int threshold);
 #endif /* __LINUX_MTD_NAND_H */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions
  2015-08-24  9:47 [PATCH v2 0/2] mtd: nand: properly handle bitflips in erased pages Boris Brezillon
  2015-08-24  9:47 ` [PATCH v2 1/2] mtd: nand: add nand_check_erased helper functions Boris Brezillon
@ 2015-08-24  9:47 ` Boris Brezillon
  2015-09-02 20:35   ` Brian Norris
  2015-09-02 12:46 ` [PATCH v2 0/2] mtd: nand: properly handle bitflips in erased pages Boris Brezillon
  2015-09-02 19:43 ` Brian Norris
  3 siblings, 1 reply; 12+ messages in thread
From: Boris Brezillon @ 2015-08-24  9:47 UTC (permalink / raw)
  To: David Woodhouse, Brian Norris, linux-mtd
  Cc: Andrea Scian, Richard Weinberger, Boris Brezillon

The default NAND read functions are relying on an underlying controller
to correct bitflips, but some of those controller cannot properly fix
bitflips in erased pages.
In case of ECC failures, check if the page of subpage is empty before
reporting an ECC failure.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
---
 drivers/mtd/nand/nand_base.c | 62 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 55 insertions(+), 7 deletions(-)

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index 4d2ef65..e095d86 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -1400,6 +1400,19 @@ static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip,
 		stat = chip->ecc.correct(mtd, p,
 			&chip->buffers->ecccode[i], &chip->buffers->ecccalc[i]);
 		if (stat < 0) {
+			/* check for empty pages with bitflips */
+			int col = (int)(p - bufpoi);
+
+			chip->cmdfunc(mtd, NAND_CMD_RNDOUT, col, -1);
+			chip->read_buf(mtd, p, chip->ecc.size);
+			stat = nand_check_erased_ecc_chunk(p, chip->ecc.size,
+						&chip->buffers->ecccode[i],
+						chip->ecc.bytes,
+						NULL, 0,
+						chip->ecc.strength);
+		}
+
+		if (stat < 0) {
 			mtd->ecc_stats.failed++;
 		} else {
 			mtd->ecc_stats.corrected += stat;
@@ -1449,6 +1462,16 @@ static int nand_read_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip,
 
 		stat = chip->ecc.correct(mtd, p, &ecc_code[i], &ecc_calc[i]);
 		if (stat < 0) {
+			/* check for empty pages with bitflips */
+			chip->cmdfunc(mtd, NAND_CMD_RNDOUT, i, -1);
+			chip->read_buf(mtd, p, eccsize);
+			stat = nand_check_erased_ecc_chunk(p, eccsize,
+						&ecc_code[i], eccbytes,
+						NULL, 0,
+						chip->ecc.strength);
+		}
+
+		if (stat < 0) {
 			mtd->ecc_stats.failed++;
 		} else {
 			mtd->ecc_stats.corrected += stat;
@@ -1501,6 +1524,17 @@ static int nand_read_page_hwecc_oob_first(struct mtd_info *mtd,
 
 		stat = chip->ecc.correct(mtd, p, &ecc_code[i], NULL);
 		if (stat < 0) {
+			/* check for empty pages with bitflips */
+			chip->cmdfunc(mtd, NAND_CMD_RNDOUT,
+				      i + mtd->oobsize, -1);
+			chip->read_buf(mtd, p, eccsize);
+			stat = nand_check_erased_ecc_chunk(p, eccsize,
+						&ecc_code[i], eccbytes,
+						NULL, 0,
+						chip->ecc.strength);
+		}
+
+		if (stat < 0) {
 			mtd->ecc_stats.failed++;
 		} else {
 			mtd->ecc_stats.corrected += stat;
@@ -1527,6 +1561,8 @@ static int nand_read_page_syndrome(struct mtd_info *mtd, struct nand_chip *chip,
 	int i, eccsize = chip->ecc.size;
 	int eccbytes = chip->ecc.bytes;
 	int eccsteps = chip->ecc.steps;
+	int eccstepsize = eccsize + eccbytes + chip->ecc.prepad +
+			  chip->ecc.postpad;
 	uint8_t *p = buf;
 	uint8_t *oob = chip->oob_poi;
 	unsigned int max_bitflips = 0;
@@ -1546,19 +1582,31 @@ static int nand_read_page_syndrome(struct mtd_info *mtd, struct nand_chip *chip,
 		chip->read_buf(mtd, oob, eccbytes);
 		stat = chip->ecc.correct(mtd, p, oob, NULL);
 
-		if (stat < 0) {
-			mtd->ecc_stats.failed++;
-		} else {
-			mtd->ecc_stats.corrected += stat;
-			max_bitflips = max_t(unsigned int, max_bitflips, stat);
-		}
-
 		oob += eccbytes;
 
 		if (chip->ecc.postpad) {
 			chip->read_buf(mtd, oob, chip->ecc.postpad);
 			oob += chip->ecc.postpad;
 		}
+
+		if (stat < 0) {
+			/* check for empty pages with bitflips */
+			chip->cmdfunc(mtd, NAND_CMD_RNDOUT,
+				      i * eccstepsize, -1);
+			chip->read_buf(mtd, p, chip->ecc.size);
+			stat = nand_check_erased_ecc_chunk(p, chip->ecc.size,
+							   oob - eccstepsize,
+							   eccstepsize,
+							   NULL, 0,
+							   chip->ecc.strength);
+		}
+
+		if (stat < 0) {
+			mtd->ecc_stats.failed++;
+		} else {
+			mtd->ecc_stats.corrected += stat;
+			max_bitflips = max_t(unsigned int, max_bitflips, stat);
+		}
 	}
 
 	/* Calculate remaining oob bytes */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/2] mtd: nand: properly handle bitflips in erased pages
  2015-08-24  9:47 [PATCH v2 0/2] mtd: nand: properly handle bitflips in erased pages Boris Brezillon
  2015-08-24  9:47 ` [PATCH v2 1/2] mtd: nand: add nand_check_erased helper functions Boris Brezillon
  2015-08-24  9:47 ` [PATCH v2 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions Boris Brezillon
@ 2015-09-02 12:46 ` Boris Brezillon
  2015-09-02 19:43 ` Brian Norris
  3 siblings, 0 replies; 12+ messages in thread
From: Boris Brezillon @ 2015-09-02 12:46 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: David Woodhouse, Brian Norris, linux-mtd, Andrea Scian,
	Richard Weinberger

Brian,

On Mon, 24 Aug 2015 11:47:20 +0200
Boris Brezillon <boris.brezillon@free-electrons.com> wrote:

> Hello,
> 
> This patch series aims at providing a common logic to check for bitflips
> in erased pages.

Could we settle on something regarding this problem?
I need that feature for the sunxi NAND driver, and we can't agree on
something I'll have to implement my own private function to do that...

> 
> Currently each driver is implementing its own logic to check for bitflips
> in erased pages. Not only this create code duplication, but most of these
> implementations are incorrect.
> Here are a few aspects that are often left aside in those implementations:
> 1/ they do not check OOB bytes when checking for the ff pattern, which
>    means they can consider a page as empty while the MTD user actually
>    wanted to write almost ff with a few bits to zero
> 2/ they check for the ff pattern on the whole page, while ECC actually
>    works on smaller chunks (usually 512 or 1024 bytes chunks)
> 3/ they use random bitflip thresholds to decide whether a page/chunk is
>    erased or not. IMO this threshold should be set to ECC strength (or
>    at least something correlated to this parameter)
> 
> The approach taken in this series is to provide two helper functions to
> check for bitflips in erased pages. Each driver that needs to check for
> such cases can then call the nand_check_erased_ecc_chunk() function, and
> rely on the common logic to decide whether a page is erased or not.
> 
> While Brian suggested a few times to make this detection automatic for
> all drivers that set a specific flag (NAND_CHECK_ERASED_BITFLIPS?), here
> is a few reasons I think this is not such a good idea:
> 1/ some (a lot of) drivers do not properly implement the raw access
>    functions, and since we need to check for raw data and OOB bytes this
>    makes the automatic detection unusable for most drivers unless they
>    decide to correctly implement those methods (which would be a good
>    thing BTW).
> 2/ as a I said earlier, this check should be made at the ECC chunk level
>    and not at the page level. This spots two problems: some (a lot of)
>    drivers do not properly specify the ecc layout information, and even
>    if the ecc layout is correctly defined, there is no way to attach ECC
>    bytes to a specific ECC chunk.
> 3/ the last aspect is the perf penalty incured by this test. Automatically
>    doing that at the NAND core level implies reading the whole page again
>    in raw mode, while with the helper function approach, drivers supporting
>    access at the ECC chunk level can read only the faulty chunk in raw
>    mode.
> 
> Regarding the bitflips threshold at which an erased pages is considered as
> faulty, I have assigned it to ECC strength. As mentioned by Andrea, using
> ECC strength might cause some trouble, because if you already have some
> bitflips in an erased page, programming it might generate even more of
> them.
> In the other hand, shouldn't that be checked after (or before) programming
> a page. I mean, UBI is already capable of detecting pages which are over
> the configured bitflips_threshold and move data around when it detects
> such pages.
> If we check data after writing a page we wouldn't have to bother about
> setting a weaker value for the "bitflips in erased page" case.
> Another thing in favor of the ECC strength value for this "bitflips in
> erased page" threshold value: if the ECC engine is generating 0xff ECC
> bytes when the page is empty, then it will be able to fix ECC strength
> bitflips without complaining, so why should we use different value when
> we detect bitflips using the pattern match approach?
> 
> Best Regards,
> 
> Boris
> 
> Changes since v1:
> - fix the nand_check_erased_buf() function
> - mark the bitflips > bitflips_threshold condition as unlikely
> - add missing memsets in nand_check_erased_ecc_chunk()
> 
> Boris Brezillon (2):
>   mtd: nand: add nand_check_erased helper functions
>   mtd: nand: use nand_check_erased_ecc_chunk in default ECC read
>     functions
> 
>  drivers/mtd/nand/nand_base.c | 170 +++++++++++++++++++++++++++++++++++++++++--
>  include/linux/mtd/nand.h     |   8 ++
>  2 files changed, 171 insertions(+), 7 deletions(-)
> 



-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] mtd: nand: add nand_check_erased helper functions
  2015-08-24  9:47 ` [PATCH v2 1/2] mtd: nand: add nand_check_erased helper functions Boris Brezillon
@ 2015-09-02 18:41   ` Brian Norris
  2015-09-02 19:30     ` Boris Brezillon
  0 siblings, 1 reply; 12+ messages in thread
From: Brian Norris @ 2015-09-02 18:41 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: David Woodhouse, linux-mtd, Andrea Scian, Richard Weinberger

Hi Boris,

First of all, thanks for the persistence.

On Mon, Aug 24, 2015 at 11:47:21AM +0200, Boris Brezillon wrote:
> Add two helper functions to help NAND controller drivers test whether a
> specific NAND region is erased or not.

What sort of tests (if any) did you run for this?

> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
> ---
>  drivers/mtd/nand/nand_base.c | 108 +++++++++++++++++++++++++++++++++++++++++++
>  include/linux/mtd/nand.h     |   8 ++++
>  2 files changed, 116 insertions(+)
> 
> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> index ceb68ca..4d2ef65 100644
> --- a/drivers/mtd/nand/nand_base.c
> +++ b/drivers/mtd/nand/nand_base.c
> @@ -1101,6 +1101,114 @@ out:
>  EXPORT_SYMBOL(nand_lock);
>  
>  /**
> + * nand_check_erased_buf - check if a buffer contains (almost) only 0xff data
> + * @buf: buffer to test
> + * @len: buffer length
> + * @bitflips_threshold: maximum number of bitflips
> + *
> + * Check if a buffer contains only 0xff, which means the underlying region
> + * has been erased and is ready to be programmed.
> + * The bitflips_threshold specify the maximum number of bitflips before
> + * considering the region is not erased.
> + * Note: The logic of this function has been extracted from the memweight
> + * implementation, except that nand_check_erased_buf function exit before
> + * testing the whole buffer if the number of bitflips exceed the
> + * bitflips_threshold value.
> + *
> + * Returns a positive number of bitflips or -ERROR_CODE.

Small clarification:

  "Returns a positive number of bitflips less than or equal to
  bitflips_threshold, or -ERROR_CODE for bitflips in excess of the
  threshold."

> + */
> +int nand_check_erased_buf(void *buf, int len, int bitflips_threshold)

I don't have much problem with this function as a helper, but does it
really need to be exported? This is exactly the form I *don't* want to
encourage -- checking a data buffer while ignoring the OOB / ECC. Or did
you have a good reason to export this one? I see you don't use it
outside of this file.

And especially if we don't end up exporting this one, I'd really like
some extra explanatory text next to the public interface
(nand_check_erased_ecc_chunk()) to explain some of the pitfalls.
Particularly, why we must check both the data and spare area, and how we
highly recommend (if not require) that new drivers use the existing
methods rather than roll their own.

> +{
> +	const unsigned char *bitmap = buf;
> +	int bitflips = 0;
> +	int weight;
> +	int longs;
> +

Does it make any sense to have a short-circuit condition for the
threshold == 0 case? We could expect to see that for common 1-bit ECC or
similar.

> +	for (; len && ((unsigned long)bitmap) % sizeof(long);

Cast pointer to unsigned long? I'd expect uintptr_t, but then it sees
those are the same often enough that types.h says they're identical...

> +	     len--, bitmap++) {
> +		weight = hweight8(*bitmap);
> +		bitflips += BITS_PER_BYTE - weight;
> +		if (unlikely(bitflips > bitflips_threshold))
> +			return -EINVAL;

I'm not sure EINVAL is the right thing. This isn't a bad argument, it's
just a bad flash. EUCLEAN would fit typical usage. Same comment applies
elsewhere.

> +	}
> +
> +
> +	for (longs = len / sizeof(long); longs;
> +	     longs--, bitmap += sizeof(long)) {
> +		BUG_ON(longs >= INT_MAX / BITS_PER_LONG);

Please don't add BUG_ON(). The language around it has gotten even
stronger to discourage its use. And in this instance I don't think it's
even helpful; because 'longs' is derived from 'len' (which is an 'int'),
this condition is mathematically impossible, no?

Also (feel free to shoot me down), wouldn't the loop bounds be a little
more straightforward (and consistent with the rest of the function), and
not require the 'longs' variable, when written as the following?

	for (; len >= sizeof(long); len -= sizeof(long), bitmap += sizeof(long)) {
		...
	}

> +		weight = hweight_long(*((unsigned long *)bitmap));
> +		bitflips += BITS_PER_LONG - weight;
> +		if (unlikely(bitflips > bitflips_threshold))
> +			return -EINVAL;
> +	}
> +
> +	len %= sizeof(long);

If you did the above, then you shouldn't need this line.

> +
> +	for (; len > 0; len--, bitmap++) {
> +		weight = hweight8(*bitmap);
> +		bitflips += BITS_PER_BYTE - weight;
> +		if (unlikely(bitflips > bitflips_threshold))
> +			return -EINVAL;
> +	}
> +
> +	return bitflips;
> +}
> +EXPORT_SYMBOL(nand_check_erased_buf);
> +
> +/**
> + * nand_check_erased_ecc_chunk - check if an ECC chunk contains (almost) only
> + *				 0xff data
> + * @data: data buffer to test
> + * @datalen: data length
> + * @ecc: ECC buffer
> + * @ecclen: ECC length
> + * @extraoob: extra OOB buffer
> + * @extraooblen: extra OOB length
> + * @bitflips_threshold: maximum number of bitflips
> + *
> + * Check if a data buffer and its associated ECC and OOB data contains only
> + * 0xff pattern, which means the underlying region has been erased and is
> + * ready to be programmed.
> + * The bitflips_threshold specify the maximum number of bitflips before
> + * considering the region as not erased.
> + *
> + * Returns a positive number of bitflips or -ERROR_CODE.

Add similar language from above? Also, it's very important to note that
this function wipes the buffers in the nearly-erased case!

> + */
> +int nand_check_erased_ecc_chunk(void *data, int datalen,
> +				void *ecc, int ecclen,
> +				void *extraoob, int extraooblen,
> +				int bitflips_threshold)
> +{
> +	int bitflips = 0;
> +	int ret;
> +
> +	ret = nand_check_erased_buf(data, datalen, bitflips_threshold);
> +	if (ret < 0)
> +		return ret;
> +
> +	bitflips += ret;
> +	bitflips_threshold -= ret;
> +
> +	ret = nand_check_erased_buf(ecc, ecclen, bitflips_threshold);
> +	if (ret < 0)
> +		return ret;
> +
> +	bitflips += ret;
> +	bitflips_threshold -= ret;
> +
> +	ret = nand_check_erased_buf(extraoob, extraooblen, bitflips_threshold);
> +	if (ret < 0)
> +		return ret;
> +
> +	memset(data, 0xff, datalen);
> +	memset(ecc, 0xff, ecclen);
> +	memset(extraoob, 0xff, extraooblen);
> +
> +	return bitflips + ret;
> +}
> +EXPORT_SYMBOL(nand_check_erased_ecc_chunk);
> +
> +/**
>   * nand_read_page_raw - [INTERN] read raw page data without ecc
>   * @mtd: mtd info structure
>   * @chip: nand chip info structure
> diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h
> index 272f429..ae06a07 100644
> --- a/include/linux/mtd/nand.h
> +++ b/include/linux/mtd/nand.h
> @@ -1030,4 +1030,12 @@ struct nand_sdr_timings {
>  
>  /* get timing characteristics from ONFI timing mode. */
>  const struct nand_sdr_timings *onfi_async_timing_mode_to_sdr_timings(int mode);
> +
> +int nand_check_erased_buf(void *data, int datalen,
> +			  int threshold);
> +
> +int nand_check_erased_ecc_chunk(void *data, int datalen,
> +				void *ecc, int ecclen,
> +				void *extraoob, int extraooblen,
> +				int threshold);
>  #endif /* __LINUX_MTD_NAND_H */


Brian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] mtd: nand: add nand_check_erased helper functions
  2015-09-02 18:41   ` Brian Norris
@ 2015-09-02 19:30     ` Boris Brezillon
  2015-09-02 20:26       ` Brian Norris
  2015-09-03 13:22       ` Andrea Scian
  0 siblings, 2 replies; 12+ messages in thread
From: Boris Brezillon @ 2015-09-02 19:30 UTC (permalink / raw)
  To: Brian Norris; +Cc: David Woodhouse, linux-mtd, Andrea Scian, Richard Weinberger

Hi,

On Wed, 2 Sep 2015 11:41:43 -0700
Brian Norris <computersforpeace@gmail.com> wrote:

> Hi Boris,
> 
> First of all, thanks for the persistence.

Hehe, you're not done with me: I still have a bunch of NAND related
patches to post ;-). But I need this one to be accepted first.

> 
> On Mon, Aug 24, 2015 at 11:47:21AM +0200, Boris Brezillon wrote:
> > Add two helper functions to help NAND controller drivers test whether a
> > specific NAND region is erased or not.
> 
> What sort of tests (if any) did you run for this?

As I said, I have a series depending on this patch (see this branch [1]
if you want more details), which means I successfully tested it on a
sunxi platform.
BTW, I tested it with the nandflipbits tool [2] I posted a while ago.
Could you consider taking it (or a similar tool) in mtd-utils?

Andrea, did you test this patch on your imx platform(s)?

> 
> > Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
> > ---
> >  drivers/mtd/nand/nand_base.c | 108 +++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/mtd/nand.h     |   8 ++++
> >  2 files changed, 116 insertions(+)
> > 
> > diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> > index ceb68ca..4d2ef65 100644
> > --- a/drivers/mtd/nand/nand_base.c
> > +++ b/drivers/mtd/nand/nand_base.c
> > @@ -1101,6 +1101,114 @@ out:
> >  EXPORT_SYMBOL(nand_lock);
> >  
> >  /**
> > + * nand_check_erased_buf - check if a buffer contains (almost) only 0xff data
> > + * @buf: buffer to test
> > + * @len: buffer length
> > + * @bitflips_threshold: maximum number of bitflips
> > + *
> > + * Check if a buffer contains only 0xff, which means the underlying region
> > + * has been erased and is ready to be programmed.
> > + * The bitflips_threshold specify the maximum number of bitflips before
> > + * considering the region is not erased.
> > + * Note: The logic of this function has been extracted from the memweight
> > + * implementation, except that nand_check_erased_buf function exit before
> > + * testing the whole buffer if the number of bitflips exceed the
> > + * bitflips_threshold value.
> > + *
> > + * Returns a positive number of bitflips or -ERROR_CODE.
> 
> Small clarification:
> 
>   "Returns a positive number of bitflips less than or equal to
>   bitflips_threshold, or -ERROR_CODE for bitflips in excess of the
>   threshold."
> 

Sure, I'll change this comment.

> > + */
> > +int nand_check_erased_buf(void *buf, int len, int bitflips_threshold)
> 
> I don't have much problem with this function as a helper, but does it
> really need to be exported? This is exactly the form I *don't* want to
> encourage -- checking a data buffer while ignoring the OOB / ECC. Or did
> you have a good reason to export this one? I see you don't use it
> outside of this file.

I currently have no reason to export this function, but if we want to
reuse part of the logic for non standard drivers we might need to
export it afterwards.
The example I have in mind is the GPMI driver, which also need to count
bitflips, with the exception that ECC buffers are not necessarily
aligned on a byte. So exposing this function would allow reusing a
common logic for aligned buffers, and then dealing with unaligned parts
in driver specific code.

> 
> And especially if we don't end up exporting this one, I'd really like
> some extra explanatory text next to the public interface
> (nand_check_erased_ecc_chunk()) to explain some of the pitfalls.
> Particularly, why we must check both the data and spare area, and how we
> highly recommend (if not require) that new drivers use the existing
> methods rather than roll their own.

This is something I can add even if we keep exporting the previous
function ;-).

> 
> > +{
> > +	const unsigned char *bitmap = buf;
> > +	int bitflips = 0;
> > +	int weight;
> > +	int longs;
> > +
> 
> Does it make any sense to have a short-circuit condition for the
> threshold == 0 case? We could expect to see that for common 1-bit ECC or
> similar.

Hm, I'm not sure I understand your suggestion. How would you shortcut
this case? You still need to detect one bitflip before returning, right?
Are you suggesting to use memweight() in this case? Because if you are,
then I don't see any benefit since I have copied the memweight()
implementation to implement the nand_check_erased_buf() logic.
The only thing that could be avoided are the extra
'bitflips > bitflips_threshold' checks, but I'm not sure it makes such a
big difference for small/medium NAND pages (and 1-bit ECC will
definitely be used on relatively small pages).

> 
> > +	for (; len && ((unsigned long)bitmap) % sizeof(long);
> 
> Cast pointer to unsigned long? I'd expect uintptr_t, but then it sees
> those are the same often enough that types.h says they're identical...

As I said in the header note, I blindly copied memweight()
implementation, sightly adapting it for my need, so there might be room
for improvement.

> 
> > +	     len--, bitmap++) {
> > +		weight = hweight8(*bitmap);
> > +		bitflips += BITS_PER_BYTE - weight;
> > +		if (unlikely(bitflips > bitflips_threshold))
> > +			return -EINVAL;
> 
> I'm not sure EINVAL is the right thing. This isn't a bad argument, it's
> just a bad flash. EUCLEAN would fit typical usage. Same comment applies
> elsewhere.

Yep, I was considering replacing EINVAL by EIO, but EUCLEAN is better.
I'll fix that.

> 
> > +	}
> > +
> > +
> > +	for (longs = len / sizeof(long); longs;
> > +	     longs--, bitmap += sizeof(long)) {
> > +		BUG_ON(longs >= INT_MAX / BITS_PER_LONG);
> 
> Please don't add BUG_ON(). The language around it has gotten even
> stronger to discourage its use. And in this instance I don't think it's
> even helpful; because 'longs' is derived from 'len' (which is an 'int'),
> this condition is mathematically impossible, no?

Ditto: everything was blindly copied from the memweight implementation.

> 
> Also (feel free to shoot me down), wouldn't the loop bounds be a little
> more straightforward (and consistent with the rest of the function), and
> not require the 'longs' variable, when written as the following?
> 
> 	for (; len >= sizeof(long); len -= sizeof(long), bitmap += sizeof(long)) {
> 		...
> 	}

Definitely better.

> 
> > +		weight = hweight_long(*((unsigned long *)bitmap));
> > +		bitflips += BITS_PER_LONG - weight;
> > +		if (unlikely(bitflips > bitflips_threshold))
> > +			return -EINVAL;
> > +	}
> > +
> > +	len %= sizeof(long);
> 
> If you did the above, then you shouldn't need this line.

Yep.

> 
> > +
> > +	for (; len > 0; len--, bitmap++) {
> > +		weight = hweight8(*bitmap);
> > +		bitflips += BITS_PER_BYTE - weight;
> > +		if (unlikely(bitflips > bitflips_threshold))
> > +			return -EINVAL;
> > +	}
> > +
> > +	return bitflips;
> > +}
> > +EXPORT_SYMBOL(nand_check_erased_buf);
> > +
> > +/**
> > + * nand_check_erased_ecc_chunk - check if an ECC chunk contains (almost) only
> > + *				 0xff data
> > + * @data: data buffer to test
> > + * @datalen: data length
> > + * @ecc: ECC buffer
> > + * @ecclen: ECC length
> > + * @extraoob: extra OOB buffer
> > + * @extraooblen: extra OOB length
> > + * @bitflips_threshold: maximum number of bitflips
> > + *
> > + * Check if a data buffer and its associated ECC and OOB data contains only
> > + * 0xff pattern, which means the underlying region has been erased and is
> > + * ready to be programmed.
> > + * The bitflips_threshold specify the maximum number of bitflips before
> > + * considering the region as not erased.
> > + *
> > + * Returns a positive number of bitflips or -ERROR_CODE.
> 
> Add similar language from above? Also, it's very important to note that
> this function wipes the buffers in the nearly-erased case!

Yes, I'll add that in the comment, in addition to a lot more precise
description of how to use it and the importance of the ecc and extraoob
fields (even if they are optionals).


Thanks for your review.

Best Regards,

Boris

[1]https://github.com/bbrezillon/linux-sunxi/tree/sunxi/mtd-next
[2]http://patchwork.ozlabs.org/patch/415517/


-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/2] mtd: nand: properly handle bitflips in erased pages
  2015-08-24  9:47 [PATCH v2 0/2] mtd: nand: properly handle bitflips in erased pages Boris Brezillon
                   ` (2 preceding siblings ...)
  2015-09-02 12:46 ` [PATCH v2 0/2] mtd: nand: properly handle bitflips in erased pages Boris Brezillon
@ 2015-09-02 19:43 ` Brian Norris
  3 siblings, 0 replies; 12+ messages in thread
From: Brian Norris @ 2015-09-02 19:43 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: David Woodhouse, linux-mtd, Andrea Scian, Richard Weinberger

Hi Boris,

Some high level comments. We've discussed some of this before on IRC,
but I guess having a few more bits here could help.

On Mon, Aug 24, 2015 at 11:47:20AM +0200, Boris Brezillon wrote:
> Hello,
> 
> This patch series aims at providing a common logic to check for bitflips
> in erased pages.
> 
> Currently each driver is implementing its own logic to check for bitflips
> in erased pages. Not only this create code duplication, but most of these
> implementations are incorrect.
> Here are a few aspects that are often left aside in those implementations:
> 1/ they do not check OOB bytes when checking for the ff pattern, which
>    means they can consider a page as empty while the MTD user actually
>    wanted to write almost ff with a few bits to zero
> 2/ they check for the ff pattern on the whole page, while ECC actually
>    works on smaller chunks (usually 512 or 1024 bytes chunks)
> 3/ they use random bitflip thresholds to decide whether a page/chunk is
>    erased or not. IMO this threshold should be set to ECC strength (or
>    at least something correlated to this parameter)

Agreed.

> The approach taken in this series is to provide two helper functions to
> check for bitflips in erased pages. Each driver that needs to check for
> such cases can then call the nand_check_erased_ecc_chunk() function, and
> rely on the common logic to decide whether a page is erased or not.
> 
> While Brian suggested a few times to make this detection automatic for
> all drivers that set a specific flag (NAND_CHECK_ERASED_BITFLIPS?), here
> is a few reasons I think this is not such a good idea:
> 1/ some (a lot of) drivers do not properly implement the raw access
>    functions, and since we need to check for raw data and OOB bytes this
>    makes the automatic detection unusable for most drivers unless they
>    decide to correctly implement those methods (which would be a good
>    thing BTW).

Given your last parenthetical statement (which I agree with), I'm not
sure this is a very strong point. IMO it's only valid for hardware which
*cannot* support raw modes. For all other cases, it's reasonable to
assume that developers can implement a conforming driver in order to
pick up these reliability features.

> 2/ as a I said earlier, this check should be made at the ECC chunk level
>    and not at the page level. This spots two problems: some (a lot of)
>    drivers do not properly specify the ecc layout information, and even
>    if the ecc layout is correctly defined, there is no way to attach ECC
>    bytes to a specific ECC chunk.

I think this point is pretty valid, and it's one of the main
deficiencies in an automatic approach that ignores the ECC layout.

> 3/ the last aspect is the perf penalty incured by this test. Automatically
>    doing that at the NAND core level implies reading the whole page again
>    in raw mode, while with the helper function approach, drivers supporting
>    access at the ECC chunk level can read only the faulty chunk in raw
>    mode.

The real important point here is (IMO) that the driver may have
knowledge of the uncorrected data + OOB even in the !oob_required case,
so *no* re-reading is required. It's a little harder to get that
guarantee when you move a little higher up the abstraction layer.

> Regarding the bitflips threshold at which an erased pages is considered as
> faulty, I have assigned it to ECC strength. As mentioned by Andrea, using
> ECC strength might cause some trouble, because if you already have some
> bitflips in an erased page, programming it might generate even more of
> them.
> In the other hand, shouldn't that be checked after (or before) programming
> a page. I mean, UBI is already capable of detecting pages which are over
> the configured bitflips_threshold and move data around when it detects
> such pages.
> If we check data after writing a page we wouldn't have to bother about
> setting a weaker value for the "bitflips in erased page" case.
> Another thing in favor of the ECC strength value for this "bitflips in
> erased page" threshold value: if the ECC engine is generating 0xff ECC
> bytes when the page is empty, then it will be able to fix ECC strength
> bitflips without complaining, so why should we use different value when
> we detect bitflips using the pattern match approach?

I like this argument about comparing with an ECC algorithm that corrects
bitflips in 0xff pages.

Regardless, in case this does become an issue, might this threshold be
worth stashing in struct nand_chip? Then the driver API can omit the
threshold parameter, and we have a chance to override it in a central
place (or, maybe in special-case drivers) rather than at each call site
for nand_check_erased_ecc_chunk().

Brian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] mtd: nand: add nand_check_erased helper functions
  2015-09-02 19:30     ` Boris Brezillon
@ 2015-09-02 20:26       ` Brian Norris
  2015-09-02 20:51         ` Boris Brezillon
  2015-09-03 13:22       ` Andrea Scian
  1 sibling, 1 reply; 12+ messages in thread
From: Brian Norris @ 2015-09-02 20:26 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: David Woodhouse, linux-mtd, Andrea Scian, Richard Weinberger

On Wed, Sep 02, 2015 at 09:30:34PM +0200, Boris Brezillon wrote:
> On Wed, 2 Sep 2015 11:41:43 -0700
> Brian Norris <computersforpeace@gmail.com> wrote:
> 
> > Hi Boris,
> > 
> > First of all, thanks for the persistence.
> 
> Hehe, you're not done with me: I still have a bunch of NAND related
> patches to post ;-). But I need this one to be accepted first.

:)

> > 
> > On Mon, Aug 24, 2015 at 11:47:21AM +0200, Boris Brezillon wrote:
> > > Add two helper functions to help NAND controller drivers test whether a
> > > specific NAND region is erased or not.
> > 
> > What sort of tests (if any) did you run for this?
> 
> As I said, I have a series depending on this patch (see this branch [1]
> if you want more details), which means I successfully tested it on a
> sunxi platform.

OK. I'm a little wary of other platforms that may not, for instance,
expect the additional RNDOUT command. Hopefully we can get some more
testing, or at least I'll try to look through a few more drivers for
special cases.

> BTW, I tested it with the nandflipbits tool [2] I posted a while ago.
> Could you consider taking it (or a similar tool) in mtd-utils?

I'll try to take another look at it.

> Andrea, did you test this patch on your imx platform(s)?
> 
> > 
> > > Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
> > > ---
> > >  drivers/mtd/nand/nand_base.c | 108 +++++++++++++++++++++++++++++++++++++++++++
> > >  include/linux/mtd/nand.h     |   8 ++++
> > >  2 files changed, 116 insertions(+)
> > > 
> > > diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> > > index ceb68ca..4d2ef65 100644
> > > --- a/drivers/mtd/nand/nand_base.c
> > > +++ b/drivers/mtd/nand/nand_base.c
> > > @@ -1101,6 +1101,114 @@ out:
> > >  EXPORT_SYMBOL(nand_lock);
> > >  
> > >  /**
> > > + * nand_check_erased_buf - check if a buffer contains (almost) only 0xff data
> > > + * @buf: buffer to test
> > > + * @len: buffer length
> > > + * @bitflips_threshold: maximum number of bitflips
> > > + *
> > > + * Check if a buffer contains only 0xff, which means the underlying region
> > > + * has been erased and is ready to be programmed.
> > > + * The bitflips_threshold specify the maximum number of bitflips before
> > > + * considering the region is not erased.
> > > + * Note: The logic of this function has been extracted from the memweight
> > > + * implementation, except that nand_check_erased_buf function exit before
> > > + * testing the whole buffer if the number of bitflips exceed the
> > > + * bitflips_threshold value.
> > > + *
> > > + * Returns a positive number of bitflips or -ERROR_CODE.
> > 
> > Small clarification:
> > 
> >   "Returns a positive number of bitflips less than or equal to
> >   bitflips_threshold, or -ERROR_CODE for bitflips in excess of the
> >   threshold."
> > 
> 
> Sure, I'll change this comment.
> 
> > > + */
> > > +int nand_check_erased_buf(void *buf, int len, int bitflips_threshold)
> > 
> > I don't have much problem with this function as a helper, but does it
> > really need to be exported? This is exactly the form I *don't* want to
> > encourage -- checking a data buffer while ignoring the OOB / ECC. Or did
> > you have a good reason to export this one? I see you don't use it
> > outside of this file.
> 
> I currently have no reason to export this function, but if we want to
> reuse part of the logic for non standard drivers we might need to
> export it afterwards.
> The example I have in mind is the GPMI driver, which also need to count
> bitflips, with the exception that ECC buffers are not necessarily
> aligned on a byte. So exposing this function would allow reusing a
> common logic for aligned buffers, and then dealing with unaligned parts
> in driver specific code.

Hmm, well I think I'd still be happier if we had an example
implementation that used it before exporting it. The GPMI case is
interesting, but I'd like to see code first.

> > 
> > And especially if we don't end up exporting this one, I'd really like
> > some extra explanatory text next to the public interface
> > (nand_check_erased_ecc_chunk()) to explain some of the pitfalls.
> > Particularly, why we must check both the data and spare area, and how we
> > highly recommend (if not require) that new drivers use the existing
> > methods rather than roll their own.
> 
> This is something I can add even if we keep exporting the previous
> function ;-).

Sure, I just said "especially" not "only" :)

> > 
> > > +{
> > > +	const unsigned char *bitmap = buf;
> > > +	int bitflips = 0;
> > > +	int weight;
> > > +	int longs;
> > > +
> > 
> > Does it make any sense to have a short-circuit condition for the
> > threshold == 0 case? We could expect to see that for common 1-bit ECC or
> > similar.
> 
> Hm, I'm not sure I understand your suggestion. How would you shortcut
> this case? You still need to detect one bitflip before returning, right?
> Are you suggesting to use memweight() in this case? Because if you are,
> then I don't see any benefit since I have copied the memweight()
> implementation to implement the nand_check_erased_buf() logic.
> The only thing that could be avoided are the extra
> 'bitflips > bitflips_threshold' checks, but I'm not sure it makes such a
> big difference for small/medium NAND pages (and 1-bit ECC will
> definitely be used on relatively small pages).

I probably wasn't too clear.

You don't need any weight check at all. You can essentially just do
memcmp() with 0xff. Or since memcmp() requires a second buffer, maybe
just roll our own loop checking for 0xffffffff. But that may not be
worth it.

> > 
> > > +	for (; len && ((unsigned long)bitmap) % sizeof(long);
> > 
> > Cast pointer to unsigned long? I'd expect uintptr_t, but then it sees
> > those are the same often enough that types.h says they're identical...
> 
> As I said in the header note, I blindly copied memweight()
> implementation, sightly adapting it for my need, so there might be room
> for improvement.

Ah, right. Well I guess my comments stand anyway.

> > 
> > > +	     len--, bitmap++) {
> > > +		weight = hweight8(*bitmap);
> > > +		bitflips += BITS_PER_BYTE - weight;
> > > +		if (unlikely(bitflips > bitflips_threshold))
> > > +			return -EINVAL;
> > 
> > I'm not sure EINVAL is the right thing. This isn't a bad argument, it's
> > just a bad flash. EUCLEAN would fit typical usage. Same comment applies
> > elsewhere.
> 
> Yep, I was considering replacing EINVAL by EIO, but EUCLEAN is better.
> I'll fix that.

I think I actually meant EBADMSG, not EUCLEAN. The MTD API uses the
former to mean uncorrectable and the latter to mean correctable.

> > 
> > > +	}
> > > +
> > > +
> > > +	for (longs = len / sizeof(long); longs;
> > > +	     longs--, bitmap += sizeof(long)) {
> > > +		BUG_ON(longs >= INT_MAX / BITS_PER_LONG);
> > 
> > Please don't add BUG_ON(). The language around it has gotten even
> > stronger to discourage its use. And in this instance I don't think it's
> > even helpful; because 'longs' is derived from 'len' (which is an 'int'),
> > this condition is mathematically impossible, no?
> 
> Ditto: everything was blindly copied from the memweight implementation.

OK.

> > 
> > Also (feel free to shoot me down), wouldn't the loop bounds be a little
> > more straightforward (and consistent with the rest of the function), and
> > not require the 'longs' variable, when written as the following?
> > 
> > 	for (; len >= sizeof(long); len -= sizeof(long), bitmap += sizeof(long)) {
> > 		...
> > 	}
> 
> Definitely better.
> 
> > 
> > > +		weight = hweight_long(*((unsigned long *)bitmap));
> > > +		bitflips += BITS_PER_LONG - weight;
> > > +		if (unlikely(bitflips > bitflips_threshold))
> > > +			return -EINVAL;
> > > +	}
> > > +
> > > +	len %= sizeof(long);
> > 
> > If you did the above, then you shouldn't need this line.
> 
> Yep.
> 
> > 
> > > +
> > > +	for (; len > 0; len--, bitmap++) {
> > > +		weight = hweight8(*bitmap);
> > > +		bitflips += BITS_PER_BYTE - weight;
> > > +		if (unlikely(bitflips > bitflips_threshold))
> > > +			return -EINVAL;
> > > +	}
> > > +
> > > +	return bitflips;
> > > +}
> > > +EXPORT_SYMBOL(nand_check_erased_buf);
> > > +
> > > +/**
> > > + * nand_check_erased_ecc_chunk - check if an ECC chunk contains (almost) only
> > > + *				 0xff data
> > > + * @data: data buffer to test
> > > + * @datalen: data length
> > > + * @ecc: ECC buffer
> > > + * @ecclen: ECC length
> > > + * @extraoob: extra OOB buffer
> > > + * @extraooblen: extra OOB length
> > > + * @bitflips_threshold: maximum number of bitflips
> > > + *
> > > + * Check if a data buffer and its associated ECC and OOB data contains only
> > > + * 0xff pattern, which means the underlying region has been erased and is
> > > + * ready to be programmed.
> > > + * The bitflips_threshold specify the maximum number of bitflips before
> > > + * considering the region as not erased.
> > > + *
> > > + * Returns a positive number of bitflips or -ERROR_CODE.
> > 
> > Add similar language from above? Also, it's very important to note that
> > this function wipes the buffers in the nearly-erased case!
> 
> Yes, I'll add that in the comment, in addition to a lot more precise
> description of how to use it and the importance of the ecc and extraoob
> fields (even if they are optionals).
> 
> 
> Thanks for your review.
> 
> Best Regards,
> 
> Boris
> 
> [1]https://github.com/bbrezillon/linux-sunxi/tree/sunxi/mtd-next
> [2]http://patchwork.ozlabs.org/patch/415517/

Brian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions
  2015-08-24  9:47 ` [PATCH v2 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions Boris Brezillon
@ 2015-09-02 20:35   ` Brian Norris
  2015-09-02 20:45     ` Boris Brezillon
  0 siblings, 1 reply; 12+ messages in thread
From: Brian Norris @ 2015-09-02 20:35 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: David Woodhouse, linux-mtd, Andrea Scian, Richard Weinberger

On Mon, Aug 24, 2015 at 11:47:22AM +0200, Boris Brezillon wrote:
> The default NAND read functions are relying on an underlying controller
> to correct bitflips, but some of those controller cannot properly fix
> bitflips in erased pages.
> In case of ECC failures, check if the page of subpage is empty before
> reporting an ECC failure.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>

General note: this looks pretty good to me. Are there drivers which we
should kill erased-page checks from now, given this patch? There are
several of dubious value that we might drop without consequence. But
with some, I'd wonder if we might cause a performance slowdown and/or
high CPU utilization -- particularly those that look like they might
signal ECC errors on all-0xff pages, even with no bitflips.

> ---
>  drivers/mtd/nand/nand_base.c | 62 +++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 55 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> index 4d2ef65..e095d86 100644
> --- a/drivers/mtd/nand/nand_base.c
> +++ b/drivers/mtd/nand/nand_base.c
> @@ -1400,6 +1400,19 @@ static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip,
>  		stat = chip->ecc.correct(mtd, p,
>  			&chip->buffers->ecccode[i], &chip->buffers->ecccalc[i]);
>  		if (stat < 0) {

I'm not sure if this is a fault of your patch or of the API design, but
do we want to do erased-ECC checks on all failures, regardless of type?
I would have expected maybe we could check only for -EBADMSG, but it
appears that's not consistent. Apparently all correction failures are
just "some negative value."

Anyway, if we had better consistency, I'd suggest:

		if (stat == -EBADMSG) {

But I suppose that 'stat < 0' is the best we can do for now.

> +			/* check for empty pages with bitflips */
> +			int col = (int)(p - bufpoi);
> +
> +			chip->cmdfunc(mtd, NAND_CMD_RNDOUT, col, -1);

Are all drivers that use this function prepared to handle another RNDOUT
properly? I know some drivers tend to make assumptions about things that
nand_base is doing like this. I know that would be a dirty trick, but
it's not impossible...

> +			chip->read_buf(mtd, p, chip->ecc.size);

Also, are you sure we need to re-read here? Technically, drivers are
supposed to be leaving uncorrected data in their buffers if they can't
correct it, no?

Similar comments apply to the other cases.

> +			stat = nand_check_erased_ecc_chunk(p, chip->ecc.size,
> +						&chip->buffers->ecccode[i],
> +						chip->ecc.bytes,
> +						NULL, 0,
> +						chip->ecc.strength);
> +		}
> +
> +		if (stat < 0) {
>  			mtd->ecc_stats.failed++;
>  		} else {
>  			mtd->ecc_stats.corrected += stat;
> @@ -1449,6 +1462,16 @@ static int nand_read_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip,
>  
>  		stat = chip->ecc.correct(mtd, p, &ecc_code[i], &ecc_calc[i]);
>  		if (stat < 0) {
> +			/* check for empty pages with bitflips */
> +			chip->cmdfunc(mtd, NAND_CMD_RNDOUT, i, -1);
> +			chip->read_buf(mtd, p, eccsize);
> +			stat = nand_check_erased_ecc_chunk(p, eccsize,
> +						&ecc_code[i], eccbytes,
> +						NULL, 0,
> +						chip->ecc.strength);
> +		}
> +
> +		if (stat < 0) {
>  			mtd->ecc_stats.failed++;
>  		} else {
>  			mtd->ecc_stats.corrected += stat;
> @@ -1501,6 +1524,17 @@ static int nand_read_page_hwecc_oob_first(struct mtd_info *mtd,
>  
>  		stat = chip->ecc.correct(mtd, p, &ecc_code[i], NULL);
>  		if (stat < 0) {
> +			/* check for empty pages with bitflips */
> +			chip->cmdfunc(mtd, NAND_CMD_RNDOUT,
> +				      i + mtd->oobsize, -1);
> +			chip->read_buf(mtd, p, eccsize);
> +			stat = nand_check_erased_ecc_chunk(p, eccsize,
> +						&ecc_code[i], eccbytes,
> +						NULL, 0,
> +						chip->ecc.strength);
> +		}
> +
> +		if (stat < 0) {
>  			mtd->ecc_stats.failed++;
>  		} else {
>  			mtd->ecc_stats.corrected += stat;
> @@ -1527,6 +1561,8 @@ static int nand_read_page_syndrome(struct mtd_info *mtd, struct nand_chip *chip,
>  	int i, eccsize = chip->ecc.size;
>  	int eccbytes = chip->ecc.bytes;
>  	int eccsteps = chip->ecc.steps;
> +	int eccstepsize = eccsize + eccbytes + chip->ecc.prepad +
> +			  chip->ecc.postpad;
>  	uint8_t *p = buf;
>  	uint8_t *oob = chip->oob_poi;
>  	unsigned int max_bitflips = 0;
> @@ -1546,19 +1582,31 @@ static int nand_read_page_syndrome(struct mtd_info *mtd, struct nand_chip *chip,
>  		chip->read_buf(mtd, oob, eccbytes);
>  		stat = chip->ecc.correct(mtd, p, oob, NULL);
>  
> -		if (stat < 0) {
> -			mtd->ecc_stats.failed++;
> -		} else {
> -			mtd->ecc_stats.corrected += stat;
> -			max_bitflips = max_t(unsigned int, max_bitflips, stat);
> -		}
> -
>  		oob += eccbytes;
>  
>  		if (chip->ecc.postpad) {
>  			chip->read_buf(mtd, oob, chip->ecc.postpad);
>  			oob += chip->ecc.postpad;
>  		}
> +
> +		if (stat < 0) {
> +			/* check for empty pages with bitflips */
> +			chip->cmdfunc(mtd, NAND_CMD_RNDOUT,
> +				      i * eccstepsize, -1);
> +			chip->read_buf(mtd, p, chip->ecc.size);
> +			stat = nand_check_erased_ecc_chunk(p, chip->ecc.size,
> +							   oob - eccstepsize,
> +							   eccstepsize,
> +							   NULL, 0,
> +							   chip->ecc.strength);
> +		}
> +
> +		if (stat < 0) {
> +			mtd->ecc_stats.failed++;
> +		} else {
> +			mtd->ecc_stats.corrected += stat;
> +			max_bitflips = max_t(unsigned int, max_bitflips, stat);
> +		}
>  	}
>  
>  	/* Calculate remaining oob bytes */

Brian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions
  2015-09-02 20:35   ` Brian Norris
@ 2015-09-02 20:45     ` Boris Brezillon
  0 siblings, 0 replies; 12+ messages in thread
From: Boris Brezillon @ 2015-09-02 20:45 UTC (permalink / raw)
  To: Brian Norris; +Cc: David Woodhouse, linux-mtd, Andrea Scian, Richard Weinberger

On Wed, 2 Sep 2015 13:35:30 -0700
Brian Norris <computersforpeace@gmail.com> wrote:

> On Mon, Aug 24, 2015 at 11:47:22AM +0200, Boris Brezillon wrote:
> > The default NAND read functions are relying on an underlying controller
> > to correct bitflips, but some of those controller cannot properly fix
> > bitflips in erased pages.
> > In case of ECC failures, check if the page of subpage is empty before
> > reporting an ECC failure.
> > 
> > Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
> 
> General note: this looks pretty good to me. Are there drivers which we
> should kill erased-page checks from now, given this patch? There are
> several of dubious value that we might drop without consequence. But
> with some, I'd wonder if we might cause a performance slowdown and/or
> high CPU utilization -- particularly those that look like they might
> signal ECC errors on all-0xff pages, even with no bitflips.
> 
> > ---
> >  drivers/mtd/nand/nand_base.c | 62 +++++++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 55 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> > index 4d2ef65..e095d86 100644
> > --- a/drivers/mtd/nand/nand_base.c
> > +++ b/drivers/mtd/nand/nand_base.c
> > @@ -1400,6 +1400,19 @@ static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip,
> >  		stat = chip->ecc.correct(mtd, p,
> >  			&chip->buffers->ecccode[i], &chip->buffers->ecccalc[i]);
> >  		if (stat < 0) {
> 
> I'm not sure if this is a fault of your patch or of the API design, but
> do we want to do erased-ECC checks on all failures, regardless of type?
> I would have expected maybe we could check only for -EBADMSG, but it
> appears that's not consistent. Apparently all correction failures are
> just "some negative value."
> 
> Anyway, if we had better consistency, I'd suggest:
> 
> 		if (stat == -EBADMSG) {

Yes, that would be preferable to avoid useless empty pattern check. 

> 
> But I suppose that 'stat < 0' is the best we can do for now.

I guess that's something we can easily check (I'll have a look).

> 
> > +			/* check for empty pages with bitflips */
> > +			int col = (int)(p - bufpoi);
> > +
> > +			chip->cmdfunc(mtd, NAND_CMD_RNDOUT, col, -1);
> 
> Are all drivers that use this function prepared to handle another RNDOUT
> properly? I know some drivers tend to make assumptions about things that
> nand_base is doing like this. I know that would be a dirty trick, but
> it's not impossible...
> 
> > +			chip->read_buf(mtd, p, chip->ecc.size);
> 
> Also, are you sure we need to re-read here? Technically, drivers are
> supposed to be leaving uncorrected data in their buffers if they can't
> correct it, no?

Normally they should leave the data untouched in this case, but I wasn't
sure all drivers were behaving like this, hence the conservative
approach.
Maybe that's something we can drop, which would also remove the extra
RNDOUT command.


-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] mtd: nand: add nand_check_erased helper functions
  2015-09-02 20:26       ` Brian Norris
@ 2015-09-02 20:51         ` Boris Brezillon
  0 siblings, 0 replies; 12+ messages in thread
From: Boris Brezillon @ 2015-09-02 20:51 UTC (permalink / raw)
  To: Brian Norris; +Cc: David Woodhouse, linux-mtd, Andrea Scian, Richard Weinberger

On Wed, 2 Sep 2015 13:26:50 -0700
Brian Norris <computersforpeace@gmail.com> wrote:

> On Wed, Sep 02, 2015 at 09:30:34PM +0200, Boris Brezillon wrote:
> > On Wed, 2 Sep 2015 11:41:43 -0700
> > Brian Norris <computersforpeace@gmail.com> wrote:
> > 
> > > Hi Boris,
> > > 
> > > First of all, thanks for the persistence.
> > 
> > Hehe, you're not done with me: I still have a bunch of NAND related
> > patches to post ;-). But I need this one to be accepted first.
> 
> :)
> 
> > > 
> > > On Mon, Aug 24, 2015 at 11:47:21AM +0200, Boris Brezillon wrote:
> > > > Add two helper functions to help NAND controller drivers test whether a
> > > > specific NAND region is erased or not.
> > > 
> > > What sort of tests (if any) did you run for this?
> > 
> > As I said, I have a series depending on this patch (see this branch [1]
> > if you want more details), which means I successfully tested it on a
> > sunxi platform.
> 
> OK. I'm a little wary of other platforms that may not, for instance,
> expect the additional RNDOUT command. Hopefully we can get some more
> testing, or at least I'll try to look through a few more drivers for
> special cases.

Hm, I guess you're talking about patch 2, because this one does not
directly interact with the NAND.


> > > > + */
> > > > +int nand_check_erased_buf(void *buf, int len, int bitflips_threshold)
> > > 
> > > I don't have much problem with this function as a helper, but does it
> > > really need to be exported? This is exactly the form I *don't* want to
> > > encourage -- checking a data buffer while ignoring the OOB / ECC. Or did
> > > you have a good reason to export this one? I see you don't use it
> > > outside of this file.
> > 
> > I currently have no reason to export this function, but if we want to
> > reuse part of the logic for non standard drivers we might need to
> > export it afterwards.
> > The example I have in mind is the GPMI driver, which also need to count
> > bitflips, with the exception that ECC buffers are not necessarily
> > aligned on a byte. So exposing this function would allow reusing a
> > common logic for aligned buffers, and then dealing with unaligned parts
> > in driver specific code.
> 
> Hmm, well I think I'd still be happier if we had an example
> implementation that used it before exporting it. The GPMI case is
> interesting, but I'd like to see code first.

Sounds reasonable. I'll drop the EXPORT and make this function static.



> > > Does it make any sense to have a short-circuit condition for the
> > > threshold == 0 case? We could expect to see that for common 1-bit ECC or
> > > similar.
> > 
> > Hm, I'm not sure I understand your suggestion. How would you shortcut
> > this case? You still need to detect one bitflip before returning, right?
> > Are you suggesting to use memweight() in this case? Because if you are,
> > then I don't see any benefit since I have copied the memweight()
> > implementation to implement the nand_check_erased_buf() logic.
> > The only thing that could be avoided are the extra
> > 'bitflips > bitflips_threshold' checks, but I'm not sure it makes such a
> > big difference for small/medium NAND pages (and 1-bit ECC will
> > definitely be used on relatively small pages).
> 
> I probably wasn't too clear.
> 
> You don't need any weight check at all. You can essentially just do
> memcmp() with 0xff. Or since memcmp() requires a second buffer, maybe
> just roll our own loop checking for 0xffffffff. But that may not be
> worth it.

Yep, I thought about memcmp(), but allocating a buffer (or filling an
existing buffer with 0xff) just for that seems a bit overkill.
This leaves the specialized loop solution, but I think we should leave
it as a possible improvement if someone feels the need to get better
perfs in this case. 

> 
> > > 
> > > > +	for (; len && ((unsigned long)bitmap) % sizeof(long);
> > > 
> > > Cast pointer to unsigned long? I'd expect uintptr_t, but then it sees
> > > those are the same often enough that types.h says they're identical...
> > 
> > As I said in the header note, I blindly copied memweight()
> > implementation, sightly adapting it for my need, so there might be room
> > for improvement.
> 
> Ah, right. Well I guess my comments stand anyway.

Sure, I'll address all of them in v3.

> 
> > > 
> > > > +	     len--, bitmap++) {
> > > > +		weight = hweight8(*bitmap);
> > > > +		bitflips += BITS_PER_BYTE - weight;
> > > > +		if (unlikely(bitflips > bitflips_threshold))
> > > > +			return -EINVAL;
> > > 
> > > I'm not sure EINVAL is the right thing. This isn't a bad argument, it's
> > > just a bad flash. EUCLEAN would fit typical usage. Same comment applies
> > > elsewhere.
> > 
> > Yep, I was considering replacing EINVAL by EIO, but EUCLEAN is better.
> > I'll fix that.
> 
> I think I actually meant EBADMSG, not EUCLEAN. The MTD API uses the
> former to mean uncorrectable and the latter to mean correctable.

Right.


-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] mtd: nand: add nand_check_erased helper functions
  2015-09-02 19:30     ` Boris Brezillon
  2015-09-02 20:26       ` Brian Norris
@ 2015-09-03 13:22       ` Andrea Scian
  1 sibling, 0 replies; 12+ messages in thread
From: Andrea Scian @ 2015-09-03 13:22 UTC (permalink / raw)
  To: Boris Brezillon, Brian Norris
  Cc: David Woodhouse, linux-mtd, Richard Weinberger


Dear Boris,

Il 02/09/2015 21:30, Boris Brezillon ha scritto:
> Hi,
>
> On Wed, 2 Sep 2015 11:41:43 -0700
> Brian Norris <computersforpeace@gmail.com> wrote:
>
>> Hi Boris,
>>
>> First of all, thanks for the persistence.
>
> Hehe, you're not done with me: I still have a bunch of NAND related
> patches to post ;-). But I need this one to be accepted first.
>
>>
>> On Mon, Aug 24, 2015 at 11:47:21AM +0200, Boris Brezillon wrote:
>>> Add two helper functions to help NAND controller drivers test whether a
>>> specific NAND region is erased or not.
>>
>> What sort of tests (if any) did you run for this?
>
> As I said, I have a series depending on this patch (see this branch [1]
> if you want more details), which means I successfully tested it on a
> sunxi platform.
> BTW, I tested it with the nandflipbits tool [2] I posted a while ago.
> Could you consider taking it (or a similar tool) in mtd-utils?
>
> Andrea, did you test this patch on your imx platform(s)?


I'm still working on it on my spare time
I got your patches applied nearly cleanly on my 3.10 tree but I still 
need some time to use them into the imx driver.
I hope I'll find some time to think about it next days and sign it as 
tested-by ;-)

Kind Regards,

-- 

Andrea SCIAN

DAVE Embedded Systems

>
>>
>>> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
>>> ---
>>>   drivers/mtd/nand/nand_base.c | 108 +++++++++++++++++++++++++++++++++++++++++++
>>>   include/linux/mtd/nand.h     |   8 ++++
>>>   2 files changed, 116 insertions(+)
>>>
>>> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
>>> index ceb68ca..4d2ef65 100644
>>> --- a/drivers/mtd/nand/nand_base.c
>>> +++ b/drivers/mtd/nand/nand_base.c
>>> @@ -1101,6 +1101,114 @@ out:
>>>   EXPORT_SYMBOL(nand_lock);
>>>
>>>   /**
>>> + * nand_check_erased_buf - check if a buffer contains (almost) only 0xff data
>>> + * @buf: buffer to test
>>> + * @len: buffer length
>>> + * @bitflips_threshold: maximum number of bitflips
>>> + *
>>> + * Check if a buffer contains only 0xff, which means the underlying region
>>> + * has been erased and is ready to be programmed.
>>> + * The bitflips_threshold specify the maximum number of bitflips before
>>> + * considering the region is not erased.
>>> + * Note: The logic of this function has been extracted from the memweight
>>> + * implementation, except that nand_check_erased_buf function exit before
>>> + * testing the whole buffer if the number of bitflips exceed the
>>> + * bitflips_threshold value.
>>> + *
>>> + * Returns a positive number of bitflips or -ERROR_CODE.
>>
>> Small clarification:
>>
>>    "Returns a positive number of bitflips less than or equal to
>>    bitflips_threshold, or -ERROR_CODE for bitflips in excess of the
>>    threshold."
>>
>
> Sure, I'll change this comment.
>
>>> + */
>>> +int nand_check_erased_buf(void *buf, int len, int bitflips_threshold)
>>
>> I don't have much problem with this function as a helper, but does it
>> really need to be exported? This is exactly the form I *don't* want to
>> encourage -- checking a data buffer while ignoring the OOB / ECC. Or did
>> you have a good reason to export this one? I see you don't use it
>> outside of this file.
>
> I currently have no reason to export this function, but if we want to
> reuse part of the logic for non standard drivers we might need to
> export it afterwards.
> The example I have in mind is the GPMI driver, which also need to count
> bitflips, with the exception that ECC buffers are not necessarily
> aligned on a byte. So exposing this function would allow reusing a
> common logic for aligned buffers, and then dealing with unaligned parts
> in driver specific code.
>
>>
>> And especially if we don't end up exporting this one, I'd really like
>> some extra explanatory text next to the public interface
>> (nand_check_erased_ecc_chunk()) to explain some of the pitfalls.
>> Particularly, why we must check both the data and spare area, and how we
>> highly recommend (if not require) that new drivers use the existing
>> methods rather than roll their own.
>
> This is something I can add even if we keep exporting the previous
> function ;-).
>
>>
>>> +{
>>> +	const unsigned char *bitmap = buf;
>>> +	int bitflips = 0;
>>> +	int weight;
>>> +	int longs;
>>> +
>>
>> Does it make any sense to have a short-circuit condition for the
>> threshold == 0 case? We could expect to see that for common 1-bit ECC or
>> similar.
>
> Hm, I'm not sure I understand your suggestion. How would you shortcut
> this case? You still need to detect one bitflip before returning, right?
> Are you suggesting to use memweight() in this case? Because if you are,
> then I don't see any benefit since I have copied the memweight()
> implementation to implement the nand_check_erased_buf() logic.
> The only thing that could be avoided are the extra
> 'bitflips > bitflips_threshold' checks, but I'm not sure it makes such a
> big difference for small/medium NAND pages (and 1-bit ECC will
> definitely be used on relatively small pages).
>
>>
>>> +	for (; len && ((unsigned long)bitmap) % sizeof(long);
>>
>> Cast pointer to unsigned long? I'd expect uintptr_t, but then it sees
>> those are the same often enough that types.h says they're identical...
>
> As I said in the header note, I blindly copied memweight()
> implementation, sightly adapting it for my need, so there might be room
> for improvement.
>
>>
>>> +	     len--, bitmap++) {
>>> +		weight = hweight8(*bitmap);
>>> +		bitflips += BITS_PER_BYTE - weight;
>>> +		if (unlikely(bitflips > bitflips_threshold))
>>> +			return -EINVAL;
>>
>> I'm not sure EINVAL is the right thing. This isn't a bad argument, it's
>> just a bad flash. EUCLEAN would fit typical usage. Same comment applies
>> elsewhere.
>
> Yep, I was considering replacing EINVAL by EIO, but EUCLEAN is better.
> I'll fix that.
>
>>
>>> +	}
>>> +
>>> +
>>> +	for (longs = len / sizeof(long); longs;
>>> +	     longs--, bitmap += sizeof(long)) {
>>> +		BUG_ON(longs >= INT_MAX / BITS_PER_LONG);
>>
>> Please don't add BUG_ON(). The language around it has gotten even
>> stronger to discourage its use. And in this instance I don't think it's
>> even helpful; because 'longs' is derived from 'len' (which is an 'int'),
>> this condition is mathematically impossible, no?
>
> Ditto: everything was blindly copied from the memweight implementation.
>
>>
>> Also (feel free to shoot me down), wouldn't the loop bounds be a little
>> more straightforward (and consistent with the rest of the function), and
>> not require the 'longs' variable, when written as the following?
>>
>> 	for (; len >= sizeof(long); len -= sizeof(long), bitmap += sizeof(long)) {
>> 		...
>> 	}
>
> Definitely better.
>
>>
>>> +		weight = hweight_long(*((unsigned long *)bitmap));
>>> +		bitflips += BITS_PER_LONG - weight;
>>> +		if (unlikely(bitflips > bitflips_threshold))
>>> +			return -EINVAL;
>>> +	}
>>> +
>>> +	len %= sizeof(long);
>>
>> If you did the above, then you shouldn't need this line.
>
> Yep.
>
>>
>>> +
>>> +	for (; len > 0; len--, bitmap++) {
>>> +		weight = hweight8(*bitmap);
>>> +		bitflips += BITS_PER_BYTE - weight;
>>> +		if (unlikely(bitflips > bitflips_threshold))
>>> +			return -EINVAL;
>>> +	}
>>> +
>>> +	return bitflips;
>>> +}
>>> +EXPORT_SYMBOL(nand_check_erased_buf);
>>> +
>>> +/**
>>> + * nand_check_erased_ecc_chunk - check if an ECC chunk contains (almost) only
>>> + *				 0xff data
>>> + * @data: data buffer to test
>>> + * @datalen: data length
>>> + * @ecc: ECC buffer
>>> + * @ecclen: ECC length
>>> + * @extraoob: extra OOB buffer
>>> + * @extraooblen: extra OOB length
>>> + * @bitflips_threshold: maximum number of bitflips
>>> + *
>>> + * Check if a data buffer and its associated ECC and OOB data contains only
>>> + * 0xff pattern, which means the underlying region has been erased and is
>>> + * ready to be programmed.
>>> + * The bitflips_threshold specify the maximum number of bitflips before
>>> + * considering the region as not erased.
>>> + *
>>> + * Returns a positive number of bitflips or -ERROR_CODE.
>>
>> Add similar language from above? Also, it's very important to note that
>> this function wipes the buffers in the nearly-erased case!
>
> Yes, I'll add that in the comment, in addition to a lot more precise
> description of how to use it and the importance of the ecc and extraoob
> fields (even if they are optionals).
>
>
> Thanks for your review.
>
> Best Regards,
>
> Boris
>
> [1]https://github.com/bbrezillon/linux-sunxi/tree/sunxi/mtd-next
> [2]http://patchwork.ozlabs.org/patch/415517/
>
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2015-09-03 13:22 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-24  9:47 [PATCH v2 0/2] mtd: nand: properly handle bitflips in erased pages Boris Brezillon
2015-08-24  9:47 ` [PATCH v2 1/2] mtd: nand: add nand_check_erased helper functions Boris Brezillon
2015-09-02 18:41   ` Brian Norris
2015-09-02 19:30     ` Boris Brezillon
2015-09-02 20:26       ` Brian Norris
2015-09-02 20:51         ` Boris Brezillon
2015-09-03 13:22       ` Andrea Scian
2015-08-24  9:47 ` [PATCH v2 2/2] mtd: nand: use nand_check_erased_ecc_chunk in default ECC read functions Boris Brezillon
2015-09-02 20:35   ` Brian Norris
2015-09-02 20:45     ` Boris Brezillon
2015-09-02 12:46 ` [PATCH v2 0/2] mtd: nand: properly handle bitflips in erased pages Boris Brezillon
2015-09-02 19:43 ` Brian Norris

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.