All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads
@ 2012-03-15 17:25 Mike Dunn
  2012-03-15 17:25 ` [PATCH 1/3] MTD: expose ecc_strength through sysfs Mike Dunn
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Mike Dunn @ 2012-03-15 17:25 UTC (permalink / raw)
  To: linux-mtd; +Cc: Ricard Wanderlof, Robert Jarzmik, Mike Dunn

Hi,

These are a repost of the last two patches in the patchset of the same name that
was posted the other day, with the addition of patch #1, which is new.  This
version incorporates Artem's comments.  The changes are minimal, but I did test
again.

This patchset addresses the problem of insufficient information being returned
by mtd_read() and mtd_read_oob() with regard to bit error corrections.
Currently -EUCLEAN is returned if one or more bit errors were corrected during
the course of the read operation.  Higher layers like UBI use this return code
as an indication that the erase block may be degrading and should be considered
as a candidate for being marked as a bad block.  The problem is that high bit
error rates are common on more recent NAND flash devices, which these devices
compensate for by using strong ecc algorithms.  Frequent (but entirely normal)
bit error corrections on these devices result in blocks being incorrectly marked
as bad.  On some devices, ubi/ubifs can not be reliably used because of this.

This problem was discussed a while back [1][2][3], and a consensus of sorts was
reached for a solution, which these patches implement.  The recent addition of
the mtd api entry functions now make this solution practical (thanks Artem!).
These patches are on top of those which added ecc_strength to mtd_info and had
the drivers initialize it.  (Those patches have already been pushed.)  A quick
description of each patch will provide a synopsis of the patchset:

(1) The struct mtd_info element 'ecc_strength' is exposed through sysfs as a
    read-only variable.

(2) The element 'bitflip_threshold' is added to struct mtd_info.  If the driver
    leaves this uninitialized, mtd sets it to ecc_strength when the device or
    partition is registered.  This element is also exposed for reading and
    writing from userspace through sysfs.

(3) The drivers' read methods, absent an error, return a non-negative integer
    indicating the maximum number of bit errors that were corrected in any one
    writesize region.  MTD returns -EUCLEAN if this is >= bitflip_threshold, 0
    otherwise.

So basically, the meaning of -EUCLEAN is changed from "one or more bit errors
were corrected", to "a dangerously high number of bit errors were corrected on
one or more writesize regions".  By default, "dangerously high" is interpreted
as ecc_strength.  Drivers can specify a different value, and the user can
override it if more or less caution regarding data integrity is desired.

Of course reviews greatly appreciated.  Thanks!

[1] http://lists.infradead.org/pipermail/linux-mtd/2011-December/038755.html
[2] http://lists.infradead.org/pipermail/linux-mtd/2011-December/038688.html
[3] http://lists.infradead.org/pipermail/linux-mtd/2011-December/038828.html


Mike Dunn (3):
  MTD: expose ecc_strength through sysfs
  MTD: bitflip_threshold added to mtd_info and sysfs
  MTD: drivers return max_bitflips, mtd returns -EUCLEAN

 Documentation/ABI/testing/sysfs-class-mtd |   42 +++++++++++++++
 drivers/mtd/devices/docg3.c               |    4 +-
 drivers/mtd/mtdcore.c                     |   78 ++++++++++++++++++++++++++++-
 drivers/mtd/mtdpart.c                     |   25 +++++-----
 drivers/mtd/nand/alauda.c                 |    4 +-
 drivers/mtd/nand/nand_base.c              |   14 +++++-
 drivers/mtd/onenand/onenand_base.c        |    6 ++-
 include/linux/mtd/mtd.h                   |   19 ++++---
 8 files changed, 163 insertions(+), 29 deletions(-)

-- 
1.7.3.4

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/3] MTD: expose ecc_strength through sysfs
  2012-03-15 17:25 [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads Mike Dunn
@ 2012-03-15 17:25 ` Mike Dunn
  2012-03-15 17:25 ` [PATCH 2/3] MTD: bitflip_threshold added to mtd_info and sysfs Mike Dunn
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 25+ messages in thread
From: Mike Dunn @ 2012-03-15 17:25 UTC (permalink / raw)
  To: linux-mtd; +Cc: Ricard Wanderlof, Mike Dunn

This patch exposes the struct mtd_info element ecc_strength as a read-only
variable through sysfs.

Signed-off-by: Mike Dunn <mikedunn@newsguy.com>
---
 Documentation/ABI/testing/sysfs-class-mtd |   11 +++++++++++
 drivers/mtd/mtdcore.c                     |   10 ++++++++++
 2 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-class-mtd b/Documentation/ABI/testing/sysfs-class-mtd
index 4d55a18..6a218e6 100644
--- a/Documentation/ABI/testing/sysfs-class-mtd
+++ b/Documentation/ABI/testing/sysfs-class-mtd
@@ -123,3 +123,14 @@ Description:
 		half page, or a quarter page).
 
 		In the case of ECC NOR, it is the ECC block size.
+
+What:		/sys/class/mtd/mtdX/ecc_strength
+Date:		March 2012
+KernelVersion:	3.3.1
+Contact:	linux-mtd@lists.infradead.org
+Description:
+		Maximum number of bit errors that the device is capable of
+		correcting when reading a region of size writesize.  This will
+		always be a non-negative integer.
+
+		In the case of devices lacking any ECC capability, it is 0.
diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
index b274fdf..1573644 100644
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -250,6 +250,15 @@ static ssize_t mtd_name_show(struct device *dev,
 }
 static DEVICE_ATTR(name, S_IRUGO, mtd_name_show, NULL);
 
+static ssize_t mtd_ecc_strength_show(struct device *dev,
+				     struct device_attribute *attr, char *buf)
+{
+	struct mtd_info *mtd = dev_get_drvdata(dev);
+
+	return snprintf(buf, PAGE_SIZE, "%u\n", mtd->ecc_strength);
+}
+static DEVICE_ATTR(ecc_strength, S_IRUGO, mtd_ecc_strength_show, NULL);
+
 static struct attribute *mtd_attrs[] = {
 	&dev_attr_type.attr,
 	&dev_attr_flags.attr,
@@ -260,6 +269,7 @@ static struct attribute *mtd_attrs[] = {
 	&dev_attr_oobsize.attr,
 	&dev_attr_numeraseregions.attr,
 	&dev_attr_name.attr,
+	&dev_attr_ecc_strength.attr,
 	NULL,
 };
 
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/3] MTD: bitflip_threshold added to mtd_info and sysfs
  2012-03-15 17:25 [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads Mike Dunn
  2012-03-15 17:25 ` [PATCH 1/3] MTD: expose ecc_strength through sysfs Mike Dunn
@ 2012-03-15 17:25 ` Mike Dunn
  2012-03-16 16:31   ` Ivan Djelic
  2012-03-15 17:25 ` [PATCH 3/3] MTD: drivers return max_bitflips, mtd returns -EUCLEAN Mike Dunn
  2012-03-16 11:19 ` [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads Ivan Djelic
  3 siblings, 1 reply; 25+ messages in thread
From: Mike Dunn @ 2012-03-15 17:25 UTC (permalink / raw)
  To: linux-mtd; +Cc: Ricard Wanderlof, Mike Dunn

The element 'bitflip_threshold' is added to struct mtd_info.  If the driver
leaves this uninitialized, mtd sets it to ecc_strength when the device or
partition is registered.  This element is also exposed for reading and writing
from userspace through sysfs.

Signed-off-by: Mike Dunn <mikedunn@newsguy.com>
---
 Documentation/ABI/testing/sysfs-class-mtd |   31 +++++++++++++++++++++++++++
 drivers/mtd/mtdcore.c                     |   33 +++++++++++++++++++++++++++++
 include/linux/mtd/mtd.h                   |    9 ++++++++
 3 files changed, 73 insertions(+), 0 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-class-mtd b/Documentation/ABI/testing/sysfs-class-mtd
index 6a218e6..e2f24c8 100644
--- a/Documentation/ABI/testing/sysfs-class-mtd
+++ b/Documentation/ABI/testing/sysfs-class-mtd
@@ -134,3 +134,34 @@ Description:
 		always be a non-negative integer.
 
 		In the case of devices lacking any ECC capability, it is 0.
+
+What:		/sys/class/mtd/mtdX/bitflip_threshold
+Date:		March 2012
+KernelVersion:	3.3.1
+Contact:	linux-mtd@lists.infradead.org
+Description:
+		This allows the user to examine and adjust the criteria by which
+		mtd returns -EUCLEAN from mtd_read() and mtd_read_oob().  If the
+		maximum number of bit errors that were corrected on any single
+		writesize region (as reported by the driver) equals or exceeds
+		this value, -EUCLEAN is returned.  Otherwise, absent an error, 0
+		is returned.  Higher layers (e.g., UBI) use this return code as
+		an indication that an erase block may be degrading and should be
+		scrutinized as a candidate for being marked as bad.
+
+		The initial value may be specified by the flash device driver.
+		If not, then the default value is ecc_strength.  Users who wish
+		to be more paranoid about data integrity can lower the value.
+		If the value exceeds ecc_strength, -EUCLEAN is never returned by
+		the read functions.
+
+		Note that the introduction of this feature brings a subtle
+		change to the meaning of the -EUCLEAN return code.  Previously,
+		it was interpreted to mean simply "one or more bit errors were
+		corrected".  Its new interpretation can be phrased as "a
+		dangerously high number of bit errors were corrected on one or
+		more writesize regions".  The precise definition of "dangerously
+		high" can be adjusted by the user with bitflip_threshold.
+
+		This is generally applicable only to NAND flash devices with ECC
+		capability.
diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
index 1573644..7e20335 100644
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -259,6 +259,34 @@ static ssize_t mtd_ecc_strength_show(struct device *dev,
 }
 static DEVICE_ATTR(ecc_strength, S_IRUGO, mtd_ecc_strength_show, NULL);
 
+static ssize_t mtd_bitflip_threshold_show(struct device *dev,
+					  struct device_attribute *attr,
+					  char *buf)
+{
+	struct mtd_info *mtd = dev_get_drvdata(dev);
+
+	return snprintf(buf, PAGE_SIZE, "%u\n", mtd->bitflip_threshold);
+}
+
+static ssize_t mtd_bitflip_threshold_store(struct device *dev,
+					   struct device_attribute *attr,
+					   const char *buf, size_t count)
+{
+	struct mtd_info *mtd = dev_get_drvdata(dev);
+	unsigned int bitflip_threshold;
+	int retval;
+
+	retval = kstrtouint(buf, 0, &bitflip_threshold);
+	if (retval)
+		return retval;
+
+	mtd->bitflip_threshold = bitflip_threshold;
+	return count;
+}
+static DEVICE_ATTR(bitflip_threshold, S_IRUGO | S_IWUSR,
+		   mtd_bitflip_threshold_show,
+		   mtd_bitflip_threshold_store);
+
 static struct attribute *mtd_attrs[] = {
 	&dev_attr_type.attr,
 	&dev_attr_flags.attr,
@@ -270,6 +298,7 @@ static struct attribute *mtd_attrs[] = {
 	&dev_attr_numeraseregions.attr,
 	&dev_attr_name.attr,
 	&dev_attr_ecc_strength.attr,
+	&dev_attr_bitflip_threshold.attr,
 	NULL,
 };
 
@@ -332,6 +361,10 @@ int add_mtd_device(struct mtd_info *mtd)
 	mtd->index = i;
 	mtd->usecount = 0;
 
+	/* default value if not set by driver */
+	if (mtd->bitflip_threshold == 0)
+		mtd->bitflip_threshold = mtd->ecc_strength;
+
 	if (is_power_of_2(mtd->erasesize))
 		mtd->erasesize_shift = ffs(mtd->erasesize) - 1;
 	else
diff --git a/include/linux/mtd/mtd.h b/include/linux/mtd/mtd.h
index cf5ea8c..1c3706e 100644
--- a/include/linux/mtd/mtd.h
+++ b/include/linux/mtd/mtd.h
@@ -157,6 +157,15 @@ struct mtd_info {
 	unsigned int erasesize_mask;
 	unsigned int writesize_mask;
 
+	/*
+	 * read ops return -EUCLEAN if max number of bitflips corrected on any
+	 * one writesize region equals or exceeds this value.  Settable by
+	 * driver, else defaults to ecc_strength.  User can override in sysfs.
+	 * N.B. The meaning of the -EUCLEAN return code has changed; see
+	 * Documentation/ABI/testing/sysfs-class-mtd for more detail.
+	 */
+	unsigned int bitflip_threshold;
+
 	// Kernel-only stuff starts here.
 	const char *name;
 	int index;
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/3] MTD: drivers return max_bitflips, mtd returns -EUCLEAN
  2012-03-15 17:25 [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads Mike Dunn
  2012-03-15 17:25 ` [PATCH 1/3] MTD: expose ecc_strength through sysfs Mike Dunn
  2012-03-15 17:25 ` [PATCH 2/3] MTD: bitflip_threshold added to mtd_info and sysfs Mike Dunn
@ 2012-03-15 17:25 ` Mike Dunn
  2012-03-16 11:19 ` [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads Ivan Djelic
  3 siblings, 0 replies; 25+ messages in thread
From: Mike Dunn @ 2012-03-15 17:25 UTC (permalink / raw)
  To: linux-mtd; +Cc: Ricard Wanderlof, Robert Jarzmik, Mike Dunn

The drivers' read methods, absent an error, return a non-negative integer
indicating the maximum number of bit errors that were corrected in any one
writesize region.  MTD returns -EUCLEAN if this is >= bitflip_threshold, 0
otherwise.  Note that this is a subtle change to the driver interface.

This and the preceeding patches in this set were tested with ubi on top of the
nandsim and docg4 devices, running the ubi test io_basic from mtd-utils.

Signed-off-by: Mike Dunn <mikedunn@newsguy.com>
---
 drivers/mtd/devices/docg3.c        |    4 +++-
 drivers/mtd/mtdcore.c              |   35 ++++++++++++++++++++++++++++++++++-
 drivers/mtd/mtdpart.c              |   25 +++++++++++++------------
 drivers/mtd/nand/alauda.c          |    4 ++--
 drivers/mtd/nand/nand_base.c       |   14 ++++++++++++--
 drivers/mtd/onenand/onenand_base.c |    6 ++++--
 include/linux/mtd/mtd.h            |   10 +---------
 7 files changed, 69 insertions(+), 29 deletions(-)

diff --git a/drivers/mtd/devices/docg3.c b/drivers/mtd/devices/docg3.c
index 25f688c..1d677a61 100644
--- a/drivers/mtd/devices/docg3.c
+++ b/drivers/mtd/devices/docg3.c
@@ -854,6 +854,7 @@ static int doc_read_oob(struct mtd_info *mtd, loff_t from,
 	u8 *buf = ops->datbuf;
 	size_t len, ooblen, nbdata, nboob;
 	u8 hwecc[DOC_ECC_BCH_SIZE], eccconf1;
+	int max_bitflips = 0;
 
 	if (buf)
 		len = ops->len;
@@ -941,7 +942,8 @@ static int doc_read_oob(struct mtd_info *mtd, loff_t from,
 			}
 			if (ret > 0) {
 				mtd->ecc_stats.corrected += ret;
-				ret = -EUCLEAN;
+				max_bitflips = max(max_bitflips, ret);
+				ret = max_bitflips;
 			}
 		}
 
diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
index 7e20335..4bb8f3a 100644
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -799,12 +799,24 @@ EXPORT_SYMBOL_GPL(mtd_get_unmapped_area);
 int mtd_read(struct mtd_info *mtd, loff_t from, size_t len, size_t *retlen,
 	     u_char *buf)
 {
+	int ret_code;
 	*retlen = 0;
 	if (from < 0 || from > mtd->size || len > mtd->size - from)
 		return -EINVAL;
 	if (!len)
 		return 0;
-	return mtd->_read(mtd, from, len, retlen, buf);
+
+	/*
+	 * In the absence of an error, drivers return a non-negative integer
+	 * representing the maximum number of bitflips that were corrected on
+	 * any one writesize region (typically a NAND page).
+	 */
+	ret_code = mtd->_read(mtd, from, len, retlen, buf);
+
+	if (unlikely(ret_code < 0))
+		return ret_code;
+
+	return ret_code >= mtd->bitflip_threshold ? -EUCLEAN : 0;
 }
 EXPORT_SYMBOL_GPL(mtd_read);
 
@@ -845,6 +857,27 @@ int mtd_panic_write(struct mtd_info *mtd, loff_t to, size_t len, size_t *retlen,
 }
 EXPORT_SYMBOL_GPL(mtd_panic_write);
 
+int mtd_read_oob(struct mtd_info *mtd, loff_t from, struct mtd_oob_ops *ops)
+{
+	int ret_code;
+	ops->retlen = ops->oobretlen = 0;
+	if (!mtd->_read_oob)
+		return -EOPNOTSUPP;
+
+	/*
+	 * In the absence of an error, drivers return a non-negative integer
+	 * representing the maximum number of bitflips that were corrected on
+	 * any one writesize region (typically a NAND page).
+	 */
+	ret_code = mtd->_read_oob(mtd, from, ops);
+
+	if (unlikely(ret_code < 0))
+		return ret_code;
+
+	return ret_code >= mtd->bitflip_threshold ? -EUCLEAN : 0;
+}
+EXPORT_SYMBOL_GPL(mtd_read_oob);
+
 /*
  * Method to access the protection register area, present in some flash
  * devices. The user data is one time programmable but the factory data is read
diff --git a/drivers/mtd/mtdpart.c b/drivers/mtd/mtdpart.c
index 9651c06..24022ce 100644
--- a/drivers/mtd/mtdpart.c
+++ b/drivers/mtd/mtdpart.c
@@ -67,12 +67,12 @@ static int part_read(struct mtd_info *mtd, loff_t from, size_t len,
 	stats = part->master->ecc_stats;
 	res = part->master->_read(part->master, from + part->offset, len,
 				  retlen, buf);
-	if (unlikely(res)) {
-		if (mtd_is_bitflip(res))
-			mtd->ecc_stats.corrected += part->master->ecc_stats.corrected - stats.corrected;
-		if (mtd_is_eccerr(res))
-			mtd->ecc_stats.failed += part->master->ecc_stats.failed - stats.failed;
-	}
+	if (unlikely(mtd_is_eccerr(res)))
+		mtd->ecc_stats.failed +=
+			part->master->ecc_stats.failed - stats.failed;
+	else
+		mtd->ecc_stats.corrected +=
+			part->master->ecc_stats.corrected - stats.corrected;
 	return res;
 }
 
@@ -108,6 +108,7 @@ static int part_read_oob(struct mtd_info *mtd, loff_t from,
 		struct mtd_oob_ops *ops)
 {
 	struct mtd_part *part = PART(mtd);
+	struct mtd_ecc_stats stats = part->master->ecc_stats;
 	int res;
 
 	if (from >= mtd->size)
@@ -133,12 +134,12 @@ static int part_read_oob(struct mtd_info *mtd, loff_t from,
 	}
 
 	res = part->master->_read_oob(part->master, from + part->offset, ops);
-	if (unlikely(res)) {
-		if (mtd_is_bitflip(res))
-			mtd->ecc_stats.corrected++;
-		if (mtd_is_eccerr(res))
-			mtd->ecc_stats.failed++;
-	}
+	if (unlikely(mtd_is_eccerr(res)))
+		mtd->ecc_stats.failed +=
+			part->master->ecc_stats.failed - stats.failed;
+	else
+		mtd->ecc_stats.corrected +=
+			part->master->ecc_stats.corrected - stats.corrected;
 	return res;
 }
 
diff --git a/drivers/mtd/nand/alauda.c b/drivers/mtd/nand/alauda.c
index 4f20e1d..7e59e83 100644
--- a/drivers/mtd/nand/alauda.c
+++ b/drivers/mtd/nand/alauda.c
@@ -414,7 +414,7 @@ static int alauda_bounce_read(struct mtd_info *mtd, loff_t from, size_t len,
 	}
 	err = 0;
 	if (corrected)
-		err = -EUCLEAN;
+		err = 1;	/* return max bitflips per page */
 	if (uncorrected)
 		err = -EBADMSG;
 out:
@@ -446,7 +446,7 @@ static int alauda_read(struct mtd_info *mtd, loff_t from, size_t len,
 	}
 	err = 0;
 	if (corrected)
-		err = -EUCLEAN;
+		err = 1;	/* return max bitflips per page */
 	if (uncorrected)
 		err = -EBADMSG;
 	return err;
diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index 8008853..e2bf003 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -1469,6 +1469,7 @@ static int nand_do_read_ops(struct mtd_info *mtd, loff_t from,
 	uint32_t oobreadlen = ops->ooblen;
 	uint32_t max_oobsize = ops->mode == MTD_OPS_AUTO_OOB ?
 		mtd->oobavail : mtd->oobsize;
+	unsigned int max_bitflips = 0;
 
 	uint8_t *bufpoi, *oob, *buf;
 
@@ -1491,6 +1492,8 @@ static int nand_do_read_ops(struct mtd_info *mtd, loff_t from,
 
 		/* Is the current page in the buffer? */
 		if (realpage != chip->pagebuf || oob) {
+			unsigned int prev_corrected = mtd->ecc_stats.corrected;
+
 			bufpoi = aligned ? buf : chip->buffers->databuf;
 
 			if (likely(sndcmd)) {
@@ -1553,6 +1556,9 @@ static int nand_do_read_ops(struct mtd_info *mtd, loff_t from,
 				else
 					nand_wait_ready(mtd);
 			}
+			max_bitflips = max(max_bitflips,
+					   mtd->ecc_stats.corrected -
+					   prev_corrected);
 		} else {
 			memcpy(buf, chip->buffers->databuf + col, bytes);
 			buf += bytes;
@@ -1594,7 +1600,7 @@ static int nand_do_read_ops(struct mtd_info *mtd, loff_t from,
 	if (mtd->ecc_stats.failed - stats.failed)
 		return -EBADMSG;
 
-	return  mtd->ecc_stats.corrected - stats.corrected ? -EUCLEAN : 0;
+	return max_bitflips;
 }
 
 /**
@@ -1782,6 +1788,7 @@ static int nand_do_read_oob(struct mtd_info *mtd, loff_t from,
 	int readlen = ops->ooblen;
 	int len;
 	uint8_t *buf = ops->oobbuf;
+	unsigned int max_bitflips = 0;
 
 	pr_debug("%s: from = 0x%08Lx, len = %i\n",
 			__func__, (unsigned long long)from, readlen);
@@ -1816,6 +1823,7 @@ static int nand_do_read_oob(struct mtd_info *mtd, loff_t from,
 	page = realpage & chip->pagemask;
 
 	while (1) {
+		unsigned int prev_corrected = mtd->ecc_stats.corrected;
 		if (ops->mode == MTD_OPS_RAW)
 			sndcmd = chip->ecc.read_oob_raw(mtd, chip, page, sndcmd);
 		else
@@ -1836,6 +1844,8 @@ static int nand_do_read_oob(struct mtd_info *mtd, loff_t from,
 			else
 				nand_wait_ready(mtd);
 		}
+		max_bitflips = max(max_bitflips,
+				   mtd->ecc_stats.corrected - prev_corrected);
 
 		readlen -= len;
 		if (!readlen)
@@ -1865,7 +1875,7 @@ static int nand_do_read_oob(struct mtd_info *mtd, loff_t from,
 	if (mtd->ecc_stats.failed - stats.failed)
 		return -EBADMSG;
 
-	return  mtd->ecc_stats.corrected - stats.corrected ? -EUCLEAN : 0;
+	return  max_bitflips;
 }
 
 /**
diff --git a/drivers/mtd/onenand/onenand_base.c b/drivers/mtd/onenand/onenand_base.c
index 3d781b8..177b0a5 100644
--- a/drivers/mtd/onenand/onenand_base.c
+++ b/drivers/mtd/onenand/onenand_base.c
@@ -1201,7 +1201,8 @@ static int onenand_mlc_read_ops_nolock(struct mtd_info *mtd, loff_t from,
 	if (mtd->ecc_stats.failed - stats.failed)
 		return -EBADMSG;
 
-	return mtd->ecc_stats.corrected - stats.corrected ? -EUCLEAN : 0;
+	/* return max bitflips per page; ONENANDs correct 1 bit only */
+	return mtd->ecc_stats.corrected != stats.corrected ? 1 : 0;
 }
 
 /**
@@ -1333,7 +1334,8 @@ static int onenand_read_ops_nolock(struct mtd_info *mtd, loff_t from,
 	if (mtd->ecc_stats.failed - stats.failed)
 		return -EBADMSG;
 
-	return mtd->ecc_stats.corrected - stats.corrected ? -EUCLEAN : 0;
+	/* return max bitflips per page; ONENANDs correct 1 bit only */
+	return mtd->ecc_stats.corrected != stats.corrected ? 1 : 0;
 }
 
 /**
diff --git a/include/linux/mtd/mtd.h b/include/linux/mtd/mtd.h
index 1c3706e..9e44e36 100644
--- a/include/linux/mtd/mtd.h
+++ b/include/linux/mtd/mtd.h
@@ -264,15 +264,7 @@ int mtd_write(struct mtd_info *mtd, loff_t to, size_t len, size_t *retlen,
 	      const u_char *buf);
 int mtd_panic_write(struct mtd_info *mtd, loff_t to, size_t len, size_t *retlen,
 		    const u_char *buf);
-
-static inline int mtd_read_oob(struct mtd_info *mtd, loff_t from,
-			       struct mtd_oob_ops *ops)
-{
-	ops->retlen = ops->oobretlen = 0;
-	if (!mtd->_read_oob)
-		return -EOPNOTSUPP;
-	return mtd->_read_oob(mtd, from, ops);
-}
+int mtd_read_oob(struct mtd_info *mtd, loff_t from, struct mtd_oob_ops *ops);
 
 static inline int mtd_write_oob(struct mtd_info *mtd, loff_t to,
 				struct mtd_oob_ops *ops)
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads
  2012-03-15 17:25 [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads Mike Dunn
                   ` (2 preceding siblings ...)
  2012-03-15 17:25 ` [PATCH 3/3] MTD: drivers return max_bitflips, mtd returns -EUCLEAN Mike Dunn
@ 2012-03-16 11:19 ` Ivan Djelic
  2012-03-16 12:49   ` Artem Bityutskiy
  2012-03-16 16:25   ` Mike Dunn
  3 siblings, 2 replies; 25+ messages in thread
From: Ivan Djelic @ 2012-03-16 11:19 UTC (permalink / raw)
  To: Mike Dunn; +Cc: Ricard Wanderlof, Robert Jarzmik, linux-mtd

On Thu, Mar 15, 2012 at 05:25:50PM +0000, Mike Dunn wrote:
> 
> This patchset addresses the problem of insufficient information being returned
> by mtd_read() and mtd_read_oob() with regard to bit error corrections.
> Currently -EUCLEAN is returned if one or more bit errors were corrected during
> the course of the read operation.  Higher layers like UBI use this return code
> as an indication that the erase block may be degrading and should be considered
> as a candidate for being marked as a bad block.  The problem is that high bit
> error rates are common on more recent NAND flash devices, which these devices
> compensate for by using strong ecc algorithms.  Frequent (but entirely normal)
> bit error corrections on these devices result in blocks being incorrectly marked
> as bad.  On some devices, ubi/ubifs can not be reliably used because of this.
> 
> This problem was discussed a while back [1][2][3], and a consensus of sorts was
> reached for a solution, which these patches implement.  The recent addition of
> the mtd api entry functions now make this solution practical (thanks Artem!).
> These patches are on top of those which added ecc_strength to mtd_info and had
> the drivers initialize it.  (Those patches have already been pushed.)  A quick
> description of each patch will provide a synopsis of the patchset:
> 
> (1) The struct mtd_info element 'ecc_strength' is exposed through sysfs as a
>     read-only variable.
> 
> (2) The element 'bitflip_threshold' is added to struct mtd_info.  If the driver
>     leaves this uninitialized, mtd sets it to ecc_strength when the device or
>     partition is registered.  This element is also exposed for reading and
>     writing from userspace through sysfs.
> 
> (3) The drivers' read methods, absent an error, return a non-negative integer
>     indicating the maximum number of bit errors that were corrected in any one
>     writesize region.  MTD returns -EUCLEAN if this is >= bitflip_threshold, 0
>     otherwise.
> 
> So basically, the meaning of -EUCLEAN is changed from "one or more bit errors
> were corrected", to "a dangerously high number of bit errors were corrected on
> one or more writesize regions".  By default, "dangerously high" is interpreted
> as ecc_strength.  Drivers can specify a different value, and the user can
> override it if more or less caution regarding data integrity is desired.

Hello Mike,

Thanks a lot for working on this long needed feature!

I haven't had the chance to work on mtd stuff for a very long time, but I'll try
at least to review your work.
>From your quick description, I fear there may be a design problem: with
'ecc_strength' and 'bitflip_threshold' defined on a per writesize region basis, I
think you cannot provide upper layers with enough information regarding the
block cleaning decision.

Consider the following situation:
- a NAND device with 2kB pages and 4 ecc steps per page (4 x 512 bytes)
- the driver has chip->ecc.strength = 4, and therefore mtd->ecc_strength = 16
- let's say mtd->bitflip_threshold = 16

The driver read() method could return a non-negative integer, say 4, in at least
the following cases:

1. During a single page read, each of the 4 ecc steps corrected 1 bit, with a
total variation of ecc_stats.corrected equal to 4.
=> no cleaning needed

2. During a single page read, 1 ecc step corrected 4 bits, the 3 other steps had
no correction to perform, with a total variation of ecc_stats.corrected equal
to 4.
=> cleaning is needed

In both cases, you will compare the same value 4 to mtd->bitflip_threshold (16)
and decide to return 0 (and not -EUCLEAN).

So my point is that the cleaning decision happens at the ecc step level,
not at the page reading level.

I think this could be fixed by dropping 'ecc_strength' and changing the semantics
of 'bitflip_threshold' in the following way (rephrasing your explanation):

  (3) The drivers' read methods, absent an error, return a non-negative integer
      indicating the maximum number of bit errors that were corrected in any one
      ecc step.  MTD returns -EUCLEAN if this is >= bitflip_threshold, 0
      otherwise.

  So basically, the meaning of -EUCLEAN is changed from "one or more bit errors
  were corrected", to "a dangerously high number of bit errors were corrected on
  one or more ecc step block".  By default, "dangerously high" is interpreted
  as chip->ecc.strength.  Drivers can specify a different value, and the user can
  override it if more or less caution regarding data integrity is desired.

But still, there is a problem: how do we implement (3), i.e. how do we know
"the maximum number of bit errors that were corrected in any one ecc step" ?

Just looking at ecc_stats.corrected is not enough, as it accumulates over each
ecc step result, and does not allow us to distinguish cases 1 and 2 (from my
previous example). Maybe we could have per-step ecc stats ? or have the driver
return directly the information ?

BR,

--
Ivan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads
  2012-03-16 11:19 ` [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads Ivan Djelic
@ 2012-03-16 12:49   ` Artem Bityutskiy
  2012-03-16 16:30     ` Mike Dunn
  2012-03-16 16:25   ` Mike Dunn
  1 sibling, 1 reply; 25+ messages in thread
From: Artem Bityutskiy @ 2012-03-16 12:49 UTC (permalink / raw)
  To: Ivan Djelic; +Cc: Ricard Wanderlof, Robert Jarzmik, Mike Dunn, linux-mtd

[-- Attachment #1: Type: text/plain, Size: 1792 bytes --]

> Consider the following situation:
> - a NAND device with 2kB pages and 4 ecc steps per page (4 x 512 bytes)
> - the driver has chip->ecc.strength = 4, and therefore mtd->ecc_strength = 16

I would expect in this case we should have mtd->ecc_strength = 4 anyway.

> So my point is that the cleaning decision happens at the ecc step level,
> not at the page reading level.

I agree, good point.

> I think this could be fixed by dropping 'ecc_strength' and changing the semantics
> of 'bitflip_threshold' in the following way (rephrasing your explanation):
> 
>   (3) The drivers' read methods, absent an error, return a non-negative integer
>       indicating the maximum number of bit errors that were corrected in any one
>       ecc step.  MTD returns -EUCLEAN if this is >= bitflip_threshold, 0
>       otherwise.
> 
>   So basically, the meaning of -EUCLEAN is changed from "one or more bit errors
>   were corrected", to "a dangerously high number of bit errors were corrected on
>   one or more ecc step block".  By default, "dangerously high" is interpreted
>   as chip->ecc.strength.  Drivers can specify a different value, and the user can
>   override it if more or less caution regarding data integrity is desired.
> 
> But still, there is a problem: how do we implement (3), i.e. how do we know
> "the maximum number of bit errors that were corrected in any one ecc step" ?
> 
> Just looking at ecc_stats.corrected is not enough, as it accumulates over each
> ecc step result, and does not allow us to distinguish cases 1 and 2 (from my
> previous example). Maybe we could have per-step ecc stats ? or have the driver
> return directly the information ?

Yeah, sounds like we should have per-step stats.

-- 
Best Regards,
Artem Bityutskiy

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads
  2012-03-16 11:19 ` [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads Ivan Djelic
  2012-03-16 12:49   ` Artem Bityutskiy
@ 2012-03-16 16:25   ` Mike Dunn
  2012-03-16 18:43     ` Ivan Djelic
  2012-03-16 21:54     ` Shmulik Ladkani
  1 sibling, 2 replies; 25+ messages in thread
From: Mike Dunn @ 2012-03-16 16:25 UTC (permalink / raw)
  To: Ivan Djelic; +Cc: Ricard Wanderlof, Robert Jarzmik, linux-mtd

Hi Ivan.  Thanks for the review!

On 03/16/2012 04:19 AM, Ivan Djelic wrote:
> 
> Consider the following situation:
> - a NAND device with 2kB pages and 4 ecc steps per page (4 x 512 bytes)
> - the driver has chip->ecc.strength = 4, and therefore mtd->ecc_strength = 16
> - let's say mtd->bitflip_threshold = 16
> 
> The driver read() method could return a non-negative integer, say 4, in at least
> the following cases:
> 
> 1. During a single page read, each of the 4 ecc steps corrected 1 bit, with a
> total variation of ecc_stats.corrected equal to 4.
> => no cleaning needed
> 
> 2. During a single page read, 1 ecc step corrected 4 bits, the 3 other steps had
> no correction to perform, with a total variation of ecc_stats.corrected equal
> to 4.
> => cleaning is needed


Maybe my (admittedly limited) understanding of the physical nature of NAND flash
is flawed.  I assumed that a writesize region (i.e., a NAND page for our
purposes) is the most elemental unit wrt physical wear, regardless of whether or
not ecc is caclulated once for the whole page or incrementally in steps.


> 
> In both cases, you will compare the same value 4 to mtd->bitflip_threshold (16)
> and decide to return 0 (and not -EUCLEAN).
> 
> So my point is that the cleaning decision happens at the ecc step level,
> not at the page reading level.


But you're sayimg my assumption is incorrect.  So each ecc-sized area within a
page is physically distinct and must be considered in isolation?  Could you
maybe elaborate on this?


> 
> I think this could be fixed by dropping 'ecc_strength' and changing the semantics
> of 'bitflip_threshold' in the following way (rephrasing your explanation):
> 
>   (3) The drivers' read methods, absent an error, return a non-negative integer
>       indicating the maximum number of bit errors that were corrected in any one
>       ecc step.  MTD returns -EUCLEAN if this is >= bitflip_threshold, 0
>       otherwise.
> 
>   So basically, the meaning of -EUCLEAN is changed from "one or more bit errors
>   were corrected", to "a dangerously high number of bit errors were corrected on
>   one or more ecc step block".  By default, "dangerously high" is interpreted
>   as chip->ecc.strength.  Drivers can specify a different value, and the user can
>   override it if more or less caution regarding data integrity is desired.
> 
> But still, there is a problem: how do we implement (3), i.e. how do we know
> "the maximum number of bit errors that were corrected in any one ecc step" ?
> 
> Just looking at ecc_stats.corrected is not enough, as it accumulates over each
> ecc step result, and does not allow us to distinguish cases 1 and 2 (from my
> previous example). Maybe we could have per-step ecc stats ? or have the driver
> return directly the information ?


Yes, this will require more work, touching many drivers :(  The per-page stats
allowed me to limit most of the changes to nand_base.c.

If you are correct about the need to consider each ecc-sized region separately,
then these patches are actually a regression, since a "dangerously high" number
of bitflips will be considered OK, as your example illustrates.

Thanks again.

Mike

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads
  2012-03-16 12:49   ` Artem Bityutskiy
@ 2012-03-16 16:30     ` Mike Dunn
  0 siblings, 0 replies; 25+ messages in thread
From: Mike Dunn @ 2012-03-16 16:30 UTC (permalink / raw)
  To: dedekind1; +Cc: Robert Jarzmik, Ivan Djelic, Ricard Wanderlof, linux-mtd

On 03/16/2012 05:49 AM, Artem Bityutskiy wrote:
>> Consider the following situation:
>> - a NAND device with 2kB pages and 4 ecc steps per page (4 x 512 bytes)
>> - the driver has chip->ecc.strength = 4, and therefore mtd->ecc_strength = 16
> 
> I would expect in this case we should have mtd->ecc_strength = 4 anyway.


But Ivan is correct; the patches would make it 16 because it is per page, not
per ecc.size.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/3] MTD: bitflip_threshold added to mtd_info and sysfs
  2012-03-15 17:25 ` [PATCH 2/3] MTD: bitflip_threshold added to mtd_info and sysfs Mike Dunn
@ 2012-03-16 16:31   ` Ivan Djelic
  0 siblings, 0 replies; 25+ messages in thread
From: Ivan Djelic @ 2012-03-16 16:31 UTC (permalink / raw)
  To: Mike Dunn; +Cc: Ricard Wanderlof, linux-mtd

On Thu, Mar 15, 2012 at 05:25:52PM +0000, Mike Dunn wrote:
> +
> +What:		/sys/class/mtd/mtdX/bitflip_threshold
> +Date:		March 2012
> +KernelVersion:	3.3.1
> +Contact:	linux-mtd@lists.infradead.org
> +Description:
> +		This allows the user to examine and adjust the criteria by which
> +		mtd returns -EUCLEAN from mtd_read() and mtd_read_oob().  If the
> +		maximum number of bit errors that were corrected on any single
> +		writesize region (as reported by the driver) equals or exceeds
> +		this value, -EUCLEAN is returned.  Otherwise, absent an error, 0
> +		is returned.  Higher layers (e.g., UBI) use this return code as
> +		an indication that an erase block may be degrading and should be
> +		scrutinized as a candidate for being marked as bad.
> +
> +		The initial value may be specified by the flash device driver.
> +		If not, then the default value is ecc_strength.  Users who wish
> +		to be more paranoid about data integrity can lower the value.
> +		If the value exceeds ecc_strength, -EUCLEAN is never returned by
> +		the read functions.

Hmmm. I don't think it's a good idea to say "Users who wish to be more paranoid
about data integrity can lower the value"; because this is not exactly true.

Lowering the value is very dangerous, and can have devastating effects: on NAND
devices where sticky bitflips appear (we have plenty of those devices), a low
threshold (say 1) triggers block torture by UBI, then bad block retirement,
quickly reducing the number of valid blocks; the other "sane" blocks
with intermittent bitflips keep being scrubbed, thrashing the whole device.
Even worse: if enough bad blocks appear, UBI runs out of replacement blocks
and stops working.

IMHO the value of 'bitflip_threshold' should be carefully chosen:

- low enough to ensure ecc correction has a safety margin and manufacturer
  requirements are met
- high enough to avoid the effects described above

In some cases, controlling bitflip_threshold can be interesting for other
reasons; for instance, on a specific board, I have used a NAND device
requiring 4-bit ecc, but I implemented 8-bit protection through hardware BCH for
extra safety (and future 8-bit NAND upgrades).
In that particular setup, I would set bitflip_threshold to 3 or 4 instead of the
value derived from the ecc strength (8).

So in practice, setting bitflip_threshold is tricky and requires a good
knowledge of the NAND (or Doc/whatever) device your are using, and of how mtd/UBI
will use the threshold.
I suggest we warn about the dangers and discourage people from messing with this
knob unless they know what they are doing.

BR,
-- 
Ivan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads
  2012-03-16 16:25   ` Mike Dunn
@ 2012-03-16 18:43     ` Ivan Djelic
  2012-03-17 20:18       ` Mike Dunn
  2012-03-16 21:54     ` Shmulik Ladkani
  1 sibling, 1 reply; 25+ messages in thread
From: Ivan Djelic @ 2012-03-16 18:43 UTC (permalink / raw)
  To: Mike Dunn; +Cc: Ricard Wanderlof, Robert Jarzmik, linux-mtd

On Fri, Mar 16, 2012 at 04:25:08PM +0000, Mike Dunn wrote:
> 
> Maybe my (admittedly limited) understanding of the physical nature of NAND flash
> is flawed.  I assumed that a writesize region (i.e., a NAND page for our
> purposes) is the most elemental unit wrt physical wear, regardless of whether or
> not ecc is caclulated once for the whole page or incrementally in steps.
> 
> 
> > 
> > In both cases, you will compare the same value 4 to mtd->bitflip_threshold (16)
> > and decide to return 0 (and not -EUCLEAN).
> > 
> > So my point is that the cleaning decision happens at the ecc step level,
> > not at the page reading level.
> 
> 
> But you're sayimg my assumption is incorrect.  So each ecc-sized area within a
> page is physically distinct and must be considered in isolation?  Could you
> maybe elaborate on this?

When NAND manufacturers specify ECC requirements in their datasheet, they indicate:
- a block size on which ECC should be computed (possibly smaller than the page size)
- how many errors should be correctable in a single block size, i.e. strength

For instance:
- size = 512 bytes
- strength = 8 errors

If the NAND device has 2 kB (resp. 4kB) pages, you will need to perform 4 (resp. 8) ECC
computations per page.

The point of implementing the specified ECC is to ensure the integrity of data for the
specified lifetime, i.e. its longevity.

Manufacturers select an ECC size/strength setup for a device purely from statistical
computations or empirical results: there is no "special" physical ecc-sized area in each
page. It's just that the error distribution empirically observed is well covered by the
specified size/strength combination. You could perfectly protect your NAND device using
a different size/strength combination, with different longevity results.
The manufacturer size/strength recommendation takes also into account the available spare
space and ECC computational requirements.

As a matter of fact, NAND manufacturers run write/read cycle aging tests in ovens, and
measure how fast data corruption happens, assuming various protection schemes: no ECC,
1-bit ECC/512, 2-bit ECC/512, and so on. Google "NAND UBER" for more details (UBER =
Uncorrectable Bit Error Rate).

Now, the point of scrubbing a block is to avoid error accumulation to the point when
ECC is no longer able to correct all errors. This is why we must monitor each single
ecc step rather than consider the total amount of errors corrected in a page.

BR,
-- 
Ivan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads
  2012-03-16 16:25   ` Mike Dunn
  2012-03-16 18:43     ` Ivan Djelic
@ 2012-03-16 21:54     ` Shmulik Ladkani
  2012-03-16 22:57       ` Peter Barada
  2012-03-17 20:50       ` Mike Dunn
  1 sibling, 2 replies; 25+ messages in thread
From: Shmulik Ladkani @ 2012-03-16 21:54 UTC (permalink / raw)
  To: Mike Dunn; +Cc: Robert Jarzmik, Ivan Djelic, Ricard Wanderlof, linux-mtd

Hi Mike,

On Fri, 16 Mar 2012 09:25:08 -0700 Mike Dunn <mikedunn@newsguy.com> wrote:
> Maybe my (admittedly limited) understanding of the physical nature of NAND flash
> is flawed.  I assumed that a writesize region (i.e., a NAND page for our
> purposes) is the most elemental unit wrt physical wear, regardless of whether or
> not ecc is caclulated once for the whole page or incrementally in steps.

Bit-flips may occur at a per-cell basis, even on the OOB cells, as a
result of program-disturb, charge-loss, or cell ware-out causing read
sensing errors.

> But you're sayimg my assumption is incorrect.  So each ecc-sized area within a
> page is physically distinct and must be considered in isolation? 

There's no "physical" distinction, in the sense that cells are separated
in the device or alike.
Simply, the ECC algorithm is independently calculated over several
portions of the page.
But that's not a must: suppose X bits per Y bytes ECC is required; you
may use a 2X / 2Y ECC and acheive similar intergrity and endurance
statistical characteristics.

For your purposes, the question whether the cleaning decision should be
according to the ecc step level, is dependent of how you define
"a dangerously high number of bit errors".

Lets continue with Ivan's example (2KiB page, 4 eccsteps, 512 bytes
each step, strength 4bits/512bytes).
Suppose the first ECC portion has 4 bit errors, the other 3 portions
have none.
If, for example, several read operations later, a new bitflip is
intorduced within the first portion, leading to 5 bit errors.
Obviously, the ECC algorithm is now unable to correct this portion,
meaning, the buffer is corrupt - which also means the entire page data
read is corrupt. The nand infrastructure would return -EBADMSG - and you
had just 5 bit errors over the entire page.

So question is, would you consider 4 bit errors in the first ECC portion
to be "a dangerously high number of bit errors" as what's reported to
the MTD users?
If so, then yes, the cleaning decision should be according to the ecc
step level, not at the page reading level.

Regards,
Shmulik

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads
  2012-03-16 21:54     ` Shmulik Ladkani
@ 2012-03-16 22:57       ` Peter Barada
  2012-03-17 21:10         ` Mike Dunn
  2012-03-17 20:50       ` Mike Dunn
  1 sibling, 1 reply; 25+ messages in thread
From: Peter Barada @ 2012-03-16 22:57 UTC (permalink / raw)
  To: linux-mtd

On 03/16/2012 05:54 PM, Shmulik Ladkani wrote:
>
> So question is, would you consider 4 bit errors in the first ECC portion
> to be "a dangerously high number of bit errors" as what's reported to
> the MTD users?
> If so, then yes, the cleaning decision should be according to the ecc
> step level, not at the page reading level.
If you had a ECC method that could correct N bits over the entire page 
and the ECC showed N-1 bits needed correcting then it should be obvious 
that the page is in danger of becoming uncorrectable.  This should be 
the same as if there are multiple ECC steps per page and a single step 
shoes N-1 bits that need correcting.  I think the indication from MTD 
should be the worst case found in all the ECC steps...

The bigger issue is how to discern whether the degredation is due to 
read-disturb (which can be recovered by erasing/reprogramming the block) 
or the page physically wearing out (in which case it needs to be 
retired).  For first generation SLC parts with large geometries this was 
relatively straightforward where the block didn't show *any* any 
bitflips up until it got close to its wear limit.  With smaller geometry 
SLC (and definitely with MLC) things are not straightfoward.

In discussions with at least one NAND manufacturer, they indicated that 
the "proper" method is to track reads per block (somehow across power 
cycles) and when the number of reads per block (after an erasure of the 
block) hits a limit then refresh the block, *and* disregard statistical 
counting of bit flips - the read patterns across pages/blocks can affect 
the number of bitflips seen - apparently it has to do with how the 
physical geometry of the cells are laid out (due to the address lines 
that are energized that exist nearby, but no details for the part in 
question were provided).

Unfortunately there's no current method (that I know of) in MTD to keep 
a non-volatile count of reads of pages within a block between erases 
that can be used to handle the read-disturb case.  If such existed (and 
kept track of erase counts) then it should be possible to handle both 
cases.  Then a NAND manufacturer's rating of "at temperature range M, N 
year retention, you can get X UBER if limt reads to Y thousands of 
reads/block, and Z thousands of erasures" would be tractable...

-- 
Peter Barada
peter.barada@gmil.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads
  2012-03-16 18:43     ` Ivan Djelic
@ 2012-03-17 20:18       ` Mike Dunn
  2012-03-18  8:00         ` Shmulik Ladkani
                           ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Mike Dunn @ 2012-03-17 20:18 UTC (permalink / raw)
  To: Ivan Djelic; +Cc: Ricard Wanderlof, Robert Jarzmik, linux-mtd

On 03/16/2012 11:43 AM, Ivan Djelic wrote:
> On Fri, Mar 16, 2012 at 04:25:08PM +0000, Mike Dunn wrote:
>>
>> But you're sayimg my assumption is incorrect.  So each ecc-sized area within a
>> page is physically distinct and must be considered in isolation?  Could you
>> maybe elaborate on this?
> 
> When NAND manufacturers specify ECC requirements in their datasheet, they indicate:
> - a block size on which ECC should be computed (possibly smaller than the page size)
> - how many errors should be correctable in a single block size, i.e. strength
> 
> For instance:
> - size = 512 bytes
> - strength = 8 errors
> 
> If the NAND device has 2 kB (resp. 4kB) pages, you will need to perform 4 (resp. 8) ECC
> computations per page.
> 
> The point of implementing the specified ECC is to ensure the integrity of data for the
> specified lifetime, i.e. its longevity.
> 
> Manufacturers select an ECC size/strength setup for a device purely from statistical
> computations or empirical results: there is no "special" physical ecc-sized area in each
> page. It's just that the error distribution empirically observed is well covered by the
> specified size/strength combination. You could perfectly protect your NAND device using
> a different size/strength combination, with different longevity results.
> The manufacturer size/strength recommendation takes also into account the available spare
> space and ECC computational requirements.
> 
> As a matter of fact, NAND manufacturers run write/read cycle aging tests in ovens, and
> measure how fast data corruption happens, assuming various protection schemes: no ECC,
> 1-bit ECC/512, 2-bit ECC/512, and so on. Google "NAND UBER" for more details (UBER =
> Uncorrectable Bit Error Rate).
> 
> Now, the point of scrubbing a block is to avoid error accumulation to the point when
> ECC is no longer able to correct all errors. This is why we must monitor each single
> ecc step rather than consider the total amount of errors corrected in a page.


Thanks for the detailed explanation, Ivan.  I guess I'm convinced now that I
think it through...  Basically, the chain (NAND page) is only as strong as the
weakest link (ecc.size).

So if we're all convinced that thresholds must be evaluated at the ecc.size
level, there's more work <sigh>.  As Ivan pointed out earlier, the nand drivers
currently do not track bitflip corrections per ecc step, which means all the
nand drivers must be modified.  Before I start on this, I'd like to get a
blessing from Artem / David, since the chamges will touch many drivers in a
somehat non-trivial manner.

I think the best approach is to have the nand drivers return max_bitflips over
all ecc steps, same idea as what the previous patches returned to mtd, but at a
finer granularity.  Then the nand code just passes this up to mtd, instead of
examining the ecc_stats as previously.

If I do go ahead, I can first post patches that fixes the two patches already
pushed to l2-mtd, or Artem can just pull them out and I'll start from scratch.
I personally prefer the former, since it's easier for me, plus it would be
illustrative of the change of plans.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads
  2012-03-16 21:54     ` Shmulik Ladkani
  2012-03-16 22:57       ` Peter Barada
@ 2012-03-17 20:50       ` Mike Dunn
  1 sibling, 0 replies; 25+ messages in thread
From: Mike Dunn @ 2012-03-17 20:50 UTC (permalink / raw)
  To: Shmulik Ladkani; +Cc: Robert Jarzmik, Ivan Djelic, Ricard Wanderlof, linux-mtd

Many thanks Shmulik.  This helped my thought process, which was distracted by
consideration of the "physical" nature of the device ;)  Yes, the integrity of
the entire page can be threatened in any one ecc step, so the danger threshold
should be based on the ecc strength of each step.

Thanks again,
Mike


On 03/16/2012 02:54 PM, Shmulik Ladkani wrote:
> Hi Mike,
> 
> On Fri, 16 Mar 2012 09:25:08 -0700 Mike Dunn <mikedunn@newsguy.com> wrote:
>> Maybe my (admittedly limited) understanding of the physical nature of NAND flash
>> is flawed.  I assumed that a writesize region (i.e., a NAND page for our
>> purposes) is the most elemental unit wrt physical wear, regardless of whether or
>> not ecc is caclulated once for the whole page or incrementally in steps.
> 
> Bit-flips may occur at a per-cell basis, even on the OOB cells, as a
> result of program-disturb, charge-loss, or cell ware-out causing read
> sensing errors.
> 
>> But you're sayimg my assumption is incorrect.  So each ecc-sized area within a
>> page is physically distinct and must be considered in isolation? 
> 
> There's no "physical" distinction, in the sense that cells are separated
> in the device or alike.
> Simply, the ECC algorithm is independently calculated over several
> portions of the page.
> But that's not a must: suppose X bits per Y bytes ECC is required; you
> may use a 2X / 2Y ECC and acheive similar intergrity and endurance
> statistical characteristics.
> 
> For your purposes, the question whether the cleaning decision should be
> according to the ecc step level, is dependent of how you define
> "a dangerously high number of bit errors".
> 
> Lets continue with Ivan's example (2KiB page, 4 eccsteps, 512 bytes
> each step, strength 4bits/512bytes).
> Suppose the first ECC portion has 4 bit errors, the other 3 portions
> have none.
> If, for example, several read operations later, a new bitflip is
> intorduced within the first portion, leading to 5 bit errors.
> Obviously, the ECC algorithm is now unable to correct this portion,
> meaning, the buffer is corrupt - which also means the entire page data
> read is corrupt. The nand infrastructure would return -EBADMSG - and you
> had just 5 bit errors over the entire page.
> 
> So question is, would you consider 4 bit errors in the first ECC portion
> to be "a dangerously high number of bit errors" as what's reported to
> the MTD users?
> If so, then yes, the cleaning decision should be according to the ecc
> step level, not at the page reading level.
> 
> Regards,
> Shmulik
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads
  2012-03-16 22:57       ` Peter Barada
@ 2012-03-17 21:10         ` Mike Dunn
  0 siblings, 0 replies; 25+ messages in thread
From: Mike Dunn @ 2012-03-17 21:10 UTC (permalink / raw)
  To: Peter Barada; +Cc: linux-mtd

On 03/16/2012 03:57 PM, Peter Barada wrote:
> On 03/16/2012 05:54 PM, Shmulik Ladkani wrote:
> I think the indication from MTD should be the worst case found in
> all the ECC steps...


Yes, that's the consensus.  Some of us are just a litle slow :)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads
  2012-03-17 20:18       ` Mike Dunn
@ 2012-03-18  8:00         ` Shmulik Ladkani
  2012-03-19  8:50           ` Matthieu CASTET
  2012-03-30 14:21           ` Artem Bityutskiy
  2012-03-30 14:16         ` Artem Bityutskiy
  2012-03-30 14:19         ` Artem Bityutskiy
  2 siblings, 2 replies; 25+ messages in thread
From: Shmulik Ladkani @ 2012-03-18  8:00 UTC (permalink / raw)
  To: Mike Dunn; +Cc: Robert Jarzmik, Ivan Djelic, Ricard Wanderlof, linux-mtd

Hi Mike,

On Sat, 17 Mar 2012 13:18:04 -0700 Mike Dunn <mikedunn@newsguy.com> wrote:
> So if we're all convinced that thresholds must be evaluated at the ecc.size
> level, there's more work <sigh>.  As Ivan pointed out earlier, the nand drivers
> currently do not track bitflip corrections per ecc step, which means all the
> nand drivers must be modified.  Before I start on this, I'd like to get a
> blessing from Artem / David, since the chamges will touch many drivers in a
> somehat non-trivial manner.

Well the change would affect more places, that's for sure.
However it could be a trivial change; see below.

> I think the best approach is to have the nand drivers return max_bitflips over
> all ecc steps, same idea as what the previous patches returned to mtd, but at a
> finer granularity.  Then the nand code just passes this up to mtd, instead of
> examining the ecc_stats as previously.

Please consider the following scheme:

(1) 'ecc.read_page' interface is changed.
  The return value will denote max bitflips over all ecc steps.
  (currently all implementations return 0)
  Please examine nand_read_page_swecc/nand_read_page_hwecc (and other
  implementations) and observe how easy this can be implemeted.

  Downside: we'll have to do this (max calculation) for each 'read_page'
  implementation.
  There are about ~10 implementations.
  (BTW most implementations are SO similar, maybe the code can be
  consolidated?)

(2) Modify nand_base's 'nand_do_read_ops', similary to what you've
  already done, to aggregate the maximum bitflips according to the
  returned values of the 'chip->ecc.read_page()' calls.
  Meaning, instead of your aggregation:
+	max_bitflips = max(max_bitflips, mtd->ecc_stats.corrected - prev_corrected);
  let's use:
+	max_bitflips = max(max_bitflips, read_page_returned_max_bitflips);

(3)  Finally, 'nand_do_read_ops' may return -EUCLEAN if max_bitflips is
  above the threshold:
-	return  mtd->ecc_stats.corrected - stats.corrected ? -EUCLEAN : 0;
+	return  max_bitflips >= mtd->bitflip_threshold ? -EUCLEAN : 0;

  [ BTW, at a side note, I failed to understand why you prefer the
  threshold comparison at the generic 'mtd_read' wrapper, and not within
  nand_base.c.
  I guess you didn't want to duplicate the condition into onenand_base.c
  and alauda.c?
  Thing is, I feel mtd->bitflip_threshold is a NAND property, so it
  makes more sense if it is tested within the NAND infrastricture (and
  clones).
  Changing the 'mtd->_read' interface was less elegant IMO. ]

Obvisouly, the above said is 30000 feet high, cutting a few corners :-)
We must let the code talk.

The only immediate blocker for such a scheme would be those nand
controller drivers, where the controller is responsible of the ECC
correction, and it does not report per-step stats to the software.
Are there any?
(In that case, we have nothing else to do besides falling back to a
per-page decision. For example, such driver's 'read_page' may return
total_corrected_per_page/ecc.steps as an estimate of the number of
per-step corrected bits, or alike)

Regards,
Shmulik

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads
  2012-03-18  8:00         ` Shmulik Ladkani
@ 2012-03-19  8:50           ` Matthieu CASTET
  2012-03-19  9:29             ` Shmulik Ladkani
  2012-03-19 19:09             ` Mike Dunn
  2012-03-30 14:21           ` Artem Bityutskiy
  1 sibling, 2 replies; 25+ messages in thread
From: Matthieu CASTET @ 2012-03-19  8:50 UTC (permalink / raw)
  To: Shmulik Ladkani
  Cc: Ricard Wanderlof, Robert Jarzmik, Ivan Djelic, Mike Dunn, linux-mtd

Hi,

Shmulik Ladkani a écrit :
> Hi Mike,
> 

> The only immediate blocker for such a scheme would be those nand
> controller drivers, where the controller is responsible of the ECC
> correction, and it does not report per-step stats to the software.
> Are there any?
> (In that case, we have nothing else to do besides falling back to a
> per-page decision. For example, such driver's 'read_page' may return
> total_corrected_per_page/ecc.steps as an estimate of the number of
> per-step corrected bits, or alike)
No, they should return the worst case : max(total_corrected_per_page, ecc.strength).
Otherwise you have the same problem : case 2 in [1] will fail.
It is better to trigger false positive (do scrubbing) than having uncorrectable
error.


Matthieu


[1]
Consider the following situation:
- a NAND device with 2kB pages and 4 ecc steps per page (4 x 512 bytes)
- the driver has chip->ecc.strength = 4, and therefore mtd->ecc_strength = 16
- let's say mtd->bitflip_threshold = 16

The driver read() method could return a non-negative integer, say 4, in at least
the following cases:

1. During a single page read, each of the 4 ecc steps corrected 1 bit, with a
total variation of ecc_stats.corrected equal to 4.
=> no cleaning needed

2. During a single page read, 1 ecc step corrected 4 bits, the 3 other steps had
no correction to perform, with a total variation of ecc_stats.corrected equal
to 4.
=> cleaning is needed

In both cases, you will compare the same value 4 to mtd->bitflip_threshold (16)
and decide to return 0 (and not -EUCLEAN).

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads
  2012-03-19  8:50           ` Matthieu CASTET
@ 2012-03-19  9:29             ` Shmulik Ladkani
  2012-03-19 19:09             ` Mike Dunn
  1 sibling, 0 replies; 25+ messages in thread
From: Shmulik Ladkani @ 2012-03-19  9:29 UTC (permalink / raw)
  To: Matthieu CASTET
  Cc: Ricard Wanderlof, Robert Jarzmik, Ivan Djelic, Mike Dunn, linux-mtd

Hi Matthieu,

On Mon, 19 Mar 2012 09:50:54 +0100 Matthieu CASTET <matthieu.castet@parrot.com> wrote:
> > The only immediate blocker for such a scheme would be those nand
> > controller drivers, where the controller is responsible of the ECC
> > correction, and it does not report per-step stats to the software.
> > Are there any?
> > (In that case, we have nothing else to do besides falling back to a
> > per-page decision. For example, such driver's 'read_page' may return
> > total_corrected_per_page/ecc.steps as an estimate of the number of
> > per-step corrected bits, or alike)
> No, they should return the worst case : max(total_corrected_per_page, ecc.strength).
> Otherwise you have the same problem : case 2 in [1] will fail.
> It is better to trigger false positive (do scrubbing) than having uncorrectable
> error.

Thanks, your suggested heuristic is better than averaging.

However I guess you meant: min(total_corrected_per_page, ecc.strength)

My suggested 'ecc.read_page' retval denotes maximum bit errors per step.
As such, max possible return value is 'ecc.strength' (returning
'total_corrected_per_page' would be beyond max possible value).

'min(total_corrected_per_page, ecc.strength)' fits your suggestion to
prefer a false positive than having uncorrectable error.
It will indeed lead to scrubbing in "case 2".

Regards,
Shmulik

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads
  2012-03-19  8:50           ` Matthieu CASTET
  2012-03-19  9:29             ` Shmulik Ladkani
@ 2012-03-19 19:09             ` Mike Dunn
       [not found]               ` <20120319211835.1073a491@halley>
  1 sibling, 1 reply; 25+ messages in thread
From: Mike Dunn @ 2012-03-19 19:09 UTC (permalink / raw)
  To: Matthieu CASTET
  Cc: Robert Jarzmik, Ivan Djelic, Ricard Wanderlof, Shmulik Ladkani,
	linux-mtd

On 03/19/2012 01:50 AM, Matthieu CASTET wrote:
> Hi,
> 
> Shmulik Ladkani a écrit :
>> Hi Mike,
>>
> 
>> The only immediate blocker for such a scheme would be those nand
>> controller drivers, where the controller is responsible of the ECC
>> correction, and it does not report per-step stats to the software.
>> Are there any?


I didn't get Shmulik's original email for some reason, only what was quoted in
Matthieu's.  I think the answer to this question is no.  There is one driver,
fsl_elbc_nand.c, that fits this description; i.e., detection, correction, and
stats reporting are all done in hardware, and it looks like the hardware can
only report that some corrections occurred, not the exact number.  Ecc can be
done in multiple steps (depending on page size). but because ecc.strength==1, we
only need to know if any corrections were made at all in order to report
bitflips on a per ecc step basis.  I don't believe there are any other drivers
where bitflips per step can not be determined.


>> (In that case, we have nothing else to do besides falling back to a
>> per-page decision. For example, such driver's 'read_page' may return
>> total_corrected_per_page/ecc.steps as an estimate of the number of
>> per-step corrected bits, or alike)


A compromise between max_bitflips per page vs per ecc step?  As Matthieu and
Ivan pointed out, you can still have a hard error and be under the threshold.


> No, they should return the worst case : max(total_corrected_per_page, ecc.strength).


I don't quite understand that.


> Otherwise you have the same problem : case 2 in [1] will fail.
> It is better to trigger false positive (do scrubbing) than having uncorrectable
> error.


I agree.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads
       [not found]               ` <20120319211835.1073a491@halley>
@ 2012-03-20  1:27                 ` Mike Dunn
  0 siblings, 0 replies; 25+ messages in thread
From: Mike Dunn @ 2012-03-20  1:27 UTC (permalink / raw)
  To: Shmulik Ladkani; +Cc: robert.jarzmik, ivan.djelic, ricard.wanderlof, linux-mtd

On 03/19/2012 12:18 PM, Shmulik Ladkani wrote:
> On Mon, 19 Mar 2012 12:09:02 -0700 Mike Dunn <mikedunn@newsguy.com> wrote:
>>
>> I didn't get Shmulik's original email for some reason, only what was quoted in
>> Matthieu's. 
> 
> http://lists.infradead.org/pipermail/linux-mtd/2012-March/040343.html


Many thanks Shmulik.  I'm glad I didn't miss this.  Don't know why some emails
are getting dropped.  What you describe is exactly what I had in mind, and is
just an extension of the previous patches to the finer granularity of ecc steps.
 Yes it's not too hard, but touching drivers that I don't have the means to test
has to be done very carefully.

Quoting from your post off the list archive...


  [ BTW, at a side note, I failed to understand why you prefer the
  threshold comparison at the generic 'mtd_read' wrapper, and not within
  nand_base.c.
  I guess you didn't want to duplicate the condition into onenand_base.c
  and alauda.c?
  Thing is, I feel mtd->bitflip_threshold is a NAND property, so it
  makes more sense if it is tested within the NAND infrastricture (and
  clones).
  Changing the 'mtd->_read' interface was less elegant IMO. ]

I agree, it is a NAND property and would have preferred to keep it in the nand
interface,  But in addition to alauda, devices/docg3.c also is a nand device
that does not use the nand interface,  After being bandied about, it was decided
to make it part of the mtd driver interface.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads
  2012-03-17 20:18       ` Mike Dunn
  2012-03-18  8:00         ` Shmulik Ladkani
@ 2012-03-30 14:16         ` Artem Bityutskiy
  2012-03-31  1:23           ` Mike Dunn
  2012-03-30 14:19         ` Artem Bityutskiy
  2 siblings, 1 reply; 25+ messages in thread
From: Artem Bityutskiy @ 2012-03-30 14:16 UTC (permalink / raw)
  To: Mike Dunn; +Cc: Robert Jarzmik, Ivan Djelic, Ricard Wanderlof, linux-mtd

[-- Attachment #1: Type: text/plain, Size: 417 bytes --]

On Sat, 2012-03-17 at 13:18 -0700, Mike Dunn wrote:
> Before I start on this, I'd like to get a
> blessing from Artem / David, since the chamges will touch many drivers
> in a
> somehat non-trivial manner. 

Come on Mike, just go ahead without blessings. The stuff you are want to
do very important I believe. Currently MTD is basically lacking behind
the technology.

-- 
Best Regards,
Artem Bityutskiy

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads
  2012-03-17 20:18       ` Mike Dunn
  2012-03-18  8:00         ` Shmulik Ladkani
  2012-03-30 14:16         ` Artem Bityutskiy
@ 2012-03-30 14:19         ` Artem Bityutskiy
  2 siblings, 0 replies; 25+ messages in thread
From: Artem Bityutskiy @ 2012-03-30 14:19 UTC (permalink / raw)
  To: Mike Dunn; +Cc: Robert Jarzmik, Ivan Djelic, Ricard Wanderlof, linux-mtd

[-- Attachment #1: Type: text/plain, Size: 517 bytes --]

On Sat, 2012-03-17 at 13:18 -0700, Mike Dunn wrote:
> 
> If I do go ahead, I can first post patches that fixes the two patches
> already
> pushed to l2-mtd, or Artem can just pull them out and I'll start from
> scratch.
> I personally prefer the former, since it's easier for me, plus it
> would be
> illustrative of the change of plans. 

We can do this during the post-rc1 cycle. It is my fault that I did not
weed them out in time, but this does not look fatal.

-- 
Best Regards,
Artem Bityutskiy

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads
  2012-03-18  8:00         ` Shmulik Ladkani
  2012-03-19  8:50           ` Matthieu CASTET
@ 2012-03-30 14:21           ` Artem Bityutskiy
  2012-03-31  2:03             ` Mike Dunn
  1 sibling, 1 reply; 25+ messages in thread
From: Artem Bityutskiy @ 2012-03-30 14:21 UTC (permalink / raw)
  To: Shmulik Ladkani
  Cc: Ricard Wanderlof, Robert Jarzmik, Ivan Djelic, Mike Dunn, linux-mtd

[-- Attachment #1: Type: text/plain, Size: 223 bytes --]

On Sun, 2012-03-18 at 10:00 +0200, Shmulik Ladkani wrote:
>   (BTW most implementations are SO similar, maybe the code can be
>   consolidated?)

Sounds like an attractive idea!

-- 
Best Regards,
Artem Bityutskiy

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads
  2012-03-30 14:16         ` Artem Bityutskiy
@ 2012-03-31  1:23           ` Mike Dunn
  0 siblings, 0 replies; 25+ messages in thread
From: Mike Dunn @ 2012-03-31  1:23 UTC (permalink / raw)
  To: dedekind1; +Cc: Robert Jarzmik, Ivan Djelic, Ricard Wanderlof, linux-mtd

On 03/30/2012 07:16 AM, Artem Bityutskiy wrote:
> On Sat, 2012-03-17 at 13:18 -0700, Mike Dunn wrote:
>> Before I start on this, I'd like to get a
>> blessing from Artem / David, since the chamges will touch many drivers
>> in a
>> somehat non-trivial manner. 
> 
> Come on Mike, just go ahead without blessings. The stuff you are want to
> do very important I believe. Currently MTD is basically lacking behind
> the technology.


How about just a slight nod in my general direction?  :)

OK, I was letting it rest until the merge window passed, which was good because
it gave Brian time to point out my blunder.

Mike

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads
  2012-03-30 14:21           ` Artem Bityutskiy
@ 2012-03-31  2:03             ` Mike Dunn
  0 siblings, 0 replies; 25+ messages in thread
From: Mike Dunn @ 2012-03-31  2:03 UTC (permalink / raw)
  To: dedekind1
  Cc: Ricard Wanderlof, Ivan Djelic, Robert Jarzmik, Shmulik Ladkani,
	linux-mtd

On 03/30/2012 07:21 AM, Artem Bityutskiy wrote:
> On Sun, 2012-03-18 at 10:00 +0200, Shmulik Ladkani wrote:
>>   (BTW most implementations are SO similar, maybe the code can be
>>   consolidated?)
> 
> Sounds like an attractive idea!


I'll watch for these opportunities while I work on the patches for the nand drivers.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2012-03-31  2:03 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-15 17:25 [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads Mike Dunn
2012-03-15 17:25 ` [PATCH 1/3] MTD: expose ecc_strength through sysfs Mike Dunn
2012-03-15 17:25 ` [PATCH 2/3] MTD: bitflip_threshold added to mtd_info and sysfs Mike Dunn
2012-03-16 16:31   ` Ivan Djelic
2012-03-15 17:25 ` [PATCH 3/3] MTD: drivers return max_bitflips, mtd returns -EUCLEAN Mike Dunn
2012-03-16 11:19 ` [PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads Ivan Djelic
2012-03-16 12:49   ` Artem Bityutskiy
2012-03-16 16:30     ` Mike Dunn
2012-03-16 16:25   ` Mike Dunn
2012-03-16 18:43     ` Ivan Djelic
2012-03-17 20:18       ` Mike Dunn
2012-03-18  8:00         ` Shmulik Ladkani
2012-03-19  8:50           ` Matthieu CASTET
2012-03-19  9:29             ` Shmulik Ladkani
2012-03-19 19:09             ` Mike Dunn
     [not found]               ` <20120319211835.1073a491@halley>
2012-03-20  1:27                 ` Mike Dunn
2012-03-30 14:21           ` Artem Bityutskiy
2012-03-31  2:03             ` Mike Dunn
2012-03-30 14:16         ` Artem Bityutskiy
2012-03-31  1:23           ` Mike Dunn
2012-03-30 14:19         ` Artem Bityutskiy
2012-03-16 21:54     ` Shmulik Ladkani
2012-03-16 22:57       ` Peter Barada
2012-03-17 21:10         ` Mike Dunn
2012-03-17 20:50       ` Mike Dunn

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.