* [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes
@ 2014-01-17 7:13 Pekon Gupta
2014-01-17 7:13 ` [PATCH v7 1/6] mtd: nand: omap: add field to indicate current ecc-scheme in 'struct omap_nand_info' Pekon Gupta
` (7 more replies)
0 siblings, 8 replies; 12+ messages in thread
From: Pekon Gupta @ 2014-01-17 7:13 UTC (permalink / raw)
To: Brian Norris, Artem Bityutskiy
Cc: Stefan Roese, linux-mtd, Pekon Gupta, Felipe Balbi
*changes v6 -> v7*
[PATCH 1/6] <no change>
[PATCH 2/6] <no change>
[PATCH 3/6] <dropped older version, refer commit message>
- erased-page detection done based on count of zero-bits in OOB & DATA
- incorporated some feedbacks from Brian Norris <computersforpeace@gmail.com>
http://lists.infradead.org/pipermail/linux-mtd/2014-January/051533.html
http://lists.infradead.org/pipermail/linux-mtd/2014-January/051534.html
[PATCH 4/6] <new> remove redundant bit-flip counting for erased-page
[PATCH 5/6] <minor cleanup>
[PATCH 6/6] <minor cleanup>
(As there are functional changes in this version, so not attaching
Tested-by: Stefan Roese <sr@denx.de>)
*changes v5 -> v6*
[PATCH 1/6] <no change>
[PATCH 2/6] introduced variable 'actual_eccbytes' to omit reserved byte-position
[PATCH 3/6] split from [PATCH v5 3/5] fix erased-page bit-flip correction
[PATCH 4/6] split from [PATCH v5 3/5] fix erased-page detection
[PATCH 5/6] <minor cleanup>
[PATCH 6/6] <no change>
*changes v4 -> v5*
This patch series was split version of earlier patch:
http://lists.infradead.org/pipermail/linux-mtd/2013-November/050241.html
chip->ecc.correct() is used for detecting and correcting bit-flips during read
operations. In OMAP NAND driver different ecc-schemes have different callbacks:
- omap_correct_data() for HAM1_HW ecc-schemes (Untouched)
- nand_bch_correct_data() for BCHx_HW_DETECTION_SW ecc-schemes (Untouched)
- omap_elm_correct_data() for BCHx_HW ecc-schemes
This patch-series fixes following issues in omap_elm_correct_data():
(1) Dependency on a specific reserved byte-position in OOB area
to differentiates between erased-pages v/s programmed-pages.
Problem: reserved byte-position cannot be accomodated in all ecc-schemes
Problem: reserved byte-position can itself be subjected upto 8 bit-flips
causing the 0xff to become 0x00, causing page to be
mis-recognized as erased-page.
(2) Bit-flips in erased-pages are detected by comparing each byte of Data & OOB
with 0xff in check_erased_page().
Problem: This is causes performance penalty when erased-pages are checked.
(3) Current code is not scalable for future ECC schemes due to presence of
tweaks for BCH4_ECC and BCH8_ECC at multiple places.
(4) Currently, bit-flips are evaluated and fixed even when ELM reports them as
un-correctable bit-flips, this should not happen as 'number-of-error' field
in ELM_LOCATION_STATUS becomes invalid when un-correctable flag is set.
(5) Driver should return with error-code = '-EBADMSG' when
uncorrectable bit-flip is detected
bit-flip outside valid Data and OOB region is detected
Pekon Gupta (6):
mtd: nand: omap: add field to indicate current ecc-scheme in 'struct
omap_nand_info'
mtd: nand: omap: ecc.correct: omap_elm_correct_data: rename ambiguous
variable 'eccsize' and 'ecc_vector_size'
mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix erased-page
detection for BCHx_HW ECC schemes
mtd: nand: omap: ecc.correct: omap_elm_correct_data: remove redundant
bit-flip counting for erased-page
mtd: nand: omap: ecc.correct: omap_elm_correct_data: cleanup for
future enhancements
mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix
programmed-page bit-flip correction logic
drivers/mtd/nand/omap2.c | 265 +++++++++++++++++++++++------------------------
1 file changed, 127 insertions(+), 138 deletions(-)
--
1.8.1
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v7 1/6] mtd: nand: omap: add field to indicate current ecc-scheme in 'struct omap_nand_info'
2014-01-17 7:13 [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes Pekon Gupta
@ 2014-01-17 7:13 ` Pekon Gupta
2014-01-17 7:13 ` [PATCH v7 2/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: rename ambiguous variable 'eccsize' and 'ecc_vector_size' Pekon Gupta
` (6 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Pekon Gupta @ 2014-01-17 7:13 UTC (permalink / raw)
To: Brian Norris, Artem Bityutskiy
Cc: Stefan Roese, linux-mtd, Pekon Gupta, Felipe Balbi
Information of currently selected ECC scheme 'enum omap_ecc ecc_opt' should
available outside platform-data, so that single nand_chip->ecc callback can
support multiple ecc-scheme configurations.
Signed-off-by: Pekon Gupta <pekon@ti.com>
---
drivers/mtd/nand/omap2.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/mtd/nand/omap2.c b/drivers/mtd/nand/omap2.c
index 768e7cf..8fb64d5 100644
--- a/drivers/mtd/nand/omap2.c
+++ b/drivers/mtd/nand/omap2.c
@@ -160,6 +160,7 @@ struct omap_nand_info {
int gpmc_cs;
unsigned long phys_base;
unsigned long mem_size;
+ enum omap_ecc ecc_opt;
struct completion comp;
struct dma_chan *dma;
int gpmc_irq_fifo;
@@ -1657,6 +1658,7 @@ static int omap_nand_probe(struct platform_device *pdev)
info->gpmc_cs = pdata->cs;
info->reg = pdata->reg;
info->of_node = pdata->of_node;
+ info->ecc_opt = pdata->ecc_opt;
mtd = &info->mtd;
mtd->priv = &info->nand;
mtd->name = dev_name(&pdev->dev);
@@ -1812,7 +1814,7 @@ static int omap_nand_probe(struct platform_device *pdev)
/* populate MTD interface based on ECC scheme */
nand_chip->ecc.layout = &omap_oobinfo;
ecclayout = &omap_oobinfo;
- switch (pdata->ecc_opt) {
+ switch (info->ecc_opt) {
case OMAP_ECC_HAM1_CODE_HW:
pr_info("nand: using OMAP_ECC_HAM1_CODE_HW\n");
nand_chip->ecc.mode = NAND_ECC_HW;
--
1.8.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v7 2/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: rename ambiguous variable 'eccsize' and 'ecc_vector_size'
2014-01-17 7:13 [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes Pekon Gupta
2014-01-17 7:13 ` [PATCH v7 1/6] mtd: nand: omap: add field to indicate current ecc-scheme in 'struct omap_nand_info' Pekon Gupta
@ 2014-01-17 7:13 ` Pekon Gupta
2014-01-17 7:13 ` [PATCH v7 3/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix erased-page detection for BCHx_HW ECC schemes Pekon Gupta
` (5 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Pekon Gupta @ 2014-01-17 7:13 UTC (permalink / raw)
To: Brian Norris, Artem Bityutskiy
Cc: Stefan Roese, linux-mtd, Pekon Gupta, Felipe Balbi
renaming following variables as they cause confusion due to resemblence to
another similar field in 'struct nand_ecc_ctrl' (nand_chip->ecc.size).
ecc_vector_size = info->nand.ecc.bytes ---> eccbytes = info->nand.ecc.bytes;
eccsize = ecc_vector_size - 1; ---> actual_eccbytes - 1;
Signed-off-by: Pekon Gupta <pekon@ti.com>
---
drivers/mtd/nand/omap2.c | 39 +++++++++++++++++++++++----------------
1 file changed, 23 insertions(+), 16 deletions(-)
diff --git a/drivers/mtd/nand/omap2.c b/drivers/mtd/nand/omap2.c
index 8fb64d5..2c73389 100644
--- a/drivers/mtd/nand/omap2.c
+++ b/drivers/mtd/nand/omap2.c
@@ -1359,9 +1359,10 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
{
struct omap_nand_info *info = container_of(mtd, struct omap_nand_info,
mtd);
+ int eccbytes = info->nand.ecc.bytes;
int eccsteps = info->nand.ecc.steps;
int i , j, stat = 0;
- int eccsize, eccflag, ecc_vector_size;
+ int eccflag, actual_eccbytes;
struct elm_errorvec err_vec[ERROR_VECTOR_MAX];
u_char *ecc_vec = calc_ecc;
u_char *spare_ecc = read_ecc;
@@ -1369,6 +1370,20 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
enum bch_ecc type;
bool is_error_reported = false;
+ switch (info->ecc_opt) {
+ case OMAP_ECC_BCH4_CODE_HW:
+ /* omit 7th ECC byte reserved for ROM code compatibility */
+ actual_eccbytes = eccbytes - 1;
+ break;
+ case OMAP_ECC_BCH8_CODE_HW:
+ /* omit 14th ECC byte reserved for ROM code compatibility */
+ actual_eccbytes = eccbytes - 1;
+ break;
+ default:
+ pr_err("invalid driver configuration\n");
+ return -EINVAL;
+ }
+
/* Initialize elm error vector to zero */
memset(err_vec, 0, sizeof(err_vec));
@@ -1380,14 +1395,6 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
erased_ecc_vec = bch4_vector;
}
- ecc_vector_size = info->nand.ecc.bytes;
-
- /*
- * Remove extra byte padding for BCH8 RBL
- * compatibility and erased page handling
- */
- eccsize = ecc_vector_size - 1;
-
for (i = 0; i < eccsteps ; i++) {
eccflag = 0; /* initialize eccflag */
@@ -1395,8 +1402,7 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
* Check any error reported,
* In case of error, non zero ecc reported.
*/
-
- for (j = 0; (j < eccsize); j++) {
+ for (j = 0; j < actual_eccbytes; j++) {
if (calc_ecc[j] != 0) {
eccflag = 1; /* non zero ecc, error present */
break;
@@ -1421,7 +1427,7 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
* zeros are more than threshold erased page, either
* case page reported as uncorrectable.
*/
- if (hweight8(~read_ecc[eccsize]) >= threshold) {
+ if (hweight8(~read_ecc[actual_eccbytes]) >= threshold) {
/*
* Update elm error vector as
* data area is programmed
@@ -1433,7 +1439,8 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
int bitflip_count;
u_char *buf = &data[info->nand.ecc.size * i];
- if (memcmp(calc_ecc, erased_ecc_vec, eccsize)) {
+ if (memcmp(calc_ecc, erased_ecc_vec,
+ actual_eccbytes)) {
bitflip_count = erased_sector_bitflips(
buf, read_ecc, info);
@@ -1446,8 +1453,8 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
}
/* Update the ecc vector */
- calc_ecc += ecc_vector_size;
- read_ecc += ecc_vector_size;
+ calc_ecc += eccbytes;
+ read_ecc += eccbytes;
}
/* Check if any error reported */
@@ -1496,7 +1503,7 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
/* Update page data with sector size */
data += info->nand.ecc.size;
- spare_ecc += ecc_vector_size;
+ spare_ecc += eccbytes;
}
for (i = 0; i < eccsteps; i++)
--
1.8.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v7 3/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix erased-page detection for BCHx_HW ECC schemes
2014-01-17 7:13 [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes Pekon Gupta
2014-01-17 7:13 ` [PATCH v7 1/6] mtd: nand: omap: add field to indicate current ecc-scheme in 'struct omap_nand_info' Pekon Gupta
2014-01-17 7:13 ` [PATCH v7 2/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: rename ambiguous variable 'eccsize' and 'ecc_vector_size' Pekon Gupta
@ 2014-01-17 7:13 ` Pekon Gupta
2014-02-11 21:34 ` Brian Norris
2014-01-17 7:13 ` [PATCH v7 4/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: remove redundant bit-flip counting for erased-page Pekon Gupta
` (4 subsequent siblings)
7 siblings, 1 reply; 12+ messages in thread
From: Pekon Gupta @ 2014-01-17 7:13 UTC (permalink / raw)
To: Brian Norris, Artem Bityutskiy
Cc: Stefan Roese, linux-mtd, Pekon Gupta, Felipe Balbi
As erased-pages do not have ECC stored in their OOB area, so they need to be
seperated out from programmed-pages, before doing BCH ECC correction.
In current implementation of omap_elm_correct_data() which does ECC correction
for BCHx ECC schemes, this erased-pages are detected based on specific marker
byte (reserved as 0x00) in ecc-layout.
However, this approach has some limitation like;
1) All ecc-scheme layouts do not have such Reserved byte marker to
differentiate between erased-page v/s programmed-page. Thus this is a
customized solution.
2) Reserved marker byte can itself be subjected to bit-flips causing
erased-page to be misunderstood as programmed-page.
This patch removes dependency on any marker byte in ecc-layout, to differentiate
between erased-page v/s programeed-page. Instead a page is considered erased if
(a) all(OOB) == 0xff, .i.e., number of zero-bits in read_ecc[] == 0
(b) number of zero-bits in (Data + OOB) is less than ecc-strength
This patch also adds a generic function count_zero_bits(), to find number of
bits which are '0' in given buffer. This function is optimized for comparing
read-data with 0xff.
Signed-off-by: Pekon Gupta <pekon@ti.com>
---
drivers/mtd/nand/omap2.c | 88 ++++++++++++++++++++++++++++++++++++++----------
1 file changed, 71 insertions(+), 17 deletions(-)
diff --git a/drivers/mtd/nand/omap2.c b/drivers/mtd/nand/omap2.c
index 2c73389..f833fbb 100644
--- a/drivers/mtd/nand/omap2.c
+++ b/drivers/mtd/nand/omap2.c
@@ -1331,6 +1331,30 @@ static int erased_sector_bitflips(u_char *data, u_char *oob,
}
/**
+ * count_zero_bits - count number of bit-flips in given buffer
+ * @buf: pointer to buffer
+ * @length: buffer length
+ * @max_bitflips: maximum number of correctable bit-flips (ecc.strength)
+ * $bitflip_count: pointer to store bit-flip count
+ *
+ * Counts number of bit-flips in given buffer, returns with error
+ * as soon as count exceeds max_bitflips limit.
+ */
+static int count_zero_bits(u_char *buf, int length,
+ int max_bitflips, int *bitflip_count)
+{ int i;
+
+ for (i = 0; i < length; i++) {
+ if (unlikely(buf[i] != 0xff)) {
+ *bitflip_count += hweight8(~buf[i]);
+ if (unlikely(*bitflip_count > max_bitflips))
+ return -EBADMSG;
+ }
+ }
+ return 0;
+}
+
+/**
* omap_elm_correct_data - corrects page data area in case error reported
* @mtd: MTD device structure
* @data: page data
@@ -1359,14 +1383,21 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
{
struct omap_nand_info *info = container_of(mtd, struct omap_nand_info,
mtd);
- int eccbytes = info->nand.ecc.bytes;
- int eccsteps = info->nand.ecc.steps;
+ struct nand_ecc_ctrl *ecc = &info->nand.ecc;
+ int eccstrength = ecc->strength;
+ int eccsize = ecc->size;
+ int eccbytes = ecc->bytes;
+ int eccsteps = ecc->steps;
int i , j, stat = 0;
int eccflag, actual_eccbytes;
struct elm_errorvec err_vec[ERROR_VECTOR_MAX];
u_char *ecc_vec = calc_ecc;
u_char *spare_ecc = read_ecc;
u_char *erased_ecc_vec;
+ u_char *buf;
+ int bitflip_count;
+ int err;
+ bool page_is_erased;
enum bch_ecc type;
bool is_error_reported = false;
@@ -1410,24 +1441,47 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
}
if (eccflag == 1) {
- /*
- * Set threshold to minimum of 4, half of ecc.strength/2
- * to allow max bit flip in byte to 4
- */
- unsigned int threshold = min_t(unsigned int, 4,
- info->nand.ecc.strength / 2);
+ bitflip_count = 0;
+ /* count zero-bits in OOB region */
+ err = count_zero_bits(read_ecc, eccbytes,
+ eccstrength, &bitflip_count);
+ if (err) {
+ /*
+ * number of zero-bits in OOB > ecc-strength
+ * either un-correctable or programmed-page
+ */
+ page_is_erased = false;
+ } else if (bitflip_count == 0) {
+ /* OOB == all(0xff) */
+ page_is_erased = true;
+ } else {
+ /*
+ * OOB has some bits as '0' but
+ * number of zero-bits in OOB < ecc-strength.
+ * It may be erased-page with bit-flips so,
+ * further checks are needed for confirmation
+ */
+ /* count zero-bits in DATA region */
+ buf = &data[eccsize * i];
+ err = count_zero_bits(buf, eccsize,
+ eccstrength, &bitflip_count);
+ if (err) {
+ /*
+ * total number of zero-bits in OOB
+ * and DATA exceeds ecc-strength
+ */
+ page_is_erased = false;
+ } else {
+ /* number of zero-bits < ecc-strength */
+ page_is_erased = true;
+ }
+ }
/*
- * Check data area is programmed by counting
- * number of 0's at fixed offset in spare area.
- * Checking count of 0's against threshold.
- * In case programmed page expects at least threshold
- * zeros in byte.
- * If zeros are less than threshold for programmed page/
- * zeros are more than threshold erased page, either
- * case page reported as uncorrectable.
+ * erased-pages do not have ECC stored in OOB area,
+ * so they need to be handled separately.
*/
- if (hweight8(~read_ecc[actual_eccbytes]) >= threshold) {
+ if (!page_is_erased) {
/*
* Update elm error vector as
* data area is programmed
--
1.8.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v7 4/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: remove redundant bit-flip counting for erased-page
2014-01-17 7:13 [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes Pekon Gupta
` (2 preceding siblings ...)
2014-01-17 7:13 ` [PATCH v7 3/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix erased-page detection for BCHx_HW ECC schemes Pekon Gupta
@ 2014-01-17 7:13 ` Pekon Gupta
2014-01-17 7:13 ` [PATCH v7 5/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: cleanup for future enhancements Pekon Gupta
` (3 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Pekon Gupta @ 2014-01-17 7:13 UTC (permalink / raw)
To: Brian Norris, Artem Bityutskiy
Cc: Stefan Roese, linux-mtd, Pekon Gupta, Felipe Balbi
This patch:
- Removes erased_sector_bitflips() as counting of bit-flips is handled by
count_zero_bits() in more optimized way.
And count of bit-flips in both OOB and DATA region is anyways required for
correct identification of erased-pages.
- read-buffer is pre-filled with 0xff when an erased-page is detected, as
erased-page can only contain 0xff bytes.
- Removes static arrays bch8_vector[] and bch4_vector, which stores
ECC signature with all(0xff) bytes.
Signed-off-by: Pekon Gupta <pekon@ti.com>
---
drivers/mtd/nand/omap2.c | 88 +++++-------------------------------------------
1 file changed, 9 insertions(+), 79 deletions(-)
diff --git a/drivers/mtd/nand/omap2.c b/drivers/mtd/nand/omap2.c
index f833fbb..78b7106 100644
--- a/drivers/mtd/nand/omap2.c
+++ b/drivers/mtd/nand/omap2.c
@@ -141,11 +141,6 @@
#define BADBLOCK_MARKER_LENGTH 2
-#ifdef CONFIG_MTD_NAND_OMAP_BCH
-static u_char bch8_vector[] = {0xf3, 0xdb, 0x14, 0x16, 0x8b, 0xd2, 0xbe, 0xcc,
- 0xac, 0x6b, 0xff, 0x99, 0x7b};
-static u_char bch4_vector[] = {0x00, 0x6b, 0x31, 0xdd, 0x41, 0xbc, 0x10};
-#endif
/* oob info generated runtime depending on ecc algorithm and layout selected */
static struct nand_ecclayout omap_oobinfo;
@@ -1292,45 +1287,6 @@ static int omap3_calculate_ecc_bch(struct mtd_info *mtd, const u_char *dat,
}
/**
- * erased_sector_bitflips - count bit flips
- * @data: data sector buffer
- * @oob: oob buffer
- * @info: omap_nand_info
- *
- * Check the bit flips in erased page falls below correctable level.
- * If falls below, report the page as erased with correctable bit
- * flip, else report as uncorrectable page.
- */
-static int erased_sector_bitflips(u_char *data, u_char *oob,
- struct omap_nand_info *info)
-{
- int flip_bits = 0, i;
-
- for (i = 0; i < info->nand.ecc.size; i++) {
- flip_bits += hweight8(~data[i]);
- if (flip_bits > info->nand.ecc.strength)
- return 0;
- }
-
- for (i = 0; i < info->nand.ecc.bytes - 1; i++) {
- flip_bits += hweight8(~oob[i]);
- if (flip_bits > info->nand.ecc.strength)
- return 0;
- }
-
- /*
- * Bit flips falls in correctable level.
- * Fill data area with 0xFF
- */
- if (flip_bits) {
- memset(data, 0xFF, info->nand.ecc.size);
- memset(oob, 0xFF, info->nand.ecc.bytes);
- }
-
- return flip_bits;
-}
-
-/**
* count_zero_bits - count number of bit-flips in given buffer
* @buf: pointer to buffer
* @length: buffer length
@@ -1359,24 +1315,10 @@ static int count_zero_bits(u_char *buf, int length,
* @mtd: MTD device structure
* @data: page data
* @read_ecc: ecc read from nand flash
- * @calc_ecc: ecc read from HW ECC registers
- *
- * Calculated ecc vector reported as zero in case of non-error pages.
- * In case of error/erased pages non-zero error vector is reported.
- * In case of non-zero ecc vector, check read_ecc at fixed offset
- * (x = 13/7 in case of BCH8/4 == 0) to find page programmed or not.
- * To handle bit flips in this data, count the number of 0's in
- * read_ecc[x] and check if it greater than 4. If it is less, it is
- * programmed page, else erased page.
- *
- * 1. If page is erased, check with standard ecc vector (ecc vector
- * for erased page to find any bit flip). If check fails, bit flip
- * is present in erased page. Count the bit flips in erased page and
- * if it falls under correctable level, report page with 0xFF and
- * update the correctable bit information.
- * 2. If error is reported on programmed page, update elm error
- * vector and correct the page with ELM error correction routine.
- *
+ * @calc_ecc: ecc calculated after reading Data and OOB regions from flash
+ * calc_ecc would be non-zero only in following cases:
+ * - its a programmed-page with bit-flips
+ * - its an erased-page (may or may not have bit-flips)
*/
static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
u_char *read_ecc, u_char *calc_ecc)
@@ -1393,7 +1335,6 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
struct elm_errorvec err_vec[ERROR_VECTOR_MAX];
u_char *ecc_vec = calc_ecc;
u_char *spare_ecc = read_ecc;
- u_char *erased_ecc_vec;
u_char *buf;
int bitflip_count;
int err;
@@ -1420,10 +1361,8 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
if (info->nand.ecc.strength == BCH8_MAX_ERROR) {
type = BCH8_ECC;
- erased_ecc_vec = bch8_vector;
} else {
type = BCH4_ECC;
- erased_ecc_vec = bch4_vector;
}
for (i = 0; i < eccsteps ; i++) {
@@ -1489,20 +1428,11 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
err_vec[i].error_reported = true;
is_error_reported = true;
} else {
- /* Error reported in erased page */
- int bitflip_count;
- u_char *buf = &data[info->nand.ecc.size * i];
-
- if (memcmp(calc_ecc, erased_ecc_vec,
- actual_eccbytes)) {
- bitflip_count = erased_sector_bitflips(
- buf, read_ecc, info);
-
- if (bitflip_count)
- stat += bitflip_count;
- else
- return -EINVAL;
- }
+ /* report bit-flips found in erased-page */
+ stat += bitflip_count;
+ /* return clean read-buffer */
+ buf = &data[eccsize * i];
+ memset(buf, 0xff, eccsize);
}
}
--
1.8.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v7 5/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: cleanup for future enhancements
2014-01-17 7:13 [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes Pekon Gupta
` (3 preceding siblings ...)
2014-01-17 7:13 ` [PATCH v7 4/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: remove redundant bit-flip counting for erased-page Pekon Gupta
@ 2014-01-17 7:13 ` Pekon Gupta
2014-01-17 7:13 ` [PATCH v7 6/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix programmed-page bit-flip correction logic Pekon Gupta
` (2 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Pekon Gupta @ 2014-01-17 7:13 UTC (permalink / raw)
To: Brian Norris, Artem Bityutskiy
Cc: Stefan Roese, linux-mtd, Pekon Gupta, Felipe Balbi
Current omap_elm_correct_data() code is not scalable for future ecc-schemes
due to presence of tweaks and hard-coded macros for BCH4_ECC and BCH8_ECC
ecc-schemes at multiple places.
This patch:
- replaces 'ecc_opt' with '(info->nand.ecc.strength == BCH8_MAX_ERROR)
used to differentiate between BCH8_HW and BCH4_SW
- replaces macros (defining magic number for specific ecc-scheme) with
generic variables
- removes dependency on macros defined in elm.h (like BCHx_ECC_OOB_BYTES)
Signed-off-by: Pekon Gupta <pekon@ti.com>
---
drivers/mtd/nand/omap2.c | 38 +++++++++++++-------------------------
1 file changed, 13 insertions(+), 25 deletions(-)
diff --git a/drivers/mtd/nand/omap2.c b/drivers/mtd/nand/omap2.c
index 78b7106..d0d9682 100644
--- a/drivers/mtd/nand/omap2.c
+++ b/drivers/mtd/nand/omap2.c
@@ -118,14 +118,9 @@
#define OMAP24XX_DMA_GPMC 4
-#define BCH8_MAX_ERROR 8 /* upto 8 bit correctable */
-#define BCH4_MAX_ERROR 4 /* upto 4 bit correctable */
-
#define SECTOR_BYTES 512
/* 4 bit padding to make byte aligned, 56 = 52 + 4 */
#define BCH4_BIT_PAD 4
-#define BCH8_ECC_MAX ((SECTOR_BYTES + BCH8_ECC_OOB_BYTES) * 8)
-#define BCH4_ECC_MAX ((SECTOR_BYTES + BCH4_ECC_OOB_BYTES) * 8)
/* GPMC ecc engine settings for read */
#define BCH_WRAPMODE_1 1 /* BCH wrap mode 1 */
@@ -1338,8 +1333,8 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
u_char *buf;
int bitflip_count;
int err;
+ u32 bit_pos, byte_pos, error_max, pos;
bool page_is_erased;
- enum bch_ecc type;
bool is_error_reported = false;
switch (info->ecc_opt) {
@@ -1359,12 +1354,6 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
/* Initialize elm error vector to zero */
memset(err_vec, 0, sizeof(err_vec));
- if (info->nand.ecc.strength == BCH8_MAX_ERROR) {
- type = BCH8_ECC;
- } else {
- type = BCH4_ECC;
- }
-
for (i = 0; i < eccsteps ; i++) {
eccflag = 0; /* initialize eccflag */
@@ -1451,20 +1440,19 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
for (i = 0; i < eccsteps; i++) {
if (err_vec[i].error_reported) {
for (j = 0; j < err_vec[i].error_count; j++) {
- u32 bit_pos, byte_pos, error_max, pos;
-
- if (type == BCH8_ECC)
- error_max = BCH8_ECC_MAX;
- else
- error_max = BCH4_ECC_MAX;
-
- if (info->nand.ecc.strength == BCH8_MAX_ERROR)
- pos = err_vec[i].error_loc[j];
- else
- /* Add 4 to take care 4 bit padding */
+ switch (info->ecc_opt) {
+ case OMAP_ECC_BCH4_CODE_HW:
+ /* Add 4 bits to take care of padding */
pos = err_vec[i].error_loc[j] +
BCH4_BIT_PAD;
-
+ break;
+ case OMAP_ECC_BCH8_CODE_HW:
+ pos = err_vec[i].error_loc[j];
+ break;
+ default:
+ return -EINVAL;
+ }
+ error_max = (eccsize + actual_eccbytes) * 8;
/* Calculate bit position of error */
bit_pos = pos % 8;
@@ -1486,7 +1474,7 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
stat += err_vec[i].error_count;
/* Update page data with sector size */
- data += info->nand.ecc.size;
+ data += eccsize;
spare_ecc += eccbytes;
}
--
1.8.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v7 6/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix programmed-page bit-flip correction logic
2014-01-17 7:13 [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes Pekon Gupta
` (4 preceding siblings ...)
2014-01-17 7:13 ` [PATCH v7 5/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: cleanup for future enhancements Pekon Gupta
@ 2014-01-17 7:13 ` Pekon Gupta
2014-01-29 13:18 ` [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes Stefan Roese
2014-02-04 11:37 ` Gupta, Pekon
7 siblings, 0 replies; 12+ messages in thread
From: Pekon Gupta @ 2014-01-17 7:13 UTC (permalink / raw)
To: Brian Norris, Artem Bityutskiy
Cc: Stefan Roese, linux-mtd, Pekon Gupta, Felipe Balbi
This patch updates following checks when bit-flips are detected by ELM:
- Do not evaluate bit-flips when un-correctable bit-flips is reported by ELM,
because as per [1] when ELM reports an un-correctable bit-flips,
'number of error' field in its ELM_LOCATION_STATUS register is also invalid.
- Return with error-code '-EBADMSG' on detection of un-correctable bit-flip.
- Return with error-code '-EBADMSG' when bit-flips position is outside current
Sector and OOB area.
[1] ELM IP spec Table-25 ELM_LOCATION_STATUS Register.
ELM_LOCATION_STATUS[8] = ECC_CORRECTABLE: Error location process exit status
0x0: ECC error location process failed.
Number of errors and error locations are invalid.
0x1: all errors were successfully located.
Number of errors and error locations are valid.
Signed-off-by: Pekon Gupta <pekon@ti.com>
---
drivers/mtd/nand/omap2.c | 28 ++++++++++++++++++----------
1 file changed, 18 insertions(+), 10 deletions(-)
diff --git a/drivers/mtd/nand/omap2.c b/drivers/mtd/nand/omap2.c
index d0d9682..ada278e 100644
--- a/drivers/mtd/nand/omap2.c
+++ b/drivers/mtd/nand/omap2.c
@@ -1437,8 +1437,12 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
/* Decode BCH error using ELM module */
elm_decode_bch_error_page(info->elm_dev, ecc_vec, err_vec);
+ err = 0;
for (i = 0; i < eccsteps; i++) {
- if (err_vec[i].error_reported) {
+ if (err_vec[i].error_uncorrectable) {
+ pr_err("nand: uncorrectable bit-flips found\n");
+ err = -EBADMSG;
+ } else if (err_vec[i].error_reported) {
for (j = 0; j < err_vec[i].error_count; j++) {
switch (info->ecc_opt) {
case OMAP_ECC_BCH4_CODE_HW:
@@ -1460,13 +1464,22 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
byte_pos = (error_max - pos - 1) / 8;
if (pos < error_max) {
- if (byte_pos < 512)
+ if (byte_pos < 512) {
+ pr_debug("bitflip@data %d:%d\n",
+ byte_pos, bit_pos);
data[byte_pos] ^= 1 << bit_pos;
- else
+ } else {
+ pr_debug("bitflip@oob %d:%d\n",
+ (byte_pos - 512),
+ bit_pos);
spare_ecc[byte_pos - 512] ^=
1 << bit_pos;
+ }
+ } else {
+ pr_err("invalid bit-flip @ %d:%d\n",
+ byte_pos, bit_pos);
+ err = -EBADMSG;
}
- /* else, not interested to correct ecc */
}
}
@@ -1478,12 +1491,7 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
spare_ecc += eccbytes;
}
- for (i = 0; i < eccsteps; i++)
- /* Return error if uncorrectable error present */
- if (err_vec[i].error_uncorrectable)
- return -EINVAL;
-
- return stat;
+ return (err) ? err : stat;
}
/**
--
1.8.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes
2014-01-17 7:13 [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes Pekon Gupta
` (5 preceding siblings ...)
2014-01-17 7:13 ` [PATCH v7 6/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix programmed-page bit-flip correction logic Pekon Gupta
@ 2014-01-29 13:18 ` Stefan Roese
2014-02-04 11:37 ` Gupta, Pekon
7 siblings, 0 replies; 12+ messages in thread
From: Stefan Roese @ 2014-01-29 13:18 UTC (permalink / raw)
To: Pekon Gupta, Brian Norris, Artem Bityutskiy; +Cc: linux-mtd, Felipe Balbi
On 17.01.2014 08:13, Pekon Gupta wrote:
> *changes v6 -> v7*
> [PATCH 1/6] <no change>
> [PATCH 2/6] <no change>
> [PATCH 3/6] <dropped older version, refer commit message>
> - erased-page detection done based on count of zero-bits in OOB & DATA
> - incorporated some feedbacks from Brian Norris <computersforpeace@gmail.com>
> http://lists.infradead.org/pipermail/linux-mtd/2014-January/051533.html
> http://lists.infradead.org/pipermail/linux-mtd/2014-January/051534.html
> [PATCH 4/6] <new> remove redundant bit-flip counting for erased-page
> [PATCH 5/6] <minor cleanup>
> [PATCH 6/6] <minor cleanup>
> (As there are functional changes in this version, so not attaching
> Tested-by: Stefan Roese <sr@denx.de>)
Thanks, Pekon. This still fixes some problems we have seen on an AM335x
board with ECC errors on UBI/UBIFS mounting. So for the whole patch series:
Tested-by: Stefan Roese <sr@denx.de>
Thanks,
Stefan
^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes
2014-01-17 7:13 [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes Pekon Gupta
` (6 preceding siblings ...)
2014-01-29 13:18 ` [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes Stefan Roese
@ 2014-02-04 11:37 ` Gupta, Pekon
7 siblings, 0 replies; 12+ messages in thread
From: Gupta, Pekon @ 2014-02-04 11:37 UTC (permalink / raw)
To: Brian Norris, Artem Bityutskiy; +Cc: Stefan Roese, linux-mtd, Balbi, Felipe
Hi Brian,
>From: Gupta, Pekon
>*changes v6 -> v7*
>[PATCH 1/6] <no change>
>[PATCH 2/6] <no change>
>[PATCH 3/6] <dropped older version, refer commit message>
> - erased-page detection done based on count of zero-bits in OOB & DATA
> - incorporated some feedbacks from Brian Norris <computersforpeace@gmail.com>
> http://lists.infradead.org/pipermail/linux-mtd/2014-January/051533.html
> http://lists.infradead.org/pipermail/linux-mtd/2014-January/051534.html
>[PATCH 4/6] <new> remove redundant bit-flip counting for erased-page
>[PATCH 5/6] <minor cleanup>
>[PATCH 6/6] <minor cleanup>
>
Request you to please review this one, as it would keep me going for other
Clean-ups and optimizations on top of this series.
with regards, pekon
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v7 3/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix erased-page detection for BCHx_HW ECC schemes
2014-01-17 7:13 ` [PATCH v7 3/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix erased-page detection for BCHx_HW ECC schemes Pekon Gupta
@ 2014-02-11 21:34 ` Brian Norris
2014-02-12 10:31 ` Gupta, Pekon
0 siblings, 1 reply; 12+ messages in thread
From: Brian Norris @ 2014-02-11 21:34 UTC (permalink / raw)
To: Pekon Gupta; +Cc: Stefan Roese, linux-mtd, Felipe Balbi, Artem Bityutskiy
Hi Pekon,
On Fri, Jan 17, 2014 at 12:43:14PM +0530, Pekon Gupta wrote:
> As erased-pages do not have ECC stored in their OOB area, so they need to be
> seperated out from programmed-pages, before doing BCH ECC correction.
>
> In current implementation of omap_elm_correct_data() which does ECC correction
> for BCHx ECC schemes, this erased-pages are detected based on specific marker
> byte (reserved as 0x00) in ecc-layout.
> However, this approach has some limitation like;
> 1) All ecc-scheme layouts do not have such Reserved byte marker to
> differentiate between erased-page v/s programmed-page. Thus this is a
> customized solution.
> 2) Reserved marker byte can itself be subjected to bit-flips causing
> erased-page to be misunderstood as programmed-page.
>
> This patch removes dependency on any marker byte in ecc-layout, to differentiate
> between erased-page v/s programeed-page. Instead a page is considered erased if
> (a) all(OOB) == 0xff, .i.e., number of zero-bits in read_ecc[] == 0
> (b) number of zero-bits in (Data + OOB) is less than ecc-strength
>
> This patch also adds a generic function count_zero_bits(), to find number of
> bits which are '0' in given buffer. This function is optimized for comparing
> read-data with 0xff.
>
> Signed-off-by: Pekon Gupta <pekon@ti.com>
> ---
> drivers/mtd/nand/omap2.c | 88 ++++++++++++++++++++++++++++++++++++++----------
> 1 file changed, 71 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/mtd/nand/omap2.c b/drivers/mtd/nand/omap2.c
> index 2c73389..f833fbb 100644
> --- a/drivers/mtd/nand/omap2.c
> +++ b/drivers/mtd/nand/omap2.c
> @@ -1331,6 +1331,30 @@ static int erased_sector_bitflips(u_char *data, u_char *oob,
> }
>
> /**
> + * count_zero_bits - count number of bit-flips in given buffer
> + * @buf: pointer to buffer
> + * @length: buffer length
> + * @max_bitflips: maximum number of correctable bit-flips (ecc.strength)
> + * $bitflip_count: pointer to store bit-flip count
> + *
> + * Counts number of bit-flips in given buffer, returns with error
> + * as soon as count exceeds max_bitflips limit.
> + */
> +static int count_zero_bits(u_char *buf, int length,
> + int max_bitflips, int *bitflip_count)
> +{ int i;
> +
> + for (i = 0; i < length; i++) {
> + if (unlikely(buf[i] != 0xff)) {
> + *bitflip_count += hweight8(~buf[i]);
> + if (unlikely(*bitflip_count > max_bitflips))
> + return -EBADMSG;
> + }
> + }
> + return 0;
> +}
> +
> +/**
> * omap_elm_correct_data - corrects page data area in case error reported
> * @mtd: MTD device structure
> * @data: page data
> @@ -1359,14 +1383,21 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
> {
> struct omap_nand_info *info = container_of(mtd, struct omap_nand_info,
> mtd);
> - int eccbytes = info->nand.ecc.bytes;
> - int eccsteps = info->nand.ecc.steps;
> + struct nand_ecc_ctrl *ecc = &info->nand.ecc;
> + int eccstrength = ecc->strength;
> + int eccsize = ecc->size;
> + int eccbytes = ecc->bytes;
> + int eccsteps = ecc->steps;
When I recommended adding the 'ecc' variable (as you did), I was
suggesting you drop the next 4 variables. You don't need them when you
can (with just as few characters) refer to ecc->field directly and
easily. Stick with one or ther other -- the 4 local variables or the 1
ecc pointer -- but you don't need both.
> int i , j, stat = 0;
> int eccflag, actual_eccbytes;
> struct elm_errorvec err_vec[ERROR_VECTOR_MAX];
> u_char *ecc_vec = calc_ecc;
> u_char *spare_ecc = read_ecc;
> u_char *erased_ecc_vec;
> + u_char *buf;
> + int bitflip_count;
> + int err;
> + bool page_is_erased;
> enum bch_ecc type;
> bool is_error_reported = false;
>
> @@ -1410,24 +1441,47 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
> }
>
> if (eccflag == 1) {
> - /*
> - * Set threshold to minimum of 4, half of ecc.strength/2
> - * to allow max bit flip in byte to 4
> - */
> - unsigned int threshold = min_t(unsigned int, 4,
> - info->nand.ecc.strength / 2);
> + bitflip_count = 0;
> + /* count zero-bits in OOB region */
> + err = count_zero_bits(read_ecc, eccbytes,
> + eccstrength, &bitflip_count);
> + if (err) {
> + /*
> + * number of zero-bits in OOB > ecc-strength
> + * either un-correctable or programmed-page
> + */
> + page_is_erased = false;
> + } else if (bitflip_count == 0) {
> + /* OOB == all(0xff) */
> + page_is_erased = true;
This else-if is still incorrect. You cannot assume that just because the
OOB is all 0xff that the page is erased.
> + } else {
> + /*
> + * OOB has some bits as '0' but
> + * number of zero-bits in OOB < ecc-strength.
> + * It may be erased-page with bit-flips so,
> + * further checks are needed for confirmation
> + */
> + /* count zero-bits in DATA region */
> + buf = &data[eccsize * i];
> + err = count_zero_bits(buf, eccsize,
> + eccstrength, &bitflip_count);
> + if (err) {
> + /*
> + * total number of zero-bits in OOB
> + * and DATA exceeds ecc-strength
> + */
> + page_is_erased = false;
> + } else {
> + /* number of zero-bits < ecc-strength */
> + page_is_erased = true;
> + }
> + }
This whole block (where you call count_zero_bits() twice) is a
convoluted and buggy way of just doing the following, AFAIU:
page_is_erased = !count_zero_bits(read_ecc, ecc->bytes,
ecc->strength, &bitflip_count) &&
!count_zero_bits(&data[ecc->size * i], ecc->size,
ecc->strength, &bitflip_count);
You could modify that to your liking and add a comment, but it's much
more concise than your version, and from what I can tell, it's actually
correct.
>
> /*
> - * Check data area is programmed by counting
> - * number of 0's at fixed offset in spare area.
> - * Checking count of 0's against threshold.
> - * In case programmed page expects at least threshold
> - * zeros in byte.
> - * If zeros are less than threshold for programmed page/
> - * zeros are more than threshold erased page, either
> - * case page reported as uncorrectable.
> + * erased-pages do not have ECC stored in OOB area,
> + * so they need to be handled separately.
> */
> - if (hweight8(~read_ecc[actual_eccbytes]) >= threshold) {
> + if (!page_is_erased) {
> /*
> * Update elm error vector as
> * data area is programmed
Brian
^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: [PATCH v7 3/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix erased-page detection for BCHx_HW ECC schemes
2014-02-11 21:34 ` Brian Norris
@ 2014-02-12 10:31 ` Gupta, Pekon
2014-03-07 19:11 ` Brian Norris
0 siblings, 1 reply; 12+ messages in thread
From: Gupta, Pekon @ 2014-02-12 10:31 UTC (permalink / raw)
To: Brian Norris; +Cc: Stefan Roese, linux-mtd, Balbi, Felipe, Artem Bityutskiy
Hi Brian,
>From: Brian Norris [mailto:computersforpeace@gmail.com]
>>On Fri, Jan 17, 2014 at 12:43:14PM +0530, Pekon Gupta wrote:
>> As erased-pages do not have ECC stored in their OOB area, so they need to be
>> seperated out from programmed-pages, before doing BCH ECC correction.
>>
>> In current implementation of omap_elm_correct_data() which does ECC correction
>> for BCHx ECC schemes, this erased-pages are detected based on specific marker
>> byte (reserved as 0x00) in ecc-layout.
>> However, this approach has some limitation like;
>> 1) All ecc-scheme layouts do not have such Reserved byte marker to
>> differentiate between erased-page v/s programmed-page. Thus this is a
>> customized solution.
>> 2) Reserved marker byte can itself be subjected to bit-flips causing
>> erased-page to be misunderstood as programmed-page.
>>
>> This patch removes dependency on any marker byte in ecc-layout, to differentiate
>> between erased-page v/s programeed-page. Instead a page is considered erased if
>> (a) all(OOB) == 0xff, .i.e., number of zero-bits in read_ecc[] == 0
>> (b) number of zero-bits in (Data + OOB) is less than ecc-strength
>>
>> This patch also adds a generic function count_zero_bits(), to find number of
>> bits which are '0' in given buffer. This function is optimized for comparing
>> read-data with 0xff.
>>
>> Signed-off-by: Pekon Gupta <pekon@ti.com>
>> ---
[...]
>> +/**
>> * omap_elm_correct_data - corrects page data area in case error reported
>> * @mtd: MTD device structure
>> * @data: page data
>> @@ -1359,14 +1383,21 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
>> {
>> struct omap_nand_info *info = container_of(mtd, struct omap_nand_info,
>> mtd);
>> - int eccbytes = info->nand.ecc.bytes;
>> - int eccsteps = info->nand.ecc.steps;
>> + struct nand_ecc_ctrl *ecc = &info->nand.ecc;
>> + int eccstrength = ecc->strength;
>> + int eccsize = ecc->size;
>> + int eccbytes = ecc->bytes;
>> + int eccsteps = ecc->steps;
>
>When I recommended adding the 'ecc' variable (as you did), I was
>suggesting you drop the next 4 variables. You don't need them when you
>can (with just as few characters) refer to ecc->field directly and
>easily. Stick with one or ther other -- the 4 local variables or the 1
>ecc pointer -- but you don't need both.
>
Oh sorry.. I misunderstood that. I'll fix this in next version.
>> @@ -1410,24 +1441,47 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
>> }
>>
>> if (eccflag == 1) {
>> - /*
>> - * Set threshold to minimum of 4, half of ecc.strength/2
>> - * to allow max bit flip in byte to 4
>> - */
>> - unsigned int threshold = min_t(unsigned int, 4,
>> - info->nand.ecc.strength / 2);
>> + bitflip_count = 0;
>> + /* count zero-bits in OOB region */
>> + err = count_zero_bits(read_ecc, eccbytes,
>> + eccstrength, &bitflip_count);
>> + if (err) {
>> + /*
>> + * number of zero-bits in OOB > ecc-strength
>> + * either un-correctable or programmed-page
>> + */
>> + page_is_erased = false;
>> + } else if (bitflip_count == 0) {
>> + /* OOB == all(0xff) */
>> + page_is_erased = true;
>
>This else-if is still incorrect. You cannot assume that just because the
>OOB is all 0xff that the page is erased.
>
Now this part is most important, where I would like to get more clarity
and feedback before I proceed.
----------------------------
if OOB of an page is == all(0xff)
(a) As per general NAND spec, chip->write_buf() _only_ fills the internal buffers of
NAND device's with information to be written. Actual write to NAND array cells
happen only after 'NAND_CMD_PAGEPROG' or 'NAND_CMD_CACHEDPROG' is issued.
Thus both Main-area and OOB-area are programmed simultaneously inside NAND device.
if there is any disruption (like power-cut) in on-going NAND_CMD_PAGEPROG
cycle, then the data should be assumed to be garbage, and it may have un-stable bits.
(b) if OOB==all(0xff) then there is _no_ ECC stored in OOB to check the read_data
Hence driver cannot distinguish between genuine data and bit-flips.
Now based on (a) & (b), it's safe to assume that:
if (OOB == all 0xff)
/* page has no-data | garbage-data*/
Agree ?
(This is where I call it page_is_erased==true, Though I could have used better
variable name as page_has_no_valid_data = true).
----------------------------
And also, driver should return 'total number zero-bits in both Main-area + OOB'
if (0 < 'total number of zero-bits in both OOB + DATA region' < mtd->bitflip_threshold)
return -EUCLEAN;
else
return -EBADMSG;
** However there is a bug in current patch version which I just figured out.
I don't include the number of zero-bits of Main-area, if OOB==all(0xff).
+ } else if (bitflip_count == 0) {
+ /* OOB == all(0xff) */
+ page_is_erased = true;
+ /* missing */ bitflip_count += count_zero_bits(buf, eccsize,
+ eccstrength, &bitflip_count);
------------------------
>> + } else {
>> + /*
>> + * OOB has some bits as '0' but
>> + * number of zero-bits in OOB < ecc-strength.
>> + * It may be erased-page with bit-flips so,
>> + * further checks are needed for confirmation
>> + */
>> + /* count zero-bits in DATA region */
>> + buf = &data[eccsize * i];
>> + err = count_zero_bits(buf, eccsize,
>> + eccstrength, &bitflip_count);
>> + if (err) {
>> + /*
>> + * total number of zero-bits in OOB
>> + * and DATA exceeds ecc-strength
>> + */
>> + page_is_erased = false;
>> + } else {
>> + /* number of zero-bits < ecc-strength */
>> + page_is_erased = true;
>> + }
>> + }
>
>This whole block (where you call count_zero_bits() twice) is a
>convoluted and buggy way of just doing the following, AFAIU:
>
> page_is_erased = !count_zero_bits(read_ecc, ecc->bytes,
> ecc->strength, &bitflip_count) &&
> !count_zero_bits(&data[ecc->size * i], ecc->size,
> ecc->strength, &bitflip_count);
>
>You could modify that to your liking and add a comment, but it's much
>more concise than your version, and from what I can tell, it's actually
>correct.
>
Firstly, In you above statement 's/&&/||'
because as per above statement, if count_zero_bits(oob) == 0, then
my patch assumes 'page_has_no_valid_data = true'.
page_is_erased = !count_zero_bits(read_ecc, ecc->bytes,
ecc->strength, &bitflip_count) ||
!count_zero_bits(&data[ecc->size * i], ecc->size,
ecc->strength, &bitflip_count);
Secondly, You are missing the that here NAND driver is relaxing the limit of
number-of-acceptable bit-flips in an erased-page, because some MLC and
Toshiba SLC NANDs show bit-flips almost on every second already erased-page.
>> + if (err) {
>> + /*
>> + * total number of zero-bits in OOB
>> + * and DATA exceeds ecc-strength
>> + */
>> + page_is_erased = false;
>> + } else {
>> + /* number of zero-bits < ecc-strength */
>> + page_is_erased = true;
>> + }
In above patch I can replace ecc->strength with mtd->bitflip_threshold
to make it more user controllable. As bitflip_threshold is configurable via
/sys/class/mtd/mtdX/bitflip_threshold
What is your opinion ?
This was also the basis of my question to Artem [1]
[1] http://lists.infradead.org/pipermail/linux-mtd/2014-January/051556.html
With regards, pekon
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v7 3/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix erased-page detection for BCHx_HW ECC schemes
2014-02-12 10:31 ` Gupta, Pekon
@ 2014-03-07 19:11 ` Brian Norris
0 siblings, 0 replies; 12+ messages in thread
From: Brian Norris @ 2014-03-07 19:11 UTC (permalink / raw)
To: Gupta, Pekon; +Cc: Stefan Roese, linux-mtd, Balbi, Felipe, Artem Bityutskiy
Hi Pekon,
On Wed, Feb 12, 2014 at 10:31:02AM +0000, Pekon Gupta wrote:
> >From: Brian Norris [mailto:computersforpeace@gmail.com]
> >>On Fri, Jan 17, 2014 at 12:43:14PM +0530, Pekon Gupta wrote:
> >> This patch removes dependency on any marker byte in ecc-layout, to differentiate
> >> between erased-page v/s programeed-page. Instead a page is considered erased if
> >> (a) all(OOB) == 0xff, .i.e., number of zero-bits in read_ecc[] == 0
> >> (b) number of zero-bits in (Data + OOB) is less than ecc-strength
[...]
> >> @@ -1410,24 +1441,47 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
> >> }
> >>
> >> if (eccflag == 1) {
> >> - /*
> >> - * Set threshold to minimum of 4, half of ecc.strength/2
> >> - * to allow max bit flip in byte to 4
> >> - */
> >> - unsigned int threshold = min_t(unsigned int, 4,
> >> - info->nand.ecc.strength / 2);
> >> + bitflip_count = 0;
> >> + /* count zero-bits in OOB region */
> >> + err = count_zero_bits(read_ecc, eccbytes,
> >> + eccstrength, &bitflip_count);
> >> + if (err) {
> >> + /*
> >> + * number of zero-bits in OOB > ecc-strength
> >> + * either un-correctable or programmed-page
> >> + */
> >> + page_is_erased = false;
> >> + } else if (bitflip_count == 0) {
> >> + /* OOB == all(0xff) */
> >> + page_is_erased = true;
> >
> >This else-if is still incorrect. You cannot assume that just because the
> >OOB is all 0xff that the page is erased.
> >
> Now this part is most important, where I would like to get more clarity
> and feedback before I proceed.
> ----------------------------
> if OOB of an page is == all(0xff)
> (a) As per general NAND spec, chip->write_buf() _only_ fills the internal buffers of
> NAND device's with information to be written. Actual write to NAND array cells
> happen only after 'NAND_CMD_PAGEPROG' or 'NAND_CMD_CACHEDPROG' is issued.
I'm not really sure why you bring up chip->write_buf(); not all
implementations actually use this. But anyway...
> Thus both Main-area and OOB-area are programmed simultaneously inside NAND device.
> if there is any disruption (like power-cut) in on-going NAND_CMD_PAGEPROG
> cycle, then the data should be assumed to be garbage, and it may have un-stable bits.
OK. But this is not at all what we're trying to check; we are checking
*only* whether this particular page reads mostly-0xff, due to regular
bitflips on an otherwise unprogrammed page. The "assumed garbage"
conclusion is really irrelevant here.
> (b) if OOB==all(0xff) then there is _no_ ECC stored in OOB to check the read_data
I disagree. Can you guarantee that there is *no* data pattern in which
the matching ECC syndrome bytes are all 0xff?
> Hence driver cannot distinguish between genuine data and bit-flips.
>
> Now based on (a) & (b), it's safe to assume that:
> if (OOB == all 0xff)
> /* page has no-data | garbage-data*/
> Agree ?
No. But more importantly, why do you want to make this conclusion?
Aren't we *only* trying to make a decision of "is this page
mostly-0xff"?
Or more directly: you are using the above conclusion ("page has garbage
data") to then erase the buffer entirely. This is totally, 100% bogus
because the data area might have junk (partially-programmed page?) but
the OOB could be all 0xff -- and you are now lying to the upper layers,
saying the page is cleanly 0xff!!!
I believe one of the two of us has a very dire misunderstanding here.
Please help me decide which of us this is :)
> (This is where I call it page_is_erased==true, Though I could have used better
> variable name as page_has_no_valid_data = true).
> ----------------------------
>
> And also, driver should return 'total number zero-bits in both Main-area + OOB'
> if (0 < 'total number of zero-bits in both OOB + DATA region' < mtd->bitflip_threshold)
> return -EUCLEAN;
> else
> return -EBADMSG;
>
> ** However there is a bug in current patch version which I just figured out.
> I don't include the number of zero-bits of Main-area, if OOB==all(0xff).
> + } else if (bitflip_count == 0) {
> + /* OOB == all(0xff) */
> + page_is_erased = true;
> + /* missing */ bitflip_count += count_zero_bits(buf, eccsize,
> + eccstrength, &bitflip_count);
> ------------------------
>
> >> + } else {
> >> + /*
> >> + * OOB has some bits as '0' but
> >> + * number of zero-bits in OOB < ecc-strength.
> >> + * It may be erased-page with bit-flips so,
> >> + * further checks are needed for confirmation
> >> + */
> >> + /* count zero-bits in DATA region */
> >> + buf = &data[eccsize * i];
> >> + err = count_zero_bits(buf, eccsize,
> >> + eccstrength, &bitflip_count);
> >> + if (err) {
> >> + /*
> >> + * total number of zero-bits in OOB
> >> + * and DATA exceeds ecc-strength
> >> + */
> >> + page_is_erased = false;
> >> + } else {
> >> + /* number of zero-bits < ecc-strength */
> >> + page_is_erased = true;
> >> + }
> >> + }
> >
> >This whole block (where you call count_zero_bits() twice) is a
> >convoluted and buggy way of just doing the following, AFAIU:
> >
> > page_is_erased = !count_zero_bits(read_ecc, ecc->bytes,
> > ecc->strength, &bitflip_count) &&
> > !count_zero_bits(&data[ecc->size * i], ecc->size,
> > ecc->strength, &bitflip_count);
> >
> >You could modify that to your liking and add a comment, but it's much
> >more concise than your version, and from what I can tell, it's actually
> >correct.
> >
> Firstly, In you above statement 's/&&/||'
No. I wrote it exactly as I meant it. We need to check both the data
area and the OOB, and if either cause us to exceed the ECC strength,
then the page is not erased.
> because as per above statement, if count_zero_bits(oob) == 0, then
> my patch assumes 'page_has_no_valid_data = true'.
And that assumption is 100% wrong, AIUI.
> page_is_erased = !count_zero_bits(read_ecc, ecc->bytes,
> ecc->strength, &bitflip_count) ||
> !count_zero_bits(&data[ecc->size * i], ecc->size,
> ecc->strength, &bitflip_count);
>
>
> Secondly, You are missing the that here NAND driver is relaxing the limit of
> number-of-acceptable bit-flips in an erased-page, because some MLC and
> Toshiba SLC NANDs show bit-flips almost on every second already erased-page.
No, I'm not missing this. The MTD/NAND layers already take care of the
"number-of-acceptable bit-flips" using the mtd->bitflip_threshold
parameter.
> >> + if (err) {
> >> + /*
> >> + * total number of zero-bits in OOB
> >> + * and DATA exceeds ecc-strength
> >> + */
> >> + page_is_erased = false;
> >> + } else {
> >> + /* number of zero-bits < ecc-strength */
> >> + page_is_erased = true;
> >> + }
>
> In above patch I can replace ecc->strength with mtd->bitflip_threshold
> to make it more user controllable. As bitflip_threshold is configurable via
> /sys/class/mtd/mtdX/bitflip_threshold
> What is your opinion ?
The hardware driver should not be comparing with 'bitflip_threshold'.
Comparing against ecc_strength is correct, as we are simulating the
current ECC correction strength.
Brian
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2014-03-07 19:12 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-17 7:13 [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes Pekon Gupta
2014-01-17 7:13 ` [PATCH v7 1/6] mtd: nand: omap: add field to indicate current ecc-scheme in 'struct omap_nand_info' Pekon Gupta
2014-01-17 7:13 ` [PATCH v7 2/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: rename ambiguous variable 'eccsize' and 'ecc_vector_size' Pekon Gupta
2014-01-17 7:13 ` [PATCH v7 3/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix erased-page detection for BCHx_HW ECC schemes Pekon Gupta
2014-02-11 21:34 ` Brian Norris
2014-02-12 10:31 ` Gupta, Pekon
2014-03-07 19:11 ` Brian Norris
2014-01-17 7:13 ` [PATCH v7 4/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: remove redundant bit-flip counting for erased-page Pekon Gupta
2014-01-17 7:13 ` [PATCH v7 5/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: cleanup for future enhancements Pekon Gupta
2014-01-17 7:13 ` [PATCH v7 6/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix programmed-page bit-flip correction logic Pekon Gupta
2014-01-29 13:18 ` [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes Stefan Roese
2014-02-04 11:37 ` Gupta, Pekon
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.