* [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages @ 2016-02-21 12:52 ` Markus Pargmann 0 siblings, 0 replies; 32+ messages in thread From: Markus Pargmann @ 2016-02-21 12:52 UTC (permalink / raw) To: Han Xu, David Woodhouse, Boris BREZILLON Cc: Brian Norris, Huang Shijie, Fabio Estevam, linux-mtd, linux-arm-kernel, kernel, Markus Pargmann ECC is only calculated for written pages. As erased pages are not actively written the ECC is always invalid. For this purpose the Hardware BCH unit is able to check for erased pages and does not raise an ECC error in this case. This behaviour can be influenced using the BCH_MODE register which sets the number of allowed bitflips in an erased page. Unfortunately the unit is not capable of fixing the bitflips in memory. To avoid complete software checks for erased pages, we can simply check buffers with uncorrectable ECC errors because we know that any erased page with errors is uncorrectable by the BCH unit. This patch adds the generic nand_check_erased_ecc_chunk() to gpmi-nand to correct erased pages. To have the valid data in the buffer before using them, this patch moves the read_page_swap_end() call before the ECC status checking for-loop. Signed-off-by: Markus Pargmann <mpa@pengutronix.de> --- drivers/mtd/nand/gpmi-nand/gpmi-nand.c | 49 ++++++++++++++++++++++++++++++---- 1 file changed, 44 insertions(+), 5 deletions(-) diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c index 235ddcb58f39..ce5a21252102 100644 --- a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c +++ b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c @@ -1035,14 +1035,58 @@ static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip, /* Loop over status bytes, accumulating ECC status. */ status = auxiliary_virt + nfc_geo->auxiliary_status_offset; + read_page_swap_end(this, buf, nfc_geo->payload_size, + this->payload_virt, this->payload_phys, + nfc_geo->payload_size, + payload_virt, payload_phys); + for (i = 0; i < nfc_geo->ecc_chunk_count; i++, status++) { if ((*status == STATUS_GOOD) || (*status == STATUS_ERASED)) continue; if (*status == STATUS_UNCORRECTABLE) { + int flips; + + /* + * The ECC hardware has an uncorrectable ECC status + * code in case we have bitflips in an erased page. As + * nothing was written into this subpage the ECC is + * obviously wrong and we can not trust it. We assume + * at this point that we are reading an erased page and + * try to correct the bitflips in buffer up to + * ecc_strength bitflips. If this is a page with random + * data, we exceed this number of bitflips and have a + * ECC failure. Otherwise we use the corrected buffer. + */ + if (i == 0) { + /* The first block includes metadata */ + flips = nand_check_erased_ecc_chunk( + buf + i * nfc_geo->ecc_chunk_size, + nfc_geo->ecc_chunk_size, + NULL, 0, + auxiliary_virt, + nfc_geo->metadata_size, + nfc_geo->ecc_strength); + } else { + flips = nand_check_erased_ecc_chunk( + buf + i * nfc_geo->ecc_chunk_size, + nfc_geo->ecc_chunk_size, + NULL, 0, + NULL, 0, + nfc_geo->ecc_strength); + } + + if (flips > 0) { + max_bitflips = max_t(unsigned int, max_bitflips, + flips); + mtd->ecc_stats.corrected += flips; + continue; + } + mtd->ecc_stats.failed++; continue; } + mtd->ecc_stats.corrected += *status; max_bitflips = max_t(unsigned int, max_bitflips, *status); } @@ -1062,11 +1106,6 @@ static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip, chip->oob_poi[0] = ((uint8_t *) auxiliary_virt)[0]; } - read_page_swap_end(this, buf, nfc_geo->payload_size, - this->payload_virt, this->payload_phys, - nfc_geo->payload_size, - payload_virt, payload_phys); - return max_bitflips; } -- 2.7.0 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages @ 2016-02-21 12:52 ` Markus Pargmann 0 siblings, 0 replies; 32+ messages in thread From: Markus Pargmann @ 2016-02-21 12:52 UTC (permalink / raw) To: linux-arm-kernel ECC is only calculated for written pages. As erased pages are not actively written the ECC is always invalid. For this purpose the Hardware BCH unit is able to check for erased pages and does not raise an ECC error in this case. This behaviour can be influenced using the BCH_MODE register which sets the number of allowed bitflips in an erased page. Unfortunately the unit is not capable of fixing the bitflips in memory. To avoid complete software checks for erased pages, we can simply check buffers with uncorrectable ECC errors because we know that any erased page with errors is uncorrectable by the BCH unit. This patch adds the generic nand_check_erased_ecc_chunk() to gpmi-nand to correct erased pages. To have the valid data in the buffer before using them, this patch moves the read_page_swap_end() call before the ECC status checking for-loop. Signed-off-by: Markus Pargmann <mpa@pengutronix.de> --- drivers/mtd/nand/gpmi-nand/gpmi-nand.c | 49 ++++++++++++++++++++++++++++++---- 1 file changed, 44 insertions(+), 5 deletions(-) diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c index 235ddcb58f39..ce5a21252102 100644 --- a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c +++ b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c @@ -1035,14 +1035,58 @@ static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip, /* Loop over status bytes, accumulating ECC status. */ status = auxiliary_virt + nfc_geo->auxiliary_status_offset; + read_page_swap_end(this, buf, nfc_geo->payload_size, + this->payload_virt, this->payload_phys, + nfc_geo->payload_size, + payload_virt, payload_phys); + for (i = 0; i < nfc_geo->ecc_chunk_count; i++, status++) { if ((*status == STATUS_GOOD) || (*status == STATUS_ERASED)) continue; if (*status == STATUS_UNCORRECTABLE) { + int flips; + + /* + * The ECC hardware has an uncorrectable ECC status + * code in case we have bitflips in an erased page. As + * nothing was written into this subpage the ECC is + * obviously wrong and we can not trust it. We assume + * at this point that we are reading an erased page and + * try to correct the bitflips in buffer up to + * ecc_strength bitflips. If this is a page with random + * data, we exceed this number of bitflips and have a + * ECC failure. Otherwise we use the corrected buffer. + */ + if (i == 0) { + /* The first block includes metadata */ + flips = nand_check_erased_ecc_chunk( + buf + i * nfc_geo->ecc_chunk_size, + nfc_geo->ecc_chunk_size, + NULL, 0, + auxiliary_virt, + nfc_geo->metadata_size, + nfc_geo->ecc_strength); + } else { + flips = nand_check_erased_ecc_chunk( + buf + i * nfc_geo->ecc_chunk_size, + nfc_geo->ecc_chunk_size, + NULL, 0, + NULL, 0, + nfc_geo->ecc_strength); + } + + if (flips > 0) { + max_bitflips = max_t(unsigned int, max_bitflips, + flips); + mtd->ecc_stats.corrected += flips; + continue; + } + mtd->ecc_stats.failed++; continue; } + mtd->ecc_stats.corrected += *status; max_bitflips = max_t(unsigned int, max_bitflips, *status); } @@ -1062,11 +1106,6 @@ static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip, chip->oob_poi[0] = ((uint8_t *) auxiliary_virt)[0]; } - read_page_swap_end(this, buf, nfc_geo->payload_size, - this->payload_virt, this->payload_phys, - nfc_geo->payload_size, - payload_virt, payload_phys); - return max_bitflips; } -- 2.7.0 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages 2016-02-21 12:52 ` Markus Pargmann @ 2016-02-24 13:57 ` Boris Brezillon -1 siblings, 0 replies; 32+ messages in thread From: Boris Brezillon @ 2016-02-24 13:57 UTC (permalink / raw) To: Markus Pargmann Cc: Han Xu, David Woodhouse, Fabio Estevam, linux-mtd, kernel, Huang Shijie, Brian Norris, linux-arm-kernel Hi Markus, On Sun, 21 Feb 2016 13:52:06 +0100 Markus Pargmann <mpa@pengutronix.de> wrote: > ECC is only calculated for written pages. As erased pages are not > actively written the ECC is always invalid. For this purpose the > Hardware BCH unit is able to check for erased pages and does not raise > an ECC error in this case. This behaviour can be influenced using the > BCH_MODE register which sets the number of allowed bitflips in an erased > page. Unfortunately the unit is not capable of fixing the bitflips in > memory. > > To avoid complete software checks for erased pages, we can simply check > buffers with uncorrectable ECC errors because we know that any erased > page with errors is uncorrectable by the BCH unit. > > This patch adds the generic nand_check_erased_ecc_chunk() to gpmi-nand > to correct erased pages. To have the valid data in the buffer before > using them, this patch moves the read_page_swap_end() call before the > ECC status checking for-loop. > > Signed-off-by: Markus Pargmann <mpa@pengutronix.de> > --- > drivers/mtd/nand/gpmi-nand/gpmi-nand.c | 49 ++++++++++++++++++++++++++++++---- > 1 file changed, 44 insertions(+), 5 deletions(-) > > diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c > index 235ddcb58f39..ce5a21252102 100644 > --- a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c > +++ b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c > @@ -1035,14 +1035,58 @@ static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip, > /* Loop over status bytes, accumulating ECC status. */ > status = auxiliary_virt + nfc_geo->auxiliary_status_offset; > > + read_page_swap_end(this, buf, nfc_geo->payload_size, > + this->payload_virt, this->payload_phys, > + nfc_geo->payload_size, > + payload_virt, payload_phys); > + > for (i = 0; i < nfc_geo->ecc_chunk_count; i++, status++) { > if ((*status == STATUS_GOOD) || (*status == STATUS_ERASED)) > continue; > > if (*status == STATUS_UNCORRECTABLE) { > + int flips; > + > + /* > + * The ECC hardware has an uncorrectable ECC status > + * code in case we have bitflips in an erased page. As > + * nothing was written into this subpage the ECC is > + * obviously wrong and we can not trust it. We assume > + * at this point that we are reading an erased page and > + * try to correct the bitflips in buffer up to > + * ecc_strength bitflips. If this is a page with random > + * data, we exceed this number of bitflips and have a > + * ECC failure. Otherwise we use the corrected buffer. > + */ > + if (i == 0) { > + /* The first block includes metadata */ > + flips = nand_check_erased_ecc_chunk( > + buf + i * nfc_geo->ecc_chunk_size, > + nfc_geo->ecc_chunk_size, > + NULL, 0, > + auxiliary_virt, > + nfc_geo->metadata_size, > + nfc_geo->ecc_strength); > + } else { > + flips = nand_check_erased_ecc_chunk( > + buf + i * nfc_geo->ecc_chunk_size, > + nfc_geo->ecc_chunk_size, > + NULL, 0, > + NULL, 0, > + nfc_geo->ecc_strength); > + } You're not checking ECC bytes. I know it's a bit more complicated to implement, but as Brian stated several times, you should always check ECC bytes along with data bytes when testing for an erased chunk. Indeed, it might appear that the user really programmed ff on the page, and in this case you don't want to consider the page as erased. In this case, the ECC bytes will be different from ff, and you'll be able to identify it by checking those ECC bytes. Best Regards, Boris -- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages @ 2016-02-24 13:57 ` Boris Brezillon 0 siblings, 0 replies; 32+ messages in thread From: Boris Brezillon @ 2016-02-24 13:57 UTC (permalink / raw) To: linux-arm-kernel Hi Markus, On Sun, 21 Feb 2016 13:52:06 +0100 Markus Pargmann <mpa@pengutronix.de> wrote: > ECC is only calculated for written pages. As erased pages are not > actively written the ECC is always invalid. For this purpose the > Hardware BCH unit is able to check for erased pages and does not raise > an ECC error in this case. This behaviour can be influenced using the > BCH_MODE register which sets the number of allowed bitflips in an erased > page. Unfortunately the unit is not capable of fixing the bitflips in > memory. > > To avoid complete software checks for erased pages, we can simply check > buffers with uncorrectable ECC errors because we know that any erased > page with errors is uncorrectable by the BCH unit. > > This patch adds the generic nand_check_erased_ecc_chunk() to gpmi-nand > to correct erased pages. To have the valid data in the buffer before > using them, this patch moves the read_page_swap_end() call before the > ECC status checking for-loop. > > Signed-off-by: Markus Pargmann <mpa@pengutronix.de> > --- > drivers/mtd/nand/gpmi-nand/gpmi-nand.c | 49 ++++++++++++++++++++++++++++++---- > 1 file changed, 44 insertions(+), 5 deletions(-) > > diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c > index 235ddcb58f39..ce5a21252102 100644 > --- a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c > +++ b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c > @@ -1035,14 +1035,58 @@ static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip, > /* Loop over status bytes, accumulating ECC status. */ > status = auxiliary_virt + nfc_geo->auxiliary_status_offset; > > + read_page_swap_end(this, buf, nfc_geo->payload_size, > + this->payload_virt, this->payload_phys, > + nfc_geo->payload_size, > + payload_virt, payload_phys); > + > for (i = 0; i < nfc_geo->ecc_chunk_count; i++, status++) { > if ((*status == STATUS_GOOD) || (*status == STATUS_ERASED)) > continue; > > if (*status == STATUS_UNCORRECTABLE) { > + int flips; > + > + /* > + * The ECC hardware has an uncorrectable ECC status > + * code in case we have bitflips in an erased page. As > + * nothing was written into this subpage the ECC is > + * obviously wrong and we can not trust it. We assume > + * at this point that we are reading an erased page and > + * try to correct the bitflips in buffer up to > + * ecc_strength bitflips. If this is a page with random > + * data, we exceed this number of bitflips and have a > + * ECC failure. Otherwise we use the corrected buffer. > + */ > + if (i == 0) { > + /* The first block includes metadata */ > + flips = nand_check_erased_ecc_chunk( > + buf + i * nfc_geo->ecc_chunk_size, > + nfc_geo->ecc_chunk_size, > + NULL, 0, > + auxiliary_virt, > + nfc_geo->metadata_size, > + nfc_geo->ecc_strength); > + } else { > + flips = nand_check_erased_ecc_chunk( > + buf + i * nfc_geo->ecc_chunk_size, > + nfc_geo->ecc_chunk_size, > + NULL, 0, > + NULL, 0, > + nfc_geo->ecc_strength); > + } You're not checking ECC bytes. I know it's a bit more complicated to implement, but as Brian stated several times, you should always check ECC bytes along with data bytes when testing for an erased chunk. Indeed, it might appear that the user really programmed ff on the page, and in this case you don't want to consider the page as erased. In this case, the ECC bytes will be different from ff, and you'll be able to identify it by checking those ECC bytes. Best Regards, Boris -- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages 2016-02-24 13:57 ` Boris Brezillon @ 2016-04-11 6:34 ` Markus Pargmann -1 siblings, 0 replies; 32+ messages in thread From: Markus Pargmann @ 2016-04-11 6:34 UTC (permalink / raw) To: Boris Brezillon Cc: Han Xu, David Woodhouse, Fabio Estevam, linux-mtd, kernel, Huang Shijie, Brian Norris, linux-arm-kernel [-- Attachment #1: Type: text/plain, Size: 4871 bytes --] Hi Boris, On Wednesday 24 February 2016 14:57:50 Boris Brezillon wrote: > Hi Markus, > > On Sun, 21 Feb 2016 13:52:06 +0100 > Markus Pargmann <mpa@pengutronix.de> wrote: > > > ECC is only calculated for written pages. As erased pages are not > > actively written the ECC is always invalid. For this purpose the > > Hardware BCH unit is able to check for erased pages and does not raise > > an ECC error in this case. This behaviour can be influenced using the > > BCH_MODE register which sets the number of allowed bitflips in an erased > > page. Unfortunately the unit is not capable of fixing the bitflips in > > memory. > > > > To avoid complete software checks for erased pages, we can simply check > > buffers with uncorrectable ECC errors because we know that any erased > > page with errors is uncorrectable by the BCH unit. > > > > This patch adds the generic nand_check_erased_ecc_chunk() to gpmi-nand > > to correct erased pages. To have the valid data in the buffer before > > using them, this patch moves the read_page_swap_end() call before the > > ECC status checking for-loop. > > > > Signed-off-by: Markus Pargmann <mpa@pengutronix.de> > > --- > > drivers/mtd/nand/gpmi-nand/gpmi-nand.c | 49 ++++++++++++++++++++++++++++++---- > > 1 file changed, 44 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c > > index 235ddcb58f39..ce5a21252102 100644 > > --- a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c > > +++ b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c > > @@ -1035,14 +1035,58 @@ static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip, > > /* Loop over status bytes, accumulating ECC status. */ > > status = auxiliary_virt + nfc_geo->auxiliary_status_offset; > > > > + read_page_swap_end(this, buf, nfc_geo->payload_size, > > + this->payload_virt, this->payload_phys, > > + nfc_geo->payload_size, > > + payload_virt, payload_phys); > > + > > for (i = 0; i < nfc_geo->ecc_chunk_count; i++, status++) { > > if ((*status == STATUS_GOOD) || (*status == STATUS_ERASED)) > > continue; > > > > if (*status == STATUS_UNCORRECTABLE) { > > + int flips; > > + > > + /* > > + * The ECC hardware has an uncorrectable ECC status > > + * code in case we have bitflips in an erased page. As > > + * nothing was written into this subpage the ECC is > > + * obviously wrong and we can not trust it. We assume > > + * at this point that we are reading an erased page and > > + * try to correct the bitflips in buffer up to > > + * ecc_strength bitflips. If this is a page with random > > + * data, we exceed this number of bitflips and have a > > + * ECC failure. Otherwise we use the corrected buffer. > > + */ > > + if (i == 0) { > > + /* The first block includes metadata */ > > + flips = nand_check_erased_ecc_chunk( > > + buf + i * nfc_geo->ecc_chunk_size, > > + nfc_geo->ecc_chunk_size, > > + NULL, 0, > > + auxiliary_virt, > > + nfc_geo->metadata_size, > > + nfc_geo->ecc_strength); > > + } else { > > + flips = nand_check_erased_ecc_chunk( > > + buf + i * nfc_geo->ecc_chunk_size, > > + nfc_geo->ecc_chunk_size, > > + NULL, 0, > > + NULL, 0, > > + nfc_geo->ecc_strength); > > + } > > You're not checking ECC bytes. I know it's a bit more complicated to > implement, but as Brian stated several times, you should always check > ECC bytes along with data bytes when testing for an erased chunk. > > Indeed, it might appear that the user really programmed ff on the page, > and in this case you don't want to consider the page as erased. > In this case, the ECC bytes will be different from ff, and you'll > be able to identify it by checking those ECC bytes. Thanks for the feedback. Talking with a coworker about this we may have found a better approach to this that is less complicated to implement. The hardware unit allows us to set a bitflip threshold for erased pages. The ECC unit creates an ECC error only if the number of bitflips exceeds this threshold, but it does not correct these. So the idea is to change the patch so that we set pages, that are signaled by the ECC as erased, to 0xff completely without checking. So the ECC will do all the work and we completely trust in its abilities to do it correctly. I will send a modified patch as soon as I have some time for this. Best Regards, Markus -- Pengutronix e.K. | | Industrial Linux Solutions | http://www.pengutronix.de/ | Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages @ 2016-04-11 6:34 ` Markus Pargmann 0 siblings, 0 replies; 32+ messages in thread From: Markus Pargmann @ 2016-04-11 6:34 UTC (permalink / raw) To: linux-arm-kernel Hi Boris, On Wednesday 24 February 2016 14:57:50 Boris Brezillon wrote: > Hi Markus, > > On Sun, 21 Feb 2016 13:52:06 +0100 > Markus Pargmann <mpa@pengutronix.de> wrote: > > > ECC is only calculated for written pages. As erased pages are not > > actively written the ECC is always invalid. For this purpose the > > Hardware BCH unit is able to check for erased pages and does not raise > > an ECC error in this case. This behaviour can be influenced using the > > BCH_MODE register which sets the number of allowed bitflips in an erased > > page. Unfortunately the unit is not capable of fixing the bitflips in > > memory. > > > > To avoid complete software checks for erased pages, we can simply check > > buffers with uncorrectable ECC errors because we know that any erased > > page with errors is uncorrectable by the BCH unit. > > > > This patch adds the generic nand_check_erased_ecc_chunk() to gpmi-nand > > to correct erased pages. To have the valid data in the buffer before > > using them, this patch moves the read_page_swap_end() call before the > > ECC status checking for-loop. > > > > Signed-off-by: Markus Pargmann <mpa@pengutronix.de> > > --- > > drivers/mtd/nand/gpmi-nand/gpmi-nand.c | 49 ++++++++++++++++++++++++++++++---- > > 1 file changed, 44 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c > > index 235ddcb58f39..ce5a21252102 100644 > > --- a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c > > +++ b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c > > @@ -1035,14 +1035,58 @@ static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip, > > /* Loop over status bytes, accumulating ECC status. */ > > status = auxiliary_virt + nfc_geo->auxiliary_status_offset; > > > > + read_page_swap_end(this, buf, nfc_geo->payload_size, > > + this->payload_virt, this->payload_phys, > > + nfc_geo->payload_size, > > + payload_virt, payload_phys); > > + > > for (i = 0; i < nfc_geo->ecc_chunk_count; i++, status++) { > > if ((*status == STATUS_GOOD) || (*status == STATUS_ERASED)) > > continue; > > > > if (*status == STATUS_UNCORRECTABLE) { > > + int flips; > > + > > + /* > > + * The ECC hardware has an uncorrectable ECC status > > + * code in case we have bitflips in an erased page. As > > + * nothing was written into this subpage the ECC is > > + * obviously wrong and we can not trust it. We assume > > + * at this point that we are reading an erased page and > > + * try to correct the bitflips in buffer up to > > + * ecc_strength bitflips. If this is a page with random > > + * data, we exceed this number of bitflips and have a > > + * ECC failure. Otherwise we use the corrected buffer. > > + */ > > + if (i == 0) { > > + /* The first block includes metadata */ > > + flips = nand_check_erased_ecc_chunk( > > + buf + i * nfc_geo->ecc_chunk_size, > > + nfc_geo->ecc_chunk_size, > > + NULL, 0, > > + auxiliary_virt, > > + nfc_geo->metadata_size, > > + nfc_geo->ecc_strength); > > + } else { > > + flips = nand_check_erased_ecc_chunk( > > + buf + i * nfc_geo->ecc_chunk_size, > > + nfc_geo->ecc_chunk_size, > > + NULL, 0, > > + NULL, 0, > > + nfc_geo->ecc_strength); > > + } > > You're not checking ECC bytes. I know it's a bit more complicated to > implement, but as Brian stated several times, you should always check > ECC bytes along with data bytes when testing for an erased chunk. > > Indeed, it might appear that the user really programmed ff on the page, > and in this case you don't want to consider the page as erased. > In this case, the ECC bytes will be different from ff, and you'll > be able to identify it by checking those ECC bytes. Thanks for the feedback. Talking with a coworker about this we may have found a better approach to this that is less complicated to implement. The hardware unit allows us to set a bitflip threshold for erased pages. The ECC unit creates an ECC error only if the number of bitflips exceeds this threshold, but it does not correct these. So the idea is to change the patch so that we set pages, that are signaled by the ECC as erased, to 0xff completely without checking. So the ECC will do all the work and we completely trust in its abilities to do it correctly. I will send a modified patch as soon as I have some time for this. Best Regards, Markus -- Pengutronix e.K. | | Industrial Linux Solutions | http://www.pengutronix.de/ | Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part. URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20160411/f276b318/attachment-0001.sig> ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages 2016-04-11 6:34 ` Markus Pargmann @ 2016-04-11 7:29 ` Boris Brezillon -1 siblings, 0 replies; 32+ messages in thread From: Boris Brezillon @ 2016-04-11 7:29 UTC (permalink / raw) To: Markus Pargmann Cc: Han Xu, David Woodhouse, Fabio Estevam, linux-mtd, kernel, Huang Shijie, Brian Norris, linux-arm-kernel On Mon, 11 Apr 2016 08:34:35 +0200 Markus Pargmann <mpa@pengutronix.de> wrote: > Hi Boris, > > On Wednesday 24 February 2016 14:57:50 Boris Brezillon wrote: > > Hi Markus, > > > > On Sun, 21 Feb 2016 13:52:06 +0100 > > Markus Pargmann <mpa@pengutronix.de> wrote: > > > > > ECC is only calculated for written pages. As erased pages are not > > > actively written the ECC is always invalid. For this purpose the > > > Hardware BCH unit is able to check for erased pages and does not raise > > > an ECC error in this case. This behaviour can be influenced using the > > > BCH_MODE register which sets the number of allowed bitflips in an erased > > > page. Unfortunately the unit is not capable of fixing the bitflips in > > > memory. > > > > > > To avoid complete software checks for erased pages, we can simply check > > > buffers with uncorrectable ECC errors because we know that any erased > > > page with errors is uncorrectable by the BCH unit. > > > > > > This patch adds the generic nand_check_erased_ecc_chunk() to gpmi-nand > > > to correct erased pages. To have the valid data in the buffer before > > > using them, this patch moves the read_page_swap_end() call before the > > > ECC status checking for-loop. > > > > > > Signed-off-by: Markus Pargmann <mpa@pengutronix.de> > > > --- > > > drivers/mtd/nand/gpmi-nand/gpmi-nand.c | 49 ++++++++++++++++++++++++++++++---- > > > 1 file changed, 44 insertions(+), 5 deletions(-) > > > > > > diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c > > > index 235ddcb58f39..ce5a21252102 100644 > > > --- a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c > > > +++ b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c > > > @@ -1035,14 +1035,58 @@ static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip, > > > /* Loop over status bytes, accumulating ECC status. */ > > > status = auxiliary_virt + nfc_geo->auxiliary_status_offset; > > > > > > + read_page_swap_end(this, buf, nfc_geo->payload_size, > > > + this->payload_virt, this->payload_phys, > > > + nfc_geo->payload_size, > > > + payload_virt, payload_phys); > > > + > > > for (i = 0; i < nfc_geo->ecc_chunk_count; i++, status++) { > > > if ((*status == STATUS_GOOD) || (*status == STATUS_ERASED)) > > > continue; > > > > > > if (*status == STATUS_UNCORRECTABLE) { > > > + int flips; > > > + > > > + /* > > > + * The ECC hardware has an uncorrectable ECC status > > > + * code in case we have bitflips in an erased page. As > > > + * nothing was written into this subpage the ECC is > > > + * obviously wrong and we can not trust it. We assume > > > + * at this point that we are reading an erased page and > > > + * try to correct the bitflips in buffer up to > > > + * ecc_strength bitflips. If this is a page with random > > > + * data, we exceed this number of bitflips and have a > > > + * ECC failure. Otherwise we use the corrected buffer. > > > + */ > > > + if (i == 0) { > > > + /* The first block includes metadata */ > > > + flips = nand_check_erased_ecc_chunk( > > > + buf + i * nfc_geo->ecc_chunk_size, > > > + nfc_geo->ecc_chunk_size, > > > + NULL, 0, > > > + auxiliary_virt, > > > + nfc_geo->metadata_size, > > > + nfc_geo->ecc_strength); > > > + } else { > > > + flips = nand_check_erased_ecc_chunk( > > > + buf + i * nfc_geo->ecc_chunk_size, > > > + nfc_geo->ecc_chunk_size, > > > + NULL, 0, > > > + NULL, 0, > > > + nfc_geo->ecc_strength); > > > + } > > > > You're not checking ECC bytes. I know it's a bit more complicated to > > implement, but as Brian stated several times, you should always check > > ECC bytes along with data bytes when testing for an erased chunk. > > > > Indeed, it might appear that the user really programmed ff on the page, > > and in this case you don't want to consider the page as erased. > > In this case, the ECC bytes will be different from ff, and you'll > > be able to identify it by checking those ECC bytes. > > Thanks for the feedback. Talking with a coworker about this we may have found a > better approach to this that is less complicated to implement. The hardware > unit allows us to set a bitflip threshold for erased pages. The ECC unit > creates an ECC error only if the number of bitflips exceeds this threshold, but > it does not correct these. So the idea is to change the patch so that we set > pages, that are signaled by the ECC as erased, to 0xff completely without > checking. So the ECC will do all the work and we completely trust in its > abilities to do it correctly. Sounds good. -- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages @ 2016-04-11 7:29 ` Boris Brezillon 0 siblings, 0 replies; 32+ messages in thread From: Boris Brezillon @ 2016-04-11 7:29 UTC (permalink / raw) To: linux-arm-kernel On Mon, 11 Apr 2016 08:34:35 +0200 Markus Pargmann <mpa@pengutronix.de> wrote: > Hi Boris, > > On Wednesday 24 February 2016 14:57:50 Boris Brezillon wrote: > > Hi Markus, > > > > On Sun, 21 Feb 2016 13:52:06 +0100 > > Markus Pargmann <mpa@pengutronix.de> wrote: > > > > > ECC is only calculated for written pages. As erased pages are not > > > actively written the ECC is always invalid. For this purpose the > > > Hardware BCH unit is able to check for erased pages and does not raise > > > an ECC error in this case. This behaviour can be influenced using the > > > BCH_MODE register which sets the number of allowed bitflips in an erased > > > page. Unfortunately the unit is not capable of fixing the bitflips in > > > memory. > > > > > > To avoid complete software checks for erased pages, we can simply check > > > buffers with uncorrectable ECC errors because we know that any erased > > > page with errors is uncorrectable by the BCH unit. > > > > > > This patch adds the generic nand_check_erased_ecc_chunk() to gpmi-nand > > > to correct erased pages. To have the valid data in the buffer before > > > using them, this patch moves the read_page_swap_end() call before the > > > ECC status checking for-loop. > > > > > > Signed-off-by: Markus Pargmann <mpa@pengutronix.de> > > > --- > > > drivers/mtd/nand/gpmi-nand/gpmi-nand.c | 49 ++++++++++++++++++++++++++++++---- > > > 1 file changed, 44 insertions(+), 5 deletions(-) > > > > > > diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c > > > index 235ddcb58f39..ce5a21252102 100644 > > > --- a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c > > > +++ b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c > > > @@ -1035,14 +1035,58 @@ static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip, > > > /* Loop over status bytes, accumulating ECC status. */ > > > status = auxiliary_virt + nfc_geo->auxiliary_status_offset; > > > > > > + read_page_swap_end(this, buf, nfc_geo->payload_size, > > > + this->payload_virt, this->payload_phys, > > > + nfc_geo->payload_size, > > > + payload_virt, payload_phys); > > > + > > > for (i = 0; i < nfc_geo->ecc_chunk_count; i++, status++) { > > > if ((*status == STATUS_GOOD) || (*status == STATUS_ERASED)) > > > continue; > > > > > > if (*status == STATUS_UNCORRECTABLE) { > > > + int flips; > > > + > > > + /* > > > + * The ECC hardware has an uncorrectable ECC status > > > + * code in case we have bitflips in an erased page. As > > > + * nothing was written into this subpage the ECC is > > > + * obviously wrong and we can not trust it. We assume > > > + * at this point that we are reading an erased page and > > > + * try to correct the bitflips in buffer up to > > > + * ecc_strength bitflips. If this is a page with random > > > + * data, we exceed this number of bitflips and have a > > > + * ECC failure. Otherwise we use the corrected buffer. > > > + */ > > > + if (i == 0) { > > > + /* The first block includes metadata */ > > > + flips = nand_check_erased_ecc_chunk( > > > + buf + i * nfc_geo->ecc_chunk_size, > > > + nfc_geo->ecc_chunk_size, > > > + NULL, 0, > > > + auxiliary_virt, > > > + nfc_geo->metadata_size, > > > + nfc_geo->ecc_strength); > > > + } else { > > > + flips = nand_check_erased_ecc_chunk( > > > + buf + i * nfc_geo->ecc_chunk_size, > > > + nfc_geo->ecc_chunk_size, > > > + NULL, 0, > > > + NULL, 0, > > > + nfc_geo->ecc_strength); > > > + } > > > > You're not checking ECC bytes. I know it's a bit more complicated to > > implement, but as Brian stated several times, you should always check > > ECC bytes along with data bytes when testing for an erased chunk. > > > > Indeed, it might appear that the user really programmed ff on the page, > > and in this case you don't want to consider the page as erased. > > In this case, the ECC bytes will be different from ff, and you'll > > be able to identify it by checking those ECC bytes. > > Thanks for the feedback. Talking with a coworker about this we may have found a > better approach to this that is less complicated to implement. The hardware > unit allows us to set a bitflip threshold for erased pages. The ECC unit > creates an ECC error only if the number of bitflips exceeds this threshold, but > it does not correct these. So the idea is to change the patch so that we set > pages, that are signaled by the ECC as erased, to 0xff completely without > checking. So the ECC will do all the work and we completely trust in its > abilities to do it correctly. Sounds good. -- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages 2016-04-11 7:29 ` Boris Brezillon @ 2016-04-12 22:39 ` Han Xu -1 siblings, 0 replies; 32+ messages in thread From: Han Xu @ 2016-04-12 22:39 UTC (permalink / raw) To: Boris Brezillon, Markus Pargmann Cc: David Woodhouse, Fabio Estevam, linux-mtd, kernel, Huang Shijie, Brian Norris, linux-arm-kernel ________________________________________ From: Boris Brezillon <boris.brezillon@free-electrons.com> Sent: Monday, April 11, 2016 1:29 AM To: Markus Pargmann Cc: Han Xu; David Woodhouse; Fabio Estevam; linux-mtd@lists.infradead.org; kernel@pengutronix.de; Huang Shijie; Brian Norris; linux-arm-kernel@lists.infradead.org Subject: Re: [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages On Mon, 11 Apr 2016 08:34:35 +0200 Markus Pargmann <mpa@pengutronix.de> wrote: > Hi Boris, > > On Wednesday 24 February 2016 14:57:50 Boris Brezillon wrote: > > Hi Markus, > > > > On Sun, 21 Feb 2016 13:52:06 +0100 > > Markus Pargmann <mpa@pengutronix.de> wrote: > > > > > ECC is only calculated for written pages. As erased pages are not > > > actively written the ECC is always invalid. For this purpose the > > > Hardware BCH unit is able to check for erased pages and does not raise > > > an ECC error in this case. This behaviour can be influenced using the > > > BCH_MODE register which sets the number of allowed bitflips in an erased > > > page. Unfortunately the unit is not capable of fixing the bitflips in > > > memory. > > > > > > To avoid complete software checks for erased pages, we can simply check > > > buffers with uncorrectable ECC errors because we know that any erased > > > page with errors is uncorrectable by the BCH unit. > > > > > > This patch adds the generic nand_check_erased_ecc_chunk() to gpmi-nand > > > to correct erased pages. To have the valid data in the buffer before > > > using them, this patch moves the read_page_swap_end() call before the > > > ECC status checking for-loop. > > > > > > Signed-off-by: Markus Pargmann <mpa@pengutronix.de> > > > --- > > > drivers/mtd/nand/gpmi-nand/gpmi-nand.c | 49 ++++++++++++++++++++++++++++++---- > > > 1 file changed, 44 insertions(+), 5 deletions(-) > > > > > > diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c > > > index 235ddcb58f39..ce5a21252102 100644 > > > --- a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c > > > +++ b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c > > > @@ -1035,14 +1035,58 @@ static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip, > > > /* Loop over status bytes, accumulating ECC status. */ > > > status = auxiliary_virt + nfc_geo->auxiliary_status_offset; > > > > > > + read_page_swap_end(this, buf, nfc_geo->payload_size, > > > + this->payload_virt, this->payload_phys, > > > + nfc_geo->payload_size, > > > + payload_virt, payload_phys); > > > + > > > for (i = 0; i < nfc_geo->ecc_chunk_count; i++, status++) { > > > if ((*status == STATUS_GOOD) || (*status == STATUS_ERASED)) > > > continue; > > > > > > if (*status == STATUS_UNCORRECTABLE) { > > > + int flips; > > > + > > > + /* > > > + * The ECC hardware has an uncorrectable ECC status > > > + * code in case we have bitflips in an erased page. As > > > + * nothing was written into this subpage the ECC is > > > + * obviously wrong and we can not trust it. We assume > > > + * at this point that we are reading an erased page and > > > + * try to correct the bitflips in buffer up to > > > + * ecc_strength bitflips. If this is a page with random > > > + * data, we exceed this number of bitflips and have a > > > + * ECC failure. Otherwise we use the corrected buffer. > > > + */ > > > + if (i == 0) { > > > + /* The first block includes metadata */ > > > + flips = nand_check_erased_ecc_chunk( > > > + buf + i * nfc_geo->ecc_chunk_size, > > > + nfc_geo->ecc_chunk_size, > > > + NULL, 0, > > > + auxiliary_virt, > > > + nfc_geo->metadata_size, > > > + nfc_geo->ecc_strength); > > > + } else { > > > + flips = nand_check_erased_ecc_chunk( > > > + buf + i * nfc_geo->ecc_chunk_size, > > > + nfc_geo->ecc_chunk_size, > > > + NULL, 0, > > > + NULL, 0, > > > + nfc_geo->ecc_strength); > > > + } > > > > You're not checking ECC bytes. I know it's a bit more complicated to > > implement, but as Brian stated several times, you should always check > > ECC bytes along with data bytes when testing for an erased chunk. > > > > Indeed, it might appear that the user really programmed ff on the page, > > and in this case you don't want to consider the page as erased. > > In this case, the ECC bytes will be different from ff, and you'll > > be able to identify it by checking those ECC bytes. > > Thanks for the feedback. Talking with a coworker about this we may have found a > better approach to this that is less complicated to implement. The hardware > unit allows us to set a bitflip threshold for erased pages. The ECC unit > creates an ECC error only if the number of bitflips exceeds this threshold, but > it does not correct these. So the idea is to change the patch so that we set > pages, that are signaled by the ECC as erased, to 0xff completely without > checking. So the ECC will do all the work and we completely trust in its > abilities to do it correctly. Sounds good. some new platforms with new gpmi controller could check the count of 0 bits in page, refer to my patch https://patchwork.ozlabs.org/patch/587124/ But for all legacy platforms, IMO, considering bitflip is rare case, set threshold to 0 and only check the uncorrectable branch and then correct data sounds better. Setting threshold and correcting all erased page may highly impact the performance. -- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages @ 2016-04-12 22:39 ` Han Xu 0 siblings, 0 replies; 32+ messages in thread From: Han Xu @ 2016-04-12 22:39 UTC (permalink / raw) To: linux-arm-kernel ________________________________________ From: Boris Brezillon <boris.brezillon@free-electrons.com> Sent: Monday, April 11, 2016 1:29 AM To: Markus Pargmann Cc: Han Xu; David Woodhouse; Fabio Estevam; linux-mtd at lists.infradead.org; kernel at pengutronix.de; Huang Shijie; Brian Norris; linux-arm-kernel at lists.infradead.org Subject: Re: [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages On Mon, 11 Apr 2016 08:34:35 +0200 Markus Pargmann <mpa@pengutronix.de> wrote: > Hi Boris, > > On Wednesday 24 February 2016 14:57:50 Boris Brezillon wrote: > > Hi Markus, > > > > On Sun, 21 Feb 2016 13:52:06 +0100 > > Markus Pargmann <mpa@pengutronix.de> wrote: > > > > > ECC is only calculated for written pages. As erased pages are not > > > actively written the ECC is always invalid. For this purpose the > > > Hardware BCH unit is able to check for erased pages and does not raise > > > an ECC error in this case. This behaviour can be influenced using the > > > BCH_MODE register which sets the number of allowed bitflips in an erased > > > page. Unfortunately the unit is not capable of fixing the bitflips in > > > memory. > > > > > > To avoid complete software checks for erased pages, we can simply check > > > buffers with uncorrectable ECC errors because we know that any erased > > > page with errors is uncorrectable by the BCH unit. > > > > > > This patch adds the generic nand_check_erased_ecc_chunk() to gpmi-nand > > > to correct erased pages. To have the valid data in the buffer before > > > using them, this patch moves the read_page_swap_end() call before the > > > ECC status checking for-loop. > > > > > > Signed-off-by: Markus Pargmann <mpa@pengutronix.de> > > > --- > > > drivers/mtd/nand/gpmi-nand/gpmi-nand.c | 49 ++++++++++++++++++++++++++++++---- > > > 1 file changed, 44 insertions(+), 5 deletions(-) > > > > > > diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c > > > index 235ddcb58f39..ce5a21252102 100644 > > > --- a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c > > > +++ b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c > > > @@ -1035,14 +1035,58 @@ static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip, > > > /* Loop over status bytes, accumulating ECC status. */ > > > status = auxiliary_virt + nfc_geo->auxiliary_status_offset; > > > > > > + read_page_swap_end(this, buf, nfc_geo->payload_size, > > > + this->payload_virt, this->payload_phys, > > > + nfc_geo->payload_size, > > > + payload_virt, payload_phys); > > > + > > > for (i = 0; i < nfc_geo->ecc_chunk_count; i++, status++) { > > > if ((*status == STATUS_GOOD) || (*status == STATUS_ERASED)) > > > continue; > > > > > > if (*status == STATUS_UNCORRECTABLE) { > > > + int flips; > > > + > > > + /* > > > + * The ECC hardware has an uncorrectable ECC status > > > + * code in case we have bitflips in an erased page. As > > > + * nothing was written into this subpage the ECC is > > > + * obviously wrong and we can not trust it. We assume > > > + * at this point that we are reading an erased page and > > > + * try to correct the bitflips in buffer up to > > > + * ecc_strength bitflips. If this is a page with random > > > + * data, we exceed this number of bitflips and have a > > > + * ECC failure. Otherwise we use the corrected buffer. > > > + */ > > > + if (i == 0) { > > > + /* The first block includes metadata */ > > > + flips = nand_check_erased_ecc_chunk( > > > + buf + i * nfc_geo->ecc_chunk_size, > > > + nfc_geo->ecc_chunk_size, > > > + NULL, 0, > > > + auxiliary_virt, > > > + nfc_geo->metadata_size, > > > + nfc_geo->ecc_strength); > > > + } else { > > > + flips = nand_check_erased_ecc_chunk( > > > + buf + i * nfc_geo->ecc_chunk_size, > > > + nfc_geo->ecc_chunk_size, > > > + NULL, 0, > > > + NULL, 0, > > > + nfc_geo->ecc_strength); > > > + } > > > > You're not checking ECC bytes. I know it's a bit more complicated to > > implement, but as Brian stated several times, you should always check > > ECC bytes along with data bytes when testing for an erased chunk. > > > > Indeed, it might appear that the user really programmed ff on the page, > > and in this case you don't want to consider the page as erased. > > In this case, the ECC bytes will be different from ff, and you'll > > be able to identify it by checking those ECC bytes. > > Thanks for the feedback. Talking with a coworker about this we may have found a > better approach to this that is less complicated to implement. The hardware > unit allows us to set a bitflip threshold for erased pages. The ECC unit > creates an ECC error only if the number of bitflips exceeds this threshold, but > it does not correct these. So the idea is to change the patch so that we set > pages, that are signaled by the ECC as erased, to 0xff completely without > checking. So the ECC will do all the work and we completely trust in its > abilities to do it correctly. Sounds good. some new platforms with new gpmi controller could check the count of 0 bits in page, refer to my patch https://patchwork.ozlabs.org/patch/587124/ But for all legacy platforms, IMO, considering bitflip is rare case, set threshold to 0 and only check the uncorrectable branch and then correct data sounds better. Setting threshold and correcting all erased page may highly impact the performance. -- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages 2016-04-12 22:39 ` Han Xu @ 2016-04-12 22:51 ` Boris Brezillon -1 siblings, 0 replies; 32+ messages in thread From: Boris Brezillon @ 2016-04-12 22:51 UTC (permalink / raw) To: Han Xu Cc: Markus Pargmann, David Woodhouse, Fabio Estevam, linux-mtd, kernel, Huang Shijie, Brian Norris, linux-arm-kernel On Tue, 12 Apr 2016 22:39:08 +0000 Han Xu <han.xu@nxp.com> wrote: > > Thanks for the feedback. Talking with a coworker about this we may have found a > > better approach to this that is less complicated to implement. The hardware > > unit allows us to set a bitflip threshold for erased pages. The ECC unit > > creates an ECC error only if the number of bitflips exceeds this threshold, but > > it does not correct these. So the idea is to change the patch so that we set > > pages, that are signaled by the ECC as erased, to 0xff completely without > > checking. So the ECC will do all the work and we completely trust in its > > abilities to do it correctly. > > Sounds good. > > some new platforms with new gpmi controller could check the count of 0 bits in page, > refer to my patch https://patchwork.ozlabs.org/patch/587124/ > > But for all legacy platforms, IMO, considering bitflip is rare case, set threshold to 0 and > only check the uncorrectable branch and then correct data sounds better. Setting threshold > and correcting all erased page may highly impact the performance. Indeed, bitflips in erased pages is not so common, and penalizing the likely case (erased pages without any bitflips) doesn't look like a good idea in the end. You can still implement this check in software. You can have a look at nand_check_erased_ecc_chunk() [1] if you need an example, but you'll have to adapt it because your controller does not guarantees that ECC bits for a given chunk are byte aligned :-/ [1]http://lxr.free-electrons.com/source/drivers/mtd/nand/nand_base.c#L1200 -- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages @ 2016-04-12 22:51 ` Boris Brezillon 0 siblings, 0 replies; 32+ messages in thread From: Boris Brezillon @ 2016-04-12 22:51 UTC (permalink / raw) To: linux-arm-kernel On Tue, 12 Apr 2016 22:39:08 +0000 Han Xu <han.xu@nxp.com> wrote: > > Thanks for the feedback. Talking with a coworker about this we may have found a > > better approach to this that is less complicated to implement. The hardware > > unit allows us to set a bitflip threshold for erased pages. The ECC unit > > creates an ECC error only if the number of bitflips exceeds this threshold, but > > it does not correct these. So the idea is to change the patch so that we set > > pages, that are signaled by the ECC as erased, to 0xff completely without > > checking. So the ECC will do all the work and we completely trust in its > > abilities to do it correctly. > > Sounds good. > > some new platforms with new gpmi controller could check the count of 0 bits in page, > refer to my patch https://patchwork.ozlabs.org/patch/587124/ > > But for all legacy platforms, IMO, considering bitflip is rare case, set threshold to 0 and > only check the uncorrectable branch and then correct data sounds better. Setting threshold > and correcting all erased page may highly impact the performance. Indeed, bitflips in erased pages is not so common, and penalizing the likely case (erased pages without any bitflips) doesn't look like a good idea in the end. You can still implement this check in software. You can have a look at nand_check_erased_ecc_chunk() [1] if you need an example, but you'll have to adapt it because your controller does not guarantees that ECC bits for a given chunk are byte aligned :-/ [1]http://lxr.free-electrons.com/source/drivers/mtd/nand/nand_base.c#L1200 -- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages 2016-04-12 22:51 ` Boris Brezillon @ 2016-04-15 7:55 ` Markus Pargmann -1 siblings, 0 replies; 32+ messages in thread From: Markus Pargmann @ 2016-04-15 7:55 UTC (permalink / raw) To: Boris Brezillon Cc: Han Xu, David Woodhouse, Fabio Estevam, linux-mtd, kernel, Huang Shijie, Brian Norris, linux-arm-kernel, Stefan Christ, Elie De Brauwer [-- Attachment #1: Type: text/plain, Size: 2418 bytes --] On Wednesday 13 April 2016 00:51:55 Boris Brezillon wrote: > On Tue, 12 Apr 2016 22:39:08 +0000 > Han Xu <han.xu@nxp.com> wrote: > > > > Thanks for the feedback. Talking with a coworker about this we may have found a > > > better approach to this that is less complicated to implement. The hardware > > > unit allows us to set a bitflip threshold for erased pages. The ECC unit > > > creates an ECC error only if the number of bitflips exceeds this threshold, but > > > it does not correct these. So the idea is to change the patch so that we set > > > pages, that are signaled by the ECC as erased, to 0xff completely without > > > checking. So the ECC will do all the work and we completely trust in its > > > abilities to do it correctly. > > > > Sounds good. > > > > some new platforms with new gpmi controller could check the count of 0 bits in page, > > refer to my patch https://patchwork.ozlabs.org/patch/587124/ > > > > But for all legacy platforms, IMO, considering bitflip is rare case, set threshold to 0 and > > only check the uncorrectable branch and then correct data sounds better. Setting threshold > > and correcting all erased page may highly impact the performance. > > Indeed, bitflips in erased pages is not so common, and penalizing the > likely case (erased pages without any bitflips) doesn't look like a good > idea in the end. Are erased pages really read that often? I am not sure how UBI handles this, does it read every page before writing? > > You can still implement this check in software. You can have a look at > nand_check_erased_ecc_chunk() [1] if you need an example, but you'll > have to adapt it because your controller does not guarantees that ECC > bits for a given chunk are byte aligned :-/ Yes I used this function in the patch. The issue is that I am not quite sure yet where to find the raw ECC data (without rereading the page). The reference manual is not extremely clear about that, ecc data may be in the 'auxilliary data' but I am not sure that it really is available somewhere. Best Regards, Markus -- Pengutronix e.K. | | Industrial Linux Solutions | http://www.pengutronix.de/ | Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages @ 2016-04-15 7:55 ` Markus Pargmann 0 siblings, 0 replies; 32+ messages in thread From: Markus Pargmann @ 2016-04-15 7:55 UTC (permalink / raw) To: linux-arm-kernel On Wednesday 13 April 2016 00:51:55 Boris Brezillon wrote: > On Tue, 12 Apr 2016 22:39:08 +0000 > Han Xu <han.xu@nxp.com> wrote: > > > > Thanks for the feedback. Talking with a coworker about this we may have found a > > > better approach to this that is less complicated to implement. The hardware > > > unit allows us to set a bitflip threshold for erased pages. The ECC unit > > > creates an ECC error only if the number of bitflips exceeds this threshold, but > > > it does not correct these. So the idea is to change the patch so that we set > > > pages, that are signaled by the ECC as erased, to 0xff completely without > > > checking. So the ECC will do all the work and we completely trust in its > > > abilities to do it correctly. > > > > Sounds good. > > > > some new platforms with new gpmi controller could check the count of 0 bits in page, > > refer to my patch https://patchwork.ozlabs.org/patch/587124/ > > > > But for all legacy platforms, IMO, considering bitflip is rare case, set threshold to 0 and > > only check the uncorrectable branch and then correct data sounds better. Setting threshold > > and correcting all erased page may highly impact the performance. > > Indeed, bitflips in erased pages is not so common, and penalizing the > likely case (erased pages without any bitflips) doesn't look like a good > idea in the end. Are erased pages really read that often? I am not sure how UBI handles this, does it read every page before writing? > > You can still implement this check in software. You can have a look at > nand_check_erased_ecc_chunk() [1] if you need an example, but you'll > have to adapt it because your controller does not guarantees that ECC > bits for a given chunk are byte aligned :-/ Yes I used this function in the patch. The issue is that I am not quite sure yet where to find the raw ECC data (without rereading the page). The reference manual is not extremely clear about that, ecc data may be in the 'auxilliary data' but I am not sure that it really is available somewhere. Best Regards, Markus -- Pengutronix e.K. | | Industrial Linux Solutions | http://www.pengutronix.de/ | Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part. URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20160415/4a32d055/attachment.sig> ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages 2016-04-15 7:55 ` Markus Pargmann @ 2016-04-15 8:35 ` Boris Brezillon -1 siblings, 0 replies; 32+ messages in thread From: Boris Brezillon @ 2016-04-15 8:35 UTC (permalink / raw) To: Markus Pargmann Cc: Han Xu, David Woodhouse, Fabio Estevam, linux-mtd, kernel, Huang Shijie, Brian Norris, linux-arm-kernel, Stefan Christ, Elie De Brauwer, Richard Weinberger, Artem Bityutskiy Hi Markus, On Fri, 15 Apr 2016 09:55:45 +0200 Markus Pargmann <mpa@pengutronix.de> wrote: > On Wednesday 13 April 2016 00:51:55 Boris Brezillon wrote: > > On Tue, 12 Apr 2016 22:39:08 +0000 > > Han Xu <han.xu@nxp.com> wrote: > > > > > > Thanks for the feedback. Talking with a coworker about this we may have found a > > > > better approach to this that is less complicated to implement. The hardware > > > > unit allows us to set a bitflip threshold for erased pages. The ECC unit > > > > creates an ECC error only if the number of bitflips exceeds this threshold, but > > > > it does not correct these. So the idea is to change the patch so that we set > > > > pages, that are signaled by the ECC as erased, to 0xff completely without > > > > checking. So the ECC will do all the work and we completely trust in its > > > > abilities to do it correctly. > > > > > > Sounds good. > > > > > > some new platforms with new gpmi controller could check the count of 0 bits in page, > > > refer to my patch https://patchwork.ozlabs.org/patch/587124/ > > > > > > But for all legacy platforms, IMO, considering bitflip is rare case, set threshold to 0 and > > > only check the uncorrectable branch and then correct data sounds better. Setting threshold > > > and correcting all erased page may highly impact the performance. > > > > Indeed, bitflips in erased pages is not so common, and penalizing the > > likely case (erased pages without any bitflips) doesn't look like a good > > idea in the end. > > Are erased pages really read that often? Yes, it's not unusual to have those "empty pages?" checks (added Artem and Richard to get a confirmation). AFAIR, UBIFS check for empty pages in its journal heads after an unclean unmount (which happens quite often) to make sure there's no corruption. > I am not sure how UBI handles > this, does it read every page before writing? Nope, or maybe it does when you activate some extra checks. > > > > > You can still implement this check in software. You can have a look at > > nand_check_erased_ecc_chunk() [1] if you need an example, but you'll > > have to adapt it because your controller does not guarantees that ECC > > bits for a given chunk are byte aligned :-/ > > Yes I used this function in the patch. The issue is that I am not quite > sure yet where to find the raw ECC data (without rereading the page). > The reference manual is not extremely clear about that, ecc data may be > in the 'auxilliary data' but I am not sure that it really is available > somewhere. AFAIR (and I'm not sure since it was a long time ago), you don't have direct access to ECC bytes with the GPMI engine. If that's the case, you'll have to read the ECC bytes manually (moving the page pointer using ->cmdfunc(NAND_CMD_RNDOUT, column, -1)), which is a pain with this engine, because ECC bytes are not guaranteed to be byte aligned (see gpmi ->read_page_raw() implementation). Once you've retrieved ECC bytes (or bits in this case), for each ECC chunk, you can use the nand_check_erased_ecc_chunk() function (just make sure you're padding the last ECC byte of each chunk with ones so that bitflips cannot be reported on this section). Best Regards, Boris -- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages @ 2016-04-15 8:35 ` Boris Brezillon 0 siblings, 0 replies; 32+ messages in thread From: Boris Brezillon @ 2016-04-15 8:35 UTC (permalink / raw) To: linux-arm-kernel Hi Markus, On Fri, 15 Apr 2016 09:55:45 +0200 Markus Pargmann <mpa@pengutronix.de> wrote: > On Wednesday 13 April 2016 00:51:55 Boris Brezillon wrote: > > On Tue, 12 Apr 2016 22:39:08 +0000 > > Han Xu <han.xu@nxp.com> wrote: > > > > > > Thanks for the feedback. Talking with a coworker about this we may have found a > > > > better approach to this that is less complicated to implement. The hardware > > > > unit allows us to set a bitflip threshold for erased pages. The ECC unit > > > > creates an ECC error only if the number of bitflips exceeds this threshold, but > > > > it does not correct these. So the idea is to change the patch so that we set > > > > pages, that are signaled by the ECC as erased, to 0xff completely without > > > > checking. So the ECC will do all the work and we completely trust in its > > > > abilities to do it correctly. > > > > > > Sounds good. > > > > > > some new platforms with new gpmi controller could check the count of 0 bits in page, > > > refer to my patch https://patchwork.ozlabs.org/patch/587124/ > > > > > > But for all legacy platforms, IMO, considering bitflip is rare case, set threshold to 0 and > > > only check the uncorrectable branch and then correct data sounds better. Setting threshold > > > and correcting all erased page may highly impact the performance. > > > > Indeed, bitflips in erased pages is not so common, and penalizing the > > likely case (erased pages without any bitflips) doesn't look like a good > > idea in the end. > > Are erased pages really read that often? Yes, it's not unusual to have those "empty pages?" checks (added Artem and Richard to get a confirmation). AFAIR, UBIFS check for empty pages in its journal heads after an unclean unmount (which happens quite often) to make sure there's no corruption. > I am not sure how UBI handles > this, does it read every page before writing? Nope, or maybe it does when you activate some extra checks. > > > > > You can still implement this check in software. You can have a look at > > nand_check_erased_ecc_chunk() [1] if you need an example, but you'll > > have to adapt it because your controller does not guarantees that ECC > > bits for a given chunk are byte aligned :-/ > > Yes I used this function in the patch. The issue is that I am not quite > sure yet where to find the raw ECC data (without rereading the page). > The reference manual is not extremely clear about that, ecc data may be > in the 'auxilliary data' but I am not sure that it really is available > somewhere. AFAIR (and I'm not sure since it was a long time ago), you don't have direct access to ECC bytes with the GPMI engine. If that's the case, you'll have to read the ECC bytes manually (moving the page pointer using ->cmdfunc(NAND_CMD_RNDOUT, column, -1)), which is a pain with this engine, because ECC bytes are not guaranteed to be byte aligned (see gpmi ->read_page_raw() implementation). Once you've retrieved ECC bytes (or bits in this case), for each ECC chunk, you can use the nand_check_erased_ecc_chunk() function (just make sure you're padding the last ECC byte of each chunk with ones so that bitflips cannot be reported on this section). Best Regards, Boris -- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages 2016-04-15 8:35 ` Boris Brezillon @ 2016-04-15 9:35 ` Markus Pargmann -1 siblings, 0 replies; 32+ messages in thread From: Markus Pargmann @ 2016-04-15 9:35 UTC (permalink / raw) To: Boris Brezillon Cc: Han Xu, David Woodhouse, Fabio Estevam, linux-mtd, kernel, Huang Shijie, Brian Norris, linux-arm-kernel, Stefan Christ, Elie De Brauwer, Richard Weinberger, Artem Bityutskiy [-- Attachment #1: Type: text/plain, Size: 4062 bytes --] Hi Boris, On Friday 15 April 2016 10:35:08 Boris Brezillon wrote: > Hi Markus, > > On Fri, 15 Apr 2016 09:55:45 +0200 > Markus Pargmann <mpa@pengutronix.de> wrote: > > > On Wednesday 13 April 2016 00:51:55 Boris Brezillon wrote: > > > On Tue, 12 Apr 2016 22:39:08 +0000 > > > Han Xu <han.xu@nxp.com> wrote: > > > > > > > > Thanks for the feedback. Talking with a coworker about this we may have found a > > > > > better approach to this that is less complicated to implement. The hardware > > > > > unit allows us to set a bitflip threshold for erased pages. The ECC unit > > > > > creates an ECC error only if the number of bitflips exceeds this threshold, but > > > > > it does not correct these. So the idea is to change the patch so that we set > > > > > pages, that are signaled by the ECC as erased, to 0xff completely without > > > > > checking. So the ECC will do all the work and we completely trust in its > > > > > abilities to do it correctly. > > > > > > > > Sounds good. > > > > > > > > some new platforms with new gpmi controller could check the count of 0 bits in page, > > > > refer to my patch https://patchwork.ozlabs.org/patch/587124/ > > > > > > > > But for all legacy platforms, IMO, considering bitflip is rare case, set threshold to 0 and > > > > only check the uncorrectable branch and then correct data sounds better. Setting threshold > > > > and correcting all erased page may highly impact the performance. > > > > > > Indeed, bitflips in erased pages is not so common, and penalizing the > > > likely case (erased pages without any bitflips) doesn't look like a good > > > idea in the end. > > > > Are erased pages really read that often? > > Yes, it's not unusual to have those "empty pages?" checks (added Artem > and Richard to get a confirmation). AFAIR, UBIFS check for empty pages > in its journal heads after an unclean unmount (which happens quite > often) to make sure there's no corruption. > > > I am not sure how UBI handles > > this, does it read every page before writing? > > Nope, or maybe it does when you activate some extra checks. > > > > > > > > > You can still implement this check in software. You can have a look at > > > nand_check_erased_ecc_chunk() [1] if you need an example, but you'll > > > have to adapt it because your controller does not guarantees that ECC > > > bits for a given chunk are byte aligned :-/ > > > > Yes I used this function in the patch. The issue is that I am not quite > > sure yet where to find the raw ECC data (without rereading the page). > > The reference manual is not extremely clear about that, ecc data may be > > in the 'auxilliary data' but I am not sure that it really is available > > somewhere. > > AFAIR (and I'm not sure since it was a long time ago), you don't have > direct access to ECC bytes with the GPMI engine. If that's the case, > you'll have to read the ECC bytes manually (moving the page pointer > using ->cmdfunc(NAND_CMD_RNDOUT, column, -1)), which is a pain with > this engine, because ECC bytes are not guaranteed to be byte aligned > (see gpmi ->read_page_raw() implementation). > Once you've retrieved ECC bytes (or bits in this case), for each ECC > chunk, you can use the nand_check_erased_ecc_chunk() function (just make > sure you're padding the last ECC byte of each chunk with ones so that > bitflips cannot be reported on this section). Thanks for the information. So I understand that this approach is the preferred one to avoid any performance issues for normal operation. I actually won't be able to fix this patch accordingly for some time. If anyone else needs this earlier, feel free to implement it. Best Regards, Markus -- Pengutronix e.K. | | Industrial Linux Solutions | http://www.pengutronix.de/ | Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages @ 2016-04-15 9:35 ` Markus Pargmann 0 siblings, 0 replies; 32+ messages in thread From: Markus Pargmann @ 2016-04-15 9:35 UTC (permalink / raw) To: linux-arm-kernel Hi Boris, On Friday 15 April 2016 10:35:08 Boris Brezillon wrote: > Hi Markus, > > On Fri, 15 Apr 2016 09:55:45 +0200 > Markus Pargmann <mpa@pengutronix.de> wrote: > > > On Wednesday 13 April 2016 00:51:55 Boris Brezillon wrote: > > > On Tue, 12 Apr 2016 22:39:08 +0000 > > > Han Xu <han.xu@nxp.com> wrote: > > > > > > > > Thanks for the feedback. Talking with a coworker about this we may have found a > > > > > better approach to this that is less complicated to implement. The hardware > > > > > unit allows us to set a bitflip threshold for erased pages. The ECC unit > > > > > creates an ECC error only if the number of bitflips exceeds this threshold, but > > > > > it does not correct these. So the idea is to change the patch so that we set > > > > > pages, that are signaled by the ECC as erased, to 0xff completely without > > > > > checking. So the ECC will do all the work and we completely trust in its > > > > > abilities to do it correctly. > > > > > > > > Sounds good. > > > > > > > > some new platforms with new gpmi controller could check the count of 0 bits in page, > > > > refer to my patch https://patchwork.ozlabs.org/patch/587124/ > > > > > > > > But for all legacy platforms, IMO, considering bitflip is rare case, set threshold to 0 and > > > > only check the uncorrectable branch and then correct data sounds better. Setting threshold > > > > and correcting all erased page may highly impact the performance. > > > > > > Indeed, bitflips in erased pages is not so common, and penalizing the > > > likely case (erased pages without any bitflips) doesn't look like a good > > > idea in the end. > > > > Are erased pages really read that often? > > Yes, it's not unusual to have those "empty pages?" checks (added Artem > and Richard to get a confirmation). AFAIR, UBIFS check for empty pages > in its journal heads after an unclean unmount (which happens quite > often) to make sure there's no corruption. > > > I am not sure how UBI handles > > this, does it read every page before writing? > > Nope, or maybe it does when you activate some extra checks. > > > > > > > > > You can still implement this check in software. You can have a look at > > > nand_check_erased_ecc_chunk() [1] if you need an example, but you'll > > > have to adapt it because your controller does not guarantees that ECC > > > bits for a given chunk are byte aligned :-/ > > > > Yes I used this function in the patch. The issue is that I am not quite > > sure yet where to find the raw ECC data (without rereading the page). > > The reference manual is not extremely clear about that, ecc data may be > > in the 'auxilliary data' but I am not sure that it really is available > > somewhere. > > AFAIR (and I'm not sure since it was a long time ago), you don't have > direct access to ECC bytes with the GPMI engine. If that's the case, > you'll have to read the ECC bytes manually (moving the page pointer > using ->cmdfunc(NAND_CMD_RNDOUT, column, -1)), which is a pain with > this engine, because ECC bytes are not guaranteed to be byte aligned > (see gpmi ->read_page_raw() implementation). > Once you've retrieved ECC bytes (or bits in this case), for each ECC > chunk, you can use the nand_check_erased_ecc_chunk() function (just make > sure you're padding the last ECC byte of each chunk with ones so that > bitflips cannot be reported on this section). Thanks for the information. So I understand that this approach is the preferred one to avoid any performance issues for normal operation. I actually won't be able to fix this patch accordingly for some time. If anyone else needs this earlier, feel free to implement it. Best Regards, Markus -- Pengutronix e.K. | | Industrial Linux Solutions | http://www.pengutronix.de/ | Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part. URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20160415/aa2caa65/attachment.sig> ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages 2016-04-15 9:35 ` Markus Pargmann @ 2016-04-15 9:39 ` Boris Brezillon -1 siblings, 0 replies; 32+ messages in thread From: Boris Brezillon @ 2016-04-15 9:39 UTC (permalink / raw) To: Markus Pargmann Cc: Han Xu, David Woodhouse, Fabio Estevam, linux-mtd, kernel, Huang Shijie, Brian Norris, linux-arm-kernel, Stefan Christ, Elie De Brauwer, Richard Weinberger, Artem Bityutskiy Hi Markus, On Fri, 15 Apr 2016 11:35:07 +0200 Markus Pargmann <mpa@pengutronix.de> wrote: > Hi Boris, > > On Friday 15 April 2016 10:35:08 Boris Brezillon wrote: > > Hi Markus, > > > > On Fri, 15 Apr 2016 09:55:45 +0200 > > Markus Pargmann <mpa@pengutronix.de> wrote: > > > > > On Wednesday 13 April 2016 00:51:55 Boris Brezillon wrote: > > > > On Tue, 12 Apr 2016 22:39:08 +0000 > > > > Han Xu <han.xu@nxp.com> wrote: > > > > > > > > > > Thanks for the feedback. Talking with a coworker about this we may have found a > > > > > > better approach to this that is less complicated to implement. The hardware > > > > > > unit allows us to set a bitflip threshold for erased pages. The ECC unit > > > > > > creates an ECC error only if the number of bitflips exceeds this threshold, but > > > > > > it does not correct these. So the idea is to change the patch so that we set > > > > > > pages, that are signaled by the ECC as erased, to 0xff completely without > > > > > > checking. So the ECC will do all the work and we completely trust in its > > > > > > abilities to do it correctly. > > > > > > > > > > Sounds good. > > > > > > > > > > some new platforms with new gpmi controller could check the count of 0 bits in page, > > > > > refer to my patch https://patchwork.ozlabs.org/patch/587124/ > > > > > > > > > > But for all legacy platforms, IMO, considering bitflip is rare case, set threshold to 0 and > > > > > only check the uncorrectable branch and then correct data sounds better. Setting threshold > > > > > and correcting all erased page may highly impact the performance. > > > > > > > > Indeed, bitflips in erased pages is not so common, and penalizing the > > > > likely case (erased pages without any bitflips) doesn't look like a good > > > > idea in the end. > > > > > > Are erased pages really read that often? > > > > Yes, it's not unusual to have those "empty pages?" checks (added Artem > > and Richard to get a confirmation). AFAIR, UBIFS check for empty pages > > in its journal heads after an unclean unmount (which happens quite > > often) to make sure there's no corruption. > > > > > I am not sure how UBI handles > > > this, does it read every page before writing? > > > > Nope, or maybe it does when you activate some extra checks. > > > > > > > > > > > > > You can still implement this check in software. You can have a look at > > > > nand_check_erased_ecc_chunk() [1] if you need an example, but you'll > > > > have to adapt it because your controller does not guarantees that ECC > > > > bits for a given chunk are byte aligned :-/ > > > > > > Yes I used this function in the patch. The issue is that I am not quite > > > sure yet where to find the raw ECC data (without rereading the page). > > > The reference manual is not extremely clear about that, ecc data may be > > > in the 'auxilliary data' but I am not sure that it really is available > > > somewhere. > > > > AFAIR (and I'm not sure since it was a long time ago), you don't have > > direct access to ECC bytes with the GPMI engine. If that's the case, > > you'll have to read the ECC bytes manually (moving the page pointer > > using ->cmdfunc(NAND_CMD_RNDOUT, column, -1)), which is a pain with > > this engine, because ECC bytes are not guaranteed to be byte aligned > > (see gpmi ->read_page_raw() implementation). > > Once you've retrieved ECC bytes (or bits in this case), for each ECC > > chunk, you can use the nand_check_erased_ecc_chunk() function (just make > > sure you're padding the last ECC byte of each chunk with ones so that > > bitflips cannot be reported on this section). > > Thanks for the information. So I understand that this approach is the > preferred one to avoid any performance issues for normal operation. > > I actually won't be able to fix this patch accordingly for some time. If > anyone else needs this earlier, feel free to implement it. I just did [1] (it applies on top of your patch), but maybe you can test it (I don't have any imx platforms right now) ;). If these changes work, feel free to squash them into your previous patch. Thanks, Boris [1]http://code.bulix.org/bq6yyh-96549 -- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages @ 2016-04-15 9:39 ` Boris Brezillon 0 siblings, 0 replies; 32+ messages in thread From: Boris Brezillon @ 2016-04-15 9:39 UTC (permalink / raw) To: linux-arm-kernel Hi Markus, On Fri, 15 Apr 2016 11:35:07 +0200 Markus Pargmann <mpa@pengutronix.de> wrote: > Hi Boris, > > On Friday 15 April 2016 10:35:08 Boris Brezillon wrote: > > Hi Markus, > > > > On Fri, 15 Apr 2016 09:55:45 +0200 > > Markus Pargmann <mpa@pengutronix.de> wrote: > > > > > On Wednesday 13 April 2016 00:51:55 Boris Brezillon wrote: > > > > On Tue, 12 Apr 2016 22:39:08 +0000 > > > > Han Xu <han.xu@nxp.com> wrote: > > > > > > > > > > Thanks for the feedback. Talking with a coworker about this we may have found a > > > > > > better approach to this that is less complicated to implement. The hardware > > > > > > unit allows us to set a bitflip threshold for erased pages. The ECC unit > > > > > > creates an ECC error only if the number of bitflips exceeds this threshold, but > > > > > > it does not correct these. So the idea is to change the patch so that we set > > > > > > pages, that are signaled by the ECC as erased, to 0xff completely without > > > > > > checking. So the ECC will do all the work and we completely trust in its > > > > > > abilities to do it correctly. > > > > > > > > > > Sounds good. > > > > > > > > > > some new platforms with new gpmi controller could check the count of 0 bits in page, > > > > > refer to my patch https://patchwork.ozlabs.org/patch/587124/ > > > > > > > > > > But for all legacy platforms, IMO, considering bitflip is rare case, set threshold to 0 and > > > > > only check the uncorrectable branch and then correct data sounds better. Setting threshold > > > > > and correcting all erased page may highly impact the performance. > > > > > > > > Indeed, bitflips in erased pages is not so common, and penalizing the > > > > likely case (erased pages without any bitflips) doesn't look like a good > > > > idea in the end. > > > > > > Are erased pages really read that often? > > > > Yes, it's not unusual to have those "empty pages?" checks (added Artem > > and Richard to get a confirmation). AFAIR, UBIFS check for empty pages > > in its journal heads after an unclean unmount (which happens quite > > often) to make sure there's no corruption. > > > > > I am not sure how UBI handles > > > this, does it read every page before writing? > > > > Nope, or maybe it does when you activate some extra checks. > > > > > > > > > > > > > You can still implement this check in software. You can have a look at > > > > nand_check_erased_ecc_chunk() [1] if you need an example, but you'll > > > > have to adapt it because your controller does not guarantees that ECC > > > > bits for a given chunk are byte aligned :-/ > > > > > > Yes I used this function in the patch. The issue is that I am not quite > > > sure yet where to find the raw ECC data (without rereading the page). > > > The reference manual is not extremely clear about that, ecc data may be > > > in the 'auxilliary data' but I am not sure that it really is available > > > somewhere. > > > > AFAIR (and I'm not sure since it was a long time ago), you don't have > > direct access to ECC bytes with the GPMI engine. If that's the case, > > you'll have to read the ECC bytes manually (moving the page pointer > > using ->cmdfunc(NAND_CMD_RNDOUT, column, -1)), which is a pain with > > this engine, because ECC bytes are not guaranteed to be byte aligned > > (see gpmi ->read_page_raw() implementation). > > Once you've retrieved ECC bytes (or bits in this case), for each ECC > > chunk, you can use the nand_check_erased_ecc_chunk() function (just make > > sure you're padding the last ECC byte of each chunk with ones so that > > bitflips cannot be reported on this section). > > Thanks for the information. So I understand that this approach is the > preferred one to avoid any performance issues for normal operation. > > I actually won't be able to fix this patch accordingly for some time. If > anyone else needs this earlier, feel free to implement it. I just did [1] (it applies on top of your patch), but maybe you can test it (I don't have any imx platforms right now) ;). If these changes work, feel free to squash them into your previous patch. Thanks, Boris [1]http://code.bulix.org/bq6yyh-96549 -- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages 2016-04-15 9:39 ` Boris Brezillon @ 2016-04-15 12:03 ` Markus Pargmann -1 siblings, 0 replies; 32+ messages in thread From: Markus Pargmann @ 2016-04-15 12:03 UTC (permalink / raw) To: Boris Brezillon Cc: Han Xu, David Woodhouse, Fabio Estevam, linux-mtd, kernel, Huang Shijie, Brian Norris, linux-arm-kernel, Stefan Christ, Elie De Brauwer, Richard Weinberger, Artem Bityutskiy [-- Attachment #1: Type: text/plain, Size: 4997 bytes --] Hi Boris, On Friday 15 April 2016 11:39:06 Boris Brezillon wrote: > Hi Markus, > > On Fri, 15 Apr 2016 11:35:07 +0200 > Markus Pargmann <mpa@pengutronix.de> wrote: > > > Hi Boris, > > > > On Friday 15 April 2016 10:35:08 Boris Brezillon wrote: > > > Hi Markus, > > > > > > On Fri, 15 Apr 2016 09:55:45 +0200 > > > Markus Pargmann <mpa@pengutronix.de> wrote: > > > > > > > On Wednesday 13 April 2016 00:51:55 Boris Brezillon wrote: > > > > > On Tue, 12 Apr 2016 22:39:08 +0000 > > > > > Han Xu <han.xu@nxp.com> wrote: > > > > > > > > > > > > Thanks for the feedback. Talking with a coworker about this we may have found a > > > > > > > better approach to this that is less complicated to implement. The hardware > > > > > > > unit allows us to set a bitflip threshold for erased pages. The ECC unit > > > > > > > creates an ECC error only if the number of bitflips exceeds this threshold, but > > > > > > > it does not correct these. So the idea is to change the patch so that we set > > > > > > > pages, that are signaled by the ECC as erased, to 0xff completely without > > > > > > > checking. So the ECC will do all the work and we completely trust in its > > > > > > > abilities to do it correctly. > > > > > > > > > > > > Sounds good. > > > > > > > > > > > > some new platforms with new gpmi controller could check the count of 0 bits in page, > > > > > > refer to my patch https://patchwork.ozlabs.org/patch/587124/ > > > > > > > > > > > > But for all legacy platforms, IMO, considering bitflip is rare case, set threshold to 0 and > > > > > > only check the uncorrectable branch and then correct data sounds better. Setting threshold > > > > > > and correcting all erased page may highly impact the performance. > > > > > > > > > > Indeed, bitflips in erased pages is not so common, and penalizing the > > > > > likely case (erased pages without any bitflips) doesn't look like a good > > > > > idea in the end. > > > > > > > > Are erased pages really read that often? > > > > > > Yes, it's not unusual to have those "empty pages?" checks (added Artem > > > and Richard to get a confirmation). AFAIR, UBIFS check for empty pages > > > in its journal heads after an unclean unmount (which happens quite > > > often) to make sure there's no corruption. > > > > > > > I am not sure how UBI handles > > > > this, does it read every page before writing? > > > > > > Nope, or maybe it does when you activate some extra checks. > > > > > > > > > > > > > > > > > You can still implement this check in software. You can have a look at > > > > > nand_check_erased_ecc_chunk() [1] if you need an example, but you'll > > > > > have to adapt it because your controller does not guarantees that ECC > > > > > bits for a given chunk are byte aligned :-/ > > > > > > > > Yes I used this function in the patch. The issue is that I am not quite > > > > sure yet where to find the raw ECC data (without rereading the page). > > > > The reference manual is not extremely clear about that, ecc data may be > > > > in the 'auxilliary data' but I am not sure that it really is available > > > > somewhere. > > > > > > AFAIR (and I'm not sure since it was a long time ago), you don't have > > > direct access to ECC bytes with the GPMI engine. If that's the case, > > > you'll have to read the ECC bytes manually (moving the page pointer > > > using ->cmdfunc(NAND_CMD_RNDOUT, column, -1)), which is a pain with > > > this engine, because ECC bytes are not guaranteed to be byte aligned > > > (see gpmi ->read_page_raw() implementation). > > > Once you've retrieved ECC bytes (or bits in this case), for each ECC > > > chunk, you can use the nand_check_erased_ecc_chunk() function (just make > > > sure you're padding the last ECC byte of each chunk with ones so that > > > bitflips cannot be reported on this section). > > > > Thanks for the information. So I understand that this approach is the > > preferred one to avoid any performance issues for normal operation. > > > > I actually won't be able to fix this patch accordingly for some time. If > > anyone else needs this earlier, feel free to implement it. > > I just did [1] (it applies on top of your patch), but maybe you > can test it (I don't have any imx platforms right now) ;). Great, thank you :). I just tested the patch and it works for me. The erased page bitflips are still detected and fixed. I will send a new version then. Best Regards, Markus > > If these changes work, feel free to squash them into your previous > patch. > > Thanks, > > Boris > > [1]http://code.bulix.org/bq6yyh-96549 > > -- Pengutronix e.K. | | Industrial Linux Solutions | http://www.pengutronix.de/ | Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages @ 2016-04-15 12:03 ` Markus Pargmann 0 siblings, 0 replies; 32+ messages in thread From: Markus Pargmann @ 2016-04-15 12:03 UTC (permalink / raw) To: linux-arm-kernel Hi Boris, On Friday 15 April 2016 11:39:06 Boris Brezillon wrote: > Hi Markus, > > On Fri, 15 Apr 2016 11:35:07 +0200 > Markus Pargmann <mpa@pengutronix.de> wrote: > > > Hi Boris, > > > > On Friday 15 April 2016 10:35:08 Boris Brezillon wrote: > > > Hi Markus, > > > > > > On Fri, 15 Apr 2016 09:55:45 +0200 > > > Markus Pargmann <mpa@pengutronix.de> wrote: > > > > > > > On Wednesday 13 April 2016 00:51:55 Boris Brezillon wrote: > > > > > On Tue, 12 Apr 2016 22:39:08 +0000 > > > > > Han Xu <han.xu@nxp.com> wrote: > > > > > > > > > > > > Thanks for the feedback. Talking with a coworker about this we may have found a > > > > > > > better approach to this that is less complicated to implement. The hardware > > > > > > > unit allows us to set a bitflip threshold for erased pages. The ECC unit > > > > > > > creates an ECC error only if the number of bitflips exceeds this threshold, but > > > > > > > it does not correct these. So the idea is to change the patch so that we set > > > > > > > pages, that are signaled by the ECC as erased, to 0xff completely without > > > > > > > checking. So the ECC will do all the work and we completely trust in its > > > > > > > abilities to do it correctly. > > > > > > > > > > > > Sounds good. > > > > > > > > > > > > some new platforms with new gpmi controller could check the count of 0 bits in page, > > > > > > refer to my patch https://patchwork.ozlabs.org/patch/587124/ > > > > > > > > > > > > But for all legacy platforms, IMO, considering bitflip is rare case, set threshold to 0 and > > > > > > only check the uncorrectable branch and then correct data sounds better. Setting threshold > > > > > > and correcting all erased page may highly impact the performance. > > > > > > > > > > Indeed, bitflips in erased pages is not so common, and penalizing the > > > > > likely case (erased pages without any bitflips) doesn't look like a good > > > > > idea in the end. > > > > > > > > Are erased pages really read that often? > > > > > > Yes, it's not unusual to have those "empty pages?" checks (added Artem > > > and Richard to get a confirmation). AFAIR, UBIFS check for empty pages > > > in its journal heads after an unclean unmount (which happens quite > > > often) to make sure there's no corruption. > > > > > > > I am not sure how UBI handles > > > > this, does it read every page before writing? > > > > > > Nope, or maybe it does when you activate some extra checks. > > > > > > > > > > > > > > > > > You can still implement this check in software. You can have a look at > > > > > nand_check_erased_ecc_chunk() [1] if you need an example, but you'll > > > > > have to adapt it because your controller does not guarantees that ECC > > > > > bits for a given chunk are byte aligned :-/ > > > > > > > > Yes I used this function in the patch. The issue is that I am not quite > > > > sure yet where to find the raw ECC data (without rereading the page). > > > > The reference manual is not extremely clear about that, ecc data may be > > > > in the 'auxilliary data' but I am not sure that it really is available > > > > somewhere. > > > > > > AFAIR (and I'm not sure since it was a long time ago), you don't have > > > direct access to ECC bytes with the GPMI engine. If that's the case, > > > you'll have to read the ECC bytes manually (moving the page pointer > > > using ->cmdfunc(NAND_CMD_RNDOUT, column, -1)), which is a pain with > > > this engine, because ECC bytes are not guaranteed to be byte aligned > > > (see gpmi ->read_page_raw() implementation). > > > Once you've retrieved ECC bytes (or bits in this case), for each ECC > > > chunk, you can use the nand_check_erased_ecc_chunk() function (just make > > > sure you're padding the last ECC byte of each chunk with ones so that > > > bitflips cannot be reported on this section). > > > > Thanks for the information. So I understand that this approach is the > > preferred one to avoid any performance issues for normal operation. > > > > I actually won't be able to fix this patch accordingly for some time. If > > anyone else needs this earlier, feel free to implement it. > > I just did [1] (it applies on top of your patch), but maybe you > can test it (I don't have any imx platforms right now) ;). Great, thank you :). I just tested the patch and it works for me. The erased page bitflips are still detected and fixed. I will send a new version then. Best Regards, Markus > > If these changes work, feel free to squash them into your previous > patch. > > Thanks, > > Boris > > [1]http://code.bulix.org/bq6yyh-96549 > > -- Pengutronix e.K. | | Industrial Linux Solutions | http://www.pengutronix.de/ | Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part. URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20160415/1f739a1b/attachment.sig> ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages 2016-04-15 12:03 ` Markus Pargmann @ 2016-04-15 15:33 ` Han Xu -1 siblings, 0 replies; 32+ messages in thread From: Han Xu @ 2016-04-15 15:33 UTC (permalink / raw) To: Markus Pargmann, Boris Brezillon Cc: David Woodhouse, Fabio Estevam, linux-mtd, kernel, Huang Shijie, Brian Norris, linux-arm-kernel, Stefan Christ, Elie De Brauwer, Richard Weinberger, Artem Bityutskiy ________________________________________ From: Markus Pargmann <mpa@pengutronix.de> Sent: Friday, April 15, 2016 6:03 AM To: Boris Brezillon Cc: Han Xu; David Woodhouse; Fabio Estevam; linux-mtd@lists.infradead.org; kernel@pengutronix.de; Huang Shijie; Brian Norris; linux-arm-kernel@lists.infradead.org; Stefan Christ; Elie De Brauwer; Richard Weinberger; Artem Bityutskiy Subject: Re: [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages Hi Boris, On Friday 15 April 2016 11:39:06 Boris Brezillon wrote: > Hi Markus, > > On Fri, 15 Apr 2016 11:35:07 +0200 > Markus Pargmann <mpa@pengutronix.de> wrote: > > > Hi Boris, > > > > On Friday 15 April 2016 10:35:08 Boris Brezillon wrote: > > > Hi Markus, > > > > > > On Fri, 15 Apr 2016 09:55:45 +0200 > > > Markus Pargmann <mpa@pengutronix.de> wrote: > > > > > > > On Wednesday 13 April 2016 00:51:55 Boris Brezillon wrote: > > > > > On Tue, 12 Apr 2016 22:39:08 +0000 > > > > > Han Xu <han.xu@nxp.com> wrote: > > > > > > > > > > > > Thanks for the feedback. Talking with a coworker about this we may have found a > > > > > > > better approach to this that is less complicated to implement. The hardware > > > > > > > unit allows us to set a bitflip threshold for erased pages. The ECC unit > > > > > > > creates an ECC error only if the number of bitflips exceeds this threshold, but > > > > > > > it does not correct these. So the idea is to change the patch so that we set > > > > > > > pages, that are signaled by the ECC as erased, to 0xff completely without > > > > > > > checking. So the ECC will do all the work and we completely trust in its > > > > > > > abilities to do it correctly. > > > > > > > > > > > > Sounds good. > > > > > > > > > > > > some new platforms with new gpmi controller could check the count of 0 bits in page, > > > > > > refer to my patch https://patchwork.ozlabs.org/patch/587124/ > > > > > > > > > > > > But for all legacy platforms, IMO, considering bitflip is rare case, set threshold to 0 and > > > > > > only check the uncorrectable branch and then correct data sounds better. Setting threshold > > > > > > and correcting all erased page may highly impact the performance. > > > > > > > > > > Indeed, bitflips in erased pages is not so common, and penalizing the > > > > > likely case (erased pages without any bitflips) doesn't look like a good > > > > > idea in the end. > > > > > > > > Are erased pages really read that often? > > > > > > Yes, it's not unusual to have those "empty pages?" checks (added Artem > > > and Richard to get a confirmation). AFAIR, UBIFS check for empty pages > > > in its journal heads after an unclean unmount (which happens quite > > > often) to make sure there's no corruption. > > > > > > > I am not sure how UBI handles > > > > this, does it read every page before writing? > > > > > > Nope, or maybe it does when you activate some extra checks. > > > > > > > > > > > > > > > > > You can still implement this check in software. You can have a look at > > > > > nand_check_erased_ecc_chunk() [1] if you need an example, but you'll > > > > > have to adapt it because your controller does not guarantees that ECC > > > > > bits for a given chunk are byte aligned :-/ > > > > > > > > Yes I used this function in the patch. The issue is that I am not quite > > > > sure yet where to find the raw ECC data (without rereading the page). > > > > The reference manual is not extremely clear about that, ecc data may be > > > > in the 'auxilliary data' but I am not sure that it really is available > > > > somewhere. > > > > > > AFAIR (and I'm not sure since it was a long time ago), you don't have > > > direct access to ECC bytes with the GPMI engine. If that's the case, > > > you'll have to read the ECC bytes manually (moving the page pointer > > > using ->cmdfunc(NAND_CMD_RNDOUT, column, -1)), which is a pain with > > > this engine, because ECC bytes are not guaranteed to be byte aligned > > > (see gpmi ->read_page_raw() implementation). > > > Once you've retrieved ECC bytes (or bits in this case), for each ECC > > > chunk, you can use the nand_check_erased_ecc_chunk() function (just make > > > sure you're padding the last ECC byte of each chunk with ones so that > > > bitflips cannot be reported on this section). > > > > Thanks for the information. So I understand that this approach is the > > preferred one to avoid any performance issues for normal operation. > > > > I actually won't be able to fix this patch accordingly for some time. If > > anyone else needs this earlier, feel free to implement it. > > I just did [1] (it applies on top of your patch), but maybe you > can test it (I don't have any imx platforms right now) ;). Great, thank you :). I just tested the patch and it works for me. The erased page bitflips are still detected and fixed. I will send a new version then. Hi Markus, Could you please share how to verify the patch, in other words, how to reproduce the UBIFS corruption issue consistently. Thanks. Best Regards, Markus > > If these changes work, feel free to squash them into your previous > patch. > > Thanks, > > Boris > > [1]http://code.bulix.org/bq6yyh-96549 > > -- Pengutronix e.K. | | Industrial Linux Solutions | http://www.pengutronix.de/ | Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages @ 2016-04-15 15:33 ` Han Xu 0 siblings, 0 replies; 32+ messages in thread From: Han Xu @ 2016-04-15 15:33 UTC (permalink / raw) To: linux-arm-kernel ________________________________________ From: Markus Pargmann <mpa@pengutronix.de> Sent: Friday, April 15, 2016 6:03 AM To: Boris Brezillon Cc: Han Xu; David Woodhouse; Fabio Estevam; linux-mtd at lists.infradead.org; kernel at pengutronix.de; Huang Shijie; Brian Norris; linux-arm-kernel at lists.infradead.org; Stefan Christ; Elie De Brauwer; Richard Weinberger; Artem Bityutskiy Subject: Re: [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages Hi Boris, On Friday 15 April 2016 11:39:06 Boris Brezillon wrote: > Hi Markus, > > On Fri, 15 Apr 2016 11:35:07 +0200 > Markus Pargmann <mpa@pengutronix.de> wrote: > > > Hi Boris, > > > > On Friday 15 April 2016 10:35:08 Boris Brezillon wrote: > > > Hi Markus, > > > > > > On Fri, 15 Apr 2016 09:55:45 +0200 > > > Markus Pargmann <mpa@pengutronix.de> wrote: > > > > > > > On Wednesday 13 April 2016 00:51:55 Boris Brezillon wrote: > > > > > On Tue, 12 Apr 2016 22:39:08 +0000 > > > > > Han Xu <han.xu@nxp.com> wrote: > > > > > > > > > > > > Thanks for the feedback. Talking with a coworker about this we may have found a > > > > > > > better approach to this that is less complicated to implement. The hardware > > > > > > > unit allows us to set a bitflip threshold for erased pages. The ECC unit > > > > > > > creates an ECC error only if the number of bitflips exceeds this threshold, but > > > > > > > it does not correct these. So the idea is to change the patch so that we set > > > > > > > pages, that are signaled by the ECC as erased, to 0xff completely without > > > > > > > checking. So the ECC will do all the work and we completely trust in its > > > > > > > abilities to do it correctly. > > > > > > > > > > > > Sounds good. > > > > > > > > > > > > some new platforms with new gpmi controller could check the count of 0 bits in page, > > > > > > refer to my patch https://patchwork.ozlabs.org/patch/587124/ > > > > > > > > > > > > But for all legacy platforms, IMO, considering bitflip is rare case, set threshold to 0 and > > > > > > only check the uncorrectable branch and then correct data sounds better. Setting threshold > > > > > > and correcting all erased page may highly impact the performance. > > > > > > > > > > Indeed, bitflips in erased pages is not so common, and penalizing the > > > > > likely case (erased pages without any bitflips) doesn't look like a good > > > > > idea in the end. > > > > > > > > Are erased pages really read that often? > > > > > > Yes, it's not unusual to have those "empty pages?" checks (added Artem > > > and Richard to get a confirmation). AFAIR, UBIFS check for empty pages > > > in its journal heads after an unclean unmount (which happens quite > > > often) to make sure there's no corruption. > > > > > > > I am not sure how UBI handles > > > > this, does it read every page before writing? > > > > > > Nope, or maybe it does when you activate some extra checks. > > > > > > > > > > > > > > > > > You can still implement this check in software. You can have a look at > > > > > nand_check_erased_ecc_chunk() [1] if you need an example, but you'll > > > > > have to adapt it because your controller does not guarantees that ECC > > > > > bits for a given chunk are byte aligned :-/ > > > > > > > > Yes I used this function in the patch. The issue is that I am not quite > > > > sure yet where to find the raw ECC data (without rereading the page). > > > > The reference manual is not extremely clear about that, ecc data may be > > > > in the 'auxilliary data' but I am not sure that it really is available > > > > somewhere. > > > > > > AFAIR (and I'm not sure since it was a long time ago), you don't have > > > direct access to ECC bytes with the GPMI engine. If that's the case, > > > you'll have to read the ECC bytes manually (moving the page pointer > > > using ->cmdfunc(NAND_CMD_RNDOUT, column, -1)), which is a pain with > > > this engine, because ECC bytes are not guaranteed to be byte aligned > > > (see gpmi ->read_page_raw() implementation). > > > Once you've retrieved ECC bytes (or bits in this case), for each ECC > > > chunk, you can use the nand_check_erased_ecc_chunk() function (just make > > > sure you're padding the last ECC byte of each chunk with ones so that > > > bitflips cannot be reported on this section). > > > > Thanks for the information. So I understand that this approach is the > > preferred one to avoid any performance issues for normal operation. > > > > I actually won't be able to fix this patch accordingly for some time. If > > anyone else needs this earlier, feel free to implement it. > > I just did [1] (it applies on top of your patch), but maybe you > can test it (I don't have any imx platforms right now) ;). Great, thank you :). I just tested the patch and it works for me. The erased page bitflips are still detected and fixed. I will send a new version then. Hi Markus, Could you please share how to verify the patch, in other words, how to reproduce the UBIFS corruption issue consistently. Thanks. Best Regards, Markus > > If these changes work, feel free to squash them into your previous > patch. > > Thanks, > > Boris > > [1]http://code.bulix.org/bq6yyh-96549 > > -- Pengutronix e.K. | | Industrial Linux Solutions | http://www.pengutronix.de/ | Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages 2016-04-15 15:33 ` Han Xu @ 2016-04-15 15:40 ` Boris Brezillon -1 siblings, 0 replies; 32+ messages in thread From: Boris Brezillon @ 2016-04-15 15:40 UTC (permalink / raw) To: Han Xu Cc: Markus Pargmann, David Woodhouse, Fabio Estevam, linux-mtd, kernel, Huang Shijie, Brian Norris, linux-arm-kernel, Stefan Christ, Elie De Brauwer, Richard Weinberger, Artem Bityutskiy On Fri, 15 Apr 2016 15:33:07 +0000 Han Xu <han.xu@nxp.com> wrote: > > > > I just did [1] (it applies on top of your patch), but maybe you > > can test it (I don't have any imx platforms right now) ;). > > Great, thank you :). I just tested the patch and it works for me. The > erased page bitflips are still detected and fixed. I will send a new > version then. > > Hi Markus, > > Could you please share how to verify the patch, in other words, how to reproduce the > UBIFS corruption issue consistently. Thanks. I should really post a new version of the nandflipbits tool [1]. That's clearly the easiest solution to verify that your bitflip correction is reliable. [1]http://lists.infradead.org/pipermail/linux-mtd/2014-November/056634.html -- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages @ 2016-04-15 15:40 ` Boris Brezillon 0 siblings, 0 replies; 32+ messages in thread From: Boris Brezillon @ 2016-04-15 15:40 UTC (permalink / raw) To: linux-arm-kernel On Fri, 15 Apr 2016 15:33:07 +0000 Han Xu <han.xu@nxp.com> wrote: > > > > I just did [1] (it applies on top of your patch), but maybe you > > can test it (I don't have any imx platforms right now) ;). > > Great, thank you :). I just tested the patch and it works for me. The > erased page bitflips are still detected and fixed. I will send a new > version then. > > Hi Markus, > > Could you please share how to verify the patch, in other words, how to reproduce the > UBIFS corruption issue consistently. Thanks. I should really post a new version of the nandflipbits tool [1]. That's clearly the easiest solution to verify that your bitflip correction is reliable. [1]http://lists.infradead.org/pipermail/linux-mtd/2014-November/056634.html -- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages 2016-04-15 15:40 ` Boris Brezillon @ 2016-04-18 10:07 ` Markus Pargmann -1 siblings, 0 replies; 32+ messages in thread From: Markus Pargmann @ 2016-04-18 10:07 UTC (permalink / raw) To: linux-arm-kernel Cc: Boris Brezillon, Han Xu, Fabio Estevam, Elie De Brauwer, Artem Bityutskiy, Stefan Christ, Huang Shijie, linux-mtd, kernel, Richard Weinberger, Brian Norris, David Woodhouse [-- Attachment #1: Type: text/plain, Size: 1485 bytes --] On Friday 15 April 2016 17:40:31 Boris Brezillon wrote: > On Fri, 15 Apr 2016 15:33:07 +0000 > Han Xu <han.xu@nxp.com> wrote: > > > > > > > I just did [1] (it applies on top of your patch), but maybe you > > > can test it (I don't have any imx platforms right now) ;). > > > > Great, thank you :). I just tested the patch and it works for me. The > > erased page bitflips are still detected and fixed. I will send a new > > version then. > > > > Hi Markus, > > > > Could you please share how to verify the patch, in other words, how to reproduce the > > UBIFS corruption issue consistently. Thanks. I used a simple bashscript with 'nandwrite --noecc' that writes a number of zeroes in every sub-page. For written pages the ECC will handle the flips. For erased pages we can test the algorithm that handles the erased page bitflips. > > I should really post a new version of the nandflipbits tool [1]. > That's clearly the easiest solution to verify that your bitflip > correction is reliable. Seems really useful. Best Regards, Markus > > > [1]http://lists.infradead.org/pipermail/linux-mtd/2014-November/056634.html > > -- Pengutronix e.K. | | Industrial Linux Solutions | http://www.pengutronix.de/ | Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages @ 2016-04-18 10:07 ` Markus Pargmann 0 siblings, 0 replies; 32+ messages in thread From: Markus Pargmann @ 2016-04-18 10:07 UTC (permalink / raw) To: linux-arm-kernel On Friday 15 April 2016 17:40:31 Boris Brezillon wrote: > On Fri, 15 Apr 2016 15:33:07 +0000 > Han Xu <han.xu@nxp.com> wrote: > > > > > > > I just did [1] (it applies on top of your patch), but maybe you > > > can test it (I don't have any imx platforms right now) ;). > > > > Great, thank you :). I just tested the patch and it works for me. The > > erased page bitflips are still detected and fixed. I will send a new > > version then. > > > > Hi Markus, > > > > Could you please share how to verify the patch, in other words, how to reproduce the > > UBIFS corruption issue consistently. Thanks. I used a simple bashscript with 'nandwrite --noecc' that writes a number of zeroes in every sub-page. For written pages the ECC will handle the flips. For erased pages we can test the algorithm that handles the erased page bitflips. > > I should really post a new version of the nandflipbits tool [1]. > That's clearly the easiest solution to verify that your bitflip > correction is reliable. Seems really useful. Best Regards, Markus > > > [1]http://lists.infradead.org/pipermail/linux-mtd/2014-November/056634.html > > -- Pengutronix e.K. | | Industrial Linux Solutions | http://www.pengutronix.de/ | Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part. URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20160418/b0f1222c/attachment-0001.sig> ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages 2016-04-15 9:39 ` Boris Brezillon @ 2016-04-18 14:47 ` Stefan Christ -1 siblings, 0 replies; 32+ messages in thread From: Stefan Christ @ 2016-04-18 14:47 UTC (permalink / raw) To: Boris Brezillon Cc: Markus Pargmann, Fabio Estevam, Elie De Brauwer, Artem Bityutskiy, linux-mtd, kernel, Richard Weinberger, Huang Shijie, Han Xu, Brian Norris, David Woodhouse, linux-arm-kernel Hi Boris, On Fri, Apr 15, 2016 at 11:39:06AM +0200, Boris Brezillon wrote: > Hi Markus, > > On Fri, 15 Apr 2016 11:35:07 +0200 > Markus Pargmann <mpa@pengutronix.de> wrote: > > > Hi Boris, > > > > On Friday 15 April 2016 10:35:08 Boris Brezillon wrote: > > > Hi Markus, > > > > > > On Fri, 15 Apr 2016 09:55:45 +0200 > > > Markus Pargmann <mpa@pengutronix.de> wrote: > > > > > > > On Wednesday 13 April 2016 00:51:55 Boris Brezillon wrote: > > > > > On Tue, 12 Apr 2016 22:39:08 +0000 > > > > > Han Xu <han.xu@nxp.com> wrote: > > > > > > > > > > > > Thanks for the feedback. Talking with a coworker about this we may have found a > > > > > > > better approach to this that is less complicated to implement. The hardware > > > > > > > unit allows us to set a bitflip threshold for erased pages. The ECC unit > > > > > > > creates an ECC error only if the number of bitflips exceeds this threshold, but > > > > > > > it does not correct these. So the idea is to change the patch so that we set > > > > > > > pages, that are signaled by the ECC as erased, to 0xff completely without > > > > > > > checking. So the ECC will do all the work and we completely trust in its > > > > > > > abilities to do it correctly. > > > > > > > > > > > > Sounds good. > > > > > > > > > > > > some new platforms with new gpmi controller could check the count of 0 bits in page, > > > > > > refer to my patch https://patchwork.ozlabs.org/patch/587124/ > > > > > > > > > > > > But for all legacy platforms, IMO, considering bitflip is rare case, set threshold to 0 and > > > > > > only check the uncorrectable branch and then correct data sounds better. Setting threshold > > > > > > and correcting all erased page may highly impact the performance. > > > > > > > > > > Indeed, bitflips in erased pages is not so common, and penalizing the > > > > > likely case (erased pages without any bitflips) doesn't look like a good > > > > > idea in the end. > > > > > > > > Are erased pages really read that often? > > > > > > Yes, it's not unusual to have those "empty pages?" checks (added Artem > > > and Richard to get a confirmation). AFAIR, UBIFS check for empty pages > > > in its journal heads after an unclean unmount (which happens quite > > > often) to make sure there's no corruption. > > > > > > > I am not sure how UBI handles > > > > this, does it read every page before writing? > > > > > > Nope, or maybe it does when you activate some extra checks. > > > > > > > > > > > > > > > > > You can still implement this check in software. You can have a look at > > > > > nand_check_erased_ecc_chunk() [1] if you need an example, but you'll > > > > > have to adapt it because your controller does not guarantees that ECC > > > > > bits for a given chunk are byte aligned :-/ > > > > > > > > Yes I used this function in the patch. The issue is that I am not quite > > > > sure yet where to find the raw ECC data (without rereading the page). > > > > The reference manual is not extremely clear about that, ecc data may be > > > > in the 'auxilliary data' but I am not sure that it really is available > > > > somewhere. > > > > > > AFAIR (and I'm not sure since it was a long time ago), you don't have > > > direct access to ECC bytes with the GPMI engine. If that's the case, > > > you'll have to read the ECC bytes manually (moving the page pointer > > > using ->cmdfunc(NAND_CMD_RNDOUT, column, -1)), which is a pain with > > > this engine, because ECC bytes are not guaranteed to be byte aligned > > > (see gpmi ->read_page_raw() implementation). > > > Once you've retrieved ECC bytes (or bits in this case), for each ECC > > > chunk, you can use the nand_check_erased_ecc_chunk() function (just make > > > sure you're padding the last ECC byte of each chunk with ones so that > > > bitflips cannot be reported on this section). > > > > Thanks for the information. So I understand that this approach is the > > preferred one to avoid any performance issues for normal operation. > > > > I actually won't be able to fix this patch accordingly for some time. If > > anyone else needs this earlier, feel free to implement it. > > I just did [1] (it applies on top of your patch), but maybe you > can test it (I don't have any imx platforms right now) ;). > > If these changes work, feel free to squash them into your previous > patch. I've tested your diff onto Markus Pargmann's patch. It looks promising. However I've noticed that the calculation of the ECC parity bits position is wrong. It doesn't consider the extra metadata bytes at the beginning of the raw page and that the ECC parity bits are at the end of the ECC chunk. My test platform is the i.MX6 with two NAND flashes nand: Samsung NAND 1GiB 3,3V 8-bit nand: 1024 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64 (-> 104 Bits ECC ) and nand: AMD/Spansion S34ML08G2 nand: 1024 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 128 (-> 234 Bits ECC ) I've also tested the bit alignment code. It works correctly for the Spansion NAND, as the 234 Bits of ECC are 29.25 Bytes on the NAND flash. So there the parity bits are not byte aligned. Mit freundlichen Grüßen / Kind regards, Stefan Christ The corrected ECC parity bit code is: ---->8---- diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c index 2f16d7f..ccae6e6 100644 --- a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c +++ b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c @@ -1054,7 +1054,9 @@ static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip, int flips; /* Read ECC bytes into our internal raw_buffer */ - offset = ((8 * nfc_geo->ecc_chunk_size) + eccbits) * i; + offset = nfc_geo->metadata_size * 8; + offset += ((8 * nfc_geo->ecc_chunk_size) + eccbits) * (i + 1); + offset -= eccbits; bitoffset = offset % 8; eccbytes = DIV_ROUND_UP(offset + eccbits, 8); offset /= 8; ^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages @ 2016-04-18 14:47 ` Stefan Christ 0 siblings, 0 replies; 32+ messages in thread From: Stefan Christ @ 2016-04-18 14:47 UTC (permalink / raw) To: linux-arm-kernel Hi Boris, On Fri, Apr 15, 2016 at 11:39:06AM +0200, Boris Brezillon wrote: > Hi Markus, > > On Fri, 15 Apr 2016 11:35:07 +0200 > Markus Pargmann <mpa@pengutronix.de> wrote: > > > Hi Boris, > > > > On Friday 15 April 2016 10:35:08 Boris Brezillon wrote: > > > Hi Markus, > > > > > > On Fri, 15 Apr 2016 09:55:45 +0200 > > > Markus Pargmann <mpa@pengutronix.de> wrote: > > > > > > > On Wednesday 13 April 2016 00:51:55 Boris Brezillon wrote: > > > > > On Tue, 12 Apr 2016 22:39:08 +0000 > > > > > Han Xu <han.xu@nxp.com> wrote: > > > > > > > > > > > > Thanks for the feedback. Talking with a coworker about this we may have found a > > > > > > > better approach to this that is less complicated to implement. The hardware > > > > > > > unit allows us to set a bitflip threshold for erased pages. The ECC unit > > > > > > > creates an ECC error only if the number of bitflips exceeds this threshold, but > > > > > > > it does not correct these. So the idea is to change the patch so that we set > > > > > > > pages, that are signaled by the ECC as erased, to 0xff completely without > > > > > > > checking. So the ECC will do all the work and we completely trust in its > > > > > > > abilities to do it correctly. > > > > > > > > > > > > Sounds good. > > > > > > > > > > > > some new platforms with new gpmi controller could check the count of 0 bits in page, > > > > > > refer to my patch https://patchwork.ozlabs.org/patch/587124/ > > > > > > > > > > > > But for all legacy platforms, IMO, considering bitflip is rare case, set threshold to 0 and > > > > > > only check the uncorrectable branch and then correct data sounds better. Setting threshold > > > > > > and correcting all erased page may highly impact the performance. > > > > > > > > > > Indeed, bitflips in erased pages is not so common, and penalizing the > > > > > likely case (erased pages without any bitflips) doesn't look like a good > > > > > idea in the end. > > > > > > > > Are erased pages really read that often? > > > > > > Yes, it's not unusual to have those "empty pages?" checks (added Artem > > > and Richard to get a confirmation). AFAIR, UBIFS check for empty pages > > > in its journal heads after an unclean unmount (which happens quite > > > often) to make sure there's no corruption. > > > > > > > I am not sure how UBI handles > > > > this, does it read every page before writing? > > > > > > Nope, or maybe it does when you activate some extra checks. > > > > > > > > > > > > > > > > > You can still implement this check in software. You can have a look at > > > > > nand_check_erased_ecc_chunk() [1] if you need an example, but you'll > > > > > have to adapt it because your controller does not guarantees that ECC > > > > > bits for a given chunk are byte aligned :-/ > > > > > > > > Yes I used this function in the patch. The issue is that I am not quite > > > > sure yet where to find the raw ECC data (without rereading the page). > > > > The reference manual is not extremely clear about that, ecc data may be > > > > in the 'auxilliary data' but I am not sure that it really is available > > > > somewhere. > > > > > > AFAIR (and I'm not sure since it was a long time ago), you don't have > > > direct access to ECC bytes with the GPMI engine. If that's the case, > > > you'll have to read the ECC bytes manually (moving the page pointer > > > using ->cmdfunc(NAND_CMD_RNDOUT, column, -1)), which is a pain with > > > this engine, because ECC bytes are not guaranteed to be byte aligned > > > (see gpmi ->read_page_raw() implementation). > > > Once you've retrieved ECC bytes (or bits in this case), for each ECC > > > chunk, you can use the nand_check_erased_ecc_chunk() function (just make > > > sure you're padding the last ECC byte of each chunk with ones so that > > > bitflips cannot be reported on this section). > > > > Thanks for the information. So I understand that this approach is the > > preferred one to avoid any performance issues for normal operation. > > > > I actually won't be able to fix this patch accordingly for some time. If > > anyone else needs this earlier, feel free to implement it. > > I just did [1] (it applies on top of your patch), but maybe you > can test it (I don't have any imx platforms right now) ;). > > If these changes work, feel free to squash them into your previous > patch. I've tested your diff onto Markus Pargmann's patch. It looks promising. However I've noticed that the calculation of the ECC parity bits position is wrong. It doesn't consider the extra metadata bytes at the beginning of the raw page and that the ECC parity bits are at the end of the ECC chunk. My test platform is the i.MX6 with two NAND flashes nand: Samsung NAND 1GiB 3,3V 8-bit nand: 1024 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64 (-> 104 Bits ECC ) and nand: AMD/Spansion S34ML08G2 nand: 1024 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 128 (-> 234 Bits ECC ) I've also tested the bit alignment code. It works correctly for the Spansion NAND, as the 234 Bits of ECC are 29.25 Bytes on the NAND flash. So there the parity bits are not byte aligned. Mit freundlichen Gr??en / Kind regards, Stefan Christ The corrected ECC parity bit code is: ---->8---- diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c index 2f16d7f..ccae6e6 100644 --- a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c +++ b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c @@ -1054,7 +1054,9 @@ static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip, int flips; /* Read ECC bytes into our internal raw_buffer */ - offset = ((8 * nfc_geo->ecc_chunk_size) + eccbits) * i; + offset = nfc_geo->metadata_size * 8; + offset += ((8 * nfc_geo->ecc_chunk_size) + eccbits) * (i + 1); + offset -= eccbits; bitoffset = offset % 8; eccbytes = DIV_ROUND_UP(offset + eccbits, 8); offset /= 8; ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages 2016-04-18 14:47 ` Stefan Christ @ 2016-04-18 15:10 ` Boris Brezillon -1 siblings, 0 replies; 32+ messages in thread From: Boris Brezillon @ 2016-04-18 15:10 UTC (permalink / raw) To: Stefan Christ Cc: Markus Pargmann, Fabio Estevam, Elie De Brauwer, Artem Bityutskiy, linux-mtd, kernel, Richard Weinberger, Huang Shijie, Brian Norris, David Woodhouse, linux-arm-kernel On Mon, 18 Apr 2016 16:47:20 +0200 Stefan Christ <s.christ@phytec.de> wrote: > Hi Boris, > > On Fri, Apr 15, 2016 at 11:39:06AM +0200, Boris Brezillon wrote: > > Hi Markus, > > > > On Fri, 15 Apr 2016 11:35:07 +0200 > > Markus Pargmann <mpa@pengutronix.de> wrote: > > > > > Hi Boris, > > > > > > On Friday 15 April 2016 10:35:08 Boris Brezillon wrote: > > > > Hi Markus, > > > > > > > > On Fri, 15 Apr 2016 09:55:45 +0200 > > > > Markus Pargmann <mpa@pengutronix.de> wrote: > > > > > > > > > On Wednesday 13 April 2016 00:51:55 Boris Brezillon wrote: > > > > > > On Tue, 12 Apr 2016 22:39:08 +0000 > > > > > > Han Xu <han.xu@nxp.com> wrote: > > > > > > > > > > > > > > Thanks for the feedback. Talking with a coworker about this we may have found a > > > > > > > > better approach to this that is less complicated to implement. The hardware > > > > > > > > unit allows us to set a bitflip threshold for erased pages. The ECC unit > > > > > > > > creates an ECC error only if the number of bitflips exceeds this threshold, but > > > > > > > > it does not correct these. So the idea is to change the patch so that we set > > > > > > > > pages, that are signaled by the ECC as erased, to 0xff completely without > > > > > > > > checking. So the ECC will do all the work and we completely trust in its > > > > > > > > abilities to do it correctly. > > > > > > > > > > > > > > Sounds good. > > > > > > > > > > > > > > some new platforms with new gpmi controller could check the count of 0 bits in page, > > > > > > > refer to my patch https://patchwork.ozlabs.org/patch/587124/ > > > > > > > > > > > > > > But for all legacy platforms, IMO, considering bitflip is rare case, set threshold to 0 and > > > > > > > only check the uncorrectable branch and then correct data sounds better. Setting threshold > > > > > > > and correcting all erased page may highly impact the performance. > > > > > > > > > > > > Indeed, bitflips in erased pages is not so common, and penalizing the > > > > > > likely case (erased pages without any bitflips) doesn't look like a good > > > > > > idea in the end. > > > > > > > > > > Are erased pages really read that often? > > > > > > > > Yes, it's not unusual to have those "empty pages?" checks (added Artem > > > > and Richard to get a confirmation). AFAIR, UBIFS check for empty pages > > > > in its journal heads after an unclean unmount (which happens quite > > > > often) to make sure there's no corruption. > > > > > > > > > I am not sure how UBI handles > > > > > this, does it read every page before writing? > > > > > > > > Nope, or maybe it does when you activate some extra checks. > > > > > > > > > > > > > > > > > > > > > You can still implement this check in software. You can have a look at > > > > > > nand_check_erased_ecc_chunk() [1] if you need an example, but you'll > > > > > > have to adapt it because your controller does not guarantees that ECC > > > > > > bits for a given chunk are byte aligned :-/ > > > > > > > > > > Yes I used this function in the patch. The issue is that I am not quite > > > > > sure yet where to find the raw ECC data (without rereading the page). > > > > > The reference manual is not extremely clear about that, ecc data may be > > > > > in the 'auxilliary data' but I am not sure that it really is available > > > > > somewhere. > > > > > > > > AFAIR (and I'm not sure since it was a long time ago), you don't have > > > > direct access to ECC bytes with the GPMI engine. If that's the case, > > > > you'll have to read the ECC bytes manually (moving the page pointer > > > > using ->cmdfunc(NAND_CMD_RNDOUT, column, -1)), which is a pain with > > > > this engine, because ECC bytes are not guaranteed to be byte aligned > > > > (see gpmi ->read_page_raw() implementation). > > > > Once you've retrieved ECC bytes (or bits in this case), for each ECC > > > > chunk, you can use the nand_check_erased_ecc_chunk() function (just make > > > > sure you're padding the last ECC byte of each chunk with ones so that > > > > bitflips cannot be reported on this section). > > > > > > Thanks for the information. So I understand that this approach is the > > > preferred one to avoid any performance issues for normal operation. > > > > > > I actually won't be able to fix this patch accordingly for some time. If > > > anyone else needs this earlier, feel free to implement it. > > > > I just did [1] (it applies on top of your patch), but maybe you > > can test it (I don't have any imx platforms right now) ;). > > > > If these changes work, feel free to squash them into your previous > > patch. > > I've tested your diff onto Markus Pargmann's patch. It looks promising. > > However I've noticed that the calculation of the ECC parity bits position is > wrong. It doesn't consider the extra metadata bytes at the beginning of the > raw page and that the ECC parity bits are at the end of the ECC chunk. Oh, you're right. Thanks for fixing that. > My test > platform is the i.MX6 with two NAND flashes > > nand: Samsung NAND 1GiB 3,3V 8-bit > nand: 1024 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64 > (-> 104 Bits ECC ) > > and > > nand: AMD/Spansion S34ML08G2 > nand: 1024 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 128 > (-> 234 Bits ECC ) > > I've also tested the bit alignment code. It works correctly for the Spansion > NAND, as the 234 Bits of ECC are 29.25 Bytes on the NAND flash. So there the > parity bits are not byte aligned. And thanks for testing. Markus, if you resubmit your patch, please take Stefan's changes. > > Mit freundlichen Grüßen / Kind regards, > Stefan Christ > > The corrected ECC parity bit code is: > > ---->8---- > diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c > index 2f16d7f..ccae6e6 100644 > --- a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c > +++ b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c > @@ -1054,7 +1054,9 @@ static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip, > int flips; > > /* Read ECC bytes into our internal raw_buffer */ > - offset = ((8 * nfc_geo->ecc_chunk_size) + eccbits) * i; > + offset = nfc_geo->metadata_size * 8; > + offset += ((8 * nfc_geo->ecc_chunk_size) + eccbits) * (i + 1); > + offset -= eccbits; > bitoffset = offset % 8; > eccbytes = DIV_ROUND_UP(offset + eccbits, 8); > offset /= 8; -- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages @ 2016-04-18 15:10 ` Boris Brezillon 0 siblings, 0 replies; 32+ messages in thread From: Boris Brezillon @ 2016-04-18 15:10 UTC (permalink / raw) To: linux-arm-kernel On Mon, 18 Apr 2016 16:47:20 +0200 Stefan Christ <s.christ@phytec.de> wrote: > Hi Boris, > > On Fri, Apr 15, 2016 at 11:39:06AM +0200, Boris Brezillon wrote: > > Hi Markus, > > > > On Fri, 15 Apr 2016 11:35:07 +0200 > > Markus Pargmann <mpa@pengutronix.de> wrote: > > > > > Hi Boris, > > > > > > On Friday 15 April 2016 10:35:08 Boris Brezillon wrote: > > > > Hi Markus, > > > > > > > > On Fri, 15 Apr 2016 09:55:45 +0200 > > > > Markus Pargmann <mpa@pengutronix.de> wrote: > > > > > > > > > On Wednesday 13 April 2016 00:51:55 Boris Brezillon wrote: > > > > > > On Tue, 12 Apr 2016 22:39:08 +0000 > > > > > > Han Xu <han.xu@nxp.com> wrote: > > > > > > > > > > > > > > Thanks for the feedback. Talking with a coworker about this we may have found a > > > > > > > > better approach to this that is less complicated to implement. The hardware > > > > > > > > unit allows us to set a bitflip threshold for erased pages. The ECC unit > > > > > > > > creates an ECC error only if the number of bitflips exceeds this threshold, but > > > > > > > > it does not correct these. So the idea is to change the patch so that we set > > > > > > > > pages, that are signaled by the ECC as erased, to 0xff completely without > > > > > > > > checking. So the ECC will do all the work and we completely trust in its > > > > > > > > abilities to do it correctly. > > > > > > > > > > > > > > Sounds good. > > > > > > > > > > > > > > some new platforms with new gpmi controller could check the count of 0 bits in page, > > > > > > > refer to my patch https://patchwork.ozlabs.org/patch/587124/ > > > > > > > > > > > > > > But for all legacy platforms, IMO, considering bitflip is rare case, set threshold to 0 and > > > > > > > only check the uncorrectable branch and then correct data sounds better. Setting threshold > > > > > > > and correcting all erased page may highly impact the performance. > > > > > > > > > > > > Indeed, bitflips in erased pages is not so common, and penalizing the > > > > > > likely case (erased pages without any bitflips) doesn't look like a good > > > > > > idea in the end. > > > > > > > > > > Are erased pages really read that often? > > > > > > > > Yes, it's not unusual to have those "empty pages?" checks (added Artem > > > > and Richard to get a confirmation). AFAIR, UBIFS check for empty pages > > > > in its journal heads after an unclean unmount (which happens quite > > > > often) to make sure there's no corruption. > > > > > > > > > I am not sure how UBI handles > > > > > this, does it read every page before writing? > > > > > > > > Nope, or maybe it does when you activate some extra checks. > > > > > > > > > > > > > > > > > > > > > You can still implement this check in software. You can have a look at > > > > > > nand_check_erased_ecc_chunk() [1] if you need an example, but you'll > > > > > > have to adapt it because your controller does not guarantees that ECC > > > > > > bits for a given chunk are byte aligned :-/ > > > > > > > > > > Yes I used this function in the patch. The issue is that I am not quite > > > > > sure yet where to find the raw ECC data (without rereading the page). > > > > > The reference manual is not extremely clear about that, ecc data may be > > > > > in the 'auxilliary data' but I am not sure that it really is available > > > > > somewhere. > > > > > > > > AFAIR (and I'm not sure since it was a long time ago), you don't have > > > > direct access to ECC bytes with the GPMI engine. If that's the case, > > > > you'll have to read the ECC bytes manually (moving the page pointer > > > > using ->cmdfunc(NAND_CMD_RNDOUT, column, -1)), which is a pain with > > > > this engine, because ECC bytes are not guaranteed to be byte aligned > > > > (see gpmi ->read_page_raw() implementation). > > > > Once you've retrieved ECC bytes (or bits in this case), for each ECC > > > > chunk, you can use the nand_check_erased_ecc_chunk() function (just make > > > > sure you're padding the last ECC byte of each chunk with ones so that > > > > bitflips cannot be reported on this section). > > > > > > Thanks for the information. So I understand that this approach is the > > > preferred one to avoid any performance issues for normal operation. > > > > > > I actually won't be able to fix this patch accordingly for some time. If > > > anyone else needs this earlier, feel free to implement it. > > > > I just did [1] (it applies on top of your patch), but maybe you > > can test it (I don't have any imx platforms right now) ;). > > > > If these changes work, feel free to squash them into your previous > > patch. > > I've tested your diff onto Markus Pargmann's patch. It looks promising. > > However I've noticed that the calculation of the ECC parity bits position is > wrong. It doesn't consider the extra metadata bytes at the beginning of the > raw page and that the ECC parity bits are at the end of the ECC chunk. Oh, you're right. Thanks for fixing that. > My test > platform is the i.MX6 with two NAND flashes > > nand: Samsung NAND 1GiB 3,3V 8-bit > nand: 1024 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64 > (-> 104 Bits ECC ) > > and > > nand: AMD/Spansion S34ML08G2 > nand: 1024 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 128 > (-> 234 Bits ECC ) > > I've also tested the bit alignment code. It works correctly for the Spansion > NAND, as the 234 Bits of ECC are 29.25 Bytes on the NAND flash. So there the > parity bits are not byte aligned. And thanks for testing. Markus, if you resubmit your patch, please take Stefan's changes. > > Mit freundlichen Gr??en / Kind regards, > Stefan Christ > > The corrected ECC parity bit code is: > > ---->8---- > diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c > index 2f16d7f..ccae6e6 100644 > --- a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c > +++ b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c > @@ -1054,7 +1054,9 @@ static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip, > int flips; > > /* Read ECC bytes into our internal raw_buffer */ > - offset = ((8 * nfc_geo->ecc_chunk_size) + eccbits) * i; > + offset = nfc_geo->metadata_size * 8; > + offset += ((8 * nfc_geo->ecc_chunk_size) + eccbits) * (i + 1); > + offset -= eccbits; > bitoffset = offset % 8; > eccbytes = DIV_ROUND_UP(offset + eccbits, 8); > offset /= 8; -- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2016-04-18 15:10 UTC | newest] Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-02-21 12:52 [PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages Markus Pargmann 2016-02-21 12:52 ` Markus Pargmann 2016-02-24 13:57 ` Boris Brezillon 2016-02-24 13:57 ` Boris Brezillon 2016-04-11 6:34 ` Markus Pargmann 2016-04-11 6:34 ` Markus Pargmann 2016-04-11 7:29 ` Boris Brezillon 2016-04-11 7:29 ` Boris Brezillon 2016-04-12 22:39 ` Han Xu 2016-04-12 22:39 ` Han Xu 2016-04-12 22:51 ` Boris Brezillon 2016-04-12 22:51 ` Boris Brezillon 2016-04-15 7:55 ` Markus Pargmann 2016-04-15 7:55 ` Markus Pargmann 2016-04-15 8:35 ` Boris Brezillon 2016-04-15 8:35 ` Boris Brezillon 2016-04-15 9:35 ` Markus Pargmann 2016-04-15 9:35 ` Markus Pargmann 2016-04-15 9:39 ` Boris Brezillon 2016-04-15 9:39 ` Boris Brezillon 2016-04-15 12:03 ` Markus Pargmann 2016-04-15 12:03 ` Markus Pargmann 2016-04-15 15:33 ` Han Xu 2016-04-15 15:33 ` Han Xu 2016-04-15 15:40 ` Boris Brezillon 2016-04-15 15:40 ` Boris Brezillon 2016-04-18 10:07 ` Markus Pargmann 2016-04-18 10:07 ` Markus Pargmann 2016-04-18 14:47 ` Stefan Christ 2016-04-18 14:47 ` Stefan Christ 2016-04-18 15:10 ` Boris Brezillon 2016-04-18 15:10 ` Boris Brezillon
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.