From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755470AbdCXIHO (ORCPT ); Fri, 24 Mar 2017 04:07:14 -0400 Received: from mail.free-electrons.com ([62.4.15.54]:33727 "EHLO mail.free-electrons.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753072AbdCXIHI (ORCPT ); Fri, 24 Mar 2017 04:07:08 -0400 Date: Fri, 24 Mar 2017 09:06:59 +0100 From: Boris Brezillon To: Masahiro Yamada Cc: linux-mtd@lists.infradead.org, Laurent Monat , thorsten.christiansson@idquantique.com, Enrico Jorns , Jason Roberts , Artem Bityutskiy , Dinh Nguyen , Marek Vasut , Brian Norris , Graham Moore , David Woodhouse , Masami Hiramatsu , Chuanxiao Dong , Jassi Brar , Linux Kernel Mailing List , Richard Weinberger , Cyrille Pitchen Subject: Re: [PATCH v2 10/53] mtd: nand: denali: fix erased page checking Message-ID: <20170324090659.76178a37@bbrezillon> In-Reply-To: References: <1490191680-14481-1-git-send-email-yamada.masahiro@socionext.com> <1490191680-14481-11-git-send-email-yamada.masahiro@socionext.com> <20170322215623.531e9690@bbrezillon> <20170323085644.338a3a81@bbrezillon> X-Mailer: Claws Mail 3.13.2 (GTK+ 2.24.30; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 24 Mar 2017 11:43:43 +0900 Masahiro Yamada wrote: > Hi Boris, > > > 2017-03-23 16:56 GMT+09:00 Boris Brezillon : > > On Thu, 23 Mar 2017 14:04:44 +0900 > > Masahiro Yamada wrote: > > > >> Hi Boris, > >> > >> 2017-03-23 5:56 GMT+09:00 Boris Brezillon : > >> > On Wed, 22 Mar 2017 23:07:17 +0900 > >> > Masahiro Yamada wrote: > >> >> dev_err(denali->dev, > >> >> @@ -1148,12 +1136,15 @@ static int denali_read_page(struct mtd_info *mtd, struct nand_chip *chip, > >> >> if (check_erased_page) { > >> >> read_oob_data(mtd, chip->oob_poi, denali->page); > >> >> > >> >> - /* check ECC failures that may have occurred on erased pages */ > >> >> - if (check_erased_page) { > >> >> - if (!is_erased(buf, mtd->writesize)) > >> >> - mtd->ecc_stats.failed++; > >> >> - if (!is_erased(buf, mtd->oobsize)) > >> >> - mtd->ecc_stats.failed++; > >> >> + stat = nand_check_erased_ecc_chunk( > >> >> + buf, mtd->writesize, > >> >> + chip->oob_poi, mtd->oobsize, > >> >> + NULL, 0, > >> >> + chip->ecc.strength * chip->ecc.steps); > >> > > >> > That's not how it's supposed to be done. Each chunk should be checked > >> > independently. Here is a simple example explaining why this is > >> > important: > >> > > >> > Let's consider the following setup: > >> > - 4k pages > >> > - 16bits/1024bytes ECC > >> > > >> > With your approach, you turn this into: > >> > - 4k pages > >> > - 64bits/4096bytes ECC > >> > > >> > Now suppose you have 32 bitflips in the first 1024 bytes. The real ECC > >> > config is expected to report uncorrectable errors, but your approach > >> > will just report that 32 bits have been fixed, which is wrong. > >> > >> > >> OK. How about adding a helper like follows: > >> > >> static int denali_check_erased_page(struct mtd_info *mtd, > >> struct nand_chip *chip, uint8_t *buf) > >> { > >> uint8_t *ecc_code = chip->buffers->ecccode; > >> int ecc_steps = chip->ecc.steps; > >> int ecc_size = chip->ecc.size; > >> int ecc_bytes = chip->ecc.bytes; > >> int i, ret; > >> > >> ret = mtd_ooblayout_get_eccbytes(mtd, ecc_code, chip->oob_poi, 0, > >> chip->ecc.total); > >> if (ret) > >> return ret; > >> > >> for (i = 0; i < ecc_steps; i++) { > >> ret = nand_check_erased_ecc_chunk(buf, ecc_size, > >> ecc_code, ecc_bytes, > >> NULL, 0, > >> chip->ecc.strength); > >> if (ret < 0) > >> return ret; > >> buf += ecc_size; > >> ecc_code += ecc_bytes; > >> } > >> > >> return 0; > >> } > >> > >> > >> > >> Then, > >> > >> stat = denali_check_erased_page(mtd, chip, buf); > >> if (stat < 0) { > >> mtd->ecc_stats.failed++; > >> /* return 0 for uncorrectable bitflips */ > >> stat = 0; > >> } > > > > What's the point of checking all ECC chunks if only one contains ECC > > errors? I really recommend to put the nand_check_erased_ecc_chunk() > > call next to the per-ECC-block correction test. > > > OK. I can fix it for software ECC fixup. > > > What should I do for hardware ECC fixup case? > http://patchwork.ozlabs.org/patch/742321/ > > > If at least one ECC sector fails to correct bit-flips, > the controller sets INTR__ECC_UNCOR_ERR flag. > > > In this case, we can not know the number of uncorrectable errors. > > Possible solutions are: > > - Increment ecc_stats.failed only by one (compromised solution) Let's go for this solution. > > > > Also, mtd->ecc_stats.failed is supposed to be incremented each time an > > uncorrectable error is detected. In your denali_sw_ecc_fixup() > > implementation you can detect errors at the ECC chunk level, so you > > should increment ecc_stats.failed for each failure and not once if at > > least one chunk is faulty. > > > Yes, I can do this for denali_sw_ecc_fixup(). > > Can I ask what disadvantage would happen > if ecc_stats.failed / .corrected is incremented only by one, > where actually errors happen in multiple sectors. Reporting wrong stats, which is not such a big deal, but let's try to keep them correct when we can (the SW ECC fixup case).