All of lore.kernel.org
 help / color / mirror / Atom feed
* fsl_ifc_nand: are blank pages protected by ECC?
@ 2017-04-19 12:13 Pavel Machek
  2017-04-19 21:18 ` Boris Brezillon
  0 siblings, 1 reply; 42+ messages in thread
From: Pavel Machek @ 2017-04-19 12:13 UTC (permalink / raw)
  To: boris.brezillon, richard, dwmw2, computersforpeace, marek.vasut,
	cyrille.pitchen, linux-mtd, linux-kernel, mark.marshall, b44839,
	prabhakar

[-- Attachment #1: Type: text/plain, Size: 2069 bytes --]

Hi!

We have some problems with fsl_ifc_nand ... in the old kernels, but
this one does not seem to be fixed in v4.11, either.

UBIFS complains:

UBIFS error (pid 931): ubifs_scan: corrupt empty space at LEB 282:252630
UBIFS error (pid 931): ubifs_scanned_corruption: corruption at LEB 282:252630
UBIFS error (pid 931): ubifs_scanned_corruption: first 1322 bytes from LEB 282:252630
UBIFS error (pid 931): ubifs_scan: LEB 282 scanning failed

Possible explanation is here:

https://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/716/t/289605

# I see on the forum that this issue has been raised before - my
# understanding is that the omap2 nand driver does not perform ECC
# detection/correction on empty pages so when UBIFS checks the empty
# space data and doesn't read all 0xFF then it fails and mounts
# read-only. I didn't find any good solution - only a workaround to
# remove the UBIFS check..

So I checked fsl_ifc_nand.c in v4.11-rc, and yes, it seems to have the
same problem:

			if (errors == 15) {
	                        /*
                                 * Uncorrectable error.
                                 * OK only if the whole page is blank.
                                 *
                                 * We disable ECCER reporting due to...
                                 * erratum IFC-A002770 -- so report it now if we
                                 * see an uncorrectable error in ECCSTAT.
                                 */
                                if (!is_blank(mtd, bufnum))
                                        ctrl->nand_stat |=
                                                IFC_NAND_EVTER_STAT_ECCER;
                                break;
                        }

is_blank() checks for all 0xff's, so single-bit 0xfe in the data will
result in_blank() == 0 and uncorrectable error being signaled.

Should the driver be modified somehow?

Thanks,
									Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: fsl_ifc_nand: are blank pages protected by ECC?
  2017-04-19 12:13 fsl_ifc_nand: are blank pages protected by ECC? Pavel Machek
@ 2017-04-19 21:18 ` Boris Brezillon
  2017-04-19 22:15   ` Pavel Machek
  2017-04-21 10:08   ` fsl_ifc_nand: are blank pages protected by ECC? Pavel Machek
  0 siblings, 2 replies; 42+ messages in thread
From: Boris Brezillon @ 2017-04-19 21:18 UTC (permalink / raw)
  To: Pavel Machek
  Cc: richard, dwmw2, computersforpeace, marek.vasut, cyrille.pitchen,
	linux-mtd, linux-kernel, mark.marshall, b44839, prabhakar

On Wed, 19 Apr 2017 14:13:32 +0200
Pavel Machek <pavel@ucw.cz> wrote:

> Hi!
> 
> We have some problems with fsl_ifc_nand ... in the old kernels, but
> this one does not seem to be fixed in v4.11, either.
> 
> UBIFS complains:
> 
> UBIFS error (pid 931): ubifs_scan: corrupt empty space at LEB 282:252630
> UBIFS error (pid 931): ubifs_scanned_corruption: corruption at LEB 282:252630
> UBIFS error (pid 931): ubifs_scanned_corruption: first 1322 bytes from LEB 282:252630
> UBIFS error (pid 931): ubifs_scan: LEB 282 scanning failed
> 
> Possible explanation is here:
> 
> https://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/716/t/289605
> 
> # I see on the forum that this issue has been raised before - my
> # understanding is that the omap2 nand driver does not perform ECC
> # detection/correction on empty pages so when UBIFS checks the empty
> # space data and doesn't read all 0xFF then it fails and mounts
> # read-only. I didn't find any good solution - only a workaround to
> # remove the UBIFS check..
> 
> So I checked fsl_ifc_nand.c in v4.11-rc, and yes, it seems to have the
> same problem:
> 
> 			if (errors == 15) {
> 	                        /*
>                                  * Uncorrectable error.
>                                  * OK only if the whole page is blank.
>                                  *
>                                  * We disable ECCER reporting due to...
>                                  * erratum IFC-A002770 -- so report it now if we
>                                  * see an uncorrectable error in ECCSTAT.
>                                  */
>                                 if (!is_blank(mtd, bufnum))
>                                         ctrl->nand_stat |=
>                                                 IFC_NAND_EVTER_STAT_ECCER;
>                                 break;
>                         }
> 
> is_blank() checks for all 0xff's, so single-bit 0xfe in the data will
> result in_blank() == 0 and uncorrectable error being signaled.
> 
> Should the driver be modified somehow?

Yep, nand_check_erased_ecc_chunk() [1] is here to help you check this
case, unfortunately, it's not directly applicable here, because this
function takes regular pointers and not __iomem ones. You'll either
have to copy the data in an intermediate buffer before calling
nand_check_erased_ecc_chunk(), or cast the SRAM region to a void
pointer (which is usually not a good idea). The last option would be to
open code nand_check_erased_ecc_chunk(), but I'd really like to avoid
that (for maintainability concerns).

[1]http://lxr.free-electrons.com/source/drivers/mtd/nand/nand_base.c#L1414

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: fsl_ifc_nand: are blank pages protected by ECC?
  2017-04-19 21:18 ` Boris Brezillon
@ 2017-04-19 22:15   ` Pavel Machek
  2017-04-19 22:27     ` Boris Brezillon
  2017-04-21 10:08   ` fsl_ifc_nand: are blank pages protected by ECC? Pavel Machek
  1 sibling, 1 reply; 42+ messages in thread
From: Pavel Machek @ 2017-04-19 22:15 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: richard, dwmw2, computersforpeace, marek.vasut, cyrille.pitchen,
	linux-mtd, linux-kernel, mark.marshall, b44839, prabhakar

[-- Attachment #1: Type: text/plain, Size: 4142 bytes --]

Hi!

> > We have some problems with fsl_ifc_nand ... in the old kernels, but
> > this one does not seem to be fixed in v4.11, either.
> > 
> > UBIFS complains:
> > 
> > UBIFS error (pid 931): ubifs_scan: corrupt empty space at LEB 282:252630
> > UBIFS error (pid 931): ubifs_scanned_corruption: corruption at LEB 282:252630
> > UBIFS error (pid 931): ubifs_scanned_corruption: first 1322 bytes from LEB 282:252630
> > UBIFS error (pid 931): ubifs_scan: LEB 282 scanning failed
> > 
> > Possible explanation is here:
> > 
> > https://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/716/t/289605
> > 
> > # I see on the forum that this issue has been raised before - my
> > # understanding is that the omap2 nand driver does not perform ECC
> > # detection/correction on empty pages so when UBIFS checks the empty
> > # space data and doesn't read all 0xFF then it fails and mounts
> > # read-only. I didn't find any good solution - only a workaround to
> > # remove the UBIFS check..
> > 
> > So I checked fsl_ifc_nand.c in v4.11-rc, and yes, it seems to have the
> > same problem:
> > 
> > 			if (errors == 15) {
> > 	                        /*
> >                                  * Uncorrectable error.
> >                                  * OK only if the whole page is blank.
> >                                  *
> >                                  * We disable ECCER reporting due to...
> >                                  * erratum IFC-A002770 -- so report it now if we
> >                                  * see an uncorrectable error in ECCSTAT.
> >                                  */
> >                                 if (!is_blank(mtd, bufnum))
> >                                         ctrl->nand_stat |=
> >                                                 IFC_NAND_EVTER_STAT_ECCER;
> >                                 break;
> >                         }
> > 
> > is_blank() checks for all 0xff's, so single-bit 0xfe in the data will
> > result in_blank() == 0 and uncorrectable error being signaled.
> > 
> > Should the driver be modified somehow?
> 
> Yep, nand_check_erased_ecc_chunk() [1] is here to help you check this
> case, unfortunately, it's not directly applicable here, because this
> function takes regular pointers and not __iomem ones. You'll either
> have to copy the data in an intermediate buffer before calling
> nand_check_erased_ecc_chunk(), or cast the SRAM region to a void
> pointer (which is usually not a good idea). The last option would be to
> open code nand_check_erased_ecc_chunk(), but I'd really like to avoid
> that (for maintainability concerns).

Ok, thanks a lot for the pointer, that should be doable.

Core of the code is:

1357         for (; len >= sizeof(long);
1358              len -= sizeof(long), bitmap += sizeof(long)) {
1359                 weight = hweight_long(*((unsigned long
*)bitmap));
1360                 bitflips += BITS_PER_LONG - weight;
1361                 if (unlikely(bitflips > bitflips_threshold))
1362                         return -EBADMSG;
1363         }

Someone clearly optimized this code (took care to do long accesses
etc), but afaict hweight is quite a heavy operation:

_GLOBAL(__arch_hweight32)
BEGIN_FTR_SECTION
        b __sw_hweight32
        nop
        nop
        nop
        nop
	nop
	nop
FTR_SECTION_ELSE
  BEGIN_FTR_SECTION_NESTED(51)
	PPC_POPCNTB(R3,R3)
	srdi    r4,r3,16
        add     r3,r4,r3
        srdi    r4,r3,8
	add     r3,r4,r3
	clrldi  r3,r3,64-8
	blr
  FTR_SECTION_ELSE_NESTED(51)
	PPC_POPCNTW(R3,R3)
	clrldi  r3,r3,64-8
	blr
  ALT_FTR_SECTION_END_NESTED_IFCLR(CPU_FTR_POPCNTD, 51)
ALT_FTR_SECTION_END_IFCLR(CPU_FTR_POPCNTB)
EXPORT_SYMBOL(__arch_hweight32)

Would it make sense to only do hweight if *bitmap != ~0ULL ? Would it
make sense to only check for bitflips > bitflips_threshold each 128
bytes or something like that?

Thanks and best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: fsl_ifc_nand: are blank pages protected by ECC?
  2017-04-19 22:15   ` Pavel Machek
@ 2017-04-19 22:27     ` Boris Brezillon
  2017-04-20 11:40       ` Pavel Machek
  0 siblings, 1 reply; 42+ messages in thread
From: Boris Brezillon @ 2017-04-19 22:27 UTC (permalink / raw)
  To: Pavel Machek
  Cc: richard, dwmw2, computersforpeace, marek.vasut, cyrille.pitchen,
	linux-mtd, linux-kernel, mark.marshall, b44839, prabhakar

On Thu, 20 Apr 2017 00:15:07 +0200
Pavel Machek <pavel@ucw.cz> wrote:

> Hi!
> 
> > > We have some problems with fsl_ifc_nand ... in the old kernels, but
> > > this one does not seem to be fixed in v4.11, either.
> > > 
> > > UBIFS complains:
> > > 
> > > UBIFS error (pid 931): ubifs_scan: corrupt empty space at LEB 282:252630
> > > UBIFS error (pid 931): ubifs_scanned_corruption: corruption at LEB 282:252630
> > > UBIFS error (pid 931): ubifs_scanned_corruption: first 1322 bytes from LEB 282:252630
> > > UBIFS error (pid 931): ubifs_scan: LEB 282 scanning failed
> > > 
> > > Possible explanation is here:
> > > 
> > > https://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/716/t/289605
> > > 
> > > # I see on the forum that this issue has been raised before - my
> > > # understanding is that the omap2 nand driver does not perform ECC
> > > # detection/correction on empty pages so when UBIFS checks the empty
> > > # space data and doesn't read all 0xFF then it fails and mounts
> > > # read-only. I didn't find any good solution - only a workaround to
> > > # remove the UBIFS check..
> > > 
> > > So I checked fsl_ifc_nand.c in v4.11-rc, and yes, it seems to have the
> > > same problem:
> > > 
> > > 			if (errors == 15) {
> > > 	                        /*
> > >                                  * Uncorrectable error.
> > >                                  * OK only if the whole page is blank.
> > >                                  *
> > >                                  * We disable ECCER reporting due to...
> > >                                  * erratum IFC-A002770 -- so report it now if we
> > >                                  * see an uncorrectable error in ECCSTAT.
> > >                                  */
> > >                                 if (!is_blank(mtd, bufnum))
> > >                                         ctrl->nand_stat |=
> > >                                                 IFC_NAND_EVTER_STAT_ECCER;
> > >                                 break;
> > >                         }
> > > 
> > > is_blank() checks for all 0xff's, so single-bit 0xfe in the data will
> > > result in_blank() == 0 and uncorrectable error being signaled.
> > > 
> > > Should the driver be modified somehow?  
> > 
> > Yep, nand_check_erased_ecc_chunk() [1] is here to help you check this
> > case, unfortunately, it's not directly applicable here, because this
> > function takes regular pointers and not __iomem ones. You'll either
> > have to copy the data in an intermediate buffer before calling
> > nand_check_erased_ecc_chunk(), or cast the SRAM region to a void
> > pointer (which is usually not a good idea). The last option would be to
> > open code nand_check_erased_ecc_chunk(), but I'd really like to avoid
> > that (for maintainability concerns).  
> 
> Ok, thanks a lot for the pointer, that should be doable.
> 
> Core of the code is:
> 
> 1357         for (; len >= sizeof(long);
> 1358              len -= sizeof(long), bitmap += sizeof(long)) {
> 1359                 weight = hweight_long(*((unsigned long
> *)bitmap));
> 1360                 bitflips += BITS_PER_LONG - weight;
> 1361                 if (unlikely(bitflips > bitflips_threshold))
> 1362                         return -EBADMSG;
> 1363         }
> 
> Someone clearly optimized this code (took care to do long accesses
> etc), but afaict hweight is quite a heavy operation:
> 
> _GLOBAL(__arch_hweight32)
> BEGIN_FTR_SECTION
>         b __sw_hweight32
>         nop
>         nop
>         nop
>         nop
> 	nop
> 	nop
> FTR_SECTION_ELSE
>   BEGIN_FTR_SECTION_NESTED(51)
> 	PPC_POPCNTB(R3,R3)
> 	srdi    r4,r3,16
>         add     r3,r4,r3
>         srdi    r4,r3,8
> 	add     r3,r4,r3
> 	clrldi  r3,r3,64-8
> 	blr
>   FTR_SECTION_ELSE_NESTED(51)
> 	PPC_POPCNTW(R3,R3)
> 	clrldi  r3,r3,64-8
> 	blr
>   ALT_FTR_SECTION_END_NESTED_IFCLR(CPU_FTR_POPCNTD, 51)
> ALT_FTR_SECTION_END_IFCLR(CPU_FTR_POPCNTB)
> EXPORT_SYMBOL(__arch_hweight32)
> 
> Would it make sense to only do hweight if *bitmap != ~0ULL ? Would it
> make sense to only check for bitflips > bitflips_threshold each 128
> bytes or something like that?

I didn't go as far as you did and simply assumed hweight32/64() were
already optimized. Feel free to propose extra improvements.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: fsl_ifc_nand: are blank pages protected by ECC?
  2017-04-19 22:27     ` Boris Brezillon
@ 2017-04-20 11:40       ` Pavel Machek
  2017-04-20 12:15         ` Boris Brezillon
  2017-04-21 10:51         ` [PATCH] nand_base: optimize checking of erased buffers Pavel Machek
  0 siblings, 2 replies; 42+ messages in thread
From: Pavel Machek @ 2017-04-20 11:40 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: richard, dwmw2, computersforpeace, marek.vasut, cyrille.pitchen,
	linux-mtd, linux-kernel, mark.marshall, b44839, prabhakar

[-- Attachment #1: Type: text/plain, Size: 1366 bytes --]

Hi!

> > Would it make sense to only do hweight if *bitmap != ~0ULL ? Would it
> > make sense to only check for bitflips > bitflips_threshold each 128
> > bytes or something like that?
> 
> I didn't go as far as you did and simply assumed hweight32/64() were
> already optimized. Feel free to propose extra improvements.

I'd propose this one (only compile tested, sorry, not sure how to test
this one). If we see ~0UL, there's no need for hweight, and no need to
check number of bitflips. So this should be net win.

Signed-off-by: Pavel Machek <pavel@denx.de>

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index b0524f8..96c27ec 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -1357,7 +1357,10 @@ static int nand_check_erased_buf(void *buf, int len, int bitflips_threshold)
 
 	for (; len >= sizeof(long);
 	     len -= sizeof(long), bitmap += sizeof(long)) {
-		weight = hweight_long(*((unsigned long *)bitmap));
+		unsigned long d = *((unsigned long *)bitmap);
+		if (d == ~0UL)
+			continue;
+		weight = hweight_long(d);
 		bitflips += BITS_PER_LONG - weight;
 		if (unlikely(bitflips > bitflips_threshold))
 			return -EBADMSG;

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: fsl_ifc_nand: are blank pages protected by ECC?
  2017-04-20 11:40       ` Pavel Machek
@ 2017-04-20 12:15         ` Boris Brezillon
  2017-04-21 10:51         ` [PATCH] nand_base: optimize checking of erased buffers Pavel Machek
  1 sibling, 0 replies; 42+ messages in thread
From: Boris Brezillon @ 2017-04-20 12:15 UTC (permalink / raw)
  To: Pavel Machek
  Cc: richard, dwmw2, computersforpeace, marek.vasut, cyrille.pitchen,
	linux-mtd, linux-kernel, mark.marshall, b44839, prabhakar

On Thu, 20 Apr 2017 13:40:57 +0200
Pavel Machek <pavel@ucw.cz> wrote:

> Hi!
> 
> > > Would it make sense to only do hweight if *bitmap != ~0ULL ? Would it
> > > make sense to only check for bitflips > bitflips_threshold each 128
> > > bytes or something like that?  
> > 
> > I didn't go as far as you did and simply assumed hweight32/64() were
> > already optimized. Feel free to propose extra improvements.  
> 
> I'd propose this one (only compile tested, sorry, not sure how to test
> this one). If we see ~0UL, there's no need for hweight, and no need to
> check number of bitflips. So this should be net win.

Looks good to me. Can you send a patch with a real commit message?

Thanks,

Boris

> 
> Signed-off-by: Pavel Machek <pavel@denx.de>
> 
> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> index b0524f8..96c27ec 100644
> --- a/drivers/mtd/nand/nand_base.c
> +++ b/drivers/mtd/nand/nand_base.c
> @@ -1357,7 +1357,10 @@ static int nand_check_erased_buf(void *buf, int len, int bitflips_threshold)
>  
>  	for (; len >= sizeof(long);
>  	     len -= sizeof(long), bitmap += sizeof(long)) {
> -		weight = hweight_long(*((unsigned long *)bitmap));
> +		unsigned long d = *((unsigned long *)bitmap);
> +		if (d == ~0UL)
> +			continue;
> +		weight = hweight_long(d);
>  		bitflips += BITS_PER_LONG - weight;
>  		if (unlikely(bitflips > bitflips_threshold))
>  			return -EBADMSG;
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: fsl_ifc_nand: are blank pages protected by ECC?
  2017-04-19 21:18 ` Boris Brezillon
  2017-04-19 22:15   ` Pavel Machek
@ 2017-04-21 10:08   ` Pavel Machek
  2017-04-21 10:12     ` Richard Weinberger
                       ` (2 more replies)
  1 sibling, 3 replies; 42+ messages in thread
From: Pavel Machek @ 2017-04-21 10:08 UTC (permalink / raw)
  To: Boris Brezillon, Dipen.Dudhat
  Cc: richard, dwmw2, computersforpeace, marek.vasut, cyrille.pitchen,
	linux-mtd, linux-kernel, mark.marshall, b44839, prabhakar

[-- Attachment #1: Type: text/plain, Size: 4521 bytes --]

Hi!

(Added driver author to the cc list, maybe he can help).

> > Hi!
> > 
> > We have some problems with fsl_ifc_nand ... in the old kernels, but
> > this one does not seem to be fixed in v4.11, either.
> > 
> > UBIFS complains:
> > 
> > UBIFS error (pid 931): ubifs_scan: corrupt empty space at LEB 282:252630
> > UBIFS error (pid 931): ubifs_scanned_corruption: corruption at LEB 282:252630
> > UBIFS error (pid 931): ubifs_scanned_corruption: first 1322 bytes from LEB 282:252630
> > UBIFS error (pid 931): ubifs_scan: LEB 282 scanning failed
> > 
> > Possible explanation is here:
> > 
> > https://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/716/t/289605
> > 
> > # I see on the forum that this issue has been raised before - my
> > # understanding is that the omap2 nand driver does not perform ECC
> > # detection/correction on empty pages so when UBIFS checks the empty
> > # space data and doesn't read all 0xFF then it fails and mounts
> > # read-only. I didn't find any good solution - only a workaround to
> > # remove the UBIFS check..
> > 
> > So I checked fsl_ifc_nand.c in v4.11-rc, and yes, it seems to have the
> > same problem:
> > 
> > 			if (errors == 15) {
> > 	                        /*
> >                                  * Uncorrectable error.
> >                                  * OK only if the whole page is blank.
> >                                  *
> >                                  * We disable ECCER reporting due to...
> >                                  * erratum IFC-A002770 -- so report it now if we
> >                                  * see an uncorrectable error in ECCSTAT.
> >                                  */
> >                                 if (!is_blank(mtd, bufnum))
> >                                         ctrl->nand_stat |=
> >                                                 IFC_NAND_EVTER_STAT_ECCER;
> >                                 break;
> >                         }
> > 
> > is_blank() checks for all 0xff's, so single-bit 0xfe in the data will
> > result in_blank() == 0 and uncorrectable error being signaled.
> > 
> > Should the driver be modified somehow?
> 
> Yep, nand_check_erased_ecc_chunk() [1] is here to help you check this
> case, unfortunately, it's not directly applicable here, because this
> function takes regular pointers and not __iomem ones. You'll either
> have to copy the data in an intermediate buffer before calling
> nand_check_erased_ecc_chunk(), or cast the SRAM region to a void
> pointer (which is usually not a good idea). The last option would be to
> open code nand_check_erased_ecc_chunk(), but I'd really like to avoid
> that (for maintainability concerns).

Ok, took a look. __iomem is part of a problem, another part is that
nand_check_erased_ecc_chunk() needs to actually write back 0xff's to
undo the corruption, which would probably be bad idea to do in the
iomem, and next one is that blank actually checks arbitrary number of
regions, based on ecc.layout.

So this could be used to simplify the code (if nand_check_erased_buf
was exported; it is not), but it does not fix the problem as we still
need to undo the corruption.

Hints welcome, especially if you know right place where to put this
checking.

(BTW, switching to ecc.mode = ECC_SOFT will cause compatibility
problems but should make the problem go away, right?) 

Thanks,
								Pavel

diff --git a/drivers/mtd/nand/fsl_ifc_nand.c b/drivers/mtd/nand/fsl_ifc_nand.c
index d1570f5..df02d4c 100644
--- a/drivers/mtd/nand/fsl_ifc_nand.c
+++ b/drivers/mtd/nand/fsl_ifc_nand.c
@@ -181,17 +181,15 @@ static int is_blank(struct mtd_info *mtd, unsigned int bufnum)
 	struct mtd_oob_region oobregion = { };
 	int i, section = 0;
 
-	for (i = 0; i < mtd->writesize / 4; i++) {
-		if (__raw_readl(&mainarea[i]) != 0xffffffff)
-			return 0;
-	}
+	i = nand_check_erased_buf(&mainarea[i], mtd->writesize, 0);
+	if (i)
+		return 0;
 
 	mtd_ooblayout_ecc(mtd, section++, &oobregion);
 	while (oobregion.length) {
-		for (i = 0; i < oobregion.length; i++) {
-			if (__raw_readb(&oob[oobregion.offset + i]) != 0xff)
-				return 0;
-		}
+		i = nand_check_erased_buf(&oob[oobregion.offset], oobregion.length, 0);
+		if (i)
+			return 0;
 
 		mtd_ooblayout_ecc(mtd, section++, &oobregion);
 	}




-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: fsl_ifc_nand: are blank pages protected by ECC?
  2017-04-21 10:08   ` fsl_ifc_nand: are blank pages protected by ECC? Pavel Machek
@ 2017-04-21 10:12     ` Richard Weinberger
  2017-04-21 12:04     ` Boris Brezillon
  2017-04-21 13:37     ` Pavel Machek
  2 siblings, 0 replies; 42+ messages in thread
From: Richard Weinberger @ 2017-04-21 10:12 UTC (permalink / raw)
  To: Pavel Machek, Boris Brezillon, Dipen.Dudhat
  Cc: dwmw2, computersforpeace, marek.vasut, cyrille.pitchen,
	linux-mtd, linux-kernel, mark.marshall, b44839, prabhakar

Pavel,

Am 21.04.2017 um 12:08 schrieb Pavel Machek:
> (BTW, switching to ecc.mode = ECC_SOFT will cause compatibility
> problems but should make the problem go away, right?) 

Yes and it is slow.
So, fixing the driver is the way to go. :-)

Thanks,
//richard

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH] nand_base: optimize checking of erased buffers
  2017-04-20 11:40       ` Pavel Machek
  2017-04-20 12:15         ` Boris Brezillon
@ 2017-04-21 10:51         ` Pavel Machek
  2017-05-17 11:27           ` Mason
  2017-05-17 12:22           ` [PATCH] fsl_ifc_nand: fix handing of bit flips in erased nand Pavel Machek
  1 sibling, 2 replies; 42+ messages in thread
From: Pavel Machek @ 2017-04-21 10:51 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: richard, dwmw2, computersforpeace, marek.vasut, cyrille.pitchen,
	linux-mtd, linux-kernel, mark.marshall, b44839, prabhakar

[-- Attachment #1: Type: text/plain, Size: 963 bytes --]


If we see ~0UL in flash, there's no need for hweight, and no need to
check number of bitflips. So this should be net win.

Signed-off-by: Pavel Machek <pavel@denx.de>

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index b0524f8..96c27ec 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -1357,7 +1357,10 @@ static int nand_check_erased_buf(void *buf, int len, int bitflips_threshold)
 
 	for (; len >= sizeof(long);
 	     len -= sizeof(long), bitmap += sizeof(long)) {
-		weight = hweight_long(*((unsigned long *)bitmap));
+		unsigned long d = *((unsigned long *)bitmap);
+		if (d == ~0UL)
+			continue;
+		weight = hweight_long(d);
 		bitflips += BITS_PER_LONG - weight;
 		if (unlikely(bitflips > bitflips_threshold))
 			return -EBADMSG;

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: fsl_ifc_nand: are blank pages protected by ECC?
  2017-04-21 10:08   ` fsl_ifc_nand: are blank pages protected by ECC? Pavel Machek
  2017-04-21 10:12     ` Richard Weinberger
@ 2017-04-21 12:04     ` Boris Brezillon
  2017-04-21 13:37     ` Pavel Machek
  2 siblings, 0 replies; 42+ messages in thread
From: Boris Brezillon @ 2017-04-21 12:04 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Dipen.Dudhat, richard, dwmw2, computersforpeace, marek.vasut,
	cyrille.pitchen, linux-mtd, linux-kernel, mark.marshall, b44839,
	prabhakar

On Fri, 21 Apr 2017 12:08:13 +0200
Pavel Machek <pavel@ucw.cz> wrote:

> Hi!
> 
> (Added driver author to the cc list, maybe he can help).
> 
> > > Hi!
> > > 
> > > We have some problems with fsl_ifc_nand ... in the old kernels, but
> > > this one does not seem to be fixed in v4.11, either.
> > > 
> > > UBIFS complains:
> > > 
> > > UBIFS error (pid 931): ubifs_scan: corrupt empty space at LEB 282:252630
> > > UBIFS error (pid 931): ubifs_scanned_corruption: corruption at LEB 282:252630
> > > UBIFS error (pid 931): ubifs_scanned_corruption: first 1322 bytes from LEB 282:252630
> > > UBIFS error (pid 931): ubifs_scan: LEB 282 scanning failed
> > > 
> > > Possible explanation is here:
> > > 
> > > https://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/716/t/289605
> > > 
> > > # I see on the forum that this issue has been raised before - my
> > > # understanding is that the omap2 nand driver does not perform ECC
> > > # detection/correction on empty pages so when UBIFS checks the empty
> > > # space data and doesn't read all 0xFF then it fails and mounts
> > > # read-only. I didn't find any good solution - only a workaround to
> > > # remove the UBIFS check..
> > > 
> > > So I checked fsl_ifc_nand.c in v4.11-rc, and yes, it seems to have the
> > > same problem:
> > > 
> > > 			if (errors == 15) {
> > > 	                        /*
> > >                                  * Uncorrectable error.
> > >                                  * OK only if the whole page is blank.
> > >                                  *
> > >                                  * We disable ECCER reporting due to...
> > >                                  * erratum IFC-A002770 -- so report it now if we
> > >                                  * see an uncorrectable error in ECCSTAT.
> > >                                  */
> > >                                 if (!is_blank(mtd, bufnum))
> > >                                         ctrl->nand_stat |=
> > >                                                 IFC_NAND_EVTER_STAT_ECCER;
> > >                                 break;
> > >                         }
> > > 
> > > is_blank() checks for all 0xff's, so single-bit 0xfe in the data will
> > > result in_blank() == 0 and uncorrectable error being signaled.
> > > 
> > > Should the driver be modified somehow?  
> > 
> > Yep, nand_check_erased_ecc_chunk() [1] is here to help you check this
> > case, unfortunately, it's not directly applicable here, because this
> > function takes regular pointers and not __iomem ones. You'll either
> > have to copy the data in an intermediate buffer before calling
> > nand_check_erased_ecc_chunk(), or cast the SRAM region to a void
> > pointer (which is usually not a good idea). The last option would be to
> > open code nand_check_erased_ecc_chunk(), but I'd really like to avoid
> > that (for maintainability concerns).  
> 
> Ok, took a look. __iomem is part of a problem, another part is that
> nand_check_erased_ecc_chunk() needs to actually write back 0xff's to
> undo the corruption, which would probably be bad idea to do in the
> iomem, and next one is that blank actually checks arbitrary number of
> regions, based on ecc.layout.
> 
> So this could be used to simplify the code (if nand_check_erased_buf
> was exported; it is not), but it does not fix the problem as we still
> need to undo the corruption.

Actually, there was a good reason for not directly exporting this
buffer (see Brian's comment here [1]), and I don't think we should start
exporting it. This and the fact that passing an iomem pointer sounds
like a bad idea makes me think you should modify the driver to put the
data in a buffer when you want to check for bitflips in erased pages.

> 
> Hints welcome, especially if you know right place where to put this
> checking.

Just had a quick look at the driver, and it seems like you could move
things around to check for bitflips in erased pages after you've copied
the data in the user buffer (in fsl_ifc_read_page()).

> 
> (BTW, switching to ecc.mode = ECC_SOFT will cause compatibility
> problems but should make the problem go away, right?) 

Nope, I don't think switching to ECC_SOFT is the right solution here.

Regards,

Boris

[1]https://patchwork.ozlabs.org/patch/509970/

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: fsl_ifc_nand: are blank pages protected by ECC?
  2017-04-21 10:08   ` fsl_ifc_nand: are blank pages protected by ECC? Pavel Machek
  2017-04-21 10:12     ` Richard Weinberger
  2017-04-21 12:04     ` Boris Brezillon
@ 2017-04-21 13:37     ` Pavel Machek
  2017-04-21 13:49       ` Boris Brezillon
  2 siblings, 1 reply; 42+ messages in thread
From: Pavel Machek @ 2017-04-21 13:37 UTC (permalink / raw)
  To: Boris Brezillon, Dipen.Dudhat
  Cc: richard, dwmw2, computersforpeace, marek.vasut, cyrille.pitchen,
	linux-mtd, linux-kernel, mark.marshall, b44839, prabhakar

[-- Attachment #1: Type: text/plain, Size: 4879 bytes --]

Hi!

> (Added driver author to the cc list, maybe he can help).

> > > UBIFS complains:
> > > 
> > > UBIFS error (pid 931): ubifs_scan: corrupt empty space at LEB 282:252630
> > > UBIFS error (pid 931): ubifs_scanned_corruption: corruption at LEB 282:252630
> > > UBIFS error (pid 931): ubifs_scanned_corruption: first 1322 bytes from LEB 282:252630
> > > UBIFS error (pid 931): ubifs_scan: LEB 282 scanning failed
...
> > > is_blank() checks for all 0xff's, so single-bit 0xfe in the data will
> > > result in_blank() == 0 and uncorrectable error being signaled.
> > > 
> > > Should the driver be modified somehow?
> > 
> > Yep, nand_check_erased_ecc_chunk() [1] is here to help you check this
> > case, unfortunately, it's not directly applicable here, because this

Maybe I figured it out. Unfortunately, it is only compile tested. Does
it look approximately right?

Thanks,
								Pavel

--- a/drivers/mtd/nand/fsl_ifc_nand.c
+++ b/drivers/mtd/nand/fsl_ifc_nand.c
@@ -171,32 +171,6 @@ static void set_addr(struct mtd_info *mtd, int column, int page_addr, int oob)
 		ifc_nand_ctrl->index += mtd->writesize;
 }
 
-static int is_blank(struct mtd_info *mtd, unsigned int bufnum)
-{
-	struct nand_chip *chip = mtd_to_nand(mtd);
-	struct fsl_ifc_mtd *priv = nand_get_controller_data(chip);
-	u8 __iomem *addr = priv->vbase + bufnum * (mtd->writesize * 2);
-	u32 __iomem *mainarea = (u32 __iomem *)addr;
-	u8 __iomem *oob = addr + mtd->writesize;
-	struct mtd_oob_region oobregion = { };
-	int i, section = 0;
-
-	i = nand_check_erased_buf(&mainarea[i], mtd->writesize, 0);
-	if (i)
-		return 0;
-
-	mtd_ooblayout_ecc(mtd, section++, &oobregion);
-	while (oobregion.length) {
-		i = nand_check_erased_buf(&oob[oobregion.offset], oobregion.length, 0);
-		if (i)
-			return 0;
-
-		mtd_ooblayout_ecc(mtd, section++, &oobregion);
-	}
-
-	return 1;
-}
-
 /* returns nonzero if entire page is blank */
 static int check_read_ecc(struct mtd_info *mtd, struct fsl_ifc_ctrl *ctrl,
 			  u32 *eccstat, unsigned int bufnum)
@@ -272,16 +246,14 @@ static void fsl_ifc_run_command(struct mtd_info *mtd)
 			if (errors == 15) {
 				/*
 				 * Uncorrectable error.
-				 * OK only if the whole page is blank.
+				 * We'll check for blank pages later.
 				 *
 				 * We disable ECCER reporting due to...
 				 * erratum IFC-A002770 -- so report it now if we
 				 * see an uncorrectable error in ECCSTAT.
 				 */
-				if (!is_blank(mtd, bufnum))
-					ctrl->nand_stat |=
-						IFC_NAND_EVTER_STAT_ECCER;
-				break;
+				ctrl->nand_stat |= IFC_NAND_EVTER_STAT_ECCER;
+				printk("NAND flash read: error but continuing.\n");
 			}
 
 			mtd->ecc_stats.corrected += errors;
@@ -676,6 +648,40 @@ static int fsl_ifc_wait(struct mtd_info *mtd, struct nand_chip *chip)
 	return nand_fsr | NAND_STATUS_WP;
 }
 
+/*
+ * The controller does not check for bitflips in erased pages,
+ * therefore software must check instead.
+ */
+static int check_erased_page(struct nand_chip *chip, u8 *buf)
+{
+	struct mtd_info *mtd = nand_to_mtd(chip);
+	u8 *ecc = chip->oob_poi;
+	const int ecc_size = chip->ecc.bytes;
+	const int pkt_size = chip->ecc.size;
+	int i, res, bitflips = 0;
+	struct mtd_oob_region oobregion = { };
+	
+	mtd_ooblayout_ecc(mtd, 0, &oobregion);
+	ecc += oobregion.offset;
+
+	for (i = 0; i < chip->ecc.steps; ++i) {
+		res = nand_check_erased_ecc_chunk(buf, pkt_size, ecc, ecc_size,
+						  NULL, 0,
+						  chip->ecc.strength);
+		if (res < 0)
+			mtd->ecc_stats.failed++;
+
+		bitflips = max(res, bitflips);
+		buf += pkt_size;
+		ecc += ecc_size;
+	}
+
+	mtd_ooblayout_ecc(mtd, 1, &oobregion);
+	BUG_ON(oobregion.length);
+
+	return bitflips;
+}
+
 static int fsl_ifc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
 			     uint8_t *buf, int oob_required, int page)
 {
@@ -687,11 +693,23 @@ static int fsl_ifc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
 	if (oob_required)
 		fsl_ifc_read_buf(mtd, chip->oob_poi, mtd->oobsize);
 
-	if (ctrl->nand_stat & IFC_NAND_EVTER_STAT_ECCER)
-		dev_err(priv->dev, "NAND Flash ECC Uncorrectable Error\n");
-
 	if (ctrl->nand_stat != IFC_NAND_EVTER_STAT_OPC)
 		mtd->ecc_stats.failed++;
+	
+	if (ctrl->nand_stat & IFC_NAND_EVTER_STAT_ECCER) {
+		int res;
+
+		if (!oob_required)
+			fsl_ifc_read_buf(mtd, chip->oob_poi, mtd->oobsize);
+
+		printk("NAND flash: have error (%d)\n", nctrl->max_bitflips);
+		res = check_erased_page(chip, buf);
+		printk("NAND flash: but erased page says %d\n", res);
+
+		dev_err(priv->dev, "NAND Flash ECC Uncorrectable Error or erased page\n");
+		return res;
+	}
+
 
 	return nctrl->max_bitflips;
 }

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: fsl_ifc_nand: are blank pages protected by ECC?
  2017-04-21 13:37     ` Pavel Machek
@ 2017-04-21 13:49       ` Boris Brezillon
  2017-04-22  7:01         ` Pavel Machek
                           ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Boris Brezillon @ 2017-04-21 13:49 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Dipen.Dudhat, richard, dwmw2, computersforpeace, marek.vasut,
	cyrille.pitchen, linux-mtd, linux-kernel, mark.marshall, b44839,
	prabhakar

On Fri, 21 Apr 2017 15:37:21 +0200
Pavel Machek <pavel@ucw.cz> wrote:

> Hi!
> 
> > (Added driver author to the cc list, maybe he can help).  
> 
> > > > UBIFS complains:
> > > > 
> > > > UBIFS error (pid 931): ubifs_scan: corrupt empty space at LEB 282:252630
> > > > UBIFS error (pid 931): ubifs_scanned_corruption: corruption at LEB 282:252630
> > > > UBIFS error (pid 931): ubifs_scanned_corruption: first 1322 bytes from LEB 282:252630
> > > > UBIFS error (pid 931): ubifs_scan: LEB 282 scanning failed  
> ...
> > > > is_blank() checks for all 0xff's, so single-bit 0xfe in the data will
> > > > result in_blank() == 0 and uncorrectable error being signaled.
> > > > 
> > > > Should the driver be modified somehow?  
> > > 
> > > Yep, nand_check_erased_ecc_chunk() [1] is here to help you check this
> > > case, unfortunately, it's not directly applicable here, because this  
> 
> Maybe I figured it out. Unfortunately, it is only compile tested. Does
> it look approximately right?

Yep that's definitely better. Just one thing missing (see below),
otherwise it looks good.

> 
> Thanks,
> 								Pavel
> 
> --- a/drivers/mtd/nand/fsl_ifc_nand.c
> +++ b/drivers/mtd/nand/fsl_ifc_nand.c
> @@ -171,32 +171,6 @@ static void set_addr(struct mtd_info *mtd, int column, int page_addr, int oob)
>  		ifc_nand_ctrl->index += mtd->writesize;
>  }
>  
> -static int is_blank(struct mtd_info *mtd, unsigned int bufnum)
> -{
> -	struct nand_chip *chip = mtd_to_nand(mtd);
> -	struct fsl_ifc_mtd *priv = nand_get_controller_data(chip);
> -	u8 __iomem *addr = priv->vbase + bufnum * (mtd->writesize * 2);
> -	u32 __iomem *mainarea = (u32 __iomem *)addr;
> -	u8 __iomem *oob = addr + mtd->writesize;
> -	struct mtd_oob_region oobregion = { };
> -	int i, section = 0;
> -
> -	i = nand_check_erased_buf(&mainarea[i], mtd->writesize, 0);
> -	if (i)
> -		return 0;
> -
> -	mtd_ooblayout_ecc(mtd, section++, &oobregion);
> -	while (oobregion.length) {
> -		i = nand_check_erased_buf(&oob[oobregion.offset], oobregion.length, 0);
> -		if (i)
> -			return 0;
> -
> -		mtd_ooblayout_ecc(mtd, section++, &oobregion);
> -	}
> -
> -	return 1;
> -}
> -
>  /* returns nonzero if entire page is blank */
>  static int check_read_ecc(struct mtd_info *mtd, struct fsl_ifc_ctrl *ctrl,
>  			  u32 *eccstat, unsigned int bufnum)
> @@ -272,16 +246,14 @@ static void fsl_ifc_run_command(struct mtd_info *mtd)
>  			if (errors == 15) {
>  				/*
>  				 * Uncorrectable error.
> -				 * OK only if the whole page is blank.
> +				 * We'll check for blank pages later.
>  				 *
>  				 * We disable ECCER reporting due to...
>  				 * erratum IFC-A002770 -- so report it now if we
>  				 * see an uncorrectable error in ECCSTAT.
>  				 */
> -				if (!is_blank(mtd, bufnum))
> -					ctrl->nand_stat |=
> -						IFC_NAND_EVTER_STAT_ECCER;
> -				break;
> +				ctrl->nand_stat |= IFC_NAND_EVTER_STAT_ECCER;
> +				printk("NAND flash read: error but continuing.\n");
>  			}
>  
>  			mtd->ecc_stats.corrected += errors;
> @@ -676,6 +648,40 @@ static int fsl_ifc_wait(struct mtd_info *mtd, struct nand_chip *chip)
>  	return nand_fsr | NAND_STATUS_WP;
>  }
>  
> +/*
> + * The controller does not check for bitflips in erased pages,
> + * therefore software must check instead.
> + */
> +static int check_erased_page(struct nand_chip *chip, u8 *buf)
> +{
> +	struct mtd_info *mtd = nand_to_mtd(chip);
> +	u8 *ecc = chip->oob_poi;
> +	const int ecc_size = chip->ecc.bytes;
> +	const int pkt_size = chip->ecc.size;
> +	int i, res, bitflips = 0;
> +	struct mtd_oob_region oobregion = { };
> +	
> +	mtd_ooblayout_ecc(mtd, 0, &oobregion);
> +	ecc += oobregion.offset;
> +
> +	for (i = 0; i < chip->ecc.steps; ++i) {
> +		res = nand_check_erased_ecc_chunk(buf, pkt_size, ecc, ecc_size,
> +						  NULL, 0,
> +						  chip->ecc.strength);
> +		if (res < 0)
> +			mtd->ecc_stats.failed++;

		else
			mtd->ecc_stats.corrected += res;


> +
> +		bitflips = max(res, bitflips);
> +		buf += pkt_size;
> +		ecc += ecc_size;
> +	}
> +
> +	mtd_ooblayout_ecc(mtd, 1, &oobregion);
> +	BUG_ON(oobregion.length);

Probably something you should check at registration time only.

> +
> +	return bitflips;
> +}
> +
>  static int fsl_ifc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
>  			     uint8_t *buf, int oob_required, int page)
>  {
> @@ -687,11 +693,23 @@ static int fsl_ifc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
>  	if (oob_required)
>  		fsl_ifc_read_buf(mtd, chip->oob_poi, mtd->oobsize);
>  
> -	if (ctrl->nand_stat & IFC_NAND_EVTER_STAT_ECCER)
> -		dev_err(priv->dev, "NAND Flash ECC Uncorrectable Error\n");
> -
>  	if (ctrl->nand_stat != IFC_NAND_EVTER_STAT_OPC)
>  		mtd->ecc_stats.failed++;
> +	
> +	if (ctrl->nand_stat & IFC_NAND_EVTER_STAT_ECCER) {
> +		int res;
> +
> +		if (!oob_required)
> +			fsl_ifc_read_buf(mtd, chip->oob_poi, mtd->oobsize);
> +
> +		printk("NAND flash: have error (%d)\n", nctrl->max_bitflips);
> +		res = check_erased_page(chip, buf);
> +		printk("NAND flash: but erased page says %d\n", res);
> +
> +		dev_err(priv->dev, "NAND Flash ECC Uncorrectable Error or erased page\n");
> +		return res;
> +	}
> +
>  
>  	return nctrl->max_bitflips;
>  }
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: fsl_ifc_nand: are blank pages protected by ECC?
  2017-04-21 13:49       ` Boris Brezillon
@ 2017-04-22  7:01         ` Pavel Machek
  2017-04-22 10:40         ` [PATCH] tango_nand.c: fix ecc.stats_corrected in empty flash case Pavel Machek
  2017-04-23  9:58         ` tango_nand: is logic right in error cases? (was Re: fsl_ifc_nand: are blank pages protected by ECC?) Pavel Machek
  2 siblings, 0 replies; 42+ messages in thread
From: Pavel Machek @ 2017-04-22  7:01 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Dipen.Dudhat, richard, dwmw2, computersforpeace, marek.vasut,
	cyrille.pitchen, linux-mtd, linux-kernel, mark.marshall, b44839,
	prabhakar

[-- Attachment #1: Type: text/plain, Size: 970 bytes --]

Hi!

> > Maybe I figured it out. Unfortunately, it is only compile tested. Does
> > it look approximately right?
> 
> Yep that's definitely better. Just one thing missing (see below),
> otherwise it looks good.

Thanks for review. Unfortunately it is still untested, so...

> > +		if (res < 0)
> > +			mtd->ecc_stats.failed++;
> 
> 		else
> 			mtd->ecc_stats.corrected += res;

Ok, I copied this from tango_nand. I'll submit a patch to fix it
there...

> > +
> > +		bitflips = max(res, bitflips);
> > +		buf += pkt_size;
> > +		ecc += ecc_size;
> > +	}
> > +
> > +	mtd_ooblayout_ecc(mtd, 1, &oobregion);
> > +	BUG_ON(oobregion.length);
> 
> Probably something you should check at registration time only.

Drive defensively, buy a tank ;-). Ok, I'll delete this one in final
version.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH] tango_nand.c: fix ecc.stats_corrected in empty flash case
  2017-04-21 13:49       ` Boris Brezillon
  2017-04-22  7:01         ` Pavel Machek
@ 2017-04-22 10:40         ` Pavel Machek
  2017-04-24  8:58           ` Marc Gonzalez
  2017-04-23  9:58         ` tango_nand: is logic right in error cases? (was Re: fsl_ifc_nand: are blank pages protected by ECC?) Pavel Machek
  2 siblings, 1 reply; 42+ messages in thread
From: Pavel Machek @ 2017-04-22 10:40 UTC (permalink / raw)
  To: Boris Brezillon, marc_gonzalez
  Cc: Dipen.Dudhat, richard, dwmw2, computersforpeace, marek.vasut,
	cyrille.pitchen, linux-mtd, linux-kernel, mark.marshall, b44839,
	prabhakar

[-- Attachment #1: Type: text/plain, Size: 792 bytes --]

Fix ecc.stats_corrected in empty flash case.

Signed-off-by: Pavel Machek <pavel@denx.de>

---

This was suggested by Boris Brezillon in another context. Not tested;
I don't have the hardware.

diff --git a/drivers/mtd/nand/tango_nand.c b/drivers/mtd/nand/tango_nand.c
index 4a5e948..db4bff4 100644
--- a/drivers/mtd/nand/tango_nand.c
+++ b/drivers/mtd/nand/tango_nand.c
@@ -193,6 +193,8 @@ static int check_erased_page(struct nand_chip *chip, u8 *buf)
 						  chip->ecc.strength);
 		if (res < 0)
 			mtd->ecc_stats.failed++;
+		else
+			mtd->ecc_stats.corrected += res;
 
 		bitflips = max(res, bitflips);
 		buf += pkt_size;

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* tango_nand: is logic right in error cases? (was Re: fsl_ifc_nand: are blank pages protected by ECC?)
  2017-04-21 13:49       ` Boris Brezillon
  2017-04-22  7:01         ` Pavel Machek
  2017-04-22 10:40         ` [PATCH] tango_nand.c: fix ecc.stats_corrected in empty flash case Pavel Machek
@ 2017-04-23  9:58         ` Pavel Machek
  2017-04-24  7:12           ` Boris Brezillon
  2 siblings, 1 reply; 42+ messages in thread
From: Pavel Machek @ 2017-04-23  9:58 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Dipen.Dudhat, richard, dwmw2, computersforpeace, marek.vasut,
	cyrille.pitchen, linux-mtd, linux-kernel, mark.marshall, b44839,
	prabhakar

[-- Attachment #1: Type: text/plain, Size: 1594 bytes --]

Hi!

> > Maybe I figured it out. Unfortunately, it is only compile tested. Does
> > it look approximately right?
> 
> Yep that's definitely better. Just one thing missing (see below),
> otherwise it looks good.

I'm copying from tango_nand, therefore I had to check tango_nand, too.

static int check_erased_page(struct nand_chip *chip, u8 *buf)
{
...
                res = nand_check_erased_ecc_chunk(buf, pkt_size, ecc, ecc_size,
                                                  meta, meta_len,
                                                  chip->ecc.strength);
                if (res < 0)
                        mtd->ecc_stats.failed++;
                else
                        mtd->ecc_stats.corrected += res;

                bitflips = max(res, bitflips);
...
        return bitflips;
}

static int tango_read_page(struct mtd_info *mtd, struct nand_chip *chip,
                           u8 *buf, int oob_required, int page)
{
...
        res = decode_error_report(nfc);
        if (res < 0) {
                chip->ecc.read_oob_raw(mtd, chip, page);
                res = check_erased_page(chip, buf);
        }

        return res;
}


So nand_check_erased_ecc_chunk() returns < 0 (failed ECC), but then we
perform max() with bitflips (lets say 1, correctable ECC) and return
1? tango_read_page then returns 1 (correctable ECC) forgetting about
failed ECC...?

Best regards,
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: tango_nand: is logic right in error cases? (was Re: fsl_ifc_nand: are blank pages protected by ECC?)
  2017-04-23  9:58         ` tango_nand: is logic right in error cases? (was Re: fsl_ifc_nand: are blank pages protected by ECC?) Pavel Machek
@ 2017-04-24  7:12           ` Boris Brezillon
  0 siblings, 0 replies; 42+ messages in thread
From: Boris Brezillon @ 2017-04-24  7:12 UTC (permalink / raw)
  To: Pavel Machek
  Cc: richard, mark.marshall, linux-kernel, marek.vasut, linux-mtd,
	Dipen.Dudhat, cyrille.pitchen, computersforpeace, dwmw2,
	prabhakar, b44839

On Sun, 23 Apr 2017 11:58:45 +0200
Pavel Machek <pavel@ucw.cz> wrote:

> Hi!
> 
> > > Maybe I figured it out. Unfortunately, it is only compile tested. Does
> > > it look approximately right?  
> > 
> > Yep that's definitely better. Just one thing missing (see below),
> > otherwise it looks good.  
> 
> I'm copying from tango_nand, therefore I had to check tango_nand, too.
> 
> static int check_erased_page(struct nand_chip *chip, u8 *buf)
> {
> ...
>                 res = nand_check_erased_ecc_chunk(buf, pkt_size, ecc, ecc_size,
>                                                   meta, meta_len,
>                                                   chip->ecc.strength);
>                 if (res < 0)
>                         mtd->ecc_stats.failed++;
>                 else
>                         mtd->ecc_stats.corrected += res;
> 
>                 bitflips = max(res, bitflips);
> ...
>         return bitflips;
> }
> 
> static int tango_read_page(struct mtd_info *mtd, struct nand_chip *chip,
>                            u8 *buf, int oob_required, int page)
> {
> ...
>         res = decode_error_report(nfc);
>         if (res < 0) {
>                 chip->ecc.read_oob_raw(mtd, chip, page);
>                 res = check_erased_page(chip, buf);
>         }
> 
>         return res;
> }
> 
> 
> So nand_check_erased_ecc_chunk() returns < 0 (failed ECC), but then we
> perform max() with bitflips (lets say 1, correctable ECC) and return
> 1? tango_read_page then returns 1 (correctable ECC) forgetting about
> failed ECC...?

Yep, that's expected, see what's done in the core to detect
uncorrectable errors and return EBADMSG when appropriate [1].

[1]http://lxr.free-electrons.com/source/drivers/mtd/nand/nand_base.c#L2033

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] tango_nand.c: fix ecc.stats_corrected in empty flash case
  2017-04-22 10:40         ` [PATCH] tango_nand.c: fix ecc.stats_corrected in empty flash case Pavel Machek
@ 2017-04-24  8:58           ` Marc Gonzalez
  2017-04-24  9:03             ` Pavel Machek
                               ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Marc Gonzalez @ 2017-04-24  8:58 UTC (permalink / raw)
  To: Pavel Machek, Boris Brezillon; +Cc: linux-mtd, Thibaud Cornic, Mason

[ Trimming CC list ]

On 22/04/2017 12:40, Pavel Machek wrote:

> Fix ecc.stats_corrected in empty flash case.
> 
> Signed-off-by: Pavel Machek <pavel@denx.de>
> 
> ---
> 
> This was suggested by Boris Brezillon in another context. Not tested;
> I don't have the hardware.
> 
> diff --git a/drivers/mtd/nand/tango_nand.c b/drivers/mtd/nand/tango_nand.c
> index 4a5e948..db4bff4 100644
> --- a/drivers/mtd/nand/tango_nand.c
> +++ b/drivers/mtd/nand/tango_nand.c
> @@ -193,6 +193,8 @@ static int check_erased_page(struct nand_chip *chip, u8 *buf)
>  						  chip->ecc.strength);
>  		if (res < 0)
>  			mtd->ecc_stats.failed++;
> +		else
> +			mtd->ecc_stats.corrected += res;
>  
>  		bitflips = max(res, bitflips);
>  		buf += pkt_size;
> 

Hello Pavel,

You may have noticed that ecc_stats.corrected is not updated in
decode_error_report() which is the main code path, i.e. the path
that will succeed 99.99% of the time (HW read).

It turns out that the HW does not report the number of errors
corrected in a page... Instead it reports two values:
1) U = number of errors corrected in the first packet/step
2) V = max number of errors corrected in other packets/steps

Thus, it is not possible to determine the actual number of errors
corrected in a page (unless V is 0). Otherwise, we just have an
interval; let n be the number of packets/steps:

U + V <= corrected errors count <= U + (n-1)*V

In my opinion, it is better to provide no information than to
provide incorrect information. Therefore, I did not update
ecc_stats.corrected in decode_error_report().

One could argue that updating ecc_stats.corrected in
check_erased_page() sets the correct value, since the error
counts are computed in software for each step. But updating
the value here is IMO pointless if we can't do it in the main
code path.

Regards.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] tango_nand.c: fix ecc.stats_corrected in empty flash case
  2017-04-24  8:58           ` Marc Gonzalez
@ 2017-04-24  9:03             ` Pavel Machek
  2017-05-02  9:42             ` Boris Brezillon
  2017-05-03 20:04             ` Pavel Machek
  2 siblings, 0 replies; 42+ messages in thread
From: Pavel Machek @ 2017-04-24  9:03 UTC (permalink / raw)
  To: Marc Gonzalez; +Cc: Boris Brezillon, linux-mtd, Thibaud Cornic, Mason

[-- Attachment #1: Type: text/plain, Size: 2471 bytes --]

Hi!

> [ Trimming CC list ]
> 
> On 22/04/2017 12:40, Pavel Machek wrote:
> 
> > Fix ecc.stats_corrected in empty flash case.
> > 
> > Signed-off-by: Pavel Machek <pavel@denx.de>
> > 
> > ---
> > 
> > This was suggested by Boris Brezillon in another context. Not tested;
> > I don't have the hardware.
> > 
> > diff --git a/drivers/mtd/nand/tango_nand.c b/drivers/mtd/nand/tango_nand.c
> > index 4a5e948..db4bff4 100644
> > --- a/drivers/mtd/nand/tango_nand.c
> > +++ b/drivers/mtd/nand/tango_nand.c
> > @@ -193,6 +193,8 @@ static int check_erased_page(struct nand_chip *chip, u8 *buf)
> >  						  chip->ecc.strength);
> >  		if (res < 0)
> >  			mtd->ecc_stats.failed++;
> > +		else
> > +			mtd->ecc_stats.corrected += res;
> >  
> >  		bitflips = max(res, bitflips);
> >  		buf += pkt_size;
> > 
> 
> Hello Pavel,
> 
> You may have noticed that ecc_stats.corrected is not updated in
> decode_error_report() which is the main code path, i.e. the path
> that will succeed 99.99% of the time (HW read).
> 
> It turns out that the HW does not report the number of errors
> corrected in a page... Instead it reports two values:
> 1) U = number of errors corrected in the first packet/step
> 2) V = max number of errors corrected in other packets/steps
> 
> Thus, it is not possible to determine the actual number of errors
> corrected in a page (unless V is 0). Otherwise, we just have an
> interval; let n be the number of packets/steps:
> 
> U + V <= corrected errors count <= U + (n-1)*V
> 
> In my opinion, it is better to provide no information than to
> provide incorrect information. Therefore, I did not update
> ecc_stats.corrected in decode_error_report().
> 
> One could argue that updating ecc_stats.corrected in
> check_erased_page() sets the correct value, since the error
> counts are computed in software for each step. But updating
> the value here is IMO pointless if we can't do it in the main
> code path.

Aha, thanks for explanation... perhaps comment is worth adding? This
is certainly "interesting" property (people would conclude that
check_erased_page is buggy -- and it is buggy -- but it matches
behaviour of rest of the driver). Also... people do copy&paste in
kernel (I did :-) ) and this is quite a trap for them.

Thanks,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] tango_nand.c: fix ecc.stats_corrected in empty flash case
  2017-04-24  8:58           ` Marc Gonzalez
  2017-04-24  9:03             ` Pavel Machek
@ 2017-05-02  9:42             ` Boris Brezillon
  2017-05-02 11:52               ` Marc Gonzalez
  2017-05-03 20:04             ` Pavel Machek
  2 siblings, 1 reply; 42+ messages in thread
From: Boris Brezillon @ 2017-05-02  9:42 UTC (permalink / raw)
  To: Marc Gonzalez; +Cc: Pavel Machek, linux-mtd, Thibaud Cornic, Mason

Hi Marc,

On Mon, 24 Apr 2017 10:58:47 +0200
Marc Gonzalez <marc_gonzalez@sigmadesigns.com> wrote:

> [ Trimming CC list ]
> 
> On 22/04/2017 12:40, Pavel Machek wrote:
> 
> > Fix ecc.stats_corrected in empty flash case.
> > 
> > Signed-off-by: Pavel Machek <pavel@denx.de>
> > 
> > ---
> > 
> > This was suggested by Boris Brezillon in another context. Not tested;
> > I don't have the hardware.
> > 
> > diff --git a/drivers/mtd/nand/tango_nand.c b/drivers/mtd/nand/tango_nand.c
> > index 4a5e948..db4bff4 100644
> > --- a/drivers/mtd/nand/tango_nand.c
> > +++ b/drivers/mtd/nand/tango_nand.c
> > @@ -193,6 +193,8 @@ static int check_erased_page(struct nand_chip *chip, u8 *buf)
> >  						  chip->ecc.strength);
> >  		if (res < 0)
> >  			mtd->ecc_stats.failed++;
> > +		else
> > +			mtd->ecc_stats.corrected += res;
> >  
> >  		bitflips = max(res, bitflips);
> >  		buf += pkt_size;
> >   
> 
> Hello Pavel,
> 
> You may have noticed that ecc_stats.corrected is not updated in
> decode_error_report() which is the main code path, i.e. the path
> that will succeed 99.99% of the time (HW read).
> 
> It turns out that the HW does not report the number of errors
> corrected in a page... Instead it reports two values:
> 1) U = number of errors corrected in the first packet/step
> 2) V = max number of errors corrected in other packets/steps
> 
> Thus, it is not possible to determine the actual number of errors
> corrected in a page (unless V is 0). Otherwise, we just have an
> interval; let n be the number of packets/steps:
> 
> U + V <= corrected errors count <= U + (n-1)*V
> 
> In my opinion, it is better to provide no information than to
> provide incorrect information. Therefore, I did not update
> ecc_stats.corrected in decode_error_report().

Hm, not sure I agree with that. The situation is far from ideal, but
some userspace tools query the number of corrected bits before and
after doing a read operation to report the number of bitflips that
have been correcte. Letting users think there were no bitflips at all is
not a good thing IMO.

> 
> One could argue that updating ecc_stats.corrected in
> check_erased_page() sets the correct value, since the error
> counts are computed in software for each step. But updating
> the value here is IMO pointless if we can't do it in the main
> code path.

Well, it's not pointless if you want to inform the user that some
bitflips were present on the NAND. Yes, the numbers you report
won't be accurate in your case, but at least the user can tell when
bitflips were discovered.

Note that ideally, we should have per-page (or per-block) max-bitflips
information (where max-bitflips is the number returned by
ecc->read_page()), because counting the total number of corrected bits
is certainly not helping when it comes to checking how reliable a NAND
page/eraseblock is.

Regards,

Boris

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] tango_nand.c: fix ecc.stats_corrected in empty flash case
  2017-05-02  9:42             ` Boris Brezillon
@ 2017-05-02 11:52               ` Marc Gonzalez
  2017-05-02 12:20                 ` Boris Brezillon
  0 siblings, 1 reply; 42+ messages in thread
From: Marc Gonzalez @ 2017-05-02 11:52 UTC (permalink / raw)
  To: Boris Brezillon; +Cc: Pavel Machek, linux-mtd, Thibaud Cornic, Mason

On 02/05/2017 11:42, Boris Brezillon wrote:

> Hi Marc,
> 
> On Mon, 24 Apr 2017 10:58:47 +0200
> Marc Gonzalez <marc_gonzalez@sigmadesigns.com> wrote:
> 
>> [ Trimming CC list ]
>>
>> On 22/04/2017 12:40, Pavel Machek wrote:
>>
>>> Fix ecc.stats_corrected in empty flash case.
>>>
>>> Signed-off-by: Pavel Machek <pavel@denx.de>
>>>
>>> ---
>>>
>>> This was suggested by Boris Brezillon in another context. Not tested;
>>> I don't have the hardware.
>>>
>>> diff --git a/drivers/mtd/nand/tango_nand.c b/drivers/mtd/nand/tango_nand.c
>>> index 4a5e948..db4bff4 100644
>>> --- a/drivers/mtd/nand/tango_nand.c
>>> +++ b/drivers/mtd/nand/tango_nand.c
>>> @@ -193,6 +193,8 @@ static int check_erased_page(struct nand_chip *chip, u8 *buf)
>>>  						  chip->ecc.strength);
>>>  		if (res < 0)
>>>  			mtd->ecc_stats.failed++;
>>> +		else
>>> +			mtd->ecc_stats.corrected += res;
>>>  
>>>  		bitflips = max(res, bitflips);
>>>  		buf += pkt_size;
>>>   
>>
>> Hello Pavel,
>>
>> You may have noticed that ecc_stats.corrected is not updated in
>> decode_error_report() which is the main code path, i.e. the path
>> that will succeed 99.99% of the time (HW read).
>>
>> It turns out that the HW does not report the number of errors
>> corrected in a page... Instead it reports two values:
>> 1) U = number of errors corrected in the first packet/step
>> 2) V = max number of errors corrected in other packets/steps
>>
>> Thus, it is not possible to determine the actual number of errors
>> corrected in a page (unless V is 0). Otherwise, we just have an
>> interval; let n be the number of packets/steps:
>>
>> U + V <= corrected errors count <= U + (n-1)*V
>>
>> In my opinion, it is better to provide no information than to
>> provide incorrect information. Therefore, I did not update
>> ecc_stats.corrected in decode_error_report().
> 
> Hm, not sure I agree with that. The situation is far from ideal, but
> some userspace tools query the number of corrected bits before and
> after doing a read operation to report the number of bitflips that
> have been correcte. Letting users think there were no bitflips at all is
> not a good thing IMO.

I (still) find the API of ecc->read_page somewhat confusing ;-)
Let me try to work through the assumptions, as I remember.

1) If the driver was not able to read the page,
then we return an error code < 0

2) If the driver was able to read the page, and there were
no bitflips, then we return 0

3) If the driver was able to read the page, and there were
less than 'strength' bitflips in each step, then we return
the *max* of all bitflips across all steps

4) If the driver was able to read the page, but there was
at least one step with too many bitflips, then we are
expected to increment mtd->ecc_stats.failed, and return
the max of all bitflips across successful steps

AFAICT, the NAND framework does not use or update ecc_stats.corrected?
What about the MTD layer?


>> One could argue that updating ecc_stats.corrected in
>> check_erased_page() sets the correct value, since the error
>> counts are computed in software for each step. But updating
>> the value here is IMO pointless if we can't do it in the main
>> code path.
> 
> Well, it's not pointless if you want to inform the user that some
> bitflips were present on the NAND. Yes, the numbers you report
> won't be accurate in your case, but at least the user can tell when
> bitflips were discovered.

I do report some bitflips value: it's the return value from
ecc->read_page -- which is the max bitflips per step. But perhaps
that value is not available to user-space? In other words,
the NAND and MTD layers do not forward it to user-space, as this
is left as a responsibility of individual drivers?

> Note that ideally, we should have per-page (or per-block) max-bitflips
> information (where max-bitflips is the number returned by
> ecc->read_page()), because counting the total number of corrected bits
> is certainly not helping when it comes to checking how reliable a NAND
> page/eraseblock is.

That would be quite large an array for a 1 TB NAND chip?

Regards.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] tango_nand.c: fix ecc.stats_corrected in empty flash case
  2017-05-02 11:52               ` Marc Gonzalez
@ 2017-05-02 12:20                 ` Boris Brezillon
  2017-05-03 20:02                   ` Pavel Machek
  0 siblings, 1 reply; 42+ messages in thread
From: Boris Brezillon @ 2017-05-02 12:20 UTC (permalink / raw)
  To: Marc Gonzalez; +Cc: Pavel Machek, linux-mtd, Thibaud Cornic, Mason

On Tue, 2 May 2017 13:52:30 +0200
Marc Gonzalez <marc_gonzalez@sigmadesigns.com> wrote:

> On 02/05/2017 11:42, Boris Brezillon wrote:
> 
> > Hi Marc,
> > 
> > On Mon, 24 Apr 2017 10:58:47 +0200
> > Marc Gonzalez <marc_gonzalez@sigmadesigns.com> wrote:
> >   
> >> [ Trimming CC list ]
> >>
> >> On 22/04/2017 12:40, Pavel Machek wrote:
> >>  
> >>> Fix ecc.stats_corrected in empty flash case.
> >>>
> >>> Signed-off-by: Pavel Machek <pavel@denx.de>
> >>>
> >>> ---
> >>>
> >>> This was suggested by Boris Brezillon in another context. Not tested;
> >>> I don't have the hardware.
> >>>
> >>> diff --git a/drivers/mtd/nand/tango_nand.c b/drivers/mtd/nand/tango_nand.c
> >>> index 4a5e948..db4bff4 100644
> >>> --- a/drivers/mtd/nand/tango_nand.c
> >>> +++ b/drivers/mtd/nand/tango_nand.c
> >>> @@ -193,6 +193,8 @@ static int check_erased_page(struct nand_chip *chip, u8 *buf)
> >>>  						  chip->ecc.strength);
> >>>  		if (res < 0)
> >>>  			mtd->ecc_stats.failed++;
> >>> +		else
> >>> +			mtd->ecc_stats.corrected += res;
> >>>  
> >>>  		bitflips = max(res, bitflips);
> >>>  		buf += pkt_size;
> >>>     
> >>
> >> Hello Pavel,
> >>
> >> You may have noticed that ecc_stats.corrected is not updated in
> >> decode_error_report() which is the main code path, i.e. the path
> >> that will succeed 99.99% of the time (HW read).
> >>
> >> It turns out that the HW does not report the number of errors
> >> corrected in a page... Instead it reports two values:
> >> 1) U = number of errors corrected in the first packet/step
> >> 2) V = max number of errors corrected in other packets/steps
> >>
> >> Thus, it is not possible to determine the actual number of errors
> >> corrected in a page (unless V is 0). Otherwise, we just have an
> >> interval; let n be the number of packets/steps:
> >>
> >> U + V <= corrected errors count <= U + (n-1)*V
> >>
> >> In my opinion, it is better to provide no information than to
> >> provide incorrect information. Therefore, I did not update
> >> ecc_stats.corrected in decode_error_report().  
> > 
> > Hm, not sure I agree with that. The situation is far from ideal, but
> > some userspace tools query the number of corrected bits before and
> > after doing a read operation to report the number of bitflips that
> > have been correcte. Letting users think there were no bitflips at all is
> > not a good thing IMO.  
> 
> I (still) find the API of ecc->read_page somewhat confusing ;-)
> Let me try to work through the assumptions, as I remember.
> 
> 1) If the driver was not able to read the page,
> then we return an error code < 0
> 
> 2) If the driver was able to read the page, and there were
> no bitflips, then we return 0
> 
> 3) If the driver was able to read the page, and there were
> less than 'strength' bitflips in each step, then we return
> the *max* of all bitflips across all steps
> 
> 4) If the driver was able to read the page, but there was
> at least one step with too many bitflips, then we are
> expected to increment mtd->ecc_stats.failed, and return
> the max of all bitflips across successful steps

Yep, you got it right.

> 
> AFAICT, the NAND framework does not use or update ecc_stats.corrected?

Nope, but this information is exposed to userspace, and some tools
(like nanddump) use it to calculate the number of bitflips found in a
page. It's definitely not reliable, because someone could have read
different portion of the MTD device in parallel, thus invalidating the
stats initially retrieved by nanddump, but it works as long as no-one
accesses the MTD device while nandump is used.

> What about the MTD layer?

Not sure what you mean by the MTD layer? The MTD layer itself does not
use this information, but it exposes it to its users, so MTD users
might use this information.

> 
> 
> >> One could argue that updating ecc_stats.corrected in
> >> check_erased_page() sets the correct value, since the error
> >> counts are computed in software for each step. But updating
> >> the value here is IMO pointless if we can't do it in the main
> >> code path.  
> > 
> > Well, it's not pointless if you want to inform the user that some
> > bitflips were present on the NAND. Yes, the numbers you report
> > won't be accurate in your case, but at least the user can tell when
> > bitflips were discovered.  
> 
> I do report some bitflips value: it's the return value from
> ecc->read_page -- which is the max bitflips per step. But perhaps
> that value is not available to user-space?

It's indeed not exposed to userspace. It's only used by the MTD core to
decide when to return -EUCLEAN (see here [1]).

> In other words,
> the NAND and MTD layers do not forward it to user-space, as this
> is left as a responsibility of individual drivers?

I don't know the exact reason, but I guess no one needed it so far.
IOW, everyone was happy with the existing corrected/failed stats.

> 
> > Note that ideally, we should have per-page (or per-block) max-bitflips
> > information (where max-bitflips is the number returned by
> > ecc->read_page()), because counting the total number of corrected bits
> > is certainly not helping when it comes to checking how reliable a NAND
> > page/eraseblock is.  
> 
> That would be quite large an array for a 1 TB NAND chip?

The page/eraseblock size tend to grow with total chip size, but I
agree, keeping max-bitflips on a per-page basis has a
non-negligible cost.

Anyway, I think we went too far. I was just arguing that never updating
stats->corrected just because you don't know the exact number of
bitflips is not necessarily a better idea than updating corrected with
a potentially invalid number of bitflips. At least the 2nd approach
shows that bitflips were present on the media.

Regards,

Boris

[1]http://lxr.free-electrons.com/source/drivers/mtd/mtdcore.c#L1043

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] tango_nand.c: fix ecc.stats_corrected in empty flash case
  2017-05-02 12:20                 ` Boris Brezillon
@ 2017-05-03 20:02                   ` Pavel Machek
  0 siblings, 0 replies; 42+ messages in thread
From: Pavel Machek @ 2017-05-03 20:02 UTC (permalink / raw)
  To: Boris Brezillon; +Cc: Marc Gonzalez, linux-mtd, Thibaud Cornic, Mason

[-- Attachment #1: Type: text/plain, Size: 572 bytes --]

Hi!

> Anyway, I think we went too far. I was just arguing that never updating
> stats->corrected just because you don't know the exact number of
> bitflips is not necessarily a better idea than updating corrected with
> a potentially invalid number of bitflips. At least the 2nd approach
> shows that bitflips were present on the media.

I have to agree here. Indication of "we have errors" is important.

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] tango_nand.c: fix ecc.stats_corrected in empty flash case
  2017-04-24  8:58           ` Marc Gonzalez
  2017-04-24  9:03             ` Pavel Machek
  2017-05-02  9:42             ` Boris Brezillon
@ 2017-05-03 20:04             ` Pavel Machek
  2017-05-04  8:42               ` Boris Brezillon
  2 siblings, 1 reply; 42+ messages in thread
From: Pavel Machek @ 2017-05-03 20:04 UTC (permalink / raw)
  To: Marc Gonzalez; +Cc: Boris Brezillon, linux-mtd, Thibaud Cornic, Mason

[-- Attachment #1: Type: text/plain, Size: 2074 bytes --]

Hi!
On Mon 2017-04-24 10:58:47, Marc Gonzalez wrote:
> [ Trimming CC list ]
> 
> On 22/04/2017 12:40, Pavel Machek wrote:
> 
> > Fix ecc.stats_corrected in empty flash case.
> > 
> > Signed-off-by: Pavel Machek <pavel@denx.de>
> > 
> > ---
> > 
> > This was suggested by Boris Brezillon in another context. Not tested;
> > I don't have the hardware.
> > 
> > diff --git a/drivers/mtd/nand/tango_nand.c b/drivers/mtd/nand/tango_nand.c
> > index 4a5e948..db4bff4 100644
> > --- a/drivers/mtd/nand/tango_nand.c
> > +++ b/drivers/mtd/nand/tango_nand.c
> > @@ -193,6 +193,8 @@ static int check_erased_page(struct nand_chip *chip, u8 *buf)
> >  						  chip->ecc.strength);
> >  		if (res < 0)
> >  			mtd->ecc_stats.failed++;
> > +		else
> > +			mtd->ecc_stats.corrected += res;
> >  
> >  		bitflips = max(res, bitflips);
> >  		buf += pkt_size;
> > 
> 
> Hello Pavel,
> 
> You may have noticed that ecc_stats.corrected is not updated in
> decode_error_report() which is the main code path, i.e. the path
> that will succeed 99.99% of the time (HW read).
> 
> It turns out that the HW does not report the number of errors
> corrected in a page... Instead it reports two values:
> 1) U = number of errors corrected in the first packet/step
> 2) V = max number of errors corrected in other packets/steps
> 
> Thus, it is not possible to determine the actual number of errors
> corrected in a page (unless V is 0). Otherwise, we just have an
> interval; let n be the number of packets/steps:
> 
> U + V <= corrected errors count <= U + (n-1)*V
> 
> In my opinion, it is better to provide no information than to
> provide incorrect information. Therefore, I did not update
> ecc_stats.corrected in decode_error_report().

Well... Having corrected ECC errors is pretty rare, right? So one
solution would be to re-compute ECCs in software if we see U or V >
0...

Regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] tango_nand.c: fix ecc.stats_corrected in empty flash case
  2017-05-03 20:04             ` Pavel Machek
@ 2017-05-04  8:42               ` Boris Brezillon
  2017-05-12 15:34                 ` [PATCH] mtd: nand: tango: Update ecc_stats.corrected Marc Gonzalez
  2017-05-17 12:04                 ` [PATCH] tango_nand.c: fix ecc.stats_corrected in empty flash case Pavel Machek
  0 siblings, 2 replies; 42+ messages in thread
From: Boris Brezillon @ 2017-05-04  8:42 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Marc Gonzalez, Thibaud Cornic, linux-mtd, Mason

On Wed, 3 May 2017 22:04:27 +0200
Pavel Machek <pavel@ucw.cz> wrote:

> Hi!
> On Mon 2017-04-24 10:58:47, Marc Gonzalez wrote:
> > [ Trimming CC list ]
> > 
> > On 22/04/2017 12:40, Pavel Machek wrote:
> >   
> > > Fix ecc.stats_corrected in empty flash case.
> > > 
> > > Signed-off-by: Pavel Machek <pavel@denx.de>
> > > 
> > > ---
> > > 
> > > This was suggested by Boris Brezillon in another context. Not tested;
> > > I don't have the hardware.
> > > 
> > > diff --git a/drivers/mtd/nand/tango_nand.c b/drivers/mtd/nand/tango_nand.c
> > > index 4a5e948..db4bff4 100644
> > > --- a/drivers/mtd/nand/tango_nand.c
> > > +++ b/drivers/mtd/nand/tango_nand.c
> > > @@ -193,6 +193,8 @@ static int check_erased_page(struct nand_chip *chip, u8 *buf)
> > >  						  chip->ecc.strength);
> > >  		if (res < 0)
> > >  			mtd->ecc_stats.failed++;
> > > +		else
> > > +			mtd->ecc_stats.corrected += res;
> > >  
> > >  		bitflips = max(res, bitflips);
> > >  		buf += pkt_size;
> > >   
> > 
> > Hello Pavel,
> > 
> > You may have noticed that ecc_stats.corrected is not updated in
> > decode_error_report() which is the main code path, i.e. the path
> > that will succeed 99.99% of the time (HW read).
> > 
> > It turns out that the HW does not report the number of errors
> > corrected in a page... Instead it reports two values:
> > 1) U = number of errors corrected in the first packet/step
> > 2) V = max number of errors corrected in other packets/steps
> > 
> > Thus, it is not possible to determine the actual number of errors
> > corrected in a page (unless V is 0). Otherwise, we just have an
> > interval; let n be the number of packets/steps:
> > 
> > U + V <= corrected errors count <= U + (n-1)*V
> > 
> > In my opinion, it is better to provide no information than to
> > provide incorrect information. Therefore, I did not update
> > ecc_stats.corrected in decode_error_report().  
> 
> Well... Having corrected ECC errors is pretty rare, right?

Depends on the NAND chip. On modern SLC NAND chips requiring
ECC of 8bits/512bytes are likely to have frequent bitflips.

> So one
> solution would be to re-compute ECCs in software if we see U or V >
> 0...

Hm, not sure it's worth the trouble for statistics that are anyway
rarely used, and when they are, are only used has a metric to determine
how worn the NAND is.

I'd prefer to see a better user-space interface returning the
max_bitflips information when someone reads from an MTD device (see [1])
rather than trying to fix drivers to return the exact number of
corrected bitflips (which might be impossible for some of them anyway).

[1]http://lists.infradead.org/pipermail/linux-mtd/2016-April/067187.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH] mtd: nand: tango: Update ecc_stats.corrected
  2017-05-04  8:42               ` Boris Brezillon
@ 2017-05-12 15:34                 ` Marc Gonzalez
  2017-05-15  8:56                   ` Boris Brezillon
  2017-05-15 20:47                   ` Boris Brezillon
  2017-05-17 12:04                 ` [PATCH] tango_nand.c: fix ecc.stats_corrected in empty flash case Pavel Machek
  1 sibling, 2 replies; 42+ messages in thread
From: Marc Gonzalez @ 2017-05-12 15:34 UTC (permalink / raw)
  To: Boris Brezillon, Pavel Machek
  Cc: linux-mtd, Mason, Richard Weinberger, David Woodhouse, Brian Norris

According to Boris, some user-space tools expect MTD drivers to
update ecc_stats.corrected, and it's better to provide a lower
bound than to provide no information at all.

Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
---
NB: this patch is based on 4.9; if it looks good, I'll rebase
it on v4.12-rc1 next week.
---
 drivers/mtd/nand/tango_nand.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/mtd/nand/tango_nand.c b/drivers/mtd/nand/tango_nand.c
index 4a5e948c62df..471348462e42 100644
--- a/drivers/mtd/nand/tango_nand.c
+++ b/drivers/mtd/nand/tango_nand.c
@@ -55,10 +55,10 @@
  * byte 1 for other packets in the page (PKT_N, for N > 0)
  * ERR_COUNT_PKT_N is the max error count over all but the first packet.
  */
-#define DECODE_OK_PKT_0(v)	((v) & BIT(7))
-#define DECODE_OK_PKT_N(v)	((v) & BIT(15))
 #define ERR_COUNT_PKT_0(v)	(((v) >> 0) & 0x3f)
 #define ERR_COUNT_PKT_N(v)	(((v) >> 8) & 0x3f)
+#define DECODE_FAIL_PKT_0(v)	(((v) & BIT(7)) == 0)
+#define DECODE_FAIL_PKT_N(v)	(((v) & BIT(15)) == 0)
 
 /* Offsets relative to pbus_base */
 #define PBUS_CS_CTRL	0x83c
@@ -193,6 +193,8 @@ static int check_erased_page(struct nand_chip *chip, u8 *buf)
 						  chip->ecc.strength);
 		if (res < 0)
 			mtd->ecc_stats.failed++;
+		else
+			mtd->ecc_stats.corrected += res;
 
 		bitflips = max(res, bitflips);
 		buf += pkt_size;
@@ -202,9 +204,10 @@ static int check_erased_page(struct nand_chip *chip, u8 *buf)
 	return bitflips;
 }
 
-static int decode_error_report(struct tango_nfc *nfc)
+static int decode_error_report(struct nand_chip *chip, struct tango_nfc *nfc)
 {
 	u32 status, res;
+	struct mtd_info *mtd = nand_to_mtd(chip);
 
 	status = readl_relaxed(nfc->reg_base + NFC_XFER_STATUS);
 	if (status & PAGE_IS_EMPTY)
@@ -212,10 +215,14 @@ static int decode_error_report(struct tango_nfc *nfc)
 
 	res = readl_relaxed(nfc->mem_base + ERROR_REPORT);
 
-	if (DECODE_OK_PKT_0(res) && DECODE_OK_PKT_N(res))
-		return max(ERR_COUNT_PKT_0(res), ERR_COUNT_PKT_N(res));
+	if (DECODE_FAIL_PKT_0(res) || DECODE_FAIL_PKT_N(res))
+		return -EBADMSG;
+
+	/* ERR_COUNT_PKT_N is max, not sum, but that's all we have */
+	mtd->ecc_stats.corrected +=
+		ERR_COUNT_PKT_0(res) + ERR_COUNT_PKT_N(res);
 
-	return -EBADMSG;
+	return max(ERR_COUNT_PKT_0(res), ERR_COUNT_PKT_N(res));
 }
 
 static void tango_dma_callback(void *arg)
@@ -280,7 +287,7 @@ static int tango_read_page(struct mtd_info *mtd, struct nand_chip *chip,
 	if (err)
 		return err;
 
-	res = decode_error_report(nfc);
+	res = decode_error_report(chip, nfc);
 	if (res < 0) {
 		chip->ecc.read_oob_raw(mtd, chip, page);
 		res = check_erased_page(chip, buf);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH] mtd: nand: tango: Update ecc_stats.corrected
  2017-05-12 15:34                 ` [PATCH] mtd: nand: tango: Update ecc_stats.corrected Marc Gonzalez
@ 2017-05-15  8:56                   ` Boris Brezillon
  2017-05-15 20:47                   ` Boris Brezillon
  1 sibling, 0 replies; 42+ messages in thread
From: Boris Brezillon @ 2017-05-15  8:56 UTC (permalink / raw)
  To: Marc Gonzalez
  Cc: Pavel Machek, Richard Weinberger, Brian Norris, linux-mtd,
	David Woodhouse, Mason

On Fri, 12 May 2017 17:34:01 +0200
Marc Gonzalez <marc_gonzalez@sigmadesigns.com> wrote:

> According to Boris, some user-space tools expect MTD drivers to
> update ecc_stats.corrected, and it's better to provide a lower
> bound than to provide no information at all.
> 
> Reported-by: Pavel Machek <pavel@ucw.cz>
> Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
> ---
> NB: this patch is based on 4.9; if it looks good, I'll rebase
> it on v4.12-rc1 next week.

No need to resend, it seems to apply cleanly.

> ---
>  drivers/mtd/nand/tango_nand.c | 21 ++++++++++++++-------
>  1 file changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/mtd/nand/tango_nand.c b/drivers/mtd/nand/tango_nand.c
> index 4a5e948c62df..471348462e42 100644
> --- a/drivers/mtd/nand/tango_nand.c
> +++ b/drivers/mtd/nand/tango_nand.c
> @@ -55,10 +55,10 @@
>   * byte 1 for other packets in the page (PKT_N, for N > 0)
>   * ERR_COUNT_PKT_N is the max error count over all but the first packet.
>   */
> -#define DECODE_OK_PKT_0(v)	((v) & BIT(7))
> -#define DECODE_OK_PKT_N(v)	((v) & BIT(15))
>  #define ERR_COUNT_PKT_0(v)	(((v) >> 0) & 0x3f)
>  #define ERR_COUNT_PKT_N(v)	(((v) >> 8) & 0x3f)
> +#define DECODE_FAIL_PKT_0(v)	(((v) & BIT(7)) == 0)
> +#define DECODE_FAIL_PKT_N(v)	(((v) & BIT(15)) == 0)
>  
>  /* Offsets relative to pbus_base */
>  #define PBUS_CS_CTRL	0x83c
> @@ -193,6 +193,8 @@ static int check_erased_page(struct nand_chip *chip, u8 *buf)
>  						  chip->ecc.strength);
>  		if (res < 0)
>  			mtd->ecc_stats.failed++;
> +		else
> +			mtd->ecc_stats.corrected += res;
>  
>  		bitflips = max(res, bitflips);
>  		buf += pkt_size;
> @@ -202,9 +204,10 @@ static int check_erased_page(struct nand_chip *chip, u8 *buf)
>  	return bitflips;
>  }
>  
> -static int decode_error_report(struct tango_nfc *nfc)
> +static int decode_error_report(struct nand_chip *chip, struct tango_nfc *nfc)

You could just pass chip here and extract nfc using to_tango_nfc(), but
that's just a tiny detail.

Looks good otherwise.

Thanks,

Boris

>  {
>  	u32 status, res;
> +	struct mtd_info *mtd = nand_to_mtd(chip);
>  
>  	status = readl_relaxed(nfc->reg_base + NFC_XFER_STATUS);
>  	if (status & PAGE_IS_EMPTY)
> @@ -212,10 +215,14 @@ static int decode_error_report(struct tango_nfc *nfc)
>  
>  	res = readl_relaxed(nfc->mem_base + ERROR_REPORT);
>  
> -	if (DECODE_OK_PKT_0(res) && DECODE_OK_PKT_N(res))
> -		return max(ERR_COUNT_PKT_0(res), ERR_COUNT_PKT_N(res));
> +	if (DECODE_FAIL_PKT_0(res) || DECODE_FAIL_PKT_N(res))
> +		return -EBADMSG;
> +
> +	/* ERR_COUNT_PKT_N is max, not sum, but that's all we have */
> +	mtd->ecc_stats.corrected +=
> +		ERR_COUNT_PKT_0(res) + ERR_COUNT_PKT_N(res);
>  
> -	return -EBADMSG;
> +	return max(ERR_COUNT_PKT_0(res), ERR_COUNT_PKT_N(res));
>  }
>  
>  static void tango_dma_callback(void *arg)
> @@ -280,7 +287,7 @@ static int tango_read_page(struct mtd_info *mtd, struct nand_chip *chip,
>  	if (err)
>  		return err;
>  
> -	res = decode_error_report(nfc);
> +	res = decode_error_report(chip, nfc);
>  	if (res < 0) {
>  		chip->ecc.read_oob_raw(mtd, chip, page);
>  		res = check_erased_page(chip, buf);

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mtd: nand: tango: Update ecc_stats.corrected
  2017-05-12 15:34                 ` [PATCH] mtd: nand: tango: Update ecc_stats.corrected Marc Gonzalez
  2017-05-15  8:56                   ` Boris Brezillon
@ 2017-05-15 20:47                   ` Boris Brezillon
  1 sibling, 0 replies; 42+ messages in thread
From: Boris Brezillon @ 2017-05-15 20:47 UTC (permalink / raw)
  To: Marc Gonzalez
  Cc: Pavel Machek, linux-mtd, Mason, Richard Weinberger,
	David Woodhouse, Brian Norris

On Fri, 12 May 2017 17:34:01 +0200
Marc Gonzalez <marc_gonzalez@sigmadesigns.com> wrote:

> According to Boris, some user-space tools expect MTD drivers to
> update ecc_stats.corrected, and it's better to provide a lower
> bound than to provide no information at all.
> 
> Reported-by: Pavel Machek <pavel@ucw.cz>
> Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>

Applied to nand/fixes after changing decode_error_report() prototype
and adding Fixes+Cc-stable tags (as discussed on IRC).

Thanks,

Boris

> ---
> NB: this patch is based on 4.9; if it looks good, I'll rebase
> it on v4.12-rc1 next week.
> ---
>  drivers/mtd/nand/tango_nand.c | 21 ++++++++++++++-------
>  1 file changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/mtd/nand/tango_nand.c b/drivers/mtd/nand/tango_nand.c
> index 4a5e948c62df..471348462e42 100644
> --- a/drivers/mtd/nand/tango_nand.c
> +++ b/drivers/mtd/nand/tango_nand.c
> @@ -55,10 +55,10 @@
>   * byte 1 for other packets in the page (PKT_N, for N > 0)
>   * ERR_COUNT_PKT_N is the max error count over all but the first packet.
>   */
> -#define DECODE_OK_PKT_0(v)	((v) & BIT(7))
> -#define DECODE_OK_PKT_N(v)	((v) & BIT(15))
>  #define ERR_COUNT_PKT_0(v)	(((v) >> 0) & 0x3f)
>  #define ERR_COUNT_PKT_N(v)	(((v) >> 8) & 0x3f)
> +#define DECODE_FAIL_PKT_0(v)	(((v) & BIT(7)) == 0)
> +#define DECODE_FAIL_PKT_N(v)	(((v) & BIT(15)) == 0)
>  
>  /* Offsets relative to pbus_base */
>  #define PBUS_CS_CTRL	0x83c
> @@ -193,6 +193,8 @@ static int check_erased_page(struct nand_chip *chip, u8 *buf)
>  						  chip->ecc.strength);
>  		if (res < 0)
>  			mtd->ecc_stats.failed++;
> +		else
> +			mtd->ecc_stats.corrected += res;
>  
>  		bitflips = max(res, bitflips);
>  		buf += pkt_size;
> @@ -202,9 +204,10 @@ static int check_erased_page(struct nand_chip *chip, u8 *buf)
>  	return bitflips;
>  }
>  
> -static int decode_error_report(struct tango_nfc *nfc)
> +static int decode_error_report(struct nand_chip *chip, struct tango_nfc *nfc)
>  {
>  	u32 status, res;
> +	struct mtd_info *mtd = nand_to_mtd(chip);
>  
>  	status = readl_relaxed(nfc->reg_base + NFC_XFER_STATUS);
>  	if (status & PAGE_IS_EMPTY)
> @@ -212,10 +215,14 @@ static int decode_error_report(struct tango_nfc *nfc)
>  
>  	res = readl_relaxed(nfc->mem_base + ERROR_REPORT);
>  
> -	if (DECODE_OK_PKT_0(res) && DECODE_OK_PKT_N(res))
> -		return max(ERR_COUNT_PKT_0(res), ERR_COUNT_PKT_N(res));
> +	if (DECODE_FAIL_PKT_0(res) || DECODE_FAIL_PKT_N(res))
> +		return -EBADMSG;
> +
> +	/* ERR_COUNT_PKT_N is max, not sum, but that's all we have */
> +	mtd->ecc_stats.corrected +=
> +		ERR_COUNT_PKT_0(res) + ERR_COUNT_PKT_N(res);
>  
> -	return -EBADMSG;
> +	return max(ERR_COUNT_PKT_0(res), ERR_COUNT_PKT_N(res));
>  }
>  
>  static void tango_dma_callback(void *arg)
> @@ -280,7 +287,7 @@ static int tango_read_page(struct mtd_info *mtd, struct nand_chip *chip,
>  	if (err)
>  		return err;
>  
> -	res = decode_error_report(nfc);
> +	res = decode_error_report(chip, nfc);
>  	if (res < 0) {
>  		chip->ecc.read_oob_raw(mtd, chip, page);
>  		res = check_erased_page(chip, buf);

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] nand_base: optimize checking of erased buffers
  2017-04-21 10:51         ` [PATCH] nand_base: optimize checking of erased buffers Pavel Machek
@ 2017-05-17 11:27           ` Mason
  2017-05-17 11:39             ` Mason
  2017-05-17 11:52             ` Pavel Machek
  2017-05-17 12:22           ` [PATCH] fsl_ifc_nand: fix handing of bit flips in erased nand Pavel Machek
  1 sibling, 2 replies; 42+ messages in thread
From: Mason @ 2017-05-17 11:27 UTC (permalink / raw)
  To: Pavel Machek, Boris Brezillon
  Cc: Richard Weinberger, David Woodhouse, Brian Norris, Marek Vasut,
	Cyrille Pitchen, linux-mtd, LKML, mark.marshall, b44839,
	prabhakar

On 21/04/2017 12:51, Pavel Machek wrote:

> If we see ~0UL in flash, there's no need for hweight, and no need to
> check number of bitflips. So this should be net win.
> 
> Signed-off-by: Pavel Machek <pavel@denx.de>
> 
> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> index b0524f8..96c27ec 100644
> --- a/drivers/mtd/nand/nand_base.c
> +++ b/drivers/mtd/nand/nand_base.c
> @@ -1357,7 +1357,10 @@ static int nand_check_erased_buf(void *buf, int len, int bitflips_threshold)
>  
>  	for (; len >= sizeof(long);
>  	     len -= sizeof(long), bitmap += sizeof(long)) {
> -		weight = hweight_long(*((unsigned long *)bitmap));

I hadn't noticed this earlier. There is, obviously, an implicit
requirement that 'buf' must be 4-byte aligned on 32-bit platforms,
and 8-byte aligned on 64-bit platforms.

This is not true for my platform, as the ecc pointer is
chip->oob_poi + 10

I suppose it's not a problem if the platform can handle
unaligned loads, otherwise it spells trouble.


> +		unsigned long d = *((unsigned long *)bitmap);
> +		if (d == ~0UL)
> +			continue;
> +		weight = hweight_long(d);
>  		bitflips += BITS_PER_LONG - weight;
>  		if (unlikely(bitflips > bitflips_threshold))
>  			return -EBADMSG;
> 

The optimization makes sense in itself, but given that it's on an
error path (?) I'm not sure it will bring any tangible benefits?

Regards.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] nand_base: optimize checking of erased buffers
  2017-05-17 11:27           ` Mason
@ 2017-05-17 11:39             ` Mason
  2017-05-17 11:52             ` Pavel Machek
  1 sibling, 0 replies; 42+ messages in thread
From: Mason @ 2017-05-17 11:39 UTC (permalink / raw)
  To: Pavel Machek, Boris Brezillon
  Cc: Richard Weinberger, David Woodhouse, Brian Norris, Marek Vasut,
	Cyrille Pitchen, linux-mtd, LKML, mark.marshall, b44839,
	prabhakar

On 17/05/2017 13:27, Mason wrote:

> On 21/04/2017 12:51, Pavel Machek wrote:
> 
>> If we see ~0UL in flash, there's no need for hweight, and no need to
>> check number of bitflips. So this should be net win.
>>
>> Signed-off-by: Pavel Machek <pavel@denx.de>
>>
>> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
>> index b0524f8..96c27ec 100644
>> --- a/drivers/mtd/nand/nand_base.c
>> +++ b/drivers/mtd/nand/nand_base.c
>> @@ -1357,7 +1357,10 @@ static int nand_check_erased_buf(void *buf, int len, int bitflips_threshold)
>>  
>>  	for (; len >= sizeof(long);
>>  	     len -= sizeof(long), bitmap += sizeof(long)) {
>> -		weight = hweight_long(*((unsigned long *)bitmap));
> 
> I hadn't noticed this earlier. There is, obviously, an implicit
> requirement that 'buf' must be 4-byte aligned on 32-bit platforms,
> and 8-byte aligned on 64-bit platforms.
> 
> This is not true for my platform, as the ecc pointer is
> chip->oob_poi + 10

Doh! As Boris points out, the prologue/epilogue handle
all alignment & size issues.

Regards.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] nand_base: optimize checking of erased buffers
  2017-05-17 11:27           ` Mason
  2017-05-17 11:39             ` Mason
@ 2017-05-17 11:52             ` Pavel Machek
  1 sibling, 0 replies; 42+ messages in thread
From: Pavel Machek @ 2017-05-17 11:52 UTC (permalink / raw)
  To: Mason
  Cc: Boris Brezillon, Richard Weinberger, David Woodhouse,
	Brian Norris, Marek Vasut, Cyrille Pitchen, linux-mtd, LKML,
	mark.marshall, b44839, prabhakar

[-- Attachment #1: Type: text/plain, Size: 876 bytes --]


> This is not true for my platform, as the ecc pointer is
> chip->oob_poi + 10
> 
> I suppose it's not a problem if the platform can handle
> unaligned loads, otherwise it spells trouble.
> 
> 
> > +		unsigned long d = *((unsigned long *)bitmap);
> > +		if (d == ~0UL)
> > +			continue;
> > +		weight = hweight_long(d);
> >  		bitflips += BITS_PER_LONG - weight;
> >  		if (unlikely(bitflips > bitflips_threshold))
> >  			return -EBADMSG;
> > 
> 
> The optimization makes sense in itself, but given that it's on an
> error path (?) I'm not sure it will bring any tangible benefits?

Well, if it is critical enough to do magic it does, it is also
important enough not to do expensive bit-counting needlessly... 

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] tango_nand.c: fix ecc.stats_corrected in empty flash case
  2017-05-04  8:42               ` Boris Brezillon
  2017-05-12 15:34                 ` [PATCH] mtd: nand: tango: Update ecc_stats.corrected Marc Gonzalez
@ 2017-05-17 12:04                 ` Pavel Machek
  1 sibling, 0 replies; 42+ messages in thread
From: Pavel Machek @ 2017-05-17 12:04 UTC (permalink / raw)
  To: Boris Brezillon; +Cc: Marc Gonzalez, Thibaud Cornic, linux-mtd, Mason

[-- Attachment #1: Type: text/plain, Size: 2182 bytes --]

Hi!

> > > Hello Pavel,
> > > 
> > > You may have noticed that ecc_stats.corrected is not updated in
> > > decode_error_report() which is the main code path, i.e. the path
> > > that will succeed 99.99% of the time (HW read).
> > > 
> > > It turns out that the HW does not report the number of errors
> > > corrected in a page... Instead it reports two values:
> > > 1) U = number of errors corrected in the first packet/step
> > > 2) V = max number of errors corrected in other packets/steps
> > > 
> > > Thus, it is not possible to determine the actual number of errors
> > > corrected in a page (unless V is 0). Otherwise, we just have an
> > > interval; let n be the number of packets/steps:
> > > 
> > > U + V <= corrected errors count <= U + (n-1)*V
> > > 
> > > In my opinion, it is better to provide no information than to
> > > provide incorrect information. Therefore, I did not update
> > > ecc_stats.corrected in decode_error_report().  
> > 
> > Well... Having corrected ECC errors is pretty rare, right?
> 
> Depends on the NAND chip. On modern SLC NAND chips requiring
> ECC of 8bits/512bytes are likely to have frequent bitflips.
> 
> > So one
> > solution would be to re-compute ECCs in software if we see U or V >
> > 0...
> 
> Hm, not sure it's worth the trouble for statistics that are anyway
> rarely used, and when they are, are only used has a metric to determine
> how worn the NAND is.

Well, knowing "how worn the NAND is" and "when to refresh the
data". Both seem to be quite important for storage system that works.

> I'd prefer to see a better user-space interface returning the
> max_bitflips information when someone reads from an MTD device (see [1])
> rather than trying to fix drivers to return the exact number of
> corrected bitflips (which might be impossible for some of them anyway).
> 
> > [1]http://lists.infradead.org/pipermail/linux-mtd/2016-April/067187.html

Yes, current interface leaves something to be desired...

Best regards,
										Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH] fsl_ifc_nand: fix handing of bit flips in erased nand
  2017-04-21 10:51         ` [PATCH] nand_base: optimize checking of erased buffers Pavel Machek
  2017-05-17 11:27           ` Mason
@ 2017-05-17 12:22           ` Pavel Machek
  2017-05-17 12:32             ` Boris Brezillon
  1 sibling, 1 reply; 42+ messages in thread
From: Pavel Machek @ 2017-05-17 12:22 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: richard, dwmw2, computersforpeace, marek.vasut, cyrille.pitchen,
	linux-mtd, linux-kernel, mark.marshall, b44839, prabhakar

[-- Attachment #1: Type: text/plain, Size: 4707 bytes --]

If we see unrecoverable ECC error, we need to count number of bitflips
from all-ones and report correctable/uncorrectable according to
that. Otherwise we report ECC failed on erased flash with single bit error.

Signed-off-by: Pavel Machek <pavel@denx.de>

diff --git a/drivers/mtd/nand/fsl_ifc_nand.c b/drivers/mtd/nand/fsl_ifc_nand.c
index d1570f5..1c66696 100644
--- a/drivers/mtd/nand/fsl_ifc_nand.c
+++ b/drivers/mtd/nand/fsl_ifc_nand.c
@@ -171,34 +171,6 @@ static void set_addr(struct mtd_info *mtd, int column, int page_addr, int oob)
 		ifc_nand_ctrl->index += mtd->writesize;
 }
 
-static int is_blank(struct mtd_info *mtd, unsigned int bufnum)
-{
-	struct nand_chip *chip = mtd_to_nand(mtd);
-	struct fsl_ifc_mtd *priv = nand_get_controller_data(chip);
-	u8 __iomem *addr = priv->vbase + bufnum * (mtd->writesize * 2);
-	u32 __iomem *mainarea = (u32 __iomem *)addr;
-	u8 __iomem *oob = addr + mtd->writesize;
-	struct mtd_oob_region oobregion = { };
-	int i, section = 0;
-
-	for (i = 0; i < mtd->writesize / 4; i++) {
-		if (__raw_readl(&mainarea[i]) != 0xffffffff)
-			return 0;
-	}
-
-	mtd_ooblayout_ecc(mtd, section++, &oobregion);
-	while (oobregion.length) {
-		for (i = 0; i < oobregion.length; i++) {
-			if (__raw_readb(&oob[oobregion.offset + i]) != 0xff)
-				return 0;
-		}
-
-		mtd_ooblayout_ecc(mtd, section++, &oobregion);
-	}
-
-	return 1;
-}
-
 /* returns nonzero if entire page is blank */
 static int check_read_ecc(struct mtd_info *mtd, struct fsl_ifc_ctrl *ctrl,
 			  u32 *eccstat, unsigned int bufnum)
@@ -274,16 +246,13 @@ static void fsl_ifc_run_command(struct mtd_info *mtd)
 			if (errors == 15) {
 				/*
 				 * Uncorrectable error.
-				 * OK only if the whole page is blank.
+				 * We'll check for blank pages later.
 				 *
 				 * We disable ECCER reporting due to...
 				 * erratum IFC-A002770 -- so report it now if we
 				 * see an uncorrectable error in ECCSTAT.
 				 */
-				if (!is_blank(mtd, bufnum))
-					ctrl->nand_stat |=
-						IFC_NAND_EVTER_STAT_ECCER;
-				break;
+				ctrl->nand_stat |= IFC_NAND_EVTER_STAT_ECCER;
 			}
 
 			mtd->ecc_stats.corrected += errors;
@@ -678,6 +648,41 @@ static int fsl_ifc_wait(struct mtd_info *mtd, struct nand_chip *chip)
 	return nand_fsr | NAND_STATUS_WP;
 }
 
+/*
+ * The controller does not check for bitflips in erased pages,
+ * therefore software must check instead.
+ */
+static int check_erased_page(struct nand_chip *chip, u8 *buf)
+{
+	struct mtd_info *mtd = nand_to_mtd(chip);
+	u8 *ecc = chip->oob_poi;
+	const int ecc_size = chip->ecc.bytes;
+	const int pkt_size = chip->ecc.size;
+	int i, res, bitflips = 0;
+	struct mtd_oob_region oobregion = { };
+
+	mtd_ooblayout_ecc(mtd, 0, &oobregion);
+	ecc += oobregion.offset;
+
+	for (i = 0; i < chip->ecc.steps; ++i) {
+		res = nand_check_erased_ecc_chunk(buf, pkt_size, ecc, ecc_size,
+						  NULL, 0,
+						  chip->ecc.strength);
+		if (res < 0)
+			mtd->ecc_stats.failed++;
+		else
+			mtd->ecc_stats.corrected += res;
+
+		bitflips = max(res, bitflips);
+		buf += pkt_size;
+		ecc += ecc_size;
+	}
+
+	mtd_ooblayout_ecc(mtd, 1, &oobregion);
+
+	return bitflips;
+}
+
 static int fsl_ifc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
 			     uint8_t *buf, int oob_required, int page)
 {
@@ -689,11 +695,19 @@ static int fsl_ifc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
 	if (oob_required)
 		fsl_ifc_read_buf(mtd, chip->oob_poi, mtd->oobsize);
 
-	if (ctrl->nand_stat & IFC_NAND_EVTER_STAT_ECCER)
-		dev_err(priv->dev, "NAND Flash ECC Uncorrectable Error\n");
-
 	if (ctrl->nand_stat != IFC_NAND_EVTER_STAT_OPC)
 		mtd->ecc_stats.failed++;
+
+	if (ctrl->nand_stat & IFC_NAND_EVTER_STAT_ECCER) {
+		int res;
+
+		if (!oob_required)
+			fsl_ifc_read_buf(mtd, chip->oob_poi, mtd->oobsize);
+
+		res = check_erased_page(chip, buf);
+		return res;
+	}
+
 
 	return nctrl->max_bitflips;
 }
@@ -904,6 +922,21 @@ static int fsl_ifc_chip_init(struct fsl_ifc_mtd *priv)
 		chip->ecc.algo = NAND_ECC_HAMMING;
 	}
 
+	{
+		struct mtd_oob_region oobregion = { };
+
+		mtd_ooblayout_ecc(mtd, 0, &oobregion);
+		if (!oobregion.length) {
+			dev_err(priv->dev, "No ECC in oobregion?\n");
+			return -EINVAL;
+		}
+		mtd_ooblayout_ecc(mtd, 1, &oobregion);
+		if (oobregion.length) {
+			dev_err(priv->dev, "Extra data in oobregion?\n");
+			return -EINVAL;
+		}
+	}
+
 	if (ctrl->version == FSL_IFC_VERSION_1_1_0)
 		fsl_ifc_sram_init(priv);
 



-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH] fsl_ifc_nand: fix handing of bit flips in erased nand
  2017-05-17 12:22           ` [PATCH] fsl_ifc_nand: fix handing of bit flips in erased nand Pavel Machek
@ 2017-05-17 12:32             ` Boris Brezillon
  2017-05-17 13:00               ` Pavel Machek
  0 siblings, 1 reply; 42+ messages in thread
From: Boris Brezillon @ 2017-05-17 12:32 UTC (permalink / raw)
  To: Pavel Machek
  Cc: richard, dwmw2, computersforpeace, marek.vasut, cyrille.pitchen,
	linux-mtd, linux-kernel, mark.marshall, b44839, prabhakar

On Wed, 17 May 2017 14:22:24 +0200
Pavel Machek <pavel@ucw.cz> wrote:

> If we see unrecoverable ECC error, we need to count number of bitflips
> from all-ones and report correctable/uncorrectable according to
> that. Otherwise we report ECC failed on erased flash with single bit error.
> 
> Signed-off-by: Pavel Machek <pavel@denx.de>
> 
> diff --git a/drivers/mtd/nand/fsl_ifc_nand.c b/drivers/mtd/nand/fsl_ifc_nand.c
> index d1570f5..1c66696 100644
> --- a/drivers/mtd/nand/fsl_ifc_nand.c
> +++ b/drivers/mtd/nand/fsl_ifc_nand.c
> @@ -171,34 +171,6 @@ static void set_addr(struct mtd_info *mtd, int column, int page_addr, int oob)
>  		ifc_nand_ctrl->index += mtd->writesize;
>  }
>  
> -static int is_blank(struct mtd_info *mtd, unsigned int bufnum)
> -{
> -	struct nand_chip *chip = mtd_to_nand(mtd);
> -	struct fsl_ifc_mtd *priv = nand_get_controller_data(chip);
> -	u8 __iomem *addr = priv->vbase + bufnum * (mtd->writesize * 2);
> -	u32 __iomem *mainarea = (u32 __iomem *)addr;
> -	u8 __iomem *oob = addr + mtd->writesize;
> -	struct mtd_oob_region oobregion = { };
> -	int i, section = 0;
> -
> -	for (i = 0; i < mtd->writesize / 4; i++) {
> -		if (__raw_readl(&mainarea[i]) != 0xffffffff)
> -			return 0;
> -	}
> -
> -	mtd_ooblayout_ecc(mtd, section++, &oobregion);
> -	while (oobregion.length) {
> -		for (i = 0; i < oobregion.length; i++) {
> -			if (__raw_readb(&oob[oobregion.offset + i]) != 0xff)
> -				return 0;
> -		}
> -
> -		mtd_ooblayout_ecc(mtd, section++, &oobregion);
> -	}
> -
> -	return 1;
> -}
> -
>  /* returns nonzero if entire page is blank */
>  static int check_read_ecc(struct mtd_info *mtd, struct fsl_ifc_ctrl *ctrl,
>  			  u32 *eccstat, unsigned int bufnum)
> @@ -274,16 +246,13 @@ static void fsl_ifc_run_command(struct mtd_info *mtd)
>  			if (errors == 15) {
>  				/*
>  				 * Uncorrectable error.
> -				 * OK only if the whole page is blank.
> +				 * We'll check for blank pages later.
>  				 *
>  				 * We disable ECCER reporting due to...
>  				 * erratum IFC-A002770 -- so report it now if we
>  				 * see an uncorrectable error in ECCSTAT.
>  				 */
> -				if (!is_blank(mtd, bufnum))
> -					ctrl->nand_stat |=
> -						IFC_NAND_EVTER_STAT_ECCER;
> -				break;
> +				ctrl->nand_stat |= IFC_NAND_EVTER_STAT_ECCER;
>  			}
>  
>  			mtd->ecc_stats.corrected += errors;
> @@ -678,6 +648,41 @@ static int fsl_ifc_wait(struct mtd_info *mtd, struct nand_chip *chip)
>  	return nand_fsr | NAND_STATUS_WP;
>  }
>  
> +/*
> + * The controller does not check for bitflips in erased pages,
> + * therefore software must check instead.
> + */
> +static int check_erased_page(struct nand_chip *chip, u8 *buf)
> +{
> +	struct mtd_info *mtd = nand_to_mtd(chip);
> +	u8 *ecc = chip->oob_poi;
> +	const int ecc_size = chip->ecc.bytes;
> +	const int pkt_size = chip->ecc.size;
> +	int i, res, bitflips = 0;
> +	struct mtd_oob_region oobregion = { };
> +
> +	mtd_ooblayout_ecc(mtd, 0, &oobregion);
> +	ecc += oobregion.offset;
> +
> +	for (i = 0; i < chip->ecc.steps; ++i) {
> +		res = nand_check_erased_ecc_chunk(buf, pkt_size, ecc, ecc_size,
> +						  NULL, 0,
> +						  chip->ecc.strength);
> +		if (res < 0)
> +			mtd->ecc_stats.failed++;
> +		else
> +			mtd->ecc_stats.corrected += res;
> +
> +		bitflips = max(res, bitflips);
> +		buf += pkt_size;
> +		ecc += ecc_size;
> +	}
> +
> +	mtd_ooblayout_ecc(mtd, 1, &oobregion);

Why is this needed?

> +
> +	return bitflips;
> +}
> +
>  static int fsl_ifc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
>  			     uint8_t *buf, int oob_required, int page)
>  {
> @@ -689,11 +695,19 @@ static int fsl_ifc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
>  	if (oob_required)
>  		fsl_ifc_read_buf(mtd, chip->oob_poi, mtd->oobsize);
>  
> -	if (ctrl->nand_stat & IFC_NAND_EVTER_STAT_ECCER)
> -		dev_err(priv->dev, "NAND Flash ECC Uncorrectable Error\n");
> -
>  	if (ctrl->nand_stat != IFC_NAND_EVTER_STAT_OPC)
>  		mtd->ecc_stats.failed++;
> +
> +	if (ctrl->nand_stat & IFC_NAND_EVTER_STAT_ECCER) {
> +		int res;
> +
> +		if (!oob_required)
> +			fsl_ifc_read_buf(mtd, chip->oob_poi, mtd->oobsize);
> +
> +		res = check_erased_page(chip, buf);
> +		return res;
> +	}
> +
>  
>  	return nctrl->max_bitflips;
>  }
> @@ -904,6 +922,21 @@ static int fsl_ifc_chip_init(struct fsl_ifc_mtd *priv)
>  		chip->ecc.algo = NAND_ECC_HAMMING;
>  	}
>  
> +	{
> +		struct mtd_oob_region oobregion = { };
> +
> +		mtd_ooblayout_ecc(mtd, 0, &oobregion);
> +		if (!oobregion.length) {
> +			dev_err(priv->dev, "No ECC in oobregion?\n");
> +			return -EINVAL;
> +		}
> +		mtd_ooblayout_ecc(mtd, 1, &oobregion);
> +		if (oobregion.length) {
> +			dev_err(priv->dev, "Extra data in oobregion?\n");
> +			return -EINVAL;
> +		}
> +	}

This clearly doesn't belong in this patch. And if you really want to
check that, please create a separate function instead of defining a
non-conditional code block inside fsl_ifc_chip_init().

> +
>  	if (ctrl->version == FSL_IFC_VERSION_1_1_0)
>  		fsl_ifc_sram_init(priv);
>  
> 
> 
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] fsl_ifc_nand: fix handing of bit flips in erased nand
  2017-05-17 12:32             ` Boris Brezillon
@ 2017-05-17 13:00               ` Pavel Machek
  2017-05-17 13:25                 ` Boris Brezillon
  0 siblings, 1 reply; 42+ messages in thread
From: Pavel Machek @ 2017-05-17 13:00 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: richard, dwmw2, computersforpeace, marek.vasut, cyrille.pitchen,
	linux-mtd, linux-kernel, mark.marshall, b44839, prabhakar

[-- Attachment #1: Type: text/plain, Size: 2689 bytes --]

Hi!

> On Wed, 17 May 2017 14:22:24 +0200
> Pavel Machek <pavel@ucw.cz> wrote:
> 
> > If we see unrecoverable ECC error, we need to count number of bitflips
> > from all-ones and report correctable/uncorrectable according to
> > that. Otherwise we report ECC failed on erased flash with single bit error.
> > 
> > Signed-off-by: Pavel Machek <pavel@denx.de>
> > 
> > @@ -678,6 +648,41 @@ static int fsl_ifc_wait(struct mtd_info *mtd, struct nand_chip *chip)
> >  	return nand_fsr | NAND_STATUS_WP;
> >  }
> >  
> > +/*
> > + * The controller does not check for bitflips in erased pages,
> > + * therefore software must check instead.
> > + */
> > +static int check_erased_page(struct nand_chip *chip, u8 *buf)
> > +{
> > +	struct mtd_info *mtd = nand_to_mtd(chip);
> > +	u8 *ecc = chip->oob_poi;
> > +	const int ecc_size = chip->ecc.bytes;
> > +	const int pkt_size = chip->ecc.size;
> > +	int i, res, bitflips = 0;
> > +	struct mtd_oob_region oobregion = { };
> > +
> > +	mtd_ooblayout_ecc(mtd, 0, &oobregion);
> > +	ecc += oobregion.offset;
> > +
> > +	for (i = 0; i < chip->ecc.steps; ++i) {
> > +		res = nand_check_erased_ecc_chunk(buf, pkt_size, ecc, ecc_size,
> > +						  NULL, 0,
> > +						  chip->ecc.strength);
> > +		if (res < 0)
> > +			mtd->ecc_stats.failed++;
> > +		else
> > +			mtd->ecc_stats.corrected += res;
> > +
> > +		bitflips = max(res, bitflips);
> > +		buf += pkt_size;
> > +		ecc += ecc_size;
> > +	}
> > +
> > +	mtd_ooblayout_ecc(mtd, 1, &oobregion);
> 
> Why is this needed?

It is not, will remove.

> > @@ -904,6 +922,21 @@ static int fsl_ifc_chip_init(struct fsl_ifc_mtd *priv)
> >  		chip->ecc.algo = NAND_ECC_HAMMING;
> >  	}
> >  
> > +	{
> > +		struct mtd_oob_region oobregion = { };
> > +
> > +		mtd_ooblayout_ecc(mtd, 0, &oobregion);
> > +		if (!oobregion.length) {
> > +			dev_err(priv->dev, "No ECC in oobregion?\n");
> > +			return -EINVAL;
> > +		}
> > +		mtd_ooblayout_ecc(mtd, 1, &oobregion);
> > +		if (oobregion.length) {
> > +			dev_err(priv->dev, "Extra data in oobregion?\n");
> > +			return -EINVAL;
> > +		}
> > +	}
> 
> This clearly doesn't belong in this patch. And if you really want to
> check that, please create a separate function instead of defining a
> non-conditional code block inside fsl_ifc_chip_init().

I am not sure I want to check that. check_erased_page() can only
handle layout with just one oobregion. If you think check is not
needed, I'll happily remove the checking.

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] fsl_ifc_nand: fix handing of bit flips in erased nand
  2017-05-17 13:00               ` Pavel Machek
@ 2017-05-17 13:25                 ` Boris Brezillon
  2017-05-17 20:03                   ` [PATCH] mtd: nand: fsl_ifc: fix handing of bit flips in erased pages Pavel Machek
  0 siblings, 1 reply; 42+ messages in thread
From: Boris Brezillon @ 2017-05-17 13:25 UTC (permalink / raw)
  To: Pavel Machek
  Cc: richard, dwmw2, computersforpeace, marek.vasut, cyrille.pitchen,
	linux-mtd, linux-kernel, mark.marshall, b44839, prabhakar

On Wed, 17 May 2017 15:00:59 +0200
Pavel Machek <pavel@ucw.cz> wrote:

> 
> > > @@ -904,6 +922,21 @@ static int fsl_ifc_chip_init(struct fsl_ifc_mtd *priv)
> > >  		chip->ecc.algo = NAND_ECC_HAMMING;
> > >  	}
> > >  
> > > +	{
> > > +		struct mtd_oob_region oobregion = { };
> > > +
> > > +		mtd_ooblayout_ecc(mtd, 0, &oobregion);
> > > +		if (!oobregion.length) {
> > > +			dev_err(priv->dev, "No ECC in oobregion?\n");
> > > +			return -EINVAL;
> > > +		}
> > > +		mtd_ooblayout_ecc(mtd, 1, &oobregion);
> > > +		if (oobregion.length) {
> > > +			dev_err(priv->dev, "Extra data in oobregion?\n");
> > > +			return -EINVAL;
> > > +		}
> > > +	}  
> > 
> > This clearly doesn't belong in this patch. And if you really want to
> > check that, please create a separate function instead of defining a
> > non-conditional code block inside fsl_ifc_chip_init().  
> 
> I am not sure I want to check that. check_erased_page() can only
> handle layout with just one oobregion. If you think check is not
> needed, I'll happily remove the checking.

Let's drop it then. And please change the subject to "mtd: nand:
fsl_ifc: fix handing of bit flips in erased pages".

Thanks,

Boris

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH] mtd: nand: fsl_ifc: fix handing of bit flips in erased pages
  2017-05-17 13:25                 ` Boris Brezillon
@ 2017-05-17 20:03                   ` Pavel Machek
  2017-05-31 20:59                     ` [PATCHv2] " Pavel Machek
  0 siblings, 1 reply; 42+ messages in thread
From: Pavel Machek @ 2017-05-17 20:03 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: richard, dwmw2, computersforpeace, marek.vasut, cyrille.pitchen,
	linux-mtd, linux-kernel, mark.marshall, b44839, prabhakar

[-- Attachment #1: Type: text/plain, Size: 4086 bytes --]

If we see unrecoverable ECC error, we need to count number of bitflips
from all-ones and report correctable/uncorrectable according to
that. Otherwise we report ECC failed on erased flash with single bit error.

Signed-off-by: Pavel Machek <pavel@denx.de>

diff --git a/drivers/mtd/nand/fsl_ifc_nand.c b/drivers/mtd/nand/fsl_ifc_nand.c
index d1570f5..1c66696 100644
--- a/drivers/mtd/nand/fsl_ifc_nand.c
+++ b/drivers/mtd/nand/fsl_ifc_nand.c
@@ -171,34 +171,6 @@ static void set_addr(struct mtd_info *mtd, int column, int page_addr, int oob)
 		ifc_nand_ctrl->index += mtd->writesize;
 }
 
-static int is_blank(struct mtd_info *mtd, unsigned int bufnum)
-{
-	struct nand_chip *chip = mtd_to_nand(mtd);
-	struct fsl_ifc_mtd *priv = nand_get_controller_data(chip);
-	u8 __iomem *addr = priv->vbase + bufnum * (mtd->writesize * 2);
-	u32 __iomem *mainarea = (u32 __iomem *)addr;
-	u8 __iomem *oob = addr + mtd->writesize;
-	struct mtd_oob_region oobregion = { };
-	int i, section = 0;
-
-	for (i = 0; i < mtd->writesize / 4; i++) {
-		if (__raw_readl(&mainarea[i]) != 0xffffffff)
-			return 0;
-	}
-
-	mtd_ooblayout_ecc(mtd, section++, &oobregion);
-	while (oobregion.length) {
-		for (i = 0; i < oobregion.length; i++) {
-			if (__raw_readb(&oob[oobregion.offset + i]) != 0xff)
-				return 0;
-		}
-
-		mtd_ooblayout_ecc(mtd, section++, &oobregion);
-	}
-
-	return 1;
-}
-
 /* returns nonzero if entire page is blank */
 static int check_read_ecc(struct mtd_info *mtd, struct fsl_ifc_ctrl *ctrl,
 			  u32 *eccstat, unsigned int bufnum)
@@ -274,16 +246,13 @@ static void fsl_ifc_run_command(struct mtd_info *mtd)
 			if (errors == 15) {
 				/*
 				 * Uncorrectable error.
-				 * OK only if the whole page is blank.
+				 * We'll check for blank pages later.
 				 *
 				 * We disable ECCER reporting due to...
 				 * erratum IFC-A002770 -- so report it now if we
 				 * see an uncorrectable error in ECCSTAT.
 				 */
-				if (!is_blank(mtd, bufnum))
-					ctrl->nand_stat |=
-						IFC_NAND_EVTER_STAT_ECCER;
-				break;
+				ctrl->nand_stat |= IFC_NAND_EVTER_STAT_ECCER;
 			}
 
 			mtd->ecc_stats.corrected += errors;
@@ -678,6 +648,39 @@ static int fsl_ifc_wait(struct mtd_info *mtd, struct nand_chip *chip)
 	return nand_fsr | NAND_STATUS_WP;
 }
 
+/*
+ * The controller does not check for bitflips in erased pages,
+ * therefore software must check instead.
+ */
+static int check_erased_page(struct nand_chip *chip, u8 *buf)
+{
+	struct mtd_info *mtd = nand_to_mtd(chip);
+	u8 *ecc = chip->oob_poi;
+	const int ecc_size = chip->ecc.bytes;
+	const int pkt_size = chip->ecc.size;
+	int i, res, bitflips = 0;
+	struct mtd_oob_region oobregion;
+
+	mtd_ooblayout_ecc(mtd, 0, &oobregion);
+	ecc += oobregion.offset;
+
+	for (i = 0; i < chip->ecc.steps; ++i) {
+		res = nand_check_erased_ecc_chunk(buf, pkt_size, ecc, ecc_size,
+						  NULL, 0,
+						  chip->ecc.strength);
+		if (res < 0)
+			mtd->ecc_stats.failed++;
+		else
+			mtd->ecc_stats.corrected += res;
+
+		bitflips = max(res, bitflips);
+		buf += pkt_size;
+		ecc += ecc_size;
+	}
+
+	return bitflips;
+}
+
 static int fsl_ifc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
 			     uint8_t *buf, int oob_required, int page)
 {
@@ -689,11 +695,18 @@ static int fsl_ifc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
 	if (oob_required)
 		fsl_ifc_read_buf(mtd, chip->oob_poi, mtd->oobsize);
 
-	if (ctrl->nand_stat & IFC_NAND_EVTER_STAT_ECCER)
-		dev_err(priv->dev, "NAND Flash ECC Uncorrectable Error\n");
-
 	if (ctrl->nand_stat != IFC_NAND_EVTER_STAT_OPC)
 		mtd->ecc_stats.failed++;
+
+	if (ctrl->nand_stat & IFC_NAND_EVTER_STAT_ECCER) {
+		int res;
+
+		if (!oob_required)
+			fsl_ifc_read_buf(mtd, chip->oob_poi, mtd->oobsize);
+
+		res = check_erased_page(chip, buf);
+		return res;
+	}
 
 	return nctrl->max_bitflips;
 }


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCHv2] mtd: nand: fsl_ifc: fix handing of bit flips in erased pages
  2017-05-17 20:03                   ` [PATCH] mtd: nand: fsl_ifc: fix handing of bit flips in erased pages Pavel Machek
@ 2017-05-31 20:59                     ` Pavel Machek
  2017-05-31 22:59                       ` Darwin Dingel
  2017-06-01  1:09                       ` Darwin Dingel
  0 siblings, 2 replies; 42+ messages in thread
From: Pavel Machek @ 2017-05-31 20:59 UTC (permalink / raw)
  To: Boris Brezillon, Darwin Dingel
  Cc: richard, dwmw2, computersforpeace, marek.vasut, cyrille.pitchen,
	linux-mtd, linux-kernel, mark.marshall, b44839, prabhakar

[-- Attachment #1: Type: text/plain, Size: 4231 bytes --]

If we see unrecoverable ECC error, we need to count number of bitflips
from all-ones and report correctable/uncorrectable according to
that. Otherwise we report ECC failed on erased flash with single bit error.

Signed-off-by: Pavel Machek <pavel@denx.de>
Reported-by: Darwin Dingel <Darwin.Dingel@alliedtelesis.co.nz>

---

v2: I got order wrong, thus always reporting errors. Thanks to Darwin
who found the problem.


diff --git a/drivers/mtd/nand/fsl_ifc_nand.c b/drivers/mtd/nand/fsl_ifc_nand.c
index d1570f5..052a7e3 100644
--- a/drivers/mtd/nand/fsl_ifc_nand.c
+++ b/drivers/mtd/nand/fsl_ifc_nand.c
@@ -171,34 +171,6 @@ static void set_addr(struct mtd_info *mtd, int column, int page_addr, int oob)
 		ifc_nand_ctrl->index += mtd->writesize;
 }
 
-static int is_blank(struct mtd_info *mtd, unsigned int bufnum)
-{
-	struct nand_chip *chip = mtd_to_nand(mtd);
-	struct fsl_ifc_mtd *priv = nand_get_controller_data(chip);
-	u8 __iomem *addr = priv->vbase + bufnum * (mtd->writesize * 2);
-	u32 __iomem *mainarea = (u32 __iomem *)addr;
-	u8 __iomem *oob = addr + mtd->writesize;
-	struct mtd_oob_region oobregion = { };
-	int i, section = 0;
-
-	for (i = 0; i < mtd->writesize / 4; i++) {
-		if (__raw_readl(&mainarea[i]) != 0xffffffff)
-			return 0;
-	}
-
-	mtd_ooblayout_ecc(mtd, section++, &oobregion);
-	while (oobregion.length) {
-		for (i = 0; i < oobregion.length; i++) {
-			if (__raw_readb(&oob[oobregion.offset + i]) != 0xff)
-				return 0;
-		}
-
-		mtd_ooblayout_ecc(mtd, section++, &oobregion);
-	}
-
-	return 1;
-}
-
 /* returns nonzero if entire page is blank */
 static int check_read_ecc(struct mtd_info *mtd, struct fsl_ifc_ctrl *ctrl,
 			  u32 *eccstat, unsigned int bufnum)
@@ -274,16 +246,14 @@ static void fsl_ifc_run_command(struct mtd_info *mtd)
 			if (errors == 15) {
 				/*
 				 * Uncorrectable error.
-				 * OK only if the whole page is blank.
+				 * We'll check for blank pages later.
 				 *
 				 * We disable ECCER reporting due to...
 				 * erratum IFC-A002770 -- so report it now if we
 				 * see an uncorrectable error in ECCSTAT.
 				 */
-				if (!is_blank(mtd, bufnum))
-					ctrl->nand_stat |=
-						IFC_NAND_EVTER_STAT_ECCER;
-				break;
+				ctrl->nand_stat |= IFC_NAND_EVTER_STAT_ECCER;
+				continue;
 			}
 
 			mtd->ecc_stats.corrected += errors;
@@ -678,6 +648,38 @@ static int fsl_ifc_wait(struct mtd_info *mtd, struct nand_chip *chip)
 	return nand_fsr | NAND_STATUS_WP;
 }
 
+/*
+ * The controller does not check for bitflips in erased pages,
+ * therefore software must check instead.
+ */
+static int check_erased_page(struct nand_chip *chip, u8 *buf)
+{
+	struct mtd_info *mtd = nand_to_mtd(chip);
+	u8 *ecc = chip->oob_poi;
+	const int ecc_size = chip->ecc.bytes;
+	const int pkt_size = chip->ecc.size;
+	int i, res, bitflips = 0;
+	struct mtd_oob_region oobregion = { };
+
+	mtd_ooblayout_ecc(mtd, 0, &oobregion);
+	ecc += oobregion.offset;
+
+	for (i = 0; i < chip->ecc.steps; ++i) {
+		res = nand_check_erased_ecc_chunk(buf, pkt_size, ecc, ecc_size,
+						  NULL, 0,
+						  chip->ecc.strength);
+		if (res < 0)
+			mtd->ecc_stats.failed++;
+		else
+			mtd->ecc_stats.corrected += res;
+
+		bitflips = max(res, bitflips);
+		buf += pkt_size;
+		ecc += ecc_size;
+	}
+	return bitflips;
+}
+
 static int fsl_ifc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
 			     uint8_t *buf, int oob_required, int page)
 {
@@ -689,8 +691,15 @@ static int fsl_ifc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
 	if (oob_required)
 		fsl_ifc_read_buf(mtd, chip->oob_poi, mtd->oobsize);
 
-	if (ctrl->nand_stat & IFC_NAND_EVTER_STAT_ECCER)
-		dev_err(priv->dev, "NAND Flash ECC Uncorrectable Error\n");
+	if (ctrl->nand_stat & IFC_NAND_EVTER_STAT_ECCER) {
+		int res;
+
+		if (!oob_required)
+			fsl_ifc_read_buf(mtd, chip->oob_poi, mtd->oobsize);
+
+		res = check_erased_page(chip, buf);
+		return res;
+	}
 
 	if (ctrl->nand_stat != IFC_NAND_EVTER_STAT_OPC)
 		mtd->ecc_stats.failed++;

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCHv2] mtd: nand: fsl_ifc: fix handing of bit flips in erased pages
  2017-05-31 20:59                     ` [PATCHv2] " Pavel Machek
@ 2017-05-31 22:59                       ` Darwin Dingel
  2017-06-01  1:09                       ` Darwin Dingel
  1 sibling, 0 replies; 42+ messages in thread
From: Darwin Dingel @ 2017-05-31 22:59 UTC (permalink / raw)
  To: Pavel Machek, Boris Brezillon
  Cc: richard, dwmw2, computersforpeace, marek.vasut, cyrille.pitchen,
	linux-mtd, linux-kernel, mark.marshall, b44839, prabhakar


On 01/06/17 08:59, Pavel Machek wrote:
> If we see unrecoverable ECC error, we need to count number of bitflips
> from all-ones and report correctable/uncorrectable according to
> that. Otherwise we report ECC failed on erased flash with single bit error.
> 
> Signed-off-by: Pavel Machek <pavel@denx.de>
> Reported-by: Darwin Dingel <Darwin.Dingel@alliedtelesis.co.nz>
> 

Acked-by: Darwin Dingel <Darwin.Dingel@alliedtelesis.co.nz>


Cheers,
Darwin

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCHv2] mtd: nand: fsl_ifc: fix handing of bit flips in erased pages
  2017-05-31 20:59                     ` [PATCHv2] " Pavel Machek
  2017-05-31 22:59                       ` Darwin Dingel
@ 2017-06-01  1:09                       ` Darwin Dingel
  2017-06-01 13:12                         ` Pavel Machek
  1 sibling, 1 reply; 42+ messages in thread
From: Darwin Dingel @ 2017-06-01  1:09 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Boris Brezillon, richard, dwmw2, computersforpeace, marek.vasut,
	cyrille.pitchen, linux-mtd, linux-kernel, mark.marshall, b44839,
	prabhakar

Hi Pavel,

Just a minor thing. Sorry about the late comment.

On 01/06/17 08:59, Pavel Machek wrote:
> +
> +		res = check_erased_page(chip, buf);
> +		return res;
> +	}

Can we just remove 'res' and change this line to:
	return check_erased_page(chip, buf);



Cheers,
Darwin

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCHv2] mtd: nand: fsl_ifc: fix handing of bit flips in erased pages
  2017-06-01  1:09                       ` Darwin Dingel
@ 2017-06-01 13:12                         ` Pavel Machek
  2017-06-01 13:21                           ` Boris Brezillon
  0 siblings, 1 reply; 42+ messages in thread
From: Pavel Machek @ 2017-06-01 13:12 UTC (permalink / raw)
  To: Darwin Dingel
  Cc: Boris Brezillon, richard, dwmw2, computersforpeace, marek.vasut,
	cyrille.pitchen, linux-mtd, linux-kernel, mark.marshall, b44839,
	prabhakar

[-- Attachment #1: Type: text/plain, Size: 697 bytes --]

On Thu 2017-06-01 01:09:17, Darwin Dingel wrote:
> Hi Pavel,
> 
> Just a minor thing. Sorry about the late comment.
> 
> On 01/06/17 08:59, Pavel Machek wrote:
> > +
> > +		res = check_erased_page(chip, buf);
> > +		return res;
> > +	}
> 
> Can we just remove 'res' and change this line to:
> 	return check_erased_page(chip, buf);

Well... I originally had a printk there, and yes, it can be
simplified. I can roll v3, if required, but I'd leave it as is -- it
is still convenient place to add debugging to.

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCHv2] mtd: nand: fsl_ifc: fix handing of bit flips in erased pages
  2017-06-01 13:12                         ` Pavel Machek
@ 2017-06-01 13:21                           ` Boris Brezillon
  2017-06-07  7:31                             ` Boris Brezillon
  0 siblings, 1 reply; 42+ messages in thread
From: Boris Brezillon @ 2017-06-01 13:21 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Darwin Dingel, richard, dwmw2, computersforpeace, marek.vasut,
	cyrille.pitchen, linux-mtd, linux-kernel, mark.marshall, b44839,
	prabhakar

On Thu, 1 Jun 2017 15:12:32 +0200
Pavel Machek <pavel@ucw.cz> wrote:

> On Thu 2017-06-01 01:09:17, Darwin Dingel wrote:
> > Hi Pavel,
> > 
> > Just a minor thing. Sorry about the late comment.
> > 
> > On 01/06/17 08:59, Pavel Machek wrote:  
> > > +
> > > +		res = check_erased_page(chip, buf);
> > > +		return res;
> > > +	}  
> > 
> > Can we just remove 'res' and change this line to:
> > 	return check_erased_page(chip, buf);  
> 
> Well... I originally had a printk there, and yes, it can be
> simplified. I can roll v3, if required, but I'd leave it as is -- it
> is still convenient place to add debugging to.

I can fix it when applying, no need to send a new version.

Regards,

Boris

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCHv2] mtd: nand: fsl_ifc: fix handing of bit flips in erased pages
  2017-06-01 13:21                           ` Boris Brezillon
@ 2017-06-07  7:31                             ` Boris Brezillon
  0 siblings, 0 replies; 42+ messages in thread
From: Boris Brezillon @ 2017-06-07  7:31 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Darwin Dingel, richard, dwmw2, computersforpeace, marek.vasut,
	cyrille.pitchen, linux-mtd, linux-kernel, mark.marshall, b44839,
	prabhakar

On Thu, 1 Jun 2017 15:21:15 +0200
Boris Brezillon <boris.brezillon@free-electrons.com> wrote:

> On Thu, 1 Jun 2017 15:12:32 +0200
> Pavel Machek <pavel@ucw.cz> wrote:
> 
> > On Thu 2017-06-01 01:09:17, Darwin Dingel wrote:  
> > > Hi Pavel,
> > > 
> > > Just a minor thing. Sorry about the late comment.
> > > 
> > > On 01/06/17 08:59, Pavel Machek wrote:    
> > > > +
> > > > +		res = check_erased_page(chip, buf);
> > > > +		return res;
> > > > +	}    
> > > 
> > > Can we just remove 'res' and change this line to:
> > > 	return check_erased_page(chip, buf);    
> > 
> > Well... I originally had a printk there, and yes, it can be
> > simplified. I can roll v3, if required, but I'd leave it as is -- it
> > is still convenient place to add debugging to.  
> 
> I can fix it when applying, no need to send a new version.

Applied.

Thanks,

Boris

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2017-06-07  7:32 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-19 12:13 fsl_ifc_nand: are blank pages protected by ECC? Pavel Machek
2017-04-19 21:18 ` Boris Brezillon
2017-04-19 22:15   ` Pavel Machek
2017-04-19 22:27     ` Boris Brezillon
2017-04-20 11:40       ` Pavel Machek
2017-04-20 12:15         ` Boris Brezillon
2017-04-21 10:51         ` [PATCH] nand_base: optimize checking of erased buffers Pavel Machek
2017-05-17 11:27           ` Mason
2017-05-17 11:39             ` Mason
2017-05-17 11:52             ` Pavel Machek
2017-05-17 12:22           ` [PATCH] fsl_ifc_nand: fix handing of bit flips in erased nand Pavel Machek
2017-05-17 12:32             ` Boris Brezillon
2017-05-17 13:00               ` Pavel Machek
2017-05-17 13:25                 ` Boris Brezillon
2017-05-17 20:03                   ` [PATCH] mtd: nand: fsl_ifc: fix handing of bit flips in erased pages Pavel Machek
2017-05-31 20:59                     ` [PATCHv2] " Pavel Machek
2017-05-31 22:59                       ` Darwin Dingel
2017-06-01  1:09                       ` Darwin Dingel
2017-06-01 13:12                         ` Pavel Machek
2017-06-01 13:21                           ` Boris Brezillon
2017-06-07  7:31                             ` Boris Brezillon
2017-04-21 10:08   ` fsl_ifc_nand: are blank pages protected by ECC? Pavel Machek
2017-04-21 10:12     ` Richard Weinberger
2017-04-21 12:04     ` Boris Brezillon
2017-04-21 13:37     ` Pavel Machek
2017-04-21 13:49       ` Boris Brezillon
2017-04-22  7:01         ` Pavel Machek
2017-04-22 10:40         ` [PATCH] tango_nand.c: fix ecc.stats_corrected in empty flash case Pavel Machek
2017-04-24  8:58           ` Marc Gonzalez
2017-04-24  9:03             ` Pavel Machek
2017-05-02  9:42             ` Boris Brezillon
2017-05-02 11:52               ` Marc Gonzalez
2017-05-02 12:20                 ` Boris Brezillon
2017-05-03 20:02                   ` Pavel Machek
2017-05-03 20:04             ` Pavel Machek
2017-05-04  8:42               ` Boris Brezillon
2017-05-12 15:34                 ` [PATCH] mtd: nand: tango: Update ecc_stats.corrected Marc Gonzalez
2017-05-15  8:56                   ` Boris Brezillon
2017-05-15 20:47                   ` Boris Brezillon
2017-05-17 12:04                 ` [PATCH] tango_nand.c: fix ecc.stats_corrected in empty flash case Pavel Machek
2017-04-23  9:58         ` tango_nand: is logic right in error cases? (was Re: fsl_ifc_nand: are blank pages protected by ECC?) Pavel Machek
2017-04-24  7:12           ` Boris Brezillon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.