* nand: WARNING: a0000000.nand: the ECC used on your system (1b/256B) is too weak compared to the one required by the NAND chip (4b/512B)
@ 2021-06-17 17:17 Christophe Leroy
  2021-06-18  6:43 ` Miquel Raynal
  0 siblings, 1 reply; 8+ messages in thread
From: Christophe Leroy @ 2021-06-17 17:17 UTC (permalink / raw)
  To: Miquel Raynal; +Cc: linux-mtd

Hello Miquel,

I have a board running latest kernel with the following NAND:

[    1.523076] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
[    1.529505] nand: Micron MT29F2G08ABAEAWP
[    1.533526] nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
[    1.541196] nand: WARNING: a0000000.nand: the ECC used on your system (1b/256B) is too weak compared to the one required by the NAND chip (4b/512B)

Until now I was using kernel 4.14 and I was having no problem, although it was also exhibiting the 
following (less detailed) warning:

[    0.591009] nand: WARNING: a0000000.nand: the ECC used on your system is too weak compared to the one required by the NAND chip

Now and then I'm using one of the latest kernels (Today is 5.13-rc6), and sometime in one of the 5.x 
releases, I started to get errors like:

[    5.098265] ecc_sw_hamming_correct: uncorrectable ECC error
[    5.103859] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 60 bytes from PEB 99:59824, read only 60 bytes, retry
[    5.525843] ecc_sw_hamming_correct: uncorrectable ECC error
[    5.531571] ecc_sw_hamming_correct: uncorrectable ECC error
[    5.537490] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 3073 bytes from PEB 107:108976, read only 3073 bytes, retry
[    5.691121] ecc_sw_hamming_correct: uncorrectable ECC error
[    5.696709] ecc_sw_hamming_correct: uncorrectable ECC error
[    5.702426] ecc_sw_hamming_correct: uncorrectable ECC error
[    5.708141] ecc_sw_hamming_correct: uncorrectable ECC error
[    5.714103] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 3035 bytes from PEB 107:25144, read only 3035 bytes, retry
[   20.523689] random: crng init done
[   21.892130] ecc_sw_hamming_correct: uncorrectable ECC error
[   21.897730] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 1394 bytes from PEB 116:75776, read only 1394 bytes, retry

Most of the time, when the reading of the file fails, I just have to read it once more and it gets 
read without that error.
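
As a reading aid for the log above, here is a small C sketch (the struct and helper names are hypothetical). ubi_io_read prints the failing location as the PEB number followed by the byte offset inside that PEB. Assuming the 2048-byte page size and the 256-byte ECC step reported at boot, the offset can be mapped to the page and ECC step where the read starts:

```c
#include <assert.h>

/* Where a byte offset inside a PEB lands (illustrative type, not a kernel one). */
struct peb_location {
	unsigned int page;         /* page index inside the erase block */
	unsigned int page_offset;  /* byte offset inside that page */
	unsigned int ecc_step;     /* ECC step index inside that page */
};

/* Map a PEB byte offset to page / in-page offset / ECC step. */
static struct peb_location locate_in_peb(unsigned int peb_offset,
					 unsigned int page_size,
					 unsigned int step_size)
{
	struct peb_location loc;

	assert(page_size != 0 && step_size != 0);

	loc.page = peb_offset / page_size;
	loc.page_offset = peb_offset % page_size;
	loc.ecc_step = loc.page_offset / step_size;
	return loc;
}
```

For instance, offset 59824 in PEB 99 starts 432 bytes into page 29, which is the second 256-byte step of that page, while offset 75776 in PEB 116 is the very start of page 37. Longer reads, such as the 3073-byte one, span several pages, so this only locates the first byte of the request.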


What am I supposed to do to avoid the ECC weakness warning at startup and to fix that ECC error issue ?

Thanks for your help
Christophe

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/


* Re: nand: WARNING: a0000000.nand: the ECC used on your system (1b/256B) is too weak compared to the one required by the NAND chip (4b/512B)
  2021-06-17 17:17 nand: WARNING: a0000000.nand: the ECC used on your system (1b/256B) is too weak compared to the one required by the NAND chip (4b/512B) Christophe Leroy
@ 2021-06-18  6:43 ` Miquel Raynal
  2021-06-18 14:18   ` Christophe Leroy
  0 siblings, 1 reply; 8+ messages in thread
From: Miquel Raynal @ 2021-06-18  6:43 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: linux-mtd

Hi Christophe,

Christophe Leroy <christophe.leroy@csgroup.eu> wrote on Thu, 17 Jun
2021 19:17:05 +0200:

> Hello Miquel,
> 
> I have a board running latest kernel with the following NAND:
> 
> [    1.523076] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
> [    1.529505] nand: Micron MT29F2G08ABAEAWP
> [    1.533526] nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB siz
> e: 64
> [    1.541196] nand: WARNING: a0000000.nand: the ECC used on your system (1b/256
> B) is too weak compared to the one required by the NAND chip (4b/512B)
> 
> Until now I was using kernel 4.14 and I was having no problem, although it was also exhibiting the following (less detailed) warning

Yes, I decided to give more info of what is the minimum ECC scheme that
should be used and what is the one being applied.

> [    0.591009] nand: WARNING: a0000000.nand: the ECC used on your system is too weak compared to the one required by the NAND chip
> 
> Now and then I'm using one of the latest kernels (Today is 5.13-rc6), and sometime in one of the 5.x releases, I started to get errors like:
> 
> [    5.098265] ecc_sw_hamming_correct: uncorrectable ECC error
> [    5.103859] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 60
>   bytes from PEB 99:59824, read only 60 bytes, retry
> [    5.525843] ecc_sw_hamming_correct: uncorrectable ECC error
> [    5.531571] ecc_sw_hamming_correct: uncorrectable ECC error
> [    5.537490] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 30
> 73 bytes from PEB 107:108976, read only 3073 bytes, retry
> [    5.691121] ecc_sw_hamming_correct: uncorrectable ECC error
> [    5.696709] ecc_sw_hamming_correct: uncorrectable ECC error
> [    5.702426] ecc_sw_hamming_correct: uncorrectable ECC error
> [    5.708141] ecc_sw_hamming_correct: uncorrectable ECC error
> [    5.714103] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 30
> 35 bytes from PEB 107:25144, read only 3035 bytes, retry
> [   20.523689] random: crng init done
> [   21.892130] ecc_sw_hamming_correct: uncorrectable ECC error
> [   21.897730] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 13
> 94 bytes from PEB 116:75776, read only 1394 bytes, retry
> 
> Most of the time, when the reading of the file fails, I just have to read it once more and it gets read without that error.

It really looks like a regular bitflip happening "sometimes". Is this a
board which already had a life? What are the usage counters (UBI should
tell you this) compared to the official endurance of your chip (see the
datasheet)?

> What am I supposed to do to avoid the ECC weakness warning at startup and to fix that ECC error issue ?

I honestly don't think the errors come from the 5.1x kernels given the
above logs. If you flash back your old 4.14 I am pretty sure you'll
have the same errors at some point.

NAND really is a fragile storage medium. Not following the minimum ECC
scheme in a production environment (there is a real difference between
1b/256B and 4b/512B) really leads to complicated situations like this one,
unfortunately...
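
For a rough sense of that difference on this chip (2048-byte pages, 64-byte OOB), here is a minimal C sketch; it is not kernel code. It assumes that software Hamming stores 3 ECC bytes per 256-byte step, and that BCH parity takes strength * m bits with m = 13 for 512-byte steps. The struct and helper names are made up for illustration.

```c
#include <assert.h>

/* One ECC configuration: "strength" correctable bits per "step_size" bytes. */
struct ecc_scheme {
	unsigned int strength;
	unsigned int step_size;
	unsigned int ecc_bytes;  /* parity bytes stored in OOB per step */
};

/* Position of the highest set bit, 1-based (0 when x == 0). */
static unsigned int highest_bit(unsigned int x)
{
	unsigned int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* Parity bytes for a binary BCH code: strength * m bits, rounded up. */
static unsigned int bch_ecc_bytes(unsigned int strength, unsigned int step_size)
{
	unsigned int m = highest_bit(1 + 8 * step_size);  /* 13 for 512 bytes */

	return (strength * m + 7) / 8;
}

static unsigned int steps_per_page(unsigned int page_size,
				   const struct ecc_scheme *s)
{
	assert(s->step_size != 0 && page_size % s->step_size == 0);
	return page_size / s->step_size;
}

/* Total ECC bytes one page needs in its OOB area. */
static unsigned int ecc_bytes_per_page(unsigned int page_size,
				       const struct ecc_scheme *s)
{
	return steps_per_page(page_size, s) * s->ecc_bytes;
}

/* Best case: every step carries exactly "strength" bitflips. */
static unsigned int max_correctable_per_page(unsigned int page_size,
					     const struct ecc_scheme *s)
{
	return steps_per_page(page_size, s) * s->strength;
}
```

Under these assumptions, 1b/256B Hamming needs 24 ECC bytes per page and 4b/512B BCH needs 28, so both fit in the 64-byte OOB. The per-page totals (8 versus 16 correctable bits) understate the gap: Hamming fails as soon as two flipped bits land in the same 256-byte step, while BCH tolerates any four within a 512-byte step.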

Thanks,
Miquèl


* Re: nand: WARNING: a0000000.nand: the ECC used on your system (1b/256B) is too weak compared to the one required by the NAND chip (4b/512B)
  2021-06-18  6:43 ` Miquel Raynal
@ 2021-06-18 14:18   ` Christophe Leroy
  2021-06-19 18:40     ` Miquel Raynal
  0 siblings, 1 reply; 8+ messages in thread
From: Christophe Leroy @ 2021-06-18 14:18 UTC (permalink / raw)
  To: Miquel Raynal; +Cc: linux-mtd



On 18/06/2021 at 08:43, Miquel Raynal wrote:
> Hi Christophe,
> 
> Christophe Leroy <christophe.leroy@csgroup.eu> wrote on Thu, 17 Jun
> 2021 19:17:05 +0200:
> 
>> Hello Miquel,
>>
>> I have a board running latest kernel with the following NAND:
>>
>> [    1.523076] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
>> [    1.529505] nand: Micron MT29F2G08ABAEAWP
>> [    1.533526] nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB siz
>> e: 64
>> [    1.541196] nand: WARNING: a0000000.nand: the ECC used on your system (1b/256
>> B) is too weak compared to the one required by the NAND chip (4b/512B)
>>
>> Until now I was using kernel 4.14 and I was having no problem, although it was also exhibiting the following (less detailed) warning
> 
> Yes, I decided to give more info of what is the minimum ECC scheme that
> should be used and what is the one being applied.

Yes it was a good idea.

> 
>> [    0.591009] nand: WARNING: a0000000.nand: the ECC used on your system is too weak compared to the one required by the NAND chip
>>
>> Now and then I'm using one of the latest kernels (Today is 5.13-rc6), and sometime in one of the 5.x releases, I started to get errors like:
>>
>> [    5.098265] ecc_sw_hamming_correct: uncorrectable ECC error
>> [    5.103859] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 60
>>    bytes from PEB 99:59824, read only 60 bytes, retry
>> [    5.525843] ecc_sw_hamming_correct: uncorrectable ECC error
>> [    5.531571] ecc_sw_hamming_correct: uncorrectable ECC error
>> [    5.537490] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 30
>> 73 bytes from PEB 107:108976, read only 3073 bytes, retry
>> [    5.691121] ecc_sw_hamming_correct: uncorrectable ECC error
>> [    5.696709] ecc_sw_hamming_correct: uncorrectable ECC error
>> [    5.702426] ecc_sw_hamming_correct: uncorrectable ECC error
>> [    5.708141] ecc_sw_hamming_correct: uncorrectable ECC error
>> [    5.714103] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 30
>> 35 bytes from PEB 107:25144, read only 3035 bytes, retry
>> [   20.523689] random: crng init done
>> [   21.892130] ecc_sw_hamming_correct: uncorrectable ECC error
>> [   21.897730] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 13
>> 94 bytes from PEB 116:75776, read only 1394 bytes, retry
>>
>> Most of the time, when the reading of the file fails, I just have to read it once more and it gets read without that error.
> 
> It really looks like a regular bitflip happening "sometimes". Is this a
> board which already had a life? What are the usage counters (UBI should
> tell you this) compared to the official endurance of your chip (see the
> datasheet)?

The board had a peaceful life:

UBI reports "ubi0: max/mean erase counter: 49/20, WL threshold: 4096"

I have tried with half a dozen boards and all have the issue.

> 
>> What am I supposed to do to avoid the ECC weakness warning at startup and to fix that ECC error issue ?
> 
> I honestly don't think the errors come from the 5.1x kernels given the
> above logs. If you flash back your old 4.14 I am pretty sure you'll
> have the same errors at some point.

I don't have any problem like that with 4.14 on any of the boards.

When booting a 4.14 kernel I don't get any problem on the same board.

> 
> NAND really is a fragile storage medium, not following in a production
> environment the minimum ECC scheme (there is a real difference between
> 1/256 vs 4/512) really leads to complicated solutions like this one,
> unfortunately...

I see the kernel has "Software BCH ECC". Should I use that with this chip?

If yes, how do I use it? Selecting the option at kernel build time does not seem to be enough. Do I have 
to configure something somewhere, for instance in the device tree? At the moment I have the 
following in the device tree:

		nand@2,0 {
			compatible = "gpio-control-nand";
			reg = <2 0x0000 0x1>;
			#address-cells = <1>;
			#size-cells = <1>;
			rdy-gpio = <&cpld_etat 13 0>;	/* RDY */
			nce-gpio = <&CPM1_PIO_D 15 0>;	/* nCE */
			ale-gpio = <&CPM1_PIO_D 13 0>;	/* ALE */
			cle-gpio = <&CPM1_PIO_D 12 0>;	/* CLE */
		};


Thanks
Christophe


* Re: nand: WARNING: a0000000.nand: the ECC used on your system (1b/256B) is too weak compared to the one required by the NAND chip (4b/512B)
  2021-06-18 14:18   ` Christophe Leroy
@ 2021-06-19 18:40     ` Miquel Raynal
  2021-06-23  9:41         ` Christophe Leroy
  0 siblings, 1 reply; 8+ messages in thread
From: Miquel Raynal @ 2021-06-19 18:40 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: linux-mtd

Hi Christophe,

> >> Now and then I'm using one of the latest kernels (Today is 5.13-rc6), and sometime in one of the 5.x releases, I started to get errors like:
> >>
> >> [    5.098265] ecc_sw_hamming_correct: uncorrectable ECC error
> >> [    5.103859] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 60
> >>    bytes from PEB 99:59824, read only 60 bytes, retry
> >> [    5.525843] ecc_sw_hamming_correct: uncorrectable ECC error
> >> [    5.531571] ecc_sw_hamming_correct: uncorrectable ECC error
> >> [    5.537490] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 30
> >> 73 bytes from PEB 107:108976, read only 3073 bytes, retry
> >> [    5.691121] ecc_sw_hamming_correct: uncorrectable ECC error
> >> [    5.696709] ecc_sw_hamming_correct: uncorrectable ECC error
> >> [    5.702426] ecc_sw_hamming_correct: uncorrectable ECC error
> >> [    5.708141] ecc_sw_hamming_correct: uncorrectable ECC error
> >> [    5.714103] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 30
> >> 35 bytes from PEB 107:25144, read only 3035 bytes, retry
> >> [   20.523689] random: crng init done
> >> [   21.892130] ecc_sw_hamming_correct: uncorrectable ECC error
> >> [   21.897730] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 13
> >> 94 bytes from PEB 116:75776, read only 1394 bytes, retry
> >>
> >> Most of the time, when the reading of the file fails, I just have to read it once more and it gets read without that error.  
> > 
> > It really looks like a regular bitflip happening "sometimes". Is this a
> > board which already had a life? What are the usage counters (UBI should
> > tell you this) compared to the official endurance of your chip (see the
> > datasheet)?  
> 
> The board had a peaceful life:
> 
> UBI reports "ubi0: max/mean erase counter: 49/20, WL threshold: 4096"

Mmmh. Indeed.

> 
> I have tried with half a dozen of boards and all have the issue.
> 
> >   
> >> What am I supposed to do to avoid the ECC weakness warning at startup and to fix that ECC error issue ?  
> > 
> > I honestly don't think the errors come from the 5.1x kernels given the
> > above logs. If you flash back your old 4.14 I am pretty sure you'll
> > have the same errors at some point.  
> 
> I don't have any problem like that with 4.14 with any of the board.
> 
> When booting a 4.14 kernel I don't get any problem on the same board.
> 

If you can reliably show that when returning to a 4.14 kernel the ECC
weakness disappears, then there is certainly something new. What driver
are you using? Maybe you can do a bisection?

> > 
> > NAND really is a fragile storage medium, not following in a production
> > environment the minimum ECC scheme (there is a real difference between
> > 1/256 vs 4/512) really leads to complicated solutions like this one,
> > unfortunately...  
> 
> I see kernel has "Software BCH ECC". Should I use that with that chip ?
> 
> If yes, how do I use it ? Seems like selecting the option at Kernel build is not enough, do I have to configure something somewhere, for instance in the device tree ? At the time being I have the following in the device tree:

Enabling software BCH in the configuration will just build in the
support. You then need to follow the NAND controller bindings, see the
example in [1].

However, given all the data you provided, I now think that there is
something weird happening in the driver you use, it might be relevant
to try to understand what.

[1] Documentation/devicetree/bindings/mtd/nand-controller.yaml

Thanks,
Miquèl


* Re: nand: WARNING: a0000000.nand: the ECC used on your system (1b/256B) is too weak compared to the one required by the NAND chip (4b/512B)
  2021-06-19 18:40     ` Miquel Raynal
@ 2021-06-23  9:41         ` Christophe Leroy
  0 siblings, 0 replies; 8+ messages in thread
From: Christophe Leroy @ 2021-06-23  9:41 UTC (permalink / raw)
  To: Miquel Raynal; +Cc: linuxppc-dev, linux-mtd



On 19/06/2021 at 20:40, Miquel Raynal wrote:
> Hi Christophe,
> 
>>>> Now and then I'm using one of the latest kernels (Today is 5.13-rc6), and sometime in one of the 5.x releases, I started to get errors like:
>>>>
>>>> [    5.098265] ecc_sw_hamming_correct: uncorrectable ECC error
>>>> [    5.103859] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 60
>>>>     bytes from PEB 99:59824, read only 60 bytes, retry
>>>> [    5.525843] ecc_sw_hamming_correct: uncorrectable ECC error
>>>> [    5.531571] ecc_sw_hamming_correct: uncorrectable ECC error
>>>> [    5.537490] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 30
>>>> 73 bytes from PEB 107:108976, read only 3073 bytes, retry
>>>> [    5.691121] ecc_sw_hamming_correct: uncorrectable ECC error
>>>> [    5.696709] ecc_sw_hamming_correct: uncorrectable ECC error
>>>> [    5.702426] ecc_sw_hamming_correct: uncorrectable ECC error
>>>> [    5.708141] ecc_sw_hamming_correct: uncorrectable ECC error
>>>> [    5.714103] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 30
>>>> 35 bytes from PEB 107:25144, read only 3035 bytes, retry
>>>> [   20.523689] random: crng init done
>>>> [   21.892130] ecc_sw_hamming_correct: uncorrectable ECC error
>>>> [   21.897730] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 13
>>>> 94 bytes from PEB 116:75776, read only 1394 bytes, retry
>>>>
>>>> Most of the time, when the reading of the file fails, I just have to read it once more and it gets read without that error.
>>>
>>> It really looks like a regular bitflip happening "sometimes". Is this a
>>> board which already had a life? What are the usage counters (UBI should
>>> tell you this) compared to the official endurance of your chip (see the
>>> datasheet)?
>>
>> The board had a peaceful life:
>>
>> UBI reports "ubi0: max/mean erase counter: 49/20, WL threshold: 4096"
> 
> Mmmh. Indeed.
> 
>>
>> I have tried with half a dozen of boards and all have the issue.
>>
>>>    
>>>> What am I supposed to do to avoid the ECC weakness warning at startup and to fix that ECC error issue ?
>>>
>>> I honestly don't think the errors come from the 5.1x kernels given the
>>> above logs. If you flash back your old 4.14 I am pretty sure you'll
>>> have the same errors at some point.
>>
>> I don't have any problem like that with 4.14 with any of the board.
>>
>> When booting a 4.14 kernel I don't get any problem on the same board.
>>
> 
> If you can reliably show that when returning to a 4.14 kernel the ECC
> weakness disappears, then there is certainly something new. What driver
> are you using? Maybe you can do a bisection?

Using the GPIO driver, and the NAND chip is a HYNIX.

I can say that the ECC weakness doesn't exist up to and including v5.5. The weakness appears with v5.6.

I have tried bisecting between those two versions but couldn't reach a reliable result. The 
closer you get to v5.5, the more difficult it is to reproduce the issue.

So I looked at what changed between those versions, and in fact it is mainly optimisation in the 
powerpc code. It seems that the more powerpc is optimised, the more the problem occurs.

Looking at the GPIO NAND driver, I saw the no-op gpio_nand_dosync() function. By adding a memory 
barrier in that function, the ECC weakness disappeared completely.
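
To illustrate the idea, here is a userspace model in C. It is neither the actual change nor the driver's code. It places a full barrier between toggling a control line and accessing the data window, so that the two accesses cannot be reordered with respect to each other. The struct, the CTRL_CLE bit and the model_* names are hypothetical, and __sync_synchronize() (a GCC/Clang builtin) merely stands in for a kernel barrier such as mb(); which barrier is appropriate on powerpc is not settled in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define CTRL_CLE 0x01u  /* hypothetical bit modelling the CLE control line */

/* Hypothetical model of the two things the driver touches. */
struct gpio_nand_model {
	volatile uint8_t *ctrl;  /* stands in for the GPIO-driven control lines */
	volatile uint8_t *data;  /* stands in for the NAND data window */
};

/* Stand-in for a kernel barrier such as mb(): full compiler + CPU barrier. */
static inline void model_dosync(void)
{
	__sync_synchronize();
}

/* Issue one command byte: raise CLE, write the opcode, drop CLE again. */
static void model_send_cmd(struct gpio_nand_model *m, uint8_t opcode)
{
	assert(m && m->ctrl && m->data);

	*m->ctrl = (uint8_t)(*m->ctrl | CTRL_CLE);
	model_dosync();  /* CLE must be set before the opcode is written */
	*m->data = opcode;
	model_dosync();  /* opcode must be written before CLE is dropped */
	*m->ctrl = (uint8_t)(*m->ctrl & ~CTRL_CLE);
}
```

In this model the control and data locations are ordinary memory, so it only shows where the barriers sit relative to the accesses; real GPIO and memory-mapped NAND accesses may have further ordering requirements.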

Not sure what the final solution has to be.

Christophe

* Re: nand: WARNING: a0000000.nand: the ECC used on your system (1b/256B) is too weak compared to the one required by the NAND chip (4b/512B)
  2021-06-23  9:41         ` Christophe Leroy
@ 2021-06-23 13:16           ` Miquel Raynal
  -1 siblings, 0 replies; 8+ messages in thread
From: Miquel Raynal @ 2021-06-23 13:16 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: linuxppc-dev, linux-mtd

Hi Christophe,

Christophe Leroy <christophe.leroy@csgroup.eu> wrote on Wed, 23 Jun
2021 11:41:46 +0200:

> On 19/06/2021 at 20:40, Miquel Raynal wrote:
> > Hi Christophe,
> >   
> >>>> Now and then I'm using one of the latest kernels (Today is 5.13-rc6), and sometime in one of the 5.x releases, I started to get errors like:
> >>>>
> >>>> [    5.098265] ecc_sw_hamming_correct: uncorrectable ECC error
> >>>> [    5.103859] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 60
> >>>>     bytes from PEB 99:59824, read only 60 bytes, retry
> >>>> [    5.525843] ecc_sw_hamming_correct: uncorrectable ECC error
> >>>> [    5.531571] ecc_sw_hamming_correct: uncorrectable ECC error
> >>>> [    5.537490] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 30
> >>>> 73 bytes from PEB 107:108976, read only 3073 bytes, retry
> >>>> [    5.691121] ecc_sw_hamming_correct: uncorrectable ECC error
> >>>> [    5.696709] ecc_sw_hamming_correct: uncorrectable ECC error
> >>>> [    5.702426] ecc_sw_hamming_correct: uncorrectable ECC error
> >>>> [    5.708141] ecc_sw_hamming_correct: uncorrectable ECC error
> >>>> [    5.714103] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 30
> >>>> 35 bytes from PEB 107:25144, read only 3035 bytes, retry
> >>>> [   20.523689] random: crng init done
> >>>> [   21.892130] ecc_sw_hamming_correct: uncorrectable ECC error
> >>>> [   21.897730] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 13
> >>>> 94 bytes from PEB 116:75776, read only 1394 bytes, retry
> >>>>
> >>>> Most of the time, when the reading of the file fails, I just have to read it once more and it gets read without that error.  
> >>>
> >>> It really looks like a regular bitflip happening "sometimes". Is this a
> >>> board which already had a life? What are the usage counters (UBI should
> >>> tell you this) compared to the official endurance of your chip (see the
> >>> datasheet)?  
> >>
> >> The board had a peaceful life:
> >>
> >> UBI reports "ubi0: max/mean erase counter: 49/20, WL threshold: 4096"  
> > 
> > Mmmh. Indeed.
> >   
> >>
> >> I have tried with half a dozen of boards and all have the issue.
> >>  
> >>>    
> >>>> What am I supposed to do to avoid the ECC weakness warning at startup and to fix that ECC error issue ?  
> >>>
> >>> I honestly don't think the errors come from the 5.1x kernels given the
> >>> above logs. If you flash back your old 4.14 I am pretty sure you'll
> >>> have the same errors at some point.  
> >>
> >> I don't have any problem like that with 4.14 with any of the board.
> >>
> >> When booting a 4.14 kernel I don't get any problem on the same board.
> >>  
> > 
> > If you can reliably show that when returning to a 4.14 kernel the ECC
> > weakness disappears, then there is certainly something new. What driver
> > are you using? Maybe you can do a bisection?  
> 
> Using the GPIO driver, and the NAND chip is a HYNIX.
> 
> I can say that the ECC weakness doesn't exist until v5.5 included. The weakness appears with v5.6.
> 
> I have tried bisection between those two versions and I couldn't end up to a reliable result. The closer the v5.5 you go, the more difficult it is to reproduce the issue.
> 
> So I looked at what was done around the places, and in fact that's mainly optimisation in the powerpc code. It seems that the more powerpc is optimised, the more the problem occurs.
> 
> Looking at the GPIO nand driver, I saw that no-op gpio_nand_dosync() function. By adding a memory barrier in that function, the ECC weakness disappeared completely.

I see that the 'fix' in gpio_nand_dosync() has only been designed for
ARM platforms. Perhaps it would make sense to have a PPC variant here?

> Not sure what the final solution has to be.

Perhaps PowerPC maintainers can shed some light on these findings?

Thanks,
Miquèl

