From: Miquel Raynal <miquel.raynal@bootlin.com>
To: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
	linux-mtd <linux-mtd@lists.infradead.org>
Subject: Re: nand: WARNING: a0000000.nand: the ECC used on your system (1b/256B) is too weak compared to the one required by the NAND chip (4b/512B)
Date: Wed, 23 Jun 2021 15:16:40 +0200
Message-ID: <20210623151640.34b0fc3a@xps13>
In-Reply-To: <6eb7f394-7e0e-8ecf-e741-f6e6cc322689@csgroup.eu>

Hi Christophe,

Christophe Leroy <christophe.leroy@csgroup.eu> wrote on Wed, 23 Jun
2021 11:41:46 +0200:

> On 19/06/2021 at 20:40, Miquel Raynal wrote:
> > Hi Christophe,
> >   
> >>>> Now and then I use one of the latest kernels (today it is 5.13-rc6), and starting with one of the 5.x releases, I began to get errors like:
> >>>>
> >>>> [    5.098265] ecc_sw_hamming_correct: uncorrectable ECC error
> >>>> [    5.103859] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 60 bytes from PEB 99:59824, read only 60 bytes, retry
> >>>> [    5.525843] ecc_sw_hamming_correct: uncorrectable ECC error
> >>>> [    5.531571] ecc_sw_hamming_correct: uncorrectable ECC error
> >>>> [    5.537490] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 3073 bytes from PEB 107:108976, read only 3073 bytes, retry
> >>>> [    5.691121] ecc_sw_hamming_correct: uncorrectable ECC error
> >>>> [    5.696709] ecc_sw_hamming_correct: uncorrectable ECC error
> >>>> [    5.702426] ecc_sw_hamming_correct: uncorrectable ECC error
> >>>> [    5.708141] ecc_sw_hamming_correct: uncorrectable ECC error
> >>>> [    5.714103] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 3035 bytes from PEB 107:25144, read only 3035 bytes, retry
> >>>> [   20.523689] random: crng init done
> >>>> [   21.892130] ecc_sw_hamming_correct: uncorrectable ECC error
> >>>> [   21.897730] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 1394 bytes from PEB 116:75776, read only 1394 bytes, retry
> >>>>
> >>>> Most of the time, when reading a file fails, I just have to read it once more and it is read without that error.
> >>>
> >>> It really looks like a regular bitflip happening "sometimes". Is this a
> >>> board which already had a life? What are the usage counters (UBI should
> >>> tell you this) compared to the official endurance of your chip (see the
> >>> datasheet)?  
> >>
> >> The board has had a peaceful life:
> >>
> >> UBI reports "ubi0: max/mean erase counter: 49/20, WL threshold: 4096"  
> > 
> > Mmmh. Indeed.
> >   
> >>
> >> I have tried with half a dozen boards and all have the issue.
> >>  
> >>>> What am I supposed to do to avoid the ECC weakness warning at startup and to fix that ECC error issue?
> >>>
> >>> I honestly don't think the errors come from the 5.1x kernels given the
> >>> above logs. If you flash back your old 4.14 I am pretty sure you'll
> >>> have the same errors at some point.  
> >>
> >> I don't have any problem like that with 4.14 on any of the boards.
> >>
> >> When booting a 4.14 kernel I don't get any problem on the same board.
> >>  
> > 
> > If you can reliably show that when returning to a 4.14 kernel the ECC
> > weakness disappears, then there is certainly something new. What driver
> > are you using? Maybe you can do a bisection?  
> 
> Using the GPIO driver, and the NAND chip is a HYNIX.
> 
> I can say that the ECC weakness doesn't exist up to and including v5.5. The weakness appears with v5.6.
> 
> I have tried bisecting between those two versions and couldn't reach a reliable result. The closer you get to v5.5, the more difficult it is to reproduce the issue.
> 
> So I looked at what changed in that range, and in fact it is mainly optimisations in the powerpc code. It seems that the more the powerpc code is optimised, the more often the problem occurs.
> 
> Looking at the GPIO NAND driver, I saw the no-op gpio_nand_dosync() function. By adding a memory barrier in that function, the ECC weakness disappeared completely.

I see that the 'fix' in gpio_nand_dosync() has only been designed for
ARM platforms; perhaps it would make sense to have a PPC variant here?
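
As a rough illustration of the idea (not the actual driver code and not a
proposed patch), here is a small userspace C model of a sync hook that
issues a full barrier on every platform instead of being a no-op outside
ARM. Everything in it is hypothetical: struct gpio_nand_model, MODEL_CLE,
model_full_barrier(), model_dosync() and model_write_cmd() are made-up
names, and C11 atomic_thread_fence() only stands in for a kernel barrier
such as mb(). The thread does not say which barrier was actually added.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical bit modelling the Command Latch Enable GPIO line. */
#define MODEL_CLE 0x1u

/* Hypothetical stand-in for the driver state; NOT the real struct gpiomtd. */
struct gpio_nand_model {
	volatile uint32_t gpio_reg;   /* models the GPIO control lines */
	volatile uint8_t  nand_io;    /* models the NAND data window */
	unsigned int      sync_calls; /* instrumentation for this sketch only */
};

/*
 * Userspace stand-in for a full memory barrier. Kernel code would use
 * its own primitive here (assumption: something like mb()).
 */
static inline void model_full_barrier(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/* Sync hook: order GPIO state changes against accesses to the NAND window. */
static void model_dosync(struct gpio_nand_model *m)
{
	model_full_barrier();
	m->sync_calls++;
}

/* Issue one command byte: raise CLE, sync, write the byte, sync, lower CLE. */
static void model_write_cmd(struct gpio_nand_model *m, uint8_t cmd)
{
	m->gpio_reg |= MODEL_CLE;
	model_dosync(m);
	m->nand_io = cmd;
	model_dosync(m);
	m->gpio_reg &= ~MODEL_CLE;
}
```

The point of the sketch is only the placement: the barrier sits between
each GPIO change and the neighbouring access to the NAND window, so neither
the compiler nor the CPU may reorder them. Whether a plain barrier is the
right primitive for PowerPC I/O ordering is exactly the open question above.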

> Not sure what the final solution has to be.

Perhaps the PowerPC maintainers can shed some light on these findings?

Thanks,
Miquèl


