* Increased frequency of fastmap failure due to CRC mismatch
@ 2018-05-17 15:47 Ronak Desai
  2018-05-17 16:46 ` Richard Weinberger
  2018-05-17 16:49 ` Richard Weinberger
  0 siblings, 2 replies; 5+ messages in thread
From: Ronak Desai @ 2018-05-17 15:47 UTC
  To: linux-mtd @ lists . infradead . org

On one of our units we noticed an increase in fastmap failures due to
fastmap CRC mismatch. On this unit, UBI observes fixable bit flips on
a specific PEB on every power-up. We are using SW ECC for ECC
correction, as the processor's NAND controller does not support the
required ECC strength. We have also implemented read retry in the NAND
controller driver.

When UBI reads the fastmap data through the NAND-MTD framework, the
NAND-MTD subsystem returns EUCLEAN, meaning the number of corrected
bits was greater than or equal to the ECC strength. But the data should
be corrected, as the read call does not return any other error.
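
As a rough illustration of the return-code semantics described above, the
following user-space C sketch models how a read can hand back corrected data
and still report EUCLEAN. It is a simplified model, not the kernel code:
`model_read_status`, its parameters and the sample values are hypothetical,
and the threshold is assumed to equal the ECC strength, as the message
describes.

```c
#include <stddef.h>

#ifndef EUCLEAN
#define EUCLEAN 117 /* Linux errno value for "Structure needs cleaning" */
#endif

/*
 * Hypothetical helper: given the number of bits corrected in each ECC step
 * of one read, return 0 or -EUCLEAN. A return of -EUCLEAN here means
 * "corrected, but the block is degrading", not "data is bad".
 */
int model_read_status(const unsigned int *corrected, size_t steps,
                      unsigned int threshold)
{
    unsigned int max_bitflips = 0;

    /* No ECC configured in this model: nothing to report. */
    if (threshold == 0)
        return 0;

    /* The status depends on the worst ECC step, not on the sum. */
    for (size_t i = 0; i < steps; i++)
        if (corrected[i] > max_bitflips)
            max_bitflips = corrected[i];

    return max_bitflips >= threshold ? -EUCLEAN : 0;
}
```

With a threshold of 4, per-step counts of {0, 1, 3} map to 0 and {0, 4, 2}
map to -EUCLEAN. In both cases the caller is expected to receive corrected
data, which is why a CRC mismatch on top of EUCLEAN is surprising.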

In this failure scenario, even though the NAND-MTD subsystem has fixed
the corruption with SW ECC, UBI still finds a CRC mismatch on the
fastmap data. Successful data reads with read retries have already been
tested at temperature as well, so there is no doubt about the
reliability of read retries. So UBI should never receive corrupted data
together with a fixable bit flip return code.

So, we would like to understand what is causing the fastmap data
corruption which leads to the CRC mismatch. Interestingly, we see the
fixable bit flip error message for that specific PEB on every power-up,
but we don't see a fastmap CRC failure on every power-up. All the
reboots are graceful (the UBI partition is detached and unmounted) and
there are no abrupt power cuts.

We are at kernel 4.1.8.

-- 
Ronak A Desai
Sr. Software Engineer
Airborne Information Solutions / RC Linux Platform Software
MS 131-100, C Ave NE, Cedar Rapids, IA, 52498, USA
Ronak.Desai@rockwellcollins.com
https://www.rockwellcollins.com/


* Re: Increased frequency of fastmap failure due to CRC mismatch
  2018-05-17 15:47 Increased frequency of fastmap failure due to CRC mismatch Ronak Desai
@ 2018-05-17 16:46 ` Richard Weinberger
  2018-05-17 16:49 ` Richard Weinberger
  1 sibling, 0 replies; 5+ messages in thread
From: Richard Weinberger @ 2018-05-17 16:46 UTC
  To: Ronak Desai; +Cc: linux-mtd @ lists . infradead . org

On Thu, May 17, 2018 at 5:47 PM, Ronak Desai
<ronak.desai@rockwellcollins.com> wrote:
> On one of our units we noticed an increase in fastmap failures due to
> fastmap CRC mismatch. On this unit, UBI observes fixable bit flips on
> a specific PEB on every power-up. We are using SW ECC for ECC
> correction, as the processor's NAND controller does not support the
> required ECC strength. We have also implemented read retry in the NAND
> controller driver.
>
> When UBI reads the fastmap data through the NAND-MTD framework, the
> NAND-MTD subsystem returns EUCLEAN, meaning the number of corrected
> bits was greater than or equal to the ECC strength. But the data should
> be corrected, as the read call does not return any other error.
>
> In this failure scenario, even though the NAND-MTD subsystem has fixed
> the corruption with SW ECC, UBI still finds a CRC mismatch on the
> fastmap data. Successful data reads with read retries have already been
> tested at temperature as well, so there is no doubt about the
> reliability of read retries. So UBI should never receive corrupted data
> together with a fixable bit flip return code.
>
> So, we would like to understand what is causing the fastmap data
> corruption which leads to the CRC mismatch. Interestingly, we see the
> fixable bit flip error message for that specific PEB on every power-up,
> but we don't see a fastmap CRC failure on every power-up. All the
> reboots are graceful (the UBI partition is detached and unmounted) and
> there are no abrupt power cuts.

Hmm, did you manually check the fastmap? I wonder what in the fastmap is wrong.
Is it just a bitflip or are many bytes bad?
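
One way to answer that question is to compare two reads of the same on-flash
fastmap content and count how many bytes and bits differ. The C sketch below
assumes both copies are already in memory, for example one from a boot that
attached cleanly and one from a failing boot. The thread does not say how
such dumps would be captured, and `compare_dumps` and `struct diff_stats` are
hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

struct diff_stats {
    size_t bad_bytes; /* number of byte positions that differ */
    size_t bad_bits;  /* total number of differing bits */
};

/* Compare two buffers of equal length and summarise the damage. */
struct diff_stats compare_dumps(const uint8_t *good, const uint8_t *bad,
                                size_t len)
{
    struct diff_stats s = {0, 0};

    for (size_t i = 0; i < len; i++) {
        uint8_t x = (uint8_t)(good[i] ^ bad[i]);

        if (x == 0)
            continue;

        s.bad_bytes++;
        /* Count the set bits in the XOR of the two bytes. */
        while (x) {
            s.bad_bits += x & 1u;
            x >>= 1;
        }
    }
    return s;
}
```

A result with `bad_bits` close to `bad_bytes` would point toward isolated
bit flips that slipped past ECC, while long runs of fully different bytes
would point more toward a driver or buffer-handling problem.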

Is your mtd driver sane?

-- 
Thanks,
//richard


* Re: Increased frequency of fastmap failure due to CRC mismatch
  2018-05-17 15:47 Increased frequency of fastmap failure due to CRC mismatch Ronak Desai
  2018-05-17 16:46 ` Richard Weinberger
@ 2018-05-17 16:49 ` Richard Weinberger
  2018-05-18 13:50   ` Ronak Desai
  1 sibling, 1 reply; 5+ messages in thread
From: Richard Weinberger @ 2018-05-17 16:49 UTC
  To: Ronak Desai; +Cc: linux-mtd @ lists . infradead . org

On Thu, May 17, 2018 at 5:47 PM, Ronak Desai
<ronak.desai@rockwellcollins.com> wrote:
> We are at kernel 4.1.8.

Woah, this kernel is very old, did you backport _all_ ubi/fastmap fixes?

-- 
Thanks,
//richard


* Re: Increased frequency of fastmap failure due to CRC mismatch
  2018-05-17 16:49 ` Richard Weinberger
@ 2018-05-18 13:50   ` Ronak Desai
  2018-05-18 15:31     ` Richard Weinberger
  0 siblings, 1 reply; 5+ messages in thread
From: Ronak Desai @ 2018-05-18 13:50 UTC
  To: Richard Weinberger; +Cc: linux-mtd @ lists . infradead . org

Hey Richard,

On Thu, May 17, 2018 at 11:49 AM, Richard Weinberger
<richard.weinberger@gmail.com> wrote:
> On Thu, May 17, 2018 at 5:47 PM, Ronak Desai
> <ronak.desai@rockwellcollins.com> wrote:
>> We are at kernel 4.1.8.
>
> Woah, this kernel is very old, did you backport _all_ ubi/fastmap fixes?

No, I did not backport all the ubi/fastmap changes; considering the
product state, that is not feasible. This product has been tested for
several months, and I checked all the paths in the NAND-MTD subsystem as
well as in the NAND controller specific driver, but nothing stood out
that could cause this. I also compared the basic routines with the
latest kernel and I don't see any major changes or fixes which I can
relate to this failure.

The only thing that bothers me is that the NAND-MTD subsystem returns
without any error, except that it reports that the number of bits it
corrected on that block is greater than or equal to the ECC strength,
yet UBI still finds a corrupted fastmap. So either something is wrong
in the ECC algorithm, where it thinks it has corrected the bits but
actually did not, or something is wrong in UBI fastmap. Also, as I
mentioned, on every power-up we see this fixable bit flip message on
that specific block, but fastmap fails only occasionally, not always.
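
To tell those two possibilities apart offline, the CRC can be recomputed
over a captured copy of the data and compared with the stored value. The C
sketch below is only an illustration: it assumes the CRC is a reflected
CRC-32 (polynomial 0xEDB88320) seeded with 0xFFFFFFFF and without a final
inversion, which is my understanding of what UBI uses. It does not model the
real fastmap layout, where the CRC field lives inside the data being
checked. `crc32_le_bitwise` and `payload_crc_ok` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32_SEED 0xFFFFFFFFu /* assumed seed; verify against your kernel */

/* Bit-at-a-time reflected CRC-32, no final XOR applied. */
uint32_t crc32_le_bitwise(uint32_t crc, const uint8_t *p, size_t len)
{
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
    }
    return crc;
}

/* Return 1 if the buffer matches the CRC stored alongside it, else 0. */
int payload_crc_ok(const uint8_t *buf, size_t len, uint32_t stored_crc)
{
    return crc32_le_bitwise(CRC32_SEED, buf, len) == stored_crc;
}
```

If the recomputed CRC of a dump taken right after a "corrected" read does
not match, the corruption was already present in the buffer the MTD layer
returned, which would shift suspicion toward the ECC or driver path rather
than toward fastmap itself.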

>
> --
> Thanks,
> //richard


-- 
Ronak A Desai
Sr. Software Engineer
Airborne Information Solutions / RC Linux Platform Software
MS 131-100, C Ave NE, Cedar Rapids, IA, 52498, USA
Ronak.Desai@rockwellcollins.com
https://www.rockwellcollins.com/


* Re: Increased frequency of fastmap failure due to CRC mismatch
  2018-05-18 13:50   ` Ronak Desai
@ 2018-05-18 15:31     ` Richard Weinberger
  0 siblings, 0 replies; 5+ messages in thread
From: Richard Weinberger @ 2018-05-18 15:31 UTC
  To: Ronak Desai, linux-mtd

Ronak,

On Friday, 18 May 2018, 15:50:47 CEST, Ronak Desai wrote:
> Hey Richard,
> 
> On Thu, May 17, 2018 at 11:49 AM, Richard Weinberger
> <richard.weinberger@gmail.com> wrote:
> > On Thu, May 17, 2018 at 5:47 PM, Ronak Desai
> > <ronak.desai@rockwellcollins.com> wrote:
> >> We are at kernel 4.1.8.
> >
> > Woah, this kernel is very old, did you backport _all_ ubi/fastmap fixes?
> 
> No, I did not backport all the ubi/fastmap changes; considering the
> product state, that is not feasible. This product has been tested for
> several months, and I checked all the paths in the NAND-MTD subsystem as
> well as in the NAND controller specific driver, but nothing stood out
> that could cause this. I also compared the basic routines with the
> latest kernel and I don't see any major changes or fixes which I can
> relate to this failure.

Well, I can guarantee you that fastmap is broken in 4.1.

> The only thing that bothers me is that the NAND-MTD subsystem returns
> without any error, except that it reports that the number of bits it
> corrected on that block is greater than or equal to the ECC strength,
> yet UBI still finds a corrupted fastmap. So either something is wrong
> in the ECC algorithm, where it thinks it has corrected the bits but
> actually did not, or something is wrong in UBI fastmap. Also, as I
> mentioned, on every power-up we see this fixable bit flip message on
> that specific block, but fastmap fails only occasionally, not always.

Do mtd/ubi tests pass?

Thanks,
//richard

-- 
sigma star gmbh - Eduard-Bodem-Gasse 6 - 6020 Innsbruck - Austria
ATU66964118 - FN 374287y
