* ubi_io_read -74 and ubifs_scanned_corruption errors with i.MX28
From: Voytovich, Mike @ 2014-06-16 20:13 UTC
  To: linux-mtd

Hi,

We're seeing a failed device after running for a few weeks with various
UBIFS errors, including "ubi_io_read: error -74", "ubifs_scan: corrupt
empty space", "ubifs_scanned_corruption", etc (please see the kernel
output below).  We're running Linux 3.10.0-rc7 on a Freescale i.MX28 board
with a Micron MT29F2G08ABAEA device.

I tried running some of the mtd tests, and most of them pass, with the
exception of mtd_oobtest and mtd_nandbiterrs (although reading the
archives, it appears these failures may be due to an issue with the tests,
and not necessarily related to the failure below).

Note that we're NOT using ubiformat, but we don't use nandwrite either
(we flash_erase, then do a ubiattach + mount, then extract a root
filesystem image onto the mounted filesystem).  So I'm not sure the "Why
do I have to use ubiformat?" in the FAQ
(http://www.linux-mtd.infradead.org/faq/ubifs.html#L_why_ubiformat)
applies in this case.

And, I'm not sure that it's an issue with sub-pages not being properly
supported, as appending "--vid-hdr-offset 2048" to ubiattach results in
the same failure.

Any ideas regarding what might be going on here?  Perhaps we really do
need to use ubiformat?  Or maybe the mtd_oobtest / mtd_nandbiterrs test
failures are masking a real issue with the MTD and/or i.MX28 gpmi nand
drivers or configuration?

thanks,
-mike


============================================
[    5.815240] UBI: scanning is finished
[    5.865519] UBI: attached mtd3 (name "rootfs", size 180 MiB) to ubi0
[    5.871925] UBI: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
[    5.878877] UBI: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
[    5.885730] UBI: VID header offset: 2048 (aligned 2048), data offset: 4096
[    5.892644] UBI: good PEBs: 1440, bad PEBs: 0, corrupted PEBs: 0
[    5.898786] UBI: user volume: 2, internal volumes: 1, max. volumes count: 128
[    5.906070] UBI: max/mean erase counter: 102/28, WL threshold: 4096, image sequence number: 857856516
[    5.915434] UBI: available PEBs: 8, total reserved PEBs: 1432, PEBs reserved for bad PEB handling: 40
[    5.924918] UBI: background thread "ubi_bgt0d" started, PID 40
[    5.931441] stmp3xxx-rtc 80056000.rtc: setting system clock to 1970-01-01 00:00:17 UTC (17)
[    6.100973] UBIFS: recovery needed
[    9.827117] UBIFS: recovery deferred
[    9.832000] UBIFS: mounted UBI device 0, volume 0, name "rootfs", R/O mode
[    9.839091] UBIFS: LEB size: 126976 bytes (124 KiB), min./max. I/O unit sizes: 2048 bytes/2048 bytes
[    9.848410] UBIFS: FS size: 171544576 bytes (163 MiB, 1351 LEBs), journal size 8634368 bytes (8 MiB, 68 LEBs)
[    9.858476] UBIFS: reserved for root: 4952683 bytes (4836 KiB)
[    9.864358] UBIFS: media format: w4/r0 (latest is w4/r0), UUID 6C7D782B-6835-4B1C-B1C1-8BCF6A099BCF, small LPT model
[    9.882415] VFS: Mounted root (ubifs filesystem) readonly on device 0:11.
[    9.899348] devtmpfs: mounted
[    9.903868] Freeing unused kernel memory: 256K (c057f000 - c05bf000)
[   12.509524] udevd[64]: starting version 182
[   27.523366] UBIFS: completing deferred recovery
[   27.679203] UBIFS: background thread "ubifs_bgt0_0" started, PID 218
[   27.867104] UBI warning: ubi_io_read: error -74 (ECC error) while reading 126976 bytes from PEB 670:4096, read only 126976 bytes, retry
[   27.939522] UBI warning: ubi_io_read: error -74 (ECC error) while reading 126976 bytes from PEB 670:4096, read only 126976 bytes, retry
[   28.011381] UBI warning: ubi_io_read: error -74 (ECC error) while reading 126976 bytes from PEB 670:4096, read only 126976 bytes, retry
[   28.083325] UBI error: ubi_io_read: error -74 (ECC error) while reading 126976 bytes from PEB 670:4096, read 126976 bytes
[   28.094479] CPU: 0 PID: 217 Comm: mount Tainted: G         C 3.10.0-rc7 #1
[   28.101760] [<c0013dd4>] (unwind_backtrace+0x0/0xf0) from [<c0011b4c>] (show_stack+0x10/0x14)
[   28.110505] [<c0011b4c>] (show_stack+0x10/0x14) from [<c028cee4>] (ubi_io_read+0xfc/0x2c0)
[   28.118986] [<c028cee4>] (ubi_io_read+0xfc/0x2c0) from [<c028a47c>] (ubi_eba_read_leb+0x190/0x424)
[   28.128155] [<c028a47c>] (ubi_eba_read_leb+0x190/0x424) from [<c0289974>] (ubi_leb_read+0xac/0x120)
[   28.137429] [<c0289974>] (ubi_leb_read+0xac/0x120) from [<c01bd804>] (ubifs_leb_read+0x28/0x8c)
[   28.146345] [<c01bd804>] (ubifs_leb_read+0x28/0x8c) from [<c01c537c>] (ubifs_start_scan+0x74/0xec)
[   28.155512] [<c01c537c>] (ubifs_start_scan+0x74/0xec) from [<c01c56a4>] (ubifs_scan+0x28/0x37c)
[   28.164293] [<c01c56a4>] (ubifs_scan+0x28/0x37c) from [<c01cf484>] (ubifs_tnc_start_commit+0x604/0xa20)
[   28.173896] [<c01cf484>] (ubifs_tnc_start_commit+0x604/0xa20) from [<c01c8c7c>] (do_commit+0x144/0x864)
[   28.183500] [<c01c8c7c>] (do_commit+0x144/0x864) from [<c01d79c0>] (ubifs_rcvry_gc_commit+0x70/0x1dc)
[   28.192923] [<c01d79c0>] (ubifs_rcvry_gc_commit+0x70/0x1dc) from [<c01b94a4>] (ubifs_remount_fs+0x238/0x790)
[   28.202956] [<c01b94a4>] (ubifs_remount_fs+0x238/0x790) from [<c00ce554>] (do_remount_sb+0x9c/0x15c)
[   28.212358] [<c00ce554>] (do_remount_sb+0x9c/0x15c) from [<c00e7804>] (do_mount+0x554/0x818)
[   28.220998] [<c00e7804>] (do_mount+0x554/0x818) from [<c00e7b4c>] (SyS_mount+0x84/0xb8)
[   28.229208] [<c00e7b4c>] (SyS_mount+0x84/0xb8) from [<c000edc0>] (ret_fast_syscall+0x0/0x44)
[   28.240863] UBIFS error (pid 217): ubifs_scan: corrupt empty space at LEB 562:125583
[   28.248835] UBIFS error (pid 217): ubifs_scanned_corruption: corruption at LEB 562:125583
[   28.257193] UBIFS error (pid 217): ubifs_scanned_corruption: first 1393 bytes from LEB 562:125583
[   28.266307] 00000000: ffffffbf ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.266429] 00000020: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.266533] 00000040: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.266636] 00000060: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.266737] 00000080: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.266835] 000000a0: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.266933] 000000c0: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.267032] 000000e0: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.267131] 00000100: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.267229] 00000120: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.267328] 00000140: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.267430] 00000160: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.267527] 00000180: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.267625] 000001a0: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.267724] 000001c0: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.267822] 000001e0: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.267919] 00000200: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.268018] 00000220: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.268117] 00000240: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.268215] 00000260: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.268315] 00000280: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.268413] 000002a0: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.268513] 000002c0: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.268611] 000002e0: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.268710] 00000300: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.268810] 00000320: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.268908] 00000340: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.269007] 00000360: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.269106] 00000380: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.269202] 000003a0: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.269298] 000003c0: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.269395] 000003e0: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.269495] 00000400: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.269590] 00000420: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.269687] 00000440: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.269783] 00000460: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.269881] 00000480: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.269977] 000004a0: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.270074] 000004c0: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.270174] 000004e0: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.270273] 00000500: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.270371] 00000520: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.270471] 00000540: ffffffff ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff  ................................
[   28.270514] 00000560: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff                                   .................
[   28.270549] UBIFS error (pid 217): ubifs_scan: LEB 562 scanning failed
[   28.280651] UBIFS error (pid 217): do_commit: commit failed, error -117
[   28.287873] UBIFS warning (pid 217): ubifs_ro_mode: switched to read-only mode, error -117
[   28.296360] CPU: 0 PID: 217 Comm: mount Tainted: G         C 3.10.0-rc7 #1
[   28.303512] [<c0013dd4>] (unwind_backtrace+0x0/0xf0) from [<c0011b4c>] (show_stack+0x10/0x14)
[   28.312262] [<c0011b4c>] (show_stack+0x10/0x14) from [<c01c92d4>] (do_commit+0x79c/0x864)
[   28.320646] [<c01c92d4>] (do_commit+0x79c/0x864) from [<c01d79c0>] (ubifs_rcvry_gc_commit+0x70/0x1dc)
[   28.330126] [<c01d79c0>] (ubifs_rcvry_gc_commit+0x70/0x1dc) from [<c01b94a4>] (ubifs_remount_fs+0x238/0x790)
[   28.340165] [<c01b94a4>] (ubifs_remount_fs+0x238/0x790) from [<c00ce554>] (do_remount_sb+0x9c/0x15c)
[   28.349513] [<c00ce554>] (do_remount_sb+0x9c/0x15c) from [<c00e7804>] (do_mount+0x554/0x818)
[   28.358162] [<c00e7804>] (do_mount+0x554/0x818) from [<c00e7b4c>] (SyS_mount+0x84/0xb8)
[   28.366368] [<c00e7b4c>] (SyS_mount+0x84/0xb8) from [<c000edc0>] (ret_fast_syscall+0x0/0x44)
[   28.375896] UBIFS: background thread "ubifs_bgt0_0" stops
[   28.754565] UBIFS: background thread "ubifs_bgt0_1" started, PID 227
[   28.879338] UBIFS: recovery needed
[   29.156390] UBIFS: recovery completed
============================================
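
The numbers in the log fit together arithmetically. The short Python
sketch below (an illustration only, using values copied from the messages
above; the variable names are made up) shows that the LEB size is the PEB
size minus the two header pages, that the 1393 dumped bytes run exactly to
the end of the LEB, and which page of the LEB the reported offset falls in.

```python
# Values copied from the UBI/UBIFS messages in the log above.
PEB_SIZE = 131072      # physical eraseblock size in bytes
PAGE_SIZE = 2048       # min. I/O unit (one NAND page)
DATA_OFFSET = 4096     # EC header page at 0 + VID header page at 2048
LEB_SIZE = PEB_SIZE - DATA_OFFSET

CORRUPT_OFFSET = 125583   # "corrupt empty space at LEB 562:125583"
DUMPED_BYTES = 1393       # "first 1393 bytes from LEB 562:125583"

# Which page of the LEB holds the reported offset, and where inside it.
page_index, offset_in_page = divmod(CORRUPT_OFFSET, PAGE_SIZE)
pages_per_leb = LEB_SIZE // PAGE_SIZE

print(LEB_SIZE)                        # 126976, as reported by UBI
print(CORRUPT_OFFSET + DUMPED_BYTES)   # 126976, i.e. the end of the LEB
print(page_index, pages_per_leb)       # 61 62: the last page of the LEB
```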

* RE: ubi_io_read -74 and ubifs_scanned_corruption errors with i.MX28
From: Iwo Mergler @ 2014-06-19  2:16 UTC
  To: Voytovich, Mike, linux-mtd

On Tue, 17 Jun 2014 06:13:09 +1000
"Voytovich, Mike" <mvoytovich@paypal.com> wrote:

> Hi,
> 
> We're seeing a failed device after running for a few weeks with
> various UBIFS errors, including "ubi_io_read: error -74",
> "ubifs_scan: corrupt empty space", "ubifs_scanned_corruption", etc
> (please see the kernel output below).  We're running Linux 3.10.0-rc7
> on a Freescale i.MX28 board with a Micron MT29F2G08ABAEA device.

-74 is -EBADMSG, which essentially means "uncorrectable ECC errors".
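
For reference, the two numeric codes that appear in the log can be decoded
with a small lookup. This is only a sketch: the table hard-codes the Linux
errno numbers (they differ on other systems), and the helper name is
hypothetical.

```python
# errno numbers as defined on Linux; other operating systems use other values.
LINUX_ERRNO = {
    74: "EBADMSG",    # reported by MTD/UBI for uncorrectable ECC errors
    117: "EUCLEAN",   # "structure needs cleaning"; UBIFS reports corruption with it
}

def describe(code):
    """Turn a kernel-style negative error code into its symbolic name."""
    return f"{code}: {LINUX_ERRNO.get(abs(code), 'unknown')}"

print(describe(-74))    # -74: EBADMSG
print(describe(-117))   # -117: EUCLEAN
```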

> I tried running some of the mtd tests, and most of them pass, with the
> exception of mtd_oobtest and mtd_nandbiterrs (although reading the
> archives, it appears these failures may be due to an issue with the
> tests, and not necessarily related to the failure below).

Both oobtest and nandbiterrs use raw data writes which are not
available on the Freescale NAND drivers. It should be possible, however,
to change nandbiterrs to use normal writes instead.

> 
> Note that we're NOT using ubiformat; but, we don't use nandwrite
> either (we flash_erase, then do an ubiattach + mount, then extract a
> root filesystem image onto the mounted filesystem).  So I'm not sure
> the "Why do I have to use ubiformat?" in the FAQ
> (http://www.linux-mtd.infradead.org/faq/ubifs.html#L_why_ubiformat)
> applies in this case.

It does. Not using ubiformat breaks the wear-leveling mechanism.

UBI maintains an erase counter in each block and ensures that the
difference between those counters stays below a threshold.

Ubiformat preserves the block erase counters and thus the real number
of erase cycles for the block. Your method drops the erase counters,
so you will wear out some blocks without allowing UBI to mitigate that.
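
A toy model of the difference, in Python. Everything here is an assumption
made for illustration (one "hot" block that sees extra erases between
reflashes, the function name, the numbers); it is not how UBI is
implemented. It only shows that the recorded counters stop reflecting the
real wear once each reflash wipes them.

```python
def simulate(cycles, preserve_counters, blocks=4, hot_erases=10):
    """Return (true_wear, recorded) erase counts after some reflash cycles."""
    true_wear = [0] * blocks
    recorded = [0] * blocks
    for _ in range(cycles):
        # Runtime: block 0 is assumed to be erased more often than the rest.
        true_wear[0] += hot_erases
        recorded[0] += hot_erases
        # Reflash: every block is erased once.
        for peb in range(blocks):
            true_wear[peb] += 1
            # ubiformat carries the counter over; a plain erase wipes it.
            recorded[peb] = recorded[peb] + 1 if preserve_counters else 0
    return true_wear, recorded

for preserve in (True, False):
    true_wear, recorded = simulate(5, preserve)
    print(preserve, true_wear, recorded, max(recorded) - min(recorded))
```

With the counters preserved, the recorded gap between the most and least
worn block matches reality; with the counters wiped, the gap is always 0
and there is nothing left for the wear-leveling threshold to act on.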

> And, I'm not sure that it's an issue with sub-pages not being properly
> supported, as appending "--vid-hdr-offset 2048" to ubiattach results
> in the same failure.

If your subpage support was broken, you wouldn't have gotten that far.

> Any ideas regarding what might be going on here?  Perhaps we really do
> need to use ubiformat?  Or maybe the mtd_oobtest / mtd_nandbiterrs
> test failures are masking a real issue with the MTD and/or i.MX28
> gpmi nand drivers or configuration?

Oobtest is probably pointless here. I remember vaguely that the Freescale
NAND controller only implements a rather weird ECC layout where data,
ECC bits and bad block markers are interleaved within the page. It's the
reason raw access doesn't work.

Nandbiterrs could be modified to use ordinary writes though. Its job is
to test your ECC mechanism by generating temporary bit errors in flash.
It does this by repeatedly writing the same content into a page, breaking
the rule of most flash chips that a page may be written only once (or at
most 4 times) between erases.

> [   28.257193] UBIFS error (pid 217): ubifs_scanned_corruption: first
> 1393 bytes from LEB 562:125583
> [   28.266307] 00000000: ffffffbf ffffffff ffffffff ffffffff ffffffff

Looks like your erased page has developed a bit error (b instead of
f above). Not using ubiformat can do this for you rather quickly if
you are reflashing a lot.
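
To make the bit error explicit, the first word of the dump can be compared
against the erased pattern. A minimal Python sketch (the helper name is
made up; bit 0 is the least significant bit of the 32-bit word as printed):

```python
def zero_bits(word, width=32):
    """Return the positions of the bits in `word` that read as 0."""
    return [bit for bit in range(width) if not (word >> bit) & 1]

print(zero_bits(0xFFFFFFBF))   # [6]: a single flipped bit
print(zero_bits(0xFFFFFFFF))   # []: a clean erased word
```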

Most modern ECC schemes can't deal with bit errors in erased pages,
since the ECC bits for an all-1 page are not all-1 themselves. So the
hardware ECC usually considers a perfectly good erased page as having
uncorrectable errors.

So you usually see some code in the NAND driver which recognises the
syndrome of a fully erased page and thus won't report the error.

If the syndrome doesn't match, it has to scan the page for 0-bits and
decide that e.g. fewer than 4 0-bits still counts as a fully erased page
and forcibly set it to all-1.

In other words, your low-level NAND driver probably doesn't currently
implement this "biterrors on erased page" scenario.


Best regards,

Iwo

______________________________________________________________________
This communication contains information which may be confidential or privileged. The information is intended solely for the use of the individual or entity named above.  If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of the contents of this information is prohibited.  If you have received this communication in error, please notify me by telephone immediately.
______________________________________________________________________

* Re: ubi_io_read -74 and ubifs_scanned_corruption errors with i.MX28
From: Artem Bityutskiy @ 2014-07-01 13:36 UTC
  To: Voytovich, Mike; +Cc: linux-mtd

Hi Mike,

On Mon, 2014-06-16 at 20:13 +0000, Voytovich, Mike wrote:
> [   28.266307] 00000000: ffffffbf ffffffff ffffffff ffffffff ffffffff
> ffffffff ffffffff ffffffff  ................................

The problem is here - there is a bit-flip in the empty space, notice
that all the bytes are "f" and one is "b".

This problem was brought up many times before, but no one came up with a
solution so far. Let me provide you some back-ground information.

Flash consists of erasable blocks, which we call 'eraseblocks', or
"LEBs" in UBI/UBIFS.

UBIFS writes data to LEBs sequentially, from the beginning to the end.

UBIFS has the "journal", which is essentially a set of LEBs which UBIFS
has to scan during mount. These LEBs contain the data that were written
to the file-system last.

In case of a clean unmount, the journal is empty. It contains data only
in case of a power cut.

Power cuts cause unfinished writes, so the journal may contain corrupted
nodes, and UBIFS is trying to be very careful about them - it detects
them and drops them.

Corrupted nodes may also appear for other reasons, not because of power
cuts. E.g., just faulty media, worn-out media, radiation, unstable power
supply, etc.

All the corruptions caused by power cuts are not fatal, and UBIFS should
be able to recover from all of them. The non-power cut corruptions are,
in contrast, fatal, and UBIFS has no way to automatically recover from
them.

UBIFS tries hard to distinguish between these 2 types of corruptions.
The power cut-related corruptions may only happen at the end of the
journal, because UBIFS writes sequentially. Namely, power cut-related
corruptions may only be at the end of the last written journal LEB. And
the corruption may span only 1 write unit, because UBIFS writes 1 write
unit at a time. For NAND flash, the write unit is usually 1 NAND page,
which is 2KiB in your case.

Scanning works like this. We take the first journal LEB and read each
UBIFS node one-by-one from the very beginning. This continues as long as
CRC matches and everything is fine.

If we find a corrupted node (CRC mismatch), we drop it, and we drop
everything else in this write unit. And we expect that the area _after_
this write unit contains _only_ empty space. Simply because UBIFS starts
with empty LEBs.

If there is anything other than empty space after the write unit
containing the corrupted nodes, then someone wrote something there, and
the corrupted node is not at the end of the journal, but somewhere in the
middle. This means we are dealing with some other type of corruption,
not one caused by a power cut, and we just refuse to mount.
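
The scanning rule described above can be sketched in a few lines of
Python. This is a toy model, not UBIFS code: the node layout (2-byte
length, payload, CRC32) and all function names are invented for the
example, and real recovery is considerably more involved.

```python
import struct
import zlib

WRITE_UNIT = 2048  # one NAND page, as on the board in this thread

def make_node(payload):
    """Toy node: 2-byte length, payload, CRC32 of the payload."""
    return (struct.pack("<H", len(payload)) + payload
            + struct.pack("<I", zlib.crc32(payload)))

def is_empty(buf):
    """Empty space in the strict sense: nothing but 0xFF bytes."""
    return all(b == 0xFF for b in buf)

def read_node(leb, pos):
    """Return the payload at `pos`, or None if the node is corrupted."""
    if pos + 2 > len(leb):
        return None
    (length,) = struct.unpack_from("<H", leb, pos)
    end = pos + 2 + length + 4
    if end > len(leb):
        return None
    payload = leb[pos + 2:pos + 2 + length]
    (crc,) = struct.unpack_from("<I", leb, end - 4)
    return payload if zlib.crc32(payload) == crc else None

def scan_leb(leb):
    """Return valid payloads; raise if corruption is not a torn last write."""
    nodes, pos = [], 0
    while pos < len(leb) and not is_empty(leb[pos:]):
        payload = read_node(leb, pos)
        if payload is None:
            # Drop the rest of this write unit; everything after it
            # must be empty space, otherwise this was not a power cut.
            unit_end = (pos // WRITE_UNIT + 1) * WRITE_UNIT
            if not is_empty(leb[unit_end:]):
                raise ValueError(f"corruption at offset {pos}, data follows")
            break
        nodes.append(payload)
        pos += 2 + len(payload) + 4
    return nodes
```

A corrupted node in the last written page is dropped silently, while the
same corruption followed by data in a later page makes the scan fail.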

Now what is empty space? For current UBIFS it is the space at the end of
an LEB containing only "0xFF" bytes, and nothing else. This worked well
in the past, but does not always work nowadays.

In your case you have a single bit-flip in the empty space. UBIFS
detects it, and says that there is a corruption in an area which should
only contain empty space, and it refuses mounting.

How could this be fixed? We discussed 2 possibilities at this forum in
the past.

One possibility is to make the NAND driver/controller _protect_ the
empty NAND pages with ECC and correct bit-flips in the empty space, just
like for written-to pages. Empty NAND pages are those which were never
written to. If I write all 0xFFs to a NAND page, it is _not_ an
empty NAND page anymore.

This is the preferable solution, but it is not necessarily the easiest
one and not always possible.

The other way is to change UBIFS's definition of empty space. Make UBIFS
be aware of bit-flips in empty space. Make UBIFS allow for a number of
bit-flips there, and this number would depend on how strong the ECC is.
This would be a much, much less plausible solution, because
_architecturally_ it breaks layering. Today we have the MTD layer taking
care of all the bit-flip stuff for upper layers. But this solution would
make UBIFS kind of duplicate MTD efforts, and have its own additional
bit-flip logic. But hey, if there is a good reason, why not?
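
As a rough illustration of what such a relaxed definition could look like
(a Python sketch only; the function names and the `max_bitflips` parameter
are hypothetical and nothing like this exists in UBIFS as discussed here):

```python
def count_zero_bits(buf):
    """Number of bits in `buf` that are 0, i.e. differ from erased flash."""
    return sum(8 - bin(byte).count("1") for byte in buf)

def is_empty_space(buf, max_bitflips=0):
    """Strict check by default; tolerate up to `max_bitflips` flipped bits."""
    return count_zero_bits(buf) <= max_bitflips

# 1393 bytes of empty space with one byte reading 0xbf, as in the dump.
tail = b"\xbf" + b"\xff" * 1392
print(is_empty_space(tail))                   # False: today's definition
print(is_empty_space(tail, max_bitflips=1))   # True: relaxed definition
```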

HTH.

-- 
Best Regards,
Artem Bityutskiy

* RE: ubi_io_read -74 and ubifs_scanned_corruption errors with i.MX28
From: Iwo Mergler @ 2014-07-03  3:55 UTC
  To: dedekind1, Voytovich, Mike; +Cc: linux-mtd

On Tue, 1 Jul 2014 23:36:41 +1000
Artem Bityutskiy <dedekind1@gmail.com> wrote:

> This problem was brought up many times before, but no one came up
> with a solution so far. Let me provide you some back-ground
> information.
<snip> 
> One possibility is to make the NAND driver/controller _protect_ the
> empty NAND pages with ECC and correct bit-flips in the empty space,
> just like for written-to pages. Empty NAND pages are those which were
> never written to. If I write all 0xFFs to a NAND page, it is is _not_
> and empty NAND page anymore.
> 
> This is the preferable solution, but it is not necessarily the easiest
> one and not always possible.

Below is an analysis of the interactions between hardware ECC, driver
and reality with a view towards the erased page problem.

You probably know most of this already, but some may still be useful.
I wanted to write this down for a while, sorry about the length.


Good hardware
=============

The easy way to deal with the erased-page issue is to have an
ECC controller that produces all-1 parity bits for an all-1 page.

This would mean that there is no distinction between an erased
page and one written all-1, and ECC will correct both the same way.

While ridiculously easy to implement in hardware (constant XOR),
very few real world controllers do it. Pretty much anything more
powerful than a 1-bit correcting Hamming code has lost that property.


Making bad hardware do the right thing
======================================

The next best thing is to do the above-mentioned XOR operation
in the NAND driver. However, this can be made complicated or
impossible by the specific ECC hardware implementation.

Typically, a hardware ECC controller is implemented by listening
on the incoming and outgoing data (and sometimes command)
traffic between the NAND controller and the external NAND chip.

This usually means that the driver resets the ECC controller before
writing a sub-page, writes the sub-page, and then reads the parity
bits from a few registers in the ECC controller. After all parity bits
for all such ECC steps are collected, the driver writes them to the
OOB.

In theory, this would allow the driver to XOR the resulting parity bits
with a constant, chosen such that the parity bits for an all-1 page are
transformed into all-1 themselves.
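
The XOR trick can be sketched in Python as follows. Note the assumptions:
a CRC-16 from the standard library stands in for the hardware ECC engine
(it can only detect errors, not correct them), and the 512-byte step size,
the 16-bit parity width and all names are chosen just for this example.

```python
import binascii

SUBPAGE = 512              # assumed ECC step size in bytes
ALL_ONES_PARITY = 0xFFFF   # what an erased OOB parity field reads as

def raw_parity(data):
    """Stand-in for the hardware ECC engine (a CRC-16, NOT a real ECC)."""
    return binascii.crc_hqx(data, 0)

# Constant chosen so that all-1 data maps to all-1 stored parity.
MASK = raw_parity(b"\xff" * SUBPAGE) ^ ALL_ONES_PARITY

def parity_to_store(data):
    """Write path: XOR the engine's parity with the constant."""
    return raw_parity(data) ^ MASK

def parity_matches(data, stored):
    """Read path: undo the XOR before comparing with the engine's result."""
    return raw_parity(data) == (stored ^ MASK)

erased = b"\xff" * SUBPAGE
print(hex(parity_to_store(erased)))             # 0xffff
print(parity_matches(erased, ALL_ONES_PARITY))  # True
```

With this in place a never-written page and a page written with all-1
data look identical to the checker, which is the property described
above.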

Some overly helpful ECC implementations force automatic writing
of ECC bits (e.g. Freescale). This leads to crazy layouts like interleaved
data and parity blocks within the page, with data spilling into the OOB.

On those, there is not much hope to work around the problem, since
the presence of the fully automated mechanism usually implies the
absence of a way to side-load the registers via software instead.

The real trouble (and most hardware bugs) starts when it comes to
reading the data.

Again, the ECC controller listens on the bus for the incoming data,
computing on the fly. After a subpage worth of data, it must read
the corresponding parity bits. This results in a syndrome in the ECC
registers which will typically be all-0 for no errors. If there are errors,
the location of correctable bit errors can be extracted from the (non-0)
syndrome.

If we have XOR-ed the parity bits during write, we must undo that
operation (another XOR) before the ECC controller gets to see
them.

Some controllers can be fed data directly through the register
interface, in which case it's easy - read the parity bits (OOB) first,
then read the data followed by side-loading the XORed parity bits.

Unfortunately those controllers are rare and most can only read
the parity bits in the data stream directly.


Getting desperate
=================

Depending on the specifics of the ECC scheme, it can be possible to
transform the incorrect syndrome caused by the XORed parity bits after
the fact, but that usually means rather heavy software calculations in
the case of bit errors.

Worst case, the heavy calculations have to be performed for every
page, errors or not, which almost certainly makes the scheme
impractical - software ECC may be faster.


When all else fails
===================

In the special case of erased pages, it is possible to fake the above suggestion
without too much effort. This is for situations where we can't massage the ECC
controller to operate with the correct parity pattern, and we can't cheaply
fix the last, incorrect ECC steps.

A fully erased, error free page, with an all-1 parity pattern usually reports
ECC errors. These may be correctable, or, more likely, uncorrectable bit
errors. Of course, even correctable bit errors are bogus in this case.

If there are no actual errors, the syndrome is always going to be the
same, and can be recognised as the specific syndrome of a fully erased page.
In this case, we simply return the data and ignore the reported errors.

If there is a small number of 0-bits in the erased page, the syndrome will
be different. We can't recognise them all, but if an error is reported on
read, we can start looking into the data.

The software simply scans the page data for 0-bits. This can be done quite
efficiently by looking for non 0xffffffff words, etc. If the number of
encountered 0-bits (including parity bits) exceeds the correction power of the
ECC, we decide the page wasn't erased and handle the error as normal.

If the number of allowed 0-bits isn't exceeded, we decide that we're dealing
with an erased page with correctable bit errors. We ensure that the read
buffer is now all-1 and report an appropriate number of 'corrected' bit errors
to MTD.
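
In Python, the fallback described above might look roughly like this. It
is a sketch under assumptions: a 4-bit ECC strength, a function that is
only called after the ECC engine has already reported an uncorrectable
error, and invented names throughout; a real driver would do this in C on
the raw page buffer.

```python
ECC_STRENGTH = 4   # assumed number of correctable bits per ECC step

def count_zero_bits(buf, limit):
    """Count 0-bits in `buf`, giving up early once `limit` is exceeded."""
    zeros = 0
    for byte in buf:
        if byte != 0xFF:
            zeros += 8 - bin(byte).count("1")
            if zeros > limit:
                break
    return zeros

def fixup_erased_page(data, parity, strength=ECC_STRENGTH):
    """Return (is_erased, data, bitflips) for a chunk that failed ECC."""
    zeros = count_zero_bits(bytes(data) + bytes(parity), strength)
    if zeros > strength:
        return False, bytes(data), 0   # written page with real errors
    # Erased page with a few flipped bits: hand back a clean all-1 buffer
    # and report the flips as 'corrected' bit errors.
    return True, b"\xff" * len(data), zeros

page = bytearray(b"\xff" * 512)
page[0] = 0xBF                          # the single flip seen in this thread
print(fixup_erased_page(page, b"\xff" * 7)[0::2])   # (True, 1)
```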

There is a possible failure mode with all this. If there is a valid ECC code
word (data+ECC) which contains fewer than, say, 4 0-bits, we could
mis-diagnose it as an erased page.

There is a high likelihood that this isn't in fact the case with,
say, a 4-bit ECC implementation. A 4-bit BCH scheme would use 52 ECC
bits per 512 bytes of data. Without detailed knowledge of the specific
implementation, we can estimate the likelihood of the existence of the
above failure mode in a specific implementation as being less than (lots
of handwaving)...

    Number of ECC bits: 52
    Number of possible ECC bit combinations: 2^52 ~ 4.5*10^15
    Number of possible ECC bit combinations with up to 4 0-bits ~ 3*10^5

about 3*10^5 / 4.5*10^15 ~ 6.5*10^-11. Rather unlikely.

If you want to be sure, there are 'only' about 10^13 possible data patterns
with up to 4 0-bits within 512 bytes. If you have a few flash chips to burn
(and a little time ;-) you could exhaustively test all possibilities and check
for parity bits having fewer than the remaining number of 0-bits.
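
The counting behind this estimate can be reproduced with binomial
coefficients. A short Python check (the 52 parity bits, 512-byte step and
limit of 4 are the figures assumed in the text; the helper name is made
up):

```python
from math import comb

ECC_BITS = 52          # parity bits per 512 bytes for the assumed 4-bit BCH
DATA_BITS = 512 * 8
MAX_ZEROS = 4

def patterns_with_up_to(zeros, bits):
    """Number of `bits`-wide patterns containing at most `zeros` 0-bits."""
    return sum(comb(bits, k) for k in range(zeros + 1))

ecc_patterns = patterns_with_up_to(MAX_ZEROS, ECC_BITS)
fraction = ecc_patterns / 2**ECC_BITS
data_patterns = patterns_with_up_to(MAX_ZEROS, DATA_BITS)

print(ecc_patterns)             # 294204
print(f"{fraction:.1e}")        # about 6.5e-11
print(f"{data_patterns:.1e}")   # about 1.2e+13
```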


And if that was too complicated...
==================================

... pick an unused OOB byte or two and always write them as 0 on any write.
If those bytes are mostly 1 on read we are dealing with an erased page.

If OOB space is tight, spare bytes or sometimes spare bits are available between
the ECC blocks as the exact number of ECC bits doesn't always fit into an
exact number of bytes or 16-bit words. If it's just spare bits, chances are that
they are already written as 0.
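
A minimal Python sketch of the marker idea, assuming a single spare OOB
byte is reserved for it (the constant and function names are invented
here). A majority vote over the 8 bits keeps the decision robust against
a few bit flips in the marker itself.

```python
MARKER_WRITTEN = 0x00   # value stored in the spare OOB byte on every write

def page_is_erased(marker_byte):
    """True if most of the 8 marker bits still read as 1."""
    return bin(marker_byte).count("1") > 4

print(page_is_erased(0xFF))   # True: never written
print(page_is_erased(0xFB))   # True: erased, one flipped bit in the marker
print(page_is_erased(0x04))   # False: written, one flipped bit in the marker
```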


Best regards,

Iwo
