linux-mtd.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* Help with UBI failure analysis
@ 2020-09-24  8:35 Sergei Poselenov
  2020-09-24 10:42 ` Richard Weinberger
  0 siblings, 1 reply; 4+ messages in thread
From: Sergei Poselenov @ 2020-09-24  8:35 UTC (permalink / raw)
  To: linux-mtd

Hello,

i.MX SoloX ARM SoC, running Linux 4.5 with the root filesystem on UBI on 512MB NAND chip. A customer is complaining on the following UBI error (excerpts from the boot log below):
...
nand: device found, Manufacturer ID: 0x01, Chip ID: 0xdc
nand: AMD/Spansion S34ML04G1
nand: 512 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
gpmi-nand 1806000.gpmi-nand: mode:4 ,failed in set feature.
Bad block table found at page 262080, version 0x01
Bad block table found at page 262016, version 0x01
nand_read_bbt: bad block at 0x00001d0a0000
6 ofpart partitions found on MTD device gpmi-nand
Creating 6 MTD partitions on "gpmi-nand":
0x000000000000-0x000000400000 : "nand_uboot_env"
0x000000400000-0x000000e00000 : "nand_splash_image"
0x000000e00000-0x000000e20000 : "nand_dtb_image"
0x000000e20000-0x000001a20000 : "nand_linux_image"
0x000001a20000-0x00000ba20000 : "nand_rootfs_image"
0x00000ba20000-0x000040000000 : "nand_ubifs"
mtd: partition "nand_ubifs" extends beyond the end of device "gpmi-nand" -- size truncated to 0x145e0000
...
ubi0: default fastmap pool size: 60
ubi0: default fastmap WL pool size: 30
ubi0: attaching mtd4
ubi0: attached by fastmap
ubi0: fastmap pool size: 60
ubi0: fastmap WL pool size: 30
ubi0: attached mtd4 (name "nand_rootfs_image", size 160 MiB)
ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
ubi0: VID header offset: 2048 (aligned 2048), data offset: 4096
ubi0: good PEBs: 1280, bad PEBs: 0, corrupted PEBs: 0
ubi0: user volume: 1, internal volumes: 1, max. volumes count: 128
ubi0: max/mean erase counter: 3/0, WL threshold: 4096, image sequence number: 244395915
ubi0: available PEBs: 0, total reserved PEBs: 1280, PEBs reserved for bad PEB handling: 80
ubi0: background thread "ubi_bgt0d" started, PID 140
input: gpio-keys as /devices/soc0/gpio-keys/input/input2
hctosys: unable to open rtc device (rtc0)
ALSA device list:
  No soundcards found.
UBIFS (ubi0:0): background thread "ubifs_bgt0_0" started, PID 141
UBIFS (ubi0:0): recovery needed
UBIFS (ubi0:0): recovery completed
UBIFS (ubi0:0): UBIFS: mounted UBI device 0, volume 0, name "rootfs"
UBIFS (ubi0:0): LEB size: 126976 bytes (124 KiB), min./max. I/O unit sizes: 2048 bytes/2048 bytes
UBIFS (ubi0:0): FS size: 74788864 bytes (71 MiB, 589 LEBs), journal size 9023488 bytes (8 MiB, 72 LEBs)
UBIFS (ubi0:0): reserved for root: 0 bytes (0 KiB)
UBIFS (ubi0:0): media format: w4/r0 (latest is w4/r0), UUID 09F04725-4F95-492E-B899-800AFEC97FB9, small LPT model
VFS: Mounted root (ubifs filesystem) on device 0:13.
devtmpfs: mounted
Freeing unused kernel memory: 280K (80898000 - 808de000)
UBIFS error (ubi0:0 pid 1): ubifs_read_node: bad node type (0 but expected 9)
UBIFS error (ubi0:0 pid 1): ubifs_read_node: bad node at LEB 12:100352, LEB mapping status 1
Not a node, first 24 bytes:
00000000: 55 42 49 23 01 00 00 00 00 00 00 00 00 00 00 02 00 00 08 00 00 00 10 00                          UBI#....................
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.5.0 #4
Hardware name: Freescale i.MX6 SoloX (Device Tree)
[<800160f4>] (unwind_backtrace) from [<80013400>] (show_stack+0x10/0x14)
[<80013400>] (show_stack) from [<8029c6ec>] (dump_stack+0x88/0x9c)
...
UBIFS error (ubi0:0 pid 142): ubifs_iget: failed to read inode 438, error -22
UBIFS error (ubi0:0 pid 142): ubifs_lookup: dead directory entry 'etc', error -22
can't run '/etc/init.d/rcS': Invalid argument
...

I see that NAND has a badblock at 0x00001d0a0000, but it's not on the "rootfs" partition.

What could cause such error? Any thoughts will be much appreciated.

Regards,
Sergei

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Help with UBI failure analysis
  2020-09-24  8:35 Help with UBI failure analysis Sergei Poselenov
@ 2020-09-24 10:42 ` Richard Weinberger
  2020-11-23  7:37   ` Sergei Poselenov
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Weinberger @ 2020-09-24 10:42 UTC (permalink / raw)
  To: Sergei Poselenov; +Cc: linux-mtd

On Thu, Sep 24, 2020 at 10:41 AM Sergei Poselenov
<sposelenov@emcraft.com> wrote:
> Freeing unused kernel memory: 280K (80898000 - 808de000)
> UBIFS error (ubi0:0 pid 1): ubifs_read_node: bad node type (0 but expected 9)
> UBIFS error (ubi0:0 pid 1): ubifs_read_node: bad node at LEB 12:100352, LEB mapping status 1
> Not a node, first 24 bytes:
> 00000000: 55 42 49 23 01 00 00 00 00 00 00 00 00 00 00 02 00 00 08 00 00 00 10 00                          UBI#....................

So, UBIFS reads from a node and finds a UBI EC header, this must not happen.
UBI headers are invisible to UBIFS. If UBIFS reads from a LEB and a
given offset, the offset is translated
to read data after the page(s) where EC/VID headers reside.

*unless* LEB 12 offset 100352 contains an UBIFS data node for a
nanddump you did before
and the index tree is corrupted. But I don't think so.

Looks more like a problem on the mtd side.

-- 
Thanks,
//richard

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Help with UBI failure analysis
  2020-09-24 10:42 ` Richard Weinberger
@ 2020-11-23  7:37   ` Sergei Poselenov
  2020-11-23  7:54     ` Richard Weinberger
  0 siblings, 1 reply; 4+ messages in thread
From: Sergei Poselenov @ 2020-11-23  7:37 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: linux-mtd

Hello Richard,

Sorry for a long delay and thanks a lot for your reply. Could you please look at the crash dump below?

We set up a simple write-compare test on that device (please see the original message in the threa), on an empty UBI partition:
DIR=/mnt/data

rm -rf $DIR/*
touch $DIR/a1 $DIR/a2

while :; do
        for sz in 1 10 3 5; do
                cs1=`md5sum $DIR/a1 | cut -f1 -d' '`
                cs2=`md5sum $DIR/a2 | cut -f1 -d' '`
                if [ "$cs1" = "$cs2" ]; then
                        echo -n "." 
                else
                        echo "ERROR: $DIR/a1 and $DIR/a2 differ" 
                        exit 1
                fi
                mv $DIR/a1 $DIR/b1
                mv $DIR/a2 $DIR/b2
                dd if=/dev/urandom of=/tmp/rnd count=$sz bs=1M 2> /dev/null
                cp /tmp/rnd $DIR/a1
                cp /tmp/rnd $DIR/a2
                sync; echo 3 > /proc/sys/vm/drop_caches > /dev/null
        done
done


After about ~3days, the test failed with the following messages on the board console:
=================================
...........................................................................................................
...........................................................................................................
.......................................ubi1 warning: ubi_eba_copy_leb: read data back from PEB 895 and it is different
ubi1 error: wear_leveling_worker: error -22 while moving PEB 1 to PEB 895
ubi1 warning: ubi_ro_mode: switch to read-only mode
CPU: 0 PID: 181 Comm: ubi_bgt1d Not tainted 4.5.0 #4
Hardware name: Freescale i.MX6 SoloX (Device Tree)
[<800160f4>] (unwind_backtrace) from [<80013400>] (show_stack+0x10/0x14)
[<80013400>] (show_stack) from [<8029c6ec>] (dump_stack+0x88/0x9c)
[<8029c6ec>] (dump_stack) from [<8039fac8>] (wear_leveling_worker+0x808/0x858)
[<8039fac8>] (wear_leveling_worker) from [<8039e724>] (do_work+0x84/0x104)
[<8039e724>] (do_work) from [<803a02c8>] (ubi_thread+0xd4/0x170)
[<803a02c8>] (ubi_thread) from [<8003c9c0>] (kthread+0xdc/0xf4)
[<8003c9c0>] (kthread) from [<8000f7b8>] (ret_from_fork+0x14/0x3c)
ubi1 error: do_work: work failed with error code -22
ubi1 error: ubi_io_write: read-only mode
ubi1 error: ubi_write_fastmap: unable to write vid_hdr to fastmap SB!
ubi1 error: ubi_thread: ubi_bgt1d: work failed with error code -22
ubi1 warning: ubi_update_fastmap: Unable to write new fastmap, err=-30
ubi1 error: ubi_io_write: read-only mode
ubi1 warning: ubi_eba_write_leb: failed to write VID header to LEB 0:2476, PEB 885
UBIFS error (ubi1:0 pid 22714): ubifs_leb_map: mapping LEB 2476 failed, error -30
UBIFS warning (ubi1:0 pid 22714): ubifs_ro_mode: switched to read-only mode, error -30
CPU: 0 PID: 22714 Comm: cp Not tainted 4.5.0 #4
Hardware name: Freescale i.MX6 SoloX (Device Tree)
[<800160f4>] (unwind_backtrace) from [<80013400>] (show_stack+0x10/0x14)
[<80013400>] (show_stack) from [<8029c6ec>] (dump_stack+0x88/0x9c)
[<8029c6ec>] (dump_stack) from [<802057f8>] (ubifs_leb_map+0x118/0x120)
[<802057f8>] (ubifs_leb_map) from [<8020ef00>] (ubifs_add_bud_to_log+0x1d0/0x330)
[<8020ef00>] (ubifs_add_bud_to_log) from [<801f9b94>] (make_reservation+0x378/0x4ac)
[<801f9b94>] (make_reservation) from [<801fa5bc>] (ubifs_jnl_write_data+0xe4/0x264)
[<801fa5bc>] (ubifs_jnl_write_data) from [<801fc1ec>] (do_writepage+0x78/0x1cc)
[<801fc1ec>] (do_writepage) from [<800aafc0>] (__writepage+0x14/0x40)
[<800aafc0>] (__writepage) from [<800ab86c>] (write_cache_pages+0x170/0x3c0)
[<800ab86c>] (write_cache_pages) from [<800abafc>] (generic_writepages+0x40/0x60)
[<800abafc>] (generic_writepages) from [<800a2f68>] (__filemap_fdatawrite_range+0x78/0x9c)
[<800a2f68>] (__filemap_fdatawrite_range) from [<800a30a8>] (filemap_write_and_wait_range+0x34/0x74)
[<800a30a8>] (filemap_write_and_wait_range) from [<801fc934>] (ubifs_fsync+0x40/0xb4)
[<801fc934>] (ubifs_fsync) from [<800a3e70>] (generic_file_write_iter+0x1b0/0x264)
[<800a3e70>] (generic_file_write_iter) from [<801fde28>] (ubifs_write_iter+0xe8/0x178)
[<801fde28>] (ubifs_write_iter) from [<800e3820>] (__vfs_write+0xa8/0xd8)
[<800e3820>] (__vfs_write) from [<800e451c>] (vfs_write+0x90/0x164)
[<800e451c>] (vfs_write) from [<800e4f64>] (SyS_write+0x44/0x9c)
[<800e4f64>] (SyS_write) from [<8000f700>] (ret_fast_syscall+0x0/0x3c)
CPU: 0 PID: 22714 Comm: cp Not tainted 4.5.0 #4
Hardware name: Freescale i.MX6 SoloX (Device Tree)
[<800160f4>] (unwind_backtrace) from [<80013400>] (show_stack+0x10/0x14)
[<80013400>] (show_stack) from [<8029c6ec>] (dump_stack+0x88/0x9c)
[<8029c6ec>] (dump_stack) from [<802057e4>] (ubifs_leb_map+0x104/0x120)
[<802057e4>] (ubifs_leb_map) from [<8020ef00>] (ubifs_add_bud_to_log+0x1d0/0x330)
[<8020ef00>] (ubifs_add_bud_to_log) from [<801f9b94>] (make_reservation+0x378/0x4ac)
[<801f9b94>] (make_reservation) from [<801fa5bc>] (ubifs_jnl_write_data+0xe4/0x264)
[<801fa5bc>] (ubifs_jnl_write_data) from [<801fc1ec>] (do_writepage+0x78/0x1cc)
[<801fc1ec>] (do_writepage) from [<800aafc0>] (__writepage+0x14/0x40)
[<800aafc0>] (__writepage) from [<800ab86c>] (write_cache_pages+0x170/0x3c0)
[<800ab86c>] (write_cache_pages) from [<800abafc>] (generic_writepages+0x40/0x60)
[<800abafc>] (generic_writepages) from [<800a2f68>] (__filemap_fdatawrite_range+0x78/0x9c)
[<800a2f68>] (__filemap_fdatawrite_range) from [<800a30a8>] (filemap_write_and_wait_range+0x34/0x74)
[<800a30a8>] (filemap_write_and_wait_range) from [<801fc934>] (ubifs_fsync+0x40/0xb4)
[<801fc934>] (ubifs_fsync) from [<800a3e70>] (generic_file_write_iter+0x1b0/0x264)
[<800a3e70>] (generic_file_write_iter) from [<801fde28>] (ubifs_write_iter+0xe8/0x178)
[<801fde28>] (ubifs_write_iter) from [<800e3820>] (__vfs_write+0xa8/0xd8)
[<800e3820>] (__vfs_write) from [<800e451c>] (vfs_write+0x90/0x164)
[<800e451c>] (vfs_write) from [<800e4f64>] (SyS_write+0x44/0x9c)
[<800e4f64>] (SyS_write) from [<8000f700>] (ret_fast_syscall+0x0/0x3c)
UBIFS error (ubi1:0 pid 22714): make_reservation: cannot reserve 4144 bytes in jhead 2, error -30
UBIFS error (ubi1:0 pid 22714): do_writepage: cannot write page 299 of inode 58400, error -30
cp: write error: Read-only file system
cp: can't create '/mnt/data/a2': Read-only file system
UBIFS error (ubi1:0 pid 6): make_reservation: cannot reserve 4144 bytes in jhead 2, error -30
UBIFS error (ubi1:0 pid 6): do_writepage: cannot write page 300 of inode 58400, error -30
UBIFS error (ubi1:0 pid 6): make_reservation: cannot reserve 4144 bytes in jhead 2, error -30
UBIFS error (ubi1:0 pid 6): do_writepage: cannot write page 301 of inode 58400, error -30
md5sum: can't open '/mnt/data/a2': No such file or directory
ERROR: /mnt/data/a1 and /mnt/data/a2 differ
~ #
~ # UBIFS error (ubi1:0 pid 6): make_reservation: cannot reserve 4144 bytes in jhead 2, error -30
UBIFS error (ubi1:0 pid 6): make_reservation: cannot reserve 4144 bytes in jhead 2, error -30
UBIFS error (ubi1:0 pid 6): do_writepage: cannot write page 303 of inode 58400, error -30
========================

Could you please share you thoughts on what could cause such errors? Does it confirm your initial suggestion that something happened on the MTD layer?

--
Regards,
Sergei Poselenov,
Emcraft Systems

On Thu, 24 Sep 2020 12:42:46 +0200 Richard Weinberger <richard.weinberger@gmail.com> wrote:

> On Thu, Sep 24, 2020 at 10:41 AM Sergei Poselenov
> <sposelenov@emcraft.com> wrote:
> > Freeing unused kernel memory: 280K (80898000 - 808de000)
> > UBIFS error (ubi0:0 pid 1): ubifs_read_node: bad node type (0 but expected 9)
> > UBIFS error (ubi0:0 pid 1): ubifs_read_node: bad node at LEB 12:100352, LEB mapping status 1
> > Not a node, first 24 bytes:
> > 00000000: 55 42 49 23 01 00 00 00 00 00 00 00 00 00 00 02 00 00 08 00 00 00 10 00                          UBI#....................  
> 
> So, UBIFS reads from a node and finds a UBI EC header, this must not happen.
> UBI headers are invisible to UBIFS. If UBIFS reads from a LEB and a
> given offset, the offset is translated
> to read data after the page(s) where EC/VID headers reside.
> 
> *unless* LEB 12 offset 100352 contains an UBIFS data node for a
> nanddump you did before
> and the index tree is corrupted. But I don't think so.
> 
> Looks more like a problem on the mtd side.
> 


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Help with UBI failure analysis
  2020-11-23  7:37   ` Sergei Poselenov
@ 2020-11-23  7:54     ` Richard Weinberger
  0 siblings, 0 replies; 4+ messages in thread
From: Richard Weinberger @ 2020-11-23  7:54 UTC (permalink / raw)
  To: Sergei Poselenov; +Cc: linux-mtd

Sergei,

----- Ursprüngliche Mail -----
> ...........................................................................................................
> .......................................ubi1 warning: ubi_eba_copy_leb: read data
> back from PEB 895 and it is different

seems to me like a mtd driver issue.
UBI reads back the data it just wrote, the he result is not the same.
Please note, in recent kernels this check no longer exists, so the failure will look different.

Thanks,
//richard



______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-11-23  7:55 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-24  8:35 Help with UBI failure analysis Sergei Poselenov
2020-09-24 10:42 ` Richard Weinberger
2020-11-23  7:37   ` Sergei Poselenov
2020-11-23  7:54     ` Richard Weinberger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).