* Debian 10 boot problems with corrupted rw /var
@ 2019-08-20 17:17 Travis Griggs
  2019-08-25 20:28 ` Richard Weinberger
From: Travis Griggs @ 2019-08-20 17:17 UTC
  To: linux-mtd

I apologize if this question is an intrusion on the scope of this mailing list. StackExchange hasn’t yielded much fruit in furthering my understanding, so I’m trying this list. I’m a neophyte in this area, but eager to learn/understand more.


- - - My Setup
Single board computer (<https://www.emtrion.de/en/details_products-accessoires/sbc-sama5d36-56.html>)

Linux Kernel is version 4.18.8. Bootargs are:
    
    console=ttyS0,115200 earlyprintk rootfstype=ubifs ubi.mtd=3 root=ubi0:rootfs ro rootwait

The OS is Debian 10.

The filesystem is split into a read-only root file system and a read-write /var. They are set up using the following commands:
 
    ubiformat /dev/mtd3
    ubiattach -p /dev/mtd3
    ubimkvol /dev/ubi0 --size=200MiB -N rootfs -n 1
    ubimkvol /dev/ubi0 -m -N var -n 2

/etc/fstab is modified to include the following line:

    /dev/ubi0_2 /var ubifs defaults,auto 0 0


- - - My Test
I'm doing power cycling tests. I power the board up for 3 minutes, kill power for 2 minutes, and repeat until the board no longer boots.
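
A minimal sketch of how such a cycling loop could be scripted from a host machine, assuming the host can switch the board's power and that some other tool captures the serial console of the current boot into a file. `power_on`, `power_off`, the log file, and the `MAX_CYCLES` argument are hypothetical placeholders and not part of the setup described here; only the 3-minute/2-minute timing and the `Failed to mount /var` marker come from this report.

```shell
#!/bin/sh
# Sketch of a host-side power-cycle loop. power_on/power_off are
# hypothetical placeholders for whatever switches the board's supply.

ON_SECS=${ON_SECS:-180}    # 3 minutes powered, as in the test described above
OFF_SECS=${OFF_SECS:-120}  # 2 minutes unpowered

power_on()  { echo "power on (placeholder)"; }
power_off() { echo "power off (placeholder)"; }

# Succeeds if the captured console log of the current boot contains
# the systemd failure line for /var.
boot_failed() {
    grep -qs 'Failed to mount /var' "$1"
}

# run_cycles LOGFILE [MAX_CYCLES]; MAX_CYCLES=0 (the default) means no limit.
# Assumes an external tool rewrites LOGFILE with each boot's console output.
# Returns 1 on the first failed boot, 0 if MAX_CYCLES pass cleanly.
run_cycles() {
    log=$1
    max=${2:-0}
    cycle=0
    while [ "$max" -eq 0 ] || [ "$cycle" -lt "$max" ]; do
        cycle=$((cycle + 1))
        power_on
        sleep "$ON_SECS"
        if boot_failed "$log"; then
            echo "cycle $cycle: /var failed to mount"
            power_off
            return 1
        fi
        power_off
        sleep "$OFF_SECS"
    done
    echo "completed $cycle cycles without a /var mount failure"
    return 0
}
```

Calling `run_cycles /tmp/console.log` (a hypothetical path) would loop until the first boot whose log shows the failure and print the cycle number, which makes it easy to compare failure counts across runs.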


- - - My Results
I've done this a number of times. Always somewhere between 1600 and 1800 cycles, the system becomes semi-unbootable. It boots partially but never finishes, because the /var file system appears to be corrupted.

   *snip*

    ubi0: attaching mtd3
    random: fast init done
    random: crng init done
    ubi0: scanning is finished
    ubi0: attached mtd3 (name "rootfs", size 510 MiB)
    ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
    ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
    ubi0: VID header offset: 2048 (aligned 2048), data offset: 4096
    ubi0: good PEBs: 4084, bad PEBs: 0, corrupted PEBs: 0
    ubi0: user volume: 2, internal volumes: 1, max. volumes count: 128
    ubi0: max/mean erase counter: 19/3, WL threshold: 4096, image sequence number: 18031529
    ubi0: available PEBs: 0, total reserved PEBs: 4084, PEBs reserved for bad PEB handling: 80
    ubi0: background thread "ubi_bgt0d" started, PID 649
    at91_rtc fffffeb0.rtc: setting system clock to 2019-08-19 22:47:49 UTC (1566254869)
    cfg80211: Loading compiled-in X.509 certificates for regulatory database
    cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
    platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
    cfg80211: failed to load regulatory.db
    UBIFS (ubi0:1): background thread "ubifs_bgt0_1" started, PID 664
    UBIFS (ubi0:1): recovery needed
    UBIFS (ubi0:1): recovery completed
    UBIFS (ubi0:1): UBIFS: mounted UBI device 0, volume 1, name "rootfs"
    UBIFS (ubi0:1): LEB size: 126976 bytes (124 KiB), min./max. I/O unit sizes: 2048 bytes/2048 bytes
    UBIFS (ubi0:1): FS size: 208240640 bytes (198 MiB, 1640 LEBs), journal size 10412032 bytes (9 MiB, 82 LEBs)
    UBIFS (ubi0:1): reserved for root: 4952683 bytes (4836 KiB)
    UBIFS (ubi0:1): media format: w4/r0 (latest is w5/r0), UUID 8F0ED0A2-F456-474F-858E-BBD7235162BC, small LPT model
    VFS: Mounted root (ubifs filesystem) on device 0:13.
    devtmpfs: mounted
    Freeing unused kernel memory: 1024K
    systemd[1]: Failed to find module 'autofs4'
    systemd[1]: systemd 241 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid)
    systemd[1]: Detected architecture arm.
    
    Welcome to Debian GNU/Linux 10 (buster)!
    
    systemd[1]: Set hostname to <nelson>.
    systemd[1]: File /lib/systemd/system/systemd-journald.service:12 configures an IP firewall (IPAddressDeny=any), but the local system does not support BPF/cgroup based firewalling.
    systemd[1]: Proceeding WITHOUT firewalling in effect! (This warning is only shown for the first loaded unit using IP firewalling.)
    systemd[1]: Listening on initctl Compatibility Named Pipe.
    [  OK  ] Listening on initctl Compatibility Named Pipe.
    systemd[1]: Listening on Journal Socket.
    [  OK  ] Listening on Journal Socket.
    systemd[1]: Created slice User and Session Slice.
    [  OK  ] Created slice User and Session Slice.
    systemd[1]: Listening on udev Kernel Socket.
    [  OK  ] Listening on udev Kernel Socket.
    systemd[1]: Started Forward Password Requests to Wall Directory Watch.
    [  OK  ] Started Forward Password R…uests to Wall Directory Watch.
    [  OK  ] Started Dispatch Password …ts to Console Directory Watch.
    [  OK  ] Reached target Paths.
    [  OK  ] Listening on Network Service Netlink Socket.
    [  OK  ] Reached target Local Encrypted Volumes.
    [  OK  ] Created slice system-getty.slice.
    [  OK  ] Listening on Syslog Socket.
    [  OK  ] Reached target Slices.
             Starting Load Kernel Modules...
             Mounting Kernel Debug File System...
    [  OK  ] Reached target Remote File Systems.
    [  OK  ] Listening on Journal Socket (/dev/log).
             Starting Journal Service...
    [  OK  ] Listening on udev Control Socket.
             Starting udev Coldplug all Devices...
             Starting Remount Root and Kernel File Systems...
    [  OK  ] Created slice system-serial\x2dgetty.slice.
    [  OK  ] Reached target Swap.
             Mounting Temporary Directory (/tmp)...
    [  OK  ] Started Load Kernel Modules.
    [  OK  ] Mounted Kernel Debug File System.
    [  OK  ] Mounted Temporary Directory (/tmp).
             Starting Apply Kernel Variables...
             Mounting Kernel Configuration File System...
    [  OK  ] Started Remount Root and Kernel File Systems.
             Starting Create System Users...
    [  OK  ] Mounted Kernel Configuration File System.
    [  OK  ] Started Apply Kernel Variables.
    [  OK  ] Started Create System Users.
             Starting Create Static Device Nodes in /dev...
    [  OK  ] Started Journal Service.
    [  OK  ] Started Create Static Device Nodes in /dev.
    [  OK  ] Reached target Local File Systems (Pre).
             Starting udev Kernel Device Manager...
    [  OK  ] Started udev Kernel Device Manager.
             Starting Network Service...
    [  OK  ] Started udev Coldplug all Devices.
             Starting Helper to synchronize boot up for ifupdown...
    [  OK  ] Started Helper to synchronize boot up for ifupdown.
    ubi0 error: ubi_open_volume.part.0: cannot open device 0, volume 1, error -16
    ubi0 error: ubi_open_volume.part.0: cannot open device 0, volume 1, error -16
    IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
    A link change request failed with some changes committed already. Interface eth1 may have been left with an inconsistent configuration, please check.
    [  OK  ] Started Network Service.
    macb f802c000.ethernet eth1: link up (100/Full)
    IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
    [  OK  ] Found device /dev/ttyS0.
    [  OK  ] Found device /dev/ubi0_2.
             Mounting /var...
    UBIFS (ubi0:2): background thread "ubifs_bgt0_2" started, PID 1152
    UBIFS (ubi0:2): recovery needed
    UBIFS error (ubi0:2 pid 1150): ubifs_get_pnode.part.4: error -22 reading pnode at 9:61574
    (pid 1150) dumping pnode:
            address ce53ce00 parent ce53ce80 cnext 0
            flags 0 iip 3 level 0 num 0
            0: free 0 dirty 113080 flags 1 lnum 0
            1: free 0 dirty 115264 flags 1 lnum 0
            2: free 0 dirty 113408 flags 1 lnum 0
            3: free 110592 dirty 130824 flags 34 lnum 0
    CPU: 0 PID: 1150 Comm: mount Not tainted 4.18.8-nelson #1
    Hardware name: Atmel SAMA5
    [<c010e094>] (unwind_backtrace) from [<c010b5dc>] (show_stack+0x10/0x14)
    [<c010b5dc>] (show_stack) from [<c02bf5c4>] (ubifs_get_pnode.part.4+0x22c/0x290)
    [<c02bf5c4>] (ubifs_get_pnode.part.4) from [<c02c0ff0>] (ubifs_lpt_lookup_dirty+0x22c/0x29c)
    [<c02c0ff0>] (ubifs_lpt_lookup_dirty) from [<c02c435c>] (ubifs_update_one_lp+0x34/0x140)
    [<c02c435c>] (ubifs_update_one_lp) from [<c02b277c>] (ubifs_tnc_add+0xd8/0x148)
    [<c02b277c>] (ubifs_tnc_add) from [<c02b5e48>] (ubifs_replay_journal+0xe18/0x11ec)
    [<c02b5e48>] (ubifs_replay_journal) from [<c02ab4a8>] (ubifs_mount+0x111c/0x1520)
    [<c02ab4a8>] (ubifs_mount) from [<c01ba788>] (mount_fs+0x14/0xa4)
    [<c01ba788>] (mount_fs) from [<c01d45d8>] (vfs_kern_mount.part.3+0x48/0xe4)
    [<c01d45d8>] (vfs_kern_mount.part.3) from [<c01d6a80>] (do_mount+0x54c/0xbb0)
    [<c01d6a80>] (do_mount) from [<c01d7450>] (ksys_mount+0x8c/0xb4)
    [<c01d7450>] (ksys_mount) from [<c0101000>] (ret_fast_syscall+0x0/0x54)
    Exception stack(0xce4cffa8 to 0xce4cfff0)
    ffa0:                   00000000 b6ef490c 00493c70 00493c80 00493c60 00000000
    ffc0: 00000000 b6ef490c 00000000 00000015 00493c60 b6ef50e8 00493c60 b6ef50e8
    ffe0: b6ef4fc4 becdab30 b6ec487f b6e20eaa
    UBIFS error (ubi0:2 pid 1150): ubifs_get_pnode.part.4: calc num: 115
    UBIFS error (ubi0:2 pid 1150): ubifs_update_one_lp: cannot update properties of LEB 476, error -22
    UBIFS (ubi0:2): background thread "ubifs_bgt0_2" stops
    [FAILED] Failed to mount /var.
    See 'systemctl status var.mount' for details.
    [DEPEND] Dependency failed for D-Bus System Message Bus Socket.
    [DEPEND] Dependency failed for Login Service.
    [DEPEND] Dependency failed for D-Bus System Message Bus.
    [DEPEND] Dependency failed for Load/Save Random Seed.
    [DEPEND] Dependency failed for Upda…about System Runlevel Changes.
    [DEPEND] Dependency failed for Network Time Synchronization.
    [DEPEND] Dependency failed for Upda…MP about System Boot/Shutdown.
    [DEPEND] Dependency failed for Daily rotation of log files.
    [DEPEND] Dependency failed for Daily apt download activities.
    [DEPEND] Dependency failed for Local File Systems.
    [DEPEND] Dependency failed for Flus…Journal to Persistent Storage.
    [DEPEND] Dependency failed for Network Name Resolution.
    [  OK  ] Reached target Host and Network Name Lookups.
             Starting Raise network interfaces...
    [  OK  ] Stopped Forward Password R…uests to Wall Directory Watch.
    [  OK  ] Stopped Dispatch Password …ts to Console Directory Watch.
    [  OK  ] Reached target Login Prompts.
    [  OK  ] Closed Syslog Socket.
    [  OK  ] Started Emergency Shell.
    [  OK  ] Reached target Emergency Mode.
    [  OK  ] Reached target Timers.
    [  OK  ] Reached target System Time Synchronized.
             Starting Create Volatile Files and Directories...
    [  OK  ] Reached target Sockets.
    [  OK  ] Started Raise network interfaces.
    [  OK  ] Reached target Network.
    [  OK  ] Started Create Volatile Files and Directories.
    You are in emergency mode. After logging in, type "journalctl -xb" to view
    system logs, "systemctl reboot" to reboot, "systemctl default" or "exit"
    to boot into default mode.
    
    Cannot open access to console, the root account is locked.
    See sulogin(8) man page for more details.
    
    Press Enter to continue.


- - - My Questions
1. How can I get to the bottom of what/why my /var file system is getting corrupted?
2. I accept that UBIFS doesn't protect me from power-cycle corruption issues 100%. I have been led to believe it simply makes the system more robust in the presence of power-cycle events. But why does it always fail between 1600 and 1800 iterations? If it were truly random, I'd expect it to fail at random power-cycle events. It's as if some counter is wearing out.
3. The assumption I'm working on here is that having the majority of the system be read-only would increase the robustness of the system (make it less brickable). But it seems it has only made it boot further before ultimately still bricking. What can I add to make the ro/rw split actually meaningful?
4. It’s been suggested that I should place the two separate volumes (rootfs and /var) on separate mtds. I’m going to experiment with that, but will it make a difference?


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/


* Re: Debian 10 boot problems with corrupted rw /var
  2019-08-20 17:17 Debian 10 boot problems with corrupted rw /var Travis Griggs
@ 2019-08-25 20:28 ` Richard Weinberger
From: Richard Weinberger @ 2019-08-25 20:28 UTC
  To: Travis Griggs; +Cc: linux-mtd

Travis,

On Tue, Aug 20, 2019 at 7:17 PM Travis Griggs <travisgriggs@gmail.com> wrote:
> 1. How can I get to the bottom of what/why my /var file system is getting corrupted?

This sounds a little like the UBIFS xattrs issue I've been hunting
down. Fixes are upstream but not in stable-trees yet.
Can you please give Linus' tree as of today a try?

> 2. I accept that UBIFS doesn't protect me from power-cycle corruption issues 100%. I have been led to believe it simply makes the system more robust in the presence of power-cycle events. But why does it always fail between 1600 and 1800 iterations? If it were truly random, I'd expect it to fail at random power-cycle events. It's as if some counter is wearing out.

UBIFS should not die from power-cuts. I have no idea why it fails in
your case between 1600 and 1800.

> 3. The assumption I'm working on here is that having the majority of the system be read-only would increase the robustness of the system (make it less brickable). But it seems it has only made it boot further before ultimately still bricking. What can I add to make the ro/rw split actually meaningful?

Well, the boot fails because your system depends hard on a rw UBIFS?
Usually such a split is useful to make update concepts easy or to
detect bad programs.
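
One possible way to soften that hard dependency, sketched here as an assumption and not something proposed in this thread: systemd treats an fstab entry marked `nofail` as wanted but not required by `local-fs.target`, so a failed /var mount would not by itself drop the boot into emergency mode. Services that need a writable /var would still fail, so this only changes how far the boot gets. The helper name `add_nofail` is hypothetical; it prints a copy of an fstab file with `nofail` appended to the options of the /var entry.

```shell
#!/bin/sh
# add_nofail FSTAB: print FSTAB with ",nofail" appended to the mount
# options (4th field) of the /var entry, unless it is already present.
# Comment lines and all other entries are printed unchanged.
add_nofail() {
    awk '
        $1 !~ /^#/ && $2 == "/var" && index("," $4 ",", ",nofail,") == 0 {
            $4 = $4 ",nofail"   # assigning a field rejoins the line with single spaces
        }
        { print }
    ' "$1"
}
```

Running `add_nofail /etc/fstab` on the fstab shown in the original report would print the /var line with `defaults,auto,nofail` as its options. The result is written to standard output so it can be reviewed before replacing the real file.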

> 4. It’s been suggested that I should place the two separate volumes (rootfs and /var) on separate mtds. I’m going to experiment with that, but will it make a difference?

Never have multiple UBI instances on the same flash. UBI should use as
much of the flash as it can to have a large wear leveling domain.
On top of UBI you can have multiple volumes if you want.

-- 
Thanks,
//richard
