* Temporarily remounting rootfs as rw leads to kernel panic on reboot
@ 2016-06-07  0:17 David Olson
  2016-06-07  1:47 ` Mychaela Falconia
  0 siblings, 1 reply; 4+ messages in thread
From: David Olson @ 2016-06-07  0:17 UTC
  To: linux-mtd

In our currently shipping product, we've arranged so that rootfs is mounted ro
by doing the following:

 - using these parameters in the kernel command line invoked by U-Boot:
        rootwait=1 ro ubi.mtd=4,2048 rootfstype=ubifs root=ubi0:rootfs
 - having this line in /etc/fstab:
        /dev/root / auto ro,errors=remount-ro 0 0

After a successful boot, the 'mount' command shows (among other, presumably
irrelevant, lines) the following:
        rootfs on / type rootfs (rw)
        ubi0:rootfs on / type ubifs (ro,relatime)

Here is a portion of the boot log showing UBI & UBIFS related messages:

===========
ONFI flash detected
NAND device: Manufacturer ID: 0x2c, Chip ID: 0xba (Micron NAND 256MiB 1,8V 16-bit)
omap2-nand: detected x16 NAND flash
Creating 8 MTD partitions on "omap2-nand.0":
0x000000000000-0x000000020000 : "U-Boot-min"
0x000000020000-0x000000260000 : "U-Boot"
0x000000260000-0x000000280000 : "U-Boot Env"
0x000000280000-0x0000006c0000 : "Kernel"
0x0000006c0000-0x00000df80000 : "File System"
0x00000df80000-0x00000e3c0000 : "Guard Band"
0x00000e3c0000-0x00000f7c0000 : "Data"
0x00000f7c0000-0x000010000000 : "Reserved"
UBI: attaching mtd4 to ubi0
UBI: physical eraseblock size:   131072 bytes (128 KiB)
UBI: logical eraseblock size:    126976 bytes
UBI: smallest flash I/O unit:    2048
UBI: sub-page size:              512
UBI: VID header offset:          2048 (aligned 2048)
UBI: data offset:                4096
UBI: max. sequence number:       2
UBI: attached mtd4 to ubi0
UBI: MTD device name:            "File System"
UBI: MTD device size:            216 MiB
UBI: number of good PEBs:        1734
UBI: number of bad PEBs:         0
UBI: number of corrupted PEBs:   0
UBI: max. allowed volumes:       128
UBI: wear-leveling threshold:    4096
UBI: number of internal volumes: 1
UBI: number of user volumes:     1
UBI: available PEBs:             0
UBI: total number of reserved PEBs: 1734
UBI: number of PEBs reserved for bad PEB handling: 34
UBI: max/mean erase counter: 1/0
UBI: image sequence number:  1806916620
UBI: background thread "ubi_bgt0d" started, PID 46

...

UBIFS: mounted UBI device 0, volume 0, name "rootfs"
UBIFS: mounted read-only
UBIFS: file system size:   166211584 bytes (162316 KiB, 158 MiB, 1309 LEBs)
UBIFS: journal size:       9023488 bytes (8812 KiB, 8 MiB, 72 LEBs)
UBIFS: media format:       w4/r0 (latest is w4/r0)
UBIFS: default compressor: lzo
UBIFS: reserved for root:  0 bytes (0 KiB)
VFS: Mounted root (ubifs filesystem) readonly on device 0:13.
===========
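
As a cross-check on the figures in this log, the partition range, the PEB
count and the UBIFS size can be tied together with simple arithmetic.  This is
only a sketch: the helper name partition_geometry is hypothetical, and every
constant is copied from the log lines above.

```python
PEB_SIZE = 131072   # "physical eraseblock size" from the log, in bytes
LEB_SIZE = 126976   # "logical eraseblock size" from the log, in bytes

def partition_geometry(start, end, peb_size=PEB_SIZE):
    """Return (size in KiB, number of PEBs) for an MTD partition range."""
    size = end - start
    return size // 1024, size // peb_size

# "File System" partition: 0x0000006c0000-0x00000df80000
print(partition_geometry(0x006C0000, 0x0DF80000))  # (221952, 1734)

# UBIFS reports a file system size of 1309 LEBs
print(1309 * LEB_SIZE)  # 166211584
```

The 1734 PEBs agree with "number of good PEBs", and 166211584 bytes agrees
with the "file system size" line.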

We are starting work on the next release and want to allow developers with
shell access to temporarily remount rootfs rw.  We do so by using the
following command:

        mount -o remount,rw /

A subsequent 'mount' invocation shows that this was apparently successful.  If
we immediately cycle power, the boot log is essentially identical up to the
point where we should be seeing "UBIFS" messages.  Instead we see:

===========
UBI error: ubi_io_read: error -74 (ECC error) while reading 126976 bytes from PEB 3:4096, read 126976 bytes
UBI error: ubi_io_read: error -74 (ECC error) while reading 126976 bytes from PEB 4:4096, read 126976 bytes
UBI error: ubi_io_read: error -74 (ECC error) while reading 11 bytes from PEB 10:10240, read 11 bytes
UBIFS error (pid 1): ubifs_leb_read: reading 11 bytes from LEB 8:6144 failed, error -74
VFS: Cannot open root device "ubi0:rootfs" or unknown-block(0,0)
Please append a correct "root=" boot option; here are the available partitions:
1f00             128 mtdblock0  (driver?)
1f01            2304 mtdblock1  (driver?)
1f02             128 mtdblock2  (driver?)
1f03            4352 mtdblock3  (driver?)
1f04          221952 mtdblock4  (driver?)
1f05            4352 mtdblock5  (driver?)
1f06           20480 mtdblock6  (driver?)
1f07            8448 mtdblock7  (driver?)
b300         7879680 mmcblk0  driver: mmcblk
  b301         7875584 mmcblk0p1 00000000-0000-0000-0000-000000000000
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
Backtrace:
[<8004c86c>] (dump_backtrace+0x0/0x100) from [<803aeed8>] (dump_stack+0x18/0x1c)
 r6:8002e4e8 r5:af414000 r4:8053d018 r3:60000013
[<803aeec0>] (dump_stack+0x0/0x1c) from [<803af064>] (panic+0x60/0x178)
[<803af004>] (panic+0x0/0x178) from [<80009180>] (mount_block_root+0x1d8/0x218)
 r3:60000013 r2:00000000 r1:af43bf70 r0:8047168e
 r7:8002e4d4
[<80008fa8>] (mount_block_root+0x0/0x218) from [<80009418>] (prepare_namespace+0x94/0x1c4)
[<80009384>] (prepare_namespace+0x0/0x1c4) from [<80008d88>] (kernel_init+0x114/0x14c)
 r6:800731d0 r5:8002daac r4:8002daac
[<80008c74>] (kernel_init+0x0/0x14c) from [<800731d0>] (do_exit+0x0/0x5c0)
===========
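
A note on reading the offsets in these errors: the PEB offsets and the LEB
offset differ by the UBI data offset (4096 in the earlier boot log).  A minimal
sketch, where the function name leb_to_peb_offset is hypothetical:

```python
DATA_OFFSET = 4096  # "UBI: data offset" from the earlier boot log

def leb_to_peb_offset(leb_offset, data_offset=DATA_OFFSET):
    """Translate an offset inside a LEB to the offset inside its PEB."""
    return data_offset + leb_offset

# "reading 11 bytes from LEB 8:6144" pairs with "PEB 10:10240"
print(leb_to_peb_offset(6144))  # 10240

# The two 126976-byte reads at PEB 3:4096 and PEB 4:4096 start at LEB offset 0
print(leb_to_peb_offset(0))     # 4096
```

Which PEB backs which LEB (here LEB 8 on PEB 10) comes from UBI's mapping and
cannot be derived from the offsets alone.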

This is a permanent condition, occurring on every subsequent boot.  Recovery
requires re-flashing rootfs.

The sequence
        mount -o remount,ro /
        <cycle power>
does not cause the problem.

The sequence
        mount -o remount,rw /
        mount -o remount,ro /
        <cycle power>
does cause the problem.

I should note that there was a period of time prior to our first release where
we could successfully temporarily remount rootfs rw.  However there have been
many changes since we last tried doing so.  So what I'm hoping for is to gain
sufficient understanding of the effect on a UBIFS file system of making it rw
so that I can chase this further.

P.S. This is my first ever message on this list.  I believe I've followed the published
protocol; however, I'm eager to correct any failures to do so.

--
David H. Olson
+1 503 530 7642
www.welchallyn.com
9525 SW Gemini Dr, Beaverton, OR 97008


CONFIDENTIAL NOTICE: If you are not the intended recipient of this message, you are not authorized to intercept, read, print, retain, copy, forward, or disseminate this communication. This communication may contain information that is proprietary, attorney/client privileged, attorney work product, confidential or otherwise legally exempt from disclosure. If you have received this message in error, please notify the sender immediately either by phone or by return e-mail, and destroy all copies of this message, electronic, paper, or otherwise.


* Re: Temporarily remounting rootfs as rw leads to kernel panic on reboot
  2016-06-07  0:17 Temporarily remounting rootfs as rw leads to kernel panic on reboot David Olson
@ 2016-06-07  1:47 ` Mychaela Falconia
  2016-06-09 21:31   ` David Olson
  2016-06-14 16:46   ` David Olson
  0 siblings, 2 replies; 4+ messages in thread
From: Mychaela Falconia @ 2016-06-07  1:47 UTC
  To: David Olson; +Cc: linux-mtd

> UBI: smallest flash I/O unit:    2048
> UBI: sub-page size:              512
> UBI: VID header offset:          2048 (aligned 2048)
> UBI: data offset:                4096

First sign of inconsistency: your UBI reports here that the underlying
NAND has subpages, yet it doesn't use them to put the EC and VID
headers in the same page, i.e., your VID offset is equal to the full
page size and the data offset is 2x the full page size. Perhaps when
you created the UBI data structures in your flash "in vitro" with
ubiformat or ubinize you forgot to tell those tools your 512 byte
subpage size?
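
The arithmetic behind this observation can be sketched as follows.  This is a
simplified model, not the kernel code: it assumes the VID header sits at the
EC header size rounded up to the header I/O unit, and the data starts at the
next full-page boundary after the VID header.  The 64-byte header sizes and
the name ubi_layout are assumptions for illustration.

```python
UBI_EC_HDR_SIZE = 64   # assumed size of the erase-counter header, in bytes
UBI_VID_HDR_SIZE = 64  # assumed size of the volume-ID header, in bytes

def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

def ubi_layout(peb_size, min_io_size, hdrs_min_io_size):
    """Return (VID header offset, data offset, LEB size) in bytes."""
    vid_hdr_offset = align_up(UBI_EC_HDR_SIZE, hdrs_min_io_size)
    data_offset = align_up(vid_hdr_offset + UBI_VID_HDR_SIZE, min_io_size)
    return vid_hdr_offset, data_offset, peb_size - data_offset

# Headers aligned to the full 2048-byte page (matches the quoted log)
print(ubi_layout(131072, 2048, 2048))  # (2048, 4096, 126976)

# Headers aligned to the 512-byte subpage
print(ubi_layout(131072, 2048, 512))   # (512, 2048, 129024)
```

Under these assumptions, using the subpages would put both headers in the
first page and leave 2048 more bytes per eraseblock for data.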

> UBI error: ubi_io_read: error -74 (ECC error) while reading 126976 bytes
> from PEB 3:4096, read 126976 bytes
> UBI error: ubi_io_read: error -74 (ECC error) while reading 126976 bytes
> from PEB 4:4096, read 126976 bytes
> UBI error: ubi_io_read: error -74 (ECC error) while reading 11 bytes from
> PEB 10:10240, read 11 bytes
> UBIFS error (pid 1): ubifs_leb_read: reading 11 bytes from LEB 8:6144
> failed, error -74

Let me guess, the flash was written in production with a production
line programmer and not with ubiformat, was it? If you use a "dumb"
(non-UBI-aware) flasher to program your flash at the factory and then
try to mount something read-write, the following nasty problem occurs:
suppose some block has empty pages at the end, i.e., pages containing
all 0xFF bytes. UBIFS will assume that these NAND pages are truly
blank, i.e., never written to since they've been erased, and it will
write to them. But each NAND flash page must only be written once
(subpage writes aside, if and when they are allowed), and if you used
a non-UBI-aware dumb flasher, those pages may have already been
written to - even if they contain all 0xFF bytes. With some hardware
ECC schemes writing all 0xFF bytes to a page is NOT the same as
leaving it alone after the block erase, and when UBIFS later writes to
that same page again (on a read-write mount), the result is a
corrupted page that returns hard ECC errors when read. It seems to me
that you are hitting this very problem.

The solution is to write your NAND with a tool like ubiformat that
refrains from writing the trailing pages of each block whenever they
contain all 0xFF bytes. And while you are at it, you may want to fix
your image generation so that the subpage setting agrees with what the
kernel sees.
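
The trimming idea can be sketched in a few lines.  This is an illustration of
the technique only, not the code of ubiformat or of any flasher; the function
name trim_trailing_ff_pages is hypothetical, and the block is assumed to be a
whole number of pages.

```python
def trim_trailing_ff_pages(block: bytes, page_size: int) -> bytes:
    """Return the leading part of an eraseblock image that should be written.

    Trailing pages consisting only of 0xFF bytes are dropped, so those NAND
    pages stay truly erased.  All-0xFF pages that are followed by data are
    kept, since only the tail of the block is trimmed.
    """
    if page_size <= 0 or len(block) % page_size != 0:
        raise ValueError("block must be a whole number of pages")
    erased_page = b"\xff" * page_size
    end = len(block)
    while end > 0 and block[end - page_size:end] == erased_page:
        end -= page_size
    return block[:end]

# One data page followed by three blank pages: only the first page is written
image = b"\x00" * 2048 + b"\xff" * (3 * 2048)
print(len(trim_trailing_ff_pages(image, 2048)))  # 2048
```

A flasher following this approach would program only the returned bytes into
each eraseblock and skip the rest.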

HTH,
M~


* RE: Temporarily remounting rootfs as rw leads to kernel panic on reboot
  2016-06-07  1:47 ` Mychaela Falconia
@ 2016-06-09 21:31   ` David Olson
  2016-06-14 16:46   ` David Olson
  1 sibling, 0 replies; 4+ messages in thread
From: David Olson @ 2016-06-09 21:31 UTC
  To: Mychaela Falconia; +Cc: linux-mtd

(On our last project, we used ext3 and eMMC and didn't have to spend
much time learning about filesystems since it just worked.  This is our first
project with the MTD/UBI/UBIFS stack and it has proven to be far more
troublesome, requiring us to delve deeply into a considerable body of
complex code, including the build system.  Consequently answering
questions and trying experiments is taking quite some time ;-)

> > UBI: smallest flash I/O unit:    2048
> > UBI: sub-page size:              512
> > UBI: VID header offset:          2048 (aligned 2048)
> > UBI: data offset:                4096
>
> First sign of inconsistency: your UBI reports here that the underlying NAND
> has subpages, yet it doesn't use them to put the EC and VID headers in the
> same page, i.e., your VID offset is equal to the full page size and the data
> offset is 2x the full page size.

We started with a base provided by TI.  They were the ones who provided the
U-Boot & kernel forks as well as the build system.  The mechanism used to set
the header offset is this portion of the TI provided kernel command line:
"ubi.mtd=4,2048".  That is one of the techniques described in the UBI FAQ topic
"How do I force UBI to ignore sub-pages?".   Further, TI did not include the
'-s' parameter in the ubinize invocation, so may have done this deliberately.
I shall continue researching this.

> > UBI error: ubi_io_read: error -74 (ECC error) while reading 126976 bytes
> > from PEB 3:4096, read 126976 bytes
> > UBI error: ubi_io_read: error -74 (ECC error) while reading 126976 bytes
> > from PEB 4:4096, read 126976 bytes
> > UBI error: ubi_io_read: error -74 (ECC error) while reading 11 bytes from
> > PEB 10:10240, read 11 bytes
> > UBIFS error (pid 1): ubifs_leb_read: reading 11 bytes from LEB 8:6144
> > failed, error -74
>
> Let me guess, the flash was written in production with a production line
> programmer and not with ubiformat, was it?

It was written via the U-Boot "nand write" command.  We are constrained to
continue using U-Boot this way.  As noted, our U-Boot is a fork provided by TI
which is based on a U-Boot version from sometime early in 2010.  A later
U-Boot update introduced the "nand write.trimffs" variant which seems
attractive, however there was considerably more than the usual schedule
pressure for our first release so we instead used the "-F" option of the
mkfs.ubifs command.

That worked and continued to work when we changed to mounting rootfs ro.
There was even a time after we did that where it was possible on the
command line to remount rootfs rw.  However time passed, other changes
were made, and now things go very badly when we do that.

The immediate issue that has me confused is that merely remounting rootfs
rw and then immediately cycling power is sufficient to destroy rootfs.  Since
remounting ro does NOT destroy rootfs, it's not inherent in the mount
operation.  It must be the 'rw' that's triggering this destruction.  Any thoughts
on what might be happening?



* RE: Temporarily remounting rootfs as rw leads to kernel panic on reboot
  2016-06-07  1:47 ` Mychaela Falconia
  2016-06-09 21:31   ` David Olson
@ 2016-06-14 16:46   ` David Olson
  1 sibling, 0 replies; 4+ messages in thread
From: David Olson @ 2016-06-14 16:46 UTC
  To: linux-mtd

> > > UBI error: ubi_io_read: error -74 (ECC error) while reading 126976 bytes
> > > from PEB 3:4096, read 126976 bytes
> > > UBI error: ubi_io_read: error -74 (ECC error) while reading 126976 bytes
> > > from PEB 4:4096, read 126976 bytes
> > > UBI error: ubi_io_read: error -74 (ECC error) while reading 11 bytes from
> > > PEB 10:10240, read 11 bytes
> > > UBIFS error (pid 1): ubifs_leb_read: reading 11 bytes from LEB 8:6144
> > > failed, error -74

The root cause has been determined and was the result of two changes, one
deliberate and one accidental.

Since we write the UBIFS rootfs to NAND via commands from a version of
U-Boot that doesn't support "nand write.trimffs", we use the "-F" flag in the
invocation of mkfs.ubifs.  It turns out that the "-F" flag (at least in the version
of UBIFS we are using) does NOT cause the requested "free space fixup"
operation on the first mount if the mount is not r/w.  For our system,
rootfs is mounted twice, once using parameters passed in from U-Boot on the
kernel command line and later when the kernel processes /etc/fstab.  While
the latter requested a ro mount, the former, until late in the project, requested
a rw mount.  Changing the kernel command line to request a ro mount was a
deliberate change late in the project.  The other change was the inadvertent
loss of the "-F" flag when we ported from the TI-provided "makefile based" build
to a Yocto "bitbake based" build.  Since the "-F" flag is ineffective when the first
mount is ro, we didn't notice the change.  Undoing both those changes has
resolved our issue.

While I now understand why a write to our rootfs in the absence of the "free
space fixup" operation would corrupt it, I still don't understand why just
changing the mount parameters would cause a write.  But I guess it no longer
matters.

