On Fri, 2012-06-29 at 16:05 +1000, Iwo Mergler wrote:
> > > It is possible to avoid the failure by performing a large number of
> > > filesystem operations (i.e. file system benchmark) during the first
> > > session.
> >
> > Hmm, sounds strange.
>
> While trying to reproduce the problem, I have come across another
> way to avoid it. If the boot scripts in the rootfs perform an
> ubiformat, attach, mkvol & mount on an unrelated empty mtd
> partition, the problem goes away.
>
> Is there any global state shared between separate UBI/UBIFS
> partitions?

No. Do your MTD partitions overlap? What is in /proc/mtd?

> > This means the driver is buggy: it does not support sub-pages but
> > still reports that it does. Just fix it instead.
>
> I was under the impression that the subpage capability is extracted
> from the ONFI information. So I take it there is a flag for the
> driver to override that?

I do not know your system, but if your flash chip supports sub-pages
and the ECC you use does not allow them, the driver should report that
sub-pages are not supported.

> > Did you try to mount an empty volume and let UBIFS auto-format it, and
> > then reproduce the issue?
>
> No, UBIFS created from an empty partition works OK. In fact, doing that
> also stops the rootfs mount failure on the second boot.

Sounds like this is not a UBIFS fault but rather a side-effect of
something strange happening elsewhere. Probably it is related to how
you flash the image. We had the following issue in the past.

1. You have some UBI on your flash. Then you want to flash a new image.
2. The flasher, for some reason, did not erase some PEBs of the
   partition, probably because the Linux view of the partition and the
   flasher's view did not match 100%. In any case, one or a few PEBs at
   the end of the partition were not erased. Let's call them "ghost PEBs".
3. We flashed the new image.
4. UBI attached the partition. The ghost PEBs were scanned and treated
   as valid PEBs, and their data appeared in one of the volumes, because
   their generation numbers were higher than those of the PEBs from the
   new image (the generation number is in the UBI headers). UBIFS read
   the ghost data instead of the valid data, and we had strange
   corruptions.

We introduced the so-called "image sequence number" to catch such
issues. It is stored in the EC header, and all EC headers on the MTD
device must carry the same value. Every time we generate an image, we
pick a random one. So if there are ghost PEBs, we notice this because
they have a different image sequence number. See 'image_seq' in
drivers/mtd/ubi/ubi-media.h.

Can this problem affect you as well? If you use 'ubiformat' for flashing
your images, it will generate a random image sequence number every time
it flashes, so it won't use the one in the image. Do you use ubiformat
for flashing? If not, try to re-generate your image (ubinize will put a
different number there), flash it, and see what happens. You'd get an
error like this:

UBI error: process_eb: bad image sequence number 3726164569 in PEB 47, expected 642536469

Additional thoughts... I think it would be more interesting if you could
enable debugging for real. The docs on the web site are out of date: we
switched to dynamic debugging, so you need to enable the debugging
messages differently. I need to write a howto. I do not know yet how to
do this via the kernel command line and need to find out; I know how to
do it via debugfs. But check Documentation/dynamic-debug-howto.txt.

The image is not very helpful. UBI or UBIFS messages would probably
allow us to track what UBI/UBIFS is doing to the "faulty" LEB and the
corresponding PEB, and to verify that it is OK. But I really have a
strong feeling it is not a UBI/UBIFS fault, so maybe we'd spend the
time just to prove this.

--
Best Regards,
Artem Bityutskiy