* Corrupted UBIFS, bad CRC
@ 2012-01-12 13:47 Karsten Jeppesen
  2012-01-15 12:24 ` Artem Bityutskiy
  0 siblings, 1 reply; 12+ messages in thread
From: Karsten Jeppesen @ 2012-01-12 13:47 UTC (permalink / raw)
  To: ubifs

Hi Guys,

Artem was the last one to respond back in November and I have been working hard on this ever since, but porting kernels takes a bit.
I am sorry not to have included the content of the earlier emails but I am attempting to answer all the outstanding questions.

Yes, Artem, I downloaded and adapted the backported tree (kernel 2.6.32 which was the closest to our 2.6.32.8) and it still showed the error.

I am painfully aware that you like to look at problems close to the current state and I am *really* trying to accommodate that.
I have ported kernel 3.2.0 (rudimentary though) to test for this problem, and it still exists.
I am provoking the error by having 16 machines powercycle at 20 secs power-on, 3 secs power-off and in 24hrs 2 machines will fail.
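
To put the failure rate in perspective, here is a rough back-of-the-envelope sketch using only the figures quoted above. It assumes every machine keeps cycling for the full 24 hours, which slightly overstates the cycle count once a unit has failed.

```python
# Rough per-cycle failure estimate from the figures quoted in the report.
MACHINES = 16
ON_SECONDS = 20
OFF_SECONDS = 3
FAILURES_PER_DAY = 2              # "in 24hrs 2 machines will fail"
SECONDS_PER_DAY = 24 * 60 * 60

cycle_seconds = ON_SECONDS + OFF_SECONDS
cycles_per_machine = SECONDS_PER_DAY // cycle_seconds   # full cycles per day
total_cycles = MACHINES * cycles_per_machine            # power cuts per day, all units
failure_rate = FAILURES_PER_DAY / total_cycles

print(cycles_per_machine, total_cycles, f"{failure_rate:.1e}")
```

Under that assumption the corruption shows up on the order of once per 30,000 power cuts, which is why it takes a rack of machines and a full day to provoke it.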

I have run the speed-test (see below if interested) and I will be running the stress test later today or over the weekend.
As I stated: this test was done on a stock kernel 3.2.0 patched for our ARM9263.

You stated last time that you were able to reclaim the blocks using a PC. Could this be an architectural problem (PC vs. ARM)?


(Structure needs cleaning - is there an fsck for that?)

Sincerely,
Dr. Karsten Jeppesen


Last time I submitted way too much debug output. This time I hope it is correct:
--- MOUNTING DEBUG OUTPUT (mount -t ubifs ubi0:rootfs /skov/mnt/rootfs)

# mount -t ubifs ubi0:rootfs /skov/mnt/rootfs
UBIFS: recovery needed
UBIFS error (pid 18479): ubifs_recover_leb: corrupt empty space LEB 4:0, corruption starts at 144
UBIFS error (pid 18479): ubifs_scanned_corruption: corruption at LEB 4:144
UBIFS error (pid 18479): ubifs_scanned_corruption: first 8192 bytes from LEB 4:144
00000000: 00000000 00000000 00000000 00000000 ffffffff ffffffff ffffffff ffffffff  ................................
00000020: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff  ................................
00000040: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff  ................................
... more lines with just fffffff
00001fc0: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff  ................................
00001fe0: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff  ................................
UBIFS error (pid 18479): ubifs_recover_leb: LEB 4 scanning failed
mount: mounting ubi0:rootfs on /skov/mnt/rootfs failed: Structure needs cleaning
#
---
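
For readers who want to locate the anomaly mechanically, the following is a minimal sketch that parses the first two dump lines pasted above and reports which byte ranges are not 0xFF. The function name is made up for illustration, and the check works at 32-bit word granularity because that is how the kernel prints the dump.

```python
import re

# First two lines of the dump above; offsets are relative to LEB offset 144.
DUMP = """\
00000000: 00000000 00000000 00000000 00000000 ffffffff ffffffff ffffffff ffffffff  ................................
00000020: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff  ................................
"""

LINE_RE = re.compile(r"^([0-9a-f]{8}):((?: [0-9a-f]{8}){8})")

def non_ff_ranges(dump, base):
    """Return (start, end) byte ranges, relative to the LEB, whose words are not all 0xFF."""
    ranges = []
    for line in dump.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue                      # skip "..." and any non-dump lines
        line_off = int(m.group(1), 16)
        for i, word in enumerate(m.group(2).split()):
            if word == "ffffffff":
                continue
            start = base + line_off + 4 * i
            if ranges and ranges[-1][1] == start:
                ranges[-1] = (ranges[-1][0], start + 4)   # extend the previous run
            else:
                ranges.append((start, start + 4))
    return ranges

print(non_ff_ranges(DUMP, base=144))
```

The `base=144` argument comes from the "first 8192 bytes from LEB 4:144" message; everything else in the sketch is illustrative.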

--- SPEEDTEST OUTPUT

# modprobe mtd_speedtest dev=4
 
=================================================
mtd_speedtest: MTD device: 4
mtd_speedtest: not NAND flash, assume page size is 512 bytes.
mtd_speedtest: MTD device size 63700992, eraseblock size 131072, page size 512, count of eraseblocks 486, pages per eraseblock 256, OOB size 0
mtd_speedtest: testing eraseblock write speed
mtd_speedtest: eraseblock write speed is 148 KiB/s
mtd_speedtest: testing eraseblock read speed
mtd_speedtest: eraseblock read speed is 1531 KiB/s
mtd_speedtest: testing page write speed
mtd_speedtest: page write speed is 149 KiB/s
mtd_speedtest: testing page read speed
mtd_speedtest: page read speed is 1475 KiB/s
mtd_speedtest: testing 2 page write speed
mtd_speedtest: 2 page write speed is 147 KiB/s
mtd_speedtest: testing 2 page read speed
mtd_speedtest: 2 page read speed is 1505 KiB/s
mtd_speedtest: Testing erase speed
mtd_speedtest: erase speed is 334 KiB/s
mtd_speedtest: Testing 2x multi-block erase speed
mtd_speedtest: 2x multi-block erase speed is 299 KiB/s
mtd_speedtest: Testing 4x multi-block erase speed
mtd_speedtest: 4x multi-block erase speed is 295 KiB/s
mtd_speedtest: Testing 8x multi-block erase speed
mtd_speedtest: 8x multi-block erase speed is 293 KiB/s
mtd_speedtest: Testing 16x multi-block erase speed
mtd_speedtest: 16x multi-block erase speed is 291 KiB/s
mtd_speedtest: Testing 32x multi-block erase speed
mtd_speedtest: 32x multi-block erase speed is 289 KiB/s
mtd_speedtest: Testing 64x multi-block erase speed
mtd_speedtest: 64x multi-block erase speed is 286 KiB/s
mtd_speedtest: finished
=================================================
#
---



* Re: Corrupted UBIFS, bad CRC
  2012-01-12 13:47 Corrupted UBIFS, bad CRC Karsten Jeppesen
@ 2012-01-15 12:24 ` Artem Bityutskiy
  2012-01-16  8:18   ` Karsten Jeppesen
  0 siblings, 1 reply; 12+ messages in thread
From: Artem Bityutskiy @ 2012-01-15 12:24 UTC (permalink / raw)
  To: Karsten Jeppesen; +Cc: ubifs

On Thu, 2012-01-12 at 05:47 -0800, Karsten Jeppesen wrote:
> Hi Guys,

Can you make a dump of your UBI volume and share it with me?

dd if=/dev/ubi0_0 of=ubi.img

and share ubi.img with me?


* Re: Corrupted UBIFS, bad CRC
  2012-01-15 12:24 ` Artem Bityutskiy
@ 2012-01-16  8:18   ` Karsten Jeppesen
  2012-01-16 10:24     ` Artem Bityutskiy
  0 siblings, 1 reply; 12+ messages in thread
From: Karsten Jeppesen @ 2012-01-16  8:18 UTC (permalink / raw)
  To: dedekind1; +Cc: ubifs

Hi Artem,

Of course I can. Anything I can do to help.

I put it on one of my outside servers, both the /dev/mtd4 dump and the ubi0_0 dump you asked for:

<http://download.gnist.skov.com/corrupt_mtd4_20120116.img>
and
<http://download.gnist.skov.com/ubi0_0_20120116.img>

Have a nice one,

Karsten




* Re: Corrupted UBIFS, bad CRC
  2012-01-16  8:18   ` Karsten Jeppesen
@ 2012-01-16 10:24     ` Artem Bityutskiy
  2012-01-16 12:40       ` Karsten Jeppesen
  0 siblings, 1 reply; 12+ messages in thread
From: Artem Bityutskiy @ 2012-01-16 10:24 UTC (permalink / raw)
  To: Karsten Jeppesen; +Cc: ubifs


On Mon, 2012-01-16 at 00:18 -0800, Karsten Jeppesen wrote:
> Hi Artem,
> 
> Of course I can. Anything I can do to help.
> 
> I put it on one of my outside servers and I put as well the /dev/mtd4 as the ubi0_0 you asked for:
> 
> <http://download.gnist.skov.com/corrupt_mtd4_20120116.img>
> and
> <http://download.gnist.skov.com/ubi0_0_20120116.img>

Thanks.

I've taken a look by using mtdram. You have a strange corruption: 144
bytes of 0xFFs, then 32 bytes of zeroes, and then all 0xFFs. This looks
like some oddity of your NOR flash.

My theory is that your flash has a write buffer and its size is 256 or
larger.

Anyway, first of all - start with pulling the latest ubifs-v2.6.32 tree
- I've added a few changes there very recently which fix UBI/UBIFS
debugging messages.

Also, please enable the UBIFS debugging compilation option.

From now on I assume you have done this. Also I assume that you are
aware that you need to look at dmesg to see all the UBIFS messages.
There is some text on the MTD web site which explains this.


Next: if I hack UBI like this:

diff --git a/drivers/mtd/ubi/build.c b/drivers/mtd/ubi/build.c
index 6c3fb5a..58a49e7 100644
--- a/drivers/mtd/ubi/build.c
+++ b/drivers/mtd/ubi/build.c
@@ -691,6 +691,8 @@ static int io_init(struct ubi_device *ubi)
        ubi_assert(ubi->min_io_size % ubi->hdrs_min_io_size == 0);
 
        ubi->max_write_size = ubi->mtd->writebufsize;
+       ubi->max_write_size = 256;
+
        /*
         * Maximum write size has to be greater or equivalent to min. I/O
         * size, and be multiple of min. I/O size.

Then I can mount your image successfully.
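
As a small illustration of the constraint stated in the comment of the patch context (max write size must be at least the min. I/O size and a multiple of it), here is a sketch in Python. The function name is hypothetical and it models only what that comment says, not the full kernel check.

```python
def check_max_write_size(min_io_size, max_write_size):
    """Model the rule from the comment: max write size must be greater than
    or equal to the min. I/O size and be a multiple of it."""
    if max_write_size < min_io_size:
        return False
    return max_write_size % min_io_size == 0

# On NOR the min. I/O size is 1 byte, so both the reported value (64)
# and the forced value (256) satisfy the rule.
for candidate in (64, 256):
    print(candidate, check_max_write_size(1, candidate))
```

In other words, forcing 256 does not violate this particular rule on a device with a 1-byte min. I/O size; whether 256 matches the hardware is a separate question.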

What is 'mtd->writebufsize' in your setup? You need to find out the
right size and teach your driver to report it correctly.

UBI reports max_write_size when you attach the MTD device. E.g., with
mtdram I have the following:

[493058.328443] UBI DBG (pid 18798): io_init: min_io_size      1
[493058.328444] UBI DBG (pid 18798): io_init: max_write_size   64

With my hack it is 256, of course. The mtdram module which I use
hard-codes it to 64.

-- 
Best Regards,
Artem Bityutskiy


* Re: Corrupted UBIFS, bad CRC
  2012-01-16 10:24     ` Artem Bityutskiy
@ 2012-01-16 12:40       ` Karsten Jeppesen
  2012-01-16 12:46         ` Artem Bityutskiy
  2012-01-16 12:50         ` Artem Bityutskiy
  0 siblings, 2 replies; 12+ messages in thread
From: Karsten Jeppesen @ 2012-01-16 12:40 UTC (permalink / raw)
  To: dedekind1; +Cc: ubifs

Hi Artem,

OO something bad happened here. We must have lost track of each other.

What I sent you was ***NOT*** produced by our standard 2.6.32.8 kernel.
I spent quite some effort adapting the new 3.2.0 kernel.
I wrote that in my original email, but you may have drifted off to the emails from November.
To make life easier for you I assumed you would rather look at data including the newest stuff rather than backported stuff.

Debug was enabled and included as was any other information you asked for in the submission template.
And again: This is kernel 3.2.0 and all nondestructive tests were run with no errors.


The originating emails for this issue were both sent on Thursday, January 12th 2012.

Other notes:
We use 2 variants of FLASH - both NOR.  Spansion 29GL256P and 29GL512P
I found the datasheets: the 29GL256P specifies a 32-word/64-byte write buffer and the 29GL512P says the same.
When I attach the FLASH all ubiattach reports is:
UBI: attaching mtd4 to ubi0
UBI: physical eraseblock size:   131072 bytes (128 KiB)
UBI: logical eraseblock size:    130944 bytes
UBI: smallest flash I/O unit:    1
UBI: VID header offset:          64 (aligned 64)
UBI: data offset:                128
UBI: max. sequence number:       17653
UBI: attached mtd4 to ubi0
UBI: MTD device name:            "User"
UBI: MTD device size:            28 MiB
UBI: number of good PEBs:        230
UBI: number of bad PEBs:         0
UBI: number of corrupted PEBs:   0
UBI: max. allowed volumes:       128
UBI: wear-leveling threshold:    4096
UBI: number of internal volumes: 1
UBI: number of user volumes:     1
UBI: available PEBs:             0
UBI: total number of reserved PEBs: 230
UBI: number of PEBs reserved for bad PEB handling: 0
UBI: max/mean erase counter: 31/5
UBI: image sequence number:  1704669600
UBI: background thread "ubi_bgt0d" started, PID 18731
But no  max_write_size was reported. I take it that UBI debug has to be enabled as well for the system to report that - right?

Sorry for the confusion about the 3.2.0 kernel as opposed to the earlier 2.6.32.8, but as I wrote, I was trying to take the backporting out of the equation.


Karsten









* Re: Corrupted UBIFS, bad CRC
  2012-01-16 12:40       ` Karsten Jeppesen
@ 2012-01-16 12:46         ` Artem Bityutskiy
  2012-01-16 12:50         ` Artem Bityutskiy
  1 sibling, 0 replies; 12+ messages in thread
From: Artem Bityutskiy @ 2012-01-16 12:46 UTC (permalink / raw)
  To: Karsten Jeppesen; +Cc: ubifs


On Mon, 2012-01-16 at 04:40 -0800, Karsten Jeppesen wrote:
> Hi Artem,
> 
> OO something bad happened here. We must have lost track of each other.
> 
> What I sent you was ***NOT*** produced by our standard 2.6.32.8 kernel.
> I spent quite some efforts in adapting the new 3.2.0  kernel.

Well, ok, does not matter anyway.

> I wrote that in my original email, but you may have drifted off to the emails from November.
> To make life easier for you I assumed you would rather look at data including the newest stuff rather than backported stuff.

There are a few fixes for 3.2 in the ubifs-v3.2 tree. Namely, they make
UBI/UBIFS print some debugging messages which are not printed otherwise,
due to a bug I introduced around 3.0. Please pick them.

Please, just substitute s/2.6.32/3.2/ in my reply.

-- 
Best Regards,
Artem Bityutskiy


* Re: Corrupted UBIFS, bad CRC
  2012-01-16 12:40       ` Karsten Jeppesen
  2012-01-16 12:46         ` Artem Bityutskiy
@ 2012-01-16 12:50         ` Artem Bityutskiy
  2012-01-17 12:23           ` Karsten Jeppesen
  1 sibling, 1 reply; 12+ messages in thread
From: Artem Bityutskiy @ 2012-01-16 12:50 UTC (permalink / raw)
  To: Karsten Jeppesen; +Cc: ubifs


To recap the most important things - but refer to my reply from earlier
today anyway.

1. Pick printing fixes from ubifs-v3.2
2. Find out what UBIFS tells about your max_write_size
3. Check your _real_ buffer size from your flash docs and the driver and
verify that it is correct.
4. My theory is that your NOR flash write-buffer size is 256 or
larger.
5. With the hack I sent I can mount successfully. To verify my theory
you can test with this hack and see if you can reproduce the problem. If
you can, then either write-buffer is larger than 256 or my theory is
incorrect.

BTW, here is what I used to test your image in mtdram, just for
reference:

$ sudo dmesg -c > /dev/null && sudo ./unload_all.sh && sudo modprobe mtdram total_size=29440 && sudo dd if=~/tmp/Karsten/corrupt_mtd4_20120116.img of=/dev/mtd0 && sudo modprobe ubi mtd=0 && sudo modprobe ubifs && sudo sh -c 'echo "format \"UBIFS DBG\" +p" > /sys/kernel/debug/dynamic_debug/control' && sudo mount -t ubifs /dev/ubi0_0 /mnt/ubifs

Yeah, I know it is long and ugly, I wrote it on-the-spot to quickly
debug your issue. If you need, you can decompose it.
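
One possible decomposition of the one-liner above, sketched in Python so the steps can be printed and reviewed before being run by hand. Nothing here executes a command. `build_steps` and the example image path are hypothetical, and `unload_all.sh` is a local helper script from the one-liner whose contents are not shown in the thread.

```python
import shlex

def build_steps(image, mount_point, total_size_kib=29440):
    """Return the one-liner above as a list of separate shell commands."""
    # Enables dynamic-debug printing for messages matching "UBIFS DBG".
    debug_cmd = 'echo "format \\"UBIFS DBG\\" +p" > /sys/kernel/debug/dynamic_debug/control'
    return [
        "sudo dmesg -c > /dev/null",                      # clear the kernel log
        "sudo ./unload_all.sh",                           # local helper, not shown in the thread
        f"sudo modprobe mtdram total_size={total_size_kib}",
        f"sudo dd if={shlex.quote(image)} of=/dev/mtd0",  # copy the image into the RAM MTD
        "sudo modprobe ubi mtd=0",
        "sudo modprobe ubifs",
        f"sudo sh -c {shlex.quote(debug_cmd)}",
        f"sudo mount -t ubifs /dev/ubi0_0 {shlex.quote(mount_point)}",
    ]

for step in build_steps("/tmp/corrupt_mtd4_20120116.img", "/mnt/ubifs"):
    print(step)
```

Note that `shlex.quote` would single-quote a path containing `~`, which stops the shell from expanding it, so the sketch uses an absolute path.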

-- 
Best Regards,
Artem Bityutskiy


* Re: Corrupted UBIFS, bad CRC
  2012-01-16 12:50         ` Artem Bityutskiy
@ 2012-01-17 12:23           ` Karsten Jeppesen
  2012-01-18 14:43             ` Artem Bityutskiy
  0 siblings, 1 reply; 12+ messages in thread
From: Karsten Jeppesen @ 2012-01-17 12:23 UTC (permalink / raw)
  To: dedekind1; +Cc: ubifs

Hi Artem,

Sorry for the delay, but I'm down with the flu and thus I have snot for brains.
It took *a lot* of kernel compiles to gather this data.
Err - that doesn't mean I don't appreciate your efforts - I really do.


Artem: Would it help you in any way if I had some of these units sent to you? They are single boards, about 15 x 15 cm. I would send them with everything needed (battery included :-) ) and never to be returned (you can keep everything).
I am asking because it seems to me that you have only limited hardware to test on and maybe that's the problem? The 64/256 issue seems to be real and a problem to be taken seriously.



1. Where are the patches for 3.2? git://git.infradead.org/~dedekind/ubifs-v3.2.git ?? To get the max_write output I changed the dbg_msg to ubi_msg.

2. UBI: max_write_size   64
3. Confirmed 64 from data sheets
4. Theory unfortunately bust.
5. See below

Now for the weird part: setting the write buffer INCORRECTLY to 256 does mount the system, but is that healthy? And what are the implications of setting it to a value that is 4 times too large?

UBI: attaching mtd4 to ubi0
UBI: max_write_size   256   <<<<< forced via kernel patch
UBI: physical eraseblock size:   131072 bytes (128 KiB)
UBI: logical eraseblock size:    130944 bytes
UBI: smallest flash I/O unit:    1
UBI: VID header offset:          64 (aligned 64)
UBI: data offset:                128
UBI: max. sequence number:       17653
UBI: attached mtd4 to ubi0
UBI: MTD device name:            "User"
UBI: MTD device size:            28 MiB
UBI: number of good PEBs:        230
UBI: number of bad PEBs:         0
UBI: number of corrupted PEBs:   0
UBI: max. allowed volumes:       128
UBI: wear-leveling threshold:    4096
UBI: number of internal volumes: 1
UBI: number of user volumes:     1
UBI: available PEBs:             0
UBI: total number of reserved PEBs: 230
UBI: number of PEBs reserved for bad PEB handling: 0
UBI: max/mean erase counter: 31/5
UBI: image sequence number:  1704669600
UBI: background thread "ubi_bgt0d" started, PID 1734
UBI device number 0, total 230 LEBs (30117120 bytes, 28.7 MiB), available 0 LEBs (0 bytes), LEB size 130944 bytes (127.9 KiB)
# mkdir -p /skov/mnt/rootfs
# Starting dropbear
mount -t ubifs ubi0:rootfs /skov/mnt/rootfs
UBIFS: recovery needed
UBIFS: recovery completed
UBIFS: mounted UBI device 0, volume 0, name "rootfs"
UBIFS: file system size:   28414848 bytes (27748 KiB, 27 MiB, 217 LEBs)
UBIFS: journal size:       1440384 bytes (1406 KiB, 1 MiB, 11 LEBs)
UBIFS: media format:       w4/r0 (latest is w4/r0)
UBIFS: default compressor: lzo
UBIFS: reserved for root:  1342103 bytes (1310 KiB)

dmesg gives no additional info.
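
As a sanity check on the numbers in the attach and mount logs above, the sizes can be re-derived from the eraseblock geometry. This sketch uses only values printed in the logs; the variable names are illustrative.

```python
PEB_SIZE = 131072        # "physical eraseblock size"
DATA_OFFSET = 128        # "data offset"
TOTAL_PEBS = 230         # "number of good PEBs"
FS_LEBS = 217            # LEB count in the "file system size" line
JOURNAL_LEBS = 11        # LEB count in the "journal size" line

leb_size = PEB_SIZE - DATA_OFFSET     # space left in a PEB after the UBI headers
print(leb_size)                       # logical eraseblock size
print(TOTAL_PEBS * leb_size)          # UBI device size in bytes
print(FS_LEBS * leb_size)             # UBIFS file system size in bytes
print(JOURNAL_LEBS * leb_size)        # UBIFS journal size in bytes
```

All four results agree with the logged values (130944, 30117120, 28414848 and 1440384 bytes), so the geometry UBI reports is at least internally consistent.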




* Re: Corrupted UBIFS, bad CRC
  2012-01-17 12:23           ` Karsten Jeppesen
@ 2012-01-18 14:43             ` Artem Bityutskiy
  0 siblings, 0 replies; 12+ messages in thread
From: Artem Bityutskiy @ 2012-01-18 14:43 UTC (permalink / raw)
  To: Karsten Jeppesen; +Cc: ubifs


On Tue, 2012-01-17 at 04:23 -0800, Karsten Jeppesen wrote:
> Artem: Would it help you in any 
> way if I get a some of these units sent to you? They are like 15 x 15 cm
>  Single board. I would send it with all needed (battery included :-) ) 
> and never to be returned  (You can keep everything).

I would be happy to help, but I really have no time to do more than
make suggestions and give some advice - my employer and the baby take all
my time, sorry.

Besides, I am not a flash HW expert, and the issue you observe looks like
it is closely related to your HW and how it behaves when it loses power
while a write operation is ongoing. Or maybe an erase operation, but it
looks like that was a write operation. It does not look at all like a
UBI/UBIFS issue.

> 1. Where are the patches for 3.2? git://git.infradead.org/~dedekind/ubifs-v3.2.git

Yes.

>  ?? To get the max_write output I changed the dbg_msg to ubi_msg.
> 
> 2. UBI: max_write_size   64
> 3. Confirmed 64 from data sheets

OK.

> 4 Theory unfortunately bust.

Not necessarily. You need to dig deeper - what if your driver or the
controller is doing something you are not aware of? Better to ask the
vendor how the flash behaves on a power cut while writing.

> 5. See below
> 
> Now for the weird part: Setting the write buffer INCORRECTLY to 256
> does mount the system - but is that healthy???: And what are the
> implications of setting it to a 4 times wrong value?

You need to really dig deeper into this. Let me elaborate on the concept of
'max_write_size'; you can also find it in the git log and by googling
- we discussed this on the mailing list.

So, UBI has a notion of min_io_size - this is the minimum number of bytes
you can write. For NAND this is often 2048. For NOR it is 1 byte.

NORs have an optimization called "write-buffer", which means that NOR can
write many bytes at a time.

This "write buffer" size is called 'max_write_size' in UBIFS, to be
consistent with 'min_io_size', and also because UBIFS has its own
write-buffers, so this term has already been occupied when we added
'max_write_size'.

Note, we added 'max_write_size' to fix NOR issues after power cuts, I
think last year.

So what happens when you write data and have a power cut? On the driver
level, you write write-buffer after write-buffer - 'max_write_size'
bytes at a time.  The experiments with NOR showed that after the
power cut the 'max_write_size' area which you were writing to
during the power cut will contain garbage, or unstable bits, or zeroes,
or a few zeroes, or any other anomaly.

When UBIFS recovers after a power cut, it has to scan the journal and
find the last node. The last node is the one which is followed by all 0xFF
bytes.

In your case you have one good node from offset 0 to offset 112 (AFAIR),
then 32 bytes of 0xFFs, and then 32 bytes of zeroes, and then the rest
is all 0xFFs.

So my theory is that your write-buffer size is 256, or you have some
kind of striping, or something, so that when UBIFS submits a 112 bytes
write request, on the driver level a 256-byte write buffer is used, and
actually the area (0, 256) is being programmed, but area 113-226 is
programmed with all 0xFFs.

Or something like that, I do not know NOR well enough.

Anyway, what happens is that due to power cut you end up with random
corruption within that 256 bytes area, so you end up with those zeroes.

UBIFS is aware of this effect and it knows that it should only check for
all 0xFFs starting from the next 'max_write_size'-aligned offset after the
last node. But because in your case 'max_write_size' is 64, it hits
those zeroes and refuses mounting, because it is unexpected.
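
The effect described here can be modelled with a toy example. This is a deliberately simplified sketch, not the UBIFS recovery code: the function names are made up, and the LEB contents are built from the description in this message (a node ending at offset 112, zeroes starting at 144), not from the real image.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return -(-offset // alignment) * alignment

def first_unexpected_byte(leb, last_node_end, max_write_size):
    """Return the offset of the first non-0xFF byte at or after the
    max_write_size-aligned offset following the last node, or None."""
    start = align_up(last_node_end, max_write_size)
    for offset in range(start, len(leb)):
        if leb[offset] != 0xFF:
            return offset
    return None

# Toy LEB: a node in bytes 0..111, 0xFF up to 144, 32 zero bytes, then 0xFF.
LEB_SIZE = 130944
leb = bytearray(b"\xff" * LEB_SIZE)
leb[0:112] = b"\x5a" * 112            # placeholder node contents
leb[144:176] = b"\x00" * 32           # the anomaly left by the power cut

for mws in (64, 256):
    print(mws, first_unexpected_byte(leb, last_node_end=112, max_write_size=mws))
```

With a max_write_size of 64 the empty-space check starts at offset 128 and trips over the zeroes at 144, which matches the "corruption starts at 144" message. With 256 the check starts at offset 256, beyond the damaged area, so the toy LEB is accepted.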

I do not think there is a big downside to setting 'max_write_size' to
256, from the performance POV at least. The downside is that in case of a
power cut you may lose a bit more data, because UBIFS has its own
write-buffers, but this is really minor.

You can also experiment by forcing your flash to not use write-buffer at
all to verify if the corruptions are related.

-- 
Best Regards,
Artem Bityutskiy


* Corrupted UBIFS, bad CRC
@ 2012-01-12 14:31 Karsten Jeppesen
  0 siblings, 0 replies; 12+ messages in thread
From: Karsten Jeppesen @ 2012-01-12 14:31 UTC (permalink / raw)
  To: ubifs

Hi again,

Maybe I should include all info you require, but sometimes the brain is slower than the right pinkie....

...
* run the MTD tests to validate your flash
Done - nothing to report

* enable the UBIFS extra self checks and try to reproduce the problem.
Not possible ??? 

# echo 3 > /sys/module/ubifs/
uevent   version
... no parameters folder

* make sure you use up-to-date UBIFS
Sorry - best I can do right now is the 3.2.0 kernel. Is that ok?

* make sure you have compiled the kernel symbols in 

# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_HOTPLUG=y


* mark the Enable debugging support 

CONFIG_UBIFS_FS=y
# CONFIG_UBIFS_FS_XATTR is not set
# CONFIG_UBIFS_FS_ADVANCED_COMPR is not set
CONFIG_UBIFS_FS_LZO=y
CONFIG_UBIFS_FS_ZLIB=y
CONFIG_UBIFS_FS_DEBUG=y


* include all the messages UBIFS prints
# dmesg -n8
# ubiattach /dev/ubi_ctrl -m 4
UBI: attaching mtd4 to ubi0
UBI: physical eraseblock size:   131072 bytes (128 KiB)
UBI: logical eraseblock size:    130944 bytes
UBI: smallest flash I/O unit:    1
UBI: VID header offset:          64 (aligned 64)
UBI: data offset:                128
UBI: max. sequence number:       17653
UBI: attached mtd4 to ubi0
UBI: MTD device name:            "User"
UBI: MTD device size:            28 MiB
UBI: number of good PEBs:        230
UBI: number of bad PEBs:         0
UBI: number of corrupted PEBs:   0
UBI: max. allowed volumes:       128
UBI: wear-leveling threshold:    4096
UBI: number of internal volumes: 1
UBI: number of user volumes:     1
UBI: available PEBs:             0
UBI: total number of reserved PEBs: 230
UBI: number of PEBs reserved for bad PEB handling: 0
UBI: max/mean erase counter: 31/5
UBI: image sequence number:  1704669600
UBI: background thread "ubi_bgt0d" started, PID 6374
UBI device number 0, total 230 LEBs (30117120 bytes, 28.7 MiB), available 0 LEBs (0 bytes), LEB size 130944 bytes (127.9 KiB)
# mkdir -p /skov/mnt/rootfs
# mount -t ubifs ubi0:rootfs /skov/mnt/rootfs
UBIFS: recovery needed
UBIFS error (pid 6700): ubifs_recover_leb: corrupt empty space LEB 4:0, corruption starts at 144
UBIFS error (pid 6700): ubifs_scanned_corruption: corruption at LEB 4:144
UBIFS error (pid 6700): ubifs_scanned_corruption: first 8192 bytes from LEB 4:144
00000000: 00000000 00000000 00000000 00000000 ffffffff ffffffff ffffffff ffffffff  ................................
00000020: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff  ................................
...many lines of ffffff
00001fc0: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff  ................................
00001fe0: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff  ................................
UBIFS error (pid 6700): ubifs_recover_leb: LEB 4 scanning failed
mount: mounting ubi0:rootfs on /skov/mnt/rootfs failed: Structure needs cleaning
#


* explicitly tell about whether you did any checking
The UBI config looks like this:
CONFIG_MTD_UBI=y
CONFIG_MTD_UBI_WL_THRESHOLD=4096
CONFIG_MTD_UBI_BEB_RESERVE=1
# CONFIG_MTD_UBI_GLUEBI is not set
# CONFIG_MTD_UBI_DEBUG is not set


* describe your flash device
# mtdinfo -a
Count of MTD devices:           5
Present MTD devices:            mtd0, mtd1, mtd2, mtd3, mtd4
Sysfs interface supported:      yes

mtd0
Name:                           u-boot
Type:                           nor
Eraseblock size:                131072 bytes, 128.0 KiB
Amount of eraseblocks:          3 (393216 bytes, 384.0 KiB)
Minimum input/output unit size: 1 byte
Sub-page size:                  1 byte
Character device major/minor:   90:0
Bad blocks are allowed:         false
Device is writable:             true

mtd1
Name:                           Env
Type:                           nor
Eraseblock size:                131072 bytes, 128.0 KiB
Amount of eraseblocks:          1 (131072 bytes, 128.0 KiB)
Minimum input/output unit size: 1 byte
Sub-page size:                  1 byte
Character device major/minor:   90:2
Bad blocks are allowed:         false
Device is writable:             true

mtd2
Name:                           Linux
Type:                           nor
Eraseblock size:                131072 bytes, 128.0 KiB
Amount of eraseblocks:          16 (2097152 bytes, 2.0 MiB)
Minimum input/output unit size: 1 byte
Sub-page size:                  1 byte
Character device major/minor:   90:4
Bad blocks are allowed:         false
Device is writable:             true

mtd3
Name:                           Bmp_Image
Type:                           nor
Eraseblock size:                131072 bytes, 128.0 KiB
Amount of eraseblocks:          6 (786432 bytes, 768.0 KiB)
Minimum input/output unit size: 1 byte
Sub-page size:                  1 byte
Character device major/minor:   90:6
Bad blocks are allowed:         false
Device is writable:             true

mtd4
Name:                           User
Type:                           nor
Eraseblock size:                131072 bytes, 128.0 KiB
Amount of eraseblocks:          230 (30146560 bytes, 28.8 MiB)
Minimum input/output unit size: 1 byte
Sub-page size:                  1 byte
Character device major/minor:   90:8
Bad blocks are allowed:         false
Device is writable:             true
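
A quick cross-check of the partition table above: summing the eraseblock counts gives the total size of the flash that the five partitions cover. The sketch uses only the numbers printed by `mtdinfo`; reading the "256" in the part name 29GL256P (mentioned elsewhere in the thread) as a 256 Mbit density is an assumption.

```python
ERASEBLOCK = 131072   # bytes, identical for every partition listed above
partitions = {"u-boot": 3, "Env": 1, "Linux": 16, "Bmp_Image": 6, "User": 230}

total_blocks = sum(partitions.values())
total_bytes = total_blocks * ERASEBLOCK
total_mbit = total_bytes * 8 // 2**20

print(total_blocks, total_bytes, total_mbit)
```

The partitions add up to 256 eraseblocks, i.e. 32 MiB or 256 Mbit, so under that assumption they cover the whole chip.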


* describe how the problem can be reproduced
I have 16 machines power-cycling, 20 secs on and 3 secs off. In 24 hrs roughly 2 will fail this way.


Now I think I have added all I could think of.

Sincerely,
Dr. Karsten Jeppesen


* Re: Corrupted UBIFS, bad CRC
  2011-11-23 12:49 Karsten Jeppesen
@ 2011-11-29 21:58 ` Artem Bityutskiy
  0 siblings, 0 replies; 12+ messages in thread
From: Artem Bityutskiy @ 2011-11-29 21:58 UTC (permalink / raw)
  To: Karsten Jeppesen; +Cc: linux-mtd

Hi,

On Wed, 2011-11-23 at 04:49 -0800, Karsten Jeppesen wrote:
> Uncompressing Linux........... done, booting the kernel.
> [    1.570000] UBIFS error (pid 1): ubifs_check_node: bad CRC: calculated 0x7d62d42c, read 0x1173c109
> [    1.580000] UBIFS error (pid 1): ubifs_check_node: bad node at LEB 84:50696
> [    1.580000] UBIFS error (pid 1): ubifs_read_node: expected node type 9
> [    1.590000] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

Well, difficult to say - you have a node with bad CRC. There may be
different reasons. Did you try to validate your flash with mtd tests?
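
For readers who want to inspect a suspect node by hand, the check that fails here can be sketched in a few lines. This is not the kernel code, only an approximation under stated assumptions: the common header is taken to be 24 bytes (magic, crc, sqnum, len, node_type, group_type, 2 bytes padding, all little-endian), the magic is taken to be 0x06101831, and the CRC is taken to be CRC32 seeded with 0xFFFFFFFF over everything after the first 8 bytes, with no final inversion. The function names are made up for this sketch.

```python
import struct
import zlib

# Assumed layout of the UBIFS common header (little-endian):
# magic u32, crc u32, sqnum u64, len u32, node_type u8, group_type u8, pad[2]
CH_FMT = "<IIQIBB2x"
CH_SIZE = struct.calcsize(CH_FMT)  # 24 bytes
UBIFS_NODE_MAGIC = 0x06101831      # assumed magic value


def ubifs_crc(data: bytes) -> int:
    """CRC32 seeded with 0xFFFFFFFF and without the final inversion.

    zlib.crc32 applies a final XOR with 0xFFFFFFFF, so it is undone here.
    """
    return (zlib.crc32(data) ^ 0xFFFFFFFF) & 0xFFFFFFFF


def check_node(buf: bytes, offset: int = 0):
    """Return (calculated_crc, stored_crc, node_type) for the node at offset."""
    magic, stored_crc, _sqnum, length, node_type, _group = struct.unpack_from(
        CH_FMT, buf, offset
    )
    if magic != UBIFS_NODE_MAGIC:
        raise ValueError(f"bad magic {magic:#010x} at offset {offset}")
    if length < CH_SIZE or offset + length > len(buf):
        raise ValueError(f"implausible node length {length}")
    # The CRC covers the node from byte 8 (after magic and crc) to its end.
    calculated = ubifs_crc(buf[offset + 8 : offset + length])
    return calculated, stored_crc, node_type
```

A mismatch between the first two returned values corresponds to the "bad CRC: calculated ..., read ..." message. Note that the offset in that message is relative to a logical eraseblock, so locating the node in a raw flash image also requires the LEB-to-PEB mapping from the UBI headers, which this sketch does not attempt.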

Here are some hints:
http://linux-mtd.infradead.org/faq/ubifs.html#L_how_send_bugreport

> I am running kernel 2.6.32.8 with most patches applied, in particular
> the recovery.c patch and the mtd patch (8-byte write buffer).
> The target that showed this error does not have these patches applied.
> 
> Even so... I copied the flash content to a target with these patches
> and tried again in order to see if these patches would allow the
> kernel to rectify the problem. No cigar.
> Of course I ran with debug enabled, so here is the output (but even
> better, I hope, here is the flash image for download:
> http://download.gnist.skov.com/corrupt_ubifs.img )

I've downloaded this image and could mount it when I use mtdram:

$ sudo modprobe mtdram erase_size=128 total_size=60000
$ sudo dd if=~/tmp/corrupt_ubifs.img of=/dev/mtd0
$ sudo modprobe ubi mtd=0

[dedekind@koala l2-mtd-2.6 (master)]$ sudo modprobe mtdram erase_size=128 total_size=29440
[dedekind@koala l2-mtd-2.6 (master)]$ sudo dd if=~/tmp/corrupt_ubifs.img of=/dev/mtd0
58880+0 records in
58880+0 records out
30146560 bytes (30 MB) copied, 0.0739408 s, 408 MB/s
[dedekind@koala l2-mtd-2.6 (master)]$ sudo modprobe ubi mtd=0

[ 5920.203998] UBI: attaching mtd0 to ubi0
[ 5920.204016] UBI: physical eraseblock size:   131072 bytes (128 KiB)
[ 5920.204019] UBI: logical eraseblock size:    130944 bytes
[ 5920.204022] UBI: smallest flash I/O unit:    1
[ 5920.204028] UBI: VID header offset:          64 (aligned 64)
[ 5920.204034] UBI: data offset:                128
[ 5920.204330] UBI: max. sequence number:       325
[ 5920.204799] UBI: attached mtd0 to ubi0
[ 5920.204801] UBI: MTD device name:            "mtdram test device"
[ 5920.204804] UBI: MTD device size:            28 MiB
[ 5920.204805] UBI: number of good PEBs:        230
[ 5920.204807] UBI: number of bad PEBs:         0
[ 5920.204809] UBI: number of corrupted PEBs:   0
[ 5920.204810] UBI: max. allowed volumes:       128
[ 5920.204812] UBI: wear-leveling threshold:    4096
[ 5920.204814] UBI: number of internal volumes: 1
[ 5920.204815] UBI: number of user volumes:     1
[ 5920.204817] UBI: available PEBs:             0
[ 5920.204819] UBI: total number of reserved PEBs: 230
[ 5920.204820] UBI: number of PEBs reserved for bad PEB handling: 0
[ 5920.204822] UBI: max/mean erase counter: 2/0
[ 5920.204824] UBI: image sequence number:  1748877991
[ 5920.204832] UBI: background thread "ubi_bgt0d" started, PID 4759
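
The numbers in this session can be cross-checked against each other with a little arithmetic. The sketch below assumes that mtdram's erase_size and total_size parameters are given in KiB and that dd used its default 512-byte block size. All constants are copied from the output above, and the function name is illustrative.

```python
# Cross-check of the flash geometry reported in the session above.
PEB_SIZE = 131072           # "physical eraseblock size"
DATA_OFFSET = 128           # "data offset"
REPORTED_LEB_SIZE = 130944  # "logical eraseblock size"
GOOD_PEBS = 230             # "number of good PEBs"
DD_RECORDS = 58880          # "58880+0 records out"
DD_BLOCK_SIZE = 512         # assumption: dd default block size
DD_BYTES = 30146560         # "30146560 bytes (30 MB) copied"
MTDRAM_TOTAL_KIB = 29440    # total_size= (assumed to be in KiB)
MTDRAM_ERASE_KIB = 128      # erase_size= (assumed to be in KiB)


def geometry():
    """Derive sizes from the raw parameters so they can be compared."""
    return {
        "leb_size": PEB_SIZE - DATA_OFFSET,
        "image_bytes": GOOD_PEBS * PEB_SIZE,
        "dd_bytes": DD_RECORDS * DD_BLOCK_SIZE,
        "mtdram_bytes": MTDRAM_TOTAL_KIB * 1024,
        "mtdram_pebs": MTDRAM_TOTAL_KIB // MTDRAM_ERASE_KIB,
    }


if __name__ == "__main__":
    for name, value in geometry().items():
        print(f"{name:13} {value}")
```

With total_size=29440 the emulated device holds exactly 230 eraseblocks of 128 KiB, which matches both the size of the image and the "User" partition it was read from.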


I was using my Fedora kernel. Did you try to pull the ubifs-v2.6.32
back-port tree?

http://linux-mtd.infradead.org/doc/ubifs.html#L_source

Artem.


* Corrupted UBIFS, bad CRC
@ 2011-11-23 12:49 Karsten Jeppesen
  2011-11-29 21:58 ` Artem Bityutskiy
  0 siblings, 1 reply; 12+ messages in thread
From: Karsten Jeppesen @ 2011-11-23 12:49 UTC (permalink / raw)
  To: linux-mtd

Hi Artem,

First: I love UBIFS. It performs really, really well.

Among many well-functioning targets (ARM 9263 based), this sucker had the nerve to act up:



Uncompressing Linux........... done, booting the kernel.
[    1.570000] UBIFS error (pid 1): ubifs_check_node: bad CRC: calculated 0x7d62d42c, read 0x1173c109
[    1.580000] UBIFS error (pid 1): ubifs_check_node: bad node at LEB 84:50696
[    1.580000] UBIFS error (pid 1): ubifs_read_node: expected node type 9
[    1.590000] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)



I am running kernel 2.6.32.8 with most patches applied, in particular the recovery.c patch and the mtd patch (8-byte write buffer). The target that showed this error does not have these patches applied.

Even so... I copied the flash content to a target with these patches and tried again in order to see if these patches would allow the kernel to rectify the problem. No cigar.
Of course I ran with debug enabled, so here is the output (but even better, I hope, here is the flash image for download: http://download.gnist.skov.com/corrupt_ubifs.img )

----
[    1.570000] atmel_usart.3: ttyS3 at MMIO 0xfff94000 (irq = 9) is a ATMEL_SERIAL
[    1.630000] brd: module loaded
[    1.660000] loop: module loaded
[    1.670000] physmap platform flash device: 04000000 at 10000000
[    1.680000] Number of erase regions: 1
[    1.680000] Warning:  Overriding MaxBufWriteSize from 2^6 to 2^3
[    1.690000] Primary Vendor Command Set: 0002 (AMD/Fujitsu Standard)
[    1.690000] Primary Algorithm Table at 0040
[    1.700000] Alternative Vendor Command Set: 0000 (None)
[    1.700000] No Alternate Algorithm Table
[    1.710000] Vcc Minimum:  2.7 V
[    1.710000] Vcc Maximum:  3.6 V
[    1.710000] No Vpp line
[    1.720000] Typical byte/word write timeout: 64 µs
[    1.720000] Maximum byte/word write timeout: 512 µs
[    1.730000] Typical full buffer write timeout: 64 µs
[    1.730000] Maximum full buffer write timeout: 2048 µs
[    1.740000] Typical block erase timeout: 512 ms
[    1.740000] Maximum block erase timeout: 4096 ms
[    1.750000] Typical chip erase timeout: 131072 ms
[    1.750000] Maximum chip erase timeout: 524288 ms
[    1.760000] Device size: 0x2000000 bytes (32 MiB)
[    1.760000] Flash Device Interface description: 0x0002
[    1.770000]   - supports x8 and x16 via BYTE# with asynchronous interface
[    1.770000] Max. bytes in buffer write: 0x8
[    1.780000] Number of Erase Block Regions: 1
[    1.780000]   Erase Region #0: BlockSize 0x20000 bytes, 256 blocks
[    1.790000] physmap-flash.0: Found 1 x16 devices at 0x0 in 16-bit bank
[    1.790000]  Amd/Fujitsu Extended Query Table at 0x0040
[    1.800000] physmap-flash.0: CFI does not contain boot bank location. Assuming top.
[    1.810000] number of CFI chips: 1
[    1.810000] cfi_cmdset_0002: Disabling erase-suspend-program due to code brokenness.
[    1.820000] 5 cmdlinepart partitions found on MTD device physmap-flash.0

----
# skovsetup mountflash
[  165.940000] UBI: attaching mtd4 to ubi0
[  165.950000] UBI: physical eraseblock size:   131072 bytes (128 KiB)
[  165.950000] UBI: logical eraseblock size:    130944 bytes
[  165.960000] UBI: smallest flash I/O unit:    1
[  165.960000] UBI: VID header offset:          64 (aligned 64)
[  165.970000] UBI: data offset:                128
[  166.040000] UBI: attached mtd4 to ubi0
[  166.060000] UBI: MTD device name:            "User"
[  166.060000] UBI: MTD device size:            28 MiB
[  166.120000] UBI: number of good PEBs:        230
[  166.120000] UBI: number of bad PEBs:         0
[  166.120000] UBI: max. allowed volumes:       128
[  166.150000] UBI: wear-leveling threshold:    4096
[  166.150000] UBI: number of internal volumes: 1
[  166.150000] UBI: number of user volumes:     1
[  166.170000] UBI: available PEBs:             0
[  166.170000] UBI: total number of reserved PEBs: 230
[  166.190000] UBI: number of PEBs reserved for bad PEB handling: 0
[  166.230000] UBI: max/mean erase counter: 2/0
[  166.230000] UBI: image sequence number: 1748877991
[  166.260000] UBI: background thread "ubi_bgt0d" started, PID 2170
UBI device number 0, total 230 LEBs (30117120 bytes, 28.7 MiB), available 0 LEBs (0 bytes), LEB size 130944 bytes (127.9 KiB)
[  166.520000] UBIFS: recovery needed
[  166.680000] UBIFS error (pid 2177): ubifs_check_node: bad CRC: calculated 0x7d62d42c, read 0x1173c109
[  166.690000] UBIFS error (pid 2177): ubifs_check_node: bad node at LEB 84:50696
[  166.700000] UBIFS error (pid 2177): ubifs_read_node: expected node type 9
mount: mounting ubi0:rootfs on /skov/mnt/rootfs failed: Structure needs cleaning
#
---
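
When comparing several failing targets, it can help to pull the key fields out of these error lines mechanically. The following is a small sketch that does so with regular expressions. The node type table reflects my understanding of the UBIFS on-flash node type numbering (0 to 11) and should be treated as an assumption, as should the function and constant names.

```python
import re

# Assumed UBIFS node type numbering (on-flash format, types 0..11).
UBIFS_NODE_TYPES = {
    0: "inode", 1: "data", 2: "directory entry",
    3: "extended attribute entry", 4: "truncation", 5: "padding",
    6: "superblock", 7: "master", 8: "reference",
    9: "index", 10: "commit start", 11: "orphan",
}

# The three error lines from the mount attempt above.
SAMPLE_LOG = """\
[  166.680000] UBIFS error (pid 2177): ubifs_check_node: bad CRC: calculated 0x7d62d42c, read 0x1173c109
[  166.690000] UBIFS error (pid 2177): ubifs_check_node: bad node at LEB 84:50696
[  166.700000] UBIFS error (pid 2177): ubifs_read_node: expected node type 9
"""


def summarize_ubifs_errors(log_text):
    """Extract CRC values, node location and expected type from a log."""
    summary = {}
    m = re.search(
        r"bad CRC: calculated (0x[0-9a-fA-F]+), read (0x[0-9a-fA-F]+)", log_text
    )
    if m:
        summary["crc_calculated"] = int(m.group(1), 16)
        summary["crc_read"] = int(m.group(2), 16)
    m = re.search(r"bad node at LEB (\d+):(\d+)", log_text)
    if m:
        summary["leb"] = int(m.group(1))
        summary["offset"] = int(m.group(2))
    m = re.search(r"expected node type (\d+)", log_text)
    if m:
        node_type = int(m.group(1))
        summary["expected_type"] = UBIFS_NODE_TYPES.get(
            node_type, f"unknown ({node_type})"
        )
    return summary


if __name__ == "__main__":
    print(summarize_ubifs_errors(SAMPLE_LOG))
```

If the type table is right, "expected node type 9" means the reader was looking for an index node at LEB 84, offset 50696, which would make this damage to the on-flash index rather than to file data.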

So the question is whether UBI can be made to recover from this situation?


Sincerely,

Dr. Karsten Jeppesen,
SKOV AS


end of thread, other threads:[~2012-01-18 14:41 UTC | newest]

Thread overview: 12+ messages
2012-01-12 13:47 Corrupted UBIFS, bad CRC Karsten Jeppesen
2012-01-15 12:24 ` Artem Bityutskiy
2012-01-16  8:18   ` Karsten Jeppesen
2012-01-16 10:24     ` Artem Bityutskiy
2012-01-16 12:40       ` Karsten Jeppesen
2012-01-16 12:46         ` Artem Bityutskiy
2012-01-16 12:50         ` Artem Bityutskiy
2012-01-17 12:23           ` Karsten Jeppesen
2012-01-18 14:43             ` Artem Bityutskiy
  -- strict thread matches above, loose matches on Subject: below --
2012-01-12 14:31 Karsten Jeppesen
2011-11-23 12:49 Karsten Jeppesen
2011-11-29 21:58 ` Artem Bityutskiy
