All of lore.kernel.org
 help / color / mirror / Atom feed
* read_pnode: error -22 reading pnode at XX:YYYYY
@ 2011-04-19 21:08 Rick Johnson
  2011-04-21 13:20 ` Artem Bityutskiy
  0 siblings, 1 reply; 11+ messages in thread
From: Rick Johnson @ 2011-04-19 21:08 UTC (permalink / raw)
  To: linux-mtd

Hi,

We've been seeing this error

"read_pnode: error -22 reading pnode at XX:YYYYY"

from time to time.  It happens just often to cause problems, but not 
often enough to determine how to reproduce it.  We're using kernel 
version 2.6.36.

When it happens, we usually get the following messages after it:

UBIFS error (pid 115): ubifs_update_one_lp: cannot update properties of 
= LEB ZZZ, error -22
UBIFS warning (pid 115): ubifs_ro_mode: switched to read-only mode, = 
error -22

After that, our device is stuck.  We've only ever noticed the problem on 
boot up.

If anyone has any suggestions on where I could look to track the source, 
I would greatly appreciate it.  I'd be happy to provide any additional 
information needed.

Thank you in advance,
Rick Johnson


====================================================================
Below is the most recent instance of the problem.
====================================================================

U-Boot 2010.06 (Mar 08 2011 - 12:37:47) MPC83XX

Reset Status: Software Hard, External/Internal Soft, External/Internal Hard

CPU:   e300c3, MPC8313, Rev: 2.1 at 333.333 MHz, CSB: 166.667 MHz
I2C:   ready
DRAM:  128 MiB
NAND:  1024 MiB
Bad block table found at page 524224, version 0x01
Bad block table found at page 524160, version 0x01
nand_bbt: ECC error while reading bad block table
nand_read_bbt: Bad block at 0x000000180000
nand_read_bbt: Bad block at 0x000000200000
nand_read_bbt: Bad block at 0x000000400000
nand_read_bbt: Bad block at 0x000000f80000
nand_read_bbt: Bad block at 0x000001d80000
nand_read_bbt: Bad block at 0x000002180000
nand_read_bbt: Bad block at 0x000002c80000
nand_read_bbt: Bad block at 0x000003a80000
nand_read_bbt: Bad block at 0x000004400000
nand_read_bbt: Bad block at 0x000005380000
nand_read_bbt: Bad block at 0x000005800000
nand_read_bbt: Bad block at 0x000005f80000
nand_read_bbt: Bad block at 0x000006380000
nand_read_bbt: Bad block at 0x000007600000
nand_read_bbt: Bad block at 0x000007900000
nand_read_bbt: Bad block at 0x000007c00000
nand_read_bbt: Bad block at 0x000007d80000
nand_read_bbt: Bad block at 0x000008100000
nand_read_bbt: Bad block at 0x000008a80000
nand_read_bbt: Bad block at 0x000009e00000
nand_read_bbt: Bad block at 0x00000a580000
nand_read_bbt: Bad block at 0x00000cb80000
nand_read_bbt: Bad block at 0x00000fb00000
nand_read_bbt: Bad block at 0x000010580000
nand_read_bbt: Bad block at 0x000011080000
nand_read_bbt: Bad block at 0x000011880000
nand_read_bbt: Bad block at 0x000011980000
nand_read_bbt: Bad block at 0x000012180000
nand_read_bbt: Bad block at 0x000012780000
nand_read_bbt: Bad block at 0x000013e80000
nand_read_bbt: Bad block at 0x000014a80000
nand_read_bbt: Bad block at 0x000014c80000
nand_read_bbt: Bad block at 0x000015e00000
nand_read_bbt: Bad block at 0x00001b180000
nand_read_bbt: Bad block at 0x00001e580000
*** Warning - bad CRC or NAND, using default environment

In:    serial
Out:   serial
Err:   serial
Reading MAC from EEPROM
load_fpga(): err=0 len=130952
FPGA already running, not reloading
variant=1
nand1 size=40000000 nand2 size=0
Net:   TSEC0: No support for PHY id 221513; assuming generic
TSEC1: No support for PHY id 221513; assuming generic
TSEC0 [PRIME], TSEC1
Creating 1 MTD partitions on "nand0":
0x000000500000-0x000040000000 : "mtd=1"
UBI: attaching mtd1 to ubi0
UBI: physical eraseblock size:   131072 bytes (128 KiB)
UBI: logical eraseblock size:    129024 bytes
UBI: smallest flash I/O unit:    2048
UBI: sub-page size:              512
UBI: VID header offset:          512 (aligned 512)
UBI: data offset:                2048
UBI warning: ubi_eba_init_scan: cannot reserve enough PEBs for bad PEB 
handling, reserved 77, need 81
UBI: attached mtd1 to ubi0
UBI: MTD device name:            "mtd=1"
UBI: MTD device size:            1019 MiB
UBI: number of good PEBs:        8116
UBI: number of bad PEBs:         36
UBI: max. allowed volumes:       128
UBI: wear-leveling threshold:    4096
UBI: number of internal volumes: 1
UBI: number of user volumes:     1
UBI: available PEBs:             0
UBI: total number of reserved PEBs: 8116
UBI: number of PEBs reserved for bad PEB handling: 77
UBI: max/mean erase counter: 4/2
UBIFS: recovery needed
UBIFS: recovery deferred
UBIFS: mounted UBI device 0, volume 0, name "container"
UBIFS: mounted read-only
UBIFS: file system size:   1033869312 bytes (1009638 KiB, 985 MiB, 8013 
LEBs)
UBIFS: journal size:       9033728 bytes (8822 KiB, 8 MiB, 71 LEBs)
UBIFS: media format:       w4/r0 (latest is w4/r0)
UBIFS: default compressor: LZO
UBIFS: reserved for root:  0 bytes (0 KiB)
Loading file '/part' to addr 0x00200000 with size 1 (0x00000001)...
Done
byte at 0x00200000 (0x00) != byte at 0x00400000 (0x01)
Total of 0 bytes were the same
Loading file '/kernel0.img' to addr 0x00600000 with size 1715302 
(0x001a2c66)...
Done
Loading file '/dtb0.img' to addr 0x00bf0000 with size 55358 (0x0000d83e)...
Done
## Booting kernel from Legacy Image at 00600000 ...
    Image Name:   Linux-2.6.36
    Created:      2011-04-06  18:11:00 UTC
    Image Type:   PowerPC Linux Kernel Image (gzip compressed)
    Data Size:    1715238 Bytes = 1.6 MiB
    Load Address: 00000000
    Entry Point:  00000000
    Verifying Checksum ... OK
## Flattened Device Tree blob at 00bf0000
    Booting using the fdt blob at 0xbf0000
    Uncompressing Kernel Image ... OK
    Loading Device Tree to 007ef000, end 007ff83d ... OK
IntFixup: detected TSEC0, fixing interrupts for rev 2.1 8313 silicon
IntFixup: detected TSEC1, fixing interrupts for rev 2.1 8313 silicon
setup_arch: bootmem
mpc831x_rdb_setup_arch()
arch: exit

initramfs loader
Mounting UBIFS filesystem on 'ubi0:container'...
UBIFS error (pid 1): read_pnode: error -22 reading pnode at 14:38299
UBIFS error (pid 1): read_pnode: error -22 reading pnode at 14:38299
UBIFS error (pid 1): read_pnode: error -22 reading pnode at 14:41289
UBIFS error (pid 1): read_pnode: error -22 reading pnode at 14:41380
UBIFS error (pid 1): read_pnode: error -22 reading pnode at 18:56882
Booting partition 0...
Loop-mounting SquashFS image at '/container/rootfs0.img'...
Moving '/container' -> '/newroot/initlo'...
Changing root & launching '/linuxrc'...

====================================================================
An older instance of the problem, but with some debug enabled.
Unfortunately, I don't have the whole log for this instance.
====================================================================

UBIFS error (pid 463): read_pnode: error -22 reading pnode at 19:125709
(pid 463) dumping pnode:
	address c3078480 parent c3073d20 cnext 0
	flags 0 iip 3 level 0 num 8080
	0: free 40 dirty 122352 flags 1 lnum 0
	1: free 0 dirty 115064 flags 1 lnum 0
	2: free 0 dirty 127312 flags 1 lnum 0
	3: free 0 dirty 127704 flags 1 lnum 0
UBIFS DBG (pid 463): read_pnode: calc num: 1995
UBIFS error (pid 463): ubifs_update_one_lp: cannot update properties of 
LEB 8004, error -22
UBIFS warning (pid 463): ubifs_ro_mode: switched to read-only mode, 
error -22
Call Trace:
[c61cbcf0] [c00085b8] show_stack+0x7c/0x194 (unreliable)
[c61cbd30] [c0101444] ubifs_ro_mode+0x6c/0x7c
[c61cbd40] [c00f70a4] ubifs_jnl_write_data+0x2d0/0x314
[c61cbd80] [c00f8fc0] do_writepage+0xd8/0x224
[c61cbdd0] [c00620c4] __writepage+0x24/0x68
[c61cbde0] [c00628b0] write_cache_pages+0x254/0x378
[c61cbe80] [c005bbe0] __filemap_fdatawrite_range+0x94/0xc0
[c61cbee0] [c005bc5c] filemap_write_and_wait_range+0x50/0xac
[c61cbf00] [c00a738c] vfs_fsync_range+0x48/0xb4
[c61cbf20] [c00a7498] do_fsync+0x30/0x58
[c61cbf40] [c000f160] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xfb491a0
     LR = 0xfb49190


====================================================================
Some additional debug, just in case.
====================================================================

UBIFS DBG (pid 30): ubifs_bg_thread: background thread "ubifs_bgt0_0" 
started, PID 30
UBIFS: recovery needed
UBIFS: recovery completed
UBIFS: mounted UBI device 0, volume 0, name "container"
UBIFS: file system size:   1037223936 bytes (1012914 KiB, 989 MiB, 8039 
LEBs)
UBIFS: journal size:       9033728 bytes (8822 KiB, 8 MiB, 71 LEBs)
UBIFS: media format:       w4/r0 (latest is w4/r0)
UBIFS: default compressor: lzo
UBIFS: reserved for root:  0 bytes (0 KiB)
UBIFS DBG (pid 1): mount_ubifs: compiled on:         Jan 12 2011 at 14:28:23
UBIFS DBG (pid 1): mount_ubifs: min. I/O unit size:  2048 bytes
UBIFS DBG (pid 1): mount_ubifs: LEB size:            129024 bytes (126 KiB)
UBIFS DBG (pid 1): mount_ubifs: data journal heads:  1
UBIFS DBG (pid 1): mount_ubifs: UUID: 
15A2370F-36A8-453D-B649-DE5C482B68E9
UBIFS DBG (pid 1): mount_ubifs: big_lpt              1
UBIFS DBG (pid 1): mount_ubifs: log LEBs:            5 (3 - 7)
UBIFS DBG (pid 1): mount_ubifs: LPT area LEBs:       13 (8 - 20)
UBIFS DBG (pid 1): mount_ubifs: orphan area LEBs:    1 (21 - 21)
UBIFS DBG (pid 1): mount_ubifs: main area LEBs:      8039 (22 - 8060)
UBIFS DBG (pid 1): mount_ubifs: index LEBs:          254
UBIFS DBG (pid 1): mount_ubifs: total index bytes:   1041120 (1016 KiB, 
0 MiB)
UBIFS DBG (pid 1): mount_ubifs: key hash type:       0
UBIFS DBG (pid 1): mount_ubifs: tree fanout:         8
UBIFS DBG (pid 1): mount_ubifs: reserved GC LEB:     2545
UBIFS DBG (pid 1): mount_ubifs: first main LEB:      22
UBIFS DBG (pid 1): mount_ubifs: max. znode size      240
UBIFS DBG (pid 1): mount_ubifs: max. index node size 192
UBIFS DBG (pid 1): mount_ubifs: node sizes:          data 48, inode 160, 
dentry 56
UBIFS DBG (pid 1): mount_ubifs: node sizes:          trun 56, sb 4096, 
master 512
UBIFS DBG (pid 1): mount_ubifs: node sizes:          ref 64, cmt. start 
32, orph 32
UBIFS DBG (pid 1): mount_ubifs: max. node sizes:     data 4144, inode 
4256 dentry 312
UBIFS DBG (pid 1): mount_ubifs: dead watermark:      2048
UBIFS DBG (pid 1): mount_ubifs: dark watermark:      6144
UBIFS DBG (pid 1): mount_ubifs: LEB overhead:        560
UBIFS DBG (pid 1): mount_ubifs: max. dark space:     49391616 (48234 
KiB, 47 MiB)
UBIFS DBG (pid 1): mount_ubifs: maximum bud bytes:   8388608 (8192 KiB, 
8 MiB)
UBIFS DBG (pid 1): mount_ubifs: BG commit bud bytes: 6815744 (6656 KiB, 
6 MiB)
UBIFS DBG (pid 1): mount_ubifs: current bud bytes    90112 (88 KiB, 0 MiB)
UBIFS DBG (pid 1): mount_ubifs: max. seq. number:    4633434
UBIFS DBG (pid 1): mount_ubifs: commit number:       1080

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: read_pnode: error -22 reading pnode at XX:YYYYY
  2011-04-19 21:08 read_pnode: error -22 reading pnode at XX:YYYYY Rick Johnson
@ 2011-04-21 13:20 ` Artem Bityutskiy
  2011-05-05 21:07   ` Rick Johnson
  0 siblings, 1 reply; 11+ messages in thread
From: Artem Bityutskiy @ 2011-04-21 13:20 UTC (permalink / raw)
  To: Rick Johnson; +Cc: linux-mtd

On Tue, 2011-04-19 at 16:08 -0500, Rick Johnson wrote:
> Hi,
> 
> We've been seeing this error
> 
> "read_pnode: error -22 reading pnode at XX:YYYYY"

Hi, I'm very busy now, but here is a quick hint. It looks like a bug. Do
you do unclean power cuts? It might be a bug in the bits manipulation
functions. In short - a penode describes amount of free and dirty space
in one LEB. We try to store this information as compactly as possible,
so we jam it into bits.

There is a 'ubifs_pack_pnode()' function which is used to jam the
free/dirty information into bits. Try to instrument it with and:

1. Validate the pnode before packing, do the same validate_pnode() does.
May be you'll catch the place where it we write incorrect pnode. Because
what you see is a result of an error which might have happend long
before you hit it.

2. At the end of 'ubifs_pack_pnode()' unpack it and validate again, and
make sure it did not change. May be there is a bug in packing/unpacking
functions, then you will catch this.

I'll try to come up with a patch which does this every time we have
debugging enabled, but not now, no time.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: read_pnode: error -22 reading pnode at XX:YYYYY
  2011-04-21 13:20 ` Artem Bityutskiy
@ 2011-05-05 21:07   ` Rick Johnson
  2011-05-06 18:32     ` Artem Bityutskiy
  0 siblings, 1 reply; 11+ messages in thread
From: Rick Johnson @ 2011-05-05 21:07 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd

Hi Artem,

Thanks for your advice!  We were finally able to reproduce the problem.

> 1. Validate the pnode before packing, do the same validate_pnode() does.
> May be you'll catch the place where it we write incorrect pnode. Because
> what you see is a result of an error which might have happend long
> before you hit it.

We did validate_pnode() before the pnode was packed and have this output 
from dbg_dump_pnode():

(pid 6298) dumping pnode:
	address c631d380 parent c6005720 cnext c6005720
	flags 3 iip 3 level 0 num -969086448
	0: free 0 dirty 127984 flags 1 lnum 514
	1: free 129024 dirty 0 flags 4 lnum 515
	2: free 0 dirty 127920 flags 1 lnum 516
	3: free 129024 dirty 0 flags 4 lnum 517


It looks like 'num' is not valid.  Also, is it normal for 'parent' to be 
equal to 'cnext'?

We'll continue to look into this, but I thought it wouldn't hurt to get 
your opinion on the above debug.

Thanks!
Rick

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: read_pnode: error -22 reading pnode at XX:YYYYY
  2011-05-05 21:07   ` Rick Johnson
@ 2011-05-06 18:32     ` Artem Bityutskiy
  2011-05-09 17:41       ` Rick Johnson
  2011-05-20 21:16       ` Rick Johnson
  0 siblings, 2 replies; 11+ messages in thread
From: Artem Bityutskiy @ 2011-05-06 18:32 UTC (permalink / raw)
  To: Rick Johnson; +Cc: linux-mtd

On Thu, 2011-05-05 at 16:07 -0500, Rick Johnson wrote:
> Hi Artem,
> 
> Thanks for your advice!  We were finally able to reproduce the problem.
> 
> > 1. Validate the pnode before packing, do the same validate_pnode() does.
> > May be you'll catch the place where it we write incorrect pnode. Because
> > what you see is a result of an error which might have happend long
> > before you hit it.
> 
> We did validate_pnode() before the pnode was packed and have this output 
> from dbg_dump_pnode():
> 
> (pid 6298) dumping pnode:
> 	address c631d380 parent c6005720 cnext c6005720
> 	flags 3 iip 3 level 0 num -969086448
> 	0: free 0 dirty 127984 flags 1 lnum 514
> 	1: free 129024 dirty 0 flags 4 lnum 515
> 	2: free 0 dirty 127920 flags 1 lnum 516
> 	3: free 129024 dirty 0 flags 4 lnum 517
> 
> 
> It looks like 'num' is not valid.  Also, is it normal for 'parent' to be 
> equal to 'cnext'?

This code was written by Adiran and I do not know it, and it was a
little of "very quick programming", so it is not the cleanest code in
UBIFS, sorry.

I do not know by heart, but I think cnext is about the list of nodes
which should be written to the flash at the next commit. So if ->cnext
== ->parent this means that our parent is the next in the list. Hmm, and
I think we dirty nodes in the LPT tree from the bottom up to the root,
and add them to the commit list (cnext), so cnext == parent should be
very common.

But in 'read_pnode()' cnext must be NULL, I think.

Anyway, as I said, that was not my code, so I myself have difficulties
to deal with it. I asked Adrian to help  - may be he'll take a look at
this if he has time.

> We'll continue to look into this, but I thought it wouldn't hurt to get 
> your opinion on the above debug.

Sure. I hope you'll nail this. How well is it reproducible? Can you
reproduce this on nandsim which has equivalent characteristics (PEB
size, page size, count of PEBs) ?

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: read_pnode: error -22 reading pnode at XX:YYYYY
  2011-05-06 18:32     ` Artem Bityutskiy
@ 2011-05-09 17:41       ` Rick Johnson
  2011-05-20 21:16       ` Rick Johnson
  1 sibling, 0 replies; 11+ messages in thread
From: Rick Johnson @ 2011-05-09 17:41 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd


> I do not know by heart, but I think cnext is about the list of nodes
> which should be written to the flash at the next commit. So if ->cnext
> == ->parent this means that our parent is the next in the list. Hmm, and
> I think we dirty nodes in the LPT tree from the bottom up to the root,
> and add them to the commit list (cnext), so cnext == parent should be
> very common.
> 
> But in 'read_pnode()' cnext must be NULL, I think.

That makes sense with what we're seeing.


> Anyway, as I said, that was not my code, so I myself have difficulties
> to deal with it. I asked Adrian to help  - may be he'll take a look at
> this if he has time.

Thank you for passing the word along.

> Sure. I hope you'll nail this. How well is it reproducible? Can you
> reproduce this on nandsim which has equivalent characteristics (PEB
> size, page size, count of PEBs) ?

This is real tough to reproduce.  It happens just enough to give us 
trouble.  We haven't tried nandsim before, but I'll see if we can get 
that going.

Thanks again.

Rick

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: read_pnode: error -22 reading pnode at XX:YYYYY
  2011-05-06 18:32     ` Artem Bityutskiy
  2011-05-09 17:41       ` Rick Johnson
@ 2011-05-20 21:16       ` Rick Johnson
  2011-05-24  8:41         ` Artem Bityutskiy
  1 sibling, 1 reply; 11+ messages in thread
From: Rick Johnson @ 2011-05-20 21:16 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd


We were able to gather more debug this time when the error reoccurred. 
I thought I'd post it, just in case something jumps out.  The strange 
thing is that 'num' seems to have a value that resembles an address.  In 
the example below, num is set to 0xC72EB990.  That's been true every 
time we've seen any debug.  Maybe it doesn't mean anything.

Thanks for taking a look.

UBIFS DBG (pid 30): get_znodes_to_commit: committing 135 znodes
UBIFS DBG (pid 30): alloc_idx_lebs: need about 0 empty LEBS for TNC commit
UBIFS DBG (pid 30): ubifs_lpt_lookup_dirty: LEB 8007, free 120832, dirty 
8192, flags 48
UBIFS DBG (pid 30): ubifs_change_lp: LEB 8007, free 96256, dirty 9656, 
flags 48
UBIFS DBG (pid 30): ubifs_lpt_lookup_dirty: LEB 8007, free 120832, dirty 
8192, flags 48
UBIFS DBG (pid 30): ubifs_save_dirty_idx_lnums: found 1 dirty index LEBs
UBIFS DBG (pid 30): ubifs_save_dirty_idx_lnums: dirtiest index LEB is 
106 with dirty 39616 and free 0
UBIFS DBG (pid 30): ubifs_tnc_start_commit: number of index LEBs 2
UBIFS DBG (pid 30): ubifs_tnc_start_commit: size of index 112520
UBIFS DBG (pid 30): ubifs_lpt_start_commit:
UBIFS error (pid 30): dbg_chk_pnode: pnode num -953239152 expected 1980 
parent num 2029 iip 0
UBIFS error (pid 30): do_commit: commit failed, error -22
UBIFS warning (pid 30): ubifs_ro_mode: switched to read-only mode, error -22

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: read_pnode: error -22 reading pnode at XX:YYYYY
  2011-05-20 21:16       ` Rick Johnson
@ 2011-05-24  8:41         ` Artem Bityutskiy
  2011-05-24 11:43           ` Artem Bityutskiy
  2011-05-24 19:37           ` Rick Johnson
  0 siblings, 2 replies; 11+ messages in thread
From: Artem Bityutskiy @ 2011-05-24  8:41 UTC (permalink / raw)
  To: Rick Johnson; +Cc: linux-mtd

On Fri, 2011-05-20 at 16:16 -0500, Rick Johnson wrote:
> We were able to gather more debug this time when the error reoccurred. 
> I thought I'd post it, just in case something jumps out.  The strange 
> thing is that 'num' seems to have a value that resembles an address.  In 
> the example below, num is set to 0xC72EB990.  That's been true every 
> time we've seen any debug.  Maybe it doesn't mean anything.

Hi,

1. Your kernel version is 2.6.36, do you use up-to-date UBI/UBIFS from
   the ubifs-v2.6.36 back-port tree? See here:
   http://www.linux-mtd.infradead.org/doc/ubifs.html#L_source

2. Could you please make sure you have UBIFS debugging enabled - just
   debugging support, no additional messages or checks. Then post UBIFS
   messages which it prints when you mount it. It prints things like
   LPT model (c->big_lpt) which I want to know.

3. Please, enable LPT extra checks and run your tests and try to
   reproduce this issue. To enable LPT extra checks you need to do:

   $ sudo sh -c 'echo 32 > /sys/module/ubifs/parameters/debug_chks'

   This will make UBIFS to check lprops at each commit and we have a
   better chance to catch issues.

Thanks!

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: read_pnode: error -22 reading pnode at XX:YYYYY
  2011-05-24  8:41         ` Artem Bityutskiy
@ 2011-05-24 11:43           ` Artem Bityutskiy
  2011-05-27 20:26             ` Rick Johnson
  2011-05-24 19:37           ` Rick Johnson
  1 sibling, 1 reply; 11+ messages in thread
From: Artem Bityutskiy @ 2011-05-24 11:43 UTC (permalink / raw)
  To: Rick Johnson; +Cc: linux-mtd

On Tue, 2011-05-24 at 11:41 +0300, Artem Bityutskiy wrote:
> 3. Please, enable LPT extra checks and run your tests and try to
>    reproduce this issue. To enable LPT extra checks you need to do:
> 
>    $ sudo sh -c 'echo 32 > /sys/module/ubifs/parameters/debug_chks'
> 
>    This will make UBIFS to check lprops at each commit and we have a
>    better chance to catch issues.

Note, you should not enable other checks or any messages - this will
slow down things. Just enable LPT extra-checks.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: read_pnode: error -22 reading pnode at XX:YYYYY
  2011-05-24  8:41         ` Artem Bityutskiy
  2011-05-24 11:43           ` Artem Bityutskiy
@ 2011-05-24 19:37           ` Rick Johnson
  1 sibling, 0 replies; 11+ messages in thread
From: Rick Johnson @ 2011-05-24 19:37 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd

Artem Bityutskiy wrote:
> 
> 1. Your kernel version is 2.6.36, do you use up-to-date UBI/UBIFS from
>    the ubifs-v2.6.36 back-port tree? See here:
>    http://www.linux-mtd.infradead.org/doc/ubifs.html#L_source

We are not up to date.  I'll look into getting us there.  Thanks!

In the meantime, I have the debug info you were looking for.

> 2. Could you please make sure you have UBIFS debugging enabled - just
>    debugging support, no additional messages or checks. Then post UBIFS
>    messages which it prints when you mount it. It prints things like
>    LPT model (c->big_lpt) which I want to know.

UBIFS DBG (pid 1): mount_ubifs: compiled on:         May 13 2011 at 14:22:48
UBIFS DBG (pid 1): mount_ubifs: min. I/O unit size:  2048 bytes
UBIFS DBG (pid 1): mount_ubifs: LEB size:            129024 bytes (126 KiB)
UBIFS DBG (pid 1): mount_ubifs: data journal heads:  1
UBIFS DBG (pid 1): mount_ubifs: UUID: 
2A1E7A10-8791-42F1-9BA3-F13692BF16E2
UBIFS DBG (pid 1): mount_ubifs: big_lpt              1
UBIFS DBG (pid 1): mount_ubifs: log LEBs:            5 (3 - 7)
UBIFS DBG (pid 1): mount_ubifs: LPT area LEBs:       13 (8 - 20)
UBIFS DBG (pid 1): mount_ubifs: orphan area LEBs:    1 (21 - 21)
UBIFS DBG (pid 1): mount_ubifs: main area LEBs:      8041 (22 - 8062)
UBIFS DBG (pid 1): mount_ubifs: index LEBs:          2
UBIFS DBG (pid 1): mount_ubifs: total index bytes:   165544 (161 KiB, 0 MiB)
UBIFS DBG (pid 1): mount_ubifs: key hash type:       0
UBIFS DBG (pid 1): mount_ubifs: tree fanout:         8
UBIFS DBG (pid 1): mount_ubifs: reserved GC LEB:     7709
UBIFS DBG (pid 1): mount_ubifs: first main LEB:      22
UBIFS DBG (pid 1): mount_ubifs: max. znode size      240
UBIFS DBG (pid 1): mount_ubifs: max. index node size 192
UBIFS DBG (pid 1): mount_ubifs: node sizes:          data 48, inode 160, 
dentry 56
UBIFS DBG (pid 1): mount_ubifs: node sizes:          trun 56, sb 4096, 
master 512
UBIFS DBG (pid 1): mount_ubifs: node sizes:          ref 64, cmt. start 
32, orph 32
UBIFS DBG (pid 1): mount_ubifs: max. node sizes:     data 4144, inode 
4256 dentry 312
UBIFS DBG (pid 1): mount_ubifs: dead watermark:      2048
UBIFS DBG (pid 1): mount_ubifs: dark watermark:      6144
UBIFS DBG (pid 1): mount_ubifs: LEB overhead:        560
UBIFS DBG (pid 1): mount_ubifs: max. dark space:     49403904 (48246 
KiB, 47 MiB)
UBIFS DBG (pid 1): mount_ubifs: maximum bud bytes:   8388608 (8192 KiB, 
8 MiB)
UBIFS DBG (pid 1): mount_ubifs: BG commit bud bytes: 6815744 (6656 KiB, 
6 MiB)
UBIFS DBG (pid 1): mount_ubifs: current bud bytes    139264 (136 KiB, 0 MiB)
UBIFS DBG (pid 1): mount_ubifs: max. seq. number:    7823
UBIFS DBG (pid 1): mount_ubifs: commit number:       11


> 3. Please, enable LPT extra checks and run your tests and try to
>    reproduce this issue. To enable LPT extra checks you need to do:
> 
>    $ sudo sh -c 'echo 32 > /sys/module/ubifs/parameters/debug_chks'
> 
>    This will make UBIFS to check lprops at each commit and we have a
>    better chance to catch issues.
> 

We were able to make it fail before limiting the checks to just the LPT 
area.  I'll try it again with your above change.

UBIFS error (pid 29): dbg_chk_pnode: pnode num -952997232 expected 1803 
parent num 6196 iip 3
UBIFS error (pid 29): do_commit: commit failed, error -22
UBIFS warning (pid 29): ubifs_ro_mode: switched to read-only mode, error -22
Call Trace:
[c7a19ec0] [c00085b8] show_stack+0x7c/0x194 (unreliable)
[c7a19f00] [c00f2608] ubifs_ro_mode+0x6c/0x7c
[c7a19f10] [c01000bc] do_commit+0x5fc/0x620
[c7a19f90] [c0100468] ubifs_bg_thread+0xf4/0x154
[c7a19fb0] [c0037cec] kthread+0x80/0x84
[c7a19ff0] [c000ef84] kernel_thread+0x4c/0x68

> Thanks!
> 
Thank you!

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: read_pnode: error -22 reading pnode at XX:YYYYY
  2011-05-24 11:43           ` Artem Bityutskiy
@ 2011-05-27 20:26             ` Rick Johnson
  2011-05-30 16:07               ` Artem Bityutskiy
  0 siblings, 1 reply; 11+ messages in thread
From: Rick Johnson @ 2011-05-27 20:26 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd

I had some other observations that I might as well note too.

We had added code to constantly check the value of pnode->num versus the 
  num returned from calc_pnode_num_from_parent().  We noticed errors 
mainly in two places: ubifs_pack_pnode() and ubifs_get_pnode().

The ubifs_get_pnode() was more interesting though.  We only got a 
corrupted 'num' value when the function returned the pnode from memory 
and not from flash.  In other words, when ubifs_get_pnode() followed 
this path:

branch = &parent->nbranch[iip];
pnode = branch->pnode;
if (pnode)
   return pnode;

So it seems like 'num' is getting corrupted or changed while the pnode 
is in memory.  If it being corrupted in memory, it's strange that 'num' 
is so consistently targeted and that the corruption is not more random.

Another thing we noticed was that the pnode's with the the corrupted 
'num' had lprops with LEB numbers between 7600 and 8010 (which is near 
the max of 8062 on our flash).  Maybe that's just a coincidence, but I 
might as well note it.

Thanks for any advice you might have!
Rick

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: read_pnode: error -22 reading pnode at XX:YYYYY
  2011-05-27 20:26             ` Rick Johnson
@ 2011-05-30 16:07               ` Artem Bityutskiy
  0 siblings, 0 replies; 11+ messages in thread
From: Artem Bityutskiy @ 2011-05-30 16:07 UTC (permalink / raw)
  To: Rick Johnson; +Cc: linux-mtd

On Fri, 2011-05-27 at 15:26 -0500, Rick Johnson wrote:
> I had some other observations that I might as well note too.
> 
> We had added code to constantly check the value of pnode->num versus the 
>   num returned from calc_pnode_num_from_parent().  We noticed errors 
> mainly in two places: ubifs_pack_pnode() and ubifs_get_pnode().
> 
> The ubifs_get_pnode() was more interesting though.  We only got a 
> corrupted 'num' value when the function returned the pnode from memory 
> and not from flash.  In other words, when ubifs_get_pnode() followed 
> this path:
> 
> branch = &parent->nbranch[iip];
> pnode = branch->pnode;
> if (pnode)
>    return pnode;
> 
> So it seems like 'num' is getting corrupted or changed while the pnode 
> is in memory.  If it being corrupted in memory, it's strange that 'num' 
> is so consistently targeted and that the corruption is not more random.

Well, there is some bug somewhere which corrupts memory, or this field,
I guess. I guess. I'd started putting more and more "targeted" checks at
different places and tried to narrow down the point when it gets
corrupted.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2011-05-30 16:12 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-04-19 21:08 read_pnode: error -22 reading pnode at XX:YYYYY Rick Johnson
2011-04-21 13:20 ` Artem Bityutskiy
2011-05-05 21:07   ` Rick Johnson
2011-05-06 18:32     ` Artem Bityutskiy
2011-05-09 17:41       ` Rick Johnson
2011-05-20 21:16       ` Rick Johnson
2011-05-24  8:41         ` Artem Bityutskiy
2011-05-24 11:43           ` Artem Bityutskiy
2011-05-27 20:26             ` Rick Johnson
2011-05-30 16:07               ` Artem Bityutskiy
2011-05-24 19:37           ` Rick Johnson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.