All of lore.kernel.org
 help / color / mirror / Atom feed
* UBIFS - strange behaviour after enabling chk_index
@ 2016-09-14  8:28 Krzeminski, Marcin (Nokia - PL/Wroclaw)
  2016-09-14 21:39 ` Richard Weinberger
  0 siblings, 1 reply; 5+ messages in thread
From: Krzeminski, Marcin (Nokia - PL/Wroclaw) @ 2016-09-14  8:28 UTC (permalink / raw)
  To: linux-mtd, Richard Weinberger

(Plain text now)

Hi Richard, list

During some power-cut tests I accidently came across strange case.
The test script is here (I could attach it this in thread later).

http://pastebin.com/daNrf6yG

Result of running this on HW and Qemu is the same, and it ends like below (logs from Qemu).

It does not reproduce if I use flash with 256KiB sectors (both Qemu and HW),
or do not enable index integrity checks.
Could it be real life case?

root@qemu:~# ./same_file.sh 
[   28.939927] ubi0: attaching mtd4
[   29.038279] ubi0: scanning is finished
[   29.038717] ubi0: empty MTD device detected
[   29.081212] device: 'ubi0': device_add
[   29.083301] ubi0: attached mtd4 (name "ubi", size 62 MiB)
[   29.083478] ubi0: PEB size: 65536 bytes (64 KiB), LEB size: 65408 bytes
[   29.084060] ubi0: min./max. I/O unit sizes: 1/256, sub-page size 1
[   29.084270] ubi0: VID header offset: 64 (aligned 64), data offset: 128
[   29.084452] ubi0: good PEBs: 996, bad PEBs: 0, corrupted PEBs: 0
[   29.084628] ubi0: user volume: 0, internal volumes: 1, max. volumes count: 128
[   29.084835] ubi0: max/mean erase counter: 0/0, WL threshold: 4096, image sequence number: 1271795826
[   29.085095] ubi0: available PEBs: 992, total reserved PEBs: 4, PEBs reserved for bad PEB handling: 0
[   29.104736] ubi0: background thread "ubi_bgt0d" started, PID 312
UBI device number 0, total 996 LEBs (65146368 bytes, 62.1 MiB), available 992 LEBs (64884736 [   29.268204] device: 'ubi0_0': device_add
[   29.292204] device: 'mtd7': device_add
[   29.324205] device: 'mtd7ro': device_add
[   29.326111] device: '31:7': device_add
[   29.326617] device: 'mtdblock7': device_add
bytes, 61.9 MiB), LEB size 65408 bytes (63.9 KiB)
Volume ID 0, size 161 LEBs (10530688 bytes[   29.556089] device: 'ubifs_0_0': device_add
[   29.627573] UBIFS (ubi0:0): default file-system created
[   29.653269] UBIFS (ubi0:0): background thread "ubifs_bgt0_0" started, PID 317
[   29.696687] UBIFS (ubi0:0): UBIFS: mounted UBI device 0, volume 0, name "ubifs_vol"
[   29.697085] UBIFS (ubi0:0): LEB size: 65408 bytes (63 KiB), min./max. I/O unit sizes: 8 bytes/256 bytes
[   29.697539] UBIFS (ubi0:0): FS size: 9876608 bytes (9 MiB, 151 LEBs), journal size 523265 bytes (0 MiB, 8 LEBs)
[   29.697999] UBIFS (ubi0:0): reserved for root: 466496 bytes (455 KiB)
[   29.698668] UBIFS (ubi0:0): media format: w4/r0 (latest is w4/r0), UUID 58BD7596-296F-4179-AED9-A57E457F961D, small LPT model
, 10.0 MiB), LEB size 65408 bytes (63.9 KiB), dynamic, name "ubifs_vol", alignment 1
Generat[   30.049759] random: nonblocking pool is initialized
ing file...
5+0 records in
5+0 records out
Copying same file 1...
Copying same file 2...
Copying same file 3...
Deleting files
Generating new file...
5+0 records in
5+0 records out
Copying new file...
[   63.056570] UBIFS error (ubi0:0 pid 325): ubifs_read_node: bad node type (255 but expected 9)
[   63.056934] UBIFS error (ubi0:0 pid 325): ubifs_read_node: bad node at LEB 10:0, LEB mapping status 0
[   63.057301] Not a node, first 24 bytes:
[   63.058093] 00000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff                          ........................
[   63.058720] CPU: 0 PID: 325 Comm: cp Not tainted 4.4.0 #20
[   63.058878] Hardware name: Nokia-Nahka (Device Tree Support)
[   63.060141] [<c0012515>] (unwind_backtrace) from [<c00100e3>] (show_stack+0xb/0xc)
[   63.060612] [<c00100e3>] (show_stack) from [<c0120501>] (dump_stack+0x51/0x64)
[   63.060809] [<c0120501>] (dump_stack) from [<c00e0add>] (ubifs_read_node+0x221/0x25c)
[   63.061016] [<c00e0add>] (ubifs_read_node) from [<c00e75dd>] (dbg_check_old_index+0x85/0x2f8)
[   63.061233] [<c00e75dd>] (dbg_check_old_index) from [<c00e7a37>] (do_commit+0x1e7/0x444)
[   63.061484] [<c00e7a37>] (do_commit) from [<c00d68d5>] (make_reservation+0x395/0x420)
[   63.061744] [<c00d68d5>] (make_reservation) from [<c00d7049>] (ubifs_jnl_write_data+0x139/0x244)
[   63.062052] [<c00d7049>] (ubifs_jnl_write_data) from [<c00d83ef>] (do_writepage+0x87/0x158)
[   63.062327] [<c00d83ef>] (do_writepage) from [<c005cc4f>] (__writepage+0xb/0x24)
[   63.062570] [<c005cc4f>] (__writepage) from [<c005d1b7>] (write_cache_pages+0x157/0x1e0)
[   63.062819] [<c005d1b7>] (write_cache_pages) from [<c005d261>] (generic_writepages+0x21/0x34)
[   63.063040] [<c005d261>] (generic_writepages) from [<c005833d>] (__filemap_fdatawrite_range+0x57/0x66)
[   63.063267] [<c005833d>] (__filemap_fdatawrite_range) from [<c00583f1>] (filemap_write_and_wait_range+0x21/0x4e)
[   63.063548] [<c00583f1>] (filemap_write_and_wait_range) from [<c00d8911>] (ubifs_fsync+0x53/0xa2)
[   63.064021] [<c00d8911>] (ubifs_fsync) from [<c0058ba3>] (generic_file_write_iter+0x183/0x1a0)
[   63.064204] [<c0058ba3>] (generic_file_write_iter) from [<c00d98ab>] (ubifs_write_iter+0xa7/0xfc)
[   63.064487] [<c00d98ab>] (ubifs_write_iter) from [<c0078ebd>] (vfs_iter_write+0x4f/0x6e)
[   63.064717] [<c0078ebd>] (vfs_iter_write) from [<c0091e25>] (iter_file_splice_write+0x149/0x1e8)
[   63.065065] [<c0091e25>] (iter_file_splice_write) from [<c0092041>] (direct_splice_actor+0x1d/0x24)
[   63.065433] [<c0092041>] (direct_splice_actor) from [<c0091a39>] (splice_direct_to_actor+0xb5/0x142)
[   63.065798] [<c0091a39>] (splice_direct_to_actor) from [<c0091b1b>] (do_splice_direct+0x55/0x76)
[   63.066159] [<c0091b1b>] (do_splice_direct) from [<c0079797>] (do_sendfile+0x10b/0x218)
[   63.066554] [<c0079797>] (do_sendfile) from [<c0079dc7>] (SyS_sendfile64+0xc5/0xca)
[   63.066881] [<c0079dc7>] (SyS_sendfile64) from [<c000d9a1>] (ret_fast_syscall+0x1/0x4c)
[   63.067389] UBIFS error (ubi0:0 pid 325): dbg_check_old_index: failed, error -22
[   63.067724] UBIFS error (ubi0:0 pid 325): do_commit: commit failed, error -22
[   63.068106] UBIFS warning (ubi0:0 pid 325): ubifs_ro_mode: switched to read-only mode, error -22
[   63.068473] CPU: 0 PID: 325 Comm: cp Not tainted 4.4.0 #20
[   63.068697] Hardware name: Nokia-Nahka (Device Tree Support)
[   63.068964] [<c0012515>] (unwind_backtrace) from [<c00100e3>] (show_stack+0xb/0xc)
[   63.069284] [<c00100e3>] (show_stack) from [<c0120501>] (dump_stack+0x51/0x64)
[   63.069598] [<c0120501>] (dump_stack) from [<c00e7c6f>] (do_commit+0x41f/0x444)
[   63.069917] [<c00e7c6f>] (do_commit) from [<c00d68d5>] (make_reservation+0x395/0x420)
[   63.070242] [<c00d68d5>] (make_reservation) from [<c00d7049>] (ubifs_jnl_write_data+0x139/0x244)
[   63.070606] [<c00d7049>] (ubifs_jnl_write_data) from [<c00d83ef>] (do_writepage+0x87/0x158)
[   63.070928] [<c00d83ef>] (do_writepage) from [<c005cc4f>] (__writepage+0xb/0x24)
[   63.071234] [<c005cc4f>] (__writepage) from [<c005d1b7>] (write_cache_pages+0x157/0x1e0)
[   63.071557] [<c005d1b7>] (write_cache_pages) from [<c005d261>] (generic_writepages+0x21/0x34)
[   63.071894] [<c005d261>] (generic_writepages) from [<c005833d>] (__filemap_fdatawrite_range+0x57/0x66)
[   63.072272] [<c005833d>] (__filemap_fdatawrite_range) from [<c00583f1>] (filemap_write_and_wait_range+0x21/0x4e)
[   63.072671] [<c00583f1>] (filemap_write_and_wait_range) from [<c00d8911>] (ubifs_fsync+0x53/0xa2)
[   63.073030] [<c00d8911>] (ubifs_fsync) from [<c0058ba3>] (generic_file_write_iter+0x183/0x1a0)
[   63.073284] [<c0058ba3>] (generic_file_write_iter) from [<c00d98ab>] (ubifs_write_iter+0xa7/0xfc)
[   63.073540] [<c00d98ab>] (ubifs_write_iter) from [<c0078ebd>] (vfs_iter_write+0x4f/0x6e)
[   63.073894] [<c0078ebd>] (vfs_iter_write) from [<c0091e25>] (iter_file_splice_write+0x149/0x1e8)
[   63.074096] [<c0091e25>] (iter_file_splice_write) from [<c0092041>] (direct_splice_actor+0x1d/0x24)
[   63.074307] [<c0092041>] (direct_splice_actor) from [<c0091a39>] (splice_direct_to_actor+0xb5/0x142)
[   63.074510] [<c0091a39>] (splice_direct_to_actor) from [<c0091b1b>] (do_splice_direct+0x55/0x76)
[   63.074707] [<c0091b1b>] (do_splice_direct) from [<c0079797>] (do_sendfile+0x10b/0x218)
[   63.074898] [<c0079797>] (do_sendfile) from [<c0079dc7>] (SyS_sendfile64+0xc5/0xca)
[   63.075085] [<c0079dc7>] (SyS_sendfile64) from [<c000d9a1>] (ret_fast_syscall+0x1/0x4c)
[   63.075366] UBIFS error (ubi0:0 pid 325): do_writepage: cannot write page 45 of inode 68, error -22
cp: write error: Read-only file system
New file cp failed
root@qemu:~# [   92.894358] UBIFS error (ubi0:0 pid 6): make_reservation: cannot reserve 4144 bytes in jhead 2, error -30
[   92.894802] UBIFS error (ubi0:0 pid 6): do_writepage: cannot write page 46 of inode 68, error -30
[   92.895396] UBIFS error (ubi0:0 pid 6): make_reservation: cannot reserve 160 bytes in jhead 1, error -30
[   92.895899] UBIFS error (ubi0:0 pid 6): ubifs_write_inode: can't write inode 68, error -30
[   97.894231] UBIFS error (ubi0:0 pid 6): make_reservation: cannot reserve 4144 bytes in jhead 2, error -30
[   97.894675] UBIFS error (ubi0:0 pid 6): do_writepage: cannot write page 47 of inode 68, error -30

Regards,
Marcin

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: UBIFS - strange behaviour after enabling chk_index
  2016-09-14  8:28 UBIFS - strange behaviour after enabling chk_index Krzeminski, Marcin (Nokia - PL/Wroclaw)
@ 2016-09-14 21:39 ` Richard Weinberger
  2016-09-15  6:33   ` Krzeminski, Marcin (Nokia - PL/Wroclaw)
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Weinberger @ 2016-09-14 21:39 UTC (permalink / raw)
  To: Krzeminski, Marcin (Nokia - PL/Wroclaw), linux-mtd

Marcin,

On 14.09.2016 10:28, Krzeminski, Marcin (Nokia - PL/Wroclaw) wrote:
> (Plain text now)
> 
> Hi Richard, list
> 
> During some power-cut tests I accidently came across strange case.
> The test script is here (I could attach it this in thread later).
> 
> http://pastebin.com/daNrf6yG
> 
> Result of running this on HW and Qemu is the same, and it ends like below (logs from Qemu).
> 
> It does not reproduce if I use flash with 256KiB sectors (both Qemu and HW),
> or do not enable index integrity checks.
> Could it be real life case?

Is this the SPI-NOR device you talked about on IRC?
Maybe the check is broken, maybe it uncovers a hard to hit issue
on your flash. Both could be.

Do you emulate exactly the same device in qemu?

Thanks,
//richard

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: UBIFS - strange behaviour after enabling chk_index
  2016-09-14 21:39 ` Richard Weinberger
@ 2016-09-15  6:33   ` Krzeminski, Marcin (Nokia - PL/Wroclaw)
  2016-09-16 15:59     ` Richard Weinberger
  0 siblings, 1 reply; 5+ messages in thread
From: Krzeminski, Marcin (Nokia - PL/Wroclaw) @ 2016-09-15  6:33 UTC (permalink / raw)
  To: Richard Weinberger, linux-mtd

Richad,

> -----Original Message-----
> From: Richard Weinberger [mailto:richard@nod.at]
> Sent: Wednesday, September 14, 2016 11:39 PM
> To: Krzeminski, Marcin (Nokia - PL/Wroclaw)
> <marcin.krzeminski@nokia.com>; linux-mtd@lists.infradead.org
> Subject: Re: UBIFS - strange behaviour after enabling chk_index
> 
> Marcin,
> 
> On 14.09.2016 10:28, Krzeminski, Marcin (Nokia - PL/Wroclaw) wrote:
> > (Plain text now)
> >
> > Hi Richard, list
> >
> > During some power-cut tests I accidently came across strange case.
> > The test script is here (I could attach it this in thread later).
> >
> > http://pastebin.com/daNrf6yG
> >
> > Result of running this on HW and Qemu is the same, and it ends like below
> (logs from Qemu).
> >
> > It does not reproduce if I use flash with 256KiB sectors (both Qemu
> > and HW), or do not enable index integrity checks.
> > Could it be real life case?
> 
> Is this the SPI-NOR device you talked about on IRC?
Yes.

> Maybe the check is broken, maybe it uncovers a hard to hit issue on your
> flash. Both could be.
> 
> Do you emulate exactly the same device in qemu?
Yes, this part of HW is virtualized in Qemu, in fact the same SW build runs on Qemu and HW.
Since it reproduce on both it allows to think that this is SW related (at least for now),
but sure does not mean that this is UBI/UBIFS.

Back to problem, it seem that this could be a debug facility issue, since file system and data
are not corrupted. If data on flash are correct (I have done many test and all passed,
so currently I assume that drier is working properly), it could be that data in debug structure
in this case are somehow outdated/bad (d->old_zroot).
In test script, if I copy the same file twice (instead of three times) the problem does not
reproduce. 
I will try to debug this more, maybe you have any inside where to look?

Thanks,
Marcin
> 
> Thanks,
> //richard

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: UBIFS - strange behaviour after enabling chk_index
  2016-09-15  6:33   ` Krzeminski, Marcin (Nokia - PL/Wroclaw)
@ 2016-09-16 15:59     ` Richard Weinberger
  2016-09-19  7:36       ` Krzeminski, Marcin (Nokia - PL/Wroclaw)
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Weinberger @ 2016-09-16 15:59 UTC (permalink / raw)
  To: Krzeminski, Marcin (Nokia - PL/Wroclaw), linux-mtd

Marcin,

On 15.09.2016 08:33, Krzeminski, Marcin (Nokia - PL/Wroclaw) wrote:
>> Is this the SPI-NOR device you talked about on IRC?
> Yes.

The one where you're facing issues after a power cut?

>> Maybe the check is broken, maybe it uncovers a hard to hit issue on your
>> flash. Both could be.
>>
>> Do you emulate exactly the same device in qemu?
> Yes, this part of HW is virtualized in Qemu, in fact the same SW build runs on Qemu and HW.
> Since it reproduce on both it allows to think that this is SW related (at least for now),
> but sure does not mean that this is UBI/UBIFS.
> 
> Back to problem, it seem that this could be a debug facility issue, since file system and data
> are not corrupted. If data on flash are correct (I have done many test and all passed,
> so currently I assume that drier is working properly), it could be that data in debug structure
> in this case are somehow outdated/bad (d->old_zroot).
> In test script, if I copy the same file twice (instead of three times) the problem does not
> reproduce. 
> I will try to debug this more, maybe you have any inside where to look?

I think chk_index exposes the power cut issue you told me on IRC.
For whatever reason the index corrupts on your flash. And chk_index
catches this very early.

Maybe there is an issue in the SPI-NOR code or UBI itself has trouble
with your flash (layout).

Thanks,
//richard

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: UBIFS - strange behaviour after enabling chk_index
  2016-09-16 15:59     ` Richard Weinberger
@ 2016-09-19  7:36       ` Krzeminski, Marcin (Nokia - PL/Wroclaw)
  0 siblings, 0 replies; 5+ messages in thread
From: Krzeminski, Marcin (Nokia - PL/Wroclaw) @ 2016-09-19  7:36 UTC (permalink / raw)
  To: Richard Weinberger, linux-mtd

Richard

> -----Original Message-----
> From: Richard Weinberger [mailto:richard@nod.at]
> Sent: Friday, September 16, 2016 5:59 PM
> To: Krzeminski, Marcin (Nokia - PL/Wroclaw)
> <marcin.krzeminski@nokia.com>; linux-mtd@lists.infradead.org
> Subject: Re: UBIFS - strange behaviour after enabling chk_index
> 
> Marcin,
> 
> On 15.09.2016 08:33, Krzeminski, Marcin (Nokia - PL/Wroclaw) wrote:
> >> Is this the SPI-NOR device you talked about on IRC?
> > Yes.
> 
> The one where you're facing issues after a power cut?
Yes, but more information is available now.
It is possible to reproduce missing master node case on 256KiB and 64KiB flash.
Also no need to power cut - eg. Watchdog reset can trigger it.
> 
> >> Maybe the check is broken, maybe it uncovers a hard to hit issue on
> >> your flash. Both could be.
> >>
> >> Do you emulate exactly the same device in qemu?
> > Yes, this part of HW is virtualized in Qemu, in fact the same SW build runs
> on Qemu and HW.
> > Since it reproduce on both it allows to think that this is SW related
> > (at least for now), but sure does not mean that this is UBI/UBIFS.
> >
> > Back to problem, it seem that this could be a debug facility issue,
> > since file system and data are not corrupted. If data on flash are
> > correct (I have done many test and all passed, so currently I assume
> > that drier is working properly), it could be that data in debug structure in
> this case are somehow outdated/bad (d->old_zroot).
> > In test script, if I copy the same file twice (instead of three times)
> > the problem does not reproduce.
> > I will try to debug this more, maybe you have any inside where to look?
> 
> I think chk_index exposes the power cut issue you told me on IRC.
> For whatever reason the index corrupts on your flash. And chk_index
> catches this very early.
In this case it is the UBIFS debug facility issue. Zroot is moving on flash,
but debug information is not updated - and at some point I am enabling debugs.
That simple change can fix this:

@@ -575,7 +594,7 @@ int dbg_check_old_index(struct ubifs_info *c, struct ubifs_zbranch *zroot)
        size_t sz;
 
        if (!dbg_is_chk_index(c))
-               return 0;
+               return dbg_old_index_check_init(c, zroot);
 
        INIT_LIST_HEAD(&list);

Obviously adding another read in commit is bad idea so it is feauture...

> 
> Maybe there is an issue in the SPI-NOR code or UBI itself has trouble with
> your flash (layout).
> 
It could be, it seem that it is easier to reproduce if UBIFS is placed on almost all flash area.

Thanks,
Marcin
> Thanks,
> //richard

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-09-19  7:36 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-14  8:28 UBIFS - strange behaviour after enabling chk_index Krzeminski, Marcin (Nokia - PL/Wroclaw)
2016-09-14 21:39 ` Richard Weinberger
2016-09-15  6:33   ` Krzeminski, Marcin (Nokia - PL/Wroclaw)
2016-09-16 15:59     ` Richard Weinberger
2016-09-19  7:36       ` Krzeminski, Marcin (Nokia - PL/Wroclaw)

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.