Linux-mtd Archive on lore.kernel.org
* ubifs: mounting fails due to error in orphan file handling
@ 2020-01-28 10:51 Driesen Jef (JDI)
  2020-02-05  8:22 ` Miquel Raynal
  0 siblings, 1 reply; 7+ messages in thread
From: Driesen Jef (JDI) @ 2020-01-28 10:51 UTC
  To: linux-mtd

[-- Attachment #1: Type: text/plain, Size: 3735 bytes --]

Hi,

We're experiencing some kind of file system corruption with the UBIFS 
file system after power cuts. The problem shows up as an error during mount:

# mount -t ubifs ubi0:home /home
mount: /home: special device ubi0:home does not exist.

The underlying UBI volumes are all fine:

# mtdinfo /dev/mtd0
mtd0
Name:                           ubi
Type:                           nand
Eraseblock size:                131072 bytes, 128.0 KiB
Amount of eraseblocks:          8192 (1073741824 bytes, 1024.0 MiB)
Minimum input/output unit size: 2048 bytes
Sub-page size:                  2048 bytes
OOB size:                       64 bytes
Character device major/minor:   90:0
Bad blocks are allowed:         true
Device is writable:             true

# ubinfo -a
UBI version:                    1
Count of UBI devices:           1
UBI control device major/minor: 10:58
Present UBI devices:            ubi0

ubi0
Volumes count:                           3
Logical eraseblock size:                 126976 bytes, 124.0 KiB
Total amount of logical eraseblocks:     8192 (1040187392 bytes, 992.0 MiB)
Amount of available logical eraseblocks: 0 (0 bytes)
Maximum count of volumes                 128
Count of bad physical eraseblocks:       0
Count of reserved physical eraseblocks:  160
Current maximum erase counter value:     36
Minimum input/output unit size:          2048 bytes
Character device major/minor:            246:0
Present volumes:                         0, 1, 2

Volume ID:   0 (on ubi0)
Type:        dynamic
Alignment:   1
Size:        2676 LEBs (339787776 bytes, 324.0 MiB)
State:       OK
Name:        rfs2
Character device major/minor: 246:1
-----------------------------------
Volume ID:   1 (on ubi0)
Type:        dynamic
Alignment:   1
Size:        2676 LEBs (339787776 bytes, 324.0 MiB)
State:       OK
Name:        rfs3
Character device major/minor: 246:2
-----------------------------------
Volume ID:   2 (on ubi0)
Type:        dynamic
Alignment:   1
Size:        2674 LEBs (339533824 bytes, 323.8 MiB)
State:       OK
Name:        home
Character device major/minor: 246:3


I already debugged the ubifs kernel module to locate where exactly the 
error is returned, and the call chain is:

ubifs_mount -> ubifs_fill_super -> mount_ubifs -> ubifs_mount_orphans -> 
kill_orphans -> do_kill_orphans -> ubifs_tnc_lookup -> ubifs_tnc_locate

The ubifs_tnc_locate function fails with -ENOENT because the 
ubifs_lookup_level0 function returns 0 (i.e. the key was not found in the index).

If I patch the mount_ubifs function to call ubifs_mount_orphans with 
zero for the unclean parameter (instead of the value of 
c->need_recovery), then the mounting succeeds. Afterwards, when 
rebooting once more with the original unpatched kernel, the file system 
appears to be fixed again, and mounting succeeds.

I'm not really sure what's going on under the hood, but it looks like a 
problem with the handling of the orphan files. With this knowledge, we 
are now able to reproduce the problem reliably by doing a power cut 
while running the attached script. The script creates many files in a 
loop, keeps them all open and removes them again. With this approach we 
hit the problem about once every two attempts.
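The attached ubifs.sh is not preserved in this archive. As a hedged illustration only, the following POSIX shell function reproduces the access pattern described above: it creates COUNT files, keeps each one open through a background `sleep` whose standard input is redirected from the file, unlinks all of them while they are still open (so each inode becomes an orphan), and only then closes them. The function name `make_orphans`, the file names and the `sleep 300` holder are assumptions made for this sketch, not details of the original script.

```shell
#!/bin/sh
# Hypothetical sketch of an orphan-creating workload; NOT the attached ubifs.sh.
# Usage: make_orphans DIR COUNT
make_orphans() {
    dir=$1
    count=$2
    pids=""

    # 1. Create COUNT files and keep each one open: every background
    #    sleep holds its file as standard input.
    i=0
    while [ "$i" -lt "$count" ]; do
        f="$dir/orphan.$i"
        echo "payload $i" > "$f"
        sleep 300 < "$f" &
        pids="$pids $!"
        i=$((i + 1))
    done

    # 2. Unlink every file while it is still open. The inodes now have a
    #    link count of zero but are still in use, so UBIFS records them
    #    as orphans.
    i=0
    while [ "$i" -lt "$count" ]; do
        rm -f "$dir/orphan.$i"
        i=$((i + 1))
    done
    sync

    # 3. Close the handles so the inodes can really be deleted.
    #    $pids is intentionally unquoted so it splits into separate PIDs.
    kill $pids 2>/dev/null
    wait $pids 2>/dev/null
    return 0
}
```

On the device this would be run repeatedly on the UBIFS mount, for example `while :; do make_orphans /home 500; done`, while power is cut at an arbitrary moment; the count of 500 is an arbitrary choice.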

The problem appeared for the first time after we switched from kernel 
v4.7 to v5.3. I tried with v5.4 and master too, in case we are hitting a 
problem that is already fixed, but they show the same problem. After 
doing some bisecting, this commit appears to have introduced the problem:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs/ubifs/orphan.c?id=ee1438ce5dc4d67dd8dd1ff51583122a61f5bd9e
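Once bisecting points at a suspect commit, a quick sanity check is whether the known-good and known-bad trees actually differ in containing it. A minimal sketch, assuming a POSIX shell run inside a kernel git checkout; `check_refs` is a hypothetical helper name, and the check itself is plain `git merge-base --is-ancestor`, which exits 0 when the first commit is an ancestor of the second.

```shell
#!/bin/sh
# Hypothetical helper: report, for each REF, whether it contains SUSPECT.
# Usage: check_refs SUSPECT REF...
# Must be run inside a git checkout (e.g. a Linux kernel tree).
check_refs() {
    suspect=$1
    shift
    for ref in "$@"; do
        # Exit status 0: SUSPECT is an ancestor of REF, i.e. REF contains it.
        # Exit status 1: it is not. Any other status is an error (for example
        # an unknown ref), which is also reported as "does not contain" here.
        if git merge-base --is-ancestor "$suspect" "$ref"; then
            echo "$ref: contains $suspect"
        else
            echo "$ref: does not contain $suspect"
        fi
    done
}
```

For example, `check_refs ee1438ce5dc4 v4.7 v5.3` from a kernel checkout would print one line per tag saying whether it contains the commit.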

How can we fix this?

Jef

[-- Attachment #2: ubifs.sh --]
[-- Type: application/x-shellscript, Size: 319 bytes --]

[-- Attachment #3: Type: text/plain, Size: 144 bytes --]

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

* Re: ubifs: mounting fails due to error in orphan file handling
  2020-01-28 10:51 ubifs: mounting fails due to error in orphan file handling Driesen Jef (JDI)
@ 2020-02-05  8:22 ` Miquel Raynal
  2020-02-05 15:25   ` Jef Driesen
  0 siblings, 1 reply; 7+ messages in thread
From: Miquel Raynal @ 2020-02-05  8:22 UTC
  To: Driesen Jef (JDI); +Cc: richard, linux-mtd

Hi Jef,

"Driesen Jef (JDI)" <Jef.Driesen@niko.eu> wrote on Tue, 28
Jan 2020 10:51:39 +0000:

> Hi,
> 
> We're experiencing some kind of file system corruption with the UBIFS 
> file system after power cuts. The problem shows up as an error during mount:
> ...
> 
> How can we fix this?

Just adding Richard to the loop; he is not available right now but
will probably be interested in this issue. On my side, I have no clue :)

Thanks,
Miquèl

* Re: ubifs: mounting fails due to error in orphan file handling
  2020-02-05  8:22 ` Miquel Raynal
@ 2020-02-05 15:25   ` Jef Driesen
  2020-02-05 16:17     ` Steve deRosier
  0 siblings, 1 reply; 7+ messages in thread
From: Jef Driesen @ 2020-02-05 15:25 UTC
  To: Miquel Raynal; +Cc: richard, linux-mtd

On 2/5/20 9:22 AM, Miquel Raynal wrote:
> "Driesen Jef (JDI)" <Jef.Driesen@niko.eu> wrote on Tue, 28
> Jan 2020 10:51:39 +0000:
>> ...
>>
>> I'm not really sure what's going on under the hood, but it looks like a
>> problem with the handling of the orphan files. With this knowledge, we
>> are now able to reproduce the problem reliably by doing a power cut
>> while running the attached script. The script creates many files in a
>> loop, keeps them all open and removes them again. With this approach we
>> hit the problem about once every two attempts.
>>
>> The problem appeared for the first time after we switched from kernel
>> v4.7 to v5.3. I tried with v5.4 and master too, in case we are hitting a
>> problem that is already fixed, but they show the same problem. After
>> doing some bisecting, this commit appears to have introduced the problem:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs/ubifs/orphan.c?id=ee1438ce5dc4d67dd8dd1ff51583122a61f5bd9e
>>
>> How can we fix this?
> 
> Just adding Richard into the loop, he is not available right now but
> will probably be interested by this issue. On my side, I have no clue :)

Thanks. If additional info is needed, or some extra testing is
necessary, just ask. I'm happy to help get this fixed.

For now, we have reverted the above commit. That appears to work (i.e.
no more devices that fail to boot), but I'm not convinced it's a good
long-term solution.

Jef

* Re: ubifs: mounting fails due to error in orphan file handling
  2020-02-05 15:25   ` Jef Driesen
@ 2020-02-05 16:17     ` Steve deRosier
  2020-02-07 10:18       ` Jef Driesen
  0 siblings, 1 reply; 7+ messages in thread
From: Steve deRosier @ 2020-02-05 16:17 UTC
  To: Jef Driesen; +Cc: Richard Weinberger, linux-mtd, Miquel Raynal

On Wed, Feb 5, 2020 at 7:25 AM Jef Driesen <jef.driesen@niko.eu> wrote:
>
> On 2/5/20 9:22 AM, Miquel Raynal wrote:
> > "Driesen Jef (JDI)" <Jef.Driesen@niko.eu> wrote on Tue, 28
> > Jan 2020 10:51:39 +0000:
> >> ...
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs/ubifs/orphan.c?id=ee1438ce5dc4d67dd8dd1ff51583122a61f5bd9e
> >>
> >> How can we fix this?
> >
> > Just adding Richard into the loop, he is not available right now but
> > will probably be interested by this issue. On my side, I have no clue :)
>
> Thanks. If additional info is needed, or some extra testing is
> necessary, just ask. I'm happy to help to get this fixed.
>
> For now, we have reverted the above commit. That appears to work (e.g.
> no more device that fail to boot), but I'm not convinced it's a good
> long-term solution.
>

Looking at the comment in the mentioned commit: "This corner case
needs to get addressed in the orphans subsystem too."

Was it addressed?  Was there a second commit for that?  If so, is it
in your tree?

Beyond that, no ideas, it's not a chunk of code I am familiar with.

- Steve

* Re: ubifs: mounting fails due to error in orphan file handling
  2020-02-05 16:17     ` Steve deRosier
@ 2020-02-07 10:18       ` Jef Driesen
  2020-02-07 11:04         ` Richard Weinberger
  0 siblings, 1 reply; 7+ messages in thread
From: Jef Driesen @ 2020-02-07 10:18 UTC
  To: Steve deRosier; +Cc: Richard Weinberger, linux-mtd, Miquel Raynal

On 2/5/20 5:17 PM, Steve deRosier wrote:
> Looking at the comment in the mentioned commit: "This corner case
> needs to get addressed in the orphans subsystem too."
> 
> Was it addressed?  Was there a second commit for that?  If so, is it
> in your tree?

I don't see anything relevant showing up with a quick:

git log ee1438ce5dc4d67dd8dd1ff51583122a61f5bd9e..master -- fs/ubifs/

The only fix that refers to that particular commit is this one:

commit 10256f000932f12596dc043cf880ecf488a32510
Author: Zhihao Cheng <chengzhihao1@huawei.com>
Date:   2019-10-29 20:58:23 +0800

     ubifs: do_kill_orphans: Fix a memory leak bug

     If there are more than one valid snod on the sleb->nodes list,
     do_kill_orphans will malloc ino more than once without releasing
     previous ino's memory. Finally, it will trigger memory leak.

     Fixes: ee1438ce5dc4 ("ubifs: Check link count of inodes when...")
     Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
     Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
     Signed-off-by: Richard Weinberger <richard@nod.at>

But that's about fixing a memory leak, and not the on-disk data.
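The search above relies on the Fixes: tag that kernel commits use to point back at the commit they repair. A hedged sketch of that search as a reusable function, assuming a POSIX shell and git, and assuming the caller passes at least the first 12 hex digits of the commit (the usual abbreviation in kernel Fixes: tags); `find_fixes` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: list commits in RANGE whose message carries a
# "Fixes: <sha>" tag pointing at COMMIT, optionally limited to PATHs.
# Usage: find_fixes COMMIT RANGE [PATH...]
# COMMIT must be given with at least 12 hex digits, since kernel Fixes:
# tags conventionally abbreviate the SHA-1 to 12 characters.
find_fixes() {
    short=$(printf '%s' "$1" | cut -c1-12)
    range=$2
    shift 2
    # -F: treat the --grep pattern as a fixed string, not a regex.
    git log --oneline -F --grep="Fixes: $short" "$range" -- "$@"
}
```

Here that would be `find_fixes ee1438ce5dc4d67dd8dd1ff51583122a61f5bd9e ee1438ce5dc4d67dd8dd1ff51583122a61f5bd9e..master fs/ubifs/`, which should list the memory-leak fix quoted above, since its message carries `Fixes: ee1438ce5dc4`. It only sees commits already in RANGE; a fix that so far exists only as a patch on the mailing list will not show up.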

> Beyond that, no ideas, it's not a chunk of code I am familiar with.

Me neither.

Jef

* Re: ubifs: mounting fails due to error in orphan file handling
  2020-02-07 10:18       ` Jef Driesen
@ 2020-02-07 11:04         ` Richard Weinberger
  2020-02-11 13:47           ` Jef Driesen
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Weinberger @ 2020-02-07 11:04 UTC
  To: Jef Driesen; +Cc: Steve deRosier, Richard Weinberger, linux-mtd, Miquel Raynal

On Fri, Feb 7, 2020 at 11:18 AM Jef Driesen <jef.driesen@niko.eu> wrote:
>
> On 2/5/20 5:17 PM, Steve deRosier wrote:
> > Looking at the comment in the mentioned commit: "This corner case
> > needs to get addressed in the orphans subsystem too."
> >
> > Was it addressed?  Was there a second commit for that?  If so, is it
> > in your tree?
>
> I don't see anything relevant showing up with a quick:
>
> git log ee1438ce5dc4d67dd8dd1ff51583122a61f5bd9e..master -- fs/ubifs/
>
> ...
>
> > Beyond that, no ideas, it's not a chunk of code I am familiar with.

I sent a fix for this before I started traveling:
[PATCH] ubifs: Fix ubifs_tnc_lookup() usage in do_kill_orphans()

Currently digging through all my mail...

-- 
Thanks,
//richard

* Re: ubifs: mounting fails due to error in orphan file handling
  2020-02-07 11:04         ` Richard Weinberger
@ 2020-02-11 13:47           ` Jef Driesen
  0 siblings, 0 replies; 7+ messages in thread
From: Jef Driesen @ 2020-02-11 13:47 UTC
  To: Richard Weinberger
  Cc: Steve deRosier, Richard Weinberger, linux-mtd, Miquel Raynal

On 2/7/20 12:04 PM, Richard Weinberger wrote:
> On Fri, Feb 7, 2020 at 11:18 AM Jef Driesen <jef.driesen@niko.eu> wrote:
>> On 2/5/20 5:17 PM, Steve deRosier wrote:
>>> Looking at the comment in the mentioned commit: "This corner case
>>> needs to get addressed in the orphans subsystem too."
>>>
>>> Was it addressed?  Was there a second commit for that?  If so, is it
>>> in your tree?
>>
>> I don't see anything relevant showing up with a quick:
>>
>> git log ee1438ce5dc4d67dd8dd1ff51583122a61f5bd9e..master -- fs/ubifs/
>>
>> ...
>>
>>> Beyond that, no ideas, it's not a chunk of code I am familiar with.
> 
> I sent a fix for this before I started traveling:
> [PATCH] ubifs: Fix ubifs_tnc_lookup() usage in do_kill_orphans()
> 
> Currently digging through all my mail...

I found your patch:

http://lists.infradead.org/pipermail/linux-mtd/2020-January/093390.html

I did some tests with it, and it appears to fix the problem for me!
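Because a patch picked up from the list gets a new SHA-1 in every tree it is applied to, checking whether a given tree already carries the fix is easier by subject line. A hedged sketch, assuming a POSIX shell and git; `has_patch` is a hypothetical helper, and matching on the exact subject will miss the patch if it was reworded when applied.

```shell
#!/bin/sh
# Hypothetical helper: succeed if some commit reachable from REV
# (default HEAD) has exactly SUBJECT as its subject line.
# Usage: has_patch SUBJECT [REV]
has_patch() {
    subject=$1
    rev=${2:-HEAD}
    # %s prints only the subject line of each commit; grep -F matches it
    # as a fixed string, -x requires the whole line to match, -q is quiet.
    git log --format=%s "$rev" | grep -Fqx -- "$subject"
}
```

For example: `has_patch 'ubifs: Fix ubifs_tnc_lookup() usage in do_kill_orphans()' && echo present`.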

Jef

Thread overview: 7+ messages
2020-01-28 10:51 ubifs: mounting fails due to error in orphan file handling Driesen Jef (JDI)
2020-02-05  8:22 ` Miquel Raynal
2020-02-05 15:25   ` Jef Driesen
2020-02-05 16:17     ` Steve deRosier
2020-02-07 10:18       ` Jef Driesen
2020-02-07 11:04         ` Richard Weinberger
2020-02-11 13:47           ` Jef Driesen
