All of lore.kernel.org
 help / color / mirror / Atom feed
* ubifs: corrupted dirent (ENOENT), problably related to O_TMPFILE and linkat
@ 2021-07-22  7:53 sg
  2021-07-22  8:33 ` Richard Weinberger
  0 siblings, 1 reply; 6+ messages in thread
From: sg @ 2021-07-22  7:53 UTC (permalink / raw)
  To: linux-mtd
  Cc: Richard Weinberger, Artem Bityutskiy, Adrian Hunter, Zhihao Cheng

Hi,

we encountered an error in ubifs most likely when using O_TMPFILE and linkat.

The code (dir.c:ubifs_lookup) we ran into states

        /*
         * This should not happen. Probably the file-system needs
         * checking.
         */
I'd love to, but how? since recovery.c should have taken care of that before AFAIK


I wish I could give more elaborate information but this one is hard to track
and reproduce.

What has been tried so far and also was the first encounter

- systemd-coredump has been triggered (in our test-setup with kill -SIGABRT of
  some process)
- hardware power-cut (the timing when to poweroff has yet to be found)
- on replaying the journal the fs was mounted read only
- on later boots the fs was mounted properly but when accessing the respective
  directory an inode could not be found (with ls).
  Traced it down to ubifs_tnc_lookup.
- dead directory entry error lead to ro_mode

Following part of dmesg shows the error

   [   67.480858] UBIFS DBG gen (pid 241): dir ino 93, f_pos 0x0
   [   67.481634] UBIFS DBG gen (pid 241): ino 40735, new f_pos 0x295e98
   [   67.482024] UBIFS DBG gen (pid 241): ino 98493, new f_pos 0x9f9a65
   [   67.482753] UBIFS DBG gen (pid 241): ino 98146, new f_pos 0x18b5dd81
   [   67.483257] UBIFS DBG gen (pid 241): 'tmp' in dir ino 93
   [   67.486920] UBIFS DBG gen (pid 241): 'core.WPEWebProcess.0.e7c55980c230401d8e3941e72f56a95c.251.1530017543000000' in dir ino 93
   [   67.489412] UBIFS error (ubi0:2 pid 241): ubifs_iget: failed to read inode 98493, error -2


systemd-coredump opens a file with O_TMPFILE, writes to it and later calls
linkat()
With O_TMPFILE disabled in kernel the error does not occur and systemd uses an
alternate code path.

System 

  Linux 4.19.186 #26 SMP Wed Jul 21 13:19:08 CEST 2021 armv7l GNU/Linux

with following patches of which I thought they might solve the problem
  325b731983d010d358505bf2c40f5e0069af62d2
  dd7db149bcd914db8fdeb19cd5597c0740121bc7

/data/var is bind-mounted to /var the directory used by systemd-coredump.
/data is the actual mountpoint of ubi volume ubi0_2

  systemd version 239

Hardware
  i.MX6Q on a phyCore SOM with a SPANSION S34ML08G201TFI00 NAND flash

  The power-cut is done hard by pulling the CPU reset line.


Can you give me hints how to proceed on this?
- are there any patches from 5.14-rc2 I might have missed
- how to reproduce the error more reliable, ie. find the right time to cut power
  It is now at around 4s after issueing 'kill'. With a margin of 500ms
  and step width of 25ms
- with the mentioned above I might get my hands on more useful
  information than the blank tnc and journal spam with enabled DEBUG (had to set LOG_BUF to
  24 (16M) to capture/save all messages
- how to recover by hand
- might it be safe to just get rid of the inode in ubifs_readdir when it can not
  be found? In our particular case it is just a coredump an we don't care for 
  data loss as long as we can still use the rest of the volume/
  filesystem.
- any more information you require

@Zhihao Cheng I put you in CC for you seem to have some experience regarding
similar problems

Thanks in advance and regards


Sebastian Groß

-- 
B.Sc. Sebastian Groß, emlix GmbH, http://www.emlix.com
Fon +49 551 30664-0, Fax +49 551 30664-11,
Gothaer Platz 3, 37083 Göttingen, Germany
Sitz der Gesellschaft: Göttingen, Amtsgericht Göttingen HR B 3160
Geschäftsführung: Heike Jordan, Dr. Uwe Kracke
Ust-IdNr.: DE 205 198 055

emlix - your embedded linux partner

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: ubifs: corrupted dirent (ENOENT), problably related to O_TMPFILE and linkat
  2021-07-22  7:53 ubifs: corrupted dirent (ENOENT), problably related to O_TMPFILE and linkat sg
@ 2021-07-22  8:33 ` Richard Weinberger
  2021-07-22 10:47   ` Sebastian Gross
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Weinberger @ 2021-07-22  8:33 UTC (permalink / raw)
  To: sg; +Cc: linux-mtd, Artem Bityutskiy, Adrian Hunter, chengzhihao1

----- Ursprüngliche Mail -----
> Von: sg@emlix.com
> An: "linux-mtd" <linux-mtd@lists.infradead.org>
> CC: "richard" <richard@nod.at>, "Artem Bityutskiy" <dedekind1@gmail.com>, "Adrian Hunter" <adrian.hunter@intel.com>,
> "chengzhihao1" <chengzhihao1@huawei.com>
> Gesendet: Donnerstag, 22. Juli 2021 09:53:24
> Betreff: ubifs: corrupted dirent (ENOENT), problably related to O_TMPFILE and linkat

> Hi,
> 
> we encountered an error in ubifs most likely when using O_TMPFILE and linkat.
> 
> The code (dir.c:ubifs_lookup) we ran into states
> 
>        /*
>         * This should not happen. Probably the file-system needs
>         * checking.
>         */
> I'd love to, but how? since recovery.c should have taken care of that before
> AFAIK

UBIFS code sometimes contains a little portion of humor.
 
> 
> I wish I could give more elaborate information but this one is hard to track
> and reproduce.
> 
> What has been tried so far and also was the first encounter
> 
> - systemd-coredump has been triggered (in our test-setup with kill -SIGABRT of
>  some process)
> - hardware power-cut (the timing when to poweroff has yet to be found)
> - on replaying the journal the fs was mounted read only
> - on later boots the fs was mounted properly but when accessing the respective
>  directory an inode could not be found (with ls).
>  Traced it down to ubifs_tnc_lookup.
> - dead directory entry error lead to ro_mode
> 
> Following part of dmesg shows the error
> 
>   [   67.480858] UBIFS DBG gen (pid 241): dir ino 93, f_pos 0x0
>   [   67.481634] UBIFS DBG gen (pid 241): ino 40735, new f_pos 0x295e98
>   [   67.482024] UBIFS DBG gen (pid 241): ino 98493, new f_pos 0x9f9a65
>   [   67.482753] UBIFS DBG gen (pid 241): ino 98146, new f_pos 0x18b5dd81
>   [   67.483257] UBIFS DBG gen (pid 241): 'tmp' in dir ino 93
>   [   67.486920] UBIFS DBG gen (pid 241):
>   'core.WPEWebProcess.0.e7c55980c230401d8e3941e72f56a95c.251.1530017543000000' in
>   dir ino 93
>   [   67.489412] UBIFS error (ubi0:2 pid 241): ubifs_iget: failed to read inode
>   98493, error -2
> 
> 
> systemd-coredump opens a file with O_TMPFILE, writes to it and later calls
> linkat()
> With O_TMPFILE disabled in kernel the error does not occur and systemd uses an
> alternate code path.

In this area we had a lot of issues, simply because the concept of inode "rebirth"
was unknown to UBIFS.

> System
> 
>  Linux 4.19.186 #26 SMP Wed Jul 21 13:19:08 CEST 2021 armv7l GNU/Linux

This is new enough to contain ee1438ce5dc4 ("ubifs: Check link count of inodes when killing orphans.").
So it must be something different.

> 
> with following patches of which I thought they might solve the problem
>  325b731983d010d358505bf2c40f5e0069af62d2

You want this one definitely. It could explain the problem you see.
BTW: Applying this patch will not recover the corrupted fs. It just makes sure
that the corruption does not happen. 

>  dd7db149bcd914db8fdeb19cd5597c0740121bc7

This one is unrelated. In worst case you'll hit an UBIFS assert.

> /data/var is bind-mounted to /var the directory used by systemd-coredump.
> /data is the actual mountpoint of ubi volume ubi0_2
> 
>  systemd version 239
> 
> Hardware
>  i.MX6Q on a phyCore SOM with a SPANSION S34ML08G201TFI00 NAND flash
> 
>  The power-cut is done hard by pulling the CPU reset line.
> 
> 
> Can you give me hints how to proceed on this?
> - are there any patches from 5.14-rc2 I might have missed
> - how to reproduce the error more reliable, ie. find the right time to cut power
>  It is now at around 4s after issueing 'kill'. With a margin of 500ms
>  and step width of 25ms

Can you reproduce using a recent kernel?

> - with the mentioned above I might get my hands on more useful
>  information than the blank tnc and journal spam with enabled DEBUG (had to set
>  LOG_BUF to
>  24 (16M) to capture/save all messages
> - how to recover by hand

Without inspecting the filesystem this impossible to answer.
You have some recovery tool in mind?

> - might it be safe to just get rid of the inode in ubifs_readdir when it can not
>  be found? In our particular case it is just a coredump an we don't care for
>  data loss as long as we can still use the rest of the volume/
>  filesystem.

Yes. Maybe it makes sense to make UBIFS at this stage more forgiving instead of
the current situation.

Thanks,
//richard

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: ubifs: corrupted dirent (ENOENT), problably related to O_TMPFILE and linkat
  2021-07-22  8:33 ` Richard Weinberger
@ 2021-07-22 10:47   ` Sebastian Gross
  2021-07-27 21:06     ` Richard Weinberger
  0 siblings, 1 reply; 6+ messages in thread
From: Sebastian Gross @ 2021-07-22 10:47 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: linux-mtd, Artem Bityutskiy, Adrian Hunter, chengzhihao1

On 7/22/21 10:33 AM, Richard Weinberger wrote:
> ----- Ursprüngliche Mail -----
>> Von: sg@emlix.com
>> An: "linux-mtd" <linux-mtd@lists.infradead.org>
>> CC: "richard" <richard@nod.at>, "Artem Bityutskiy" <dedekind1@gmail.com>, "Adrian Hunter" <adrian.hunter@intel.com>,
>> "chengzhihao1" <chengzhihao1@huawei.com>
>> Gesendet: Donnerstag, 22. Juli 2021 09:53:24
>> Betreff: ubifs: corrupted dirent (ENOENT), problably related to O_TMPFILE and linkat
>> The code (dir.c:ubifs_lookup) we ran into states
>>
>>         /*
>>          * This should not happen. Probably the file-system needs
>>          * checking.
>>          */
>> I'd love to, but how? since recovery.c should have taken care of that before
>> AFAIK
> 
> UBIFS code sometimes contains a little portion of humor.
I appreciate that - in fact it kept me going digging into the problem the last two days =)

>> System
>>
>>   Linux 4.19.186 #26 SMP Wed Jul 21 13:19:08 CEST 2021 armv7l GNU/Linux
> 
> This is new enough to contain ee1438ce5dc4 ("ubifs: Check link count of inodes when killing orphans.").
Confirmed

> So it must be something different.
> 
Agreed

>>
>> with following patches of which I thought they might solve the problem
>>   325b731983d010d358505bf2c40f5e0069af62d2
> 
> You want this one definitely. It could explain the problem you see.
> BTW: Applying this patch will not recover the corrupted fs. It just makes sure
> that the corruption does not happen.
I am not so sure anymore but I think this one was already present when I reproduced the corruption for the third time.

> 
>>   dd7db149bcd914db8fdeb19cd5597c0740121bc7
> 
> This one is unrelated. In worst case you'll hit an UBIFS assert.
Acknowledged. I'll drop it.

>> Can you give me hints how to proceed on this?
>> - are there any patches from 5.14-rc2 I might have missed
>> - how to reproduce the error more reliable, ie. find the right time to cut power
>>   It is now at around 4s after issueing 'kill'. With a margin of 500ms
>>   and step width of 25ms
> 
> Can you reproduce using a recent kernel?
As stated I am already having a hard time to reproduce it as it is. The last time took me ~2000 runs.
But I will try anyway.
Can you give me a pointer when there might a good time/place to cut the power?
During the linkat or fsync call? Or maybe a certain function in the ubifs driver?

> 
>> - with the mentioned above I might get my hands on more useful
>>   information than the blank tnc and journal spam with enabled DEBUG (had to set
>>   LOG_BUF to
>>   24 (16M) to capture/save all messages
>> - how to recover by hand
> 
> Without inspecting the filesystem this impossible to answer.
> You have some recovery tool in mind?
No not particular a tool but by the method I suggested below.

> 
>> - might it be safe to just get rid of the inode in ubifs_readdir when it can not
>>   be found? In our particular case it is just a coredump an we don't care for
>>   data loss as long as we can still use the rest of the volume/
>>   filesystem.
> 
> Yes. Maybe it makes sense to make UBIFS at this stage more forgiving instead of
> the current situation.
ubifs_readdir had then to check if the inodes it got are actually be found by tnc and then remove it.
Seems quite hacky to me and not really upstream-worthy.
Are there other places where a "faulty" inode can be "discovered"?

Regards


Sebastian

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: ubifs: corrupted dirent (ENOENT), problably related to O_TMPFILE and linkat
  2021-07-22 10:47   ` Sebastian Gross
@ 2021-07-27 21:06     ` Richard Weinberger
  2021-08-12 12:01       ` Sebastian Gross
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Weinberger @ 2021-07-27 21:06 UTC (permalink / raw)
  To: Sebastian Gross
  Cc: Richard Weinberger, linux-mtd, Artem Bityutskiy, Adrian Hunter,
	chengzhihao1

On Thu, Jul 22, 2021 at 12:56 PM Sebastian Gross
<sebastian.gross@emlix.com> wrote:
> > Can you reproduce using a recent kernel?
> As stated I am already having a hard time to reproduce it as it is. The last time took me ~2000 runs.
> But I will try anyway.
> Can you give me a pointer when there might a good time/place to cut the power?
> During the linkat or fsync call? Or maybe a certain function in the ubifs driver?

I'd try first with linkat().
You can always emulate a power-cut by placing ubifs_ro_mode() calls at
code paths where you suspect
the bug.
But often it is more complicated. Especially when the bug is in the
recovery code when the filesystem
is mounted the next time.

> > Yes. Maybe it makes sense to make UBIFS at this stage more forgiving instead of
> > the current situation.
> ubifs_readdir had then to check if the inodes it got are actually be found by tnc and then remove it.
> Seems quite hacky to me and not really upstream-worthy.
> Are there other places where a "faulty" inode can be "discovered"?

If you enable /sys/kernel/debug/ubifs/chk_fs, it will do a full scan
upon mount time.
This can detect such problems. But not fix them.
I have some ideas how to turn this feature into a minimal repair tool
but this has to be
done with extreme care.

-- 
Thanks,
//richard

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: ubifs: corrupted dirent (ENOENT), problably related to O_TMPFILE and linkat
  2021-07-27 21:06     ` Richard Weinberger
@ 2021-08-12 12:01       ` Sebastian Gross
  2021-09-30  7:29         ` Richard Weinberger
  0 siblings, 1 reply; 6+ messages in thread
From: Sebastian Gross @ 2021-08-12 12:01 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Richard Weinberger, linux-mtd, Artem Bityutskiy, Adrian Hunter,
	chengzhihao1

Hi all,

On 7/27/21 11:06 PM, Richard Weinberger wrote:
> On Thu, Jul 22, 2021 at 12:56 PM Sebastian Gross
> <sebastian.gross@emlix.com> wrote:
>>> Can you reproduce using a recent kernel?
I have finally been able to look into it.
For v5.13 the error on replay no longer occurs. The tmp file in question is there and happy.

I was able to create a sandbox with qemu, generic Debian 10 rootfs and nandsim.
A simple reproducer was as stated before:
- open(O_TMPFILE)
- write (some MB; at least 32 seemed to suffice)
- linkat
After that the wbuf has been synced according to dmesg
{{{
UBIFS DBG io (pid 1681): synchronize
UBIFS DBG io (pid 1681): LEB 7775:1024, 392 bytes, jhead 1 (base)
UBIFS DBG lp (pid 1681): LEB 7775, free 14456, dirty 864, flags 16
UBIFS DBG lp (pid 1681): LEB 7775, free 14336, dirty 984, flags 16
UBIFS DBG lp (pid 1681): LEB 7775, free 14456, dirty 864, flags 16
}}}

Powercut has been simulated by dumping mtd0 at this stage.

This left me with a ready to (RE-)use NAND image.

> But often it is more complicated. Especially when the bug is in the
> recovery code when the filesystem
> is mounted the next time.
This seems to actually be the case.

> If you enable /sys/kernel/debug/ubifs/chk_fs, it will do a full scan
> upon mount time.
 > This can detect such problems. But not fix them.

Much appreciated, thanks. This did in fact discover the missing inode though the dirent was there (ls did work, ls -l not).

(On a related note the local ubifs_debug_info has also an entry for chk_fs but it will never be used :\)


With the method above I was able to "poor-man's" bisect the minor kernel versions.

The replay was fixed in v5.3. From there I bisected down to ee1438ce5dc4d67dd8dd1ff51583122a61f5bd9e.
This one might be a candidate to integrate into the LTS branches.

For 4.19.202 I had to pull in 988bec41318f3fa897e2f8af271bd456936d6caf also.


TYVM for your support so far

Kind regards


Sebastian

-- 
B.Sc. Sebastian Groß, emlix GmbH, http://www.emlix.com
Fon +49 551 30664-0, Fax +49 551 30664-11,
Gothaer Platz 3, 37083 Göttingen, Germany
Sitz der Gesellschaft: Göttingen, Amtsgericht Göttingen HR B 3160
Geschäftsführung: Heike Jordan, Dr. Uwe Kracke
Ust-IdNr.: DE 205 198 055

emlix - your embedded linux partner

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: ubifs: corrupted dirent (ENOENT), problably related to O_TMPFILE and linkat
  2021-08-12 12:01       ` Sebastian Gross
@ 2021-09-30  7:29         ` Richard Weinberger
  0 siblings, 0 replies; 6+ messages in thread
From: Richard Weinberger @ 2021-09-30  7:29 UTC (permalink / raw)
  To: Sebastian Gross
  Cc: Richard Weinberger, linux-mtd, Artem Bityutskiy, Adrian Hunter,
	chengzhihao1

Sebastian,

On Thu, Aug 12, 2021 at 2:01 PM Sebastian Gross
<sebastian.gross@emlix.com> wrote:
> > If you enable /sys/kernel/debug/ubifs/chk_fs, it will do a full scan
> > upon mount time.
>  > This can detect such problems. But not fix them.
>
> Much appreciated, thanks. This did in fact discover the missing inode though the dirent was there (ls did work, ls -l not).
>
> (On a related note the local ubifs_debug_info has also an entry for chk_fs but it will never be used :\)
>
>
> With the method above I was able to "poor-man's" bisect the minor kernel versions.
>
> The replay was fixed in v5.3. From there I bisected down to ee1438ce5dc4d67dd8dd1ff51583122a61f5bd9e.
> This one might be a candidate to integrate into the LTS branches.

Ohh. Right. In the past I assumed that a Fixes-Tag qualifies a patch
for stable trees, which did not work
always. :-(

> For 4.19.202 I had to pull in 988bec41318f3fa897e2f8af271bd456936d6caf also.

Be careful. This commit caused other problems that needed fixes. See
"Fixes:" tags.
I'm looking right now into rewriting ee1438 for 4.19.x such that it
can work without 988bec.

Thanks,
//richard

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-09-30  7:30 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-22  7:53 ubifs: corrupted dirent (ENOENT), problably related to O_TMPFILE and linkat sg
2021-07-22  8:33 ` Richard Weinberger
2021-07-22 10:47   ` Sebastian Gross
2021-07-27 21:06     ` Richard Weinberger
2021-08-12 12:01       ` Sebastian Gross
2021-09-30  7:29         ` Richard Weinberger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.