linux-mtd.lists.infradead.org archive mirror
* UBIFS corruption in empty space during mount
@ 2020-10-29  4:48 Barak Adam
  2020-10-29 10:36 ` Richard Weinberger
  0 siblings, 1 reply; 6+ messages in thread
From: Barak Adam @ 2020-10-29  4:48 UTC (permalink / raw)
  To: linux-mtd

Hi all,

We are facing a kernel panic in our legacy switches, similar to one in the following post:

https://patchwork.ozlabs.org/project/linux-mtd/patch/loom.20120319T102527-948@post.gmane.org/


This corruption happens upon root FS mount and thus triggers a kernel panic upon system init.

System description:
=================

Our system is a legacy design based on the Marvell Cetus SoC with a raw 1 Gbit Micron NAND and 8-bit NAND ECC.

We are using UBIFS, Linux-3.10.70.

NAND driver is "armada-nand" by Marvell (mtd/nand/mvebu_nfc/nand_nfc.c),  based on the PXA drivers/mtd/nand/pxa3xx_nand.c.

Using a script of endless loop of power cycling, we get this panic:
========================================================

UBIFS error (pid 1): ubifs_scan: corrupt empty space at LEB 3:7571

UBIFS error (pid 1): ubifs_scanned_corruption: corruption at LEB 3:7571

UBIFS error (pid 1): ubifs_scanned_corruption: first 8192 bytes from LEB 3:7571

UBIFS error (pid 1): ubifs_scan: LEB 3 scanning failed

VFS: Cannot open root device "ubi0:root" or unknown-block(0,0): error -117

Please append a correct "root=" boot option; here are the available partitions:

Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

============================================================


I did read some of the posts about corruption of empty space for UBIFS.

Most of them recommend applying a fix on the lower layers, mtd or nand drivers.



In the past we had similar issues during recovery of the master node, and I applied the following commits:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=730a43fbc135e593cc3de3b1b895e49c05c8e2dc

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=40cbe6eee97b706f27bcc4c6aa1018bbe4f1e577

But now I think it is happening during mount, while UBIFS is replaying the journal, which is a different scenario.


As far as I understand, this is the call stack that now leads to the panic:

 [<c0015050>] (unwind_backtrace+0x0/0xf8) from [<c00115f4>] (show_stack+0x10/0x18)

[<c00115f4>] (show_stack+0x10/0x18) from [<c0196634>] (ubifs_scan+0x29c/0x378)

[<c0196634>] (ubifs_scan+0x29c/0x378) from [<c0196aa4>] (ubifs_replay_journal+0x104/0x1380)

[<c0196aa4>] (ubifs_replay_journal+0x104/0x1380) from [<c018caf8>] (ubifs_mount+0xe88/0x15c8)

[<c018caf8>] (ubifs_mount+0xe88/0x15c8) from [<c00a0830>] (mount_fs+0x14/0xc8)

[<c00a0830>] (mount_fs+0x14/0xc8) from [<c00b7620>] (vfs_kern_mount+0x4c/0xc4)

[<c00b7620>] (vfs_kern_mount+0x4c/0xc4) from [<c00b992c>] (do_mount+0x1ac/0x8e8)

[<c00b992c>] (do_mount+0x1ac/0x8e8) from [<c00ba0ec>] (SyS_mount+0x84/0xbc)

[<c00ba0ec>] (SyS_mount+0x84/0xbc) from [<c0674ee0>] (mount_block_root+0x104/0x22c)

[<c0674ee0>] (mount_block_root+0x104/0x22c) from [<c06751a4>] (prepare_namespace+0x90/0x194)

[<c06751a4>] (prepare_namespace+0x90/0x194) from [<c0674bf0>] (kernel_init_freeable+0x180/0x1c8)

[<c0674bf0>] (kernel_init_freeable+0x180/0x1c8) from [<c04de5e8>] (kernel_init+0x8/0x154)

[<c04de5e8>] (kernel_init+0x8/0x154) from [<c000dfd8>] (ret_from_fork+0x14/0x3c)


ubifs_scan (fs/ubifs) is called to scan the LEBs.

It detects the corrupted empty space, dumps the corruption messages shown above, and returns the -EUCLEAN error code (-117), which makes the root mount fail and the kernel panic.

 ubifs_scan:

--> calls ubifs_start_scan (fs/ubifs)

--> which calls ubifs_leb_read (fs/ubifs)

--> which calls ubi_read (mtd/ubi.h)

--> which calls ubi_leb_read (mtd/ubi)

ubi_leb_read calls lower layer nand driver functions but finally returns with -EBADMSG error code indicating that the MTD driver has detected a data integrity problem (unrecoverable ECC checksum mismatch in case of NAND).
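
[Illustration, not part of the original mail] A rough user-space model of the empty-space check described above, assuming that once the scanner reaches what should be empty space every remaining byte of the LEB must be 0xFF, and that the first byte which is not 0xFF gives the offset reported in "corrupt empty space at LEB x:y". The function name and the sample buffer are made up for this sketch; it is not the kernel code.

```python
def first_non_ff(buf: bytes, start: int = 0) -> int:
    """Return the offset of the first byte != 0xFF at or after start, or -1."""
    for offs in range(start, len(buf)):
        if buf[offs] != 0xFF:
            return offs
    return -1


# Hypothetical LEB image: 16 bytes of placeholder node data followed by
# 48 bytes of empty space, with one flipped bit at offset 40.
leb = bytearray(b"\x31\x18\x10\x06" * 4 + b"\xff" * 48)
leb[40] = 0xFB

corrupt_at = first_non_ff(bytes(leb), start=16)
if corrupt_at >= 0:
    print(f"corrupt empty space at offset {corrupt_at}")
```

A single flipped bit is enough to trip a check of this kind, which is why the behaviour of the layer below UBIFS on erased pages matters so much.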

 I am still debugging, looking for any solution / workaround.


Thanks !
Barak

Please see our privacy statement at https://www.adva.com/en/about-us/legal/privacy-statement for details of how ADVA processes personal information.

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/


* Re: UBIFS corruption in empty space during mount
  2020-10-29  4:48 UBIFS corruption in empty space during mount Barak Adam
@ 2020-10-29 10:36 ` Richard Weinberger
  2020-10-29 14:52   ` Barak Adam
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Weinberger @ 2020-10-29 10:36 UTC (permalink / raw)
  To: Barak Adam; +Cc: linux-mtd

Barak,

On Thu, Oct 29, 2020 at 5:54 AM Barak Adam <BAdam@adva.com> wrote:
> ubi_leb_read calls lower layer nand driver functions but finally returns with -EBADMSG error code indicating that the MTD driver has detected a data integrity problem (unrecoverable ECC checksum mismatch in case of NAND).

So, your driver is facing ECC errors while reading empty space?
This is something UBIFS does not expect.

If this is really the case, please double check, you can try to
backport the patches you've listed.
Or change the driver. In any case UBIFS does not want to see ECC
errors when reading erased pages.
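
[Illustration, not part of the original mail] One common way a NAND driver avoids reporting ECC errors on erased pages is to count the zero bits in a chunk whose ECC check failed and, if there are no more of them than the ECC strength, hand the upper layers a clean all-0xFF chunk plus a bitflip count. Later mainline kernels ship a helper for this, nand_check_erased_ecc_chunk(). The sketch below only models that policy in user space: the function names and the 512-byte chunk size are assumptions, the value 8 is taken from the "8 bit" ECC in the system description, and the ECC/OOB bytes that a real check would also inspect are left out.

```python
ECC_STRENGTH = 8   # correctable bits per chunk, from the system description
CHUNK_SIZE = 512   # assumed ECC chunk size, not stated in the thread


def zero_bits(data: bytes) -> int:
    """Count the bits that are 0, i.e. flipped away from the erased state."""
    return sum(8 - bin(b).count("1") for b in data)


def read_chunk(raw: bytes, ecc_failed: bool, strength: int = ECC_STRENGTH):
    """Model the erased-chunk policy.

    Returns (data, bitflips, uncorrectable).
    """
    if not ecc_failed:
        return raw, 0, False
    flips = zero_bits(raw)
    if flips <= strength:
        # Treat as an erased chunk with a few bitflips: report clean 0xFF.
        return b"\xff" * len(raw), flips, False
    # Too many zero bits to be an erased chunk: genuine uncorrectable error.
    return raw, 0, True


# Hypothetical erased chunk with two flipped bits.
chunk = bytearray(b"\xff" * CHUNK_SIZE)
chunk[10] = 0xFE
chunk[300] = 0x7F
data, flips, uncorrectable = read_chunk(bytes(chunk), ecc_failed=True)
```

With a policy like this the upper layers would only see -EBADMSG for chunks that really contain data which ECC cannot repair.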

-- 
Thanks,
//richard


* RE: UBIFS corruption in empty space during mount
  2020-10-29 10:36 ` Richard Weinberger
@ 2020-10-29 14:52   ` Barak Adam
  2020-10-29 21:19     ` Richard Weinberger
  0 siblings, 1 reply; 6+ messages in thread
From: Barak Adam @ 2020-10-29 14:52 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: linux-mtd

> Barak,
>
> On Thu, Oct 29, 2020 at 5:54 AM Barak Adam <BAdam@adva.com> wrote:
>> ubi_leb_read calls lower layer nand driver functions but finally returns with -EBADMSG error code indicating that the MTD driver has detected a data integrity problem (unrecoverable ECC checksum mismatch in case of NAND).
>
> So, your driver is facing ECC errors while reading empty space?
> This is something UBIFS does not expect.
>
> If this is really the case, please double check, you can try to backport the patches you've listed.
> Or change the driver. In any case UBIFS does not want to see ECC errors when reading erased pages.
>
> --
> Thanks,
> //richard


Thanks Richard.

I already applied those patches, but the issue is not yet covered / resolved.

What kind of fix should I apply at the driver layer? Raise ECC errors to the upper layers even when reading an empty / erased area?

Thanks,
Barak


* Re: UBIFS corruption in empty space during mount
  2020-10-29 14:52   ` Barak Adam
@ 2020-10-29 21:19     ` Richard Weinberger
  2020-11-02  6:06       ` Barak Adam
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Weinberger @ 2020-10-29 21:19 UTC (permalink / raw)
  To: Barak Adam; +Cc: linux-mtd

Barak,

----- Original Message -----
> From: "Barak Adam" <BAdam@adva.com>
> Thanks Richard.
> 
> I already applied those patches, but the issue is not yet covered / resolved.
> 
> What kind of a fix should I apply to the driver layer - raise ECC errors for the
> upper layers even when reading empty / erased area ?

Maybe you're facing something else. Did you verify that it is really flipped bits
in empty space?

Thanks,
//richard


* RE: UBIFS corruption in empty space during mount
  2020-10-29 21:19     ` Richard Weinberger
@ 2020-11-02  6:06       ` Barak Adam
  2020-11-02  7:44         ` Richard Weinberger
  0 siblings, 1 reply; 6+ messages in thread
From: Barak Adam @ 2020-11-02  6:06 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: linux-mtd

Hi Richard,

It is detected as an empty area by UBIFS. Regarding bitflips, I am not sure. I am running it with debug printouts, to dump the corruption and analyze it.
Any other idea how to detect if the underlying corruption is bitflips or something else?
The error returned by the lower layer is -EBADMSG, which I guess indicates an unrecoverable ECC error in the media.

Thanks,
Barak


* Re: UBIFS corruption in empty space during mount
  2020-11-02  6:06       ` Barak Adam
@ 2020-11-02  7:44         ` Richard Weinberger
  0 siblings, 0 replies; 6+ messages in thread
From: Richard Weinberger @ 2020-11-02  7:44 UTC (permalink / raw)
  To: Barak Adam; +Cc: linux-mtd

Barak,

----- Original Message -----
> From: "Barak Adam" <BAdam@adva.com>
> To: "richard" <richard@nod.at>
> CC: "linux-mtd" <linux-mtd@lists.infradead.org>
> Sent: Monday, 2 November 2020 07:06:25
> Subject: RE: UBIFS corruption in empty space during mount

> Hi Richard,
> 
> It is detected as an empty area by UBIFS. Regarding bitflips, I am not sure. I
> am running it with debug printouts, to dump the corruption and analyze it.
> Any other idea how to detect if the underlying corruption is bitflips or
> something else?

I think it is the other way around. UBIFS expects empty space but finds something else.
Find the message in UBIFS and dump the buffer to dmesg. :-)
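
[Illustration, not part of the original mail] Once the buffer has been dumped, a small script can help tell a few flipped bits from real data sitting in the supposedly empty space. The sketch below reports, per NAND page, how many bytes differ from 0xFF, how many bits are zero, and where the first and last bad bytes are. The function name, the report layout and the 2048-byte page size are assumptions; use the page size of the actual chip.

```python
def summarize_dump(buf: bytes, page_size: int = 2048):
    """Per-page statistics for a buffer that is expected to be all 0xFF."""
    report = []
    for page_start in range(0, len(buf), page_size):
        page = buf[page_start:page_start + page_size]
        bad = [i for i, b in enumerate(page) if b != 0xFF]
        if not bad:
            continue  # page is clean, nothing to report
        report.append({
            "page_offset": page_start,
            "bad_bytes": len(bad),
            "zero_bits": sum(8 - bin(page[i]).count("1") for i in bad),
            "first": page_start + bad[0],
            "last": page_start + bad[-1],
        })
    return report


# Hypothetical dump: one flipped bit in the first page and a 64-byte run
# of zeroes in the second page.
dump = bytearray(b"\xff" * 4096)
dump[100] = 0xFD
dump[2058:2122] = bytes(64)

for entry in summarize_dump(bytes(dump)):
    print(entry)
```

As a rule of thumb only: a handful of isolated zero bits in a page looks like bitflips, while a long dense run of non-0xFF bytes may point to a page that was partly programmed when power was cut, which would fit the power-cycling test in the first mail.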

Thanks,
//richard
