Linux-mtd Archive on lore.kernel.org
 help / color / Atom feed
* Re: ubifs: read bad node type in ubifs_tnc_read_wbuf
       [not found]     ` <46e2f6ca11a442f38d476d4346e8034c@shmbx04.spreadtrum.com>
@ 2020-01-12 23:48       ` Richard Weinberger
  2020-01-14  1:09         ` 李傲傲 (Carson Li1/9542)
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Weinberger @ 2020-01-12 23:48 UTC (permalink / raw)
  To: 李傲傲 (Carson Li1/9542); +Cc: linux-mtd

----- Ursprüngliche Mail -----
> Von: "李傲傲 (Carson Li1/9542)" <Carson.Li1@unisoc.com>
> An: "linux-mtd" <linux-mtd@lists.infradead.org>, "richard" <richard@nod.at>
> Gesendet: Freitag, 10. Januar 2020 08:46:38
> Betreff: FW: ubifs: read bad node type in ubifs_tnc_read_wbuf

> Hi mtd-list:

> I have a problem recently on ubifs: kernel reports to read a bad node type in

> ubifs_tnc_read_wbuf.

> I guess th e problem occurs because of tnc_mutex is freed in ubifs_tnc_locate.

> Though it marks the journal heads will not be GCed. But if there is a commit
> just

> finished, LEBs in journal heads will not be in bud rbtree anymore. And it might

> be GCed. Here is the kernel log:

> ------------------------------------------------kernel log
> ----------------------------------------------

> [ 30.072901] c0 UBIFS error (ubi0:23 pid 944): ubifs_read_node_wbuf: bad node
> type (232 but expected 0)

> [ 30.085890] c1 UBIFS error (ubi0:23 pid 944): ubifs_read_node_wbuf:
> wbuf=cd008f40,buf=d4cd9000,

> wbuf->buf=c90ec000,offs=164320, len=160, rlen=0

Do you have a test/reproducer for this issue?

Also see:
commit 601c0bc46753007be011b513ba4fc50ed8e30aef
Author: Adrian Hunter <ext-adrian.hunter@nokia.com>
Date:   Fri Aug 22 14:23:35 2008 +0300

    UBIFS: allow for racing between GC and TNC
    
    The TNC mutex is unlocked prematurely when reading leaf nodes
    with non-hashed keys.  This is unsafe because the node may be
    moved by garbage collection and the eraseblock unmapped, although
    that has never actually happened during stress testing.
    
    This patch fixes the flaw by detecting the race and retrying with
    the TNC mutex locked.
    
    Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>

It really looks like a journal is being garbage collected. Hmm.

Thanks,
//richard

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: ubifs: read bad node type in ubifs_tnc_read_wbuf
  2020-01-12 23:48       ` ubifs: read bad node type in ubifs_tnc_read_wbuf Richard Weinberger
@ 2020-01-14  1:09         ` 李傲傲 (Carson Li1/9542)
  2020-01-14  8:09           ` Richard Weinberger
  0 siblings, 1 reply; 8+ messages in thread
From: 李傲傲 (Carson Li1/9542) @ 2020-01-14  1:09 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: linux-mtd

On Monday, January 13, 2020 7:49 AM Richard <richard@nod.at> wrote:
>
>Do you have a test/reproducer for this issue?
>
I find out this issue from cases left over which used to happen frequently.
But when it comes to me, an one month's stress testing doesn't reproduce it.
it is only occasionally found on products that returned by costumer,but
with no logs added before to analysis it.


The kernel log I pasted before is one of the previous spot, including some
logs added to analysis it.
There are two more cases with the same appearance:
----------------------------------kernel log----------------------------------
[   30.293468] c0 UBIFS error (ubi0:23 pid 943): ubifs_read_node_wbuf: expected node type 0
[   30.304036] c1 UBIFS error (ubi0:23 pid 943): ubifs_read_node_wbuf: wbuf=c901b140,
buf=d5167000, wbuf->buf=c90ed000,offs=210744, len=160, rlen=0
[   30.354902] c1 UBIFS error (ubi0:23 pid 943): ubifs_read_node_wbuf: wxf bad node at LEB 54:210744
[   30.367063] c0 Not a node, first 24 bytes:
[   30.371204] c1 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   30.387611] c1 Kernel panic - not syncing: ch address is d5167000
[   30.398825] c1 CPU: 1 PID: 943 Comm: http-thread Tainted: G           O    4.4.83 #1
[   30.398836] c1 Hardware name: Generic DT based system
[   30.398869] c1 [<c001624c>] (unwind_backtrace) from [<c0012f8c>] (show_stack+0x10/0x14)
[   30.398885] c1 [<c0012f8c>] (show_stack) from [<c0287188>] (dump_stack+0x80/0xa0)
[   30.398900] c1 [<c0287188>] (dump_stack) from [<c00acff0>] (panic+0x70/0x1e4)
[   30.398919] c1 [<c00acff0>] (panic) from [<c01e3274>] (ubifs_dump_node+0x6c/0x9a0)
[   30.398938] c1 [<c01e3274>] (ubifs_dump_node) from [<c01c3200>] (ubifs_read_node_wbuf+0x350/0x384)
[   30.398951] c1 [<c01c3200>] (ubifs_read_node_wbuf) from [<c01e0d28>] (ubifs_tnc_read_node+0x54/0x214)
[   30.398965] c1 [<c01e0d28>] (ubifs_tnc_read_node) from [<c01c65ac>] (ubifs_tnc_locate+0x118/0x1b4)
[   30.398979] c1 [<c01c65ac>] (ubifs_tnc_locate) from [<c01bd2c8>] (ubifs_iget+0xb8/0x68c)
[   30.398990] c1 [<c01bd2c8>] (ubifs_iget) from [<c01b9b90>] (ubifs_lookup+0x1b4/0x258)
[   30.399005] c1 [<c01b9b90>] (ubifs_lookup) from [<c00f8134>] (lookup_real+0x30/0x4c)
[   30.399018] c1 [<c00f8134>] (lookup_real) from [<c00f8e68>] (__lookup_hash+0x34/0x3c)
[   30.399029] c1 [<c00f8e68>] (__lookup_hash) from [<c00f8f5c>] (walk_component+0xec/0x2a0)
[   30.399041] c1 [<c00f8f5c>] (walk_component) from [<c00fa600>] (path_lookupat+0x80/0xfc)
[   30.399053] c1 [<c00fa600>] (path_lookupat) from [<c00fd358>] (filename_lookup+0x5c/0xfc)
[   30.399068] c1 [<c00fd358>] (filename_lookup) from [<c00f4140>] (vfs_fstatat+0x4c/0x9c)
[   30.399082] c1 [<c00f4140>] (vfs_fstatat) from [<c00f4724>] (SyS_stat64+0x14/0x30)
[   30.399099] c1 [<c00f4724>] (SyS_stat64) from [<c000f8c0>] (ret_fast_syscall+0x0/0x34)
----------------------------------kernel log----------------------------------

---------------------zbranch passed to ubifs_tnc_read_node----------------
crash_arm> ubifs_zbranch c68cfd48
struct ubifs_zbranch {
  key = {
    u8 = "\203\003\000\000\000\000\000",
    u32 = {899, 0},
    u64 = {899},
    j32 = {899, 0}
  },
  {
    znode = 0x0,
    leaf = 0x0
  },
  lnum = 54, //DATAHD
  offs = 210744,
  len = 160
}
----------------------------zbranch found in TNC--------------------------
crash_arm> ubifs_zbranch  0xc8ef9230
struct ubifs_zbranch {
  key = {
    u8 = "\203\003\000\000\000\000\000",
    u32 = {899, 0},
    u64 = {899},
    j32 = {899, 0}
  },
  {
    znode = 0x0,
    leaf = 0x0
  },
  lnum = 224, //GCHD
  offs = 226240,
  len = 160
}

crash_arm> ubifs_jhead.wbuf.lnum 0xc901b000
  wbuf.lnum = 224, //GCHD
crash_arm> ubifs_jhead.wbuf.lnum 0xc901b0a0
  wbuf.lnum = 53,//BASEHD
crash_arm> ubifs_jhead.wbuf.lnum 0xc901b140
  wbuf.lnum = 54, //DATAHD

---------------------------------gced_lnum---------------------------------
crash_arm> ubifs_info.gced_lnum 0xc90ea000
  gced_lnum = 54

----------------------------------kernel log----------------------------------
[   29.979799] c1 UBIFS error (ubi0:23 pid 887): ubifs_read_node_wbuf: bad node type (3 but expected 1)
[   29.991550] c0 UBIFS error (ubi0:23 pid 887): ubifs_read_node_wbuf: wxf read times error:
wbuf=cd00b140,buf=d4a69000, wbuf->buf=c9198000,offs=250888, len=103, rlen=0
[   30.022582] c0 UBIFS error (ubi0:23 pid 887): ubifs_read_node_wbuf: wxf bad node at LEB 272:250888
[   30.033056] c0 Not a node, first 24 bytes:
[   30.036644] c0 00000000: 40 19 2a 00 00 81 20 01 02 00 60 23 50 1c 70 09 01 00 00 14 03 b9 00 14
[   30.052581] c0 Kernel panic - not syncing: ch address is d4a69000
[   30.063283] c0 CPU: 0 PID: 887 Comm: Binder_2 Tainted: G           O    4.4.83 #1
[   30.063295] c0 Hardware name: Generic DT based system
[   30.063330] c0 [<c001624c>] (unwind_backtrace) from [<c0012f8c>] (show_stack+0x10/0x14)
[   30.063347] c0 [<c0012f8c>] (show_stack) from [<c0287188>] (dump_stack+0x80/0xa0)
[   30.063363] c0 [<c0287188>] (dump_stack) from [<c00acff0>] (panic+0x70/0x1e4)
[   30.063382] c0 [<c00acff0>] (panic) from [<c01e3274>] (ubifs_dump_node+0x6c/0x9a0)
[   30.063399] c0 [<c01e3274>] (ubifs_dump_node) from [<c01c3200>] (ubifs_read_node_wbuf+0x350/0x384)
[   30.063412] c0 [<c01c3200>] (ubifs_read_node_wbuf) from [<c01e0d28>] (ubifs_tnc_read_node+0x54/0x214)
[   30.063426] c0 [<c01e0d28>] (ubifs_tnc_read_node) from [<c01c65ac>] (ubifs_tnc_locate+0x118/0x1b4)
[   30.063440] c0 [<c01c65ac>] (ubifs_tnc_locate) from [<c01b8170>] (do_readpage+0x1e0/0x494)
[   30.063452] c0 [<c01b8170>] (do_readpage) from [<c01b9838>] (ubifs_readpage+0x414/0x464)
[   30.063466] c0 [<c01b9838>] (ubifs_readpage) from [<c00afe74>] (generic_file_read_iter+0x33c/0x604)
[   30.063483] c0 [<c00afe74>] (generic_file_read_iter) from [<c00eff88>] (__vfs_read+0xa4/0xcc)
[   30.063496] c0 [<c00eff88>] (__vfs_read) from [<c00f05cc>] (vfs_read+0x84/0x134)
[   30.063507] c0 [<c00f05cc>] (vfs_read) from [<c00f0ed4>] (SyS_pread64+0x54/0x88)
[   30.063522] c0 [<c00f0ed4>] (SyS_pread64) from [<c000f8c0>] (ret_fast_syscall+0x0/0x34
----------------------------------kernel log----------------------------------

---------------------zbranch passed to ubifs_tnc_read_node----------------
struct ubifs_zbranch {
  key = {
    u8 = "-\002\000\000\004\000\000 ",
    u32 = {557, 536870916},
    u64 = {2305843026393563693},
    j32 = {557, 536870916}
  },
  {
    znode = 0x0,
    leaf = 0x0
  },
  lnum = 272, //2 DATAHD
  offs = 250888,
  len = 103
}
----------------------------zbranch found in TNC--------------------------
crash_arm> ubifs_zbranch 0xc646a1A8
struct ubifs_zbranch {
  key = {
    u8 = "-\002\000\000\004\000\000 ",
    u32 = {557, 536870916},
    u64 = {2305843026393563693},
    j32 = {557, 536870916}
  },
  {
    znode = 0x0,
    leaf = 0x0
  },
  lnum = 265,  //0 GCHD
  offs = 164872,
  len = 103
}

crash_arm> ubifs_jhead.wbuf.lnum 0xcd00b000
  wbuf.lnum = 265, //0 GCHD
crash_arm> ubifs_jhead.wbuf.lnum 0xcd00b0a0
  wbuf.lnum = 686,//1 BASEHD
crash_arm> ubifs_jhead.wbuf.lnum 0xcd00b140
  wbuf.lnum = 272, //2 DATAHD

---------------------------------gced_lnum---------------------------------
crash_arm> ubifs_info.gced_lnum 0xc90ea000
  gced_lnum = 272,

Thanks,
Carson
________________________________
 This email (including its attachments) is intended only for the person or entity to which it is addressed and may contain information that is privileged, confidential or otherwise protected from disclosure. Unauthorized use, dissemination, distribution or copying of this email or the information herein or taking any action in reliance on the contents of this email or the information herein, by anyone other than the intended recipient, or an employee or agent responsible for delivering the message to the intended recipient, is strictly prohibited. If you are not the intended recipient, please do not read, copy, use or disclose any part of this e-mail to others. Please notify the sender immediately and permanently delete this e-mail and any attachments if you received it in error. Internet communications cannot be guaranteed to be timely, secure, error-free or virus-free. The sender does not accept liability for any errors or omissions.
本邮件及其附件具有保密性质,受法律保护不得泄露,仅发送给本邮件所指特定收件人。严禁非经授权使用、宣传、发布或复制本邮件或其内容。若非该特定收件人,请勿阅读、复制、 使用或披露本邮件的任何内容。若误收本邮件,请从系统中永久性删除本邮件及所有附件,并以回复邮件的方式即刻告知发件人。无法保证互联网通信及时、安全、无误或防毒。发件人对任何错漏均不承担责任。
______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: ubifs: read bad node type in ubifs_tnc_read_wbuf
  2020-01-14  1:09         ` 李傲傲 (Carson Li1/9542)
@ 2020-01-14  8:09           ` Richard Weinberger
  2020-01-14  8:28             ` 李傲傲 (Carson Li1/9542)
  2020-01-14 11:01             ` 李傲傲 (Carson Li1/9542)
  0 siblings, 2 replies; 8+ messages in thread
From: Richard Weinberger @ 2020-01-14  8:09 UTC (permalink / raw)
  To: 李傲傲; +Cc: linux-mtd

----- Ursprüngliche Mail -----
> Von: "李傲傲" <Carson.Li1@unisoc.com>
> An: "richard" <richard@nod.at>
> CC: "linux-mtd" <linux-mtd@lists.infradead.org>
> Gesendet: Dienstag, 14. Januar 2020 02:09:54
> Betreff: RE: ubifs: read bad node type in ubifs_tnc_read_wbuf

> On Monday, January 13, 2020 7:49 AM Richard <richard@nod.at> wrote:
>>
>>Do you have a test/reproducer for this issue?
>>
> I find out this issue from cases left over which used to happen frequently.
> But when it comes to me, an one month's stress testing doesn't reproduce it.
> it is only occasionally found on products that returned by costumer,but
> with no logs added before to analysis it.
> 
> 
> The kernel log I pasted before is one of the previous spot, including some
> logs added to analysis it.
> There are two more cases with the same appearance:

Is it always DATAHD which is being garbage collected?

Thanks,
//richard

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: ubifs: read bad node type in ubifs_tnc_read_wbuf
  2020-01-14  8:09           ` Richard Weinberger
@ 2020-01-14  8:28             ` 李傲傲 (Carson Li1/9542)
  2020-01-14 11:01             ` 李傲傲 (Carson Li1/9542)
  1 sibling, 0 replies; 8+ messages in thread
From: 李傲傲 (Carson Li1/9542) @ 2020-01-14  8:28 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: linux-mtd

On 14. Januar 2020 02:09:54 AM Richard <richard@nod.at> wrote:
>
>Is it always DATAHD which is being garbage collected?
>
In those three cases, it is always a LEB being garbage collected and then
used as DATAHD. I think it might because of the GCHD also has wbuf, so it
can make ubifs_get_wbuf returns true, while the other goes directly to
fallible_read_node in ubifs_tnc_locate.


Thanks.
Carson
________________________________
 This email (including its attachments) is intended only for the person or entity to which it is addressed and may contain information that is privileged, confidential or otherwise protected from disclosure. Unauthorized use, dissemination, distribution or copying of this email or the information herein or taking any action in reliance on the contents of this email or the information herein, by anyone other than the intended recipient, or an employee or agent responsible for delivering the message to the intended recipient, is strictly prohibited. If you are not the intended recipient, please do not read, copy, use or disclose any part of this e-mail to others. Please notify the sender immediately and permanently delete this e-mail and any attachments if you received it in error. Internet communications cannot be guaranteed to be timely, secure, error-free or virus-free. The sender does not accept liability for any errors or omissions.
本邮件及其附件具有保密性质,受法律保护不得泄露,仅发送给本邮件所指特定收件人。严禁非经授权使用、宣传、发布或复制本邮件或其内容。若非该特定收件人,请勿阅读、复制、 使用或披露本邮件的任何内容。若误收本邮件,请从系统中永久性删除本邮件及所有附件,并以回复邮件的方式即刻告知发件人。无法保证互联网通信及时、安全、无误或防毒。发件人对任何错漏均不承担责任。
______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: ubifs: read bad node type in ubifs_tnc_read_wbuf
  2020-01-14  8:09           ` Richard Weinberger
  2020-01-14  8:28             ` 李傲傲 (Carson Li1/9542)
@ 2020-01-14 11:01             ` 李傲傲 (Carson Li1/9542)
  2020-01-14 23:16               ` Richard Weinberger
  1 sibling, 1 reply; 8+ messages in thread
From: 李傲傲 (Carson Li1/9542) @ 2020-01-14 11:01 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: linux-mtd

Hi Richard:
>---------------------zbranch passed to ubifs_tnc_read_node----------------
>crash_arm> ubifs_zbranch c68cfd48
>struct ubifs_zbranch {
>  key = {
>    u8 = "\203\003\000\000\000\000\000",
>    u32 = {899, 0},
>    u64 = {899},
>    j32 = {899, 0}
>  },
>  {
>    znode = 0x0,
>    leaf = 0x0
>  },
>  lnum = 54, //DATAHD
>  offs = 210744,
>  len = 160
>}
>----------------------------zbranch found in TNC--------------------------
>crash_arm> ubifs_zbranch  0xc8ef9230 struct ubifs_zbranch {
>  key = {
>    u8 = "\203\003\000\000\000\000\000",
>    u32 = {899, 0},
>    u64 = {899},
>    j32 = {899, 0}
>  },
>  {
>    znode = 0x0,
>    leaf = 0x0
>  },
>  lnum = 224, //GCHD
>  offs = 226240,
>  len = 160
>}

>crash_arm> ubifs_jhead.wbuf.lnum 0xc901b000
> wbuf.lnum = 224, //GCHD
>crash_arm> ubifs_jhead.wbuf.lnum 0xc901b0a0
>  wbuf.lnum = 53,//BASEHD
>crash_arm> ubifs_jhead.wbuf.lnum 0xc901b140
>  wbuf.lnum = 54, //DATAHD

I am not sure if i have made it clear, maybe following logic can make it clearer:

A  |          B
--------------------------------------------------------------------------------------------
ubifs_tnc_locate
   zbr->lnum = 54 (find in TNC)
     ubifs_get_wbuf(zbr->lnum = 54) is ture
          ubifs_tnc_read_node
                         ->GC(change zt->lnum to 224(GCHD) in _TNC_)
                         ->zbr->lnum = 54 becomes DATAHD
ubifs_get_wbuf(zbr->lnum = 54 as the DATAHD) is ture again
     ubifs_read_node_wbuf
-------------------------------------------------------------------------------------------
Data lays in LEB54 have been moved to LEB224, so error happens. if a journal head is
garbage collected, if that means there is a commit has finished? I guess this situation
happens when making reservation, runs commit first and then does garbage collect.

Thanks.
Carson.
________________________________
 This email (including its attachments) is intended only for the person or entity to which it is addressed and may contain information that is privileged, confidential or otherwise protected from disclosure. Unauthorized use, dissemination, distribution or copying of this email or the information herein or taking any action in reliance on the contents of this email or the information herein, by anyone other than the intended recipient, or an employee or agent responsible for delivering the message to the intended recipient, is strictly prohibited. If you are not the intended recipient, please do not read, copy, use or disclose any part of this e-mail to others. Please notify the sender immediately and permanently delete this e-mail and any attachments if you received it in error. Internet communications cannot be guaranteed to be timely, secure, error-free or virus-free. The sender does not accept liability for any errors or omissions.
本邮件及其附件具有保密性质,受法律保护不得泄露,仅发送给本邮件所指特定收件人。严禁非经授权使用、宣传、发布或复制本邮件或其内容。若非该特定收件人,请勿阅读、复制、 使用或披露本邮件的任何内容。若误收本邮件,请从系统中永久性删除本邮件及所有附件,并以回复邮件的方式即刻告知发件人。无法保证互联网通信及时、安全、无误或防毒。发件人对任何错漏均不承担责任。
______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: ubifs: read bad node type in ubifs_tnc_read_wbuf
  2020-01-14 11:01             ` 李傲傲 (Carson Li1/9542)
@ 2020-01-14 23:16               ` Richard Weinberger
  2020-01-16  0:54                 ` 李傲傲 (Carson Li1/9542)
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Weinberger @ 2020-01-14 23:16 UTC (permalink / raw)
  To: 李傲傲; +Cc: linux-mtd

----- Ursprüngliche Mail -----
> I am not sure if i have made it clear, maybe following logic can make it
> clearer:
> 
> A  |          B
> --------------------------------------------------------------------------------------------
> ubifs_tnc_locate
>   zbr->lnum = 54 (find in TNC)
>     ubifs_get_wbuf(zbr->lnum = 54) is ture
>          ubifs_tnc_read_node
>                         ->GC(change zt->lnum to 224(GCHD) in _TNC_)
>                         ->zbr->lnum = 54 becomes DATAHD
> ubifs_get_wbuf(zbr->lnum = 54 as the DATAHD) is ture again
>     ubifs_read_node_wbuf
> -------------------------------------------------------------------------------------------
> Data lays in LEB54 have been moved to LEB224, so error happens. if a journal
> head is
> garbage collected, if that means there is a commit has finished? I guess this
> situation
> happens when making reservation, runs commit first and then does garbage
> collect.

Well, a journal head should never get garbage collected at first place.
If I understand your analysis correctly it looks like this rule was broken.
So we need to understand how this could happen.
LPROPS_TAKEN should avoid this.

If possible, can you please check the lprobs of the affected LEBs?

Thanks,
//richard

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: ubifs: read bad node type in ubifs_tnc_read_wbuf
  2020-01-14 23:16               ` Richard Weinberger
@ 2020-01-16  0:54                 ` 李傲傲 (Carson Li1/9542)
  0 siblings, 0 replies; 8+ messages in thread
From: 李傲傲 (Carson Li1/9542) @ 2020-01-16  0:54 UTC (permalink / raw)
  To: linux-mtd

>LPROPS_TAKEN should avoid this.
Is it possible that a commit finishes and removes the LPROPS_TAKEN flag?
and when it is reused as a journal head, LPROPS_TAKEN flag is set again.

>If possible, can you please check the lprobs of the affected LEBs?
--------------------------------case 1----------------------------------------------
jhead[GCHD].wbuf.lnum   = 195(found in TNC)
jhead[DATAHD].wbuf.lnum  = 139(zbr passed to ubifs_tnc_read_node)

struct ubifs_lprops =
{
      free = 4096,
      dirty = 123272,
      flags = 16, //LPROPS_TAKEN
      lnum = 195,
      {
        list = {
          next = 0xcd1ad6d8,
          prev = 0xcd1ad440
        },
        hpos = -853879080
      }
    }

struct ubifs_lprops = {
      free = 124944,
      dirty = 0,
      flags = 16, //LPROPS_TAKEN
      lnum = 139,
      {
        list = {
          next = 0xc8be96c0,
          prev = 0xc90e99c8
        },
        hpos = -927033664
      }
    }

one more information about the ubifs_bud rbtree:

crash_arm> tree -t rbtree -o ubifs_bud.rb -N 0xcd196dd4 -s ubifs_bud.lnum
cd196dc0
  lnum = 195//jhead[GCHD]
c89207c0
  lnum = 60
cd196fc0
  lnum = 59
c8bd5e40
  lnum = 125
c8935600
  lnum = 97
c2aafec0
  lnum = 139//jhead[DATAHD]
c8bb2140
  lnum = 279//jhead[BASEHD]
c8ee8d00
  lnum = 239
cd196980
  lnum = 348
--------------------------------------case 2----------------------------------------
jhead[GCHD].wbuf.lnum   = 224(found in TNC)
jhead[DATAHD].wbuf.lnum  = 54(zbr passed to ubifs_tnc_read_node)

struct ubifs_lprops =
{
      free = 24576,
      dirty = 127640,
      flags = 16, //LPROPS_TAKEN
      lnum = 224,
      {
        list = {
          next = 0xc9014ec0,
          prev = 0xc91ed0c0
        },
        hpos = -922661184
      }
 }
struct ubifs_lprops =
{
      free = 45952,
      dirty = 0,
      flags = 16, //LPROPS_TAKEN
      lnum = 54,
      {
        list = {
          next = 0xcd1d4328,
          prev = 0xc90ea9c8
        },
        hpos = -853720280
      }
    }

crash_arm> tree -t rbtree -o ubifs_bud.rb -N 0xc6afae14 -s ubifs_bud.lnum
c6afae00
  lnum = 88
c8fa5540
  lnum = 53//jhead[BASEHD]
c916b500
  lnum = 42
c8f8b140
  lnum = 62
c2a953c0
  lnum = 54//jhead[DATAHD]
c916b480
  lnum = 215
c2ac0040
  lnum = 120
c916b400
  lnum = 224//jhead[GCHD]
c6b02f80
  lnum = 271

By the way, there is another timing the LEB might be garbage collected:
>     A      |              B
> --------------------------------------------------------------------------
> ubifs_tnc_locate
>   zbr->lnum = 54 (find in TNC)
>     ubifs_get_wbuf(zbr->lnum = 54) is ture
>          ubifs_tnc_read_node
>                         ->GC(change zt->lnum to 224(GCHD) in _TNC_)
>                         ->zbr->lnum = 54 becomes DATAHD
>             ubifs_get_wbuf(zbr->lnum = 54 as the DATAHD) is ture again
>             ubifs_read_node_wbuf
> --------------------------------------------------------------------------

     A      |              B
 ---------------------------------------------------------------------------
ubifs_tnc_locate
   zbr->lnum = 54 (find in TNC)
                         ->GC(change zt->lnum to 224(GCHD) in _TNC_)
                         ->zbr->lnum = 54 becomes DATAHD
     ubifs_get_wbuf(zbr->lnum = 54 as the DATAHD) is ture
          ubifs_tnc_read_node
             ubifs_get_wbuf(zbr->lnum = 54) is ture again
             ubifs_read_node_wbuf
 ---------------------------------------------------------------------------

Thanks.
Carson
________________________________
 This email (including its attachments) is intended only for the person or entity to which it is addressed and may contain information that is privileged, confidential or otherwise protected from disclosure. Unauthorized use, dissemination, distribution or copying of this email or the information herein or taking any action in reliance on the contents of this email or the information herein, by anyone other than the intended recipient, or an employee or agent responsible for delivering the message to the intended recipient, is strictly prohibited. If you are not the intended recipient, please do not read, copy, use or disclose any part of this e-mail to others. Please notify the sender immediately and permanently delete this e-mail and any attachments if you received it in error. Internet communications cannot be guaranteed to be timely, secure, error-free or virus-free. The sender does not accept liability for any errors or omissions.
本邮件及其附件具有保密性质,受法律保护不得泄露,仅发送给本邮件所指特定收件人。严禁非经授权使用、宣传、发布或复制本邮件或其内容。若非该特定收件人,请勿阅读、复制、 使用或披露本邮件的任何内容。若误收本邮件,请从系统中永久性删除本邮件及所有附件,并以回复邮件的方式即刻告知发件人。无法保证互联网通信及时、安全、无误或防毒。发件人对任何错漏均不承担责任。
______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* ubifs: read bad node type in ubifs_tnc_read_wbuf 
@ 2020-01-10  9:10 李傲傲 (Carson Li1/9542)
  0 siblings, 0 replies; 8+ messages in thread
From: 李傲傲 (Carson Li1/9542) @ 2020-01-10  9:10 UTC (permalink / raw)
  To: linux-mtd

Hi mtd-list:
I have a problem recently on ubifs: kernel reports to read a bad node type
in ubifs_tnc_read_wbuf.

I guess the problem occurs because of tnc_mutex is freed in ubifs_tnc_locate.
Though it marks the journal heads will not be GCed. But if there is a commit
just finished, LEBs in journal heads will not be in bud rbtree anymore. And
it might be GCed. Here is the kernel log:

------------------------------------------------kernel log ------------------------------------
[   30.072901] c0 UBIFS error (ubi0:23 pid 944): ubifs_read_node_wbuf: bad node type (232 but expected 0)
[   30.085890] c1 UBIFS error (ubi0:23 pid 944): ubifs_read_node_wbuf: wbuf=cd008f40,buf=d4cd9000,
wbuf->buf=c90ec000,offs=164320, len=160, rlen=0
[   30.109436] c1 UBIFS error (ubi0:23 pid 944): ubifs_read_node_wbuf: wxf bad node at LEB 139:164320
[   30.121841] c0 Not a node, first 24 bytes:
[   30.125697] c1 00000000: 65 72 2e 78 6d 6c 2e 62 61 6b 00 b0 4d 14 c0 20 31 18 10 06 e8 3a 41 27
[   30.141302] c1 Kernel panic - not syncing: ch address is d4cd9000
[   30.152510] c1 CPU: 1 PID: 944 Comm: http-thread Tainted: G           O    4.4.83 #2
[   30.152522] c1 Hardware name: Generic DT based system
[   30.152557] c1 [<c001624c>] (unwind_backtrace) from [<c0012f8c>] (show_stack+0x10/0x14)
[   30.152574] c1 [<c0012f8c>] (show_stack) from [<c0287188>] (dump_stack+0x80/0xa0)
[   30.152590] c1 [<c0287188>] (dump_stack) from [<c00acff0>] (panic+0x70/0x1e4)
[   30.152609] c1 [<c00acff0>] (panic) from [<c01e3274>] (ubifs_dump_node+0x6c/0x9a0)
[   30.152626] c1 [<c01e3274>] (ubifs_dump_node) from [<c01c3200>] (ubifs_read_node_wbuf+0x350/0x384)
[   30.152641] c1 [<c01c3200>] (ubifs_read_node_wbuf) from [<c01e0d28>] (ubifs_tnc_read_node+0x54/0x214)
[   30.152658] c1 [<c01e0d28>] (ubifs_tnc_read_node) from [<c01c65ac>] (ubifs_tnc_locate+0x118/0x1b4)
[   30.152675] c1 [<c01c65ac>] (ubifs_tnc_locate) from [<c01bd2c8>] (ubifs_iget+0xb8/0x68c)
[   30.152691] c1 [<c01bd2c8>] (ubifs_iget) from [<c01b9b90>] (ubifs_lookup+0x1b4/0x258)
[   30.152706] c1 [<c01b9b90>] (ubifs_lookup) from [<c00f8134>] (lookup_real+0x30/0x4c)
[   30.152720] c1 [<c00f8134>] (lookup_real) from [<c00f8e68>] (__lookup_hash+0x34/0x3c)
[   30.152732] c1 [<c00f8e68>] (__lookup_hash) from [<c00f8f5c>] (walk_component+0xec/0x2a0)
[   30.152745] c1 [<c00f8f5c>] (walk_component) from [<c00fa600>] (path_lookupat+0x80/0xfc)
[   30.152761] c1 [<c00fa600>] (path_lookupat) from [<c00fd358>] (filename_lookup+0x5c/0xfc)
[   30.152776] c1 [<c00fd358>] (filename_lookup) from [<c00ef114>] (SyS_faccessat+0xa0/0x1c0)
[   30.152792] c1 [<c00ef114>] (SyS_faccessat) from [<c000f8c0>] (ret_fast_syscall+0x0/0x34)
------------------------------------------------kernel log ------------------------------------
And here is my analysis:
--------------------------zbranch passed to ubifs_tnc_read_node -------------------
crash_arm> ubifs_zbranch  c612bd90 struct ubifs_zbranch {
  key = {
    u8 = "]\003\000\000\000\000\000",
    u32 = {861, 0},
    u64 = {861},
    j32 = {861, 0}
  },
  {
    znode = 0x0,
    leaf = 0x0
  },
  lnum = 139,                               //jhead[DATAHD]
  offs = 164320,
  len = 160
}
--------------------------zbranch matches the key {861,0} in TNC-------------------
crash_arm> ubifs_zbranch 0xc8fc2230 struct ubifs_zbranch {
  key = {
    u8 = "]\003\000\000\000\000\000",
    u32 = {861, 0},
    u64 = {861},
    j32 = {861, 0}
  },
  {
    znode = 0x0,
    leaf = 0x0
  },
  lnum = 195,                            //jhead[GCHD]
  offs = 246080,
  len = 160
}
The zbr->lnum if different, and I found that the lnum = 139 is gced_lnum and also the
jhead[DATAHD].wbuf.lnum. The lnum = 195 is jhead[GCHD].wbuf.lnum.

crash_arm> ubifs_jhead.wbuf.lnum  0xcd008e00
  wbuf.lnum = 195,               //jhead[GCHD]
crash_arm> ubifs_jhead.wbuf.lnum  0xcd008ea0
  wbuf.lnum = 279,               //jhead[BASEHD]
crash_arm> ubifs_jhead.wbuf.lnum  0xcd008f40
  wbuf.lnum = 139,               //jhead[DATAHD]

so it can somehow prove my guess that when tnc_mutex is dropped, the zbr->lnum = 139
is GCed and taken to be the leb used by jhead[DATAHD]. In the GC progress, the nodes
lay in the GCing leb are moved to jhead[GCHD].wbuf, at the same time the nodes in the
TNC will be uptodated to point to the jhead[GCHD].wbuf.lnum = 195.

All progress above happened while ubifs_get_wbuf returns true in ubifs_tnc_locate.
So I doubt that marks saying “We do not GC journal heads” and delete some codes as a
temporary solution for this problem. This deletion is just take every leb found
as the same to execute fallible_read_node.

------------------------------------temporary solution--------------------------------
--- a/fs/ubifs/tnc.c
+++ b/fs/ubifs/tnc.c
@@ -1482,12 +1482,6 @@ again:
 gc_seq1 = c->gc_seq;
 mutex_unlock(&c->tnc_mutex);

-if (ubifs_get_wbuf(c, zbr.lnum)) {
-/* We do not GC journal heads */
-err = ubifs_tnc_read_node(c, &zbr, node);
-return err;
-}
-
Actually, compared to that solution described above, I more suggest to modify the
ubifs_get_wbuf.ubifs_get_wbuf is to check if the LEB is on the jhead, but
ubifs_tnc_read_wbuf only read node from wbuf when the lnum is equal to wbuf.lnum
and the others still need to read on flash. It seems to be better to just make
ubifs_get_wbuf to check if the LEB is equal to the wbufs.lnum, and then there is
no need to have a double check in ubifs_tnc_read_wbuf.
-----------------------------more suggested but not tested solution---------------------
--- a/fs/ubifs/log.c
+++ b/fs/ubifs/log.c
@@ -70,28 +70,16 @@ struct ubifs_bud *ubifs_search_bud(struct ubifs_info *c, int lnum)
  */
 struct ubifs_wbuf *ubifs_get_wbuf(struct ubifs_info *c, int lnum)  {
-struct rb_node *p;
-struct ubifs_bud *bud;
 int jhead;

 if (!c->jheads)
 return NULL;

-spin_lock(&c->buds_lock);
-p = c->buds.rb_node;
-while (p) {
-bud = rb_entry(p, struct ubifs_bud, rb);
-if (lnum < bud->lnum)
-p = p->rb_left;
-else if (lnum > bud->lnum)
-p = p->rb_right;
-else {
-jhead = bud->jhead;
-spin_unlock(&c->buds_lock);
+for(jhead = 0; jhead < c->jhead_cnt; jhead++){
+if(lnum == c->jheads[jhead].wbuf.lnum)
 return &c->jheads[jhead].wbuf;
-}
 }
-spin_unlock(&c->buds_lock);
+
 return NULL;
 }

--- a/fs/ubifs/io.c
+++ b/fs/ubifs/io.c
@@ -906,9 +906,10 @@ int ubifs_read_node_wbuf(struct ubifs_wbuf *wbuf, void *buf, int type, int len,
 ubifs_assert(wbuf && lnum >= 0 && lnum < c->leb_cnt && offs >= 0);
 ubifs_assert(!(offs & 7) && offs < c->leb_size);
 ubifs_assert(type >= 0 && type < UBIFS_NODE_TYPES_CNT);
+ubifs_assert(wbuf->lnum == lnum);

 spin_lock(&wbuf->lock);
-overlap = (lnum == wbuf->lnum && offs + len > wbuf->offs);
+overlap = (offs + len > wbuf->offs);
 if (!overlap) {
 /* We may safely unlock the write-buffer and read the data */
 spin_unlock(&wbuf->lock);

after the modification, the LEB contains node will not be GCed since even though
there is a commit, the wbuf.lnum is still on the bud rbtree as a journal head leb.

I am not sure if I have made things right, and it is a problem with low probability.
so I haven’t applied the modification yet. could you please kindly check for me
if I am on the wrong direction? If so, is there any relevant information for me to
fix that?

Thanks
Carson.li

________________________________
 This email (including its attachments) is intended only for the person or entity to which it is addressed and may contain information that is privileged, confidential or otherwise protected from disclosure. Unauthorized use, dissemination, distribution or copying of this email or the information herein or taking any action in reliance on the contents of this email or the information herein, by anyone other than the intended recipient, or an employee or agent responsible for delivering the message to the intended recipient, is strictly prohibited. If you are not the intended recipient, please do not read, copy, use or disclose any part of this e-mail to others. Please notify the sender immediately and permanently delete this e-mail and any attachments if you received it in error. Internet communications cannot be guaranteed to be timely, secure, error-free or virus-free. The sender does not accept liability for any errors or omissions.
本邮件及其附件具有保密性质,受法律保护不得泄露,仅发送给本邮件所指特定收件人。严禁非经授权使用、宣传、发布或复制本邮件或其内容。若非该特定收件人,请勿阅读、复制、 使用或披露本邮件的任何内容。若误收本邮件,请从系统中永久性删除本邮件及所有附件,并以回复邮件的方式即刻告知发件人。无法保证互联网通信及时、安全、无误或防毒。发件人对任何错漏均不承担责任。
______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, back to index

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <e2d34800d72243ed85e8df63bb80f7ab@shmbx04.spreadtrum.com>
     [not found] ` <ce621148db0e4f9c80f32c348975663a@shmbx04.spreadtrum.com>
     [not found]   ` <ae16d475c0d24c1ba6ab18f1e766e928@shmbx04.spreadtrum.com>
     [not found]     ` <46e2f6ca11a442f38d476d4346e8034c@shmbx04.spreadtrum.com>
2020-01-12 23:48       ` ubifs: read bad node type in ubifs_tnc_read_wbuf Richard Weinberger
2020-01-14  1:09         ` 李傲傲 (Carson Li1/9542)
2020-01-14  8:09           ` Richard Weinberger
2020-01-14  8:28             ` 李傲傲 (Carson Li1/9542)
2020-01-14 11:01             ` 李傲傲 (Carson Li1/9542)
2020-01-14 23:16               ` Richard Weinberger
2020-01-16  0:54                 ` 李傲傲 (Carson Li1/9542)
2020-01-10  9:10 李傲傲 (Carson Li1/9542)

Linux-mtd Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mtd/0 linux-mtd/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mtd linux-mtd/ https://lore.kernel.org/linux-mtd \
		linux-mtd@lists.infradead.org
	public-inbox-index linux-mtd

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.infradead.lists.linux-mtd


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git