All of lore.kernel.org
* ubifs: master area fails to recover when master node 1 is corrupted
@ 2024-01-25 11:48 Ryder Wang
  2024-01-26  2:20 ` Zhihao Cheng
       [not found] ` <MEYP282MB3164D303B64CFF576D2FD741BF552@MEYP282MB3164.AUSP282.PROD.OUTLOOK.COM>
  0 siblings, 2 replies; 8+ messages in thread
From: Ryder Wang @ 2024-01-25 11:48 UTC (permalink / raw)
  To: linux-mtd

Hi,

I have just found that the master area always fails to recover during mounting when master node 1's CRC is corrupted but master node 2 is completely good. It can be reproduced 100% of the time on kernel v5.4.233, but it appears to be a generic issue.

How to reproduce it:
1. Corrupt the CRC value of master node 1 (keeping master node 2 intact) on a UBIFS image.
2. Mount this UBIFS image.

The mount at step 2 always fails. From the log, master node recovery fails, although recovery is expected to succeed in such a case.

Below is the kernel log of this failure:

ubifs_mount:2253: UBIFS DBG gen (pid 10770): name ubi0:test_volume, flags 0x0
ubifs_mount:2274: UBIFS DBG gen (pid 10770): opened ubi0_0
ubifs_read_node:1094: UBIFS DBG io (pid 10770): LEB 0:0, superblock node, length 4096
UBIFS (ubi0:0): Mounting in unauthenticated mode
ubifs_read_superblock:765: UBIFS DBG mnt (pid 10770): Auto resizing from 13 LEBs to 100 LEBs
ubifs_start_scan:131: UBIFS DBG scan (pid 10770): scan LEB 1:0
ubifs_scan:270: UBIFS DBG scan (pid 10770): look at LEB 1:0 (253952 bytes left)
ubifs_scan_a_node:77: UBIFS DBG scan (pid 10770): scanning master node at LEB 1:0
UBIFS error (ubi0:0 pid 10770): ubifs_scan [ubifs]: bad node
ubifs_recover_master_node:234: UBIFS DBG rcvry (pid 10770): recovery
ubifs_scan_a_node:77: UBIFS DBG scan (pid 10770): scanning master node at LEB 1:0
get_master_node:163: UBIFS DBG rcvry (pid 10770): found corruption at 1:0
ubifs_scan_a_node:77: UBIFS DBG scan (pid 10770): scanning master node at LEB 2:0
get_master_node:152: UBIFS DBG rcvry (pid 10770): found a master node at 2:0
UBIFS error (ubi0:0 pid 10770): ubifs_recover_master_node [ubifs]: failed to recover master node
UBIFS error (ubi0:0 pid 10770): ubifs_recover_master_node [ubifs]: dumping second master node
UBIFS (ubi0:0): background thread "ubifs_bgt0_0" started, PID 10772
        magic          0x6101831
        crc            0x3a5c03b2
        node_type      7 (master node)
        group_type     0 (no node group)
        sqnum          9
        len            512
        highest_inum   65
        commit number  0
        flags          0x2
        log_lnum       3
        root_lnum      12
        root_offs      0
        root_len       108
        gc_lnum        11
        ihead_lnum     12
        ihead_offs     4096
        index_size     112
        lpt_lnum       7
        lpt_offs       44
        nhead_lnum     7
        nhead_offs     4096
        ltab_lnum      7
        ltab_offs      57
        lsave_lnum     0
        lsave_offs     0
        lscan_lnum     10
        leb_cnt        13
        empty_lebs     1
        idx_lebs       1
        total_free     753664
        total_dirty    7640
        total_used     440
        total_dead     0
        total_dark     16384
UBIFS (ubi0:0): background thread "ubifs_bgt0_0" stops

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: ubifs: master area fails to recover when master node 1 is corrupted
  2024-01-25 11:48 ubifs: master area fails to recover when master node 1 is corrupted Ryder Wang
@ 2024-01-26  2:20 ` Zhihao Cheng
  2024-01-27  7:21   ` Ryder Wang
       [not found] ` <MEYP282MB3164D303B64CFF576D2FD741BF552@MEYP282MB3164.AUSP282.PROD.OUTLOOK.COM>
  1 sibling, 1 reply; 8+ messages in thread
From: Zhihao Cheng @ 2024-01-26  2:20 UTC (permalink / raw)
  To: Ryder Wang, linux-mtd

On 2024/1/25 19:48, Ryder Wang wrote:
> Hi,
> 
> I have just found that the master area always fails to recover during mounting when master node 1's CRC is corrupted but master node 2 is completely good. It can be reproduced 100% of the time on kernel v5.4.233, but it appears to be a generic issue.
> 

According to the debug messages in your log, the mounting failure occurs as follows:
                     LEB 1                       LEB 2
           |mst1 | 0xFF 0xFF ... |      |mst2 | 0xFF 0xFF ... |
offset    0                            0
* mst1 has bad crc.

ubifs_recover_master_node
  get_master_node(UBIFS_MST_LNUM, &mst1)
   ubifs_scan_a_node(buf, lnum, offs=0) // SCANNED_A_CORRUPT_NODE
    ubifs_check_node  // -EUCLEAN, caused by bad crc
   if (offs < c->leb_size) // true
    if (!is_empty(buf, min_t(int, len, sz))) // true
     dbg_rcvry("found corruption at %d:%d")
  get_master_node(UBIFS_MST_LNUM + 1, &buf2, &mst2)
   ubifs_scan_a_node // SCANNED_A_NODE
   *mst = buf // buf = sbuf
   buf2 = sbuf
  if (mst1) // false
  else {
   offs2 = (void *)mst2 - buf2;  // offs2 = 0
   if (offs2 + sz + sz <= c->leb_size) // true, mst2 is the first node in LEB 2
     goto out_err
  }

The process above corresponds to one legitimate power-cut scenario for recovering master nodes: LEB 1 has been unmapped and is ready to receive the newest master node, and then the power cut happens:
ubifs_write_master
  lnum = UBIFS_MST_LNUM; // LEB 1
  if (offs + UBIFS_MST_NODE_SZ > c->leb_size) // true
   err = ubifs_leb_unmap(c, lnum);
  >> powercut <<
  err = ubifs_write_node_hmac(c->mst_node, lnum)
So the master node from LEB 2 alone can only be recovered on the condition that there is no room left for new master nodes in LEB 2.
Now, the problem is that we corrupted mst1 by hand to construct this situation; UBIFS detects that the on-flash state does not match the expected power-cut pattern and refuses to recover the master nodes.

> How to reproduce it:
> 1. Corrupt the CRC value of master node 1 (keeping master node 2 intact) on a UBIFS image.
> 2. Mount this UBIFS image.
> 
> The mount at step 2 always fails. From the log, master node recovery fails, although recovery is expected to succeed in such a case.

Master node recovery is not expected to succeed in this situation. The two master node copies are not meant to recover from arbitrary corruption; they are used to find a valid version of the master node. You can refer to the following section in [1]:

"The master node stores the position of all on-flash structures ... The 
first is that there could be a loss of power at the same instant that 
the master node is being written. The second is that there could be 
degradation or corruption of the flash media itself. ... In the second 
case, recovery is not possible because it cannot be determined reliably 
what is a valid master node version."

[1] http://linux-mtd.infradead.org/doc/ubifs_whitepaper.pdf


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: ubifs: master area fails to recover when master node 1 is corrupted
  2024-01-26  2:20 ` Zhihao Cheng
@ 2024-01-27  7:21   ` Ryder Wang
  2024-01-27  9:39     ` Zhihao Cheng
  0 siblings, 1 reply; 8+ messages in thread
From: Ryder Wang @ 2024-01-27  7:21 UTC (permalink / raw)
  To: Zhihao Cheng, linux-mtd

Hi Zhihao,

Thanks for the very thorough explanation.

But I still have a question about the code logic:
------------------------------------
  if (mst1) // false
  else {
   offs2 = (void *)mst2 - buf2;  // offs2 = 0
   if (offs2 + sz + sz <= c->leb_size) // true, mst2 is the first node in LEB 2
     goto out_err
  }
------------------------------------
1. My testing confirmed that a CRC-corrupted master #1 also reaches the "else" clause above, just as if master #1 were unmapped.
2. For the CRC-corrupted master #1 case, the logic looks inconsistent:
  2.1. If the master #2 LEB is too full to hold another master node, master #2 will be used to recover the master area.
  2.2. Otherwise, master recovery is aborted with an error.

I think whether the master #2 LEB is nearly full has nothing to do with whether the master area should be recovered in this case. What do you think?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: ubifs: master area fails to recover when master node 1 is corrupted
  2024-01-27  7:21   ` Ryder Wang
@ 2024-01-27  9:39     ` Zhihao Cheng
  2024-01-27 10:21       ` Zhihao Cheng
  0 siblings, 1 reply; 8+ messages in thread
From: Zhihao Cheng @ 2024-01-27  9:39 UTC (permalink / raw)
  To: Ryder Wang, linux-mtd

On 2024/1/27 15:21, Ryder Wang wrote:
> Hi Zhihao,
> 
> Thanks for the very thorough explanation.
> 
> But I still have a question about the code logic:
> ------------------------------------
>    if (mst1) // false
>    else {
>     offs2 = (void *)mst2 - buf2;  // offs2 = 0
>     if (offs2 + sz + sz <= c->leb_size) // true, mst2 is the first node in LEB 2
>       goto out_err
>    }
> ------------------------------------
>> 1. My testing confirmed that a CRC-corrupted master #1 also reaches the "else" clause above, just as if master #1 were unmapped.
>> 2. For the CRC-corrupted master #1 case, the logic looks inconsistent:
>>    2.1. If the master #2 LEB is too full to hold another master node, master #2 will be used to recover the master area.
>>    2.2. Otherwise, master recovery is aborted with an error.
>>
>> I think whether the master #2 LEB is nearly full has nothing to do with whether the master area should be recovered in this case. What do you think?


Actually, UBIFS could still work if master #2 were recovered in this particular case (master #1 corrupted), because master #2 holds the newest version. The offset check on whether the master #2 LEB is full is a way of making sure that UBIFS recovers the newest master node. If we simply removed the check, UBIFS could go wrong in some situations, for example:

A power cut happens before mst2_v2 is written to LEB 2, so the UBIFS image looks like:
              LEB1                                LEB2
|mst1_v1 | mst1_v2 |0xFF 0xFF ... |      |mst2_v1 | 0xFF 0xFF ... |

mst1_v2 is expected to be recovered after executing ubifs_recover_master_node(). If both mst1_v1 and mst1_v2 are corrupted, UBIFS enters this branch:

    if (mst1) // false
    else {
      offs2 = (void *)mst2 - buf2;  // offs2 = 0
      if (offs2 + sz + sz <= c->leb_size) // offset checking
        goto out_err
      mst = mst2;
    }
If the offset check were removed, mst2_v1 would be recovered; UBIFS would clearly pick the wrong (older) master node.

So, as ubifs_recover_master_node() is implemented, UBIFS chooses the newest master node through various offset checks. These act as a whitelist of situations that UBIFS can fully trust; every other situation is a failure path, although some of the failure paths could still yield a correct master node in some (not all) cases.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: ubifs: master area fails to recover when master node 1 is corrupted
  2024-01-27  9:39     ` Zhihao Cheng
@ 2024-01-27 10:21       ` Zhihao Cheng
  0 siblings, 0 replies; 8+ messages in thread
From: Zhihao Cheng @ 2024-01-27 10:21 UTC (permalink / raw)
  To: Ryder Wang, linux-mtd


Besides, if the UBIFS image is corrupted by the media itself, UBIFS should report an error; there is nothing UBIFS can do to fix that.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: ubi: why ubi wear leveling always pick highly worn-out free physical eraseblock as the mov_to target
       [not found] ` <MEYP282MB3164D303B64CFF576D2FD741BF552@MEYP282MB3164.AUSP282.PROD.OUTLOOK.COM>
@ 2024-02-23  7:11   ` Zhihao Cheng
  2024-02-23 10:14     ` ubi: why ubi wear leveling always pick highly worn-out free physical eraseblock Ryder Wang
  0 siblings, 1 reply; 8+ messages in thread
From: Zhihao Cheng @ 2024-02-23  7:11 UTC (permalink / raw)
  To: Ryder Wang, linux-mtd, richard

On 2024/2/23 11:27, Ryder Wang wrote:
> Refer to the ubi source code:
> wear_leveling_worker
>    e2 = get_peb_for_wl(ubi)
>      e = find_wl_entry(ubi, &ubi->free, WL_FREE_MAX_DIFF);
>        
> The function find_wl_entry() always finds the most worn-out free physical eraseblock (e2):
> 1. It's good to check such a PEB (e2) to decide whether the following wear-leveling procedure should continue, according to UBI_WL_THRESHOLD.
> 2. But personally I can't understand why such a highly worn-out free physical eraseblock should also be used as the target PEB (move_to) to store the move_from data for wear-leveling purposes. Wouldn't it be more reasonable to use a lightly worn free physical eraseblock (from the ubi->free tree) here, for better wear leveling?
> 

Normally, e1 is the entry with the smaller erase counter (ec) picked from ubi->used and e2 is the entry with the bigger ec picked from ubi->free; the wear-leveling worker follows that rule, which is tied to how free PEBs are fetched.
First, let's look at the simplest case, without fastmap. Assume that
CONFIG_MTD_UBI_WL_THRESHOLD=2 and there are 3 PEBs in total.
Only 1 eraseblock is used, then freed, and the process repeats forever.
According to the implementation in
ubi_wl_get_peb->wl_get_wle->find_mean_wl_entry, the ubi->free tree will
change like:
(1,1,1) -> (1,1,2) -> (1,2,2) -> (1,2,3) -> (1,3,3) -> ... -> (1,4,5) ->
(2,4,5), which means the wear-leveling worker is not needed, because
find_mean_wl_entry has made sure that 'max_ec - min_ec <= 2 *
CONFIG_MTD_UBI_WL_THRESHOLD'. Similarly, when more eraseblocks are used
and freed, there is no need to trigger wear-leveling work.
However, if one free PEB is taken and never freed while the other free
PEBs are taken and freed, like:
ubi->used: 1
ubi->free: (1, 1) -> (1, 2) -> ... (x, y)
then after a while min(x,y) - 1 will be greater than
2*CONFIG_MTD_UBI_WL_THRESHOLD, and wear leveling starts working:
cold data occupies a given PEB for a long time, which makes PEBs with
smaller ec gather in ubi->used and PEBs with bigger ec gather in
ubi->free.
I think that explains the rule for picking e1/e2.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: ubi: why ubi wear leveling always pick highly worn-out free physical eraseblock
  2024-02-23  7:11   ` ubi: why ubi wear leveling always pick highly worn-out free physical eraseblock as the mov_to target Zhihao Cheng
@ 2024-02-23 10:14     ` Ryder Wang
  2024-02-23 10:41       ` Zhihao Cheng
  0 siblings, 1 reply; 8+ messages in thread
From: Ryder Wang @ 2024-02-23 10:14 UTC (permalink / raw)
  To: Zhihao Cheng, linux-mtd, richard

Hi Zhihao,

Thanks for your reply.

You explained why e1 and e2 are picked and compared against CONFIG_MTD_UBI_WL_THRESHOLD to decide whether WL is needed. That is right, as I also mentioned. However, my point is: why should such a highly worn-out free physical eraseblock (from the ubi->free tree) be used to store e1's data? Shouldn't highly worn-out PEBs be avoided, with lightly worn PEBs from the ubi->free tree preferred?


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: ubi: why ubi wear leveling always pick highly worn-out free physical eraseblock
  2024-02-23 10:14     ` ubi: why ubi wear leveling always pick highly worn-out free physical eraseblock Ryder Wang
@ 2024-02-23 10:41       ` Zhihao Cheng
  0 siblings, 0 replies; 8+ messages in thread
From: Zhihao Cheng @ 2024-02-23 10:41 UTC (permalink / raw)
  To: Ryder Wang, linux-mtd, richard

On 2024/2/23 18:14, Ryder Wang wrote:
> Hi Zhihao,
> 
> Thanks for your reply.
> 
> You explained why e1 and e2 are picked and compared against CONFIG_MTD_UBI_WL_THRESHOLD to decide whether WL is needed. That is right, as I also mentioned. However, my point is: why should such a highly worn-out free physical eraseblock (from the ubi->free tree) be used to store e1's data? Shouldn't highly worn-out PEBs be avoided, with lightly worn PEBs from the ubi->free tree preferred?
> 

e1 has the smaller erase counter because it comes from the ubi->used tree and likely holds cold data, so it won't be erased for a long time. If we chose a lightly worn PEB from the ubi->free tree as e2, the highly worn PEB would remain available to store new data and be erased again, further aggravating its wear, when it really should not be scheduled for use at all. So the best choice is to make the highly worn PEB hold the cold data, which stops the worn-out PEB from being erased frequently.


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-02-23 10:43 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-25 11:48 ubifs: master area fails to recover when master node 1 is corrupted Ryder Wang
2024-01-26  2:20 ` Zhihao Cheng
2024-01-27  7:21   ` Ryder Wang
2024-01-27  9:39     ` Zhihao Cheng
2024-01-27 10:21       ` Zhihao Cheng
     [not found] ` <MEYP282MB3164D303B64CFF576D2FD741BF552@MEYP282MB3164.AUSP282.PROD.OUTLOOK.COM>
2024-02-23  7:11   ` ubi: why ubi wear leveling always pick highly worn-out free physical eraseblock as the mov_to target Zhihao Cheng
2024-02-23 10:14     ` ubi: why ubi wear leveling always pick highly worn-out free physical eraseblock Ryder Wang
2024-02-23 10:41       ` Zhihao Cheng

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.