* ubifs: master area fails to recover when master node 1 is corrupted
@ 2024-01-25 11:48 Ryder Wang
2024-01-26 2:20 ` Zhihao Cheng
[not found] ` <MEYP282MB3164D303B64CFF576D2FD741BF552@MEYP282MB3164.AUSP282.PROD.OUTLOOK.COM>
0 siblings, 2 replies; 8+ messages in thread
From: Ryder Wang @ 2024-01-25 11:48 UTC (permalink / raw)
To: linux-mtd
Hi,
I just found that the master area always fails to recover during mount when master node 1's CRC is corrupted but master node 2 is completely good. It reproduces 100% of the time on kernel v5.4.233, but it looks like a common issue.
How to reproduce it:
1. Corrupt the CRC value of master node 1 (keeping master node 2 good) in a UBIFS image.
2. Mount this ubifs.
The mount at step 2 always fails. From the log, master node recovery fails, but recovery is expected to succeed in this case.
Below is the kernel log of this failure:
ubifs_mount:2253: UBIFS DBG gen (pid 10770): name ubi0:test_volume, flags 0x0
ubifs_mount:2274: UBIFS DBG gen (pid 10770): opened ubi0_0
ubifs_read_node:1094: UBIFS DBG io (pid 10770): LEB 0:0, superblock node, length 4096
UBIFS (ubi0:0): Mounting in unauthenticated mode
ubifs_read_superblock:765: UBIFS DBG mnt (pid 10770): Auto resizing from 13 LEBs to 100 LEBs
ubifs_start_scan:131: UBIFS DBG scan (pid 10770): scan LEB 1:0
ubifs_scan:270: UBIFS DBG scan (pid 10770): look at LEB 1:0 (253952 bytes left)
ubifs_scan_a_node:77: UBIFS DBG scan (pid 10770): scanning master node at LEB 1:0
UBIFS error (ubi0:0 pid 10770): ubifs_scan [ubifs]: bad node
ubifs_recover_master_node:234: UBIFS DBG rcvry (pid 10770): recovery
ubifs_scan_a_node:77: UBIFS DBG scan (pid 10770): scanning master node at LEB 1:0
get_master_node:163: UBIFS DBG rcvry (pid 10770): found corruption at 1:0
ubifs_scan_a_node:77: UBIFS DBG scan (pid 10770): scanning master node at LEB 2:0
get_master_node:152: UBIFS DBG rcvry (pid 10770): found a master node at 2:0
UBIFS error (ubi0:0 pid 10770): ubifs_recover_master_node [ubifs]: failed to recover master node
UBIFS error (ubi0:0 pid 10770): ubifs_recover_master_node [ubifs]: dumping second master node
UBIFS (ubi0:0): background thread "ubifs_bgt0_0" started, PID 10772
magic 0x6101831
crc 0x3a5c03b2
node_type 7 (master node)
group_type 0 (no node group)
sqnum 9
len 512
highest_inum 65
commit number 0
flags 0x2
log_lnum 3
root_lnum 12
root_offs 0
root_len 108
gc_lnum 11
ihead_lnum 12
ihead_offs 4096
index_size 112
lpt_lnum 7
lpt_offs 44
nhead_lnum 7
nhead_offs 4096
ltab_lnum 7
ltab_offs 57
lsave_lnum 0
lsave_offs 0
lscan_lnum 10
leb_cnt 13
empty_lebs 1
idx_lebs 1
total_free 753664
total_dirty 7640
total_used 440
total_dead 0
total_dark 16384
UBIFS (ubi0:0): background thread "ubifs_bgt0_0" stops
______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/
* Re: ubifs: master area fails to recover when master node 1 is corrupted
2024-01-25 11:48 ubifs: master area fails to recover when master node 1 is corrupted Ryder Wang
@ 2024-01-26 2:20 ` Zhihao Cheng
2024-01-27 7:21 ` Ryder Wang
[not found] ` <MEYP282MB3164D303B64CFF576D2FD741BF552@MEYP282MB3164.AUSP282.PROD.OUTLOOK.COM>
1 sibling, 1 reply; 8+ messages in thread
From: Zhihao Cheng @ 2024-01-26 2:20 UTC (permalink / raw)
To: Ryder Wang, linux-mtd
On 2024/1/25 19:48, Ryder Wang wrote:
> Hi,
>
> I just find that master area will always fail to recover while mounting, when master node 1's CRC is corrupted but master node 2 is completely good. It can be 100% reproduced on Kernel v5.4.233, but it seems a common issue.
>
According to the debug messages below, the mounting failure occurs as
follows:
        LEB 1                          LEB 2
        |mst1 | 0xFF 0xFF ... |        |mst2 | 0xFF 0xFF ... |
offset  0                              0
* mst1 has bad crc.

ubifs_recover_master_node
  get_master_node(UBIFS_MST_LNUM, &mst1)
    ubifs_scan_a_node(buf, lnum, offs=0) // SCANNED_A_CORRUPT_NODE
      ubifs_check_node // -EUCLEAN, caused by bad crc
    if (offs < c->leb_size) // true
      if (!is_empty(buf, min_t(int, len, sz))) // true
        dbg_rcvry("found corruption at %d:%d")
  get_master_node(UBIFS_MST_LNUM + 1, &buf2, &mst2)
    ubifs_scan_a_node // SCANNED_A_NODE
    *mst = buf // buf = sbuf
    buf2 = sbuf
  if (mst1) // false
  else {
    offs2 = (void *)mst2 - buf2; // offs2 = 0
    if (offs2 + sz + sz <= c->leb_size) // true, mst2 is the first node in LEB 2
      goto out_err
  }
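The branch above boils down to one test, sketched here as a small standalone helper (the function name is hypothetical, not a kernel symbol): recovery from the second master node alone is accepted only when that node is the last one that could fit in its LEB.

```c
/*
 * Hedged sketch of the decision in ubifs_recover_master_node() when
 * LEB 1 yields no valid master node (mst1 == NULL). offs2 is the
 * offset of the valid master node found in LEB 2, sz the aligned
 * master node size, leb_size the LEB size. Recovery is accepted only
 * when no further master node would fit after mst2, i.e. mst2 is
 * provably the newest version.
 */
int mst2_alone_recoverable(int offs2, int sz, int leb_size)
{
	if (offs2 + sz + sz <= leb_size)
		return 0;	/* room for a newer node: goto out_err */
	return 1;		/* mst = mst2: recover from LEB 2 */
}
```

With the values from the log (offs2 = 0, sz = 512, leb_size = 253952) the check rejects recovery, which is exactly the "failed to recover master node" path.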
The above process matches one legitimate situation of recovering master nodes after a power cut, namely: LEB 1 has been unmapped and is ready to be written with the newest master node when the power cut happens:
ubifs_write_master
  lnum = UBIFS_MST_LNUM; // LEB 1
  if (offs + UBIFS_MST_NODE_SZ > c->leb_size) // true
    err = ubifs_leb_unmap(c, lnum);
  >> powercut <<
  err = ubifs_write_node_hmac(c->mst_node, lnum)
So the master node from LEB 2 can be recovered on its own only when there is no room left for new master nodes in LEB 2.
Now, the problem is that corrupting mst1 by hand does not produce that expected layout. UBIFS detects that the on-flash state does not match any expected situation and refuses to recover the master nodes.
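The write pattern that produces the only trusted layout can be modeled with a tiny sketch (simplified; the real code aligns the length to the minimum I/O unit and writes both master LEBs):

```c
/*
 * Hedged model of how ubifs_write_master() advances the write offset:
 * each commit appends one master node; when the next node would not
 * fit, the LEB is unmapped and writing restarts at offset 0. A power
 * cut between the unmap and the first write at offset 0 therefore
 * leaves LEB 1 empty while LEB 2 still ends with its last node --
 * the only layout the recovery check trusts for mst2-only recovery.
 */
int next_mst_offs(int offs, int sz, int leb_size)
{
	offs += sz;		/* slot for the next master node */
	if (offs + sz > leb_size)
		offs = 0;	/* unmap the LEB, start over */
	return offs;
}
```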
> How to reproduce it:
> 1. Corrupt the CRC value of master node 1 (keep master node 2 is good) on ubifs.
> 2. Mount this ubifs.
>
> Mount at step#2 will always fail. From the log, it looks master recovering fails, but master recovering is expected to be OK in such case.
The master node is not expected to be recoverable in this situation. The two master node copies are not meant to allow recovery in every situation; they are used to find a valid version of the master node. You can refer to the following section in [1]:
"The master node stores the position of all on-flash structures ... The
first is that there could be a loss of power at the same instant that
the master node is being written. The second is that there could be
degradation or corruption of the flash media itself. ... In the second
case, recovery is not possible because it cannot be determined reliably
what is a valid master node version."
[1] http://linux-mtd.infradead.org/doc/ubifs_whitepaper.pdf
>
> Below is the kernel log of this failure:
>
> [kernel log quoted in full; snipped]
* Re: ubifs: master area fails to recover when master node 1 is corrupted
2024-01-26 2:20 ` Zhihao Cheng
@ 2024-01-27 7:21 ` Ryder Wang
2024-01-27 9:39 ` Zhihao Cheng
0 siblings, 1 reply; 8+ messages in thread
From: Ryder Wang @ 2024-01-27 7:21 UTC (permalink / raw)
To: Zhihao Cheng, linux-mtd
Hi Zhihao,
Your explanation is very professional. Thanks for it.
But I still have a doubt about the code logic:
------------------------------------
if (mst1) // false
else {
	offs2 = (void *)mst2 - buf2; // offs2 = 0
	if (offs2 + sz + sz <= c->leb_size) // true, mst2 is the first node in LEB 2
		goto out_err
}
------------------------------------
1. My test result shows that a CRC-corrupted master#1 also reaches the "else" clause above, just as if master#1 were unmapped.
2. For the CRC-corrupted master#1 case, the logic looks inconsistent:
2.1. If master#2's LEB is nearly full, master#2 is used to recover the master area.
2.2. If master#2's LEB is not nearly full, master recovery is aborted with an error.
I think whether master#2's LEB is nearly full has nothing to do with whether the master area should be recovered in this case. What do you think?
________________________________________
From: Zhihao Cheng <chengzhihao1@huawei.com>
Sent: Friday, January 26, 2024 10:20
To: Ryder Wang; linux-mtd@lists.infradead.org
Subject: Re: ubifs: master area fails to recover when master node 1 is corrupted
[earlier messages quoted in full; snipped]
* Re: ubifs: master area fails to recover when master node 1 is corrupted
2024-01-27 7:21 ` Ryder Wang
@ 2024-01-27 9:39 ` Zhihao Cheng
2024-01-27 10:21 ` Zhihao Cheng
0 siblings, 1 reply; 8+ messages in thread
From: Zhihao Cheng @ 2024-01-27 9:39 UTC (permalink / raw)
To: Ryder Wang, linux-mtd
On 2024/1/27 15:21, Ryder Wang wrote:
> Hi Zhihao,
>
> Your explanation is very professional. Thanks for it.
>
> But I still have a doubt about the code logic:
> ------------------------------------
> if (mst1) // false
> else {
> offs2 = (void *)mst2 - buf2; // offs2 = 0
> if (offs2 + sz + sz <= c->leb_size) // true, mst2 is the first node in LEB 2
> goto out_err
> }
> ------------------------------------
> 1. My testing result just proved that CRC-corrupted master#1 also runs to "else" clause of the code above, just like master#1 is unmapped.
> 2. For CRC corrupted master#1 case, the code logic looks inconsistent:
> 2.1. If master#2 LEB is just to be full, master#2 will be used to recover master area.
> 2.2. If master#2 LEB is not to be full, master recovering will be aborted with error.
>
> I think whether master#2 LEB is to be full has nothing to do with whether to recover master area in such case. How do you think about it?
Actually, UBIFS could still work if master#2 were recovered in this case (master#1 corrupted), because master#2 is the newest version. The offset check for master#2's LEB being full is how UBIFS makes sure it has found the newest master node. If we simply removed the check, UBIFS could go wrong in some situations, for example:
Suppose a power cut happens before mst2_v2 is written to LEB 2, so the UBIFS image looks like:
 LEB1                                 LEB2
 |mst1_v1 | mst1_v2 | 0xFF 0xFF ... | |mst2_v1 | 0xFF 0xFF ... |

mst1_v2 is expected to be recovered after executing ubifs_recover_master_node(). If both mst1_v1 and mst1_v2 are corrupted, UBIFS will enter this branch:

if (mst1) // false
else {
	offs2 = (void *)mst2 - buf2; // offs2 = 0
	if (offs2 + sz + sz <= c->leb_size) // offset checking
		goto out_err
	mst = mst2;
}
If the offset check were removed, mst2_v1 would be recovered, and UBIFS would apparently pick the wrong master node, which is not right. So, according to the implementation of ubifs_recover_master_node(), UBIFS chooses the newest master node through various offset checks. It is like a whitelist of situations that UBIFS can fully trust; all other situations are failure paths, even though some of those failure paths could still yield a correct master node in some (not all) cases.
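The rollback hazard can be made concrete with a small sketch (hypothetical struct and helper; the sequence numbers are illustrative): the newest *valid* node is not necessarily the newest *written* node.

```c
/*
 * Hedged sketch: master nodes carry a sequence number (sqnum), and
 * the newest valid one should win. In the example above, mst1_v1
 * (sqnum 1) and mst1_v2 (sqnum 2) are corrupted, so the newest valid
 * node is mst2_v1 (sqnum 1) -- a whole version behind what was last
 * committed. Recovering it would silently roll the filesystem back,
 * which is why UBIFS fails instead of trusting it.
 */
struct mnode {
	int valid;	/* 1 if magic and crc check out */
	long sqnum;	/* commit sequence number */
};

long newest_valid_sqnum(const struct mnode *nodes, int n)
{
	long best = -1;
	for (int i = 0; i < n; i++)
		if (nodes[i].valid && nodes[i].sqnum > best)
			best = nodes[i].sqnum;
	return best;
}
```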
* Re: ubifs: master area fails to recover when master node 1 is corrupted
2024-01-27 9:39 ` Zhihao Cheng
@ 2024-01-27 10:21 ` Zhihao Cheng
0 siblings, 0 replies; 8+ messages in thread
From: Zhihao Cheng @ 2024-01-27 10:21 UTC (permalink / raw)
To: Ryder Wang, linux-mtd
On 2024/1/27 17:39, Zhihao Cheng wrote:
> [earlier messages quoted in full; snipped]
Besides, if there are corruptions in the UBIFS image, UBIFS should report an error; there is nothing UBIFS can do to fix them.
* Re: ubi: why ubi wear leveling always pick highly worn-out free physical eraseblock as the mov_to target
[not found] ` <MEYP282MB3164D303B64CFF576D2FD741BF552@MEYP282MB3164.AUSP282.PROD.OUTLOOK.COM>
@ 2024-02-23 7:11 ` Zhihao Cheng
2024-02-23 10:14 ` ubi: why ubi wear leveling always pick highly worn-out free physical eraseblock Ryder Wang
0 siblings, 1 reply; 8+ messages in thread
From: Zhihao Cheng @ 2024-02-23 7:11 UTC (permalink / raw)
To: Ryder Wang, linux-mtd, richard
On 2024/2/23 11:27, Ryder Wang wrote:
> Refer to the ubi source code:
> wear_leveling_worker
> e2 = get_peb_for_wl(ubi)
> e = find_wl_entry(ubi, &ubi->free, WL_FREE_MAX_DIFF);
>
> The function find_wl_entry() always find the highly worn-out free physical eraseblock (e2):
> 1. It's good to check such PEB (e2) to decide whether the following wear leveling procedure should be continued according to UBI_WL_THRESHOLD.
> 2. But personally I can't understand why such high worn-out free physical eraseblock should also be used as target PEB(move_to) to store the move_from data for wear leveling purpose. Will it be much more reasonable to use low worn-out free physical eraseblock (from ubi->free tree) in this case for more perfect wear leveling?.
>
Normally, e1 is the entry with the smaller erase counter (ec), picked from ubi->used, and e2 is the entry with the bigger erase counter, picked from ubi->free. The wear-leveling worker follows that rule, which is grounded in how free PEBs are fetched.
First, let's look at the simplest case, without fastmap. Assume that CONFIG_MTD_UBI_WL_THRESHOLD=2 and there are 3 PEBs in total. One EB (erase block) is used, then freed, and the process repeats forever. According to the implementation in ubi_wl_get_peb->wl_get_wle->find_mean_wl_entry, the ubi->free tree changes like:
(1,1,1) -> (1,1,2) -> (1,2,2) -> (1,2,3) -> (1,3,3) -> ... -> (1,4,5) -> (2,4,5)
which means the wear-leveling worker is not needed, because find_mean_wl_entry has made sure that 'max_ec - min_ec <= 2 * CONFIG_MTD_UBI_WL_THRESHOLD'. Similarly, when more EBs are used and freed, there is no need to trigger wear-leveling work.
However, if one free PEB is taken and never freed while the other free PEBs keep being taken and freed, like:
ubi->used: 1
ubi->free: (1, 1) -> (1, 2) -> ... (x, y)
then after a while min(x,y) - 1 becomes greater than 2*CONFIG_MTD_UBI_WL_THRESHOLD, and wear leveling has to start working: cold data occupies a certain PEB for a long time, which makes PEBs with smaller erase counters gather in ubi->used and PEBs with bigger erase counters gather in ubi->free.
I think that explains the rule for picking e1/e2.
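The e2 selection itself can be sketched in the spirit of find_wl_entry() (simplified over a sorted array instead of the kernel's rb-tree; `diff` plays the role of WL_FREE_MAX_DIFF):

```c
/*
 * Hedged sketch of find_wl_entry()-style selection: among the free
 * PEBs (erase counters sorted ascending), pick the most worn one
 * whose erase counter does not exceed the least worn one by more
 * than diff. The kernel walks an rb-tree; an array keeps the idea
 * clear.
 */
int pick_wl_target_ec(const int *ec, int n, int diff)
{
	int limit = ec[0] + diff;	/* least worn ec plus the allowance */
	int pick = ec[0];
	for (int i = 1; i < n; i++)
		if (ec[i] <= limit)	/* still within the allowance */
			pick = ec[i];
	return pick;
}
```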
* Re: ubi: why ubi wear leveling always pick highly worn-out free physical eraseblock
2024-02-23 7:11 ` ubi: why ubi wear leveling always pick highly worn-out free physical eraseblock as the mov_to target Zhihao Cheng
@ 2024-02-23 10:14 ` Ryder Wang
2024-02-23 10:41 ` Zhihao Cheng
0 siblings, 1 reply; 8+ messages in thread
From: Ryder Wang @ 2024-02-23 10:14 UTC (permalink / raw)
To: Zhihao Cheng, linux-mtd, richard
Hi Zhihao,
Thanks for your reply.
You explained why e1 and e2 are picked and compared against CONFIG_MTD_UBI_WL_THRESHOLD to decide whether WL is needed. That matches what I mentioned. However, my point is: why should such a highly worn-out free physical eraseblock (from the ubi->free tree) be the one that stores e1's data? Shouldn't a highly worn-out PEB always be avoided, with a lightly worn-out PEB from the ubi->free tree preferred instead?
________________________________________
From: Zhihao Cheng <chengzhihao1@huawei.com>
Sent: Friday, February 23, 2024 15:11
To: Ryder Wang; linux-mtd@lists.infradead.org; richard@nod.at
Subject: Re: ubi: why ubi wear leveling always pick highly worn-out free physical eraseblock as the mov_to target
[earlier message quoted in full; snipped]
* Re: ubi: why ubi wear leveling always pick highly worn-out free physical eraseblock
2024-02-23 10:14 ` ubi: why ubi wear leveling always pick highly worn-out free physical eraseblock Ryder Wang
@ 2024-02-23 10:41 ` Zhihao Cheng
0 siblings, 0 replies; 8+ messages in thread
From: Zhihao Cheng @ 2024-02-23 10:41 UTC (permalink / raw)
To: Ryder Wang, linux-mtd, richard
On 2024/2/23 18:14, Ryder Wang wrote:
> Hi Zhihao,
>
> Thanks for your reply.
>
> You explained why e1 and e2 use picked to compare with CONFIG_MTD_UBI_WL_THRESHOLD to decide whether WL is needed. That's right as I also mentioned. However, the point is that why such high worn-out free physical eraseblock (in ubi->free tree) should be used to store the data of e1? High worn-out PEB should always be avoided to use (low worn-out PEB of ubi->free tree should be preferred), right?
>
e1 has the smaller erase counter because it comes from the ubi->used tree and likely holds cold data, so it won't be erased for a long time. If we chose a lightly worn-out PEB from the ubi->free tree as e2, the highly worn-out PEB would still be handed out to store new data and be erased again, which would aggravate its wear, when it actually should not be scheduled for use at all. So the best choice is to make the highly worn-out PEB hold the cold data, which stops that PEB from being erased frequently.
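Put together, the trigger condition reads directly off the two erase counters (sketch; `threshold` stands for CONFIG_MTD_UBI_WL_THRESHOLD):

```c
/*
 * Hedged sketch of the wear-leveling trigger as described above:
 * move e1's (cold) data into e2 only when the most worn free PEB
 * (e2) exceeds the least worn used PEB (e1) by more than twice the
 * threshold. Parking cold data on e2 takes the worn block out of
 * the erase rotation.
 */
int wl_needed(int e1_ec, int e2_ec, int threshold)
{
	return e2_ec - e1_ec > 2 * threshold;
}
```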
> [earlier message quoted in full; snipped]
end of thread, other threads:[~2024-02-23 10:43 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-25 11:48 ubifs: master area fails to recover when master node 1 is corrupted Ryder Wang
2024-01-26 2:20 ` Zhihao Cheng
2024-01-27 7:21 ` Ryder Wang
2024-01-27 9:39 ` Zhihao Cheng
2024-01-27 10:21 ` Zhihao Cheng
[not found] ` <MEYP282MB3164D303B64CFF576D2FD741BF552@MEYP282MB3164.AUSP282.PROD.OUTLOOK.COM>
2024-02-23 7:11 ` ubi: why ubi wear leveling always pick highly worn-out free physical eraseblock as the mov_to target Zhihao Cheng
2024-02-23 10:14 ` ubi: why ubi wear leveling always pick highly worn-out free physical eraseblock Ryder Wang
2024-02-23 10:41 ` Zhihao Cheng