All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v6 00/15] Some bugfixs for ubi/ubifs
@ 2021-12-27  3:22 ` Zhihao Cheng
  0 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

v1->v2:
  1. Add new fix for ubifs, "ubifs: Fix to add refcount once page is set
  private"
  2. Update "ubifs: Rename whiteout atomically":
     1) Move inode mode in create_whiteout()
     2) Don't check O_SYNC for whiteout, because it inherits from the old_dir
     3) Remove useless 'synced_i_size ' assignment for whiteout, because
	it's always be zero.
     4) Remove unused variable 'ui' in create_whiteout()
  3. Update "ubifs: setflags: Make dirtied_ino_d 8 bytes aligned":
     1) Align dirtied_ino_d with 8 bytes.

v2->v3:
  1. Update "ubifs: Rename whiteout atomically":
     1) Fix misspelling 'have already check the old dir inode' ->
        'have already checked the old dir inode'
     2) Fix misspelling "Whiteout don't have non-zero size" ->
        "Whiteout  have non-zero size"
  2. Update "ubifs: Fix to add refcount once page is set private"
     1) Fix commit message to explain the root cause.
  3. Update "ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when
     fm->used_blocks>=2"
     1) Add fastmap used pebs into 'ai' in for-loop, rather than in
        two-steps(Add pebs [pnum<UBI_FM_MAX_START] then add pebs
	[pnum>=UBI_FM_MAX_START] into 'ai').

v3->v4:
  1. Update "ubifs: Add missing iput if do_tmpfile() failed in rename whiteout":
     1) Move whiteout cleanup into do_tmpfile() according to Sascha's advice
  2. Add new fix for ubifs, "ubifs: ubifs_writepage: Mark page dirty after
     writing inode failed"

v4->v5:
  1. Add new fix for ubifs, "ubifs: ubifs_releasepage: Remove ubifs_assert(0)
     to valid this process"

v5->v6:
  1. Add new fix for ubi: "ubi: fastmap: Fix high cpu usage of ubi_bgt by
     making sure wl_pool not empty"

Zhihao Cheng (15):
  ubifs: rename_whiteout: Fix double free for whiteout_ui->data
  ubifs: Fix deadlock in concurrent rename whiteout and inode writeback
  ubifs: Fix wrong number of inodes locked by ui_mutex in ubifs_inode
    comment
  ubifs: Add missing iput if do_tmpfile() failed in rename whiteout
  ubifs: Rename whiteout atomically
  ubifs: Fix 'ui->dirty' race between do_tmpfile() and writeback work
  ubifs: Rectify space amount budget for mkdir/tmpfile operations
  ubifs: setflags: Make dirtied_ino_d 8 bytes aligned
  ubifs: Fix read out-of-bounds in ubifs_wbuf_write_nolock()
  ubifs: Fix to add refcount once page is set private
  ubi: fastmap: Return error code if memory allocation fails in
    add_aeb()
  ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when
    fm->used_blocks>=2
  ubifs: ubifs_writepage: Mark page dirty after writing inode failed
  ubifs: ubifs_releasepage: Remove ubifs_assert(0) to valid this process
  ubi: fastmap: Fix high cpu usage of ubi_bgt by making sure wl_pool not
    empty

 drivers/mtd/ubi/fastmap-wl.c |   5 +-
 drivers/mtd/ubi/fastmap.c    |  63 ++++------
 drivers/mtd/ubi/wl.h         |  11 +-
 fs/ubifs/dir.c               | 235 +++++++++++++++++++++--------------
 fs/ubifs/file.c              |  45 ++++---
 fs/ubifs/io.c                |  34 ++++-
 fs/ubifs/ioctl.c             |   2 +-
 fs/ubifs/journal.c           |  52 ++++++--
 fs/ubifs/ubifs.h             |   2 +-
 9 files changed, 284 insertions(+), 165 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v6 00/15] Some bugfixs for ubi/ubifs
@ 2021-12-27  3:22 ` Zhihao Cheng
  0 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

v1->v2:
  1. Add new fix for ubifs, "ubifs: Fix to add refcount once page is set
  private"
  2. Update "ubifs: Rename whiteout atomically":
     1) Move inode mode in create_whiteout()
     2) Don't check O_SYNC for whiteout, because it inherits from the old_dir
     3) Remove useless 'synced_i_size ' assignment for whiteout, because
	it's always be zero.
     4) Remove unused variable 'ui' in create_whiteout()
  3. Update "ubifs: setflags: Make dirtied_ino_d 8 bytes aligned":
     1) Align dirtied_ino_d with 8 bytes.

v2->v3:
  1. Update "ubifs: Rename whiteout atomically":
     1) Fix misspelling 'have already check the old dir inode' ->
        'have already checked the old dir inode'
     2) Fix misspelling "Whiteout don't have non-zero size" ->
        "Whiteout  have non-zero size"
  2. Update "ubifs: Fix to add refcount once page is set private"
     1) Fix commit message to explain the root cause.
  3. Update "ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when
     fm->used_blocks>=2"
     1) Add fastmap used pebs into 'ai' in for-loop, rather than in
        two-steps(Add pebs [pnum<UBI_FM_MAX_START] then add pebs
	[pnum>=UBI_FM_MAX_START] into 'ai').

v3->v4:
  1. Update "ubifs: Add missing iput if do_tmpfile() failed in rename whiteout":
     1) Move whiteout cleanup into do_tmpfile() according to Sascha's advice
  2. Add new fix for ubifs, "ubifs: ubifs_writepage: Mark page dirty after
     writing inode failed"

v4->v5:
  1. Add new fix for ubifs, "ubifs: ubifs_releasepage: Remove ubifs_assert(0)
     to valid this process"

v5->v6:
  1. Add new fix for ubi: "ubi: fastmap: Fix high cpu usage of ubi_bgt by
     making sure wl_pool not empty"

Zhihao Cheng (15):
  ubifs: rename_whiteout: Fix double free for whiteout_ui->data
  ubifs: Fix deadlock in concurrent rename whiteout and inode writeback
  ubifs: Fix wrong number of inodes locked by ui_mutex in ubifs_inode
    comment
  ubifs: Add missing iput if do_tmpfile() failed in rename whiteout
  ubifs: Rename whiteout atomically
  ubifs: Fix 'ui->dirty' race between do_tmpfile() and writeback work
  ubifs: Rectify space amount budget for mkdir/tmpfile operations
  ubifs: setflags: Make dirtied_ino_d 8 bytes aligned
  ubifs: Fix read out-of-bounds in ubifs_wbuf_write_nolock()
  ubifs: Fix to add refcount once page is set private
  ubi: fastmap: Return error code if memory allocation fails in
    add_aeb()
  ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when
    fm->used_blocks>=2
  ubifs: ubifs_writepage: Mark page dirty after writing inode failed
  ubifs: ubifs_releasepage: Remove ubifs_assert(0) to valid this process
  ubi: fastmap: Fix high cpu usage of ubi_bgt by making sure wl_pool not
    empty

 drivers/mtd/ubi/fastmap-wl.c |   5 +-
 drivers/mtd/ubi/fastmap.c    |  63 ++++------
 drivers/mtd/ubi/wl.h         |  11 +-
 fs/ubifs/dir.c               | 235 +++++++++++++++++++++--------------
 fs/ubifs/file.c              |  45 ++++---
 fs/ubifs/io.c                |  34 ++++-
 fs/ubifs/ioctl.c             |   2 +-
 fs/ubifs/journal.c           |  52 ++++++--
 fs/ubifs/ubifs.h             |   2 +-
 9 files changed, 284 insertions(+), 165 deletions(-)

-- 
2.31.1


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v6 01/15] ubifs: rename_whiteout: Fix double free for whiteout_ui->data
  2021-12-27  3:22 ` Zhihao Cheng
@ 2021-12-27  3:22   ` Zhihao Cheng
  -1 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

'whiteout_ui->data' will be freed twice if space budget fail for
rename whiteout operation as following process:

rename_whiteout
  dev = kmalloc
  whiteout_ui->data = dev
  kfree(whiteout_ui->data)  // Free first time
  iput(whiteout)
    ubifs_free_inode
      kfree(ui->data)	    // Double free!

KASAN reports:
==================================================================
BUG: KASAN: double-free or invalid-free in ubifs_free_inode+0x4f/0x70
Call Trace:
  kfree+0x117/0x490
  ubifs_free_inode+0x4f/0x70 [ubifs]
  i_callback+0x30/0x60
  rcu_do_batch+0x366/0xac0
  __do_softirq+0x133/0x57f

Allocated by task 1506:
  kmem_cache_alloc_trace+0x3c2/0x7a0
  do_rename+0x9b7/0x1150 [ubifs]
  ubifs_rename+0x106/0x1f0 [ubifs]
  do_syscall_64+0x35/0x80

Freed by task 1506:
  kfree+0x117/0x490
  do_rename.cold+0x53/0x8a [ubifs]
  ubifs_rename+0x106/0x1f0 [ubifs]
  do_syscall_64+0x35/0x80

The buggy address belongs to the object at ffff88810238bed8 which
belongs to the cache kmalloc-8 of size 8
==================================================================

Let ubifs_free_inode() free 'whiteout_ui->data'. BTW, delete unused
assignment 'whiteout_ui->data_len = 0', process 'ubifs_evict_inode()
-> ubifs_jnl_delete_inode() -> ubifs_jnl_write_inode()' doesn't need it
(because 'inc_nlink(whiteout)' won't be excuted by 'goto out_release',
 and the nlink of whiteout inode is 0).

Fixes: 9e0a1fff8db56ea ("ubifs: Implement RENAME_WHITEOUT")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 fs/ubifs/dir.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index 7c61d0ec0159..cfa8881d8cca 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -1425,8 +1425,6 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
 
 		err = ubifs_budget_space(c, &wht_req);
 		if (err) {
-			kfree(whiteout_ui->data);
-			whiteout_ui->data_len = 0;
 			iput(whiteout);
 			goto out_release;
 		}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 01/15] ubifs: rename_whiteout: Fix double free for whiteout_ui->data
@ 2021-12-27  3:22   ` Zhihao Cheng
  0 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

'whiteout_ui->data' will be freed twice if space budget fail for
rename whiteout operation as following process:

rename_whiteout
  dev = kmalloc
  whiteout_ui->data = dev
  kfree(whiteout_ui->data)  // Free first time
  iput(whiteout)
    ubifs_free_inode
      kfree(ui->data)	    // Double free!

KASAN reports:
==================================================================
BUG: KASAN: double-free or invalid-free in ubifs_free_inode+0x4f/0x70
Call Trace:
  kfree+0x117/0x490
  ubifs_free_inode+0x4f/0x70 [ubifs]
  i_callback+0x30/0x60
  rcu_do_batch+0x366/0xac0
  __do_softirq+0x133/0x57f

Allocated by task 1506:
  kmem_cache_alloc_trace+0x3c2/0x7a0
  do_rename+0x9b7/0x1150 [ubifs]
  ubifs_rename+0x106/0x1f0 [ubifs]
  do_syscall_64+0x35/0x80

Freed by task 1506:
  kfree+0x117/0x490
  do_rename.cold+0x53/0x8a [ubifs]
  ubifs_rename+0x106/0x1f0 [ubifs]
  do_syscall_64+0x35/0x80

The buggy address belongs to the object at ffff88810238bed8 which
belongs to the cache kmalloc-8 of size 8
==================================================================

Let ubifs_free_inode() free 'whiteout_ui->data'. BTW, delete unused
assignment 'whiteout_ui->data_len = 0', process 'ubifs_evict_inode()
-> ubifs_jnl_delete_inode() -> ubifs_jnl_write_inode()' doesn't need it
(because 'inc_nlink(whiteout)' won't be excuted by 'goto out_release',
 and the nlink of whiteout inode is 0).

Fixes: 9e0a1fff8db56ea ("ubifs: Implement RENAME_WHITEOUT")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 fs/ubifs/dir.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index 7c61d0ec0159..cfa8881d8cca 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -1425,8 +1425,6 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
 
 		err = ubifs_budget_space(c, &wht_req);
 		if (err) {
-			kfree(whiteout_ui->data);
-			whiteout_ui->data_len = 0;
 			iput(whiteout);
 			goto out_release;
 		}
-- 
2.31.1


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 02/15] ubifs: Fix deadlock in concurrent rename whiteout and inode writeback
  2021-12-27  3:22 ` Zhihao Cheng
@ 2021-12-27  3:22   ` Zhihao Cheng
  -1 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

Following hung tasks:
[   77.028764] task:kworker/u8:4    state:D stack:    0 pid:  132
[   77.028820] Call Trace:
[   77.029027]  schedule+0x8c/0x1b0
[   77.029067]  mutex_lock+0x50/0x60
[   77.029074]  ubifs_write_inode+0x68/0x1f0 [ubifs]
[   77.029117]  __writeback_single_inode+0x43c/0x570
[   77.029128]  writeback_sb_inodes+0x259/0x740
[   77.029148]  wb_writeback+0x107/0x4d0
[   77.029163]  wb_workfn+0x162/0x7b0

[   92.390442] task:aa              state:D stack:    0 pid: 1506
[   92.390448] Call Trace:
[   92.390458]  schedule+0x8c/0x1b0
[   92.390461]  wb_wait_for_completion+0x82/0xd0
[   92.390469]  __writeback_inodes_sb_nr+0xb2/0x110
[   92.390472]  writeback_inodes_sb_nr+0x14/0x20
[   92.390476]  ubifs_budget_space+0x705/0xdd0 [ubifs]
[   92.390503]  do_rename.cold+0x7f/0x187 [ubifs]
[   92.390549]  ubifs_rename+0x8b/0x180 [ubifs]
[   92.390571]  vfs_rename+0xdb2/0x1170
[   92.390580]  do_renameat2+0x554/0x770

, are caused by concurrent rename whiteout and inode writeback processes:
	rename_whiteout(Thread 1)	        wb_workfn(Thread2)
ubifs_rename
  do_rename
    lock_4_inodes (Hold ui_mutex)
    ubifs_budget_space
      make_free_space
        shrink_liability
	  __writeback_inodes_sb_nr
	    bdi_split_work_to_wbs (Queue new wb work)
					      wb_do_writeback(wb work)
						__writeback_single_inode
					          ubifs_write_inode
					            LOCK(ui_mutex)
							   ↑
	      wb_wait_for_completion (Wait wb work) <-- deadlock!

Reproducer (Detail program in [Link]):
  1. SYS_renameat2("/mp/dir/file", "/mp/dir/whiteout", RENAME_WHITEOUT)
  2. Consume out of space before kernel(mdelay) doing budget for whiteout

Fix it by doing whiteout space budget before locking ubifs inodes.
BTW, it also fixes wrong goto tag 'out_release' in whiteout budget
error handling path(It should at least recover dir i_size and unlock
4 ubifs inodes).

Fixes: 9e0a1fff8db56ea ("ubifs: Implement RENAME_WHITEOUT")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214733
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 fs/ubifs/dir.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index cfa8881d8cca..2735ad1affed 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -1324,6 +1324,7 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
 
 	if (flags & RENAME_WHITEOUT) {
 		union ubifs_dev_desc *dev = NULL;
+		struct ubifs_budget_req wht_req;
 
 		dev = kmalloc(sizeof(union ubifs_dev_desc), GFP_NOFS);
 		if (!dev) {
@@ -1345,6 +1346,20 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
 		whiteout_ui->data = dev;
 		whiteout_ui->data_len = ubifs_encode_dev(dev, MKDEV(0, 0));
 		ubifs_assert(c, !whiteout_ui->dirty);
+
+		memset(&wht_req, 0, sizeof(struct ubifs_budget_req));
+		wht_req.dirtied_ino = 1;
+		wht_req.dirtied_ino_d = ALIGN(whiteout_ui->data_len, 8);
+		/*
+		 * To avoid deadlock between space budget (holds ui_mutex and
+		 * waits wb work) and writeback work(waits ui_mutex), do space
+		 * budget before ubifs inodes locked.
+		 */
+		err = ubifs_budget_space(c, &wht_req);
+		if (err) {
+			iput(whiteout);
+			goto out_release;
+		}
 	}
 
 	lock_4_inodes(old_dir, new_dir, new_inode, whiteout);
@@ -1419,16 +1434,6 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
 	}
 
 	if (whiteout) {
-		struct ubifs_budget_req wht_req = { .dirtied_ino = 1,
-				.dirtied_ino_d = \
-				ALIGN(ubifs_inode(whiteout)->data_len, 8) };
-
-		err = ubifs_budget_space(c, &wht_req);
-		if (err) {
-			iput(whiteout);
-			goto out_release;
-		}
-
 		inc_nlink(whiteout);
 		mark_inode_dirty(whiteout);
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 02/15] ubifs: Fix deadlock in concurrent rename whiteout and inode writeback
@ 2021-12-27  3:22   ` Zhihao Cheng
  0 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

Following hung tasks:
[   77.028764] task:kworker/u8:4    state:D stack:    0 pid:  132
[   77.028820] Call Trace:
[   77.029027]  schedule+0x8c/0x1b0
[   77.029067]  mutex_lock+0x50/0x60
[   77.029074]  ubifs_write_inode+0x68/0x1f0 [ubifs]
[   77.029117]  __writeback_single_inode+0x43c/0x570
[   77.029128]  writeback_sb_inodes+0x259/0x740
[   77.029148]  wb_writeback+0x107/0x4d0
[   77.029163]  wb_workfn+0x162/0x7b0

[   92.390442] task:aa              state:D stack:    0 pid: 1506
[   92.390448] Call Trace:
[   92.390458]  schedule+0x8c/0x1b0
[   92.390461]  wb_wait_for_completion+0x82/0xd0
[   92.390469]  __writeback_inodes_sb_nr+0xb2/0x110
[   92.390472]  writeback_inodes_sb_nr+0x14/0x20
[   92.390476]  ubifs_budget_space+0x705/0xdd0 [ubifs]
[   92.390503]  do_rename.cold+0x7f/0x187 [ubifs]
[   92.390549]  ubifs_rename+0x8b/0x180 [ubifs]
[   92.390571]  vfs_rename+0xdb2/0x1170
[   92.390580]  do_renameat2+0x554/0x770

, are caused by concurrent rename whiteout and inode writeback processes:
	rename_whiteout(Thread 1)	        wb_workfn(Thread2)
ubifs_rename
  do_rename
    lock_4_inodes (Hold ui_mutex)
    ubifs_budget_space
      make_free_space
        shrink_liability
	  __writeback_inodes_sb_nr
	    bdi_split_work_to_wbs (Queue new wb work)
					      wb_do_writeback(wb work)
						__writeback_single_inode
					          ubifs_write_inode
					            LOCK(ui_mutex)
							   ↑
	      wb_wait_for_completion (Wait wb work) <-- deadlock!

Reproducer (Detail program in [Link]):
  1. SYS_renameat2("/mp/dir/file", "/mp/dir/whiteout", RENAME_WHITEOUT)
  2. Consume out of space before kernel(mdelay) doing budget for whiteout

Fix it by doing whiteout space budget before locking ubifs inodes.
BTW, it also fixes wrong goto tag 'out_release' in whiteout budget
error handling path(It should at least recover dir i_size and unlock
4 ubifs inodes).

Fixes: 9e0a1fff8db56ea ("ubifs: Implement RENAME_WHITEOUT")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214733
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 fs/ubifs/dir.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index cfa8881d8cca..2735ad1affed 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -1324,6 +1324,7 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
 
 	if (flags & RENAME_WHITEOUT) {
 		union ubifs_dev_desc *dev = NULL;
+		struct ubifs_budget_req wht_req;
 
 		dev = kmalloc(sizeof(union ubifs_dev_desc), GFP_NOFS);
 		if (!dev) {
@@ -1345,6 +1346,20 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
 		whiteout_ui->data = dev;
 		whiteout_ui->data_len = ubifs_encode_dev(dev, MKDEV(0, 0));
 		ubifs_assert(c, !whiteout_ui->dirty);
+
+		memset(&wht_req, 0, sizeof(struct ubifs_budget_req));
+		wht_req.dirtied_ino = 1;
+		wht_req.dirtied_ino_d = ALIGN(whiteout_ui->data_len, 8);
+		/*
+		 * To avoid deadlock between space budget (holds ui_mutex and
+		 * waits wb work) and writeback work(waits ui_mutex), do space
+		 * budget before ubifs inodes locked.
+		 */
+		err = ubifs_budget_space(c, &wht_req);
+		if (err) {
+			iput(whiteout);
+			goto out_release;
+		}
 	}
 
 	lock_4_inodes(old_dir, new_dir, new_inode, whiteout);
@@ -1419,16 +1434,6 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
 	}
 
 	if (whiteout) {
-		struct ubifs_budget_req wht_req = { .dirtied_ino = 1,
-				.dirtied_ino_d = \
-				ALIGN(ubifs_inode(whiteout)->data_len, 8) };
-
-		err = ubifs_budget_space(c, &wht_req);
-		if (err) {
-			iput(whiteout);
-			goto out_release;
-		}
-
 		inc_nlink(whiteout);
 		mark_inode_dirty(whiteout);
 
-- 
2.31.1


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 03/15] ubifs: Fix wrong number of inodes locked by ui_mutex in ubifs_inode comment
  2021-12-27  3:22 ` Zhihao Cheng
@ 2021-12-27  3:22   ` Zhihao Cheng
  -1 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

Since 9ec64962afb1702f75b("ubifs: Implement RENAME_EXCHANGE") and
9e0a1fff8db56eaaebb("ubifs: Implement RENAME_WHITEOUT") are applied,
ubifs_rename locks and changes 4 ubifs inodes, correct the comment
for ui_mutex in ubifs_inode.

Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 fs/ubifs/ubifs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ubifs/ubifs.h b/fs/ubifs/ubifs.h
index c38066ce9ab0..972e41daff01 100644
--- a/fs/ubifs/ubifs.h
+++ b/fs/ubifs/ubifs.h
@@ -372,7 +372,7 @@ struct ubifs_gced_idx_leb {
  * @ui_mutex exists for two main reasons. At first it prevents inodes from
  * being written back while UBIFS changing them, being in the middle of an VFS
  * operation. This way UBIFS makes sure the inode fields are consistent. For
- * example, in 'ubifs_rename()' we change 3 inodes simultaneously, and
+ * example, in 'ubifs_rename()' we change 4 inodes simultaneously, and
  * write-back must not write any of them before we have finished.
  *
  * The second reason is budgeting - UBIFS has to budget all operations. If an
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 03/15] ubifs: Fix wrong number of inodes locked by ui_mutex in ubifs_inode comment
@ 2021-12-27  3:22   ` Zhihao Cheng
  0 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

Since 9ec64962afb1702f75b("ubifs: Implement RENAME_EXCHANGE") and
9e0a1fff8db56eaaebb("ubifs: Implement RENAME_WHITEOUT") are applied,
ubifs_rename locks and changes 4 ubifs inodes, correct the comment
for ui_mutex in ubifs_inode.

Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 fs/ubifs/ubifs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ubifs/ubifs.h b/fs/ubifs/ubifs.h
index c38066ce9ab0..972e41daff01 100644
--- a/fs/ubifs/ubifs.h
+++ b/fs/ubifs/ubifs.h
@@ -372,7 +372,7 @@ struct ubifs_gced_idx_leb {
  * @ui_mutex exists for two main reasons. At first it prevents inodes from
  * being written back while UBIFS changing them, being in the middle of an VFS
  * operation. This way UBIFS makes sure the inode fields are consistent. For
- * example, in 'ubifs_rename()' we change 3 inodes simultaneously, and
+ * example, in 'ubifs_rename()' we change 4 inodes simultaneously, and
  * write-back must not write any of them before we have finished.
  *
  * The second reason is budgeting - UBIFS has to budget all operations. If an
-- 
2.31.1


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 04/15] ubifs: Add missing iput if do_tmpfile() failed in rename whiteout
  2021-12-27  3:22 ` Zhihao Cheng
@ 2021-12-27  3:22   ` Zhihao Cheng
  -1 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

whiteout inode should be put when do_tmpfile() failed if inode has been
initialized. Otherwise we will get following warning during umount:
  UBIFS error (ubi0:0 pid 1494): ubifs_assert_failed [ubifs]: UBIFS
  assert failed: c->bi.dd_growth == 0, in fs/ubifs/super.c:1930
  VFS: Busy inodes after unmount of ubifs. Self-destruct in 5 seconds.

Fixes: 9e0a1fff8db56ea ("ubifs: Implement RENAME_WHITEOUT")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Suggested-by: Sascha Hauer <s.hauer@pengutronix.de>
---
 fs/ubifs/dir.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index 2735ad1affed..2cbc5f05f671 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -432,6 +432,8 @@ static int do_tmpfile(struct inode *dir, struct dentry *dentry,
 	make_bad_inode(inode);
 	if (!instantiated)
 		iput(inode);
+	else if (whiteout)
+		iput(*whiteout);
 out_budg:
 	ubifs_release_budget(c, &req);
 	if (!instantiated)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 04/15] ubifs: Add missing iput if do_tmpfile() failed in rename whiteout
@ 2021-12-27  3:22   ` Zhihao Cheng
  0 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

whiteout inode should be put when do_tmpfile() failed if inode has been
initialized. Otherwise we will get following warning during umount:
  UBIFS error (ubi0:0 pid 1494): ubifs_assert_failed [ubifs]: UBIFS
  assert failed: c->bi.dd_growth == 0, in fs/ubifs/super.c:1930
  VFS: Busy inodes after unmount of ubifs. Self-destruct in 5 seconds.

Fixes: 9e0a1fff8db56ea ("ubifs: Implement RENAME_WHITEOUT")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Suggested-by: Sascha Hauer <s.hauer@pengutronix.de>
---
 fs/ubifs/dir.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index 2735ad1affed..2cbc5f05f671 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -432,6 +432,8 @@ static int do_tmpfile(struct inode *dir, struct dentry *dentry,
 	make_bad_inode(inode);
 	if (!instantiated)
 		iput(inode);
+	else if (whiteout)
+		iput(*whiteout);
 out_budg:
 	ubifs_release_budget(c, &req);
 	if (!instantiated)
-- 
2.31.1


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 05/15] ubifs: Rename whiteout atomically
  2021-12-27  3:22 ` Zhihao Cheng
@ 2021-12-27  3:22   ` Zhihao Cheng
  -1 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

Currently, rename whiteout has 3 steps:
  1. create tmpfile(which associates old dentry to tmpfile inode) for
     whiteout, and store tmpfile to disk
  2. link whiteout, associate whiteout inode to old dentry agagin and
     store old dentry, old inode, new dentry on disk
  3. writeback dirty whiteout inode to disk

Suddenly power-cut or error occurring(eg. ENOSPC returned by budget,
memory allocation failure) during above steps may cause kinds of problems:
  Problem 1: ENOSPC returned by whiteout space budget (before step 2),
	     old dentry will disappear after rename syscall, whiteout file
	     cannot be found either.

	     ls dir  // we get file, whiteout
	     rename(dir/file, dir/whiteout, REANME_WHITEOUT)
	     ENOSPC = ubifs_budget_space(&wht_req) // return
	     ls dir  // empty (no file, no whiteout)
  Problem 2: Power-cut happens before step 3, whiteout inode with 'nlink=1'
	     is not stored on disk, whiteout dentry(old dentry) is written
	     on disk, whiteout file is lost on next mount (We get "dead
	     directory entry" after executing 'ls -l' on whiteout file).

Now, we use following 3 steps to finish rename whiteout:
  1. create an in-mem inode with 'nlink = 1' as whiteout
  2. ubifs_jnl_rename (Write on disk to finish associating old dentry to
     whiteout inode, associating new dentry with old inode)
  3. iput(whiteout)

Rely writing in-mem inode on disk by ubifs_jnl_rename() to finish rename
whiteout, which avoids middle disk state caused by suddenly power-cut
and error occurring.

Fixes: 9e0a1fff8db56ea ("ubifs: Implement RENAME_WHITEOUT")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 fs/ubifs/dir.c     | 144 +++++++++++++++++++++++++++++----------------
 fs/ubifs/journal.c |  52 +++++++++++++---
 2 files changed, 136 insertions(+), 60 deletions(-)

diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index 2cbc5f05f671..deaf2d5dba5b 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -349,8 +349,56 @@ static int ubifs_create(struct user_namespace *mnt_userns, struct inode *dir,
 	return err;
 }
 
-static int do_tmpfile(struct inode *dir, struct dentry *dentry,
-		      umode_t mode, struct inode **whiteout)
+static struct inode *create_whiteout(struct inode *dir, struct dentry *dentry)
+{
+	int err;
+	umode_t mode = S_IFCHR | WHITEOUT_MODE;
+	struct inode *inode;
+	struct ubifs_info *c = dir->i_sb->s_fs_info;
+	struct fscrypt_name nm;
+
+	/*
+	 * Create an inode('nlink = 1') for whiteout without updating journal,
+	 * let ubifs_jnl_rename() store it on flash to complete rename whiteout
+	 * atomically.
+	 */
+
+	dbg_gen("dent '%pd', mode %#hx in dir ino %lu",
+		dentry, mode, dir->i_ino);
+
+	err = fscrypt_setup_filename(dir, &dentry->d_name, 0, &nm);
+	if (err)
+		return ERR_PTR(err);
+
+	inode = ubifs_new_inode(c, dir, mode);
+	if (IS_ERR(inode)) {
+		err = PTR_ERR(inode);
+		goto out_free;
+	}
+
+	init_special_inode(inode, inode->i_mode, WHITEOUT_DEV);
+	ubifs_assert(c, inode->i_op == &ubifs_file_inode_operations);
+
+	err = ubifs_init_security(dir, inode, &dentry->d_name);
+	if (err)
+		goto out_inode;
+
+	/* The dir size is updated by do_rename. */
+	insert_inode_hash(inode);
+
+	return inode;
+
+out_inode:
+	make_bad_inode(inode);
+	iput(inode);
+out_free:
+	fscrypt_free_filename(&nm);
+	ubifs_err(c, "cannot create whiteout file, error %d", err);
+	return ERR_PTR(err);
+}
+
+static int ubifs_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
+			 struct dentry *dentry, umode_t mode)
 {
 	struct inode *inode;
 	struct ubifs_info *c = dir->i_sb->s_fs_info;
@@ -392,25 +440,13 @@ static int do_tmpfile(struct inode *dir, struct dentry *dentry,
 	}
 	ui = ubifs_inode(inode);
 
-	if (whiteout) {
-		init_special_inode(inode, inode->i_mode, WHITEOUT_DEV);
-		ubifs_assert(c, inode->i_op == &ubifs_file_inode_operations);
-	}
-
 	err = ubifs_init_security(dir, inode, &dentry->d_name);
 	if (err)
 		goto out_inode;
 
 	mutex_lock(&ui->ui_mutex);
 	insert_inode_hash(inode);
-
-	if (whiteout) {
-		mark_inode_dirty(inode);
-		drop_nlink(inode);
-		*whiteout = inode;
-	} else {
-		d_tmpfile(dentry, inode);
-	}
+	d_tmpfile(dentry, inode);
 	ubifs_assert(c, ui->dirty);
 
 	instantiated = 1;
@@ -432,8 +468,6 @@ static int do_tmpfile(struct inode *dir, struct dentry *dentry,
 	make_bad_inode(inode);
 	if (!instantiated)
 		iput(inode);
-	else if (whiteout)
-		iput(*whiteout);
 out_budg:
 	ubifs_release_budget(c, &req);
 	if (!instantiated)
@@ -443,12 +477,6 @@ static int do_tmpfile(struct inode *dir, struct dentry *dentry,
 	return err;
 }
 
-static int ubifs_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
-			 struct dentry *dentry, umode_t mode)
-{
-	return do_tmpfile(dir, dentry, mode, NULL);
-}
-
 /**
  * vfs_dent_type - get VFS directory entry type.
  * @type: UBIFS directory entry type
@@ -1266,17 +1294,19 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
 					.dirtied_ino = 3 };
 	struct ubifs_budget_req ino_req = { .dirtied_ino = 1,
 			.dirtied_ino_d = ALIGN(old_inode_ui->data_len, 8) };
+	struct ubifs_budget_req wht_req;
 	struct timespec64 time;
 	unsigned int saved_nlink;
 	struct fscrypt_name old_nm, new_nm;
 
 	/*
-	 * Budget request settings: deletion direntry, new direntry, removing
-	 * the old inode, and changing old and new parent directory inodes.
+	 * Budget request settings:
+	 *   req: deletion direntry, new direntry, removing the old inode,
+	 *   and changing old and new parent directory inodes.
 	 *
-	 * However, this operation also marks the target inode as dirty and
-	 * does not write it, so we allocate budget for the target inode
-	 * separately.
+	 *   wht_req: new whiteout inode for RENAME_WHITEOUT.
+	 *
+	 *   ino_req: marks the target inode as dirty and does not write it.
 	 */
 
 	dbg_gen("dent '%pd' ino %lu in dir ino %lu to dent '%pd' in dir ino %lu flags 0x%x",
@@ -1326,7 +1356,6 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
 
 	if (flags & RENAME_WHITEOUT) {
 		union ubifs_dev_desc *dev = NULL;
-		struct ubifs_budget_req wht_req;
 
 		dev = kmalloc(sizeof(union ubifs_dev_desc), GFP_NOFS);
 		if (!dev) {
@@ -1334,24 +1363,26 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
 			goto out_release;
 		}
 
-		err = do_tmpfile(old_dir, old_dentry, S_IFCHR | WHITEOUT_MODE, &whiteout);
-		if (err) {
+		/*
+		 * The whiteout inode without dentry is pinned in memory,
+		 * umount won't happen during rename process because we
+		 * got parent dentry.
+		 */
+		whiteout = create_whiteout(old_dir, old_dentry);
+		if (IS_ERR(whiteout)) {
+			err = PTR_ERR(whiteout);
 			kfree(dev);
 			goto out_release;
 		}
 
-		spin_lock(&whiteout->i_lock);
-		whiteout->i_state |= I_LINKABLE;
-		spin_unlock(&whiteout->i_lock);
-
 		whiteout_ui = ubifs_inode(whiteout);
 		whiteout_ui->data = dev;
 		whiteout_ui->data_len = ubifs_encode_dev(dev, MKDEV(0, 0));
 		ubifs_assert(c, !whiteout_ui->dirty);
 
 		memset(&wht_req, 0, sizeof(struct ubifs_budget_req));
-		wht_req.dirtied_ino = 1;
-		wht_req.dirtied_ino_d = ALIGN(whiteout_ui->data_len, 8);
+		wht_req.new_ino = 1;
+		wht_req.new_ino_d = ALIGN(whiteout_ui->data_len, 8);
 		/*
 		 * To avoid deadlock between space budget (holds ui_mutex and
 		 * waits wb work) and writeback work(waits ui_mutex), do space
@@ -1359,6 +1390,11 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
 		 */
 		err = ubifs_budget_space(c, &wht_req);
 		if (err) {
+			/*
+			 * Whiteout inode can not be written on flash by
+			 * ubifs_jnl_write_inode(), because it's neither
+			 * dirty nor zero-nlink.
+			 */
 			iput(whiteout);
 			goto out_release;
 		}
@@ -1433,17 +1469,11 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
 		sync = IS_DIRSYNC(old_dir) || IS_DIRSYNC(new_dir);
 		if (unlink && IS_SYNC(new_inode))
 			sync = 1;
-	}
-
-	if (whiteout) {
-		inc_nlink(whiteout);
-		mark_inode_dirty(whiteout);
-
-		spin_lock(&whiteout->i_lock);
-		whiteout->i_state &= ~I_LINKABLE;
-		spin_unlock(&whiteout->i_lock);
-
-		iput(whiteout);
+		/*
+		 * S_SYNC flag of whiteout inherits from the old_dir, and we
+		 * have already checked the old dir inode. So there is no need
+		 * to check whiteout.
+		 */
 	}
 
 	err = ubifs_jnl_rename(c, old_dir, old_inode, &old_nm, new_dir,
@@ -1454,6 +1484,11 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
 	unlock_4_inodes(old_dir, new_dir, new_inode, whiteout);
 	ubifs_release_budget(c, &req);
 
+	if (whiteout) {
+		ubifs_release_budget(c, &wht_req);
+		iput(whiteout);
+	}
+
 	mutex_lock(&old_inode_ui->ui_mutex);
 	release = old_inode_ui->dirty;
 	mark_inode_dirty_sync(old_inode);
@@ -1462,11 +1497,16 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
 	if (release)
 		ubifs_release_budget(c, &ino_req);
 	if (IS_SYNC(old_inode))
-		err = old_inode->i_sb->s_op->write_inode(old_inode, NULL);
+		/*
+		 * Rename finished here. Although old inode cannot be updated
+		 * on flash, old ctime is not a big problem, don't return err
+		 * code to userspace.
+		 */
+		old_inode->i_sb->s_op->write_inode(old_inode, NULL);
 
 	fscrypt_free_filename(&old_nm);
 	fscrypt_free_filename(&new_nm);
-	return err;
+	return 0;
 
 out_cancel:
 	if (unlink) {
@@ -1487,11 +1527,11 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
 				inc_nlink(old_dir);
 		}
 	}
+	unlock_4_inodes(old_dir, new_dir, new_inode, whiteout);
 	if (whiteout) {
-		drop_nlink(whiteout);
+		ubifs_release_budget(c, &wht_req);
 		iput(whiteout);
 	}
-	unlock_4_inodes(old_dir, new_dir, new_inode, whiteout);
 out_release:
 	ubifs_release_budget(c, &ino_req);
 	ubifs_release_budget(c, &req);
diff --git a/fs/ubifs/journal.c b/fs/ubifs/journal.c
index 8ea680dba61e..75dab0ae3939 100644
--- a/fs/ubifs/journal.c
+++ b/fs/ubifs/journal.c
@@ -1207,9 +1207,9 @@ int ubifs_jnl_xrename(struct ubifs_info *c, const struct inode *fst_dir,
  * @sync: non-zero if the write-buffer has to be synchronized
  *
  * This function implements the re-name operation which may involve writing up
- * to 4 inodes and 2 directory entries. It marks the written inodes as clean
- * and returns zero on success. In case of failure, a negative error code is
- * returned.
+ * to 4 inodes(new inode, whiteout inode, old and new parent directory inodes)
+ * and 2 directory entries. It marks the written inodes as clean and returns
+ * zero on success. In case of failure, a negative error code is returned.
  */
 int ubifs_jnl_rename(struct ubifs_info *c, const struct inode *old_dir,
 		     const struct inode *old_inode,
@@ -1222,14 +1222,15 @@ int ubifs_jnl_rename(struct ubifs_info *c, const struct inode *old_dir,
 	void *p;
 	union ubifs_key key;
 	struct ubifs_dent_node *dent, *dent2;
-	int err, dlen1, dlen2, ilen, lnum, offs, len, orphan_added = 0;
+	int err, dlen1, dlen2, ilen, wlen, lnum, offs, len, orphan_added = 0;
 	int aligned_dlen1, aligned_dlen2, plen = UBIFS_INO_NODE_SZ;
 	int last_reference = !!(new_inode && new_inode->i_nlink == 0);
 	int move = (old_dir != new_dir);
-	struct ubifs_inode *new_ui;
+	struct ubifs_inode *new_ui, *whiteout_ui;
 	u8 hash_old_dir[UBIFS_HASH_ARR_SZ];
 	u8 hash_new_dir[UBIFS_HASH_ARR_SZ];
 	u8 hash_new_inode[UBIFS_HASH_ARR_SZ];
+	u8 hash_whiteout_inode[UBIFS_HASH_ARR_SZ];
 	u8 hash_dent1[UBIFS_HASH_ARR_SZ];
 	u8 hash_dent2[UBIFS_HASH_ARR_SZ];
 
@@ -1249,9 +1250,20 @@ int ubifs_jnl_rename(struct ubifs_info *c, const struct inode *old_dir,
 	} else
 		ilen = 0;
 
+	if (whiteout) {
+		whiteout_ui = ubifs_inode(whiteout);
+		ubifs_assert(c, mutex_is_locked(&whiteout_ui->ui_mutex));
+		ubifs_assert(c, whiteout->i_nlink == 1);
+		ubifs_assert(c, !whiteout_ui->dirty);
+		wlen = UBIFS_INO_NODE_SZ;
+		wlen += whiteout_ui->data_len;
+	} else
+		wlen = 0;
+
 	aligned_dlen1 = ALIGN(dlen1, 8);
 	aligned_dlen2 = ALIGN(dlen2, 8);
-	len = aligned_dlen1 + aligned_dlen2 + ALIGN(ilen, 8) + ALIGN(plen, 8);
+	len = aligned_dlen1 + aligned_dlen2 + ALIGN(ilen, 8) +
+	      ALIGN(wlen, 8) + ALIGN(plen, 8);
 	if (move)
 		len += plen;
 
@@ -1313,6 +1325,15 @@ int ubifs_jnl_rename(struct ubifs_info *c, const struct inode *old_dir,
 		p += ALIGN(ilen, 8);
 	}
 
+	if (whiteout) {
+		pack_inode(c, p, whiteout, 0);
+		err = ubifs_node_calc_hash(c, p, hash_whiteout_inode);
+		if (err)
+			goto out_release;
+
+		p += ALIGN(wlen, 8);
+	}
+
 	if (!move) {
 		pack_inode(c, p, old_dir, 1);
 		err = ubifs_node_calc_hash(c, p, hash_old_dir);
@@ -1352,6 +1373,9 @@ int ubifs_jnl_rename(struct ubifs_info *c, const struct inode *old_dir,
 		if (new_inode)
 			ubifs_wbuf_add_ino_nolock(&c->jheads[BASEHD].wbuf,
 						  new_inode->i_ino);
+		if (whiteout)
+			ubifs_wbuf_add_ino_nolock(&c->jheads[BASEHD].wbuf,
+						  whiteout->i_ino);
 	}
 	release_head(c, BASEHD);
 
@@ -1368,8 +1392,6 @@ int ubifs_jnl_rename(struct ubifs_info *c, const struct inode *old_dir,
 		err = ubifs_tnc_add_nm(c, &key, lnum, offs, dlen2, hash_dent2, old_nm);
 		if (err)
 			goto out_ro;
-
-		ubifs_delete_orphan(c, whiteout->i_ino);
 	} else {
 		err = ubifs_add_dirt(c, lnum, dlen2);
 		if (err)
@@ -1390,6 +1412,15 @@ int ubifs_jnl_rename(struct ubifs_info *c, const struct inode *old_dir,
 		offs += ALIGN(ilen, 8);
 	}
 
+	if (whiteout) {
+		ino_key_init(c, &key, whiteout->i_ino);
+		err = ubifs_tnc_add(c, &key, lnum, offs, wlen,
+				    hash_whiteout_inode);
+		if (err)
+			goto out_ro;
+		offs += ALIGN(wlen, 8);
+	}
+
 	ino_key_init(c, &key, old_dir->i_ino);
 	err = ubifs_tnc_add(c, &key, lnum, offs, plen, hash_old_dir);
 	if (err)
@@ -1410,6 +1441,11 @@ int ubifs_jnl_rename(struct ubifs_info *c, const struct inode *old_dir,
 		new_ui->synced_i_size = new_ui->ui_size;
 		spin_unlock(&new_ui->ui_lock);
 	}
+	/*
+	 * No need to mark whiteout inode clean.
+	 * Whiteout doesn't have non-zero size, no need to update
+	 * synced_i_size for whiteout_ui.
+	 */
 	mark_inode_clean(c, ubifs_inode(old_dir));
 	if (move)
 		mark_inode_clean(c, ubifs_inode(new_dir));
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 05/15] ubifs: Rename whiteout atomically
@ 2021-12-27  3:22   ` Zhihao Cheng
  0 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

Currently, rename whiteout has 3 steps:
  1. create tmpfile(which associates old dentry to tmpfile inode) for
     whiteout, and store tmpfile to disk
  2. link whiteout, associate whiteout inode to old dentry agagin and
     store old dentry, old inode, new dentry on disk
  3. writeback dirty whiteout inode to disk

Suddenly power-cut or error occurring(eg. ENOSPC returned by budget,
memory allocation failure) during above steps may cause kinds of problems:
  Problem 1: ENOSPC returned by whiteout space budget (before step 2),
	     old dentry will disappear after rename syscall, whiteout file
	     cannot be found either.

	     ls dir  // we get file, whiteout
	     rename(dir/file, dir/whiteout, REANME_WHITEOUT)
	     ENOSPC = ubifs_budget_space(&wht_req) // return
	     ls dir  // empty (no file, no whiteout)
  Problem 2: Power-cut happens before step 3, whiteout inode with 'nlink=1'
	     is not stored on disk, whiteout dentry(old dentry) is written
	     on disk, whiteout file is lost on next mount (We get "dead
	     directory entry" after executing 'ls -l' on whiteout file).

Now, we use following 3 steps to finish rename whiteout:
  1. create an in-mem inode with 'nlink = 1' as whiteout
  2. ubifs_jnl_rename (Write on disk to finish associating old dentry to
     whiteout inode, associating new dentry with old inode)
  3. iput(whiteout)

Rely writing in-mem inode on disk by ubifs_jnl_rename() to finish rename
whiteout, which avoids middle disk state caused by suddenly power-cut
and error occurring.

Fixes: 9e0a1fff8db56ea ("ubifs: Implement RENAME_WHITEOUT")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 fs/ubifs/dir.c     | 144 +++++++++++++++++++++++++++++----------------
 fs/ubifs/journal.c |  52 +++++++++++++---
 2 files changed, 136 insertions(+), 60 deletions(-)

diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index 2cbc5f05f671..deaf2d5dba5b 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -349,8 +349,56 @@ static int ubifs_create(struct user_namespace *mnt_userns, struct inode *dir,
 	return err;
 }
 
-static int do_tmpfile(struct inode *dir, struct dentry *dentry,
-		      umode_t mode, struct inode **whiteout)
+static struct inode *create_whiteout(struct inode *dir, struct dentry *dentry)
+{
+	int err;
+	umode_t mode = S_IFCHR | WHITEOUT_MODE;
+	struct inode *inode;
+	struct ubifs_info *c = dir->i_sb->s_fs_info;
+	struct fscrypt_name nm;
+
+	/*
+	 * Create an inode('nlink = 1') for whiteout without updating journal,
+	 * let ubifs_jnl_rename() store it on flash to complete rename whiteout
+	 * atomically.
+	 */
+
+	dbg_gen("dent '%pd', mode %#hx in dir ino %lu",
+		dentry, mode, dir->i_ino);
+
+	err = fscrypt_setup_filename(dir, &dentry->d_name, 0, &nm);
+	if (err)
+		return ERR_PTR(err);
+
+	inode = ubifs_new_inode(c, dir, mode);
+	if (IS_ERR(inode)) {
+		err = PTR_ERR(inode);
+		goto out_free;
+	}
+
+	init_special_inode(inode, inode->i_mode, WHITEOUT_DEV);
+	ubifs_assert(c, inode->i_op == &ubifs_file_inode_operations);
+
+	err = ubifs_init_security(dir, inode, &dentry->d_name);
+	if (err)
+		goto out_inode;
+
+	/* The dir size is updated by do_rename. */
+	insert_inode_hash(inode);
+
+	return inode;
+
+out_inode:
+	make_bad_inode(inode);
+	iput(inode);
+out_free:
+	fscrypt_free_filename(&nm);
+	ubifs_err(c, "cannot create whiteout file, error %d", err);
+	return ERR_PTR(err);
+}
+
+static int ubifs_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
+			 struct dentry *dentry, umode_t mode)
 {
 	struct inode *inode;
 	struct ubifs_info *c = dir->i_sb->s_fs_info;
@@ -392,25 +440,13 @@ static int do_tmpfile(struct inode *dir, struct dentry *dentry,
 	}
 	ui = ubifs_inode(inode);
 
-	if (whiteout) {
-		init_special_inode(inode, inode->i_mode, WHITEOUT_DEV);
-		ubifs_assert(c, inode->i_op == &ubifs_file_inode_operations);
-	}
-
 	err = ubifs_init_security(dir, inode, &dentry->d_name);
 	if (err)
 		goto out_inode;
 
 	mutex_lock(&ui->ui_mutex);
 	insert_inode_hash(inode);
-
-	if (whiteout) {
-		mark_inode_dirty(inode);
-		drop_nlink(inode);
-		*whiteout = inode;
-	} else {
-		d_tmpfile(dentry, inode);
-	}
+	d_tmpfile(dentry, inode);
 	ubifs_assert(c, ui->dirty);
 
 	instantiated = 1;
@@ -432,8 +468,6 @@ static int do_tmpfile(struct inode *dir, struct dentry *dentry,
 	make_bad_inode(inode);
 	if (!instantiated)
 		iput(inode);
-	else if (whiteout)
-		iput(*whiteout);
 out_budg:
 	ubifs_release_budget(c, &req);
 	if (!instantiated)
@@ -443,12 +477,6 @@ static int do_tmpfile(struct inode *dir, struct dentry *dentry,
 	return err;
 }
 
-static int ubifs_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
-			 struct dentry *dentry, umode_t mode)
-{
-	return do_tmpfile(dir, dentry, mode, NULL);
-}
-
 /**
  * vfs_dent_type - get VFS directory entry type.
  * @type: UBIFS directory entry type
@@ -1266,17 +1294,19 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
 					.dirtied_ino = 3 };
 	struct ubifs_budget_req ino_req = { .dirtied_ino = 1,
 			.dirtied_ino_d = ALIGN(old_inode_ui->data_len, 8) };
+	struct ubifs_budget_req wht_req;
 	struct timespec64 time;
 	unsigned int saved_nlink;
 	struct fscrypt_name old_nm, new_nm;
 
 	/*
-	 * Budget request settings: deletion direntry, new direntry, removing
-	 * the old inode, and changing old and new parent directory inodes.
+	 * Budget request settings:
+	 *   req: deletion direntry, new direntry, removing the old inode,
+	 *   and changing old and new parent directory inodes.
 	 *
-	 * However, this operation also marks the target inode as dirty and
-	 * does not write it, so we allocate budget for the target inode
-	 * separately.
+	 *   wht_req: new whiteout inode for RENAME_WHITEOUT.
+	 *
+	 *   ino_req: marks the target inode as dirty and does not write it.
 	 */
 
 	dbg_gen("dent '%pd' ino %lu in dir ino %lu to dent '%pd' in dir ino %lu flags 0x%x",
@@ -1326,7 +1356,6 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
 
 	if (flags & RENAME_WHITEOUT) {
 		union ubifs_dev_desc *dev = NULL;
-		struct ubifs_budget_req wht_req;
 
 		dev = kmalloc(sizeof(union ubifs_dev_desc), GFP_NOFS);
 		if (!dev) {
@@ -1334,24 +1363,26 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
 			goto out_release;
 		}
 
-		err = do_tmpfile(old_dir, old_dentry, S_IFCHR | WHITEOUT_MODE, &whiteout);
-		if (err) {
+		/*
+		 * The whiteout inode without dentry is pinned in memory,
+		 * umount won't happen during rename process because we
+		 * got parent dentry.
+		 */
+		whiteout = create_whiteout(old_dir, old_dentry);
+		if (IS_ERR(whiteout)) {
+			err = PTR_ERR(whiteout);
 			kfree(dev);
 			goto out_release;
 		}
 
-		spin_lock(&whiteout->i_lock);
-		whiteout->i_state |= I_LINKABLE;
-		spin_unlock(&whiteout->i_lock);
-
 		whiteout_ui = ubifs_inode(whiteout);
 		whiteout_ui->data = dev;
 		whiteout_ui->data_len = ubifs_encode_dev(dev, MKDEV(0, 0));
 		ubifs_assert(c, !whiteout_ui->dirty);
 
 		memset(&wht_req, 0, sizeof(struct ubifs_budget_req));
-		wht_req.dirtied_ino = 1;
-		wht_req.dirtied_ino_d = ALIGN(whiteout_ui->data_len, 8);
+		wht_req.new_ino = 1;
+		wht_req.new_ino_d = ALIGN(whiteout_ui->data_len, 8);
 		/*
 		 * To avoid deadlock between space budget (holds ui_mutex and
 		 * waits wb work) and writeback work(waits ui_mutex), do space
@@ -1359,6 +1390,11 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
 		 */
 		err = ubifs_budget_space(c, &wht_req);
 		if (err) {
+			/*
+			 * Whiteout inode can not be written on flash by
+			 * ubifs_jnl_write_inode(), because it's neither
+			 * dirty nor zero-nlink.
+			 */
 			iput(whiteout);
 			goto out_release;
 		}
@@ -1433,17 +1469,11 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
 		sync = IS_DIRSYNC(old_dir) || IS_DIRSYNC(new_dir);
 		if (unlink && IS_SYNC(new_inode))
 			sync = 1;
-	}
-
-	if (whiteout) {
-		inc_nlink(whiteout);
-		mark_inode_dirty(whiteout);
-
-		spin_lock(&whiteout->i_lock);
-		whiteout->i_state &= ~I_LINKABLE;
-		spin_unlock(&whiteout->i_lock);
-
-		iput(whiteout);
+		/*
+		 * S_SYNC flag of whiteout inherits from the old_dir, and we
+		 * have already checked the old dir inode. So there is no need
+		 * to check whiteout.
+		 */
 	}
 
 	err = ubifs_jnl_rename(c, old_dir, old_inode, &old_nm, new_dir,
@@ -1454,6 +1484,11 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
 	unlock_4_inodes(old_dir, new_dir, new_inode, whiteout);
 	ubifs_release_budget(c, &req);
 
+	if (whiteout) {
+		ubifs_release_budget(c, &wht_req);
+		iput(whiteout);
+	}
+
 	mutex_lock(&old_inode_ui->ui_mutex);
 	release = old_inode_ui->dirty;
 	mark_inode_dirty_sync(old_inode);
@@ -1462,11 +1497,16 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
 	if (release)
 		ubifs_release_budget(c, &ino_req);
 	if (IS_SYNC(old_inode))
-		err = old_inode->i_sb->s_op->write_inode(old_inode, NULL);
+		/*
+		 * Rename finished here. Although old inode cannot be updated
+		 * on flash, old ctime is not a big problem, don't return err
+		 * code to userspace.
+		 */
+		old_inode->i_sb->s_op->write_inode(old_inode, NULL);
 
 	fscrypt_free_filename(&old_nm);
 	fscrypt_free_filename(&new_nm);
-	return err;
+	return 0;
 
 out_cancel:
 	if (unlink) {
@@ -1487,11 +1527,11 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
 				inc_nlink(old_dir);
 		}
 	}
+	unlock_4_inodes(old_dir, new_dir, new_inode, whiteout);
 	if (whiteout) {
-		drop_nlink(whiteout);
+		ubifs_release_budget(c, &wht_req);
 		iput(whiteout);
 	}
-	unlock_4_inodes(old_dir, new_dir, new_inode, whiteout);
 out_release:
 	ubifs_release_budget(c, &ino_req);
 	ubifs_release_budget(c, &req);
diff --git a/fs/ubifs/journal.c b/fs/ubifs/journal.c
index 8ea680dba61e..75dab0ae3939 100644
--- a/fs/ubifs/journal.c
+++ b/fs/ubifs/journal.c
@@ -1207,9 +1207,9 @@ int ubifs_jnl_xrename(struct ubifs_info *c, const struct inode *fst_dir,
  * @sync: non-zero if the write-buffer has to be synchronized
  *
  * This function implements the re-name operation which may involve writing up
- * to 4 inodes and 2 directory entries. It marks the written inodes as clean
- * and returns zero on success. In case of failure, a negative error code is
- * returned.
+ * to 4 inodes(new inode, whiteout inode, old and new parent directory inodes)
+ * and 2 directory entries. It marks the written inodes as clean and returns
+ * zero on success. In case of failure, a negative error code is returned.
  */
 int ubifs_jnl_rename(struct ubifs_info *c, const struct inode *old_dir,
 		     const struct inode *old_inode,
@@ -1222,14 +1222,15 @@ int ubifs_jnl_rename(struct ubifs_info *c, const struct inode *old_dir,
 	void *p;
 	union ubifs_key key;
 	struct ubifs_dent_node *dent, *dent2;
-	int err, dlen1, dlen2, ilen, lnum, offs, len, orphan_added = 0;
+	int err, dlen1, dlen2, ilen, wlen, lnum, offs, len, orphan_added = 0;
 	int aligned_dlen1, aligned_dlen2, plen = UBIFS_INO_NODE_SZ;
 	int last_reference = !!(new_inode && new_inode->i_nlink == 0);
 	int move = (old_dir != new_dir);
-	struct ubifs_inode *new_ui;
+	struct ubifs_inode *new_ui, *whiteout_ui;
 	u8 hash_old_dir[UBIFS_HASH_ARR_SZ];
 	u8 hash_new_dir[UBIFS_HASH_ARR_SZ];
 	u8 hash_new_inode[UBIFS_HASH_ARR_SZ];
+	u8 hash_whiteout_inode[UBIFS_HASH_ARR_SZ];
 	u8 hash_dent1[UBIFS_HASH_ARR_SZ];
 	u8 hash_dent2[UBIFS_HASH_ARR_SZ];
 
@@ -1249,9 +1250,20 @@ int ubifs_jnl_rename(struct ubifs_info *c, const struct inode *old_dir,
 	} else
 		ilen = 0;
 
+	if (whiteout) {
+		whiteout_ui = ubifs_inode(whiteout);
+		ubifs_assert(c, mutex_is_locked(&whiteout_ui->ui_mutex));
+		ubifs_assert(c, whiteout->i_nlink == 1);
+		ubifs_assert(c, !whiteout_ui->dirty);
+		wlen = UBIFS_INO_NODE_SZ;
+		wlen += whiteout_ui->data_len;
+	} else
+		wlen = 0;
+
 	aligned_dlen1 = ALIGN(dlen1, 8);
 	aligned_dlen2 = ALIGN(dlen2, 8);
-	len = aligned_dlen1 + aligned_dlen2 + ALIGN(ilen, 8) + ALIGN(plen, 8);
+	len = aligned_dlen1 + aligned_dlen2 + ALIGN(ilen, 8) +
+	      ALIGN(wlen, 8) + ALIGN(plen, 8);
 	if (move)
 		len += plen;
 
@@ -1313,6 +1325,15 @@ int ubifs_jnl_rename(struct ubifs_info *c, const struct inode *old_dir,
 		p += ALIGN(ilen, 8);
 	}
 
+	if (whiteout) {
+		pack_inode(c, p, whiteout, 0);
+		err = ubifs_node_calc_hash(c, p, hash_whiteout_inode);
+		if (err)
+			goto out_release;
+
+		p += ALIGN(wlen, 8);
+	}
+
 	if (!move) {
 		pack_inode(c, p, old_dir, 1);
 		err = ubifs_node_calc_hash(c, p, hash_old_dir);
@@ -1352,6 +1373,9 @@ int ubifs_jnl_rename(struct ubifs_info *c, const struct inode *old_dir,
 		if (new_inode)
 			ubifs_wbuf_add_ino_nolock(&c->jheads[BASEHD].wbuf,
 						  new_inode->i_ino);
+		if (whiteout)
+			ubifs_wbuf_add_ino_nolock(&c->jheads[BASEHD].wbuf,
+						  whiteout->i_ino);
 	}
 	release_head(c, BASEHD);
 
@@ -1368,8 +1392,6 @@ int ubifs_jnl_rename(struct ubifs_info *c, const struct inode *old_dir,
 		err = ubifs_tnc_add_nm(c, &key, lnum, offs, dlen2, hash_dent2, old_nm);
 		if (err)
 			goto out_ro;
-
-		ubifs_delete_orphan(c, whiteout->i_ino);
 	} else {
 		err = ubifs_add_dirt(c, lnum, dlen2);
 		if (err)
@@ -1390,6 +1412,15 @@ int ubifs_jnl_rename(struct ubifs_info *c, const struct inode *old_dir,
 		offs += ALIGN(ilen, 8);
 	}
 
+	if (whiteout) {
+		ino_key_init(c, &key, whiteout->i_ino);
+		err = ubifs_tnc_add(c, &key, lnum, offs, wlen,
+				    hash_whiteout_inode);
+		if (err)
+			goto out_ro;
+		offs += ALIGN(wlen, 8);
+	}
+
 	ino_key_init(c, &key, old_dir->i_ino);
 	err = ubifs_tnc_add(c, &key, lnum, offs, plen, hash_old_dir);
 	if (err)
@@ -1410,6 +1441,11 @@ int ubifs_jnl_rename(struct ubifs_info *c, const struct inode *old_dir,
 		new_ui->synced_i_size = new_ui->ui_size;
 		spin_unlock(&new_ui->ui_lock);
 	}
+	/*
+	 * No need to mark whiteout inode clean.
+	 * Whiteout doesn't have non-zero size, no need to update
+	 * synced_i_size for whiteout_ui.
+	 */
 	mark_inode_clean(c, ubifs_inode(old_dir));
 	if (move)
 		mark_inode_clean(c, ubifs_inode(new_dir));
-- 
2.31.1


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 06/15] ubifs: Fix 'ui->dirty' race between do_tmpfile() and writeback work
  2021-12-27  3:22 ` Zhihao Cheng
@ 2021-12-27  3:22   ` Zhihao Cheng
  -1 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

'ui->dirty' is not protected by 'ui_mutex' in function do_tmpfile() which
may race with ubifs_write_inode[wb_workfn] to access/update 'ui->dirty',
finally dirty space is released twice.

	open(O_TMPFILE)                wb_workfn
do_tmpfile
  ubifs_budget_space(ino_req = { .dirtied_ino = 1})
  d_tmpfile // mark inode(tmpfile) dirty
  ubifs_jnl_update // without holding tmpfile's ui_mutex
    mark_inode_clean(ui)
      if (ui->dirty)
        ubifs_release_dirty_inode_budget(ui)  // release first time
                                   ubifs_write_inode
				     mutex_lock(&ui->ui_mutex)
                                     ubifs_release_dirty_inode_budget(ui)
				     // release second time
				     mutex_unlock(&ui->ui_mutex)
      ui->dirty = 0

Run generic/476 can reproduce following message easily
(See reproducer in [Link]):

  UBIFS error (ubi0:0 pid 2578): ubifs_assert_failed [ubifs]: UBIFS assert
  failed: c->bi.dd_growth >= 0, in fs/ubifs/budget.c:554
  UBIFS warning (ubi0:0 pid 2578): ubifs_ro_mode [ubifs]: switched to
  read-only mode, error -22
  Workqueue: writeback wb_workfn (flush-ubifs_0_0)
  Call Trace:
    ubifs_ro_mode+0x54/0x60 [ubifs]
    ubifs_assert_failed+0x4b/0x80 [ubifs]
    ubifs_release_budget+0x468/0x5a0 [ubifs]
    ubifs_release_dirty_inode_budget+0x53/0x80 [ubifs]
    ubifs_write_inode+0x121/0x1f0 [ubifs]
    ...
    wb_workfn+0x283/0x7b0

Fix it by holding tmpfile ubifs inode lock during ubifs_jnl_update().
Similar problem exists in whiteout renaming, but previous fix("ubifs:
Rename whiteout atomically") has solved the problem.

Fixes: 474b93704f32163 ("ubifs: Implement O_TMPFILE")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214765
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 fs/ubifs/dir.c | 60 +++++++++++++++++++++++++-------------------------
 1 file changed, 30 insertions(+), 30 deletions(-)

diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index deaf2d5dba5b..ebdc9aa04cbb 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -397,6 +397,32 @@ static struct inode *create_whiteout(struct inode *dir, struct dentry *dentry)
 	return ERR_PTR(err);
 }
 
+/**
+ * lock_2_inodes - a wrapper for locking two UBIFS inodes.
+ * @inode1: first inode
+ * @inode2: second inode
+ *
+ * We do not implement any tricks to guarantee strict lock ordering, because
+ * VFS has already done it for us on the @i_mutex. So this is just a simple
+ * wrapper function.
+ */
+static void lock_2_inodes(struct inode *inode1, struct inode *inode2)
+{
+	mutex_lock_nested(&ubifs_inode(inode1)->ui_mutex, WB_MUTEX_1);
+	mutex_lock_nested(&ubifs_inode(inode2)->ui_mutex, WB_MUTEX_2);
+}
+
+/**
+ * unlock_2_inodes - a wrapper for unlocking two UBIFS inodes.
+ * @inode1: first inode
+ * @inode2: second inode
+ */
+static void unlock_2_inodes(struct inode *inode1, struct inode *inode2)
+{
+	mutex_unlock(&ubifs_inode(inode2)->ui_mutex);
+	mutex_unlock(&ubifs_inode(inode1)->ui_mutex);
+}
+
 static int ubifs_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
 			 struct dentry *dentry, umode_t mode)
 {
@@ -404,7 +430,7 @@ static int ubifs_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
 	struct ubifs_info *c = dir->i_sb->s_fs_info;
 	struct ubifs_budget_req req = { .new_ino = 1, .new_dent = 1};
 	struct ubifs_budget_req ino_req = { .dirtied_ino = 1 };
-	struct ubifs_inode *ui, *dir_ui = ubifs_inode(dir);
+	struct ubifs_inode *ui;
 	int err, instantiated = 0;
 	struct fscrypt_name nm;
 
@@ -452,18 +478,18 @@ static int ubifs_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
 	instantiated = 1;
 	mutex_unlock(&ui->ui_mutex);
 
-	mutex_lock(&dir_ui->ui_mutex);
+	lock_2_inodes(dir, inode);
 	err = ubifs_jnl_update(c, dir, &nm, inode, 1, 0);
 	if (err)
 		goto out_cancel;
-	mutex_unlock(&dir_ui->ui_mutex);
+	unlock_2_inodes(dir, inode);
 
 	ubifs_release_budget(c, &req);
 
 	return 0;
 
 out_cancel:
-	mutex_unlock(&dir_ui->ui_mutex);
+	unlock_2_inodes(dir, inode);
 out_inode:
 	make_bad_inode(inode);
 	if (!instantiated)
@@ -690,32 +716,6 @@ static int ubifs_dir_release(struct inode *dir, struct file *file)
 	return 0;
 }
 
-/**
- * lock_2_inodes - a wrapper for locking two UBIFS inodes.
- * @inode1: first inode
- * @inode2: second inode
- *
- * We do not implement any tricks to guarantee strict lock ordering, because
- * VFS has already done it for us on the @i_mutex. So this is just a simple
- * wrapper function.
- */
-static void lock_2_inodes(struct inode *inode1, struct inode *inode2)
-{
-	mutex_lock_nested(&ubifs_inode(inode1)->ui_mutex, WB_MUTEX_1);
-	mutex_lock_nested(&ubifs_inode(inode2)->ui_mutex, WB_MUTEX_2);
-}
-
-/**
- * unlock_2_inodes - a wrapper for unlocking two UBIFS inodes.
- * @inode1: first inode
- * @inode2: second inode
- */
-static void unlock_2_inodes(struct inode *inode1, struct inode *inode2)
-{
-	mutex_unlock(&ubifs_inode(inode2)->ui_mutex);
-	mutex_unlock(&ubifs_inode(inode1)->ui_mutex);
-}
-
 static int ubifs_link(struct dentry *old_dentry, struct inode *dir,
 		      struct dentry *dentry)
 {
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 06/15] ubifs: Fix 'ui->dirty' race between do_tmpfile() and writeback work
@ 2021-12-27  3:22   ` Zhihao Cheng
  0 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

'ui->dirty' is not protected by 'ui_mutex' in function do_tmpfile() which
may race with ubifs_write_inode[wb_workfn] to access/update 'ui->dirty',
finally dirty space is released twice.

	open(O_TMPFILE)                wb_workfn
do_tmpfile
  ubifs_budget_space(ino_req = { .dirtied_ino = 1})
  d_tmpfile // mark inode(tmpfile) dirty
  ubifs_jnl_update // without holding tmpfile's ui_mutex
    mark_inode_clean(ui)
      if (ui->dirty)
        ubifs_release_dirty_inode_budget(ui)  // release first time
                                   ubifs_write_inode
				     mutex_lock(&ui->ui_mutex)
                                     ubifs_release_dirty_inode_budget(ui)
				     // release second time
				     mutex_unlock(&ui->ui_mutex)
      ui->dirty = 0

Run generic/476 can reproduce following message easily
(See reproducer in [Link]):

  UBIFS error (ubi0:0 pid 2578): ubifs_assert_failed [ubifs]: UBIFS assert
  failed: c->bi.dd_growth >= 0, in fs/ubifs/budget.c:554
  UBIFS warning (ubi0:0 pid 2578): ubifs_ro_mode [ubifs]: switched to
  read-only mode, error -22
  Workqueue: writeback wb_workfn (flush-ubifs_0_0)
  Call Trace:
    ubifs_ro_mode+0x54/0x60 [ubifs]
    ubifs_assert_failed+0x4b/0x80 [ubifs]
    ubifs_release_budget+0x468/0x5a0 [ubifs]
    ubifs_release_dirty_inode_budget+0x53/0x80 [ubifs]
    ubifs_write_inode+0x121/0x1f0 [ubifs]
    ...
    wb_workfn+0x283/0x7b0

Fix it by holding tmpfile ubifs inode lock during ubifs_jnl_update().
Similar problem exists in whiteout renaming, but previous fix("ubifs:
Rename whiteout atomically") has solved the problem.

Fixes: 474b93704f32163 ("ubifs: Implement O_TMPFILE")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214765
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 fs/ubifs/dir.c | 60 +++++++++++++++++++++++++-------------------------
 1 file changed, 30 insertions(+), 30 deletions(-)

diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index deaf2d5dba5b..ebdc9aa04cbb 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -397,6 +397,32 @@ static struct inode *create_whiteout(struct inode *dir, struct dentry *dentry)
 	return ERR_PTR(err);
 }
 
+/**
+ * lock_2_inodes - a wrapper for locking two UBIFS inodes.
+ * @inode1: first inode
+ * @inode2: second inode
+ *
+ * We do not implement any tricks to guarantee strict lock ordering, because
+ * VFS has already done it for us on the @i_mutex. So this is just a simple
+ * wrapper function.
+ */
+static void lock_2_inodes(struct inode *inode1, struct inode *inode2)
+{
+	mutex_lock_nested(&ubifs_inode(inode1)->ui_mutex, WB_MUTEX_1);
+	mutex_lock_nested(&ubifs_inode(inode2)->ui_mutex, WB_MUTEX_2);
+}
+
+/**
+ * unlock_2_inodes - a wrapper for unlocking two UBIFS inodes.
+ * @inode1: first inode
+ * @inode2: second inode
+ */
+static void unlock_2_inodes(struct inode *inode1, struct inode *inode2)
+{
+	mutex_unlock(&ubifs_inode(inode2)->ui_mutex);
+	mutex_unlock(&ubifs_inode(inode1)->ui_mutex);
+}
+
 static int ubifs_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
 			 struct dentry *dentry, umode_t mode)
 {
@@ -404,7 +430,7 @@ static int ubifs_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
 	struct ubifs_info *c = dir->i_sb->s_fs_info;
 	struct ubifs_budget_req req = { .new_ino = 1, .new_dent = 1};
 	struct ubifs_budget_req ino_req = { .dirtied_ino = 1 };
-	struct ubifs_inode *ui, *dir_ui = ubifs_inode(dir);
+	struct ubifs_inode *ui;
 	int err, instantiated = 0;
 	struct fscrypt_name nm;
 
@@ -452,18 +478,18 @@ static int ubifs_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
 	instantiated = 1;
 	mutex_unlock(&ui->ui_mutex);
 
-	mutex_lock(&dir_ui->ui_mutex);
+	lock_2_inodes(dir, inode);
 	err = ubifs_jnl_update(c, dir, &nm, inode, 1, 0);
 	if (err)
 		goto out_cancel;
-	mutex_unlock(&dir_ui->ui_mutex);
+	unlock_2_inodes(dir, inode);
 
 	ubifs_release_budget(c, &req);
 
 	return 0;
 
 out_cancel:
-	mutex_unlock(&dir_ui->ui_mutex);
+	unlock_2_inodes(dir, inode);
 out_inode:
 	make_bad_inode(inode);
 	if (!instantiated)
@@ -690,32 +716,6 @@ static int ubifs_dir_release(struct inode *dir, struct file *file)
 	return 0;
 }
 
-/**
- * lock_2_inodes - a wrapper for locking two UBIFS inodes.
- * @inode1: first inode
- * @inode2: second inode
- *
- * We do not implement any tricks to guarantee strict lock ordering, because
- * VFS has already done it for us on the @i_mutex. So this is just a simple
- * wrapper function.
- */
-static void lock_2_inodes(struct inode *inode1, struct inode *inode2)
-{
-	mutex_lock_nested(&ubifs_inode(inode1)->ui_mutex, WB_MUTEX_1);
-	mutex_lock_nested(&ubifs_inode(inode2)->ui_mutex, WB_MUTEX_2);
-}
-
-/**
- * unlock_2_inodes - a wrapper for unlocking two UBIFS inodes.
- * @inode1: first inode
- * @inode2: second inode
- */
-static void unlock_2_inodes(struct inode *inode1, struct inode *inode2)
-{
-	mutex_unlock(&ubifs_inode(inode2)->ui_mutex);
-	mutex_unlock(&ubifs_inode(inode1)->ui_mutex);
-}
-
 static int ubifs_link(struct dentry *old_dentry, struct inode *dir,
 		      struct dentry *dentry)
 {
-- 
2.31.1


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 07/15] ubifs: Rectify space amount budget for mkdir/tmpfile operations
  2021-12-27  3:22 ` Zhihao Cheng
@ 2021-12-27  3:22   ` Zhihao Cheng
  -1 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

UBIFS should make sure the flash has enough space to store dirty (Data
that is newer than disk) data (in memory), space budget is exactly
designed to do that. If space budget calculates less data than we need,
'make_reservation()' will do more work(return -ENOSPC if no free space
lelf, sometimes we can see "cannot reserve xxx bytes in jhead xxx, error
-28" in ubifs error messages) with ubifs inodes locked, which may effect
other syscalls.

A simple way to decide how much space do we need when make a budget:
See how much space is needed by 'make_reservation()' in ubifs_jnl_xxx()
function according to corresponding operation.

It's better to report ENOSPC in ubifs_budget_space(), as early as we can.

Fixes: 474b93704f32163 ("ubifs: Implement O_TMPFILE")
Fixes: 1e51764a3c2ac05 ("UBIFS: add new flash file system")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 fs/ubifs/dir.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index ebdc9aa04cbb..7ae48f273feb 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -428,15 +428,18 @@ static int ubifs_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
 {
 	struct inode *inode;
 	struct ubifs_info *c = dir->i_sb->s_fs_info;
-	struct ubifs_budget_req req = { .new_ino = 1, .new_dent = 1};
+	struct ubifs_budget_req req = { .new_ino = 1, .new_dent = 1,
+					.dirtied_ino = 1};
 	struct ubifs_budget_req ino_req = { .dirtied_ino = 1 };
 	struct ubifs_inode *ui;
 	int err, instantiated = 0;
 	struct fscrypt_name nm;
 
 	/*
-	 * Budget request settings: new dirty inode, new direntry,
-	 * budget for dirtied inode will be released via writeback.
+	 * Budget request settings: new inode, new direntry, changing the
+	 * parent directory inode.
+	 * Allocate budget separately for new dirtied inode, the budget will
+	 * be released via writeback.
 	 */
 
 	dbg_gen("dent '%pd', mode %#hx in dir ino %lu",
@@ -979,7 +982,8 @@ static int ubifs_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
 	struct ubifs_inode *dir_ui = ubifs_inode(dir);
 	struct ubifs_info *c = dir->i_sb->s_fs_info;
 	int err, sz_change;
-	struct ubifs_budget_req req = { .new_ino = 1, .new_dent = 1 };
+	struct ubifs_budget_req req = { .new_ino = 1, .new_dent = 1,
+					.dirtied_ino = 1};
 	struct fscrypt_name nm;
 
 	/*
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 07/15] ubifs: Rectify space amount budget for mkdir/tmpfile operations
@ 2021-12-27  3:22   ` Zhihao Cheng
  0 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

UBIFS should make sure the flash has enough space to store dirty (Data
that is newer than disk) data (in memory), space budget is exactly
designed to do that. If space budget calculates less data than we need,
'make_reservation()' will do more work(return -ENOSPC if no free space
lelf, sometimes we can see "cannot reserve xxx bytes in jhead xxx, error
-28" in ubifs error messages) with ubifs inodes locked, which may effect
other syscalls.

A simple way to decide how much space do we need when make a budget:
See how much space is needed by 'make_reservation()' in ubifs_jnl_xxx()
function according to corresponding operation.

It's better to report ENOSPC in ubifs_budget_space(), as early as we can.

Fixes: 474b93704f32163 ("ubifs: Implement O_TMPFILE")
Fixes: 1e51764a3c2ac05 ("UBIFS: add new flash file system")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 fs/ubifs/dir.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index ebdc9aa04cbb..7ae48f273feb 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -428,15 +428,18 @@ static int ubifs_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
 {
 	struct inode *inode;
 	struct ubifs_info *c = dir->i_sb->s_fs_info;
-	struct ubifs_budget_req req = { .new_ino = 1, .new_dent = 1};
+	struct ubifs_budget_req req = { .new_ino = 1, .new_dent = 1,
+					.dirtied_ino = 1};
 	struct ubifs_budget_req ino_req = { .dirtied_ino = 1 };
 	struct ubifs_inode *ui;
 	int err, instantiated = 0;
 	struct fscrypt_name nm;
 
 	/*
-	 * Budget request settings: new dirty inode, new direntry,
-	 * budget for dirtied inode will be released via writeback.
+	 * Budget request settings: new inode, new direntry, changing the
+	 * parent directory inode.
+	 * Allocate budget separately for new dirtied inode, the budget will
+	 * be released via writeback.
 	 */
 
 	dbg_gen("dent '%pd', mode %#hx in dir ino %lu",
@@ -979,7 +982,8 @@ static int ubifs_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
 	struct ubifs_inode *dir_ui = ubifs_inode(dir);
 	struct ubifs_info *c = dir->i_sb->s_fs_info;
 	int err, sz_change;
-	struct ubifs_budget_req req = { .new_ino = 1, .new_dent = 1 };
+	struct ubifs_budget_req req = { .new_ino = 1, .new_dent = 1,
+					.dirtied_ino = 1};
 	struct fscrypt_name nm;
 
 	/*
-- 
2.31.1


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 08/15] ubifs: setflags: Make dirtied_ino_d 8 bytes aligned
  2021-12-27  3:22 ` Zhihao Cheng
@ 2021-12-27  3:22   ` Zhihao Cheng
  -1 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

Make 'ui->data_len' aligned with 8 bytes before it is assigned to
dirtied_ino_d. Since 8871d84c8f8b0c6b("ubifs: convert to fileattr")
applied, 'setflags()' only affects regular files and directories, only
xattr inode, symlink inode and special inode(pipe/char_dev/block_dev)
have none- zero 'ui->data_len' field, so assertion
'!(req->dirtied_ino_d & 7)' cannot fail in ubifs_budget_space().
To avoid assertion fails in future evolution(eg. setflags can operate
special inodes), it's better to make dirtied_ino_d 8 bytes aligned,
after all aligned size is still zero for regular files.

Fixes: 1e51764a3c2ac05a ("UBIFS: add new flash file system")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 fs/ubifs/ioctl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ubifs/ioctl.c b/fs/ubifs/ioctl.c
index c6a863487780..71bcebe45f9c 100644
--- a/fs/ubifs/ioctl.c
+++ b/fs/ubifs/ioctl.c
@@ -108,7 +108,7 @@ static int setflags(struct inode *inode, int flags)
 	struct ubifs_inode *ui = ubifs_inode(inode);
 	struct ubifs_info *c = inode->i_sb->s_fs_info;
 	struct ubifs_budget_req req = { .dirtied_ino = 1,
-					.dirtied_ino_d = ui->data_len };
+			.dirtied_ino_d = ALIGN(ui->data_len, 8) };
 
 	err = ubifs_budget_space(c, &req);
 	if (err)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 08/15] ubifs: setflags: Make dirtied_ino_d 8 bytes aligned
@ 2021-12-27  3:22   ` Zhihao Cheng
  0 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

Make 'ui->data_len' aligned with 8 bytes before it is assigned to
dirtied_ino_d. Since 8871d84c8f8b0c6b("ubifs: convert to fileattr")
applied, 'setflags()' only affects regular files and directories, only
xattr inode, symlink inode and special inode(pipe/char_dev/block_dev)
have none- zero 'ui->data_len' field, so assertion
'!(req->dirtied_ino_d & 7)' cannot fail in ubifs_budget_space().
To avoid assertion fails in future evolution(eg. setflags can operate
special inodes), it's better to make dirtied_ino_d 8 bytes aligned,
after all aligned size is still zero for regular files.

Fixes: 1e51764a3c2ac05a ("UBIFS: add new flash file system")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 fs/ubifs/ioctl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ubifs/ioctl.c b/fs/ubifs/ioctl.c
index c6a863487780..71bcebe45f9c 100644
--- a/fs/ubifs/ioctl.c
+++ b/fs/ubifs/ioctl.c
@@ -108,7 +108,7 @@ static int setflags(struct inode *inode, int flags)
 	struct ubifs_inode *ui = ubifs_inode(inode);
 	struct ubifs_info *c = inode->i_sb->s_fs_info;
 	struct ubifs_budget_req req = { .dirtied_ino = 1,
-					.dirtied_ino_d = ui->data_len };
+			.dirtied_ino_d = ALIGN(ui->data_len, 8) };
 
 	err = ubifs_budget_space(c, &req);
 	if (err)
-- 
2.31.1


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 09/15] ubifs: Fix read out-of-bounds in ubifs_wbuf_write_nolock()
  2021-12-27  3:22 ` Zhihao Cheng
@ 2021-12-27  3:22   ` Zhihao Cheng
  -1 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

Function ubifs_wbuf_write_nolock() may access buf out of bounds in
following process:

ubifs_wbuf_write_nolock():
  aligned_len = ALIGN(len, 8);   // Assume len = 4089, aligned_len = 4096
  if (aligned_len <= wbuf->avail) ... // Not satisfy
  if (wbuf->used) {
    ubifs_leb_write()  // Fill some data in avail wbuf
    len -= wbuf->avail;   // len is still not 8-bytes aligned
    aligned_len -= wbuf->avail;
  }
  n = aligned_len >> c->max_write_shift;
  if (n) {
    n <<= c->max_write_shift;
    err = ubifs_leb_write(c, wbuf->lnum, buf + written,
                          wbuf->offs, n);
    // n > len, read out of bounds less than 8(n-len) bytes
  }

, which can be catched by KASAN:
  =========================================================
  BUG: KASAN: slab-out-of-bounds in ecc_sw_hamming_calculate+0x1dc/0x7d0
  Read of size 4 at addr ffff888105594ff8 by task kworker/u8:4/128
  Workqueue: writeback wb_workfn (flush-ubifs_0_0)
  Call Trace:
    kasan_report.cold+0x81/0x165
    nand_write_page_swecc+0xa9/0x160
    ubifs_leb_write+0xf2/0x1b0 [ubifs]
    ubifs_wbuf_write_nolock+0x421/0x12c0 [ubifs]
    write_head+0xdc/0x1c0 [ubifs]
    ubifs_jnl_write_inode+0x627/0x960 [ubifs]
    wb_workfn+0x8af/0xb80

Function ubifs_wbuf_write_nolock() accepts that parameter 'len' is not 8
bytes aligned, the 'len' represents the true length of buf (which is
allocated in 'ubifs_jnl_xxx', eg. ubifs_jnl_write_inode), so
ubifs_wbuf_write_nolock() must handle the length read from 'buf' carefully
to write leb safely.

Fetch a reproducer in [Link].

Fixes: 1e51764a3c2ac0 ("UBIFS: add new flash file system")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214785
Reported-by: Chengsong Ke <kechengsong@huawei.com>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 fs/ubifs/io.c | 34 ++++++++++++++++++++++++++++++----
 1 file changed, 30 insertions(+), 4 deletions(-)

diff --git a/fs/ubifs/io.c b/fs/ubifs/io.c
index 00b61dba62b7..b019dd6f7fa0 100644
--- a/fs/ubifs/io.c
+++ b/fs/ubifs/io.c
@@ -833,16 +833,42 @@ int ubifs_wbuf_write_nolock(struct ubifs_wbuf *wbuf, void *buf, int len)
 	 */
 	n = aligned_len >> c->max_write_shift;
 	if (n) {
-		n <<= c->max_write_shift;
+		int m = n - 1;
+
 		dbg_io("write %d bytes to LEB %d:%d", n, wbuf->lnum,
 		       wbuf->offs);
-		err = ubifs_leb_write(c, wbuf->lnum, buf + written,
-				      wbuf->offs, n);
+
+		if (m) {
+			/* '(n-1)<<c->max_write_shift < len' is always true. */
+			m <<= c->max_write_shift;
+			err = ubifs_leb_write(c, wbuf->lnum, buf + written,
+					      wbuf->offs, m);
+			if (err)
+				goto out;
+			wbuf->offs += m;
+			aligned_len -= m;
+			len -= m;
+			written += m;
+		}
+
+		/*
+		 * The non-written len of buf may be less than 'n' because
+		 * parameter 'len' is not 8 bytes aligned, so here we read
+		 * min(len, n) bytes from buf.
+		 */
+		n = 1 << c->max_write_shift;
+		memcpy(wbuf->buf, buf + written, min(len, n));
+		if (n > len) {
+			ubifs_assert(c, n - len < 8);
+			ubifs_pad(c, wbuf->buf + len, n - len);
+		}
+
+		err = ubifs_leb_write(c, wbuf->lnum, wbuf->buf, wbuf->offs, n);
 		if (err)
 			goto out;
 		wbuf->offs += n;
 		aligned_len -= n;
-		len -= n;
+		len -= min(len, n);
 		written += n;
 	}
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 09/15] ubifs: Fix read out-of-bounds in ubifs_wbuf_write_nolock()
@ 2021-12-27  3:22   ` Zhihao Cheng
  0 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

Function ubifs_wbuf_write_nolock() may access buf out of bounds in
following process:

ubifs_wbuf_write_nolock():
  aligned_len = ALIGN(len, 8);   // Assume len = 4089, aligned_len = 4096
  if (aligned_len <= wbuf->avail) ... // Not satisfy
  if (wbuf->used) {
    ubifs_leb_write()  // Fill some data in avail wbuf
    len -= wbuf->avail;   // len is still not 8-bytes aligned
    aligned_len -= wbuf->avail;
  }
  n = aligned_len >> c->max_write_shift;
  if (n) {
    n <<= c->max_write_shift;
    err = ubifs_leb_write(c, wbuf->lnum, buf + written,
                          wbuf->offs, n);
    // n > len, read out of bounds less than 8(n-len) bytes
  }

, which can be catched by KASAN:
  =========================================================
  BUG: KASAN: slab-out-of-bounds in ecc_sw_hamming_calculate+0x1dc/0x7d0
  Read of size 4 at addr ffff888105594ff8 by task kworker/u8:4/128
  Workqueue: writeback wb_workfn (flush-ubifs_0_0)
  Call Trace:
    kasan_report.cold+0x81/0x165
    nand_write_page_swecc+0xa9/0x160
    ubifs_leb_write+0xf2/0x1b0 [ubifs]
    ubifs_wbuf_write_nolock+0x421/0x12c0 [ubifs]
    write_head+0xdc/0x1c0 [ubifs]
    ubifs_jnl_write_inode+0x627/0x960 [ubifs]
    wb_workfn+0x8af/0xb80

Function ubifs_wbuf_write_nolock() accepts that parameter 'len' is not 8
bytes aligned, the 'len' represents the true length of buf (which is
allocated in 'ubifs_jnl_xxx', eg. ubifs_jnl_write_inode), so
ubifs_wbuf_write_nolock() must handle the length read from 'buf' carefully
to write leb safely.

Fetch a reproducer in [Link].

Fixes: 1e51764a3c2ac0 ("UBIFS: add new flash file system")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214785
Reported-by: Chengsong Ke <kechengsong@huawei.com>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 fs/ubifs/io.c | 34 ++++++++++++++++++++++++++++++----
 1 file changed, 30 insertions(+), 4 deletions(-)

diff --git a/fs/ubifs/io.c b/fs/ubifs/io.c
index 00b61dba62b7..b019dd6f7fa0 100644
--- a/fs/ubifs/io.c
+++ b/fs/ubifs/io.c
@@ -833,16 +833,42 @@ int ubifs_wbuf_write_nolock(struct ubifs_wbuf *wbuf, void *buf, int len)
 	 */
 	n = aligned_len >> c->max_write_shift;
 	if (n) {
-		n <<= c->max_write_shift;
+		int m = n - 1;
+
 		dbg_io("write %d bytes to LEB %d:%d", n, wbuf->lnum,
 		       wbuf->offs);
-		err = ubifs_leb_write(c, wbuf->lnum, buf + written,
-				      wbuf->offs, n);
+
+		if (m) {
+			/* '(n-1)<<c->max_write_shift < len' is always true. */
+			m <<= c->max_write_shift;
+			err = ubifs_leb_write(c, wbuf->lnum, buf + written,
+					      wbuf->offs, m);
+			if (err)
+				goto out;
+			wbuf->offs += m;
+			aligned_len -= m;
+			len -= m;
+			written += m;
+		}
+
+		/*
+		 * The non-written len of buf may be less than 'n' because
+		 * parameter 'len' is not 8 bytes aligned, so here we read
+		 * min(len, n) bytes from buf.
+		 */
+		n = 1 << c->max_write_shift;
+		memcpy(wbuf->buf, buf + written, min(len, n));
+		if (n > len) {
+			ubifs_assert(c, n - len < 8);
+			ubifs_pad(c, wbuf->buf + len, n - len);
+		}
+
+		err = ubifs_leb_write(c, wbuf->lnum, wbuf->buf, wbuf->offs, n);
 		if (err)
 			goto out;
 		wbuf->offs += n;
 		aligned_len -= n;
-		len -= n;
+		len -= min(len, n);
 		written += n;
 	}
 
-- 
2.31.1


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 10/15] ubifs: Fix to add refcount once page is set private
  2021-12-27  3:22 ` Zhihao Cheng
@ 2021-12-27  3:22   ` Zhihao Cheng
  -1 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

MM defined the rule [1] very clearly that once page was set with PG_private
flag, we should increment the refcount in that page, also main flows like
pageout(), migrate_page() will assume there is one additional page
reference count if page_has_private() returns true. Otherwise, we may
get a BUG in page migration:

  page:0000000080d05b9d refcount:-1 mapcount:0 mapping:000000005f4d82a8
  index:0xe2 pfn:0x14c12
  aops:ubifs_file_address_operations [ubifs] ino:8f1 dentry name:"f30e"
  flags: 0x1fffff80002405(locked|uptodate|owner_priv_1|private|node=0|
  zone=1|lastcpupid=0x1fffff)
  page dumped because: VM_BUG_ON_PAGE(page_count(page) != 0)
  ------------[ cut here ]------------
  kernel BUG at include/linux/page_ref.h:184!
  invalid opcode: 0000 [#1] SMP
  CPU: 3 PID: 38 Comm: kcompactd0 Not tainted 5.15.0-rc5
  RIP: 0010:migrate_page_move_mapping+0xac3/0xe70
  Call Trace:
    ubifs_migrate_page+0x22/0xc0 [ubifs]
    move_to_new_page+0xb4/0x600
    migrate_pages+0x1523/0x1cc0
    compact_zone+0x8c5/0x14b0
    kcompactd+0x2bc/0x560
    kthread+0x18c/0x1e0
    ret_from_fork+0x1f/0x30

Before the time, we should make clean a concept, what does refcount means
in page gotten from grab_cache_page_write_begin(). There are 2 situations:
Situation 1: refcount is 3, page is created by __page_cache_alloc.
  TYPE_A - the write process is using this page
  TYPE_B - page is assigned to one certain mapping by calling
	   __add_to_page_cache_locked()
  TYPE_C - page is added into pagevec list corresponding current cpu by
	   calling lru_cache_add()
Situation 2: refcount is 2, page is gotten from the mapping's tree
  TYPE_B - page has been assigned to one certain mapping
  TYPE_A - the write process is using this page (by calling
	   page_cache_get_speculative())
Filesystem releases one refcount by calling put_page() in xxx_write_end(),
the released refcount corresponds to TYPE_A (write task is using it). If
there are any processes using a page, page migration process will skip the
page by judging whether expected_page_refs() equals to page refcount.

The BUG is caused by following process:
    PA(cpu 0)                           kcompactd(cpu 1)
				compact_zone
ubifs_write_begin
  page_a = grab_cache_page_write_begin
    add_to_page_cache_lru
      lru_cache_add
        pagevec_add // put page into cpu 0's pagevec
  (refcnf = 3, for page creation process)
ubifs_write_end
  SetPagePrivate(page_a) // doesn't increase page count !
  unlock_page(page_a)
  put_page(page_a)  // refcnt = 2
				[...]

    PB(cpu 0)
filemap_read
  filemap_get_pages
    add_to_page_cache_lru
      lru_cache_add
        __pagevec_lru_add // traverse all pages in cpu 0's pagevec
	  __pagevec_lru_add_fn
	    SetPageLRU(page_a)
				isolate_migratepages
                                  isolate_migratepages_block
				    get_page_unless_zero(page_a)
				    // refcnt = 3
                                      list_add(page_a, from_list)
				migrate_pages(from_list)
				  __unmap_and_move
				    move_to_new_page
				      ubifs_migrate_page(page_a)
				        migrate_page_move_mapping
					  expected_page_refs get 3
                                  (migration[1] + mapping[1] + private[1])
	 release_pages
	   put_page_testzero(page_a) // refcnt = 3
                                          page_ref_freeze  // refcnt = 0
	     page_ref_dec_and_test(0 - 1 = -1)
                                          page_ref_unfreeze
                                            VM_BUG_ON_PAGE(-1 != 0, page)

UBIFS doesn't increase the page refcount after setting private flag, which
leads to page migration task believes the page is not used by any other
processes, so the page is migrated. This causes concurrent accessing on
page refcount between put_page() called by other process(eg. read process
calls lru_cache_add) and page_ref_unfreeze() called by migration task.

Actually zhangjun has tried to fix this problem [2] by recalculating page
refcnt in ubifs_migrate_page(). It's better to follow MM rules [1], because
just like Kirill suggested in [2], we need to check all users of
page_has_private() helper. Like f2fs does in [3], fix it by adding/deleting
refcount when setting/clearing private for a page. BTW, according to [4],
we set 'page->private' as 1 because ubifs just simply SetPagePrivate().
And, [5] provided a common helper to set/clear page private, ubifs can
use this helper following the example of iomap, afs, btrfs, etc.

Jump [6] to find a reproducer.

[1] https://lore.kernel.org/lkml/2b19b3c4-2bc4-15fa-15cc-27a13e5c7af1@aol.com
[2] https://www.spinics.net/lists/linux-mtd/msg04018.html
[3] http://lkml.iu.edu/hypermail/linux/kernel/1903.0/03313.html
[4] https://lore.kernel.org/linux-f2fs-devel/20210422154705.GO3596236@casper.infradead.org
[5] https://lore.kernel.org/all/20200517214718.468-1-guoqing.jiang@cloud.ionos.com
[6] https://bugzilla.kernel.org/show_bug.cgi?id=214961

Fixes: 1e51764a3c2ac0 ("UBIFS: add new flash file system")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 fs/ubifs/file.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 5cfa28cd00cd..6b45a037a047 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -570,7 +570,7 @@ static int ubifs_write_end(struct file *file, struct address_space *mapping,
 	}
 
 	if (!PagePrivate(page)) {
-		SetPagePrivate(page);
+		attach_page_private(page, (void *)1);
 		atomic_long_inc(&c->dirty_pg_cnt);
 		__set_page_dirty_nobuffers(page);
 	}
@@ -947,7 +947,7 @@ static int do_writepage(struct page *page, int len)
 		release_existing_page_budget(c);
 
 	atomic_long_dec(&c->dirty_pg_cnt);
-	ClearPagePrivate(page);
+	detach_page_private(page);
 	ClearPageChecked(page);
 
 	kunmap(page);
@@ -1304,7 +1304,7 @@ static void ubifs_invalidatepage(struct page *page, unsigned int offset,
 		release_existing_page_budget(c);
 
 	atomic_long_dec(&c->dirty_pg_cnt);
-	ClearPagePrivate(page);
+	detach_page_private(page);
 	ClearPageChecked(page);
 }
 
@@ -1471,8 +1471,8 @@ static int ubifs_migrate_page(struct address_space *mapping,
 		return rc;
 
 	if (PagePrivate(page)) {
-		ClearPagePrivate(page);
-		SetPagePrivate(newpage);
+		detach_page_private(page);
+		attach_page_private(newpage, (void *)1);
 	}
 
 	if (mode != MIGRATE_SYNC_NO_COPY)
@@ -1496,7 +1496,7 @@ static int ubifs_releasepage(struct page *page, gfp_t unused_gfp_flags)
 		return 0;
 	ubifs_assert(c, PagePrivate(page));
 	ubifs_assert(c, 0);
-	ClearPagePrivate(page);
+	detach_page_private(page);
 	ClearPageChecked(page);
 	return 1;
 }
@@ -1567,7 +1567,7 @@ static vm_fault_t ubifs_vm_page_mkwrite(struct vm_fault *vmf)
 	else {
 		if (!PageChecked(page))
 			ubifs_convert_page_budget(c);
-		SetPagePrivate(page);
+		attach_page_private(page, (void *)1);
 		atomic_long_inc(&c->dirty_pg_cnt);
 		__set_page_dirty_nobuffers(page);
 	}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 10/15] ubifs: Fix to add refcount once page is set private
@ 2021-12-27  3:22   ` Zhihao Cheng
  0 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

MM defined the rule [1] very clearly that once page was set with PG_private
flag, we should increment the refcount in that page, also main flows like
pageout(), migrate_page() will assume there is one additional page
reference count if page_has_private() returns true. Otherwise, we may
get a BUG in page migration:

  page:0000000080d05b9d refcount:-1 mapcount:0 mapping:000000005f4d82a8
  index:0xe2 pfn:0x14c12
  aops:ubifs_file_address_operations [ubifs] ino:8f1 dentry name:"f30e"
  flags: 0x1fffff80002405(locked|uptodate|owner_priv_1|private|node=0|
  zone=1|lastcpupid=0x1fffff)
  page dumped because: VM_BUG_ON_PAGE(page_count(page) != 0)
  ------------[ cut here ]------------
  kernel BUG at include/linux/page_ref.h:184!
  invalid opcode: 0000 [#1] SMP
  CPU: 3 PID: 38 Comm: kcompactd0 Not tainted 5.15.0-rc5
  RIP: 0010:migrate_page_move_mapping+0xac3/0xe70
  Call Trace:
    ubifs_migrate_page+0x22/0xc0 [ubifs]
    move_to_new_page+0xb4/0x600
    migrate_pages+0x1523/0x1cc0
    compact_zone+0x8c5/0x14b0
    kcompactd+0x2bc/0x560
    kthread+0x18c/0x1e0
    ret_from_fork+0x1f/0x30

Before the time, we should make clean a concept, what does refcount means
in page gotten from grab_cache_page_write_begin(). There are 2 situations:
Situation 1: refcount is 3, page is created by __page_cache_alloc.
  TYPE_A - the write process is using this page
  TYPE_B - page is assigned to one certain mapping by calling
	   __add_to_page_cache_locked()
  TYPE_C - page is added into pagevec list corresponding current cpu by
	   calling lru_cache_add()
Situation 2: refcount is 2, page is gotten from the mapping's tree
  TYPE_B - page has been assigned to one certain mapping
  TYPE_A - the write process is using this page (by calling
	   page_cache_get_speculative())
Filesystem releases one refcount by calling put_page() in xxx_write_end(),
the released refcount corresponds to TYPE_A (write task is using it). If
there are any processes using a page, page migration process will skip the
page by judging whether expected_page_refs() equals to page refcount.

The BUG is caused by following process:
    PA(cpu 0)                           kcompactd(cpu 1)
				compact_zone
ubifs_write_begin
  page_a = grab_cache_page_write_begin
    add_to_page_cache_lru
      lru_cache_add
        pagevec_add // put page into cpu 0's pagevec
  (refcnf = 3, for page creation process)
ubifs_write_end
  SetPagePrivate(page_a) // doesn't increase page count !
  unlock_page(page_a)
  put_page(page_a)  // refcnt = 2
				[...]

    PB(cpu 0)
filemap_read
  filemap_get_pages
    add_to_page_cache_lru
      lru_cache_add
        __pagevec_lru_add // traverse all pages in cpu 0's pagevec
	  __pagevec_lru_add_fn
	    SetPageLRU(page_a)
				isolate_migratepages
                                  isolate_migratepages_block
				    get_page_unless_zero(page_a)
				    // refcnt = 3
                                      list_add(page_a, from_list)
				migrate_pages(from_list)
				  __unmap_and_move
				    move_to_new_page
				      ubifs_migrate_page(page_a)
				        migrate_page_move_mapping
					  expected_page_refs get 3
                                  (migration[1] + mapping[1] + private[1])
	 release_pages
	   put_page_testzero(page_a) // refcnt = 3
                                          page_ref_freeze  // refcnt = 0
	     page_ref_dec_and_test(0 - 1 = -1)
                                          page_ref_unfreeze
                                            VM_BUG_ON_PAGE(-1 != 0, page)

UBIFS doesn't increase the page refcount after setting private flag, which
leads to page migration task believes the page is not used by any other
processes, so the page is migrated. This causes concurrent accessing on
page refcount between put_page() called by other process(eg. read process
calls lru_cache_add) and page_ref_unfreeze() called by migration task.

Actually zhangjun has tried to fix this problem [2] by recalculating page
refcnt in ubifs_migrate_page(). It's better to follow MM rules [1], because
just like Kirill suggested in [2], we need to check all users of
page_has_private() helper. Like f2fs does in [3], fix it by adding/deleting
refcount when setting/clearing private for a page. BTW, according to [4],
we set 'page->private' as 1 because ubifs just simply SetPagePrivate().
And, [5] provided a common helper to set/clear page private, ubifs can
use this helper following the example of iomap, afs, btrfs, etc.

Jump [6] to find a reproducer.

[1] https://lore.kernel.org/lkml/2b19b3c4-2bc4-15fa-15cc-27a13e5c7af1@aol.com
[2] https://www.spinics.net/lists/linux-mtd/msg04018.html
[3] http://lkml.iu.edu/hypermail/linux/kernel/1903.0/03313.html
[4] https://lore.kernel.org/linux-f2fs-devel/20210422154705.GO3596236@casper.infradead.org
[5] https://lore.kernel.org/all/20200517214718.468-1-guoqing.jiang@cloud.ionos.com
[6] https://bugzilla.kernel.org/show_bug.cgi?id=214961

Fixes: 1e51764a3c2ac0 ("UBIFS: add new flash file system")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 fs/ubifs/file.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 5cfa28cd00cd..6b45a037a047 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -570,7 +570,7 @@ static int ubifs_write_end(struct file *file, struct address_space *mapping,
 	}
 
 	if (!PagePrivate(page)) {
-		SetPagePrivate(page);
+		attach_page_private(page, (void *)1);
 		atomic_long_inc(&c->dirty_pg_cnt);
 		__set_page_dirty_nobuffers(page);
 	}
@@ -947,7 +947,7 @@ static int do_writepage(struct page *page, int len)
 		release_existing_page_budget(c);
 
 	atomic_long_dec(&c->dirty_pg_cnt);
-	ClearPagePrivate(page);
+	detach_page_private(page);
 	ClearPageChecked(page);
 
 	kunmap(page);
@@ -1304,7 +1304,7 @@ static void ubifs_invalidatepage(struct page *page, unsigned int offset,
 		release_existing_page_budget(c);
 
 	atomic_long_dec(&c->dirty_pg_cnt);
-	ClearPagePrivate(page);
+	detach_page_private(page);
 	ClearPageChecked(page);
 }
 
@@ -1471,8 +1471,8 @@ static int ubifs_migrate_page(struct address_space *mapping,
 		return rc;
 
 	if (PagePrivate(page)) {
-		ClearPagePrivate(page);
-		SetPagePrivate(newpage);
+		detach_page_private(page);
+		attach_page_private(newpage, (void *)1);
 	}
 
 	if (mode != MIGRATE_SYNC_NO_COPY)
@@ -1496,7 +1496,7 @@ static int ubifs_releasepage(struct page *page, gfp_t unused_gfp_flags)
 		return 0;
 	ubifs_assert(c, PagePrivate(page));
 	ubifs_assert(c, 0);
-	ClearPagePrivate(page);
+	detach_page_private(page);
 	ClearPageChecked(page);
 	return 1;
 }
@@ -1567,7 +1567,7 @@ static vm_fault_t ubifs_vm_page_mkwrite(struct vm_fault *vmf)
 	else {
 		if (!PageChecked(page))
 			ubifs_convert_page_budget(c);
-		SetPagePrivate(page);
+		attach_page_private(page, (void *)1);
 		atomic_long_inc(&c->dirty_pg_cnt);
 		__set_page_dirty_nobuffers(page);
 	}
-- 
2.31.1


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 11/15] ubi: fastmap: Return error code if memory allocation fails in add_aeb()
  2021-12-27  3:22 ` Zhihao Cheng
@ 2021-12-27  3:22   ` Zhihao Cheng
  -1 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

Abort fastmap scanning and return error code if memory allocation fails
in add_aeb(). Otherwise ubi will get wrong peb statistics information
after scanning.

Fixes: dbb7d2a88d2a7b ("UBI: Add fastmap core")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 drivers/mtd/ubi/fastmap.c | 28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
index 022af59906aa..6b5f1ffd961b 100644
--- a/drivers/mtd/ubi/fastmap.c
+++ b/drivers/mtd/ubi/fastmap.c
@@ -468,7 +468,9 @@ static int scan_pool(struct ubi_device *ubi, struct ubi_attach_info *ai,
 			if (err == UBI_IO_FF_BITFLIPS)
 				scrub = 1;
 
-			add_aeb(ai, free, pnum, ec, scrub);
+			ret = add_aeb(ai, free, pnum, ec, scrub);
+			if (ret)
+				goto out;
 			continue;
 		} else if (err == 0 || err == UBI_IO_BITFLIPS) {
 			dbg_bld("Found non empty PEB:%i in pool", pnum);
@@ -638,8 +640,10 @@ static int ubi_attach_fastmap(struct ubi_device *ubi,
 		if (fm_pos >= fm_size)
 			goto fail_bad;
 
-		add_aeb(ai, &ai->free, be32_to_cpu(fmec->pnum),
-			be32_to_cpu(fmec->ec), 0);
+		ret = add_aeb(ai, &ai->free, be32_to_cpu(fmec->pnum),
+			      be32_to_cpu(fmec->ec), 0);
+		if (ret)
+			goto fail;
 	}
 
 	/* read EC values from used list */
@@ -649,8 +653,10 @@ static int ubi_attach_fastmap(struct ubi_device *ubi,
 		if (fm_pos >= fm_size)
 			goto fail_bad;
 
-		add_aeb(ai, &used, be32_to_cpu(fmec->pnum),
-			be32_to_cpu(fmec->ec), 0);
+		ret = add_aeb(ai, &used, be32_to_cpu(fmec->pnum),
+			      be32_to_cpu(fmec->ec), 0);
+		if (ret)
+			goto fail;
 	}
 
 	/* read EC values from scrub list */
@@ -660,8 +666,10 @@ static int ubi_attach_fastmap(struct ubi_device *ubi,
 		if (fm_pos >= fm_size)
 			goto fail_bad;
 
-		add_aeb(ai, &used, be32_to_cpu(fmec->pnum),
-			be32_to_cpu(fmec->ec), 1);
+		ret = add_aeb(ai, &used, be32_to_cpu(fmec->pnum),
+			      be32_to_cpu(fmec->ec), 1);
+		if (ret)
+			goto fail;
 	}
 
 	/* read EC values from erase list */
@@ -671,8 +679,10 @@ static int ubi_attach_fastmap(struct ubi_device *ubi,
 		if (fm_pos >= fm_size)
 			goto fail_bad;
 
-		add_aeb(ai, &ai->erase, be32_to_cpu(fmec->pnum),
-			be32_to_cpu(fmec->ec), 1);
+		ret = add_aeb(ai, &ai->erase, be32_to_cpu(fmec->pnum),
+			      be32_to_cpu(fmec->ec), 1);
+		if (ret)
+			goto fail;
 	}
 
 	ai->mean_ec = div_u64(ai->ec_sum, ai->ec_count);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 11/15] ubi: fastmap: Return error code if memory allocation fails in add_aeb()
@ 2021-12-27  3:22   ` Zhihao Cheng
  0 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

Abort fastmap scanning and return error code if memory allocation fails
in add_aeb(). Otherwise ubi will get wrong peb statistics information
after scanning.

Fixes: dbb7d2a88d2a7b ("UBI: Add fastmap core")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 drivers/mtd/ubi/fastmap.c | 28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
index 022af59906aa..6b5f1ffd961b 100644
--- a/drivers/mtd/ubi/fastmap.c
+++ b/drivers/mtd/ubi/fastmap.c
@@ -468,7 +468,9 @@ static int scan_pool(struct ubi_device *ubi, struct ubi_attach_info *ai,
 			if (err == UBI_IO_FF_BITFLIPS)
 				scrub = 1;
 
-			add_aeb(ai, free, pnum, ec, scrub);
+			ret = add_aeb(ai, free, pnum, ec, scrub);
+			if (ret)
+				goto out;
 			continue;
 		} else if (err == 0 || err == UBI_IO_BITFLIPS) {
 			dbg_bld("Found non empty PEB:%i in pool", pnum);
@@ -638,8 +640,10 @@ static int ubi_attach_fastmap(struct ubi_device *ubi,
 		if (fm_pos >= fm_size)
 			goto fail_bad;
 
-		add_aeb(ai, &ai->free, be32_to_cpu(fmec->pnum),
-			be32_to_cpu(fmec->ec), 0);
+		ret = add_aeb(ai, &ai->free, be32_to_cpu(fmec->pnum),
+			      be32_to_cpu(fmec->ec), 0);
+		if (ret)
+			goto fail;
 	}
 
 	/* read EC values from used list */
@@ -649,8 +653,10 @@ static int ubi_attach_fastmap(struct ubi_device *ubi,
 		if (fm_pos >= fm_size)
 			goto fail_bad;
 
-		add_aeb(ai, &used, be32_to_cpu(fmec->pnum),
-			be32_to_cpu(fmec->ec), 0);
+		ret = add_aeb(ai, &used, be32_to_cpu(fmec->pnum),
+			      be32_to_cpu(fmec->ec), 0);
+		if (ret)
+			goto fail;
 	}
 
 	/* read EC values from scrub list */
@@ -660,8 +666,10 @@ static int ubi_attach_fastmap(struct ubi_device *ubi,
 		if (fm_pos >= fm_size)
 			goto fail_bad;
 
-		add_aeb(ai, &used, be32_to_cpu(fmec->pnum),
-			be32_to_cpu(fmec->ec), 1);
+		ret = add_aeb(ai, &used, be32_to_cpu(fmec->pnum),
+			      be32_to_cpu(fmec->ec), 1);
+		if (ret)
+			goto fail;
 	}
 
 	/* read EC values from erase list */
@@ -671,8 +679,10 @@ static int ubi_attach_fastmap(struct ubi_device *ubi,
 		if (fm_pos >= fm_size)
 			goto fail_bad;
 
-		add_aeb(ai, &ai->erase, be32_to_cpu(fmec->pnum),
-			be32_to_cpu(fmec->ec), 1);
+		ret = add_aeb(ai, &ai->erase, be32_to_cpu(fmec->pnum),
+			      be32_to_cpu(fmec->ec), 1);
+		if (ret)
+			goto fail;
 	}
 
 	ai->mean_ec = div_u64(ai->ec_sum, ai->ec_count);
-- 
2.31.1


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2
  2021-12-27  3:22 ` Zhihao Cheng
@ 2021-12-27  3:22   ` Zhihao Cheng
  -1 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

Fastmap pebs(pnum >= UBI_FM_MAX_START) won't be added into 'ai->fastmap'
while attaching ubi device if 'fm->used_blocks' is greater than 2, which
may cause warning from 'ubi_assert(ubi->good_peb_count == found_pebs)':

  UBI assert failed in ubi_wl_init at 1878 (pid 2409)
  Call Trace:
    ubi_wl_init.cold+0xae/0x2af [ubi]
    ubi_attach+0x1b0/0x780 [ubi]
    ubi_init+0x23a/0x3ad [ubi]
    load_module+0x22d2/0x2430

Reproduce:
  ID="0x20,0x33,0x00,0x00" # 16M 16KB PEB, 512 page
  modprobe nandsim id_bytes=$ID
  modprobe ubi mtd="0,0" fm_autoconvert  # Fastmap takes 2 pebs
  rmmod ubi
  modprobe ubi mtd="0,0" fm_autoconvert  # Attach by fastmap

Add all used fastmap pebs into list 'ai->fastmap' to make sure they can
be counted into 'found_pebs'.

Fixes: fdf10ed710c0aa ("ubi: Rework Fastmap attach base code")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 drivers/mtd/ubi/fastmap.c | 35 +++++------------------------------
 1 file changed, 5 insertions(+), 30 deletions(-)

diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
index 6b5f1ffd961b..01dcdd94c9d2 100644
--- a/drivers/mtd/ubi/fastmap.c
+++ b/drivers/mtd/ubi/fastmap.c
@@ -828,24 +828,6 @@ static int find_fm_anchor(struct ubi_attach_info *ai)
 	return ret;
 }
 
-static struct ubi_ainf_peb *clone_aeb(struct ubi_attach_info *ai,
-				      struct ubi_ainf_peb *old)
-{
-	struct ubi_ainf_peb *new;
-
-	new = ubi_alloc_aeb(ai, old->pnum, old->ec);
-	if (!new)
-		return NULL;
-
-	new->vol_id = old->vol_id;
-	new->sqnum = old->sqnum;
-	new->lnum = old->lnum;
-	new->scrub = old->scrub;
-	new->copy_flag = old->copy_flag;
-
-	return new;
-}
-
 /**
  * ubi_scan_fastmap - scan the fastmap.
  * @ubi: UBI device object
@@ -865,7 +847,6 @@ int ubi_scan_fastmap(struct ubi_device *ubi, struct ubi_attach_info *ai,
 	struct ubi_vid_hdr *vh;
 	struct ubi_ec_hdr *ech;
 	struct ubi_fastmap_layout *fm;
-	struct ubi_ainf_peb *aeb;
 	int i, used_blocks, pnum, fm_anchor, ret = 0;
 	size_t fm_size;
 	__be32 crc, tmp_crc;
@@ -875,17 +856,6 @@ int ubi_scan_fastmap(struct ubi_device *ubi, struct ubi_attach_info *ai,
 	if (fm_anchor < 0)
 		return UBI_NO_FASTMAP;
 
-	/* Copy all (possible) fastmap blocks into our new attach structure. */
-	list_for_each_entry(aeb, &scan_ai->fastmap, u.list) {
-		struct ubi_ainf_peb *new;
-
-		new = clone_aeb(ai, aeb);
-		if (!new)
-			return -ENOMEM;
-
-		list_add(&new->u.list, &ai->fastmap);
-	}
-
 	down_write(&ubi->fm_protect);
 	memset(ubi->fm_buf, 0, ubi->fm_size);
 
@@ -1029,6 +999,11 @@ int ubi_scan_fastmap(struct ubi_device *ubi, struct ubi_attach_info *ai,
 				"err: %i)", i, pnum, ret);
 			goto free_hdr;
 		}
+
+		/* Add all fastmap blocks into attach structure. */
+		ret = add_aeb(ai, &ai->fastmap, pnum, be64_to_cpu(ech->ec), 0);
+		if (ret)
+			goto free_hdr;
 	}
 
 	kfree(fmsb);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2
@ 2021-12-27  3:22   ` Zhihao Cheng
  0 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

Fastmap pebs(pnum >= UBI_FM_MAX_START) won't be added into 'ai->fastmap'
while attaching ubi device if 'fm->used_blocks' is greater than 2, which
may cause warning from 'ubi_assert(ubi->good_peb_count == found_pebs)':

  UBI assert failed in ubi_wl_init at 1878 (pid 2409)
  Call Trace:
    ubi_wl_init.cold+0xae/0x2af [ubi]
    ubi_attach+0x1b0/0x780 [ubi]
    ubi_init+0x23a/0x3ad [ubi]
    load_module+0x22d2/0x2430

Reproduce:
  ID="0x20,0x33,0x00,0x00" # 16M 16KB PEB, 512 page
  modprobe nandsim id_bytes=$ID
  modprobe ubi mtd="0,0" fm_autoconvert  # Fastmap takes 2 pebs
  rmmod ubi
  modprobe ubi mtd="0,0" fm_autoconvert  # Attach by fastmap

Add all used fastmap pebs into list 'ai->fastmap' to make sure they can
be counted into 'found_pebs'.

Fixes: fdf10ed710c0aa ("ubi: Rework Fastmap attach base code")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 drivers/mtd/ubi/fastmap.c | 35 +++++------------------------------
 1 file changed, 5 insertions(+), 30 deletions(-)

diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
index 6b5f1ffd961b..01dcdd94c9d2 100644
--- a/drivers/mtd/ubi/fastmap.c
+++ b/drivers/mtd/ubi/fastmap.c
@@ -828,24 +828,6 @@ static int find_fm_anchor(struct ubi_attach_info *ai)
 	return ret;
 }
 
-static struct ubi_ainf_peb *clone_aeb(struct ubi_attach_info *ai,
-				      struct ubi_ainf_peb *old)
-{
-	struct ubi_ainf_peb *new;
-
-	new = ubi_alloc_aeb(ai, old->pnum, old->ec);
-	if (!new)
-		return NULL;
-
-	new->vol_id = old->vol_id;
-	new->sqnum = old->sqnum;
-	new->lnum = old->lnum;
-	new->scrub = old->scrub;
-	new->copy_flag = old->copy_flag;
-
-	return new;
-}
-
 /**
  * ubi_scan_fastmap - scan the fastmap.
  * @ubi: UBI device object
@@ -865,7 +847,6 @@ int ubi_scan_fastmap(struct ubi_device *ubi, struct ubi_attach_info *ai,
 	struct ubi_vid_hdr *vh;
 	struct ubi_ec_hdr *ech;
 	struct ubi_fastmap_layout *fm;
-	struct ubi_ainf_peb *aeb;
 	int i, used_blocks, pnum, fm_anchor, ret = 0;
 	size_t fm_size;
 	__be32 crc, tmp_crc;
@@ -875,17 +856,6 @@ int ubi_scan_fastmap(struct ubi_device *ubi, struct ubi_attach_info *ai,
 	if (fm_anchor < 0)
 		return UBI_NO_FASTMAP;
 
-	/* Copy all (possible) fastmap blocks into our new attach structure. */
-	list_for_each_entry(aeb, &scan_ai->fastmap, u.list) {
-		struct ubi_ainf_peb *new;
-
-		new = clone_aeb(ai, aeb);
-		if (!new)
-			return -ENOMEM;
-
-		list_add(&new->u.list, &ai->fastmap);
-	}
-
 	down_write(&ubi->fm_protect);
 	memset(ubi->fm_buf, 0, ubi->fm_size);
 
@@ -1029,6 +999,11 @@ int ubi_scan_fastmap(struct ubi_device *ubi, struct ubi_attach_info *ai,
 				"err: %i)", i, pnum, ret);
 			goto free_hdr;
 		}
+
+		/* Add all fastmap blocks into attach structure. */
+		ret = add_aeb(ai, &ai->fastmap, pnum, be64_to_cpu(ech->ec), 0);
+		if (ret)
+			goto free_hdr;
 	}
 
 	kfree(fmsb);
-- 
2.31.1


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 13/15] ubifs: ubifs_writepage: Mark page dirty after writing inode failed
  2021-12-27  3:22 ` Zhihao Cheng
@ 2021-12-27  3:22   ` Zhihao Cheng
  -1 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

There are two states for ubifs writing pages:
1. Dirty, Private
2. Not Dirty, Not Private

There is a third possibility which maybe related to [1] that page is
private but not dirty caused by following process:

          PA
lock(page)
ubifs_write_end
  attach_page_private		// set Private
    __set_page_dirty_nobuffers	// set Dirty
unlock(page)

write_cache_pages
  lock(page)
  clear_page_dirty_for_io(page)	// clear Dirty
  ubifs_writepage
    write_inode
    // fail, goto out, following codes are not executed
    // do_writepage
    //   set_page_writeback 	// set Writeback
    //   detach_page_private	// clear Private
    //   end_page_writeback 	// clear Writeback
    out:
    unlock(page)		// Private, Not Dirty

                                       PB
				ksys_fadvise64_64
				  generic_fadvise
				     invalidate_inode_page
				     // page is neither Dirty nor Writeback
				       invalidate_complete_page
				       // page_has_private is true
					 try_to_release_page
					   ubifs_releasepage
					     ubifs_assert(c, 0) !!!

Then we may get following assertion failed:
  UBIFS error (ubi0:0 pid 1492): ubifs_assert_failed [ubifs]:
  UBIFS assert failed: 0, in fs/ubifs/file.c:1499
  UBIFS warning (ubi0:0 pid 1492): ubifs_ro_mode [ubifs]:
  switched to read-only mode, error -22
  CPU: 2 PID: 1492 Comm: aa Not tainted 5.16.0-rc2-00012-g7bb767dee0ba-dirty
  Call Trace:
    dump_stack+0x13/0x1b
    ubifs_ro_mode+0x54/0x60 [ubifs]
    ubifs_assert_failed+0x4b/0x80 [ubifs]
    ubifs_releasepage+0x7e/0x1e0 [ubifs]
    try_to_release_page+0x57/0xe0
    invalidate_inode_page+0xfb/0x130
    invalidate_mapping_pagevec+0x12/0x20
    generic_fadvise+0x303/0x3c0
    vfs_fadvise+0x35/0x40
    ksys_fadvise64_64+0x4c/0xb0

Jump [2] to find a reproducer.

[1] https://linux-mtd.infradead.narkive.com/NQoBeT1u/patch-rfc-ubifs-fix-assert-failed-in-ubifs-set-page-dirty
[2] https://bugzilla.kernel.org/show_bug.cgi?id=215357

Fixes: 1e51764a3c2ac0 ("UBIFS: add new flash file system")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 fs/ubifs/file.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 6b45a037a047..7cc2abcb70ae 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1031,7 +1031,7 @@ static int ubifs_writepage(struct page *page, struct writeback_control *wbc)
 		if (page->index >= synced_i_size >> PAGE_SHIFT) {
 			err = inode->i_sb->s_op->write_inode(inode, NULL);
 			if (err)
-				goto out_unlock;
+				goto out_redirty;
 			/*
 			 * The inode has been written, but the write-buffer has
 			 * not been synchronized, so in case of an unclean
@@ -1059,11 +1059,17 @@ static int ubifs_writepage(struct page *page, struct writeback_control *wbc)
 	if (i_size > synced_i_size) {
 		err = inode->i_sb->s_op->write_inode(inode, NULL);
 		if (err)
-			goto out_unlock;
+			goto out_redirty;
 	}
 
 	return do_writepage(page, len);
-
+out_redirty:
+	/*
+	 * redirty_page_for_writepage() won't call ubifs_dirty_inode() because
+	 * it passes I_DIRTY_PAGES flag while calling __mark_inode_dirty(), so
+	 * there is no need to do space budget for dirty inode.
+	 */
+	redirty_page_for_writepage(wbc, page);
 out_unlock:
 	unlock_page(page);
 	return err;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 13/15] ubifs: ubifs_writepage: Mark page dirty after writing inode failed
@ 2021-12-27  3:22   ` Zhihao Cheng
  0 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

There are two states for ubifs writing pages:
1. Dirty, Private
2. Not Dirty, Not Private

There is a third possibility which maybe related to [1] that page is
private but not dirty caused by following process:

          PA
lock(page)
ubifs_write_end
  attach_page_private		// set Private
    __set_page_dirty_nobuffers	// set Dirty
unlock(page)

write_cache_pages
  lock(page)
  clear_page_dirty_for_io(page)	// clear Dirty
  ubifs_writepage
    write_inode
    // fail, goto out, following codes are not executed
    // do_writepage
    //   set_page_writeback 	// set Writeback
    //   detach_page_private	// clear Private
    //   end_page_writeback 	// clear Writeback
    out:
    unlock(page)		// Private, Not Dirty

                                       PB
				ksys_fadvise64_64
				  generic_fadvise
				     invalidate_inode_page
				     // page is neither Dirty nor Writeback
				       invalidate_complete_page
				       // page_has_private is true
					 try_to_release_page
					   ubifs_releasepage
					     ubifs_assert(c, 0) !!!

Then we may get following assertion failed:
  UBIFS error (ubi0:0 pid 1492): ubifs_assert_failed [ubifs]:
  UBIFS assert failed: 0, in fs/ubifs/file.c:1499
  UBIFS warning (ubi0:0 pid 1492): ubifs_ro_mode [ubifs]:
  switched to read-only mode, error -22
  CPU: 2 PID: 1492 Comm: aa Not tainted 5.16.0-rc2-00012-g7bb767dee0ba-dirty
  Call Trace:
    dump_stack+0x13/0x1b
    ubifs_ro_mode+0x54/0x60 [ubifs]
    ubifs_assert_failed+0x4b/0x80 [ubifs]
    ubifs_releasepage+0x7e/0x1e0 [ubifs]
    try_to_release_page+0x57/0xe0
    invalidate_inode_page+0xfb/0x130
    invalidate_mapping_pagevec+0x12/0x20
    generic_fadvise+0x303/0x3c0
    vfs_fadvise+0x35/0x40
    ksys_fadvise64_64+0x4c/0xb0

Jump [2] to find a reproducer.

[1] https://linux-mtd.infradead.narkive.com/NQoBeT1u/patch-rfc-ubifs-fix-assert-failed-in-ubifs-set-page-dirty
[2] https://bugzilla.kernel.org/show_bug.cgi?id=215357

Fixes: 1e51764a3c2ac0 ("UBIFS: add new flash file system")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 fs/ubifs/file.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 6b45a037a047..7cc2abcb70ae 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1031,7 +1031,7 @@ static int ubifs_writepage(struct page *page, struct writeback_control *wbc)
 		if (page->index >= synced_i_size >> PAGE_SHIFT) {
 			err = inode->i_sb->s_op->write_inode(inode, NULL);
 			if (err)
-				goto out_unlock;
+				goto out_redirty;
 			/*
 			 * The inode has been written, but the write-buffer has
 			 * not been synchronized, so in case of an unclean
@@ -1059,11 +1059,17 @@ static int ubifs_writepage(struct page *page, struct writeback_control *wbc)
 	if (i_size > synced_i_size) {
 		err = inode->i_sb->s_op->write_inode(inode, NULL);
 		if (err)
-			goto out_unlock;
+			goto out_redirty;
 	}
 
 	return do_writepage(page, len);
-
+out_redirty:
+	/*
+	 * redirty_page_for_writepage() won't call ubifs_dirty_inode() because
+	 * it passes I_DIRTY_PAGES flag while calling __mark_inode_dirty(), so
+	 * there is no need to do space budget for dirty inode.
+	 */
+	redirty_page_for_writepage(wbc, page);
 out_unlock:
 	unlock_page(page);
 	return err;
-- 
2.31.1


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 14/15] ubifs: ubifs_releasepage: Remove ubifs_assert(0) to valid this process
  2021-12-27  3:22 ` Zhihao Cheng
@ 2021-12-27  3:22   ` Zhihao Cheng
  -1 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

There are two states for ubifs writing pages:
1. Dirty, Private
2. Not Dirty, Not Private

The normal process cannot go to ubifs_releasepage() which means there
exists pages being private but not dirty. Reproducer[1] shows that it
could occur (which maybe related to [2]) with following process:

     PA                     PB                    PC
lock(page)[PA]
ubifs_write_end
  attach_page_private         // set Private
  __set_page_dirty_nobuffers  // set Dirty
unlock(page)

write_cache_pages[PA]
  lock(page)
  clear_page_dirty_for_io(page)	// clear Dirty
  ubifs_writepage

                        do_truncation[PB]
			  truncate_setsize
			    i_size_write(inode, newsize) // newsize = 0

    i_size = i_size_read(inode)	// i_size = 0
    end_index = i_size >> PAGE_SHIFT
    if (page->index > end_index)
      goto out // jump
out:
unlock(page)   // Private, Not Dirty

						generic_fadvise[PC]
						  lock(page)
						  invalidate_inode_page
						    try_to_release_page
						      ubifs_releasepage
						        ubifs_assert(c, 0)
		                                        // bad assertion!
						  unlock(page)
			  truncate_pagecache[PB]

Then we may get following assertion failed:
  UBIFS error (ubi0:0 pid 1683): ubifs_assert_failed [ubifs]:
  UBIFS assert failed: 0, in fs/ubifs/file.c:1513
  UBIFS warning (ubi0:0 pid 1683): ubifs_ro_mode [ubifs]:
  switched to read-only mode, error -22
  CPU: 2 PID: 1683 Comm: aa Not tainted 5.16.0-rc5-00184-g0bca5994cacc-dirty #308
  Call Trace:
    dump_stack+0x13/0x1b
    ubifs_ro_mode+0x54/0x60 [ubifs]
    ubifs_assert_failed+0x4b/0x80 [ubifs]
    ubifs_releasepage+0x67/0x1d0 [ubifs]
    try_to_release_page+0x57/0xe0
    invalidate_inode_page+0xfb/0x130
    __invalidate_mapping_pages+0xb9/0x280
    invalidate_mapping_pagevec+0x12/0x20
    generic_fadvise+0x303/0x3c0
    ksys_fadvise64_64+0x4c/0xb0

[1] https://bugzilla.kernel.org/show_bug.cgi?id=215373
[2] https://linux-mtd.infradead.narkive.com/NQoBeT1u/patch-rfc-ubifs-fix-assert-failed-in-ubifs-set-page-dirty

Fixes: 1e51764a3c2ac0 ("UBIFS: add new flash file system")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 fs/ubifs/file.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 7cc2abcb70ae..4bafcb80d29c 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1494,14 +1494,23 @@ static int ubifs_releasepage(struct page *page, gfp_t unused_gfp_flags)
 	struct inode *inode = page->mapping->host;
 	struct ubifs_info *c = inode->i_sb->s_fs_info;
 
-	/*
-	 * An attempt to release a dirty page without budgeting for it - should
-	 * not happen.
-	 */
 	if (PageWriteback(page))
 		return 0;
+
+	/*
+	 * Page is private but not dirty, weird? There is one condition
+	 * making it happened. ubifs_writepage skipped the page because
+	 * page index beyonds isize (for example. truncated by other
+	 * process named A), then the page is invalidated by fadvise64
+	 * syscall before being truncated by process A.
+	 */
 	ubifs_assert(c, PagePrivate(page));
-	ubifs_assert(c, 0);
+	if (PageChecked(page))
+		release_new_page_budget(c);
+	else
+		release_existing_page_budget(c);
+
+	atomic_long_dec(&c->dirty_pg_cnt);
 	detach_page_private(page);
 	ClearPageChecked(page);
 	return 1;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 14/15] ubifs: ubifs_releasepage: Remove ubifs_assert(0) to valid this process
@ 2021-12-27  3:22   ` Zhihao Cheng
  0 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

There are two states for ubifs writing pages:
1. Dirty, Private
2. Not Dirty, Not Private

The normal process cannot go to ubifs_releasepage() which means there
exists pages being private but not dirty. Reproducer[1] shows that it
could occur (which maybe related to [2]) with following process:

     PA                     PB                    PC
lock(page)[PA]
ubifs_write_end
  attach_page_private         // set Private
  __set_page_dirty_nobuffers  // set Dirty
unlock(page)

write_cache_pages[PA]
  lock(page)
  clear_page_dirty_for_io(page)	// clear Dirty
  ubifs_writepage

                        do_truncation[PB]
			  truncate_setsize
			    i_size_write(inode, newsize) // newsize = 0

    i_size = i_size_read(inode)	// i_size = 0
    end_index = i_size >> PAGE_SHIFT
    if (page->index > end_index)
      goto out // jump
out:
unlock(page)   // Private, Not Dirty

						generic_fadvise[PC]
						  lock(page)
						  invalidate_inode_page
						    try_to_release_page
						      ubifs_releasepage
						        ubifs_assert(c, 0)
		                                        // bad assertion!
						  unlock(page)
			  truncate_pagecache[PB]

Then we may get following assertion failed:
  UBIFS error (ubi0:0 pid 1683): ubifs_assert_failed [ubifs]:
  UBIFS assert failed: 0, in fs/ubifs/file.c:1513
  UBIFS warning (ubi0:0 pid 1683): ubifs_ro_mode [ubifs]:
  switched to read-only mode, error -22
  CPU: 2 PID: 1683 Comm: aa Not tainted 5.16.0-rc5-00184-g0bca5994cacc-dirty #308
  Call Trace:
    dump_stack+0x13/0x1b
    ubifs_ro_mode+0x54/0x60 [ubifs]
    ubifs_assert_failed+0x4b/0x80 [ubifs]
    ubifs_releasepage+0x67/0x1d0 [ubifs]
    try_to_release_page+0x57/0xe0
    invalidate_inode_page+0xfb/0x130
    __invalidate_mapping_pages+0xb9/0x280
    invalidate_mapping_pagevec+0x12/0x20
    generic_fadvise+0x303/0x3c0
    ksys_fadvise64_64+0x4c/0xb0

[1] https://bugzilla.kernel.org/show_bug.cgi?id=215373
[2] https://linux-mtd.infradead.narkive.com/NQoBeT1u/patch-rfc-ubifs-fix-assert-failed-in-ubifs-set-page-dirty

Fixes: 1e51764a3c2ac0 ("UBIFS: add new flash file system")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 fs/ubifs/file.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 7cc2abcb70ae..4bafcb80d29c 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1494,14 +1494,23 @@ static int ubifs_releasepage(struct page *page, gfp_t unused_gfp_flags)
 	struct inode *inode = page->mapping->host;
 	struct ubifs_info *c = inode->i_sb->s_fs_info;
 
-	/*
-	 * An attempt to release a dirty page without budgeting for it - should
-	 * not happen.
-	 */
 	if (PageWriteback(page))
 		return 0;
+
+	/*
+	 * Page is private but not dirty, weird? There is one condition
+	 * making it happened. ubifs_writepage skipped the page because
+	 * page index beyonds isize (for example. truncated by other
+	 * process named A), then the page is invalidated by fadvise64
+	 * syscall before being truncated by process A.
+	 */
 	ubifs_assert(c, PagePrivate(page));
-	ubifs_assert(c, 0);
+	if (PageChecked(page))
+		release_new_page_budget(c);
+	else
+		release_existing_page_budget(c);
+
+	atomic_long_dec(&c->dirty_pg_cnt);
 	detach_page_private(page);
 	ClearPageChecked(page);
 	return 1;
-- 
2.31.1


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 15/15] ubi: fastmap: Fix high cpu usage of ubi_bgt by making sure wl_pool not empty
  2021-12-27  3:22 ` Zhihao Cheng
@ 2021-12-27  3:22   ` Zhihao Cheng
  -1 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

There at least 6 PEBs reserved on UBI device:
1. EBA_RESERVED_PEBS[1]
2. WL_RESERVED_PEBS[1]
3. UBI_LAYOUT_VOLUME_EBS[2]
4. MIN_FASTMAP_RESERVED_PEBS[2]

When all ubi volumes take all their PEBs, there are 3 (EBA_RESERVED_PEBS +
WL_RESERVED_PEBS + MIN_FASTMAP_RESERVED_PEBS - MIN_FASTMAP_TAKEN_PEBS[1])
free PEBs. Since f9c34bb529975fe ("ubi: Fix producing anchor PEBs") and
4b68bf9a69d22dd ("ubi: Select fastmap anchor PEBs considering wear level
rules") applied, there is only 1 (3 - FASTMAP_ANCHOR_PEBS[1] -
FASTMAP_NEXT_ANCHOR_PEBS[1]) free PEB to fill pool and wl_pool, after
filling pool, wl_pool is always empty. So, UBI could be stuck in an
infinite loop:

	ubi_thread	   system_wq
wear_leveling_worker <--------------------------------------------------
  get_peb_for_wl							|
    // fm_wl_pool, used = size = 0					|
    schedule_work(&ubi->fm_work)					|
									|
		    update_fastmap_work_fn				|
		      ubi_update_fastmap				|
			ubi_refill_pools				|
			// ubi->free_count - ubi->beb_rsvd_pebs < 5	|
			// wl_pool is not filled with any PEBs		|
			schedule_erase(old_fm_anchor)			|
			ubi_ensure_anchor_pebs				|
			  __schedule_ubi_work(wear_leveling_worker)	|
									|
__erase_worker								|
  ensure_wear_leveling							|
    __schedule_ubi_work(wear_leveling_worker) --------------------------

, which cause high cpu usage of ubi_bgt:
top - 12:10:42 up 5 min,  2 users,  load average: 1.76, 0.68, 0.27
Tasks: 123 total,   3 running,  54 sleeping,   0 stopped,   0 zombie

  PID USER PR   NI VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1589 root 20   0   0      0      0 R  45.0  0.0   0:38.86 ubi_bgt0d
  319 root 20   0   0      0      0 I  15.2  0.0   0:15.29 kworker/0:3-eve
  371 root 20   0   0      0      0 I  14.9  0.0   0:12.85 kworker/3:3-eve
   20 root 20   0   0      0      0 I  11.3  0.0   0:05.33 kworker/1:0-eve
  202 root 20   0   0      0      0 I  11.3  0.0   0:04.93 kworker/2:3-eve

The fm_next_anchor always takes one free PEB to prepare for updating
fastmap, fm_anchor and fm_next_anchor are always allocated first of pool
and wl_pool, so UBI can regard these two PEBs as reserved PEBs.

Fix it by:
  1) Adding 2 PEBs reserved for fm_anchor and fm_next_anchor.
  2) Abandoning filling wl_pool until free count belows beb_rsvd_pebs.
Then, there are at least 2(EBA_RESERVED_PEBS + MIN_FASTMAP_RESERVED_PEBS -
MIN_FASTMAP_TAKEN_PEBS[1]) PEBs in pool and 1(WL_RESERVED_PEBS) PEB in
wl_pool after calling ubi_refill_pools() with all erase works done.

This modification will cause a compatibility problem with old UBI image.
If UBI volumes take the maximun number of PEBs for one certain UBI device,
there are no available PEBs to satisfy 2 new reserved PEBs, bad reserved
PEBs are taken firstly, if still not enough, ENOSPC will returned from ubi
initialization.

Fetch a reproducer in [Link].

Fixes: f9c34bb529975fe ("ubi: Fix producing anchor PEBs")
Fixes: 4b68bf9a69d22dd ("ubi: Select fastmap anchor PEBs ... rules")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215407
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 drivers/mtd/ubi/fastmap-wl.c |  5 +++--
 drivers/mtd/ubi/wl.h         | 11 +++++++++--
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/mtd/ubi/fastmap-wl.c b/drivers/mtd/ubi/fastmap-wl.c
index 28f55f9cf715..18f23e76fee9 100644
--- a/drivers/mtd/ubi/fastmap-wl.c
+++ b/drivers/mtd/ubi/fastmap-wl.c
@@ -134,7 +134,8 @@ void ubi_refill_pools(struct ubi_device *ubi)
 	for (;;) {
 		enough = 0;
 		if (pool->size < pool->max_size) {
-			if (!ubi->free.rb_node)
+			if (!ubi->free.rb_node ||
+			    (ubi->free_count <= ubi->beb_rsvd_pebs))
 				break;
 
 			e = wl_get_wle(ubi);
@@ -148,7 +149,7 @@ void ubi_refill_pools(struct ubi_device *ubi)
 
 		if (wl_pool->size < wl_pool->max_size) {
 			if (!ubi->free.rb_node ||
-			   (ubi->free_count - ubi->beb_rsvd_pebs < 5))
+			    (ubi->free_count <= ubi->beb_rsvd_pebs))
 				break;
 
 			e = find_wl_entry(ubi, &ubi->free, WL_FREE_MAX_DIFF);
diff --git a/drivers/mtd/ubi/wl.h b/drivers/mtd/ubi/wl.h
index c93a53293786..e6c8f68d5ffb 100644
--- a/drivers/mtd/ubi/wl.h
+++ b/drivers/mtd/ubi/wl.h
@@ -2,14 +2,21 @@
 #ifndef UBI_WL_H
 #define UBI_WL_H
 #ifdef CONFIG_MTD_UBI_FASTMAP
+
+/* Number of physical eraseblocks for fm anchor */
+#define FASTMAP_ANCHOR_PEBS 1
+/* Number of physical eraseblocks for next fm anchor */
+#define FASTMAP_NEXT_ANCHOR_PEBS 1
+
 static void update_fastmap_work_fn(struct work_struct *wrk);
 static struct ubi_wl_entry *find_anchor_wl_entry(struct rb_root *root);
 static struct ubi_wl_entry *get_peb_for_wl(struct ubi_device *ubi);
 static void ubi_fastmap_close(struct ubi_device *ubi);
 static inline void ubi_fastmap_init(struct ubi_device *ubi, int *count)
 {
-	/* Reserve enough LEBs to store two fastmaps. */
-	*count += (ubi->fm_size / ubi->leb_size) * 2;
+	/* Reserve enough LEBs to store two fastmaps and anchor fm pebs */
+	*count += (ubi->fm_size / ubi->leb_size) * 2 +
+		  FASTMAP_ANCHOR_PEBS + FASTMAP_NEXT_ANCHOR_PEBS;
 	INIT_WORK(&ubi->fm_work, update_fastmap_work_fn);
 }
 static struct ubi_wl_entry *may_reserve_for_fm(struct ubi_device *ubi,
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v6 15/15] ubi: fastmap: Fix high cpu usage of ubi_bgt by making sure wl_pool not empty
@ 2021-12-27  3:22   ` Zhihao Cheng
  0 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27  3:22 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

There at least 6 PEBs reserved on UBI device:
1. EBA_RESERVED_PEBS[1]
2. WL_RESERVED_PEBS[1]
3. UBI_LAYOUT_VOLUME_EBS[2]
4. MIN_FASTMAP_RESERVED_PEBS[2]

When all ubi volumes take all their PEBs, there are 3 (EBA_RESERVED_PEBS +
WL_RESERVED_PEBS + MIN_FASTMAP_RESERVED_PEBS - MIN_FASTMAP_TAKEN_PEBS[1])
free PEBs. Since f9c34bb529975fe ("ubi: Fix producing anchor PEBs") and
4b68bf9a69d22dd ("ubi: Select fastmap anchor PEBs considering wear level
rules") applied, there is only 1 (3 - FASTMAP_ANCHOR_PEBS[1] -
FASTMAP_NEXT_ANCHOR_PEBS[1]) free PEB to fill pool and wl_pool, after
filling pool, wl_pool is always empty. So, UBI could be stuck in an
infinite loop:

	ubi_thread	   system_wq
wear_leveling_worker <--------------------------------------------------
  get_peb_for_wl							|
    // fm_wl_pool, used = size = 0					|
    schedule_work(&ubi->fm_work)					|
									|
		    update_fastmap_work_fn				|
		      ubi_update_fastmap				|
			ubi_refill_pools				|
			// ubi->free_count - ubi->beb_rsvd_pebs < 5	|
			// wl_pool is not filled with any PEBs		|
			schedule_erase(old_fm_anchor)			|
			ubi_ensure_anchor_pebs				|
			  __schedule_ubi_work(wear_leveling_worker)	|
									|
__erase_worker								|
  ensure_wear_leveling							|
    __schedule_ubi_work(wear_leveling_worker) --------------------------

, which cause high cpu usage of ubi_bgt:
top - 12:10:42 up 5 min,  2 users,  load average: 1.76, 0.68, 0.27
Tasks: 123 total,   3 running,  54 sleeping,   0 stopped,   0 zombie

  PID USER PR   NI VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1589 root 20   0   0      0      0 R  45.0  0.0   0:38.86 ubi_bgt0d
  319 root 20   0   0      0      0 I  15.2  0.0   0:15.29 kworker/0:3-eve
  371 root 20   0   0      0      0 I  14.9  0.0   0:12.85 kworker/3:3-eve
   20 root 20   0   0      0      0 I  11.3  0.0   0:05.33 kworker/1:0-eve
  202 root 20   0   0      0      0 I  11.3  0.0   0:04.93 kworker/2:3-eve

The fm_next_anchor always takes one free PEB to prepare for updating
fastmap, fm_anchor and fm_next_anchor are always allocated first of pool
and wl_pool, so UBI can regard these two PEBs as reserved PEBs.

Fix it by:
  1) Adding 2 PEBs reserved for fm_anchor and fm_next_anchor.
  2) Abandoning filling wl_pool until free count belows beb_rsvd_pebs.
Then, there are at least 2(EBA_RESERVED_PEBS + MIN_FASTMAP_RESERVED_PEBS -
MIN_FASTMAP_TAKEN_PEBS[1]) PEBs in pool and 1(WL_RESERVED_PEBS) PEB in
wl_pool after calling ubi_refill_pools() with all erase works done.

This modification will cause a compatibility problem with old UBI image.
If UBI volumes take the maximun number of PEBs for one certain UBI device,
there are no available PEBs to satisfy 2 new reserved PEBs, bad reserved
PEBs are taken firstly, if still not enough, ENOSPC will returned from ubi
initialization.

Fetch a reproducer in [Link].

Fixes: f9c34bb529975fe ("ubi: Fix producing anchor PEBs")
Fixes: 4b68bf9a69d22dd ("ubi: Select fastmap anchor PEBs ... rules")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215407
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 drivers/mtd/ubi/fastmap-wl.c |  5 +++--
 drivers/mtd/ubi/wl.h         | 11 +++++++++--
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/mtd/ubi/fastmap-wl.c b/drivers/mtd/ubi/fastmap-wl.c
index 28f55f9cf715..18f23e76fee9 100644
--- a/drivers/mtd/ubi/fastmap-wl.c
+++ b/drivers/mtd/ubi/fastmap-wl.c
@@ -134,7 +134,8 @@ void ubi_refill_pools(struct ubi_device *ubi)
 	for (;;) {
 		enough = 0;
 		if (pool->size < pool->max_size) {
-			if (!ubi->free.rb_node)
+			if (!ubi->free.rb_node ||
+			    (ubi->free_count <= ubi->beb_rsvd_pebs))
 				break;
 
 			e = wl_get_wle(ubi);
@@ -148,7 +149,7 @@ void ubi_refill_pools(struct ubi_device *ubi)
 
 		if (wl_pool->size < wl_pool->max_size) {
 			if (!ubi->free.rb_node ||
-			   (ubi->free_count - ubi->beb_rsvd_pebs < 5))
+			    (ubi->free_count <= ubi->beb_rsvd_pebs))
 				break;
 
 			e = find_wl_entry(ubi, &ubi->free, WL_FREE_MAX_DIFF);
diff --git a/drivers/mtd/ubi/wl.h b/drivers/mtd/ubi/wl.h
index c93a53293786..e6c8f68d5ffb 100644
--- a/drivers/mtd/ubi/wl.h
+++ b/drivers/mtd/ubi/wl.h
@@ -2,14 +2,21 @@
 #ifndef UBI_WL_H
 #define UBI_WL_H
 #ifdef CONFIG_MTD_UBI_FASTMAP
+
+/* Number of physical eraseblocks for fm anchor */
+#define FASTMAP_ANCHOR_PEBS 1
+/* Number of physical eraseblocks for next fm anchor */
+#define FASTMAP_NEXT_ANCHOR_PEBS 1
+
 static void update_fastmap_work_fn(struct work_struct *wrk);
 static struct ubi_wl_entry *find_anchor_wl_entry(struct rb_root *root);
 static struct ubi_wl_entry *get_peb_for_wl(struct ubi_device *ubi);
 static void ubi_fastmap_close(struct ubi_device *ubi);
 static inline void ubi_fastmap_init(struct ubi_device *ubi, int *count)
 {
-	/* Reserve enough LEBs to store two fastmaps. */
-	*count += (ubi->fm_size / ubi->leb_size) * 2;
+	/* Reserve enough LEBs to store two fastmaps and anchor fm pebs */
+	*count += (ubi->fm_size / ubi->leb_size) * 2 +
+		  FASTMAP_ANCHOR_PEBS + FASTMAP_NEXT_ANCHOR_PEBS;
 	INIT_WORK(&ubi->fm_work, update_fastmap_work_fn);
 }
 static struct ubi_wl_entry *may_reserve_for_fm(struct ubi_device *ubi,
-- 
2.31.1


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 00/15] Some bugfixs for ubi/ubifs
  2021-12-27  3:22 ` Zhihao Cheng
@ 2021-12-27 10:13   ` Richard Weinberger
  -1 siblings, 0 replies; 76+ messages in thread
From: Richard Weinberger @ 2021-12-27 10:13 UTC (permalink / raw)
  To: chengzhihao1
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

----- Ursprüngliche Mail -----
> Von: "chengzhihao1" <chengzhihao1@huawei.com>
> An: "richard" <richard@nod.at>, "Miquel Raynal" <miquel.raynal@bootlin.com>, "Vignesh Raghavendra" <vigneshr@ti.com>,
> "mcoquelin stm32" <mcoquelin.stm32@gmail.com>, "kirill shutemov" <kirill.shutemov@linux.intel.com>, "Sascha Hauer"
> <s.hauer@pengutronix.de>
> CC: "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org>
> Gesendet: Montag, 27. Dezember 2021 04:22:31
> Betreff: [PATCH v6 00/15] Some bugfixs for ubi/ubifs

> v1->v2:
>  1. Add new fix for ubifs, "ubifs: Fix to add refcount once page is set
>  private"
>  2. Update "ubifs: Rename whiteout atomically":
>     1) Move inode mode in create_whiteout()
>     2) Don't check O_SYNC for whiteout, because it inherits from the old_dir
>     3) Remove useless 'synced_i_size ' assignment for whiteout, because
>	it's always be zero.
>     4) Remove unused variable 'ui' in create_whiteout()
>  3. Update "ubifs: setflags: Make dirtied_ino_d 8 bytes aligned":
>     1) Align dirtied_ino_d with 8 bytes.
> 
> v2->v3:
>  1. Update "ubifs: Rename whiteout atomically":
>     1) Fix misspelling 'have already check the old dir inode' ->
>        'have already checked the old dir inode'
>     2) Fix misspelling "Whiteout don't have non-zero size" ->
>        "Whiteout  have non-zero size"
>  2. Update "ubifs: Fix to add refcount once page is set private"
>     1) Fix commit message to explain the root cause.
>  3. Update "ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when
>     fm->used_blocks>=2"
>     1) Add fastmap used pebs into 'ai' in for-loop, rather than in
>        two-steps(Add pebs [pnum<UBI_FM_MAX_START] then add pebs
>	[pnum>=UBI_FM_MAX_START] into 'ai').
> 
> v3->v4:
>  1. Update "ubifs: Add missing iput if do_tmpfile() failed in rename whiteout":
>     1) Move whiteout cleanup into do_tmpfile() according to Sascha's advice
>  2. Add new fix for ubifs, "ubifs: ubifs_writepage: Mark page dirty after
>     writing inode failed"
> 
> v4->v5:
>  1. Add new fix for ubifs, "ubifs: ubifs_releasepage: Remove ubifs_assert(0)
>     to valid this process"
> 
> v5->v6:
>  1. Add new fix for ubi: "ubi: fastmap: Fix high cpu usage of ubi_bgt by
>     making sure wl_pool not empty"
> 
> Zhihao Cheng (15):
>  ubifs: rename_whiteout: Fix double free for whiteout_ui->data
>  ubifs: Fix deadlock in concurrent rename whiteout and inode writeback
>  ubifs: Fix wrong number of inodes locked by ui_mutex in ubifs_inode
>    comment
>  ubifs: Add missing iput if do_tmpfile() failed in rename whiteout
>  ubifs: Rename whiteout atomically
>  ubifs: Fix 'ui->dirty' race between do_tmpfile() and writeback work
>  ubifs: Rectify space amount budget for mkdir/tmpfile operations
>  ubifs: setflags: Make dirtied_ino_d 8 bytes aligned
>  ubifs: Fix read out-of-bounds in ubifs_wbuf_write_nolock()
>  ubifs: Fix to add refcount once page is set private
>  ubi: fastmap: Return error code if memory allocation fails in
>    add_aeb()
>  ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when
>    fm->used_blocks>=2
>  ubifs: ubifs_writepage: Mark page dirty after writing inode failed
>  ubifs: ubifs_releasepage: Remove ubifs_assert(0) to valid this process
>  ubi: fastmap: Fix high cpu usage of ubi_bgt by making sure wl_pool not
>    empty

Thanks a lot for all this fixes! I will start reviewing/picking them an
as soon the series is stable.
So please don't resend everything if you only add a new patch.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 00/15] Some bugfixs for ubi/ubifs
@ 2021-12-27 10:13   ` Richard Weinberger
  0 siblings, 0 replies; 76+ messages in thread
From: Richard Weinberger @ 2021-12-27 10:13 UTC (permalink / raw)
  To: chengzhihao1
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

----- Ursprüngliche Mail -----
> Von: "chengzhihao1" <chengzhihao1@huawei.com>
> An: "richard" <richard@nod.at>, "Miquel Raynal" <miquel.raynal@bootlin.com>, "Vignesh Raghavendra" <vigneshr@ti.com>,
> "mcoquelin stm32" <mcoquelin.stm32@gmail.com>, "kirill shutemov" <kirill.shutemov@linux.intel.com>, "Sascha Hauer"
> <s.hauer@pengutronix.de>
> CC: "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org>
> Gesendet: Montag, 27. Dezember 2021 04:22:31
> Betreff: [PATCH v6 00/15] Some bugfixs for ubi/ubifs

> v1->v2:
>  1. Add new fix for ubifs, "ubifs: Fix to add refcount once page is set
>  private"
>  2. Update "ubifs: Rename whiteout atomically":
>     1) Move inode mode in create_whiteout()
>     2) Don't check O_SYNC for whiteout, because it inherits from the old_dir
>     3) Remove useless 'synced_i_size ' assignment for whiteout, because
>	it's always be zero.
>     4) Remove unused variable 'ui' in create_whiteout()
>  3. Update "ubifs: setflags: Make dirtied_ino_d 8 bytes aligned":
>     1) Align dirtied_ino_d with 8 bytes.
> 
> v2->v3:
>  1. Update "ubifs: Rename whiteout atomically":
>     1) Fix misspelling 'have already check the old dir inode' ->
>        'have already checked the old dir inode'
>     2) Fix misspelling "Whiteout don't have non-zero size" ->
>        "Whiteout  have non-zero size"
>  2. Update "ubifs: Fix to add refcount once page is set private"
>     1) Fix commit message to explain the root cause.
>  3. Update "ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when
>     fm->used_blocks>=2"
>     1) Add fastmap used pebs into 'ai' in for-loop, rather than in
>        two-steps(Add pebs [pnum<UBI_FM_MAX_START] then add pebs
>	[pnum>=UBI_FM_MAX_START] into 'ai').
> 
> v3->v4:
>  1. Update "ubifs: Add missing iput if do_tmpfile() failed in rename whiteout":
>     1) Move whiteout cleanup into do_tmpfile() according to Sascha's advice
>  2. Add new fix for ubifs, "ubifs: ubifs_writepage: Mark page dirty after
>     writing inode failed"
> 
> v4->v5:
>  1. Add new fix for ubifs, "ubifs: ubifs_releasepage: Remove ubifs_assert(0)
>     to valid this process"
> 
> v5->v6:
>  1. Add new fix for ubi: "ubi: fastmap: Fix high cpu usage of ubi_bgt by
>     making sure wl_pool not empty"
> 
> Zhihao Cheng (15):
>  ubifs: rename_whiteout: Fix double free for whiteout_ui->data
>  ubifs: Fix deadlock in concurrent rename whiteout and inode writeback
>  ubifs: Fix wrong number of inodes locked by ui_mutex in ubifs_inode
>    comment
>  ubifs: Add missing iput if do_tmpfile() failed in rename whiteout
>  ubifs: Rename whiteout atomically
>  ubifs: Fix 'ui->dirty' race between do_tmpfile() and writeback work
>  ubifs: Rectify space amount budget for mkdir/tmpfile operations
>  ubifs: setflags: Make dirtied_ino_d 8 bytes aligned
>  ubifs: Fix read out-of-bounds in ubifs_wbuf_write_nolock()
>  ubifs: Fix to add refcount once page is set private
>  ubi: fastmap: Return error code if memory allocation fails in
>    add_aeb()
>  ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when
>    fm->used_blocks>=2
>  ubifs: ubifs_writepage: Mark page dirty after writing inode failed
>  ubifs: ubifs_releasepage: Remove ubifs_assert(0) to valid this process
>  ubi: fastmap: Fix high cpu usage of ubi_bgt by making sure wl_pool not
>    empty

Thanks a lot for all this fixes! I will start reviewing/picking them an
as soon the series is stable.
So please don't resend everything if you only add a new patch.

Thanks,
//richard

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 00/15] Some bugfixs for ubi/ubifs
  2021-12-27 10:13   ` Richard Weinberger
@ 2021-12-27 12:19     ` Zhihao Cheng
  -1 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27 12:19 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

在 2021/12/27 18:13, Richard Weinberger 写道:

> Thanks a lot for all this fixes! I will start reviewing/picking them an
> as soon the series is stable.
> So please don't resend everything if you only add a new patch.
OK. New patches(If I find new bugs) will be sent indepently. v6 series 
is closed.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 00/15] Some bugfixs for ubi/ubifs
@ 2021-12-27 12:19     ` Zhihao Cheng
  0 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2021-12-27 12:19 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

在 2021/12/27 18:13, Richard Weinberger 写道:

> Thanks a lot for all this fixes! I will start reviewing/picking them an
> as soon the series is stable.
> So please don't resend everything if you only add a new patch.
OK. New patches(If I find new bugs) will be sent indepently. v6 series 
is closed.

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 00/15] Some bugfixs for ubi/ubifs
  2021-12-27 12:19     ` Zhihao Cheng
@ 2021-12-27 13:00       ` Richard Weinberger
  -1 siblings, 0 replies; 76+ messages in thread
From: Richard Weinberger @ 2021-12-27 13:00 UTC (permalink / raw)
  To: chengzhihao1
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

----- Ursprüngliche Mail -----
> Von: "chengzhihao1" <chengzhihao1@huawei.com>
>> Thanks a lot for all this fixes! I will start reviewing/picking them an
>> as soon the series is stable.
>> So please don't resend everything if you only add a new patch.
> OK. New patches(If I find new bugs) will be sent indepently. v6 series
> is closed.

Thanks! Your fixes are most appreciated!

Thanks,
//richard

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 00/15] Some bugfixs for ubi/ubifs
@ 2021-12-27 13:00       ` Richard Weinberger
  0 siblings, 0 replies; 76+ messages in thread
From: Richard Weinberger @ 2021-12-27 13:00 UTC (permalink / raw)
  To: chengzhihao1
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

----- Ursprüngliche Mail -----
> Von: "chengzhihao1" <chengzhihao1@huawei.com>
>> Thanks a lot for all this fixes! I will start reviewing/picking them an
>> as soon the series is stable.
>> So please don't resend everything if you only add a new patch.
> OK. New patches(If I find new bugs) will be sent indepently. v6 series
> is closed.

Thanks! Your fixes are most appreciated!

Thanks,
//richard

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 05/15] ubifs: Rename whiteout atomically
  2021-12-27  3:22   ` Zhihao Cheng
@ 2022-01-09 21:14     ` Richard Weinberger
  -1 siblings, 0 replies; 76+ messages in thread
From: Richard Weinberger @ 2022-01-09 21:14 UTC (permalink / raw)
  To: chengzhihao1
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

----- Ursprüngliche Mail -----
> Von: "chengzhihao1" <chengzhihao1@huawei.com>
> An: "richard" <richard@nod.at>, "Miquel Raynal" <miquel.raynal@bootlin.com>, "Vignesh Raghavendra" <vigneshr@ti.com>,
> "mcoquelin stm32" <mcoquelin.stm32@gmail.com>, "kirill shutemov" <kirill.shutemov@linux.intel.com>, "Sascha Hauer"
> <s.hauer@pengutronix.de>
> CC: "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org>
> Gesendet: Montag, 27. Dezember 2021 04:22:36
> Betreff: [PATCH v6 05/15] ubifs: Rename whiteout atomically

> Currently, rename whiteout has 3 steps:
>  1. create tmpfile(which associates old dentry to tmpfile inode) for
>     whiteout, and store tmpfile to disk
>  2. link whiteout, associate whiteout inode to old dentry agagin and
>     store old dentry, old inode, new dentry on disk
>  3. writeback dirty whiteout inode to disk
> 
> Suddenly power-cut or error occurring(eg. ENOSPC returned by budget,
> memory allocation failure) during above steps may cause kinds of problems:
>  Problem 1: ENOSPC returned by whiteout space budget (before step 2),
>	     old dentry will disappear after rename syscall, whiteout file
>	     cannot be found either.
> 
>	     ls dir  // we get file, whiteout
>	     rename(dir/file, dir/whiteout, REANME_WHITEOUT)
>	     ENOSPC = ubifs_budget_space(&wht_req) // return
>	     ls dir  // empty (no file, no whiteout)
>  Problem 2: Power-cut happens before step 3, whiteout inode with 'nlink=1'
>	     is not stored on disk, whiteout dentry(old dentry) is written
>	     on disk, whiteout file is lost on next mount (We get "dead
>	     directory entry" after executing 'ls -l' on whiteout file).
> 
> Now, we use following 3 steps to finish rename whiteout:
>  1. create an in-mem inode with 'nlink = 1' as whiteout
>  2. ubifs_jnl_rename (Write on disk to finish associating old dentry to
>     whiteout inode, associating new dentry with old inode)
>  3. iput(whiteout)
> 
> Rely writing in-mem inode on disk by ubifs_jnl_rename() to finish rename
> whiteout, which avoids middle disk state caused by suddenly power-cut
> and error occurring.

How do you make sure the the whiteout is never written to disk (by writeback) before ubifs_jnl_rename() linked
it? That's the reason why other filesystems use the tmpfile mechanism for whiteouts too.

Thanks,
//richard

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 05/15] ubifs: Rename whiteout atomically
@ 2022-01-09 21:14     ` Richard Weinberger
  0 siblings, 0 replies; 76+ messages in thread
From: Richard Weinberger @ 2022-01-09 21:14 UTC (permalink / raw)
  To: chengzhihao1
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

----- Ursprüngliche Mail -----
> Von: "chengzhihao1" <chengzhihao1@huawei.com>
> An: "richard" <richard@nod.at>, "Miquel Raynal" <miquel.raynal@bootlin.com>, "Vignesh Raghavendra" <vigneshr@ti.com>,
> "mcoquelin stm32" <mcoquelin.stm32@gmail.com>, "kirill shutemov" <kirill.shutemov@linux.intel.com>, "Sascha Hauer"
> <s.hauer@pengutronix.de>
> CC: "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org>
> Gesendet: Montag, 27. Dezember 2021 04:22:36
> Betreff: [PATCH v6 05/15] ubifs: Rename whiteout atomically

> Currently, rename whiteout has 3 steps:
>  1. create tmpfile(which associates old dentry to tmpfile inode) for
>     whiteout, and store tmpfile to disk
>  2. link whiteout, associate whiteout inode to old dentry agagin and
>     store old dentry, old inode, new dentry on disk
>  3. writeback dirty whiteout inode to disk
> 
> Suddenly power-cut or error occurring(eg. ENOSPC returned by budget,
> memory allocation failure) during above steps may cause kinds of problems:
>  Problem 1: ENOSPC returned by whiteout space budget (before step 2),
>	     old dentry will disappear after rename syscall, whiteout file
>	     cannot be found either.
> 
>	     ls dir  // we get file, whiteout
>	     rename(dir/file, dir/whiteout, REANME_WHITEOUT)
>	     ENOSPC = ubifs_budget_space(&wht_req) // return
>	     ls dir  // empty (no file, no whiteout)
>  Problem 2: Power-cut happens before step 3, whiteout inode with 'nlink=1'
>	     is not stored on disk, whiteout dentry(old dentry) is written
>	     on disk, whiteout file is lost on next mount (We get "dead
>	     directory entry" after executing 'ls -l' on whiteout file).
> 
> Now, we use following 3 steps to finish rename whiteout:
>  1. create an in-mem inode with 'nlink = 1' as whiteout
>  2. ubifs_jnl_rename (Write on disk to finish associating old dentry to
>     whiteout inode, associating new dentry with old inode)
>  3. iput(whiteout)
> 
> Rely writing in-mem inode on disk by ubifs_jnl_rename() to finish rename
> whiteout, which avoids middle disk state caused by suddenly power-cut
> and error occurring.

How do you make sure the the whiteout is never written to disk (by writeback) before ubifs_jnl_rename() linked
it? That's the reason why other filesystems use the tmpfile mechanism for whiteouts too.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 05/15] ubifs: Rename whiteout atomically
  2022-01-09 21:14     ` Richard Weinberger
@ 2022-01-10  9:35       ` Zhihao Cheng
  -1 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2022-01-10  9:35 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

Hi, Richard
> 
> How do you make sure the the whiteout is never written to disk (by writeback) before ubifs_jnl_rename() linked
> it? That's the reason why other filesystems use the tmpfile mechanism for whiteouts too.
> 

The whiteout inode is clean after creation from create_whiteout(), and 
it can't be marked dirty until ubifs_jnl_rename() finished. So, I think 
there is no chance for whiteout being written on disk. Then, 
'ubifs_assert(c, !whiteout_ui->dirty)' never fails in ubifs_jnl_rename() 
during my local stress tests. You may add some delay executions after 
whiteout creation to make sure that whiteout won't be written back 
before ubifs_jnl_rename().

BTW, I considered plan B(Binding whiteout to a negative dentry), but we 
cannot get parent dir's entry in do_rename(), so I abandoned it.
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -1334,7 +1335,8 @@ static int do_rename(struct inode *old_dir, struct 
dentry *old_dentry,
                         goto out_release;
                 }

-               err = do_tmpfile(old_dir, old_dentry, S_IFCHR | 
WHITEOUT_MODE, &whiteout);
+               // tmp_dentry = d_alloc(old_dir_dentry, &slash_name); 
// refer to vfs_tmpfile
+               err = do_tmpfile(old_dir, tmp_dentry, S_IFCHR | 
WHITEOUT_MODE, &whiteout);
                 if (err) {
                         kfree(dev);
                         goto out_release;

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 05/15] ubifs: Rename whiteout atomically
@ 2022-01-10  9:35       ` Zhihao Cheng
  0 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2022-01-10  9:35 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

Hi, Richard
> 
> How do you make sure the the whiteout is never written to disk (by writeback) before ubifs_jnl_rename() linked
> it? That's the reason why other filesystems use the tmpfile mechanism for whiteouts too.
> 

The whiteout inode is clean after creation from create_whiteout(), and 
it can't be marked dirty until ubifs_jnl_rename() finished. So, I think 
there is no chance for whiteout being written on disk. Then, 
'ubifs_assert(c, !whiteout_ui->dirty)' never fails in ubifs_jnl_rename() 
during my local stress tests. You may add some delay executions after 
whiteout creation to make sure that whiteout won't be written back 
before ubifs_jnl_rename().

BTW, I considered plan B(Binding whiteout to a negative dentry), but we 
cannot get parent dir's entry in do_rename(), so I abandoned it.
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -1334,7 +1335,8 @@ static int do_rename(struct inode *old_dir, struct 
dentry *old_dentry,
                         goto out_release;
                 }

-               err = do_tmpfile(old_dir, old_dentry, S_IFCHR | 
WHITEOUT_MODE, &whiteout);
+               // tmp_dentry = d_alloc(old_dir_dentry, &slash_name); 
// refer to vfs_tmpfile
+               err = do_tmpfile(old_dir, tmp_dentry, S_IFCHR | 
WHITEOUT_MODE, &whiteout);
                 if (err) {
                         kfree(dev);
                         goto out_release;

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 05/15] ubifs: Rename whiteout atomically
  2022-01-10  9:35       ` Zhihao Cheng
@ 2022-01-10 10:14         ` Richard Weinberger
  -1 siblings, 0 replies; 76+ messages in thread
From: Richard Weinberger @ 2022-01-10 10:14 UTC (permalink / raw)
  To: chengzhihao1
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

----- Ursprüngliche Mail -----
> Von: "chengzhihao1" <chengzhihao1@huawei.com>
> An: "richard" <richard@nod.at>
> CC: "Miquel Raynal" <miquel.raynal@bootlin.com>, "Vignesh Raghavendra" <vigneshr@ti.com>, "mcoquelin stm32"
> <mcoquelin.stm32@gmail.com>, "kirill shutemov" <kirill.shutemov@linux.intel.com>, "Sascha Hauer"
> <s.hauer@pengutronix.de>, "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org>
> Gesendet: Montag, 10. Januar 2022 10:35:02
> Betreff: Re: [PATCH v6 05/15] ubifs: Rename whiteout atomically

> Hi, Richard
>> 
>> How do you make sure the the whiteout is never written to disk (by writeback)
>> before ubifs_jnl_rename() linked
>> it? That's the reason why other filesystems use the tmpfile mechanism for
>> whiteouts too.
>> 
> 
> The whiteout inode is clean after creation from create_whiteout(), and
> it can't be marked dirty until ubifs_jnl_rename() finished. So, I think
> there is no chance for whiteout being written on disk. Then,
> 'ubifs_assert(c, !whiteout_ui->dirty)' never fails in ubifs_jnl_rename()
> during my local stress tests. You may add some delay executions after
> whiteout creation to make sure that whiteout won't be written back
> before ubifs_jnl_rename().

From UBIFS point of view I fully agree with you. I'm just a little puzzled why
other filesystems use the tmpfile approach. My fear is that VFS can do things
to the inode we don't have in mind right now.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 05/15] ubifs: Rename whiteout atomically
@ 2022-01-10 10:14         ` Richard Weinberger
  0 siblings, 0 replies; 76+ messages in thread
From: Richard Weinberger @ 2022-01-10 10:14 UTC (permalink / raw)
  To: chengzhihao1
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

----- Ursprüngliche Mail -----
> Von: "chengzhihao1" <chengzhihao1@huawei.com>
> An: "richard" <richard@nod.at>
> CC: "Miquel Raynal" <miquel.raynal@bootlin.com>, "Vignesh Raghavendra" <vigneshr@ti.com>, "mcoquelin stm32"
> <mcoquelin.stm32@gmail.com>, "kirill shutemov" <kirill.shutemov@linux.intel.com>, "Sascha Hauer"
> <s.hauer@pengutronix.de>, "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org>
> Gesendet: Montag, 10. Januar 2022 10:35:02
> Betreff: Re: [PATCH v6 05/15] ubifs: Rename whiteout atomically

> Hi, Richard
>> 
>> How do you make sure the the whiteout is never written to disk (by writeback)
>> before ubifs_jnl_rename() linked
>> it? That's the reason why other filesystems use the tmpfile mechanism for
>> whiteouts too.
>> 
> 
> The whiteout inode is clean after creation from create_whiteout(), and
> it can't be marked dirty until ubifs_jnl_rename() finished. So, I think
> there is no chance for whiteout being written on disk. Then,
> 'ubifs_assert(c, !whiteout_ui->dirty)' never fails in ubifs_jnl_rename()
> during my local stress tests. You may add some delay executions after
> whiteout creation to make sure that whiteout won't be written back
> before ubifs_jnl_rename().

From UBIFS point of view I fully agree with you. I'm just a little puzzled why
other filesystems use the tmpfile approach. My fear is that VFS can do things
to the inode we don't have in mind right now.

Thanks,
//richard

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 05/15] ubifs: Rename whiteout atomically
  2022-01-10 10:14         ` Richard Weinberger
@ 2022-01-10 20:58           ` Richard Weinberger
  -1 siblings, 0 replies; 76+ messages in thread
From: Richard Weinberger @ 2022-01-10 20:58 UTC (permalink / raw)
  To: chengzhihao1
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

----- Ursprüngliche Mail -----
>> The whiteout inode is clean after creation from create_whiteout(), and
>> it can't be marked dirty until ubifs_jnl_rename() finished. So, I think
>> there is no chance for whiteout being written on disk. Then,
>> 'ubifs_assert(c, !whiteout_ui->dirty)' never fails in ubifs_jnl_rename()
>> during my local stress tests. You may add some delay executions after
>> whiteout creation to make sure that whiteout won't be written back
>> before ubifs_jnl_rename().
> 
> From UBIFS point of view I fully agree with you. I'm just a little puzzled why
> other filesystems use the tmpfile approach. My fear is that VFS can do things
> to the inode we don't have in mind right now.

After digging a bit into XFS I'm sure your approach is okay.
So, UBIFS can do a whiteout without help of tmpfiles. :-)

Thanks,
//richard

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 05/15] ubifs: Rename whiteout atomically
@ 2022-01-10 20:58           ` Richard Weinberger
  0 siblings, 0 replies; 76+ messages in thread
From: Richard Weinberger @ 2022-01-10 20:58 UTC (permalink / raw)
  To: chengzhihao1
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

----- Ursprüngliche Mail -----
>> The whiteout inode is clean after creation from create_whiteout(), and
>> it can't be marked dirty until ubifs_jnl_rename() finished. So, I think
>> there is no chance for whiteout being written on disk. Then,
>> 'ubifs_assert(c, !whiteout_ui->dirty)' never fails in ubifs_jnl_rename()
>> during my local stress tests. You may add some delay executions after
>> whiteout creation to make sure that whiteout won't be written back
>> before ubifs_jnl_rename().
> 
> From UBIFS point of view I fully agree with you. I'm just a little puzzled why
> other filesystems use the tmpfile approach. My fear is that VFS can do things
> to the inode we don't have in mind right now.

After digging a bit into XFS I'm sure your approach is okay.
So, UBIFS can do a whiteout without help of tmpfiles. :-)

Thanks,
//richard

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2
  2021-12-27  3:22   ` Zhihao Cheng
@ 2022-01-10 23:23     ` Richard Weinberger
  -1 siblings, 0 replies; 76+ messages in thread
From: Richard Weinberger @ 2022-01-10 23:23 UTC (permalink / raw)
  To: chengzhihao1
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

----- Ursprüngliche Mail -----
> Von: "chengzhihao1" <chengzhihao1@huawei.com>
> An: "richard" <richard@nod.at>, "Miquel Raynal" <miquel.raynal@bootlin.com>, "Vignesh Raghavendra" <vigneshr@ti.com>,
> "mcoquelin stm32" <mcoquelin.stm32@gmail.com>, "kirill shutemov" <kirill.shutemov@linux.intel.com>, "Sascha Hauer"
> <s.hauer@pengutronix.de>
> CC: "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org>
> Gesendet: Montag, 27. Dezember 2021 04:22:43
> Betreff: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

> Fastmap pebs(pnum >= UBI_FM_MAX_START) won't be added into 'ai->fastmap'
> while attaching ubi device if 'fm->used_blocks' is greater than 2, which
> may cause warning from 'ubi_assert(ubi->good_peb_count == found_pebs)':
> 
>  UBI assert failed in ubi_wl_init at 1878 (pid 2409)
>  Call Trace:
>    ubi_wl_init.cold+0xae/0x2af [ubi]
>    ubi_attach+0x1b0/0x780 [ubi]
>    ubi_init+0x23a/0x3ad [ubi]
>    load_module+0x22d2/0x2430
> 
> Reproduce:
>  ID="0x20,0x33,0x00,0x00" # 16M 16KB PEB, 512 page
>  modprobe nandsim id_bytes=$ID
>  modprobe ubi mtd="0,0" fm_autoconvert  # Fastmap takes 2 pebs
>  rmmod ubi
>  modprobe ubi mtd="0,0" fm_autoconvert  # Attach by fastmap
> 
> Add all used fastmap pebs into list 'ai->fastmap' to make sure they can
> be counted into 'found_pebs'.
> 
> Fixes: fdf10ed710c0aa ("ubi: Rework Fastmap attach base code")
> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
> ---
> drivers/mtd/ubi/fastmap.c | 35 +++++------------------------------
> 1 file changed, 5 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
> index 6b5f1ffd961b..01dcdd94c9d2 100644
> --- a/drivers/mtd/ubi/fastmap.c
> +++ b/drivers/mtd/ubi/fastmap.c
> /**
>  * ubi_scan_fastmap - scan the fastmap.
>  * @ubi: UBI device object
> @@ -865,7 +847,6 @@ int ubi_scan_fastmap(struct ubi_device *ubi, struct
> ubi_attach_info *ai,
> 	struct ubi_vid_hdr *vh;
> 	struct ubi_ec_hdr *ech;
> 	struct ubi_fastmap_layout *fm;
> -	struct ubi_ainf_peb *aeb;
> 	int i, used_blocks, pnum, fm_anchor, ret = 0;
> 	size_t fm_size;
> 	__be32 crc, tmp_crc;
> @@ -875,17 +856,6 @@ int ubi_scan_fastmap(struct ubi_device *ubi, struct
> ubi_attach_info *ai,
> 	if (fm_anchor < 0)
> 		return UBI_NO_FASTMAP;
> 
> -	/* Copy all (possible) fastmap blocks into our new attach structure. */
> -	list_for_each_entry(aeb, &scan_ai->fastmap, u.list) {
> -		struct ubi_ainf_peb *new;
> -
> -		new = clone_aeb(ai, aeb);
> -		if (!new)
> -			return -ENOMEM;
> -
> -		list_add(&new->u.list, &ai->fastmap);
> -	}
> -

scan_ai->fastmap may contain also old fastmap PEBs.
In the area < UBI_FM_MAX_START you can find outdated fastmap PEBs.
e.g. after power-cut.
That's why scan_ai->fastmap is copied into ai->fastmap.
Later in ubi_wl_init() these outdated PEBs will get erased.
So, you cannot remove this code.

But I fully agree with you that the fm->used_blocks > 1 case is not correct.
I fear if scan_ai->fastmap contains old fastmap PEBs and fm->used_blocks is > 1
we need to fall back to scanning mode while attaching.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2
@ 2022-01-10 23:23     ` Richard Weinberger
  0 siblings, 0 replies; 76+ messages in thread
From: Richard Weinberger @ 2022-01-10 23:23 UTC (permalink / raw)
  To: chengzhihao1
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

----- Ursprüngliche Mail -----
> Von: "chengzhihao1" <chengzhihao1@huawei.com>
> An: "richard" <richard@nod.at>, "Miquel Raynal" <miquel.raynal@bootlin.com>, "Vignesh Raghavendra" <vigneshr@ti.com>,
> "mcoquelin stm32" <mcoquelin.stm32@gmail.com>, "kirill shutemov" <kirill.shutemov@linux.intel.com>, "Sascha Hauer"
> <s.hauer@pengutronix.de>
> CC: "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org>
> Gesendet: Montag, 27. Dezember 2021 04:22:43
> Betreff: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

> Fastmap pebs(pnum >= UBI_FM_MAX_START) won't be added into 'ai->fastmap'
> while attaching ubi device if 'fm->used_blocks' is greater than 2, which
> may cause warning from 'ubi_assert(ubi->good_peb_count == found_pebs)':
> 
>  UBI assert failed in ubi_wl_init at 1878 (pid 2409)
>  Call Trace:
>    ubi_wl_init.cold+0xae/0x2af [ubi]
>    ubi_attach+0x1b0/0x780 [ubi]
>    ubi_init+0x23a/0x3ad [ubi]
>    load_module+0x22d2/0x2430
> 
> Reproduce:
>  ID="0x20,0x33,0x00,0x00" # 16M 16KB PEB, 512 page
>  modprobe nandsim id_bytes=$ID
>  modprobe ubi mtd="0,0" fm_autoconvert  # Fastmap takes 2 pebs
>  rmmod ubi
>  modprobe ubi mtd="0,0" fm_autoconvert  # Attach by fastmap
> 
> Add all used fastmap pebs into list 'ai->fastmap' to make sure they can
> be counted into 'found_pebs'.
> 
> Fixes: fdf10ed710c0aa ("ubi: Rework Fastmap attach base code")
> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
> ---
> drivers/mtd/ubi/fastmap.c | 35 +++++------------------------------
> 1 file changed, 5 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
> index 6b5f1ffd961b..01dcdd94c9d2 100644
> --- a/drivers/mtd/ubi/fastmap.c
> +++ b/drivers/mtd/ubi/fastmap.c
> /**
>  * ubi_scan_fastmap - scan the fastmap.
>  * @ubi: UBI device object
> @@ -865,7 +847,6 @@ int ubi_scan_fastmap(struct ubi_device *ubi, struct
> ubi_attach_info *ai,
> 	struct ubi_vid_hdr *vh;
> 	struct ubi_ec_hdr *ech;
> 	struct ubi_fastmap_layout *fm;
> -	struct ubi_ainf_peb *aeb;
> 	int i, used_blocks, pnum, fm_anchor, ret = 0;
> 	size_t fm_size;
> 	__be32 crc, tmp_crc;
> @@ -875,17 +856,6 @@ int ubi_scan_fastmap(struct ubi_device *ubi, struct
> ubi_attach_info *ai,
> 	if (fm_anchor < 0)
> 		return UBI_NO_FASTMAP;
> 
> -	/* Copy all (possible) fastmap blocks into our new attach structure. */
> -	list_for_each_entry(aeb, &scan_ai->fastmap, u.list) {
> -		struct ubi_ainf_peb *new;
> -
> -		new = clone_aeb(ai, aeb);
> -		if (!new)
> -			return -ENOMEM;
> -
> -		list_add(&new->u.list, &ai->fastmap);
> -	}
> -

scan_ai->fastmap may contain also old fastmap PEBs.
In the area < UBI_FM_MAX_START you can find outdated fastmap PEBs.
e.g. after power-cut.
That's why scan_ai->fastmap is copied into ai->fastmap.
Later in ubi_wl_init() these outdated PEBs will get erased.
So, you cannot remove this code.

But I fully agree with you that the fm->used_blocks > 1 case is not correct.
I fear if scan_ai->fastmap contains old fastmap PEBs and fm->used_blocks is > 1
we need to fall back to scanning mode while attaching.

Thanks,
//richard

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2
  2022-01-10 23:23     ` Richard Weinberger
@ 2022-01-11  2:48       ` Zhihao Cheng
  -1 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2022-01-11  2:48 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

Hi Richard,
> scan_ai->fastmap may contain also old fastmap PEBs.
> In the area < UBI_FM_MAX_START you can find outdated fastmap PEBs.
> e.g. after power-cut.
> That's why scan_ai->fastmap is copied into ai->fastmap.
> Later in ubi_wl_init() these outdated PEBs will get erased.
> So, you cannot remove this code.
I thought old fastmap PEBs(async erase works in ubi_update_fastmap()) 
will be counted into erase PEBs in the next attaching process, because I 
saw following code snippet in ubi_write_fastmap():
1260         list_for_each_entry(ubi_wrk, &ubi->works, list) { 

1261                 if (ubi_is_erase_work(ubi_wrk)) { 

1262                         wl_e = ubi_wrk->e; 

1263                         ubi_assert(wl_e); 

1264 

1265                         fec = (struct ubi_fm_ec *)(fm_raw + 
fm_pos);
1266 

1267                         fec->pnum = cpu_to_be32(wl_e->pnum); 

1268                         set_seen(ubi, wl_e->pnum, seen_pebs); 

1269                         fec->ec = cpu_to_be32(wl_e->ec); 

1270 

1271                         erase_peb_count++; 

1272                         fm_pos += sizeof(*fec); 

1273                         ubi_assert(fm_pos <= ubi->fm_size); 

1274                 } 

1275         } 

1276         fmh->erase_peb_count = cpu_to_be32(erase_peb_count);
Half-writing on fastmap will be recognized in scanning, and UBI 
fallbacks full scanning, So, I come up with two situations:
1. power-cut before new fastmap written, the old fastmap is completely 
saved until next attaching, and some free PEBs are written with new 
fastmap data. Luckly, fastmap anchor PEB's vid header is written first 
of all, bad fastmap will be returned by ubi_attach_fastmap() in next 
attaching.
2. power-cut after new fastmap written, the old fastmap PEBs will be 
added into 'ai->erase' list in next attaching.
Did I miss other possible circumstances?
> 
> But I fully agree with you that the fm->used_blocks > 1 case is not correct.
> I fear if scan_ai->fastmap contains old fastmap PEBs and fm->used_blocks is > 1
> we need to fall back to scanning mode while attaching.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2
@ 2022-01-11  2:48       ` Zhihao Cheng
  0 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2022-01-11  2:48 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

Hi Richard,
> scan_ai->fastmap may contain also old fastmap PEBs.
> In the area < UBI_FM_MAX_START you can find outdated fastmap PEBs.
> e.g. after power-cut.
> That's why scan_ai->fastmap is copied into ai->fastmap.
> Later in ubi_wl_init() these outdated PEBs will get erased.
> So, you cannot remove this code.
I thought old fastmap PEBs(async erase works in ubi_update_fastmap()) 
will be counted into erase PEBs in the next attaching process, because I 
saw following code snippet in ubi_write_fastmap():
1260         list_for_each_entry(ubi_wrk, &ubi->works, list) { 

1261                 if (ubi_is_erase_work(ubi_wrk)) { 

1262                         wl_e = ubi_wrk->e; 

1263                         ubi_assert(wl_e); 

1264 

1265                         fec = (struct ubi_fm_ec *)(fm_raw + 
fm_pos);
1266 

1267                         fec->pnum = cpu_to_be32(wl_e->pnum); 

1268                         set_seen(ubi, wl_e->pnum, seen_pebs); 

1269                         fec->ec = cpu_to_be32(wl_e->ec); 

1270 

1271                         erase_peb_count++; 

1272                         fm_pos += sizeof(*fec); 

1273                         ubi_assert(fm_pos <= ubi->fm_size); 

1274                 } 

1275         } 

1276         fmh->erase_peb_count = cpu_to_be32(erase_peb_count);
Half-writing on fastmap will be recognized in scanning, and UBI 
fallbacks full scanning, So, I come up with two situations:
1. power-cut before new fastmap written, the old fastmap is completely 
saved until next attaching, and some free PEBs are written with new 
fastmap data. Luckly, fastmap anchor PEB's vid header is written first 
of all, bad fastmap will be returned by ubi_attach_fastmap() in next 
attaching.
2. power-cut after new fastmap written, the old fastmap PEBs will be 
added into 'ai->erase' list in next attaching.
Did I miss other possible circumstances?
> 
> But I fully agree with you that the fm->used_blocks > 1 case is not correct.
> I fear if scan_ai->fastmap contains old fastmap PEBs and fm->used_blocks is > 1
> we need to fall back to scanning mode while attaching.

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2
  2022-01-11  2:48       ` Zhihao Cheng
@ 2022-01-11  7:27         ` Richard Weinberger
  -1 siblings, 0 replies; 76+ messages in thread
From: Richard Weinberger @ 2022-01-11  7:27 UTC (permalink / raw)
  To: chengzhihao1
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

----- Ursprüngliche Mail -----
> Von: "chengzhihao1" <chengzhihao1@huawei.com>
> An: "richard" <richard@nod.at>
> CC: "Miquel Raynal" <miquel.raynal@bootlin.com>, "Vignesh Raghavendra" <vigneshr@ti.com>, "mcoquelin stm32"
> <mcoquelin.stm32@gmail.com>, "kirill shutemov" <kirill.shutemov@linux.intel.com>, "Sascha Hauer"
> <s.hauer@pengutronix.de>, "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org>
> Gesendet: Dienstag, 11. Januar 2022 03:48:24
> Betreff: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

> Hi Richard,
>> scan_ai->fastmap may contain also old fastmap PEBs.
>> In the area < UBI_FM_MAX_START you can find outdated fastmap PEBs.
>> e.g. after power-cut.
>> That's why scan_ai->fastmap is copied into ai->fastmap.
>> Later in ubi_wl_init() these outdated PEBs will get erased.
>> So, you cannot remove this code.
> I thought old fastmap PEBs(async erase works in ubi_update_fastmap())
> will be counted into erase PEBs in the next attaching process, because I
> saw following code snippet in ubi_write_fastmap():
> 1260         list_for_each_entry(ubi_wrk, &ubi->works, list) {
> 
> 1261                 if (ubi_is_erase_work(ubi_wrk)) {
> 
> 1262                         wl_e = ubi_wrk->e;
> 
> 1263                         ubi_assert(wl_e);
> 
> 1264
> 
> 1265                         fec = (struct ubi_fm_ec *)(fm_raw +
> fm_pos);
> 1266
> 
> 1267                         fec->pnum = cpu_to_be32(wl_e->pnum);
> 
> 1268                         set_seen(ubi, wl_e->pnum, seen_pebs);
> 
> 1269                         fec->ec = cpu_to_be32(wl_e->ec);
> 
> 1270
> 
> 1271                         erase_peb_count++;
> 
> 1272                         fm_pos += sizeof(*fec);
> 
> 1273                         ubi_assert(fm_pos <= ubi->fm_size);
> 
> 1274                 }
> 
> 1275         }
> 
> 1276         fmh->erase_peb_count = cpu_to_be32(erase_peb_count);
> Half-writing on fastmap will be recognized in scanning, and UBI
> fallbacks full scanning, So, I come up with two situations:
> 1. power-cut before new fastmap written, the old fastmap is completely
> saved until next attaching, and some free PEBs are written with new
> fastmap data. Luckly, fastmap anchor PEB's vid header is written first
> of all, bad fastmap will be returned by ubi_attach_fastmap() in next
> attaching.
> 2. power-cut after new fastmap written, the old fastmap PEBs will be
> added into 'ai->erase' list in next attaching.
> Did I miss other possible circumstances?

In ubi_wl_init() there is another corner case documented:
                        /*
                         * The fastmap update code might not find a free PEB for
                         * writing the fastmap anchor to and then reuses the
                         * current fastmap anchor PEB. When this PEB gets erased
                         * and a power cut happens before it is written again we
                         * must make sure that the fastmap attach code doesn't
                         * find any outdated fastmap anchors, hence we erase the
                         * outdated fastmap anchor PEBs synchronously here.
                         */
                        if (aeb->vol_id == UBI_FM_SB_VOLUME_ID)
                                sync = true;

So ubi_wl_init() makes sure that all old fastmap anchors get erased before UBI
starts to operate. With your change this is no longer satisfied.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2
@ 2022-01-11  7:27         ` Richard Weinberger
  0 siblings, 0 replies; 76+ messages in thread
From: Richard Weinberger @ 2022-01-11  7:27 UTC (permalink / raw)
  To: chengzhihao1
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

----- Ursprüngliche Mail -----
> Von: "chengzhihao1" <chengzhihao1@huawei.com>
> An: "richard" <richard@nod.at>
> CC: "Miquel Raynal" <miquel.raynal@bootlin.com>, "Vignesh Raghavendra" <vigneshr@ti.com>, "mcoquelin stm32"
> <mcoquelin.stm32@gmail.com>, "kirill shutemov" <kirill.shutemov@linux.intel.com>, "Sascha Hauer"
> <s.hauer@pengutronix.de>, "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org>
> Gesendet: Dienstag, 11. Januar 2022 03:48:24
> Betreff: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

> Hi Richard,
>> scan_ai->fastmap may contain also old fastmap PEBs.
>> In the area < UBI_FM_MAX_START you can find outdated fastmap PEBs.
>> e.g. after power-cut.
>> That's why scan_ai->fastmap is copied into ai->fastmap.
>> Later in ubi_wl_init() these outdated PEBs will get erased.
>> So, you cannot remove this code.
> I thought old fastmap PEBs(async erase works in ubi_update_fastmap())
> will be counted into erase PEBs in the next attaching process, because I
> saw following code snippet in ubi_write_fastmap():
> 1260         list_for_each_entry(ubi_wrk, &ubi->works, list) {
> 
> 1261                 if (ubi_is_erase_work(ubi_wrk)) {
> 
> 1262                         wl_e = ubi_wrk->e;
> 
> 1263                         ubi_assert(wl_e);
> 
> 1264
> 
> 1265                         fec = (struct ubi_fm_ec *)(fm_raw +
> fm_pos);
> 1266
> 
> 1267                         fec->pnum = cpu_to_be32(wl_e->pnum);
> 
> 1268                         set_seen(ubi, wl_e->pnum, seen_pebs);
> 
> 1269                         fec->ec = cpu_to_be32(wl_e->ec);
> 
> 1270
> 
> 1271                         erase_peb_count++;
> 
> 1272                         fm_pos += sizeof(*fec);
> 
> 1273                         ubi_assert(fm_pos <= ubi->fm_size);
> 
> 1274                 }
> 
> 1275         }
> 
> 1276         fmh->erase_peb_count = cpu_to_be32(erase_peb_count);
> Half-writing on fastmap will be recognized in scanning, and UBI
> fallbacks full scanning, So, I come up with two situations:
> 1. power-cut before new fastmap written, the old fastmap is completely
> saved until next attaching, and some free PEBs are written with new
> fastmap data. Luckly, fastmap anchor PEB's vid header is written first
> of all, bad fastmap will be returned by ubi_attach_fastmap() in next
> attaching.
> 2. power-cut after new fastmap written, the old fastmap PEBs will be
> added into 'ai->erase' list in next attaching.
> Did I miss other possible circumstances?

In ubi_wl_init() there is another corner case documented:
                        /*
                         * The fastmap update code might not find a free PEB for
                         * writing the fastmap anchor to and then reuses the
                         * current fastmap anchor PEB. When this PEB gets erased
                         * and a power cut happens before it is written again we
                         * must make sure that the fastmap attach code doesn't
                         * find any outdated fastmap anchors, hence we erase the
                         * outdated fastmap anchor PEBs synchronously here.
                         */
                        if (aeb->vol_id == UBI_FM_SB_VOLUME_ID)
                                sync = true;

So ubi_wl_init() makes sure that all old fastmap anchors get erased before UBI
starts to operate. With your change this is no longer satisfied.

Thanks,
//richard

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2
  2022-01-11  7:27         ` Richard Weinberger
@ 2022-01-11 11:44           ` Zhihao Cheng
  -1 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2022-01-11 11:44 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

Hi Richard,
> In ubi_wl_init() there is another corner case documented:
>                          /*
>                           * The fastmap update code might not find a free PEB for
>                           * writing the fastmap anchor to and then reuses the
>                           * current fastmap anchor PEB. When this PEB gets erased
>                           * and a power cut happens before it is written again we
>                           * must make sure that the fastmap attach code doesn't
>                           * find any outdated fastmap anchors, hence we erase the
>                           * outdated fastmap anchor PEBs synchronously here.
>                           */
>                          if (aeb->vol_id == UBI_FM_SB_VOLUME_ID)
>                                  sync = true;
> 
> So ubi_wl_init() makes sure that all old fastmap anchors get erased before UBI
> starts to operate. With your change this is no longer satisfied
I seem to understand the another case. But I'm still confused that why 
outdated fastmap PEBs cannot be erased. When UBI comes to this point, it 
means UBI is attached by **full scanning mode**, scan_fast() returns two 
values:
   UBI_NO_FASTMAP: scan all pebs from pnum UBI_FM_MAX_START, ai is 
assigned with scan_ai, at last, all fastmap pebs are added into 
'ai->fastmap'
   UBI_BAD_FASTMAP: scan all pebs from pnum 0, all fastmap pebs are 
added into 'ai->fastmap'

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2
@ 2022-01-11 11:44           ` Zhihao Cheng
  0 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2022-01-11 11:44 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

Hi Richard,
> In ubi_wl_init() there is another corner case documented:
>                          /*
>                           * The fastmap update code might not find a free PEB for
>                           * writing the fastmap anchor to and then reuses the
>                           * current fastmap anchor PEB. When this PEB gets erased
>                           * and a power cut happens before it is written again we
>                           * must make sure that the fastmap attach code doesn't
>                           * find any outdated fastmap anchors, hence we erase the
>                           * outdated fastmap anchor PEBs synchronously here.
>                           */
>                          if (aeb->vol_id == UBI_FM_SB_VOLUME_ID)
>                                  sync = true;
> 
> So ubi_wl_init() makes sure that all old fastmap anchors get erased before UBI
> starts to operate. With your change this is no longer satisfied
I seem to understand the another case. But I'm still confused that why 
outdated fastmap PEBs cannot be erased. When UBI comes to this point, it 
means UBI is attached by **full scanning mode**, scan_fast() returns two 
values:
   UBI_NO_FASTMAP: scan all pebs from pnum UBI_FM_MAX_START, ai is 
assigned with scan_ai, at last, all fastmap pebs are added into 
'ai->fastmap'
   UBI_BAD_FASTMAP: scan all pebs from pnum 0, all fastmap pebs are 
added into 'ai->fastmap'

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2
  2022-01-11 11:44           ` Zhihao Cheng
@ 2022-01-11 11:57             ` Richard Weinberger
  -1 siblings, 0 replies; 76+ messages in thread
From: Richard Weinberger @ 2022-01-11 11:57 UTC (permalink / raw)
  To: chengzhihao1
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

----- Ursprüngliche Mail -----
> Von: "chengzhihao1" <chengzhihao1@huawei.com>
> An: "richard" <richard@nod.at>
> CC: "Miquel Raynal" <miquel.raynal@bootlin.com>, "Vignesh Raghavendra" <vigneshr@ti.com>, "mcoquelin stm32"
> <mcoquelin.stm32@gmail.com>, "kirill shutemov" <kirill.shutemov@linux.intel.com>, "Sascha Hauer"
> <s.hauer@pengutronix.de>, "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org>
> Gesendet: Dienstag, 11. Januar 2022 12:44:49
> Betreff: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

> Hi Richard,
>> In ubi_wl_init() there is another corner case documented:
>>                          /*
>>                           * The fastmap update code might not find a free PEB for
>>                           * writing the fastmap anchor to and then reuses the
>>                           * current fastmap anchor PEB. When this PEB gets erased
>>                           * and a power cut happens before it is written again we
>>                           * must make sure that the fastmap attach code doesn't
>>                           * find any outdated fastmap anchors, hence we erase the
>>                           * outdated fastmap anchor PEBs synchronously here.
>>                           */
>>                          if (aeb->vol_id == UBI_FM_SB_VOLUME_ID)
>>                                  sync = true;
>> 
>> So ubi_wl_init() makes sure that all old fastmap anchors get erased before UBI
>> starts to operate. With your change this is no longer satisfied
> I seem to understand the another case. But I'm still confused that why
> outdated fastmap PEBs cannot be erased. When UBI comes to this point, it
> means UBI is attached by **full scanning mode**, scan_fast() returns two
> values:
>   UBI_NO_FASTMAP: scan all pebs from pnum UBI_FM_MAX_START, ai is
> assigned with scan_ai, at last, all fastmap pebs are added into
> 'ai->fastmap'
>   UBI_BAD_FASTMAP: scan all pebs from pnum 0, all fastmap pebs are
> added into 'ai->fastmap'

I fear, I'm unable to follow your thoughts.
ubi_wl_init() is called in both cases, with and without fastmap.
And ai->fastmap contains all anchor PEBs that scan_fast() found.
This can be the most recent but also outdated anchor PEBs.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2
@ 2022-01-11 11:57             ` Richard Weinberger
  0 siblings, 0 replies; 76+ messages in thread
From: Richard Weinberger @ 2022-01-11 11:57 UTC (permalink / raw)
  To: chengzhihao1
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

----- Ursprüngliche Mail -----
> Von: "chengzhihao1" <chengzhihao1@huawei.com>
> An: "richard" <richard@nod.at>
> CC: "Miquel Raynal" <miquel.raynal@bootlin.com>, "Vignesh Raghavendra" <vigneshr@ti.com>, "mcoquelin stm32"
> <mcoquelin.stm32@gmail.com>, "kirill shutemov" <kirill.shutemov@linux.intel.com>, "Sascha Hauer"
> <s.hauer@pengutronix.de>, "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org>
> Gesendet: Dienstag, 11. Januar 2022 12:44:49
> Betreff: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

> Hi Richard,
>> In ubi_wl_init() there is another corner case documented:
>>                          /*
>>                           * The fastmap update code might not find a free PEB for
>>                           * writing the fastmap anchor to and then reuses the
>>                           * current fastmap anchor PEB. When this PEB gets erased
>>                           * and a power cut happens before it is written again we
>>                           * must make sure that the fastmap attach code doesn't
>>                           * find any outdated fastmap anchors, hence we erase the
>>                           * outdated fastmap anchor PEBs synchronously here.
>>                           */
>>                          if (aeb->vol_id == UBI_FM_SB_VOLUME_ID)
>>                                  sync = true;
>> 
>> So ubi_wl_init() makes sure that all old fastmap anchors get erased before UBI
>> starts to operate. With your change this is no longer satisfied
> I seem to understand the another case. But I'm still confused that why
> outdated fastmap PEBs cannot be erased. When UBI comes to this point, it
> means UBI is attached by **full scanning mode**, scan_fast() returns two
> values:
>   UBI_NO_FASTMAP: scan all pebs from pnum UBI_FM_MAX_START, ai is
> assigned with scan_ai, at last, all fastmap pebs are added into
> 'ai->fastmap'
>   UBI_BAD_FASTMAP: scan all pebs from pnum 0, all fastmap pebs are
> added into 'ai->fastmap'

I fear, I'm unable to follow your thoughts.
ubi_wl_init() is called in both cases, with and without fastmap.
And ai->fastmap contains all anchor PEBs that scan_fast() found.
This can be the most recent but also outdated anchor PEBs.

Thanks,
//richard

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2
  2022-01-11 11:57             ` Richard Weinberger
@ 2022-01-11 13:23               ` Zhihao Cheng
  -1 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2022-01-11 13:23 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

在 2022/1/11 19:57, Richard Weinberger 写道:
> ubi_wl_init() is called in both cases, with and without fastmap.
I agree.

> And ai->fastmap contains all anchor PEBs that scan_fast() found.
> This can be the most recent but also outdated anchor PEBs.
Is it exists a case that outdated fastmap PEBs are neither counted into 
'fmhdr->erase_peb_count' nor scanned into 'ai->fastmap' after attaching 
by fastmap.

1853                         if (ubi->lookuptbl[aeb->pnum]) 

1854                                 continue; 

1855 

1856                         /* 

1857                          * The fastmap update code might not find a 
free PEB for
1858                          * writing the fastmap anchor to and then 
reuses the
1859                          * current fastmap anchor PEB. When this 
PEB gets erased
1860                          * and a power cut happens before it is 
written again we
1861                          * must make sure that the fastmap attach 
code doesn't
1862                          * find any outdated fastmap anchors, hence 
we erase the
1863                          * outdated fastmap anchor PEBs 
synchronously here.
1864                          */ 

1865                         if (aeb->vol_id == UBI_FM_SB_VOLUME_ID) 

1866                                 sync = true;

I think UBI attaches failed by fastmap if kernel goes here.
1870                         err = erase_aeb(ubi, aeb, sync);

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2
@ 2022-01-11 13:23               ` Zhihao Cheng
  0 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2022-01-11 13:23 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

在 2022/1/11 19:57, Richard Weinberger 写道:
> ubi_wl_init() is called in both cases, with and without fastmap.
I agree.

> And ai->fastmap contains all anchor PEBs that scan_fast() found.
> This can be the most recent but also outdated anchor PEBs.
Is it exists a case that outdated fastmap PEBs are neither counted into 
'fmhdr->erase_peb_count' nor scanned into 'ai->fastmap' after attaching 
by fastmap.

1853                         if (ubi->lookuptbl[aeb->pnum]) 

1854                                 continue; 

1855 

1856                         /* 

1857                          * The fastmap update code might not find a 
free PEB for
1858                          * writing the fastmap anchor to and then 
reuses the
1859                          * current fastmap anchor PEB. When this 
PEB gets erased
1860                          * and a power cut happens before it is 
written again we
1861                          * must make sure that the fastmap attach 
code doesn't
1862                          * find any outdated fastmap anchors, hence 
we erase the
1863                          * outdated fastmap anchor PEBs 
synchronously here.
1864                          */ 

1865                         if (aeb->vol_id == UBI_FM_SB_VOLUME_ID) 

1866                                 sync = true;

I think UBI attaches failed by fastmap if kernel goes here.
1870                         err = erase_aeb(ubi, aeb, sync);

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2
  2022-01-11 13:23               ` Zhihao Cheng
@ 2022-01-11 13:56                 ` Richard Weinberger
  -1 siblings, 0 replies; 76+ messages in thread
From: Richard Weinberger @ 2022-01-11 13:56 UTC (permalink / raw)
  To: chengzhihao1
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

----- Ursprüngliche Mail -----
> Von: "chengzhihao1" <chengzhihao1@huawei.com>
> An: "richard" <richard@nod.at>
> CC: "Miquel Raynal" <miquel.raynal@bootlin.com>, "Vignesh Raghavendra" <vigneshr@ti.com>, "mcoquelin stm32"
> <mcoquelin.stm32@gmail.com>, "kirill shutemov" <kirill.shutemov@linux.intel.com>, "Sascha Hauer"
> <s.hauer@pengutronix.de>, "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org>
> Gesendet: Dienstag, 11. Januar 2022 14:23:52
> Betreff: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

> 在 2022/1/11 19:57, Richard Weinberger 写道:
>> ubi_wl_init() is called in both cases, with and without fastmap.
> I agree.
> 
>> And ai->fastmap contains all anchor PEBs that scan_fast() found.
>> This can be the most recent but also outdated anchor PEBs.
> Is it exists a case that outdated fastmap PEBs are neither counted into
> 'fmhdr->erase_peb_count' nor scanned into 'ai->fastmap' after attaching
> by fastmap.
> 

[...]
 
> I think UBI attaches failed by fastmap if kernel goes here.
> 1870                         err = erase_aeb(ubi, aeb, sync);

Hmm, I think the paranoia check in fastmap.c is too strict these days.
        if (WARN_ON(count_fastmap_pebs(ai) != ubi->peb_count -
                    ai->bad_peb_count - fm->used_blocks))
                goto fail_bad;

It does not account ai->fastmap. So if ai->fastmap contains old anchor PEBs
this check will trigger and force falling back to scanning mode.
With this check fixed, ubi_wl_init() will erase all old PEBs from ai->fastmap.

So I agree that this code path is wonky and can be cleaned up. But please be
extremely careful and give all your changes excessive testing with real workload
and power-cuts.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2
@ 2022-01-11 13:56                 ` Richard Weinberger
  0 siblings, 0 replies; 76+ messages in thread
From: Richard Weinberger @ 2022-01-11 13:56 UTC (permalink / raw)
  To: chengzhihao1
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

----- Ursprüngliche Mail -----
> Von: "chengzhihao1" <chengzhihao1@huawei.com>
> An: "richard" <richard@nod.at>
> CC: "Miquel Raynal" <miquel.raynal@bootlin.com>, "Vignesh Raghavendra" <vigneshr@ti.com>, "mcoquelin stm32"
> <mcoquelin.stm32@gmail.com>, "kirill shutemov" <kirill.shutemov@linux.intel.com>, "Sascha Hauer"
> <s.hauer@pengutronix.de>, "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org>
> Gesendet: Dienstag, 11. Januar 2022 14:23:52
> Betreff: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

> 在 2022/1/11 19:57, Richard Weinberger 写道:
>> ubi_wl_init() is called in both cases, with and without fastmap.
> I agree.
> 
>> And ai->fastmap contains all anchor PEBs that scan_fast() found.
>> This can be the most recent but also outdated anchor PEBs.
> Is it exists a case that outdated fastmap PEBs are neither counted into
> 'fmhdr->erase_peb_count' nor scanned into 'ai->fastmap' after attaching
> by fastmap.
> 

[...]
 
> I think UBI attaches failed by fastmap if kernel goes here.
> 1870                         err = erase_aeb(ubi, aeb, sync);

Hmm, I think the paranoia check in fastmap.c is too strict these days.
        if (WARN_ON(count_fastmap_pebs(ai) != ubi->peb_count -
                    ai->bad_peb_count - fm->used_blocks))
                goto fail_bad;

It does not account ai->fastmap. So if ai->fastmap contains old anchor PEBs
this check will trigger and force falling back to scanning mode.
With this check fixed, ubi_wl_init() will erase all old PEBs from ai->fastmap.

So I agree that this code path is wonky and can be cleaned up. But please be
extremely careful and give all your changes excessive testing with real workload
and power-cuts.

Thanks,
//richard

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2
  2022-01-11 13:56                 ` Richard Weinberger
@ 2022-01-12  3:46                   ` Zhihao Cheng
  -1 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2022-01-12  3:46 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

>>> ubi_wl_init() is called in both cases, with and without fastmap.
>> I agree.
>>
>>> And ai->fastmap contains all anchor PEBs that scan_fast() found.
>>> This can be the most recent but also outdated anchor PEBs.
>> Is it exists a case that outdated fastmap PEBs are neither counted into
>> 'fmhdr->erase_peb_count' nor scanned into 'ai->fastmap' after attaching
>> by fastmap.
>>
> 
> [...]
>   
>> I think UBI attaches failed by fastmap if kernel goes here.
>> 1870                         err = erase_aeb(ubi, aeb, sync);
> 
> Hmm, I think the paranoia check in fastmap.c is too strict these days.
>          if (WARN_ON(count_fastmap_pebs(ai) != ubi->peb_count -
>                      ai->bad_peb_count - fm->used_blocks))
>                  goto fail_bad;
> 
> It does not account ai->fastmap. So if ai->fastmap contains old anchor PEBs
> this check will trigger and force falling back to scanning mode.
> With this check fixed, ubi_wl_init() will erase all old PEBs from ai->fastmap.
Forgive my stubbornness, I think this strict check is good, could you 
show me a process to trigger this WARN_ON, it would be nice to provide a 
reproducer.
I still insist the point(after my fix patch applied): All outdated 
fastmap PEBs are added into 'ai->fastmap'(full scanning case) or counted 
into 'fmhdr->erase_peb_count'(fast attached case).
> 
> So I agree that this code path is wonky and can be cleaned up. But please be
> extremely careful and give all your changes excessive testing with real workload
> and power-cuts.Let's list all power-cut cases during fastmap updateing(I tried to 
simulate some of them on nandsim), in theory, I think the WARN_ON check 
is okay and all outdated fastmap PEBs can be erased in next attaching:

ubi_update_fastmap:
1565         for (i = 1; i < new_fm->used_blocks; i++) {
1570                 if (!tmp_e) {
1571                         if (old_fm && old_fm->e[i]) {
                              /* sync erase old fm data PEBs */
power-cut!!!
1585                         } else {
                              /* async erase old fm data PEBs */
power-cut!!!
1595                         }
1596                 } else {
1599                         if (old_fm && old_fm->e[i]) {
			     /* async erase old fm data PEBs */
power-cut!!!
1603                         }
1604                 }
1605         }
1621         if (old_fm) {
1623                 if (!tmp_e) {
  		     /* sync erase old fm anchor PEB */
power-cut!!!
1638                 } else {
		     /* async erase old fm anchor PEB */
power-cut!!!
1644                 }
1645         } else {
1658         }
1660         ret = ubi_write_fastmap(ubi, new_fm);

ubi_write_fastmap:
1324         ret = ubi_io_write_vid_hdr(ubi, new_fm->e[0]->pnum, avbuf);
power-cut!!!
1345         for (i = 1; i < new_fm->used_blocks; i++) {
1350                 ret = ubi_io_write_vid_hdr(...);
power-cut!!!
1356         }

1358         for (i = 0; i < new_fm->used_blocks; i++) {
1359                 ret = ubi_io_write_data(...),
power-cut!!!
1366         }

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2
@ 2022-01-12  3:46                   ` Zhihao Cheng
  0 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2022-01-12  3:46 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

>>> ubi_wl_init() is called in both cases, with and without fastmap.
>> I agree.
>>
>>> And ai->fastmap contains all anchor PEBs that scan_fast() found.
>>> This can be the most recent but also outdated anchor PEBs.
>> Is it exists a case that outdated fastmap PEBs are neither counted into
>> 'fmhdr->erase_peb_count' nor scanned into 'ai->fastmap' after attaching
>> by fastmap.
>>
> 
> [...]
>   
>> I think UBI attaches failed by fastmap if kernel goes here.
>> 1870                         err = erase_aeb(ubi, aeb, sync);
> 
> Hmm, I think the paranoia check in fastmap.c is too strict these days.
>          if (WARN_ON(count_fastmap_pebs(ai) != ubi->peb_count -
>                      ai->bad_peb_count - fm->used_blocks))
>                  goto fail_bad;
> 
> It does not account ai->fastmap. So if ai->fastmap contains old anchor PEBs
> this check will trigger and force falling back to scanning mode.
> With this check fixed, ubi_wl_init() will erase all old PEBs from ai->fastmap.
Forgive my stubbornness, I think this strict check is good, could you 
show me a process to trigger this WARN_ON, it would be nice to provide a 
reproducer.
I still insist the point(after my fix patch applied): All outdated 
fastmap PEBs are added into 'ai->fastmap'(full scanning case) or counted 
into 'fmhdr->erase_peb_count'(fast attached case).
> 
> So I agree that this code path is wonky and can be cleaned up. But please be
> extremely careful and give all your changes excessive testing with real workload
> and power-cuts.Let's list all power-cut cases during fastmap updateing(I tried to 
simulate some of them on nandsim), in theory, I think the WARN_ON check 
is okay and all outdated fastmap PEBs can be erased in next attaching:

ubi_update_fastmap:
1565         for (i = 1; i < new_fm->used_blocks; i++) {
1570                 if (!tmp_e) {
1571                         if (old_fm && old_fm->e[i]) {
                              /* sync erase old fm data PEBs */
power-cut!!!
1585                         } else {
                              /* async erase old fm data PEBs */
power-cut!!!
1595                         }
1596                 } else {
1599                         if (old_fm && old_fm->e[i]) {
			     /* async erase old fm data PEBs */
power-cut!!!
1603                         }
1604                 }
1605         }
1621         if (old_fm) {
1623                 if (!tmp_e) {
  		     /* sync erase old fm anchor PEB */
power-cut!!!
1638                 } else {
		     /* async erase old fm anchor PEB */
power-cut!!!
1644                 }
1645         } else {
1658         }
1660         ret = ubi_write_fastmap(ubi, new_fm);

ubi_write_fastmap:
1324         ret = ubi_io_write_vid_hdr(ubi, new_fm->e[0]->pnum, avbuf);
power-cut!!!
1345         for (i = 1; i < new_fm->used_blocks; i++) {
1350                 ret = ubi_io_write_vid_hdr(...);
power-cut!!!
1356         }

1358         for (i = 0; i < new_fm->used_blocks; i++) {
1359                 ret = ubi_io_write_data(...),
power-cut!!!
1366         }

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2
  2022-01-12  3:46                   ` Zhihao Cheng
@ 2022-01-14 18:45                     ` Richard Weinberger
  -1 siblings, 0 replies; 76+ messages in thread
From: Richard Weinberger @ 2022-01-14 18:45 UTC (permalink / raw)
  To: chengzhihao1
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

----- Ursprüngliche Mail -----
> Von: "chengzhihao1" <chengzhihao1@huawei.com>
> An: "richard" <richard@nod.at>
> CC: "Miquel Raynal" <miquel.raynal@bootlin.com>, "Vignesh Raghavendra" <vigneshr@ti.com>, "mcoquelin stm32"
> <mcoquelin.stm32@gmail.com>, "kirill shutemov" <kirill.shutemov@linux.intel.com>, "Sascha Hauer"
> <s.hauer@pengutronix.de>, "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org>
> Gesendet: Mittwoch, 12. Januar 2022 04:46:28
> Betreff: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

>>>> ubi_wl_init() is called in both cases, with and without fastmap.
>>> I agree.
>>>
>>>> And ai->fastmap contains all anchor PEBs that scan_fast() found.
>>>> This can be the most recent but also outdated anchor PEBs.
>>> Is it exists a case that outdated fastmap PEBs are neither counted into
>>> 'fmhdr->erase_peb_count' nor scanned into 'ai->fastmap' after attaching
>>> by fastmap.
>>>
>> 
>> [...]
>>   
>>> I think UBI attaches failed by fastmap if kernel goes here.
>>> 1870                         err = erase_aeb(ubi, aeb, sync);
>> 
>> Hmm, I think the paranoia check in fastmap.c is too strict these days.
>>          if (WARN_ON(count_fastmap_pebs(ai) != ubi->peb_count -
>>                      ai->bad_peb_count - fm->used_blocks))
>>                  goto fail_bad;
>> 
>> It does not account ai->fastmap. So if ai->fastmap contains old anchor PEBs
>> this check will trigger and force falling back to scanning mode.
>> With this check fixed, ubi_wl_init() will erase all old PEBs from ai->fastmap.
> Forgive my stubbornness, I think this strict check is good, could you
> show me a process to trigger this WARN_ON, it would be nice to provide a
> reproducer.

You can trigger this by interrupting UBI.
e.g. When UBI writes a new fastmap to the NAND, it schedules the old fastmap
PEBs for erasure. PEB erasure is asynchronous in UBI. So this can be delayed
for a very long time.
While developing UBI fastmap and performing powercut tests I saw this often
on targets.

> I still insist the point(after my fix patch applied): All outdated
> fastmap PEBs are added into 'ai->fastmap'(full scanning case) or counted
> into 'fmhdr->erase_peb_count'(fast attached case).

Yes. But if you look into ubi_wl_init() you see that fastmap anchor PEBs
get erases synchronously(!). The comment before the erasure explains why.
To complicate things, this code is currently unreachable because the WARN_ON()
is not right. I misses to count ai->fastmap.
So, when there are old fastmap PEBs found, the counter does not match
and UBI falls back to full scanning while it could to an attach by fastmap.

Fastmap is full with corner cases that have been found by massive amount of testing, sadly.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2
@ 2022-01-14 18:45                     ` Richard Weinberger
  0 siblings, 0 replies; 76+ messages in thread
From: Richard Weinberger @ 2022-01-14 18:45 UTC (permalink / raw)
  To: chengzhihao1
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

----- Ursprüngliche Mail -----
> Von: "chengzhihao1" <chengzhihao1@huawei.com>
> An: "richard" <richard@nod.at>
> CC: "Miquel Raynal" <miquel.raynal@bootlin.com>, "Vignesh Raghavendra" <vigneshr@ti.com>, "mcoquelin stm32"
> <mcoquelin.stm32@gmail.com>, "kirill shutemov" <kirill.shutemov@linux.intel.com>, "Sascha Hauer"
> <s.hauer@pengutronix.de>, "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org>
> Gesendet: Mittwoch, 12. Januar 2022 04:46:28
> Betreff: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

>>>> ubi_wl_init() is called in both cases, with and without fastmap.
>>> I agree.
>>>
>>>> And ai->fastmap contains all anchor PEBs that scan_fast() found.
>>>> This can be the most recent but also outdated anchor PEBs.
>>> Is it exists a case that outdated fastmap PEBs are neither counted into
>>> 'fmhdr->erase_peb_count' nor scanned into 'ai->fastmap' after attaching
>>> by fastmap.
>>>
>> 
>> [...]
>>   
>>> I think UBI attaches failed by fastmap if kernel goes here.
>>> 1870                         err = erase_aeb(ubi, aeb, sync);
>> 
>> Hmm, I think the paranoia check in fastmap.c is too strict these days.
>>          if (WARN_ON(count_fastmap_pebs(ai) != ubi->peb_count -
>>                      ai->bad_peb_count - fm->used_blocks))
>>                  goto fail_bad;
>> 
>> It does not account ai->fastmap. So if ai->fastmap contains old anchor PEBs
>> this check will trigger and force falling back to scanning mode.
>> With this check fixed, ubi_wl_init() will erase all old PEBs from ai->fastmap.
> Forgive my stubbornness, I think this strict check is good, could you
> show me a process to trigger this WARN_ON, it would be nice to provide a
> reproducer.

You can trigger this by interrupting UBI.
e.g. When UBI writes a new fastmap to the NAND, it schedules the old fastmap
PEBs for erasure. PEB erasure is asynchronous in UBI. So this can be delayed
for a very long time.
While developing UBI fastmap and performing powercut tests I saw this often
on targets.

> I still insist the point(after my fix patch applied): All outdated
> fastmap PEBs are added into 'ai->fastmap'(full scanning case) or counted
> into 'fmhdr->erase_peb_count'(fast attached case).

Yes. But if you look into ubi_wl_init() you see that fastmap anchor PEBs
get erases synchronously(!). The comment before the erasure explains why.
To complicate things, this code is currently unreachable because the WARN_ON()
is not right. I misses to count ai->fastmap.
So, when there are old fastmap PEBs found, the counter does not match
and UBI falls back to full scanning while it could to an attach by fastmap.

Fastmap is full with corner cases that have been found by massive amount of testing, sadly.

Thanks,
//richard

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2
  2022-01-14 18:45                     ` Richard Weinberger
@ 2022-01-15  8:22                       ` Zhihao Cheng
  -1 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2022-01-15  8:22 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

>>>           if (WARN_ON(count_fastmap_pebs(ai) != ubi->peb_count -
>>>                       ai->bad_peb_count - fm->used_blocks))
>>>                   goto fail_bad;
>>>
>>> It does not account ai->fastmap. So if ai->fastmap contains old anchor PEBs
>>> this check will trigger and force falling back to scanning mode.
>>> With this check fixed, ubi_wl_init() will erase all old PEBs from ai->fastmap.
>> Forgive my stubbornness, I think this strict check is good, could you
>> show me a process to trigger this WARN_ON, it would be nice to provide a
>> reproducer.
> 
> You can trigger this by interrupting UBI.
> e.g. When UBI writes a new fastmap to the NAND, it schedules the old fastmap
> PEBs for erasure. PEB erasure is asynchronous in UBI. So this can be delayed
> for a very long time.
I tried with following modifications:

diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
index 01dcdd94c9d2..253e76e2a7d7 100644
--- a/drivers/mtd/ubi/fastmap.c
+++ b/drivers/mtd/ubi/fastmap.c
@@ -679,6 +679,7 @@ static int ubi_attach_fastmap(struct ubi_device *ubi,
  		if (fm_pos >= fm_size)
  			goto fail_bad;

+		pr_err("%s: add %d to erase list\n", __func__, be32_to_cpu(fmec->pnum));
  		ret = add_aeb(ai, &ai->erase, be32_to_cpu(fmec->pnum),
  			      be32_to_cpu(fmec->ec), 1);
  		if (ret)
@@ -1103,6 +1104,7 @@ void ubi_fastmap_destroy_checkmap(struct 
ubi_volume *vol)
   *
   * Returns 0 on success, < 0 indicates an internal error.
   */
+#include <linux/delay.h>
  static int ubi_write_fastmap(struct ubi_device *ubi,
  			     struct ubi_fastmap_layout *new_fm)
  {
@@ -1148,6 +1150,8 @@ static int ubi_write_fastmap(struct ubi_device *ubi,
  		goto out_free_dvbuf;
  	}

+	pr_err("%s: wait all erase workers deleted from list\n", __func__);
+	msleep(500);
  	spin_lock(&ubi->volumes_lock);
  	spin_lock(&ubi->wl_lock);

@@ -1196,6 +1200,8 @@ static int ubi_write_fastmap(struct ubi_device *ubi,

  	ubi_for_each_free_peb(ubi, wl_e, tmp_rb) {
  		fec = (struct ubi_fm_ec *)(fm_raw + fm_pos);
+		if (global_old_fm_peb == wl_e->pnum)
+			pr_err("%s: count peb %d as free(cannot happen)!!!\n", __func__, 
wl_e->pnum);

  		fec->pnum = cpu_to_be32(wl_e->pnum);
  		set_seen(ubi, wl_e->pnum, seen_pebs);
@@ -1208,6 +1214,8 @@ static int ubi_write_fastmap(struct ubi_device *ubi,
  	if (ubi->fm_next_anchor) {
  		fec = (struct ubi_fm_ec *)(fm_raw + fm_pos);

+		if (global_old_fm_peb == ubi->fm_next_anchor->pnum)
+			pr_err("%s: count peb %d as next anchor(cannot happen)!!!\n", 
__func__, ubi->fm_next_anchor->pnum);
  		fec->pnum = cpu_to_be32(ubi->fm_next_anchor->pnum);
  		set_seen(ubi, ubi->fm_next_anchor->pnum, seen_pebs);
  		fec->ec = cpu_to_be32(ubi->fm_next_anchor->ec);
@@ -1264,6 +1272,8 @@ static int ubi_write_fastmap(struct ubi_device *ubi,

  			fec = (struct ubi_fm_ec *)(fm_raw + fm_pos);

+			if (wl_e->pnum == global_old_fm_peb)
+				pr_err("%s: count fm anchor %d in erase work(should happen)!!!\n", 
__func__, wl_e->pnum);
  			fec->pnum = cpu_to_be32(wl_e->pnum);
  			set_seen(ubi, wl_e->pnum, seen_pebs);
  			fec->ec = cpu_to_be32(wl_e->ec);
@@ -1520,12 +1530,14 @@ static void return_fm_pebs(struct ubi_device *ubi,
   *
   * Returns 0 on success, < 0 indicates an internal error.
   */
+int global_old_fm_peb = -1;
  int ubi_update_fastmap(struct ubi_device *ubi)
  {
  	int ret, i, j;
  	struct ubi_fastmap_layout *new_fm, *old_fm;
  	struct ubi_wl_entry *tmp_e;

+	pr_err("%s: start\n", __func__);
  	down_write(&ubi->fm_protect);
  	down_write(&ubi->work_sem);
  	down_write(&ubi->fm_eba_sem);
@@ -1634,6 +1646,9 @@ int ubi_update_fastmap(struct ubi_device *ubi)
  			/* we've got a new anchor PEB, return the old one */
  			ubi_wl_put_fm_peb(ubi, old_fm->e[0], 0,
  					  old_fm->to_be_tortured[0]);
+			global_old_fm_peb = old_fm->e[0]->pnum;
+			pr_err("%s: put old fastmap peb %d\n", __func__, old_fm->e[0]->pnum);
+			/* Not check return value ? Fault injection may cause the WARNON() 
too! */
  			new_fm->e[0] = tmp_e;
  			old_fm->e[0] = NULL;
  		}
@@ -1665,6 +1680,7 @@ int ubi_update_fastmap(struct ubi_device *ubi)

  	ubi_ensure_anchor_pebs(ubi);

+	pr_err("%s: done\n", __func__);
  	return ret;

  err:
diff --git a/drivers/mtd/ubi/ubi.h b/drivers/mtd/ubi/ubi.h
index 7c083ad58274..17df8dcebbe9 100644
--- a/drivers/mtd/ubi/ubi.h
+++ b/drivers/mtd/ubi/ubi.h
@@ -37,6 +37,7 @@
  #define UBI_NAME_STR "ubi"

  struct ubi_device;
+extern int global_old_fm_peb;

  /* Normal UBI messages */
  __printf(2, 3)
diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c
index 8455f1d47f3c..a148280f4ce8 100644
--- a/drivers/mtd/ubi/wl.c
+++ b/drivers/mtd/ubi/wl.c
@@ -198,8 +198,12 @@ static int do_work(struct ubi_device *ubi)
  	 * the queue flush code has to be sure the whole queue of works is
  	 * done, and it takes the mutex in write mode.
  	 */
+	if (global_old_fm_peb != -1)
+		pr_err("-----\n");
  	down_read(&ubi->work_sem);
  	spin_lock(&ubi->wl_lock);
+	if (global_old_fm_peb != -1)
+		pr_err("=====\n");
  	if (list_empty(&ubi->works)) {
  		spin_unlock(&ubi->wl_lock);
  		up_read(&ubi->work_sem);
@@ -1070,6 +1074,7 @@ static int ensure_wear_leveling(struct ubi_device 
*ubi, int nested)
   * needed. Returns zero in case of success and a negative error code 
in case of
   * failure.
   */
+#include <linux/delay.h>
  static int __erase_worker(struct ubi_device *ubi, struct ubi_work *wl_wrk)
  {
  	struct ubi_wl_entry *e = wl_wrk->e;
@@ -1078,6 +1083,12 @@ static int __erase_worker(struct ubi_device *ubi, 
struct ubi_work *wl_wrk)
  	int lnum = wl_wrk->lnum;
  	int err, available_consumed = 0;

+	if (pnum == global_old_fm_peb) {
+		pr_err("%s: erase worker(erase %d) has been deleted from list!", 
__func__, pnum);
+		pr_err("Dump ubi image to file\n");
+		ubi_ro_mode(ubi);
+	}
+
  	dbg_wl("erase PEB %d EC %d LEB %d:%d",
  	       pnum, e->ec, wl_wrk->vol_id, wl_wrk->lnum);

modprobe nandsim id_bytes="0xec,0xa1,0x00,0x15"
modprobe ubi mtd="0,2048" fm_autoconvert
sleep 1   # Wait until all erase works done
ubimkvol -N vol_a -m -n 0 /dev/ubi0  # trigger fastmap updating
dd if=/dev/mtd0 of=flash
rmmod ubi
nandwrite /dev/mtd0 flash > /dev/null
modprobe ubi mtd="0,2048" fm_autoconvert

Kernel will print:
[13198.624433] ubi_update_fastmap: start
[13198.625164] ubi_write_fastmap: wait all erase workers deleted from list
[13199.134011] ubi_update_fastmap: done		# first attaching triggers 
wearleveling
[13199.623438] ubi_update_fastmap: start	# volume creation triggers 
updating fastmap
[13199.624732] ubi_update_fastmap: put old fastmap peb 2	# schedule old 
fastmap PEB to erase
[13199.624757] -----
[13199.626532] ubi_write_fastmap: wait all erase workers deleted from list
[13200.133059] ubi_write_fastmap: count fm anchor 2 in erase work(should 
happen)!!!	# Count PEB(in erase work) into 'fmh->erase_peb_count'
[13200.136996] ubi_update_fastmap: done
[13200.137000] =====
[13200.138833] __erase_worker: erase worker(erase 2) has been deleted 
from list!
[13200.138842] Dump ubi image to file
[13203.091720] ubi0: attaching mtd0
[13203.111179] ubi_attach_fastmap: add 2 to erase list
[13203.113013] ubi0: attached by fastmap

I think the WARNON cannot be triggered because old fastmap PEBs are 
always counted into 'fmh->erase_peb_count', and 'ubi->work_sem' 
serilizes ubi_update_fastmap() and do_work():
             ubi_update_fastmap                       ubi_thread
     down_write(&ubi->work_sem)
       ubi_wl_put_fm_peb(old fastmap PEB)
         schedule_erase
           list_add_tail(&wrk->list, &ubi->works)
                                                        do_work
 
down_read(&ubi->work_sem)
                                                          // Stuck here!
       ubi_write_fastmap
         list_for_each_entry(ubi_wrk, &ubi->works, list)
           fec->pnum = cpu_to_be32(wl_e->pnum)
     up_write(&ubi->work_sem)
 
list_del(&wrk->list)
                                                          erase_worker

> While developing UBI fastmap and performing powercut tests I saw this often
> on targets.
Was commit 2e8f08deabbc7ee("ubi: Fix races around ubi_refill_pools()") 
applied at that time? The WANRON can be triggered without this commit, 
because this commit makes sure do_work() cannot happen between 
ubi_wl_put_fm_peb() and ubi_write_fastmap().
> 
>> I still insist the point(after my fix patch applied): All outdated
>> fastmap PEBs are added into 'ai->fastmap'(full scanning case) or counted
>> into 'fmhdr->erase_peb_count'(fast attached case).
> 
> Yes. But if you look into ubi_wl_init() you see that fastmap anchor PEBs
> get erases synchronously(!). The comment before the erasure explains why.
About erasing fastmap anchor PEB synchronously, I admit curreunt UBI 
implementation cannot satisfy it, even with my fix applied. Wait, it 
seems that UBI has never made it sure. Old fastmap PEBs could be erased 
asynchronously, they could be counted into 'fmh->erase_peb_count' even 
in early UBI implementation code, so old fastmap anchor PEB will be 
added into 'ai->erase' and be erased asynchronously in next attaching. 
But, I feel it is not a problem, find_fm_anchor() can help us find out 
the right fastmap anchor PEB according seqnum.
> To complicate things, this code is currently unreachable because the WARN_ON()
> is not right. I misses to count ai->fastmap.
> So, when there are old fastmap PEBs found, the counter does not match
> and UBI falls back to full scanning while it could to an attach by fastmap.
>I think the mainly cause of failed fastmap attaching is half-written of 
fastmap. The counter check(eg. WARNON) is okay.
> Fastmap is full with corner cases that have been found by massive amount of testing, sadly.

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2
@ 2022-01-15  8:22                       ` Zhihao Cheng
  0 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2022-01-15  8:22 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

>>>           if (WARN_ON(count_fastmap_pebs(ai) != ubi->peb_count -
>>>                       ai->bad_peb_count - fm->used_blocks))
>>>                   goto fail_bad;
>>>
>>> It does not account ai->fastmap. So if ai->fastmap contains old anchor PEBs
>>> this check will trigger and force falling back to scanning mode.
>>> With this check fixed, ubi_wl_init() will erase all old PEBs from ai->fastmap.
>> Forgive my stubbornness, I think this strict check is good, could you
>> show me a process to trigger this WARN_ON, it would be nice to provide a
>> reproducer.
> 
> You can trigger this by interrupting UBI.
> e.g. When UBI writes a new fastmap to the NAND, it schedules the old fastmap
> PEBs for erasure. PEB erasure is asynchronous in UBI. So this can be delayed
> for a very long time.
I tried with following modifications:

diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
index 01dcdd94c9d2..253e76e2a7d7 100644
--- a/drivers/mtd/ubi/fastmap.c
+++ b/drivers/mtd/ubi/fastmap.c
@@ -679,6 +679,7 @@ static int ubi_attach_fastmap(struct ubi_device *ubi,
  		if (fm_pos >= fm_size)
  			goto fail_bad;

+		pr_err("%s: add %d to erase list\n", __func__, be32_to_cpu(fmec->pnum));
  		ret = add_aeb(ai, &ai->erase, be32_to_cpu(fmec->pnum),
  			      be32_to_cpu(fmec->ec), 1);
  		if (ret)
@@ -1103,6 +1104,7 @@ void ubi_fastmap_destroy_checkmap(struct 
ubi_volume *vol)
   *
   * Returns 0 on success, < 0 indicates an internal error.
   */
+#include <linux/delay.h>
  static int ubi_write_fastmap(struct ubi_device *ubi,
  			     struct ubi_fastmap_layout *new_fm)
  {
@@ -1148,6 +1150,8 @@ static int ubi_write_fastmap(struct ubi_device *ubi,
  		goto out_free_dvbuf;
  	}

+	pr_err("%s: wait all erase workers deleted from list\n", __func__);
+	msleep(500);
  	spin_lock(&ubi->volumes_lock);
  	spin_lock(&ubi->wl_lock);

@@ -1196,6 +1200,8 @@ static int ubi_write_fastmap(struct ubi_device *ubi,

  	ubi_for_each_free_peb(ubi, wl_e, tmp_rb) {
  		fec = (struct ubi_fm_ec *)(fm_raw + fm_pos);
+		if (global_old_fm_peb == wl_e->pnum)
+			pr_err("%s: count peb %d as free(cannot happen)!!!\n", __func__, 
wl_e->pnum);

  		fec->pnum = cpu_to_be32(wl_e->pnum);
  		set_seen(ubi, wl_e->pnum, seen_pebs);
@@ -1208,6 +1214,8 @@ static int ubi_write_fastmap(struct ubi_device *ubi,
  	if (ubi->fm_next_anchor) {
  		fec = (struct ubi_fm_ec *)(fm_raw + fm_pos);

+		if (global_old_fm_peb == ubi->fm_next_anchor->pnum)
+			pr_err("%s: count peb %d as next anchor(cannot happen)!!!\n", 
__func__, ubi->fm_next_anchor->pnum);
  		fec->pnum = cpu_to_be32(ubi->fm_next_anchor->pnum);
  		set_seen(ubi, ubi->fm_next_anchor->pnum, seen_pebs);
  		fec->ec = cpu_to_be32(ubi->fm_next_anchor->ec);
@@ -1264,6 +1272,8 @@ static int ubi_write_fastmap(struct ubi_device *ubi,

  			fec = (struct ubi_fm_ec *)(fm_raw + fm_pos);

+			if (wl_e->pnum == global_old_fm_peb)
+				pr_err("%s: count fm anchor %d in erase work(should happen)!!!\n", 
__func__, wl_e->pnum);
  			fec->pnum = cpu_to_be32(wl_e->pnum);
  			set_seen(ubi, wl_e->pnum, seen_pebs);
  			fec->ec = cpu_to_be32(wl_e->ec);
@@ -1520,12 +1530,14 @@ static void return_fm_pebs(struct ubi_device *ubi,
   *
   * Returns 0 on success, < 0 indicates an internal error.
   */
+int global_old_fm_peb = -1;
  int ubi_update_fastmap(struct ubi_device *ubi)
  {
  	int ret, i, j;
  	struct ubi_fastmap_layout *new_fm, *old_fm;
  	struct ubi_wl_entry *tmp_e;

+	pr_err("%s: start\n", __func__);
  	down_write(&ubi->fm_protect);
  	down_write(&ubi->work_sem);
  	down_write(&ubi->fm_eba_sem);
@@ -1634,6 +1646,9 @@ int ubi_update_fastmap(struct ubi_device *ubi)
  			/* we've got a new anchor PEB, return the old one */
  			ubi_wl_put_fm_peb(ubi, old_fm->e[0], 0,
  					  old_fm->to_be_tortured[0]);
+			global_old_fm_peb = old_fm->e[0]->pnum;
+			pr_err("%s: put old fastmap peb %d\n", __func__, old_fm->e[0]->pnum);
+			/* Not check return value ? Fault injection may cause the WARNON() 
too! */
  			new_fm->e[0] = tmp_e;
  			old_fm->e[0] = NULL;
  		}
@@ -1665,6 +1680,7 @@ int ubi_update_fastmap(struct ubi_device *ubi)

  	ubi_ensure_anchor_pebs(ubi);

+	pr_err("%s: done\n", __func__);
  	return ret;

  err:
diff --git a/drivers/mtd/ubi/ubi.h b/drivers/mtd/ubi/ubi.h
index 7c083ad58274..17df8dcebbe9 100644
--- a/drivers/mtd/ubi/ubi.h
+++ b/drivers/mtd/ubi/ubi.h
@@ -37,6 +37,7 @@
  #define UBI_NAME_STR "ubi"

  struct ubi_device;
+extern int global_old_fm_peb;

  /* Normal UBI messages */
  __printf(2, 3)
diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c
index 8455f1d47f3c..a148280f4ce8 100644
--- a/drivers/mtd/ubi/wl.c
+++ b/drivers/mtd/ubi/wl.c
@@ -198,8 +198,12 @@ static int do_work(struct ubi_device *ubi)
  	 * the queue flush code has to be sure the whole queue of works is
  	 * done, and it takes the mutex in write mode.
  	 */
+	if (global_old_fm_peb != -1)
+		pr_err("-----\n");
  	down_read(&ubi->work_sem);
  	spin_lock(&ubi->wl_lock);
+	if (global_old_fm_peb != -1)
+		pr_err("=====\n");
  	if (list_empty(&ubi->works)) {
  		spin_unlock(&ubi->wl_lock);
  		up_read(&ubi->work_sem);
@@ -1070,6 +1074,7 @@ static int ensure_wear_leveling(struct ubi_device 
*ubi, int nested)
   * needed. Returns zero in case of success and a negative error code 
in case of
   * failure.
   */
+#include <linux/delay.h>
  static int __erase_worker(struct ubi_device *ubi, struct ubi_work *wl_wrk)
  {
  	struct ubi_wl_entry *e = wl_wrk->e;
@@ -1078,6 +1083,12 @@ static int __erase_worker(struct ubi_device *ubi, 
struct ubi_work *wl_wrk)
  	int lnum = wl_wrk->lnum;
  	int err, available_consumed = 0;

+	if (pnum == global_old_fm_peb) {
+		pr_err("%s: erase worker(erase %d) has been deleted from list!", 
__func__, pnum);
+		pr_err("Dump ubi image to file\n");
+		ubi_ro_mode(ubi);
+	}
+
  	dbg_wl("erase PEB %d EC %d LEB %d:%d",
  	       pnum, e->ec, wl_wrk->vol_id, wl_wrk->lnum);

modprobe nandsim id_bytes="0xec,0xa1,0x00,0x15"
modprobe ubi mtd="0,2048" fm_autoconvert
sleep 1   # Wait until all erase works done
ubimkvol -N vol_a -m -n 0 /dev/ubi0  # trigger fastmap updating
dd if=/dev/mtd0 of=flash
rmmod ubi
nandwrite /dev/mtd0 flash > /dev/null
modprobe ubi mtd="0,2048" fm_autoconvert

Kernel will print:
[13198.624433] ubi_update_fastmap: start
[13198.625164] ubi_write_fastmap: wait all erase workers deleted from list
[13199.134011] ubi_update_fastmap: done		# first attaching triggers 
wearleveling
[13199.623438] ubi_update_fastmap: start	# volume creation triggers 
updating fastmap
[13199.624732] ubi_update_fastmap: put old fastmap peb 2	# schedule old 
fastmap PEB to erase
[13199.624757] -----
[13199.626532] ubi_write_fastmap: wait all erase workers deleted from list
[13200.133059] ubi_write_fastmap: count fm anchor 2 in erase work(should 
happen)!!!	# Count PEB(in erase work) into 'fmh->erase_peb_count'
[13200.136996] ubi_update_fastmap: done
[13200.137000] =====
[13200.138833] __erase_worker: erase worker(erase 2) has been deleted 
from list!
[13200.138842] Dump ubi image to file
[13203.091720] ubi0: attaching mtd0
[13203.111179] ubi_attach_fastmap: add 2 to erase list
[13203.113013] ubi0: attached by fastmap

I think the WARNON cannot be triggered because old fastmap PEBs are 
always counted into 'fmh->erase_peb_count', and 'ubi->work_sem' 
serilizes ubi_update_fastmap() and do_work():
             ubi_update_fastmap                       ubi_thread
     down_write(&ubi->work_sem)
       ubi_wl_put_fm_peb(old fastmap PEB)
         schedule_erase
           list_add_tail(&wrk->list, &ubi->works)
                                                        do_work
 
down_read(&ubi->work_sem)
                                                          // Stuck here!
       ubi_write_fastmap
         list_for_each_entry(ubi_wrk, &ubi->works, list)
           fec->pnum = cpu_to_be32(wl_e->pnum)
     up_write(&ubi->work_sem)
 
list_del(&wrk->list)
                                                          erase_worker

> While developing UBI fastmap and performing powercut tests I saw this often
> on targets.
Was commit 2e8f08deabbc7ee("ubi: Fix races around ubi_refill_pools()") 
applied at that time? The WANRON can be triggered without this commit, 
because this commit makes sure do_work() cannot happen between 
ubi_wl_put_fm_peb() and ubi_write_fastmap().
> 
>> I still insist the point(after my fix patch applied): All outdated
>> fastmap PEBs are added into 'ai->fastmap'(full scanning case) or counted
>> into 'fmhdr->erase_peb_count'(fast attached case).
> 
> Yes. But if you look into ubi_wl_init() you see that fastmap anchor PEBs
> get erases synchronously(!). The comment before the erasure explains why.
About erasing fastmap anchor PEB synchronously, I admit curreunt UBI 
implementation cannot satisfy it, even with my fix applied. Wait, it 
seems that UBI has never made it sure. Old fastmap PEBs could be erased 
asynchronously, they could be counted into 'fmh->erase_peb_count' even 
in early UBI implementation code, so old fastmap anchor PEB will be 
added into 'ai->erase' and be erased asynchronously in next attaching. 
But, I feel it is not a problem, find_fm_anchor() can help us find out 
the right fastmap anchor PEB according seqnum.
> To complicate things, this code is currently unreachable because the WARN_ON()
> is not right. I misses to count ai->fastmap.
> So, when there are old fastmap PEBs found, the counter does not match
> and UBI falls back to full scanning while it could to an attach by fastmap.
>I think the mainly cause of failed fastmap attaching is half-written of 
fastmap. The counter check(eg. WARNON) is okay.
> Fastmap is full with corner cases that have been found by massive amount of testing, sadly.

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2
  2022-01-15  8:22                       ` Zhihao Cheng
@ 2022-01-15  8:46                         ` Zhihao Cheng
  -1 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2022-01-15  8:46 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

>> Yes. But if you look into ubi_wl_init() you see that fastmap anchor PEBs
>> get erases synchronously(!). The comment before the erasure explains why.
> About erasing fastmap anchor PEB synchronously, I admit curreunt UBI 
> implementation cannot satisfy it, even with my fix applied. Wait, it 
> seems that UBI has never made it sure. Old fastmap PEBs could be erased 
> asynchronously, they could be counted into 'fmh->erase_peb_count' even 
> in early UBI implementation code, so old fastmap anchor PEB will be 
> added into 'ai->erase' and be erased asynchronously in next attaching. 
In next attaching old fastmap PEBs will be processed as following:
ubi_attach_fastmap -> add_aeb(ai, &ai->erase...)
ubi_wl_init
   list_for_each_entry_safe(aeb, tmp, &ai->erase)
     erase_aeb       // erase asynchronously
       ubi->lookuptbl[e->pnum] = e
   list_for_each_entry(aeb, &ai->fastmap, u.list)
     e = ubi_find_fm_block(ubi, aeb->pnum)
     if (e) {
        ...
     } else {
       if (ubi->lookuptbl[aeb->pnum])     // old fastmap PEBs are 
assigned to 'ubi->lookuptbl'
         continue;
     }
> But, I feel it is not a problem, find_fm_anchor() can help us find out 
> the right fastmap anchor PEB according seqnum.

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2
@ 2022-01-15  8:46                         ` Zhihao Cheng
  0 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2022-01-15  8:46 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

>> Yes. But if you look into ubi_wl_init() you see that fastmap anchor PEBs
>> get erases synchronously(!). The comment before the erasure explains why.
> About erasing fastmap anchor PEB synchronously, I admit curreunt UBI 
> implementation cannot satisfy it, even with my fix applied. Wait, it 
> seems that UBI has never made it sure. Old fastmap PEBs could be erased 
> asynchronously, they could be counted into 'fmh->erase_peb_count' even 
> in early UBI implementation code, so old fastmap anchor PEB will be 
> added into 'ai->erase' and be erased asynchronously in next attaching. 
In next attaching old fastmap PEBs will be processed as following:
ubi_attach_fastmap -> add_aeb(ai, &ai->erase...)
ubi_wl_init
   list_for_each_entry_safe(aeb, tmp, &ai->erase)
     erase_aeb       // erase asynchronously
       ubi->lookuptbl[e->pnum] = e
   list_for_each_entry(aeb, &ai->fastmap, u.list)
     e = ubi_find_fm_block(ubi, aeb->pnum)
     if (e) {
        ...
     } else {
       if (ubi->lookuptbl[aeb->pnum])     // old fastmap PEBs are 
assigned to 'ubi->lookuptbl'
         continue;
     }
> But, I feel it is not a problem, find_fm_anchor() can help us find out 
> the right fastmap anchor PEB according seqnum.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2
  2022-01-15  8:46                         ` Zhihao Cheng
@ 2022-01-15 10:01                           ` Richard Weinberger
  -1 siblings, 0 replies; 76+ messages in thread
From: Richard Weinberger @ 2022-01-15 10:01 UTC (permalink / raw)
  To: chengzhihao1
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

----- Ursprüngliche Mail -----
> Von: "chengzhihao1" <chengzhihao1@huawei.com>
> An: "richard" <richard@nod.at>
> CC: "Miquel Raynal" <miquel.raynal@bootlin.com>, "Vignesh Raghavendra" <vigneshr@ti.com>, "mcoquelin stm32"
> <mcoquelin.stm32@gmail.com>, "kirill shutemov" <kirill.shutemov@linux.intel.com>, "Sascha Hauer"
> <s.hauer@pengutronix.de>, "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org>
> Gesendet: Samstag, 15. Januar 2022 09:46:07
> Betreff: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

>>> Yes. But if you look into ubi_wl_init() you see that fastmap anchor PEBs
>>> get erases synchronously(!). The comment before the erasure explains why.
>> About erasing fastmap anchor PEB synchronously, I admit curreunt UBI
>> implementation cannot satisfy it, even with my fix applied. Wait, it
>> seems that UBI has never made it sure. Old fastmap PEBs could be erased
>> asynchronously, they could be counted into 'fmh->erase_peb_count' even
>> in early UBI implementation code, so old fastmap anchor PEB will be
>> added into 'ai->erase' and be erased asynchronously in next attaching.
> In next attaching old fastmap PEBs will be processed as following:
> ubi_attach_fastmap -> add_aeb(ai, &ai->erase...)
> ubi_wl_init
>   list_for_each_entry_safe(aeb, tmp, &ai->erase)
>     erase_aeb       // erase asynchronously
>       ubi->lookuptbl[e->pnum] = e
>   list_for_each_entry(aeb, &ai->fastmap, u.list)
>     e = ubi_find_fm_block(ubi, aeb->pnum)
>     if (e) {
>        ...
>     } else {
>       if (ubi->lookuptbl[aeb->pnum])     // old fastmap PEBs are
> assigned to 'ubi->lookuptbl'
>         continue;
>     }
>> But, I feel it is not a problem, find_fm_anchor() can help us find out
> > the right fastmap anchor PEB according seqnum.

FYI, I think I understand now our disagreement.
You assume that old Fastmap PEBs are *guaranteed* to be part of Fastmap's erase list.
That's okay and this is what Linux as of today does.

My point is that we need to be paranoid and check carefully for old Fastmap PEBs
which might be *not* on the erase list.
I saw such issues in the wild. These were causes by old and/or buggy Fastmap
implementations.
Keep in mind that Linux is not the only system that implements UBI (and fastmap).

So let me give the whole situation another thought on how to improve it.
I totally agree with you that currently there is a problem with fm->used_blocks > 1.
I'm just careful (maybe too careful) about changing Fastmap code.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2
@ 2022-01-15 10:01                           ` Richard Weinberger
  0 siblings, 0 replies; 76+ messages in thread
From: Richard Weinberger @ 2022-01-15 10:01 UTC (permalink / raw)
  To: chengzhihao1
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

----- Ursprüngliche Mail -----
> Von: "chengzhihao1" <chengzhihao1@huawei.com>
> An: "richard" <richard@nod.at>
> CC: "Miquel Raynal" <miquel.raynal@bootlin.com>, "Vignesh Raghavendra" <vigneshr@ti.com>, "mcoquelin stm32"
> <mcoquelin.stm32@gmail.com>, "kirill shutemov" <kirill.shutemov@linux.intel.com>, "Sascha Hauer"
> <s.hauer@pengutronix.de>, "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org>
> Gesendet: Samstag, 15. Januar 2022 09:46:07
> Betreff: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

>>> Yes. But if you look into ubi_wl_init() you see that fastmap anchor PEBs
>>> get erases synchronously(!). The comment before the erasure explains why.
>> About erasing fastmap anchor PEB synchronously, I admit curreunt UBI
>> implementation cannot satisfy it, even with my fix applied. Wait, it
>> seems that UBI has never made it sure. Old fastmap PEBs could be erased
>> asynchronously, they could be counted into 'fmh->erase_peb_count' even
>> in early UBI implementation code, so old fastmap anchor PEB will be
>> added into 'ai->erase' and be erased asynchronously in next attaching.
> In next attaching old fastmap PEBs will be processed as following:
> ubi_attach_fastmap -> add_aeb(ai, &ai->erase...)
> ubi_wl_init
>   list_for_each_entry_safe(aeb, tmp, &ai->erase)
>     erase_aeb       // erase asynchronously
>       ubi->lookuptbl[e->pnum] = e
>   list_for_each_entry(aeb, &ai->fastmap, u.list)
>     e = ubi_find_fm_block(ubi, aeb->pnum)
>     if (e) {
>        ...
>     } else {
>       if (ubi->lookuptbl[aeb->pnum])     // old fastmap PEBs are
> assigned to 'ubi->lookuptbl'
>         continue;
>     }
>> But, I feel it is not a problem, find_fm_anchor() can help us find out
> > the right fastmap anchor PEB according seqnum.

FYI, I think I understand now our disagreement.
You assume that old Fastmap PEBs are *guaranteed* to be part of Fastmap's erase list.
That's okay and this is what Linux as of today does.

My point is that we need to be paranoid and check carefully for old Fastmap PEBs
which might be *not* on the erase list.
I saw such issues in the wild. These were causes by old and/or buggy Fastmap
implementations.
Keep in mind that Linux is not the only system that implements UBI (and fastmap).

So let me give the whole situation another thought on how to improve it.
I totally agree with you that currently there is a problem with fm->used_blocks > 1.
I'm just careful (maybe too careful) about changing Fastmap code.

Thanks,
//richard

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2
  2022-01-15 10:01                           ` Richard Weinberger
@ 2022-01-17  1:31                             ` Zhihao Cheng
  -1 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2022-01-17  1:31 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

> FYI, I think I understand now our disagreement.
> You assume that old Fastmap PEBs are *guaranteed* to be part of Fastmap's erase list.
> That's okay and this is what Linux as of today does.
> 
> My point is that we need to be paranoid and check carefully for old Fastmap PEBs
> which might be *not* on the erase list.
> I saw such issues in the wild. These were causes by old and/or buggy Fastmap
> implementations.
> Keep in mind that Linux is not the only system that implements UBI (and fastmap).
Uh, that is really a point, I met UBI implemented in Vxworks ever. Now, 
you convinced me, we should process fastmap with considering bad 
images(caused by other implementations). Let's keep this wonky assertion 
until a better fix.
> 
> So let me give the whole situation another thought on how to improve it.
> I totally agree with you that currently there is a problem with fm->used_blocks > 1.
> I'm just careful (maybe too careful) about changing Fastmap code.
> 
> Thanks,
> //richard
> .
> 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2
@ 2022-01-17  1:31                             ` Zhihao Cheng
  0 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2022-01-17  1:31 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

> FYI, I think I understand now our disagreement.
> You assume that old Fastmap PEBs are *guaranteed* to be part of Fastmap's erase list.
> That's okay and this is what Linux as of today does.
> 
> My point is that we need to be paranoid and check carefully for old Fastmap PEBs
> which might be *not* on the erase list.
> I saw such issues in the wild. These were causes by old and/or buggy Fastmap
> implementations.
> Keep in mind that Linux is not the only system that implements UBI (and fastmap).
Uh, that is really a point, I met UBI implemented in Vxworks ever. Now, 
you convinced me, we should process fastmap with considering bad 
images(caused by other implementations). Let's keep this wonky assertion 
until a better fix.
> 
> So let me give the whole situation another thought on how to improve it.
> I totally agree with you that currently there is a problem with fm->used_blocks > 1.
> I'm just careful (maybe too careful) about changing Fastmap code.
> 
> Thanks,
> //richard
> .
> 


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 15/15] ubi: fastmap: Fix high cpu usage of ubi_bgt by making sure wl_pool not empty
  2021-12-27  3:22   ` Zhihao Cheng
@ 2022-01-17  1:40     ` Zhihao Cheng
  -1 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2022-01-17  1:40 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

Hi Richard,
> Fix it by:
>    1) Adding 2 PEBs reserved for fm_anchor and fm_next_anchor.
>    2) Abandoning filling wl_pool until free count belows beb_rsvd_pebs.
> Then, there are at least 2(EBA_RESERVED_PEBS + MIN_FASTMAP_RESERVED_PEBS -
> MIN_FASTMAP_TAKEN_PEBS[1]) PEBs in pool and 1(WL_RESERVED_PEBS) PEB in
> wl_pool after calling ubi_refill_pools() with all erase works done.
> 
> This modification will cause a compatibility problem with old UBI image.
> If UBI volumes take the maximun number of PEBs for one certain UBI device,
> there are no available PEBs to satisfy 2 new reserved PEBs, bad reserved
> PEBs are taken firstly, if still not enough, ENOSPC will returned from ubi
> initialization.
> 
Can you come up with a better solution that can be compatible with old 
images? In other words, can we solve this problem not by adding new 
reserved PEBs?

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 15/15] ubi: fastmap: Fix high cpu usage of ubi_bgt by making sure wl_pool not empty
@ 2022-01-17  1:40     ` Zhihao Cheng
  0 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2022-01-17  1:40 UTC (permalink / raw)
  To: richard, miquel.raynal, vigneshr, mcoquelin.stm32,
	kirill.shutemov, s.hauer
  Cc: linux-mtd, linux-kernel

Hi Richard,
> Fix it by:
>    1) Adding 2 PEBs reserved for fm_anchor and fm_next_anchor.
>    2) Abandoning filling wl_pool until free count belows beb_rsvd_pebs.
> Then, there are at least 2(EBA_RESERVED_PEBS + MIN_FASTMAP_RESERVED_PEBS -
> MIN_FASTMAP_TAKEN_PEBS[1]) PEBs in pool and 1(WL_RESERVED_PEBS) PEB in
> wl_pool after calling ubi_refill_pools() with all erase works done.
> 
> This modification will cause a compatibility problem with old UBI image.
> If UBI volumes take the maximun number of PEBs for one certain UBI device,
> there are no available PEBs to satisfy 2 new reserved PEBs, bad reserved
> PEBs are taken firstly, if still not enough, ENOSPC will returned from ubi
> initialization.
> 
Can you come up with a better solution that can be compatible with old 
images? In other words, can we solve this problem not by adding new 
reserved PEBs?

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2
  2022-01-17  1:31                             ` Zhihao Cheng
@ 2022-01-17  2:52                               ` Zhihao Cheng
  -1 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2022-01-17  2:52 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

Hi Richard,
>> FYI, I think I understand now our disagreement.
>> You assume that old Fastmap PEBs are *guaranteed* to be part of 
>> Fastmap's erase list.
>> That's okay and this is what Linux as of today does.
>>
>> My point is that we need to be paranoid and check carefully for old 
>> Fastmap PEBs
>> which might be *not* on the erase list.
>> I saw such issues in the wild. These were causes by old and/or buggy 
>> Fastmap
>> implementations.
>> Keep in mind that Linux is not the only system that implements UBI 
>> (and fastmap).
> Uh, that is really a point, I met UBI implemented in Vxworks ever. Now, 
> you convinced me, we should process fastmap with considering bad 
> images(caused by other implementations). Let's keep this wonky assertion 
> until a better fix.
>>
>> So let me give the whole situation another thought on how to improve it.
>> I totally agree with you that currently there is a problem with 
>> fm->used_blocks > 1.
>> I'm just careful (maybe too careful) about changing Fastmap code.
>>I reconsider the WARNON, it can recognize the bad images and fall back 
full scanning mode early. If linux kernel encounters the WARNON, it 
means something wrong with your image(caused by bad UBI implementation). 
I begin to like this strict check.
After comparing WARNON with the assertion:
   WARN_ON(count_fastmap_pebs(ai) != ubi->peb_count - ai->bad_peb_count 
- fm->used_blocks)
   ubi_assert(ubi->good_peb_count == found_pebs)
The 'found_pebs' consists of 'ai->erase', 'ai->free', 'ai->volumes' and 
'ai->fastmap'. The count_fastmap_pebs() includes 'ai->erase', 'ai->free' 
and 'ai->volumes'. This means the number of 'ai->fastmap' equals to 
'fm->used_blocks'. So, the 'ai->fastmap' adding position in my fix is right.
In a word, can you accept the point that 'WARNON' can help us recognize 
the bad fastmap images?

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2
@ 2022-01-17  2:52                               ` Zhihao Cheng
  0 siblings, 0 replies; 76+ messages in thread
From: Zhihao Cheng @ 2022-01-17  2:52 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Miquel Raynal, Vignesh Raghavendra, mcoquelin stm32,
	kirill shutemov, Sascha Hauer, linux-mtd, linux-kernel

Hi Richard,
>> FYI, I think I understand now our disagreement.
>> You assume that old Fastmap PEBs are *guaranteed* to be part of 
>> Fastmap's erase list.
>> That's okay and this is what Linux as of today does.
>>
>> My point is that we need to be paranoid and check carefully for old 
>> Fastmap PEBs
>> which might be *not* on the erase list.
>> I saw such issues in the wild. These were causes by old and/or buggy 
>> Fastmap
>> implementations.
>> Keep in mind that Linux is not the only system that implements UBI 
>> (and fastmap).
> Uh, that is really a point, I met UBI implemented in Vxworks ever. Now, 
> you convinced me, we should process fastmap with considering bad 
> images(caused by other implementations). Let's keep this wonky assertion 
> until a better fix.
>>
>> So let me give the whole situation another thought on how to improve it.
>> I totally agree with you that currently there is a problem with 
>> fm->used_blocks > 1.
>> I'm just careful (maybe too careful) about changing Fastmap code.
>>I reconsider the WARNON, it can recognize the bad images and fall back 
full scanning mode early. If linux kernel encounters the WARNON, it 
means something wrong with your image(caused by bad UBI implementation). 
I begin to like this strict check.
After comparing WARNON with the assertion:
   WARN_ON(count_fastmap_pebs(ai) != ubi->peb_count - ai->bad_peb_count 
- fm->used_blocks)
   ubi_assert(ubi->good_peb_count == found_pebs)
The 'found_pebs' consists of 'ai->erase', 'ai->free', 'ai->volumes' and 
'ai->fastmap'. The count_fastmap_pebs() includes 'ai->erase', 'ai->free' 
and 'ai->volumes'. This means the number of 'ai->fastmap' equals to 
'fm->used_blocks'. So, the 'ai->fastmap' adding position in my fix is right.
In a word, can you accept the point that 'WARNON' can help us recognize 
the bad fastmap images?

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 76+ messages in thread

end of thread, other threads:[~2022-01-17  2:53 UTC | newest]

Thread overview: 76+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-27  3:22 [PATCH v6 00/15] Some bugfixs for ubi/ubifs Zhihao Cheng
2021-12-27  3:22 ` Zhihao Cheng
2021-12-27  3:22 ` [PATCH v6 01/15] ubifs: rename_whiteout: Fix double free for whiteout_ui->data Zhihao Cheng
2021-12-27  3:22   ` Zhihao Cheng
2021-12-27  3:22 ` [PATCH v6 02/15] ubifs: Fix deadlock in concurrent rename whiteout and inode writeback Zhihao Cheng
2021-12-27  3:22   ` Zhihao Cheng
2021-12-27  3:22 ` [PATCH v6 03/15] ubifs: Fix wrong number of inodes locked by ui_mutex in ubifs_inode comment Zhihao Cheng
2021-12-27  3:22   ` Zhihao Cheng
2021-12-27  3:22 ` [PATCH v6 04/15] ubifs: Add missing iput if do_tmpfile() failed in rename whiteout Zhihao Cheng
2021-12-27  3:22   ` Zhihao Cheng
2021-12-27  3:22 ` [PATCH v6 05/15] ubifs: Rename whiteout atomically Zhihao Cheng
2021-12-27  3:22   ` Zhihao Cheng
2022-01-09 21:14   ` Richard Weinberger
2022-01-09 21:14     ` Richard Weinberger
2022-01-10  9:35     ` Zhihao Cheng
2022-01-10  9:35       ` Zhihao Cheng
2022-01-10 10:14       ` Richard Weinberger
2022-01-10 10:14         ` Richard Weinberger
2022-01-10 20:58         ` Richard Weinberger
2022-01-10 20:58           ` Richard Weinberger
2021-12-27  3:22 ` [PATCH v6 06/15] ubifs: Fix 'ui->dirty' race between do_tmpfile() and writeback work Zhihao Cheng
2021-12-27  3:22   ` Zhihao Cheng
2021-12-27  3:22 ` [PATCH v6 07/15] ubifs: Rectify space amount budget for mkdir/tmpfile operations Zhihao Cheng
2021-12-27  3:22   ` Zhihao Cheng
2021-12-27  3:22 ` [PATCH v6 08/15] ubifs: setflags: Make dirtied_ino_d 8 bytes aligned Zhihao Cheng
2021-12-27  3:22   ` Zhihao Cheng
2021-12-27  3:22 ` [PATCH v6 09/15] ubifs: Fix read out-of-bounds in ubifs_wbuf_write_nolock() Zhihao Cheng
2021-12-27  3:22   ` Zhihao Cheng
2021-12-27  3:22 ` [PATCH v6 10/15] ubifs: Fix to add refcount once page is set private Zhihao Cheng
2021-12-27  3:22   ` Zhihao Cheng
2021-12-27  3:22 ` [PATCH v6 11/15] ubi: fastmap: Return error code if memory allocation fails in add_aeb() Zhihao Cheng
2021-12-27  3:22   ` Zhihao Cheng
2021-12-27  3:22 ` [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2 Zhihao Cheng
2021-12-27  3:22   ` Zhihao Cheng
2022-01-10 23:23   ` Richard Weinberger
2022-01-10 23:23     ` Richard Weinberger
2022-01-11  2:48     ` Zhihao Cheng
2022-01-11  2:48       ` Zhihao Cheng
2022-01-11  7:27       ` Richard Weinberger
2022-01-11  7:27         ` Richard Weinberger
2022-01-11 11:44         ` Zhihao Cheng
2022-01-11 11:44           ` Zhihao Cheng
2022-01-11 11:57           ` Richard Weinberger
2022-01-11 11:57             ` Richard Weinberger
2022-01-11 13:23             ` Zhihao Cheng
2022-01-11 13:23               ` Zhihao Cheng
2022-01-11 13:56               ` Richard Weinberger
2022-01-11 13:56                 ` Richard Weinberger
2022-01-12  3:46                 ` Zhihao Cheng
2022-01-12  3:46                   ` Zhihao Cheng
2022-01-14 18:45                   ` Richard Weinberger
2022-01-14 18:45                     ` Richard Weinberger
2022-01-15  8:22                     ` Zhihao Cheng
2022-01-15  8:22                       ` Zhihao Cheng
2022-01-15  8:46                       ` Zhihao Cheng
2022-01-15  8:46                         ` Zhihao Cheng
2022-01-15 10:01                         ` Richard Weinberger
2022-01-15 10:01                           ` Richard Weinberger
2022-01-17  1:31                           ` Zhihao Cheng
2022-01-17  1:31                             ` Zhihao Cheng
2022-01-17  2:52                             ` Zhihao Cheng
2022-01-17  2:52                               ` Zhihao Cheng
2021-12-27  3:22 ` [PATCH v6 13/15] ubifs: ubifs_writepage: Mark page dirty after writing inode failed Zhihao Cheng
2021-12-27  3:22   ` Zhihao Cheng
2021-12-27  3:22 ` [PATCH v6 14/15] ubifs: ubifs_releasepage: Remove ubifs_assert(0) to valid this process Zhihao Cheng
2021-12-27  3:22   ` Zhihao Cheng
2021-12-27  3:22 ` [PATCH v6 15/15] ubi: fastmap: Fix high cpu usage of ubi_bgt by making sure wl_pool not empty Zhihao Cheng
2021-12-27  3:22   ` Zhihao Cheng
2022-01-17  1:40   ` Zhihao Cheng
2022-01-17  1:40     ` Zhihao Cheng
2021-12-27 10:13 ` [PATCH v6 00/15] Some bugfixs for ubi/ubifs Richard Weinberger
2021-12-27 10:13   ` Richard Weinberger
2021-12-27 12:19   ` Zhihao Cheng
2021-12-27 12:19     ` Zhihao Cheng
2021-12-27 13:00     ` Richard Weinberger
2021-12-27 13:00       ` Richard Weinberger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.