linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] f2fs: fix sbi->extent_list corruption issue
@ 2018-11-26  4:47 Sahitya Tummala
  2018-11-26  7:17 ` Chao Yu
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Sahitya Tummala @ 2018-11-26  4:47 UTC (permalink / raw)
  To: Jaegeuk Kim, Chao Yu, linux-f2fs-devel; +Cc: linux-kernel, Sahitya Tummala

When there is a failure in f2fs_fill_super() after/during
the recovery of fsync'd nodes, it frees the current sbi and
retries again. This time the mount is successful, but the files
that got recovered before retry, still holds the extent tree,
whose extent nodes list is corrupted since sbi and sbi->extent_list
is freed up. The list_del corruption issue is observed when the
file system is getting unmounted and when those recoverd files extent
node is being freed up in the below context.

list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)
<...>
kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
task: fffffff1f46f2280 task.stack: ffffff8008068000
lr : __list_del_entry_valid+0x94/0xb4
pc : __list_del_entry_valid+0x94/0xb4
<...>
Call trace:
__list_del_entry_valid+0x94/0xb4
__release_extent_node+0xb0/0x114
__free_extent_tree+0x58/0x7c
f2fs_shrink_extent_tree+0xdc/0x3b0
f2fs_leave_shrinker+0x28/0x7c
f2fs_put_super+0xfc/0x1e0
generic_shutdown_super+0x70/0xf4
kill_block_super+0x2c/0x5c
kill_f2fs_super+0x44/0x50
deactivate_locked_super+0x60/0x8c
deactivate_super+0x68/0x74
cleanup_mnt+0x40/0x78
__cleanup_mnt+0x1c/0x28
task_work_run+0x48/0xd0
do_notify_resume+0x678/0xe98
work_pending+0x8/0x14

Fix this by cleaning up inodes, extent tree and nodes of those
recovered files before freeing up sbi and before next retry.

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
---
v2:
-call evict_inodes() and f2fs_shrink_extent_tree() to cleanup inodes

 fs/f2fs/f2fs.h     |  1 +
 fs/f2fs/shrinker.c |  2 +-
 fs/f2fs/super.c    | 13 ++++++++++++-
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 1e03197..aaee63b 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -3407,6 +3407,7 @@ struct rb_entry *f2fs_lookup_rb_tree_ret(struct rb_root_cached *root,
 bool f2fs_check_rb_tree_consistence(struct f2fs_sb_info *sbi,
 						struct rb_root_cached *root);
 unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink);
+unsigned long __count_extent_cache(struct f2fs_sb_info *sbi);
 bool f2fs_init_extent_tree(struct inode *inode, struct f2fs_extent *i_ext);
 void f2fs_drop_extent_tree(struct inode *inode);
 unsigned int f2fs_destroy_extent_node(struct inode *inode);
diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
index 9e13db9..7e3c13b 100644
--- a/fs/f2fs/shrinker.c
+++ b/fs/f2fs/shrinker.c
@@ -30,7 +30,7 @@ static unsigned long __count_free_nids(struct f2fs_sb_info *sbi)
 	return count > 0 ? count : 0;
 }
 
-static unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
+unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
 {
 	return atomic_read(&sbi->total_zombie_tree) +
 				atomic_read(&sbi->total_ext_node);
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index af58b2c..769e7b1 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -3016,6 +3016,16 @@ static void f2fs_tuning_parameters(struct f2fs_sb_info *sbi)
 	sbi->readdir_ra = 1;
 }
 
+static void f2fs_cleanup_inodes(struct f2fs_sb_info *sbi)
+{
+	struct super_block *sb = sbi->sb;
+
+	sync_filesystem(sb);
+	shrink_dcache_sb(sb);
+	evict_inodes(sb);
+	f2fs_shrink_extent_tree(sbi, __count_extent_cache(sbi));
+}
+
 static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
 {
 	struct f2fs_sb_info *sbi;
@@ -3402,6 +3412,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
 	 * falls into an infinite loop in f2fs_sync_meta_pages().
 	 */
 	truncate_inode_pages_final(META_MAPPING(sbi));
+	/* cleanup recovery and quota inodes */
+	f2fs_cleanup_inodes(sbi);
 	f2fs_unregister_sysfs(sbi);
 free_root_inode:
 	dput(sb->s_root);
@@ -3445,7 +3457,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
 	/* give only one another chance */
 	if (retry) {
 		retry = false;
-		shrink_dcache_sb(sb);
 		goto try_onemore;
 	}
 	return err;
-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] f2fs: fix sbi->extent_list corruption issue
  2018-11-26  4:47 [PATCH v2] f2fs: fix sbi->extent_list corruption issue Sahitya Tummala
@ 2018-11-26  7:17 ` Chao Yu
  2018-11-27  0:30 ` Jaegeuk Kim
  2019-01-04  8:05 ` Sahitya Tummala
  2 siblings, 0 replies; 21+ messages in thread
From: Chao Yu @ 2018-11-26  7:17 UTC (permalink / raw)
  To: Sahitya Tummala, Jaegeuk Kim, linux-f2fs-devel; +Cc: linux-kernel

On 2018/11/26 12:47, Sahitya Tummala wrote:
> When there is a failure in f2fs_fill_super() after/during
> the recovery of fsync'd nodes, it frees the current sbi and
> retries again. This time the mount is successful, but the files
> that got recovered before retry, still holds the extent tree,
> whose extent nodes list is corrupted since sbi and sbi->extent_list
> is freed up. The list_del corruption issue is observed when the
> file system is getting unmounted and when those recoverd files extent
> node is being freed up in the below context.
> 
> list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)
> <...>
> kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
> task: fffffff1f46f2280 task.stack: ffffff8008068000
> lr : __list_del_entry_valid+0x94/0xb4
> pc : __list_del_entry_valid+0x94/0xb4
> <...>
> Call trace:
> __list_del_entry_valid+0x94/0xb4
> __release_extent_node+0xb0/0x114
> __free_extent_tree+0x58/0x7c
> f2fs_shrink_extent_tree+0xdc/0x3b0
> f2fs_leave_shrinker+0x28/0x7c
> f2fs_put_super+0xfc/0x1e0
> generic_shutdown_super+0x70/0xf4
> kill_block_super+0x2c/0x5c
> kill_f2fs_super+0x44/0x50
> deactivate_locked_super+0x60/0x8c
> deactivate_super+0x68/0x74
> cleanup_mnt+0x40/0x78
> __cleanup_mnt+0x1c/0x28
> task_work_run+0x48/0xd0
> do_notify_resume+0x678/0xe98
> work_pending+0x8/0x14
> 
> Fix this by cleaning up inodes, extent tree and nodes of those
> recovered files before freeing up sbi and before next retry.
> 
> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>

Looks good to me now. :)

Reviewed-by: Chao Yu <yuchao0@huawei.com>

Thanks,


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] f2fs: fix sbi->extent_list corruption issue
  2018-11-26  4:47 [PATCH v2] f2fs: fix sbi->extent_list corruption issue Sahitya Tummala
  2018-11-26  7:17 ` Chao Yu
@ 2018-11-27  0:30 ` Jaegeuk Kim
  2018-11-27  1:42   ` Chao Yu
  2019-01-04  8:05 ` Sahitya Tummala
  2 siblings, 1 reply; 21+ messages in thread
From: Jaegeuk Kim @ 2018-11-27  0:30 UTC (permalink / raw)
  To: Sahitya Tummala; +Cc: Chao Yu, linux-f2fs-devel, linux-kernel

On 11/26, Sahitya Tummala wrote:
> When there is a failure in f2fs_fill_super() after/during
> the recovery of fsync'd nodes, it frees the current sbi and
> retries again. This time the mount is successful, but the files
> that got recovered before retry, still holds the extent tree,
> whose extent nodes list is corrupted since sbi and sbi->extent_list
> is freed up. The list_del corruption issue is observed when the
> file system is getting unmounted and when those recoverd files extent
> node is being freed up in the below context.
> 
> list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)
> <...>
> kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
> task: fffffff1f46f2280 task.stack: ffffff8008068000
> lr : __list_del_entry_valid+0x94/0xb4
> pc : __list_del_entry_valid+0x94/0xb4
> <...>
> Call trace:
> __list_del_entry_valid+0x94/0xb4
> __release_extent_node+0xb0/0x114
> __free_extent_tree+0x58/0x7c
> f2fs_shrink_extent_tree+0xdc/0x3b0
> f2fs_leave_shrinker+0x28/0x7c
> f2fs_put_super+0xfc/0x1e0
> generic_shutdown_super+0x70/0xf4
> kill_block_super+0x2c/0x5c
> kill_f2fs_super+0x44/0x50
> deactivate_locked_super+0x60/0x8c
> deactivate_super+0x68/0x74
> cleanup_mnt+0x40/0x78
> __cleanup_mnt+0x1c/0x28
> task_work_run+0x48/0xd0
> do_notify_resume+0x678/0xe98
> work_pending+0x8/0x14
> 
> Fix this by cleaning up inodes, extent tree and nodes of those
> recovered files before freeing up sbi and before next retry.
> 
> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
> ---
> v2:
> -call evict_inodes() and f2fs_shrink_extent_tree() to cleanup inodes
> 
>  fs/f2fs/f2fs.h     |  1 +
>  fs/f2fs/shrinker.c |  2 +-
>  fs/f2fs/super.c    | 13 ++++++++++++-
>  3 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 1e03197..aaee63b 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -3407,6 +3407,7 @@ struct rb_entry *f2fs_lookup_rb_tree_ret(struct rb_root_cached *root,
>  bool f2fs_check_rb_tree_consistence(struct f2fs_sb_info *sbi,
>  						struct rb_root_cached *root);
>  unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink);
> +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi);
>  bool f2fs_init_extent_tree(struct inode *inode, struct f2fs_extent *i_ext);
>  void f2fs_drop_extent_tree(struct inode *inode);
>  unsigned int f2fs_destroy_extent_node(struct inode *inode);
> diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
> index 9e13db9..7e3c13b 100644
> --- a/fs/f2fs/shrinker.c
> +++ b/fs/f2fs/shrinker.c
> @@ -30,7 +30,7 @@ static unsigned long __count_free_nids(struct f2fs_sb_info *sbi)
>  	return count > 0 ? count : 0;
>  }
>  
> -static unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
> +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
>  {
>  	return atomic_read(&sbi->total_zombie_tree) +
>  				atomic_read(&sbi->total_ext_node);
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index af58b2c..769e7b1 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -3016,6 +3016,16 @@ static void f2fs_tuning_parameters(struct f2fs_sb_info *sbi)
>  	sbi->readdir_ra = 1;
>  }
>  
> +static void f2fs_cleanup_inodes(struct f2fs_sb_info *sbi)
> +{
> +	struct super_block *sb = sbi->sb;
> +
> +	sync_filesystem(sb);

This writes another checkpoint, which would not be what this retrial intended.
How about adding a condition in f2fs_may_extent_tree() when adding extents?
Likewise, if (shrinker is not registered) return false;


> +	shrink_dcache_sb(sb);
> +	evict_inodes(sb);
> +	f2fs_shrink_extent_tree(sbi, __count_extent_cache(sbi));
> +}
> +
>  static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>  {
>  	struct f2fs_sb_info *sbi;
> @@ -3402,6 +3412,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>  	 * falls into an infinite loop in f2fs_sync_meta_pages().
>  	 */
>  	truncate_inode_pages_final(META_MAPPING(sbi));
> +	/* cleanup recovery and quota inodes */
> +	f2fs_cleanup_inodes(sbi);
>  	f2fs_unregister_sysfs(sbi);
>  free_root_inode:
>  	dput(sb->s_root);
> @@ -3445,7 +3457,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>  	/* give only one another chance */
>  	if (retry) {
>  		retry = false;
> -		shrink_dcache_sb(sb);
>  		goto try_onemore;
>  	}
>  	return err;
> -- 
> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] f2fs: fix sbi->extent_list corruption issue
  2018-11-27  0:30 ` Jaegeuk Kim
@ 2018-11-27  1:42   ` Chao Yu
  2018-11-29  3:32     ` Sahitya Tummala
  0 siblings, 1 reply; 21+ messages in thread
From: Chao Yu @ 2018-11-27  1:42 UTC (permalink / raw)
  To: Jaegeuk Kim, Sahitya Tummala; +Cc: linux-f2fs-devel, linux-kernel

On 2018/11/27 8:30, Jaegeuk Kim wrote:
> On 11/26, Sahitya Tummala wrote:
>> When there is a failure in f2fs_fill_super() after/during
>> the recovery of fsync'd nodes, it frees the current sbi and
>> retries again. This time the mount is successful, but the files
>> that got recovered before retry, still holds the extent tree,
>> whose extent nodes list is corrupted since sbi and sbi->extent_list
>> is freed up. The list_del corruption issue is observed when the
>> file system is getting unmounted and when those recoverd files extent
>> node is being freed up in the below context.
>>
>> list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)
>> <...>
>> kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
>> task: fffffff1f46f2280 task.stack: ffffff8008068000
>> lr : __list_del_entry_valid+0x94/0xb4
>> pc : __list_del_entry_valid+0x94/0xb4
>> <...>
>> Call trace:
>> __list_del_entry_valid+0x94/0xb4
>> __release_extent_node+0xb0/0x114
>> __free_extent_tree+0x58/0x7c
>> f2fs_shrink_extent_tree+0xdc/0x3b0
>> f2fs_leave_shrinker+0x28/0x7c
>> f2fs_put_super+0xfc/0x1e0
>> generic_shutdown_super+0x70/0xf4
>> kill_block_super+0x2c/0x5c
>> kill_f2fs_super+0x44/0x50
>> deactivate_locked_super+0x60/0x8c
>> deactivate_super+0x68/0x74
>> cleanup_mnt+0x40/0x78
>> __cleanup_mnt+0x1c/0x28
>> task_work_run+0x48/0xd0
>> do_notify_resume+0x678/0xe98
>> work_pending+0x8/0x14
>>
>> Fix this by cleaning up inodes, extent tree and nodes of those
>> recovered files before freeing up sbi and before next retry.
>>
>> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
>> ---
>> v2:
>> -call evict_inodes() and f2fs_shrink_extent_tree() to cleanup inodes
>>
>>  fs/f2fs/f2fs.h     |  1 +
>>  fs/f2fs/shrinker.c |  2 +-
>>  fs/f2fs/super.c    | 13 ++++++++++++-
>>  3 files changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>> index 1e03197..aaee63b 100644
>> --- a/fs/f2fs/f2fs.h
>> +++ b/fs/f2fs/f2fs.h
>> @@ -3407,6 +3407,7 @@ struct rb_entry *f2fs_lookup_rb_tree_ret(struct rb_root_cached *root,
>>  bool f2fs_check_rb_tree_consistence(struct f2fs_sb_info *sbi,
>>  						struct rb_root_cached *root);
>>  unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink);
>> +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi);
>>  bool f2fs_init_extent_tree(struct inode *inode, struct f2fs_extent *i_ext);
>>  void f2fs_drop_extent_tree(struct inode *inode);
>>  unsigned int f2fs_destroy_extent_node(struct inode *inode);
>> diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
>> index 9e13db9..7e3c13b 100644
>> --- a/fs/f2fs/shrinker.c
>> +++ b/fs/f2fs/shrinker.c
>> @@ -30,7 +30,7 @@ static unsigned long __count_free_nids(struct f2fs_sb_info *sbi)
>>  	return count > 0 ? count : 0;
>>  }
>>  
>> -static unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
>> +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
>>  {
>>  	return atomic_read(&sbi->total_zombie_tree) +
>>  				atomic_read(&sbi->total_ext_node);
>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>> index af58b2c..769e7b1 100644
>> --- a/fs/f2fs/super.c
>> +++ b/fs/f2fs/super.c
>> @@ -3016,6 +3016,16 @@ static void f2fs_tuning_parameters(struct f2fs_sb_info *sbi)
>>  	sbi->readdir_ra = 1;
>>  }
>>  
>> +static void f2fs_cleanup_inodes(struct f2fs_sb_info *sbi)
>> +{
>> +	struct super_block *sb = sbi->sb;
>> +
>> +	sync_filesystem(sb);
> 
> This writes another checkpoint, which would not be what this retrial intended.

Actually, checkpoint will not be triggered due to SBI_POR_DOING flag check
as below:

int f2fs_sync_fs(struct super_block *sb, int sync)
{
...
	if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
		return -EAGAIN;
...
}

And also all dirty data/node won't be persisted due to SBI_POR_DOING flag,
IIUC.

Thanks,

> How about adding a condition in f2fs_may_extent_tree() when adding extents?
> Likewise, if (shrinker is not registered) return false;
> 
> 
>> +	shrink_dcache_sb(sb);
>> +	evict_inodes(sb);
>> +	f2fs_shrink_extent_tree(sbi, __count_extent_cache(sbi));
>> +}
>> +
>>  static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>  {
>>  	struct f2fs_sb_info *sbi;
>> @@ -3402,6 +3412,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>  	 * falls into an infinite loop in f2fs_sync_meta_pages().
>>  	 */
>>  	truncate_inode_pages_final(META_MAPPING(sbi));
>> +	/* cleanup recovery and quota inodes */
>> +	f2fs_cleanup_inodes(sbi);
>>  	f2fs_unregister_sysfs(sbi);
>>  free_root_inode:
>>  	dput(sb->s_root);
>> @@ -3445,7 +3457,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>  	/* give only one another chance */
>>  	if (retry) {
>>  		retry = false;
>> -		shrink_dcache_sb(sb);
>>  		goto try_onemore;
>>  	}
>>  	return err;
>> -- 
>> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
> 
> .
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] f2fs: fix sbi->extent_list corruption issue
  2018-11-27  1:42   ` Chao Yu
@ 2018-11-29  3:32     ` Sahitya Tummala
  2018-11-30 20:33       ` Jaegeuk Kim
  0 siblings, 1 reply; 21+ messages in thread
From: Sahitya Tummala @ 2018-11-29  3:32 UTC (permalink / raw)
  To: Chao Yu; +Cc: Jaegeuk Kim, linux-f2fs-devel, linux-kernel


On Tue, Nov 27, 2018 at 09:42:39AM +0800, Chao Yu wrote:
> On 2018/11/27 8:30, Jaegeuk Kim wrote:
> > On 11/26, Sahitya Tummala wrote:
> >> When there is a failure in f2fs_fill_super() after/during
> >> the recovery of fsync'd nodes, it frees the current sbi and
> >> retries again. This time the mount is successful, but the files
> >> that got recovered before retry, still holds the extent tree,
> >> whose extent nodes list is corrupted since sbi and sbi->extent_list
> >> is freed up. The list_del corruption issue is observed when the
> >> file system is getting unmounted and when those recoverd files extent
> >> node is being freed up in the below context.
> >>
> >> list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)
> >> <...>
> >> kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
> >> task: fffffff1f46f2280 task.stack: ffffff8008068000
> >> lr : __list_del_entry_valid+0x94/0xb4
> >> pc : __list_del_entry_valid+0x94/0xb4
> >> <...>
> >> Call trace:
> >> __list_del_entry_valid+0x94/0xb4
> >> __release_extent_node+0xb0/0x114
> >> __free_extent_tree+0x58/0x7c
> >> f2fs_shrink_extent_tree+0xdc/0x3b0
> >> f2fs_leave_shrinker+0x28/0x7c
> >> f2fs_put_super+0xfc/0x1e0
> >> generic_shutdown_super+0x70/0xf4
> >> kill_block_super+0x2c/0x5c
> >> kill_f2fs_super+0x44/0x50
> >> deactivate_locked_super+0x60/0x8c
> >> deactivate_super+0x68/0x74
> >> cleanup_mnt+0x40/0x78
> >> __cleanup_mnt+0x1c/0x28
> >> task_work_run+0x48/0xd0
> >> do_notify_resume+0x678/0xe98
> >> work_pending+0x8/0x14
> >>
> >> Fix this by cleaning up inodes, extent tree and nodes of those
> >> recovered files before freeing up sbi and before next retry.
> >>
> >> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
> >> ---
> >> v2:
> >> -call evict_inodes() and f2fs_shrink_extent_tree() to cleanup inodes
> >>
> >>  fs/f2fs/f2fs.h     |  1 +
> >>  fs/f2fs/shrinker.c |  2 +-
> >>  fs/f2fs/super.c    | 13 ++++++++++++-
> >>  3 files changed, 14 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> >> index 1e03197..aaee63b 100644
> >> --- a/fs/f2fs/f2fs.h
> >> +++ b/fs/f2fs/f2fs.h
> >> @@ -3407,6 +3407,7 @@ struct rb_entry *f2fs_lookup_rb_tree_ret(struct rb_root_cached *root,
> >>  bool f2fs_check_rb_tree_consistence(struct f2fs_sb_info *sbi,
> >>  						struct rb_root_cached *root);
> >>  unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink);
> >> +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi);
> >>  bool f2fs_init_extent_tree(struct inode *inode, struct f2fs_extent *i_ext);
> >>  void f2fs_drop_extent_tree(struct inode *inode);
> >>  unsigned int f2fs_destroy_extent_node(struct inode *inode);
> >> diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
> >> index 9e13db9..7e3c13b 100644
> >> --- a/fs/f2fs/shrinker.c
> >> +++ b/fs/f2fs/shrinker.c
> >> @@ -30,7 +30,7 @@ static unsigned long __count_free_nids(struct f2fs_sb_info *sbi)
> >>  	return count > 0 ? count : 0;
> >>  }
> >>  
> >> -static unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
> >> +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
> >>  {
> >>  	return atomic_read(&sbi->total_zombie_tree) +
> >>  				atomic_read(&sbi->total_ext_node);
> >> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> >> index af58b2c..769e7b1 100644
> >> --- a/fs/f2fs/super.c
> >> +++ b/fs/f2fs/super.c
> >> @@ -3016,6 +3016,16 @@ static void f2fs_tuning_parameters(struct f2fs_sb_info *sbi)
> >>  	sbi->readdir_ra = 1;
> >>  }
> >>  
> >> +static void f2fs_cleanup_inodes(struct f2fs_sb_info *sbi)
> >> +{
> >> +	struct super_block *sb = sbi->sb;
> >> +
> >> +	sync_filesystem(sb);
> > 
> > This writes another checkpoint, which would not be what this retrial intended.
> 
> Actually, checkpoint will not be triggered due to SBI_POR_DOING flag check
> as below:
> 
> int f2fs_sync_fs(struct super_block *sb, int sync)
> {
> ...
> 	if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
> 		return -EAGAIN;
> ...
> }
> 
> And also all dirty data/node won't be persisted due to SBI_POR_DOING flag,
> IIUC.
> 

Thanks Chao for the clarification.

Hi Jaegeuk,

Do you still have any further concerns or comments on this patch?

Thanks,
Sahitya.

> Thanks,
> 
> > How about adding a condition in f2fs_may_extent_tree() when adding extents?
> > Likewise, if (shrinker is not registered) return false;
> > 
> > 
> >> +	shrink_dcache_sb(sb);
> >> +	evict_inodes(sb);
> >> +	f2fs_shrink_extent_tree(sbi, __count_extent_cache(sbi));
> >> +}
> >> +
> >>  static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >>  {
> >>  	struct f2fs_sb_info *sbi;
> >> @@ -3402,6 +3412,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >>  	 * falls into an infinite loop in f2fs_sync_meta_pages().
> >>  	 */
> >>  	truncate_inode_pages_final(META_MAPPING(sbi));
> >> +	/* cleanup recovery and quota inodes */
> >> +	f2fs_cleanup_inodes(sbi);
> >>  	f2fs_unregister_sysfs(sbi);
> >>  free_root_inode:
> >>  	dput(sb->s_root);
> >> @@ -3445,7 +3457,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >>  	/* give only one another chance */
> >>  	if (retry) {
> >>  		retry = false;
> >> -		shrink_dcache_sb(sb);
> >>  		goto try_onemore;
> >>  	}
> >>  	return err;
> >> -- 
> >> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
> >> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
> > 
> > .
> > 
> 

-- 
--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] f2fs: fix sbi->extent_list corruption issue
  2018-11-29  3:32     ` Sahitya Tummala
@ 2018-11-30 20:33       ` Jaegeuk Kim
  2018-12-07  9:47         ` Chao Yu
  0 siblings, 1 reply; 21+ messages in thread
From: Jaegeuk Kim @ 2018-11-30 20:33 UTC (permalink / raw)
  To: Sahitya Tummala; +Cc: Chao Yu, linux-f2fs-devel, linux-kernel

On 11/29, Sahitya Tummala wrote:
> 
> On Tue, Nov 27, 2018 at 09:42:39AM +0800, Chao Yu wrote:
> > On 2018/11/27 8:30, Jaegeuk Kim wrote:
> > > On 11/26, Sahitya Tummala wrote:
> > >> When there is a failure in f2fs_fill_super() after/during
> > >> the recovery of fsync'd nodes, it frees the current sbi and
> > >> retries again. This time the mount is successful, but the files
> > >> that got recovered before retry, still holds the extent tree,
> > >> whose extent nodes list is corrupted since sbi and sbi->extent_list
> > >> is freed up. The list_del corruption issue is observed when the
> > >> file system is getting unmounted and when those recoverd files extent
> > >> node is being freed up in the below context.
> > >>
> > >> list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)
> > >> <...>
> > >> kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
> > >> task: fffffff1f46f2280 task.stack: ffffff8008068000
> > >> lr : __list_del_entry_valid+0x94/0xb4
> > >> pc : __list_del_entry_valid+0x94/0xb4
> > >> <...>
> > >> Call trace:
> > >> __list_del_entry_valid+0x94/0xb4
> > >> __release_extent_node+0xb0/0x114
> > >> __free_extent_tree+0x58/0x7c
> > >> f2fs_shrink_extent_tree+0xdc/0x3b0
> > >> f2fs_leave_shrinker+0x28/0x7c
> > >> f2fs_put_super+0xfc/0x1e0
> > >> generic_shutdown_super+0x70/0xf4
> > >> kill_block_super+0x2c/0x5c
> > >> kill_f2fs_super+0x44/0x50
> > >> deactivate_locked_super+0x60/0x8c
> > >> deactivate_super+0x68/0x74
> > >> cleanup_mnt+0x40/0x78
> > >> __cleanup_mnt+0x1c/0x28
> > >> task_work_run+0x48/0xd0
> > >> do_notify_resume+0x678/0xe98
> > >> work_pending+0x8/0x14
> > >>
> > >> Fix this by cleaning up inodes, extent tree and nodes of those
> > >> recovered files before freeing up sbi and before next retry.
> > >>
> > >> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
> > >> ---
> > >> v2:
> > >> -call evict_inodes() and f2fs_shrink_extent_tree() to cleanup inodes
> > >>
> > >>  fs/f2fs/f2fs.h     |  1 +
> > >>  fs/f2fs/shrinker.c |  2 +-
> > >>  fs/f2fs/super.c    | 13 ++++++++++++-
> > >>  3 files changed, 14 insertions(+), 2 deletions(-)
> > >>
> > >> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > >> index 1e03197..aaee63b 100644
> > >> --- a/fs/f2fs/f2fs.h
> > >> +++ b/fs/f2fs/f2fs.h
> > >> @@ -3407,6 +3407,7 @@ struct rb_entry *f2fs_lookup_rb_tree_ret(struct rb_root_cached *root,
> > >>  bool f2fs_check_rb_tree_consistence(struct f2fs_sb_info *sbi,
> > >>  						struct rb_root_cached *root);
> > >>  unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink);
> > >> +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi);
> > >>  bool f2fs_init_extent_tree(struct inode *inode, struct f2fs_extent *i_ext);
> > >>  void f2fs_drop_extent_tree(struct inode *inode);
> > >>  unsigned int f2fs_destroy_extent_node(struct inode *inode);
> > >> diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
> > >> index 9e13db9..7e3c13b 100644
> > >> --- a/fs/f2fs/shrinker.c
> > >> +++ b/fs/f2fs/shrinker.c
> > >> @@ -30,7 +30,7 @@ static unsigned long __count_free_nids(struct f2fs_sb_info *sbi)
> > >>  	return count > 0 ? count : 0;
> > >>  }
> > >>  
> > >> -static unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
> > >> +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
> > >>  {
> > >>  	return atomic_read(&sbi->total_zombie_tree) +
> > >>  				atomic_read(&sbi->total_ext_node);
> > >> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > >> index af58b2c..769e7b1 100644
> > >> --- a/fs/f2fs/super.c
> > >> +++ b/fs/f2fs/super.c
> > >> @@ -3016,6 +3016,16 @@ static void f2fs_tuning_parameters(struct f2fs_sb_info *sbi)
> > >>  	sbi->readdir_ra = 1;
> > >>  }
> > >>  
> > >> +static void f2fs_cleanup_inodes(struct f2fs_sb_info *sbi)
> > >> +{
> > >> +	struct super_block *sb = sbi->sb;
> > >> +
> > >> +	sync_filesystem(sb);
> > > 
> > > This writes another checkpoint, which would not be what this retrial intended.
> > 
> > Actually, checkpoint will not be triggered due to SBI_POR_DOING flag check
> > as below:
> > 
> > int f2fs_sync_fs(struct super_block *sb, int sync)
> > {
> > ...
> > 	if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
> > 		return -EAGAIN;
> > ...
> > }
> > 
> > And also all dirty data/node won't be persisted due to SBI_POR_DOING flag,
> > IIUC.
> > 
> 
> Thanks Chao for the clarification.
> 
> Hi Jaegeuk,
> 
> Do you still have any further concerns or comments on this patch?

Could you try the below first?

--  How about adding a condition in f2fs_may_extent_tree() when adding extents?
-- Likewise, if (shrinker is not registered) return false;

If we can fix what you described directly, I don't want to rely on such the
assumptions saying we won't do checkpoint. This flow literally says syncing
and evicting cached objects, which opposed to what we'd like to drop all caches
and restart fill_super again.

Let me consider this as a final resolution.

Thanks,

> 
> Thanks,
> Sahitya.
> 
> > Thanks,
> > 
> > > How about adding a condition in f2fs_may_extent_tree() when adding extents?
> > > Likewise, if (shrinker is not registered) return false;
> > > 
> > > 
> > >> +	shrink_dcache_sb(sb);
> > >> +	evict_inodes(sb);
> > >> +	f2fs_shrink_extent_tree(sbi, __count_extent_cache(sbi));
> > >> +}
> > >> +
> > >>  static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> > >>  {
> > >>  	struct f2fs_sb_info *sbi;
> > >> @@ -3402,6 +3412,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> > >>  	 * falls into an infinite loop in f2fs_sync_meta_pages().
> > >>  	 */
> > >>  	truncate_inode_pages_final(META_MAPPING(sbi));
> > >> +	/* cleanup recovery and quota inodes */
> > >> +	f2fs_cleanup_inodes(sbi);
> > >>  	f2fs_unregister_sysfs(sbi);
> > >>  free_root_inode:
> > >>  	dput(sb->s_root);
> > >> @@ -3445,7 +3457,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> > >>  	/* give only one another chance */
> > >>  	if (retry) {
> > >>  		retry = false;
> > >> -		shrink_dcache_sb(sb);
> > >>  		goto try_onemore;
> > >>  	}
> > >>  	return err;
> > >> -- 
> > >> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
> > >> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
> > > 
> > > .
> > > 
> > 
> 
> -- 
> --
> Sent by a consultant of the Qualcomm Innovation Center, Inc.
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] f2fs: fix sbi->extent_list corruption issue
  2018-11-30 20:33       ` Jaegeuk Kim
@ 2018-12-07  9:47         ` Chao Yu
  2018-12-12  3:17           ` Sahitya Tummala
  0 siblings, 1 reply; 21+ messages in thread
From: Chao Yu @ 2018-12-07  9:47 UTC (permalink / raw)
  To: Jaegeuk Kim, Sahitya Tummala; +Cc: linux-f2fs-devel, linux-kernel

On 2018/12/1 4:33, Jaegeuk Kim wrote:
> On 11/29, Sahitya Tummala wrote:
>>
>> On Tue, Nov 27, 2018 at 09:42:39AM +0800, Chao Yu wrote:
>>> On 2018/11/27 8:30, Jaegeuk Kim wrote:
>>>> On 11/26, Sahitya Tummala wrote:
>>>>> When there is a failure in f2fs_fill_super() after/during
>>>>> the recovery of fsync'd nodes, it frees the current sbi and
>>>>> retries again. This time the mount is successful, but the files
>>>>> that got recovered before retry, still holds the extent tree,
>>>>> whose extent nodes list is corrupted since sbi and sbi->extent_list
>>>>> is freed up. The list_del corruption issue is observed when the
>>>>> file system is getting unmounted and when those recoverd files extent
>>>>> node is being freed up in the below context.
>>>>>
>>>>> list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)
>>>>> <...>
>>>>> kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
>>>>> task: fffffff1f46f2280 task.stack: ffffff8008068000
>>>>> lr : __list_del_entry_valid+0x94/0xb4
>>>>> pc : __list_del_entry_valid+0x94/0xb4
>>>>> <...>
>>>>> Call trace:
>>>>> __list_del_entry_valid+0x94/0xb4
>>>>> __release_extent_node+0xb0/0x114
>>>>> __free_extent_tree+0x58/0x7c
>>>>> f2fs_shrink_extent_tree+0xdc/0x3b0
>>>>> f2fs_leave_shrinker+0x28/0x7c
>>>>> f2fs_put_super+0xfc/0x1e0
>>>>> generic_shutdown_super+0x70/0xf4
>>>>> kill_block_super+0x2c/0x5c
>>>>> kill_f2fs_super+0x44/0x50
>>>>> deactivate_locked_super+0x60/0x8c
>>>>> deactivate_super+0x68/0x74
>>>>> cleanup_mnt+0x40/0x78
>>>>> __cleanup_mnt+0x1c/0x28
>>>>> task_work_run+0x48/0xd0
>>>>> do_notify_resume+0x678/0xe98
>>>>> work_pending+0x8/0x14
>>>>>
>>>>> Fix this by cleaning up inodes, extent tree and nodes of those
>>>>> recovered files before freeing up sbi and before next retry.
>>>>>
>>>>> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
>>>>> ---
>>>>> v2:
>>>>> -call evict_inodes() and f2fs_shrink_extent_tree() to cleanup inodes
>>>>>
>>>>>  fs/f2fs/f2fs.h     |  1 +
>>>>>  fs/f2fs/shrinker.c |  2 +-
>>>>>  fs/f2fs/super.c    | 13 ++++++++++++-
>>>>>  3 files changed, 14 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>>>> index 1e03197..aaee63b 100644
>>>>> --- a/fs/f2fs/f2fs.h
>>>>> +++ b/fs/f2fs/f2fs.h
>>>>> @@ -3407,6 +3407,7 @@ struct rb_entry *f2fs_lookup_rb_tree_ret(struct rb_root_cached *root,
>>>>>  bool f2fs_check_rb_tree_consistence(struct f2fs_sb_info *sbi,
>>>>>  						struct rb_root_cached *root);
>>>>>  unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink);
>>>>> +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi);
>>>>>  bool f2fs_init_extent_tree(struct inode *inode, struct f2fs_extent *i_ext);
>>>>>  void f2fs_drop_extent_tree(struct inode *inode);
>>>>>  unsigned int f2fs_destroy_extent_node(struct inode *inode);
>>>>> diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
>>>>> index 9e13db9..7e3c13b 100644
>>>>> --- a/fs/f2fs/shrinker.c
>>>>> +++ b/fs/f2fs/shrinker.c
>>>>> @@ -30,7 +30,7 @@ static unsigned long __count_free_nids(struct f2fs_sb_info *sbi)
>>>>>  	return count > 0 ? count : 0;
>>>>>  }
>>>>>  
>>>>> -static unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
>>>>> +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
>>>>>  {
>>>>>  	return atomic_read(&sbi->total_zombie_tree) +
>>>>>  				atomic_read(&sbi->total_ext_node);
>>>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>>>> index af58b2c..769e7b1 100644
>>>>> --- a/fs/f2fs/super.c
>>>>> +++ b/fs/f2fs/super.c
>>>>> @@ -3016,6 +3016,16 @@ static void f2fs_tuning_parameters(struct f2fs_sb_info *sbi)
>>>>>  	sbi->readdir_ra = 1;
>>>>>  }
>>>>>  
>>>>> +static void f2fs_cleanup_inodes(struct f2fs_sb_info *sbi)
>>>>> +{
>>>>> +	struct super_block *sb = sbi->sb;
>>>>> +
>>>>> +	sync_filesystem(sb);
>>>>
>>>> This writes another checkpoint, which would not be what this retrial intended.
>>>
>>> Actually, checkpoint will not be triggered due to SBI_POR_DOING flag check
>>> as below:
>>>
>>> int f2fs_sync_fs(struct super_block *sb, int sync)
>>> {
>>> ...
>>> 	if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
>>> 		return -EAGAIN;
>>> ...
>>> }
>>>
>>> And also all dirty data/node won't be persisted due to SBI_POR_DOING flag,
>>> IIUC.
>>>
>>
>> Thanks Chao for the clarification.
>>
>> Hi Jaegeuk,
>>
>> Do you still have any further concerns or comments on this patch?
> 
> Could you try the below first?
> 
> --  How about adding a condition in f2fs_may_extent_tree() when adding extents?
> -- Likewise, if (shrinker is not registered) return false;
> 
> If we can fix what you described directly, I don't want to rely on such the
> assumptions saying we won't do checkpoint. This flow literally says syncing
> and evicting cached objects, which opposed to what we'd like to drop all caches
> and restart fill_super again.
> 
> Let me consider this as a final resolution.

Jaegeuk,

Still I want to ask, what kind of scenario we have to add retry logic in
fill_super for? As in android scenario, it must be extreme rare case that
system runs out-of-memory in boot time...at least, I didn't get any kind of
report like that.

Thanks,

> 
> Thanks,
> 
>>
>> Thanks,
>> Sahitya.
>>
>>> Thanks,
>>>
>>>> How about adding a condition in f2fs_may_extent_tree() when adding extents?
>>>> Likewise, if (shrinker is not registered) return false;
>>>>
>>>>
>>>>> +	shrink_dcache_sb(sb);
>>>>> +	evict_inodes(sb);
>>>>> +	f2fs_shrink_extent_tree(sbi, __count_extent_cache(sbi));
>>>>> +}
>>>>> +
>>>>>  static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>>>>  {
>>>>>  	struct f2fs_sb_info *sbi;
>>>>> @@ -3402,6 +3412,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>>>>  	 * falls into an infinite loop in f2fs_sync_meta_pages().
>>>>>  	 */
>>>>>  	truncate_inode_pages_final(META_MAPPING(sbi));
>>>>> +	/* cleanup recovery and quota inodes */
>>>>> +	f2fs_cleanup_inodes(sbi);
>>>>>  	f2fs_unregister_sysfs(sbi);
>>>>>  free_root_inode:
>>>>>  	dput(sb->s_root);
>>>>> @@ -3445,7 +3457,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>>>>  	/* give only one another chance */
>>>>>  	if (retry) {
>>>>>  		retry = false;
>>>>> -		shrink_dcache_sb(sb);
>>>>>  		goto try_onemore;
>>>>>  	}
>>>>>  	return err;
>>>>> -- 
>>>>> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
>>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
>>>>
>>>> .
>>>>
>>>
>>
>> -- 
>> --
>> Sent by a consultant of the Qualcomm Innovation Center, Inc.
>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
> 
> .
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] f2fs: fix sbi->extent_list corruption issue
  2018-12-07  9:47         ` Chao Yu
@ 2018-12-12  3:17           ` Sahitya Tummala
  2018-12-12  3:36             ` Chao Yu
  0 siblings, 1 reply; 21+ messages in thread
From: Sahitya Tummala @ 2018-12-12  3:17 UTC (permalink / raw)
  To: Chao Yu; +Cc: Jaegeuk Kim, linux-f2fs-devel, linux-kernel

On Fri, Dec 07, 2018 at 05:47:31PM +0800, Chao Yu wrote:
> On 2018/12/1 4:33, Jaegeuk Kim wrote:
> > On 11/29, Sahitya Tummala wrote:
> >>
> >> On Tue, Nov 27, 2018 at 09:42:39AM +0800, Chao Yu wrote:
> >>> On 2018/11/27 8:30, Jaegeuk Kim wrote:
> >>>> On 11/26, Sahitya Tummala wrote:
> >>>>> When there is a failure in f2fs_fill_super() after/during
> >>>>> the recovery of fsync'd nodes, it frees the current sbi and
> >>>>> retries again. This time the mount is successful, but the files
> >>>>> that got recovered before retry, still holds the extent tree,
> >>>>> whose extent nodes list is corrupted since sbi and sbi->extent_list
> >>>>> is freed up. The list_del corruption issue is observed when the
> >>>>> file system is getting unmounted and when those recoverd files extent
> >>>>> node is being freed up in the below context.
> >>>>>
> >>>>> list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)
> >>>>> <...>
> >>>>> kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
> >>>>> task: fffffff1f46f2280 task.stack: ffffff8008068000
> >>>>> lr : __list_del_entry_valid+0x94/0xb4
> >>>>> pc : __list_del_entry_valid+0x94/0xb4
> >>>>> <...>
> >>>>> Call trace:
> >>>>> __list_del_entry_valid+0x94/0xb4
> >>>>> __release_extent_node+0xb0/0x114
> >>>>> __free_extent_tree+0x58/0x7c
> >>>>> f2fs_shrink_extent_tree+0xdc/0x3b0
> >>>>> f2fs_leave_shrinker+0x28/0x7c
> >>>>> f2fs_put_super+0xfc/0x1e0
> >>>>> generic_shutdown_super+0x70/0xf4
> >>>>> kill_block_super+0x2c/0x5c
> >>>>> kill_f2fs_super+0x44/0x50
> >>>>> deactivate_locked_super+0x60/0x8c
> >>>>> deactivate_super+0x68/0x74
> >>>>> cleanup_mnt+0x40/0x78
> >>>>> __cleanup_mnt+0x1c/0x28
> >>>>> task_work_run+0x48/0xd0
> >>>>> do_notify_resume+0x678/0xe98
> >>>>> work_pending+0x8/0x14
> >>>>>
> >>>>> Fix this by cleaning up inodes, extent tree and nodes of those
> >>>>> recovered files before freeing up sbi and before next retry.
> >>>>>
> >>>>> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
> >>>>> ---
> >>>>> v2:
> >>>>> -call evict_inodes() and f2fs_shrink_extent_tree() to cleanup inodes
> >>>>>
> >>>>>  fs/f2fs/f2fs.h     |  1 +
> >>>>>  fs/f2fs/shrinker.c |  2 +-
> >>>>>  fs/f2fs/super.c    | 13 ++++++++++++-
> >>>>>  3 files changed, 14 insertions(+), 2 deletions(-)
> >>>>>
> >>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> >>>>> index 1e03197..aaee63b 100644
> >>>>> --- a/fs/f2fs/f2fs.h
> >>>>> +++ b/fs/f2fs/f2fs.h
> >>>>> @@ -3407,6 +3407,7 @@ struct rb_entry *f2fs_lookup_rb_tree_ret(struct rb_root_cached *root,
> >>>>>  bool f2fs_check_rb_tree_consistence(struct f2fs_sb_info *sbi,
> >>>>>  						struct rb_root_cached *root);
> >>>>>  unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink);
> >>>>> +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi);
> >>>>>  bool f2fs_init_extent_tree(struct inode *inode, struct f2fs_extent *i_ext);
> >>>>>  void f2fs_drop_extent_tree(struct inode *inode);
> >>>>>  unsigned int f2fs_destroy_extent_node(struct inode *inode);
> >>>>> diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
> >>>>> index 9e13db9..7e3c13b 100644
> >>>>> --- a/fs/f2fs/shrinker.c
> >>>>> +++ b/fs/f2fs/shrinker.c
> >>>>> @@ -30,7 +30,7 @@ static unsigned long __count_free_nids(struct f2fs_sb_info *sbi)
> >>>>>  	return count > 0 ? count : 0;
> >>>>>  }
> >>>>>  
> >>>>> -static unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
> >>>>> +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
> >>>>>  {
> >>>>>  	return atomic_read(&sbi->total_zombie_tree) +
> >>>>>  				atomic_read(&sbi->total_ext_node);
> >>>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> >>>>> index af58b2c..769e7b1 100644
> >>>>> --- a/fs/f2fs/super.c
> >>>>> +++ b/fs/f2fs/super.c
> >>>>> @@ -3016,6 +3016,16 @@ static void f2fs_tuning_parameters(struct f2fs_sb_info *sbi)
> >>>>>  	sbi->readdir_ra = 1;
> >>>>>  }
> >>>>>  
> >>>>> +static void f2fs_cleanup_inodes(struct f2fs_sb_info *sbi)
> >>>>> +{
> >>>>> +	struct super_block *sb = sbi->sb;
> >>>>> +
> >>>>> +	sync_filesystem(sb);
> >>>>
> >>>> This writes another checkpoint, which would not be what this retrial intended.
> >>>
> >>> Actually, checkpoint will not be triggered due to SBI_POR_DOING flag check
> >>> as below:
> >>>
> >>> int f2fs_sync_fs(struct super_block *sb, int sync)
> >>> {
> >>> ...
> >>> 	if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
> >>> 		return -EAGAIN;
> >>> ...
> >>> }
> >>>
> >>> And also all dirty data/node won't be persisted due to SBI_POR_DOING flag,
> >>> IIUC.
> >>>
> >>
> >> Thanks Chao for the clarification.
> >>
> >> Hi Jaegeuk,
> >>
> >> Do you still have any further concerns or comments on this patch?
> > 
> > Could you try the below first?
> > 
> > --  How about adding a condition in f2fs_may_extent_tree() when adding extents?
> > -- Likewise, if (shrinker is not registered) return false;
> > 
> > If we can fix what you described directly, I don't want to rely on such the
> > assumptions saying we won't do checkpoint. This flow literally says syncing
> > and evicting cached objects, which opposed to what we'd like to drop all caches
> > and restart fill_super again.
> > 
> > Let me consider this as a final resolution.
> 
> Jaegeuk,
> 
> Still I want to ask, what kind of scenario we have to add retry logic in
> fill_super for? As in android scenario, it must be extreme rare case that
> system runs out-of-memory in boot time...at least, I didn't get any kind of
> report like that.
> 
Hi Chao,

In my case, the first boot up has a failure in recovery as below -

F2FS-fs (mmcblk0p75): find_fsync_dnodes: detect looped node chain, blkaddr:1979471, next:1979472
F2FS-fs (mmcblk0p75): Cannot recover all fsync data errno=-22

But in the second attempt, retry will be set to false and because of that
recover_fsync_data() is skipped. This helped mount to be successful in
the second attempt. 

Thanks,

> Thanks,
> 
> > 
> > Thanks,
> > 
> >>
> >> Thanks,
> >> Sahitya.
> >>
> >>> Thanks,
> >>>
> >>>> How about adding a condition in f2fs_may_extent_tree() when adding extents?
> >>>> Likewise, if (shrinker is not registered) return false;
> >>>>
> >>>>
> >>>>> +	shrink_dcache_sb(sb);
> >>>>> +	evict_inodes(sb);
> >>>>> +	f2fs_shrink_extent_tree(sbi, __count_extent_cache(sbi));
> >>>>> +}
> >>>>> +
> >>>>>  static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >>>>>  {
> >>>>>  	struct f2fs_sb_info *sbi;
> >>>>> @@ -3402,6 +3412,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >>>>>  	 * falls into an infinite loop in f2fs_sync_meta_pages().
> >>>>>  	 */
> >>>>>  	truncate_inode_pages_final(META_MAPPING(sbi));
> >>>>> +	/* cleanup recovery and quota inodes */
> >>>>> +	f2fs_cleanup_inodes(sbi);
> >>>>>  	f2fs_unregister_sysfs(sbi);
> >>>>>  free_root_inode:
> >>>>>  	dput(sb->s_root);
> >>>>> @@ -3445,7 +3457,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >>>>>  	/* give only one another chance */
> >>>>>  	if (retry) {
> >>>>>  		retry = false;
> >>>>> -		shrink_dcache_sb(sb);
> >>>>>  		goto try_onemore;
> >>>>>  	}
> >>>>>  	return err;
> >>>>> -- 
> >>>>> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
> >>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
> >>>>
> >>>> .
> >>>>
> >>>
> >>
> >> -- 
> >> --
> >> Sent by a consultant of the Qualcomm Innovation Center, Inc.
> >> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
> > 
> > .
> > 
> 

-- 
--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] f2fs: fix sbi->extent_list corruption issue
  2018-12-12  3:17           ` Sahitya Tummala
@ 2018-12-12  3:36             ` Chao Yu
  2018-12-14  7:56               ` Sahitya Tummala
  0 siblings, 1 reply; 21+ messages in thread
From: Chao Yu @ 2018-12-12  3:36 UTC (permalink / raw)
  To: Sahitya Tummala; +Cc: Jaegeuk Kim, linux-f2fs-devel, linux-kernel

On 2018/12/12 11:17, Sahitya Tummala wrote:
> On Fri, Dec 07, 2018 at 05:47:31PM +0800, Chao Yu wrote:
>> On 2018/12/1 4:33, Jaegeuk Kim wrote:
>>> On 11/29, Sahitya Tummala wrote:
>>>>
>>>> On Tue, Nov 27, 2018 at 09:42:39AM +0800, Chao Yu wrote:
>>>>> On 2018/11/27 8:30, Jaegeuk Kim wrote:
>>>>>> On 11/26, Sahitya Tummala wrote:
>>>>>>> When there is a failure in f2fs_fill_super() after/during
>>>>>>> the recovery of fsync'd nodes, it frees the current sbi and
>>>>>>> retries again. This time the mount is successful, but the files
>>>>>>> that got recovered before retry, still holds the extent tree,
>>>>>>> whose extent nodes list is corrupted since sbi and sbi->extent_list
>>>>>>> is freed up. The list_del corruption issue is observed when the
>>>>>>> file system is getting unmounted and when those recoverd files extent
>>>>>>> node is being freed up in the below context.
>>>>>>>
>>>>>>> list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)
>>>>>>> <...>
>>>>>>> kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
>>>>>>> task: fffffff1f46f2280 task.stack: ffffff8008068000
>>>>>>> lr : __list_del_entry_valid+0x94/0xb4
>>>>>>> pc : __list_del_entry_valid+0x94/0xb4
>>>>>>> <...>
>>>>>>> Call trace:
>>>>>>> __list_del_entry_valid+0x94/0xb4
>>>>>>> __release_extent_node+0xb0/0x114
>>>>>>> __free_extent_tree+0x58/0x7c
>>>>>>> f2fs_shrink_extent_tree+0xdc/0x3b0
>>>>>>> f2fs_leave_shrinker+0x28/0x7c
>>>>>>> f2fs_put_super+0xfc/0x1e0
>>>>>>> generic_shutdown_super+0x70/0xf4
>>>>>>> kill_block_super+0x2c/0x5c
>>>>>>> kill_f2fs_super+0x44/0x50
>>>>>>> deactivate_locked_super+0x60/0x8c
>>>>>>> deactivate_super+0x68/0x74
>>>>>>> cleanup_mnt+0x40/0x78
>>>>>>> __cleanup_mnt+0x1c/0x28
>>>>>>> task_work_run+0x48/0xd0
>>>>>>> do_notify_resume+0x678/0xe98
>>>>>>> work_pending+0x8/0x14
>>>>>>>
>>>>>>> Fix this by cleaning up inodes, extent tree and nodes of those
>>>>>>> recovered files before freeing up sbi and before next retry.
>>>>>>>
>>>>>>> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
>>>>>>> ---
>>>>>>> v2:
>>>>>>> -call evict_inodes() and f2fs_shrink_extent_tree() to cleanup inodes
>>>>>>>
>>>>>>>  fs/f2fs/f2fs.h     |  1 +
>>>>>>>  fs/f2fs/shrinker.c |  2 +-
>>>>>>>  fs/f2fs/super.c    | 13 ++++++++++++-
>>>>>>>  3 files changed, 14 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>>>>>> index 1e03197..aaee63b 100644
>>>>>>> --- a/fs/f2fs/f2fs.h
>>>>>>> +++ b/fs/f2fs/f2fs.h
>>>>>>> @@ -3407,6 +3407,7 @@ struct rb_entry *f2fs_lookup_rb_tree_ret(struct rb_root_cached *root,
>>>>>>>  bool f2fs_check_rb_tree_consistence(struct f2fs_sb_info *sbi,
>>>>>>>  						struct rb_root_cached *root);
>>>>>>>  unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink);
>>>>>>> +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi);
>>>>>>>  bool f2fs_init_extent_tree(struct inode *inode, struct f2fs_extent *i_ext);
>>>>>>>  void f2fs_drop_extent_tree(struct inode *inode);
>>>>>>>  unsigned int f2fs_destroy_extent_node(struct inode *inode);
>>>>>>> diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
>>>>>>> index 9e13db9..7e3c13b 100644
>>>>>>> --- a/fs/f2fs/shrinker.c
>>>>>>> +++ b/fs/f2fs/shrinker.c
>>>>>>> @@ -30,7 +30,7 @@ static unsigned long __count_free_nids(struct f2fs_sb_info *sbi)
>>>>>>>  	return count > 0 ? count : 0;
>>>>>>>  }
>>>>>>>  
>>>>>>> -static unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
>>>>>>> +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
>>>>>>>  {
>>>>>>>  	return atomic_read(&sbi->total_zombie_tree) +
>>>>>>>  				atomic_read(&sbi->total_ext_node);
>>>>>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>>>>>> index af58b2c..769e7b1 100644
>>>>>>> --- a/fs/f2fs/super.c
>>>>>>> +++ b/fs/f2fs/super.c
>>>>>>> @@ -3016,6 +3016,16 @@ static void f2fs_tuning_parameters(struct f2fs_sb_info *sbi)
>>>>>>>  	sbi->readdir_ra = 1;
>>>>>>>  }
>>>>>>>  
>>>>>>> +static void f2fs_cleanup_inodes(struct f2fs_sb_info *sbi)
>>>>>>> +{
>>>>>>> +	struct super_block *sb = sbi->sb;
>>>>>>> +
>>>>>>> +	sync_filesystem(sb);
>>>>>>
>>>>>> This writes another checkpoint, which would not be what this retrial intended.
>>>>>
>>>>> Actually, checkpoint will not be triggered due to SBI_POR_DOING flag check
>>>>> as below:
>>>>>
>>>>> int f2fs_sync_fs(struct super_block *sb, int sync)
>>>>> {
>>>>> ...
>>>>> 	if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
>>>>> 		return -EAGAIN;
>>>>> ...
>>>>> }
>>>>>
>>>>> And also all dirty data/node won't be persisted due to SBI_POR_DOING flag,
>>>>> IIUC.
>>>>>
>>>>
>>>> Thanks Chao for the clarification.
>>>>
>>>> Hi Jaegeuk,
>>>>
>>>> Do you still have any further concerns or comments on this patch?
>>>
>>> Could you try the below first?
>>>
>>> --  How about adding a condition in f2fs_may_extent_tree() when adding extents?
>>> -- Likewise, if (shrinker is not registered) return false;
>>>
>>> If we can fix what you described directly, I don't want to rely on such the
>>> assumptions saying we won't do checkpoint. This flow literally says syncing
>>> and evicting cached objects, which opposed to what we'd like to drop all caches
>>> and restart fill_super again.
>>>
>>> Let me consider this as a final resolution.
>>
>> Jaegeuk,
>>
>> Still I want to ask, what kind of scenario we have to add retry logic in
>> fill_super for? As in android scenario, it must be extreme rare case that
>> system runs out-of-memory in boot time...at least, I didn't get any kind of
>> report like that.
>>
> Hi Chao,

Hi Sahitya,

Thanks for letting me know that, I git-blamed the code, and found the
original intention is like what you described:

commit ed2e621a95d704e6a4e904cc00524e8cbddda0c2
Author: Jaegeuk Kim <jaegeuk@kernel.org>
Date:   Fri Aug 8 15:37:41 2014 -0700

    f2fs: give a chance to mount again when encountering errors

    This patch gives another chance to try mount process when we encounter
an error.
    This makes an effect on the roll-forward recovery failures as well.

But I doubt that if we failed in recovery, maybe there is corruption in
this image, would it be better to fail the mount, and let user fsck it and
retry the mount? otherwise, the corruption may be expanded...

Thanks,

> 
> In my case, the first boot up has a failure in recovery as below -
> 
> F2FS-fs (mmcblk0p75): find_fsync_dnodes: detect looped node chain, blkaddr:1979471, next:1979472
> F2FS-fs (mmcblk0p75): Cannot recover all fsync data errno=-22
> 
> But in the second attempt, retry will be set to false and because of that
> recover_fsync_data() is skipped. This helped mount to be successful in
> the second attempt. 
> 
> Thanks,
> 
>> Thanks,
>>
>>>
>>> Thanks,
>>>
>>>>
>>>> Thanks,
>>>> Sahitya.
>>>>
>>>>> Thanks,
>>>>>
>>>>>> How about adding a condition in f2fs_may_extent_tree() when adding extents?
>>>>>> Likewise, if (shrinker is not registered) return false;
>>>>>>
>>>>>>
>>>>>>> +	shrink_dcache_sb(sb);
>>>>>>> +	evict_inodes(sb);
>>>>>>> +	f2fs_shrink_extent_tree(sbi, __count_extent_cache(sbi));
>>>>>>> +}
>>>>>>> +
>>>>>>>  static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>>>>>>  {
>>>>>>>  	struct f2fs_sb_info *sbi;
>>>>>>> @@ -3402,6 +3412,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>>>>>>  	 * falls into an infinite loop in f2fs_sync_meta_pages().
>>>>>>>  	 */
>>>>>>>  	truncate_inode_pages_final(META_MAPPING(sbi));
>>>>>>> +	/* cleanup recovery and quota inodes */
>>>>>>> +	f2fs_cleanup_inodes(sbi);
>>>>>>>  	f2fs_unregister_sysfs(sbi);
>>>>>>>  free_root_inode:
>>>>>>>  	dput(sb->s_root);
>>>>>>> @@ -3445,7 +3457,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>>>>>>  	/* give only one another chance */
>>>>>>>  	if (retry) {
>>>>>>>  		retry = false;
>>>>>>> -		shrink_dcache_sb(sb);
>>>>>>>  		goto try_onemore;
>>>>>>>  	}
>>>>>>>  	return err;
>>>>>>> -- 
>>>>>>> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
>>>>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
>>>>>>
>>>>>> .
>>>>>>
>>>>>
>>>>
>>>> -- 
>>>> --
>>>> Sent by a consultant of the Qualcomm Innovation Center, Inc.
>>>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
>>>
>>> .
>>>
>>
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] f2fs: fix sbi->extent_list corruption issue
  2018-12-12  3:36             ` Chao Yu
@ 2018-12-14  7:56               ` Sahitya Tummala
  2018-12-14 14:25                 ` Jaegeuk Kim
  0 siblings, 1 reply; 21+ messages in thread
From: Sahitya Tummala @ 2018-12-14  7:56 UTC (permalink / raw)
  To: Chao Yu; +Cc: Jaegeuk Kim, linux-f2fs-devel, linux-kernel

On Wed, Dec 12, 2018 at 11:36:08AM +0800, Chao Yu wrote:
> On 2018/12/12 11:17, Sahitya Tummala wrote:
> > On Fri, Dec 07, 2018 at 05:47:31PM +0800, Chao Yu wrote:
> >> On 2018/12/1 4:33, Jaegeuk Kim wrote:
> >>> On 11/29, Sahitya Tummala wrote:
> >>>>
> >>>> On Tue, Nov 27, 2018 at 09:42:39AM +0800, Chao Yu wrote:
> >>>>> On 2018/11/27 8:30, Jaegeuk Kim wrote:
> >>>>>> On 11/26, Sahitya Tummala wrote:
> >>>>>>> When there is a failure in f2fs_fill_super() after/during
> >>>>>>> the recovery of fsync'd nodes, it frees the current sbi and
> >>>>>>> retries again. This time the mount is successful, but the files
> >>>>>>> that got recovered before retry, still holds the extent tree,
> >>>>>>> whose extent nodes list is corrupted since sbi and sbi->extent_list
> >>>>>>> is freed up. The list_del corruption issue is observed when the
> >>>>>>> file system is getting unmounted and when those recoverd files extent
> >>>>>>> node is being freed up in the below context.
> >>>>>>>
> >>>>>>> list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)
> >>>>>>> <...>
> >>>>>>> kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
> >>>>>>> task: fffffff1f46f2280 task.stack: ffffff8008068000
> >>>>>>> lr : __list_del_entry_valid+0x94/0xb4
> >>>>>>> pc : __list_del_entry_valid+0x94/0xb4
> >>>>>>> <...>
> >>>>>>> Call trace:
> >>>>>>> __list_del_entry_valid+0x94/0xb4
> >>>>>>> __release_extent_node+0xb0/0x114
> >>>>>>> __free_extent_tree+0x58/0x7c
> >>>>>>> f2fs_shrink_extent_tree+0xdc/0x3b0
> >>>>>>> f2fs_leave_shrinker+0x28/0x7c
> >>>>>>> f2fs_put_super+0xfc/0x1e0
> >>>>>>> generic_shutdown_super+0x70/0xf4
> >>>>>>> kill_block_super+0x2c/0x5c
> >>>>>>> kill_f2fs_super+0x44/0x50
> >>>>>>> deactivate_locked_super+0x60/0x8c
> >>>>>>> deactivate_super+0x68/0x74
> >>>>>>> cleanup_mnt+0x40/0x78
> >>>>>>> __cleanup_mnt+0x1c/0x28
> >>>>>>> task_work_run+0x48/0xd0
> >>>>>>> do_notify_resume+0x678/0xe98
> >>>>>>> work_pending+0x8/0x14
> >>>>>>>
> >>>>>>> Fix this by cleaning up inodes, extent tree and nodes of those
> >>>>>>> recovered files before freeing up sbi and before next retry.
> >>>>>>>
> >>>>>>> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
> >>>>>>> ---
> >>>>>>> v2:
> >>>>>>> -call evict_inodes() and f2fs_shrink_extent_tree() to cleanup inodes
> >>>>>>>
> >>>>>>>  fs/f2fs/f2fs.h     |  1 +
> >>>>>>>  fs/f2fs/shrinker.c |  2 +-
> >>>>>>>  fs/f2fs/super.c    | 13 ++++++++++++-
> >>>>>>>  3 files changed, 14 insertions(+), 2 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> >>>>>>> index 1e03197..aaee63b 100644
> >>>>>>> --- a/fs/f2fs/f2fs.h
> >>>>>>> +++ b/fs/f2fs/f2fs.h
> >>>>>>> @@ -3407,6 +3407,7 @@ struct rb_entry *f2fs_lookup_rb_tree_ret(struct rb_root_cached *root,
> >>>>>>>  bool f2fs_check_rb_tree_consistence(struct f2fs_sb_info *sbi,
> >>>>>>>  						struct rb_root_cached *root);
> >>>>>>>  unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink);
> >>>>>>> +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi);
> >>>>>>>  bool f2fs_init_extent_tree(struct inode *inode, struct f2fs_extent *i_ext);
> >>>>>>>  void f2fs_drop_extent_tree(struct inode *inode);
> >>>>>>>  unsigned int f2fs_destroy_extent_node(struct inode *inode);
> >>>>>>> diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
> >>>>>>> index 9e13db9..7e3c13b 100644
> >>>>>>> --- a/fs/f2fs/shrinker.c
> >>>>>>> +++ b/fs/f2fs/shrinker.c
> >>>>>>> @@ -30,7 +30,7 @@ static unsigned long __count_free_nids(struct f2fs_sb_info *sbi)
> >>>>>>>  	return count > 0 ? count : 0;
> >>>>>>>  }
> >>>>>>>  
> >>>>>>> -static unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
> >>>>>>> +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
> >>>>>>>  {
> >>>>>>>  	return atomic_read(&sbi->total_zombie_tree) +
> >>>>>>>  				atomic_read(&sbi->total_ext_node);
> >>>>>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> >>>>>>> index af58b2c..769e7b1 100644
> >>>>>>> --- a/fs/f2fs/super.c
> >>>>>>> +++ b/fs/f2fs/super.c
> >>>>>>> @@ -3016,6 +3016,16 @@ static void f2fs_tuning_parameters(struct f2fs_sb_info *sbi)
> >>>>>>>  	sbi->readdir_ra = 1;
> >>>>>>>  }
> >>>>>>>  
> >>>>>>> +static void f2fs_cleanup_inodes(struct f2fs_sb_info *sbi)
> >>>>>>> +{
> >>>>>>> +	struct super_block *sb = sbi->sb;
> >>>>>>> +
> >>>>>>> +	sync_filesystem(sb);
> >>>>>>
> >>>>>> This writes another checkpoint, which would not be what this retrial intended.
> >>>>>
> >>>>> Actually, checkpoint will not be triggered due to SBI_POR_DOING flag check
> >>>>> as below:
> >>>>>
> >>>>> int f2fs_sync_fs(struct super_block *sb, int sync)
> >>>>> {
> >>>>> ...
> >>>>> 	if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
> >>>>> 		return -EAGAIN;
> >>>>> ...
> >>>>> }
> >>>>>
> >>>>> And also all dirty data/node won't be persisted due to SBI_POR_DOING flag,
> >>>>> IIUC.
> >>>>>
> >>>>
> >>>> Thanks Chao for the clarification.
> >>>>
> >>>> Hi Jaegeuk,
> >>>>
> >>>> Do you still have any further concerns or comments on this patch?
> >>>
> >>> Could you try the below first?
> >>>
> >>> --  How about adding a condition in f2fs_may_extent_tree() when adding extents?
> >>> -- Likewise, if (shrinker is not registered) return false;
> >>>
> >>> If we can fix what you described directly, I don't want to rely on such the
> >>> assumptions saying we won't do checkpoint. This flow literally says syncing
> >>> and evicting cached objects, which opposed to what we'd like to drop all caches
> >>> and restart fill_super again.
> >>>
> >>> Let me consider this as a final resolution.
> >>
> >> Jaegeuk,
> >>
> >> Still I want to ask, what kind of scenario we have to add retry logic in
> >> fill_super for? As in android scenario, it must be extreme rare case that
> >> system runs out-of-memory in boot time...at least, I didn't get any kind of
> >> report like that.
> >>
> > Hi Chao,
> 
> Hi Sahitya,
> 
> Thanks for letting me know that, I git-blamed the code, and found the
> original intention is like what you described:
> 
> commit ed2e621a95d704e6a4e904cc00524e8cbddda0c2
> Author: Jaegeuk Kim <jaegeuk@kernel.org>
> Date:   Fri Aug 8 15:37:41 2014 -0700
> 
>     f2fs: give a chance to mount again when encountering errors
> 
>     This patch gives another chance to try mount process when we encounter
> an error.
>     This makes an effect on the roll-forward recovery failures as well.
> 
> But I doubt that if we failed in recovery, maybe there is corruption in
> this image, would it be better to fail the mount, and let user fsck it and
> retry the mount? otherwise, the corruption may be expanded...
> 

Hi Jaegeuk,

How do you think about this? If you think it is okay, then I will fix the
sbi->extent_list corruption issue, by removing the retry logic. Otherwise,
I will fix it in the extent handling as you have suggested earlier.

Thanks,

> Thanks,
> 
> > 
> > In my case, the first boot up has a failure in recovery as below -
> > 
> > F2FS-fs (mmcblk0p75): find_fsync_dnodes: detect looped node chain, blkaddr:1979471, next:1979472
> > F2FS-fs (mmcblk0p75): Cannot recover all fsync data errno=-22
> > 
> > But in the second attempt, retry will be set to false and because of that
> > recover_fsync_data() is skipped. This helped mount to be successful in
> > the second attempt. 
> > 
> > Thanks,
> > 
> >> Thanks,
> >>
> >>>
> >>> Thanks,
> >>>
> >>>>
> >>>> Thanks,
> >>>> Sahitya.
> >>>>
> >>>>> Thanks,
> >>>>>
> >>>>>> How about adding a condition in f2fs_may_extent_tree() when adding extents?
> >>>>>> Likewise, if (shrinker is not registered) return false;
> >>>>>>
> >>>>>>
> >>>>>>> +	shrink_dcache_sb(sb);
> >>>>>>> +	evict_inodes(sb);
> >>>>>>> +	f2fs_shrink_extent_tree(sbi, __count_extent_cache(sbi));
> >>>>>>> +}
> >>>>>>> +
> >>>>>>>  static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >>>>>>>  {
> >>>>>>>  	struct f2fs_sb_info *sbi;
> >>>>>>> @@ -3402,6 +3412,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >>>>>>>  	 * falls into an infinite loop in f2fs_sync_meta_pages().
> >>>>>>>  	 */
> >>>>>>>  	truncate_inode_pages_final(META_MAPPING(sbi));
> >>>>>>> +	/* cleanup recovery and quota inodes */
> >>>>>>> +	f2fs_cleanup_inodes(sbi);
> >>>>>>>  	f2fs_unregister_sysfs(sbi);
> >>>>>>>  free_root_inode:
> >>>>>>>  	dput(sb->s_root);
> >>>>>>> @@ -3445,7 +3457,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >>>>>>>  	/* give only one another chance */
> >>>>>>>  	if (retry) {
> >>>>>>>  		retry = false;
> >>>>>>> -		shrink_dcache_sb(sb);
> >>>>>>>  		goto try_onemore;
> >>>>>>>  	}
> >>>>>>>  	return err;
> >>>>>>> -- 
> >>>>>>> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
> >>>>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
> >>>>>>
> >>>>>> .
> >>>>>>
> >>>>>
> >>>>
> >>>> -- 
> >>>> --
> >>>> Sent by a consultant of the Qualcomm Innovation Center, Inc.
> >>>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
> >>>
> >>> .
> >>>
> >>
> > 
> 

-- 
--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] f2fs: fix sbi->extent_list corruption issue
  2018-12-14  7:56               ` Sahitya Tummala
@ 2018-12-14 14:25                 ` Jaegeuk Kim
  2018-12-18 10:28                   ` Chao Yu
  0 siblings, 1 reply; 21+ messages in thread
From: Jaegeuk Kim @ 2018-12-14 14:25 UTC (permalink / raw)
  To: Sahitya Tummala; +Cc: Chao Yu, linux-f2fs-devel, linux-kernel

On 12/14, Sahitya Tummala wrote:
> On Wed, Dec 12, 2018 at 11:36:08AM +0800, Chao Yu wrote:
> > On 2018/12/12 11:17, Sahitya Tummala wrote:
> > > On Fri, Dec 07, 2018 at 05:47:31PM +0800, Chao Yu wrote:
> > >> On 2018/12/1 4:33, Jaegeuk Kim wrote:
> > >>> On 11/29, Sahitya Tummala wrote:
> > >>>>
> > >>>> On Tue, Nov 27, 2018 at 09:42:39AM +0800, Chao Yu wrote:
> > >>>>> On 2018/11/27 8:30, Jaegeuk Kim wrote:
> > >>>>>> On 11/26, Sahitya Tummala wrote:
> > >>>>>>> When there is a failure in f2fs_fill_super() after/during
> > >>>>>>> the recovery of fsync'd nodes, it frees the current sbi and
> > >>>>>>> retries again. This time the mount is successful, but the files
> > >>>>>>> that got recovered before retry, still holds the extent tree,
> > >>>>>>> whose extent nodes list is corrupted since sbi and sbi->extent_list
> > >>>>>>> is freed up. The list_del corruption issue is observed when the
> > >>>>>>> file system is getting unmounted and when those recoverd files extent
> > >>>>>>> node is being freed up in the below context.
> > >>>>>>>
> > >>>>>>> list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)
> > >>>>>>> <...>
> > >>>>>>> kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
> > >>>>>>> task: fffffff1f46f2280 task.stack: ffffff8008068000
> > >>>>>>> lr : __list_del_entry_valid+0x94/0xb4
> > >>>>>>> pc : __list_del_entry_valid+0x94/0xb4
> > >>>>>>> <...>
> > >>>>>>> Call trace:
> > >>>>>>> __list_del_entry_valid+0x94/0xb4
> > >>>>>>> __release_extent_node+0xb0/0x114
> > >>>>>>> __free_extent_tree+0x58/0x7c
> > >>>>>>> f2fs_shrink_extent_tree+0xdc/0x3b0
> > >>>>>>> f2fs_leave_shrinker+0x28/0x7c
> > >>>>>>> f2fs_put_super+0xfc/0x1e0
> > >>>>>>> generic_shutdown_super+0x70/0xf4
> > >>>>>>> kill_block_super+0x2c/0x5c
> > >>>>>>> kill_f2fs_super+0x44/0x50
> > >>>>>>> deactivate_locked_super+0x60/0x8c
> > >>>>>>> deactivate_super+0x68/0x74
> > >>>>>>> cleanup_mnt+0x40/0x78
> > >>>>>>> __cleanup_mnt+0x1c/0x28
> > >>>>>>> task_work_run+0x48/0xd0
> > >>>>>>> do_notify_resume+0x678/0xe98
> > >>>>>>> work_pending+0x8/0x14
> > >>>>>>>
> > >>>>>>> Fix this by cleaning up inodes, extent tree and nodes of those
> > >>>>>>> recovered files before freeing up sbi and before next retry.
> > >>>>>>>
> > >>>>>>> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
> > >>>>>>> ---
> > >>>>>>> v2:
> > >>>>>>> -call evict_inodes() and f2fs_shrink_extent_tree() to cleanup inodes
> > >>>>>>>
> > >>>>>>>  fs/f2fs/f2fs.h     |  1 +
> > >>>>>>>  fs/f2fs/shrinker.c |  2 +-
> > >>>>>>>  fs/f2fs/super.c    | 13 ++++++++++++-
> > >>>>>>>  3 files changed, 14 insertions(+), 2 deletions(-)
> > >>>>>>>
> > >>>>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > >>>>>>> index 1e03197..aaee63b 100644
> > >>>>>>> --- a/fs/f2fs/f2fs.h
> > >>>>>>> +++ b/fs/f2fs/f2fs.h
> > >>>>>>> @@ -3407,6 +3407,7 @@ struct rb_entry *f2fs_lookup_rb_tree_ret(struct rb_root_cached *root,
> > >>>>>>>  bool f2fs_check_rb_tree_consistence(struct f2fs_sb_info *sbi,
> > >>>>>>>  						struct rb_root_cached *root);
> > >>>>>>>  unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink);
> > >>>>>>> +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi);
> > >>>>>>>  bool f2fs_init_extent_tree(struct inode *inode, struct f2fs_extent *i_ext);
> > >>>>>>>  void f2fs_drop_extent_tree(struct inode *inode);
> > >>>>>>>  unsigned int f2fs_destroy_extent_node(struct inode *inode);
> > >>>>>>> diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
> > >>>>>>> index 9e13db9..7e3c13b 100644
> > >>>>>>> --- a/fs/f2fs/shrinker.c
> > >>>>>>> +++ b/fs/f2fs/shrinker.c
> > >>>>>>> @@ -30,7 +30,7 @@ static unsigned long __count_free_nids(struct f2fs_sb_info *sbi)
> > >>>>>>>  	return count > 0 ? count : 0;
> > >>>>>>>  }
> > >>>>>>>  
> > >>>>>>> -static unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
> > >>>>>>> +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
> > >>>>>>>  {
> > >>>>>>>  	return atomic_read(&sbi->total_zombie_tree) +
> > >>>>>>>  				atomic_read(&sbi->total_ext_node);
> > >>>>>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > >>>>>>> index af58b2c..769e7b1 100644
> > >>>>>>> --- a/fs/f2fs/super.c
> > >>>>>>> +++ b/fs/f2fs/super.c
> > >>>>>>> @@ -3016,6 +3016,16 @@ static void f2fs_tuning_parameters(struct f2fs_sb_info *sbi)
> > >>>>>>>  	sbi->readdir_ra = 1;
> > >>>>>>>  }
> > >>>>>>>  
> > >>>>>>> +static void f2fs_cleanup_inodes(struct f2fs_sb_info *sbi)
> > >>>>>>> +{
> > >>>>>>> +	struct super_block *sb = sbi->sb;
> > >>>>>>> +
> > >>>>>>> +	sync_filesystem(sb);
> > >>>>>>
> > >>>>>> This writes another checkpoint, which would not be what this retrial intended.
> > >>>>>
> > >>>>> Actually, checkpoint will not be triggered due to SBI_POR_DOING flag check
> > >>>>> as below:
> > >>>>>
> > >>>>> int f2fs_sync_fs(struct super_block *sb, int sync)
> > >>>>> {
> > >>>>> ...
> > >>>>> 	if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
> > >>>>> 		return -EAGAIN;
> > >>>>> ...
> > >>>>> }
> > >>>>>
> > >>>>> And also all dirty data/node won't be persisted due to SBI_POR_DOING flag,
> > >>>>> IIUC.
> > >>>>>
> > >>>>
> > >>>> Thanks Chao for the clarification.
> > >>>>
> > >>>> Hi Jaegeuk,
> > >>>>
> > >>>> Do you still have any further concerns or comments on this patch?
> > >>>
> > >>> Could you try the below first?
> > >>>
> > >>> --  How about adding a condition in f2fs_may_extent_tree() when adding extents?
> > >>> -- Likewise, if (shrinker is not registered) return false;
> > >>>
> > >>> If we can fix what you described directly, I don't want to rely on such the
> > >>> assumptions saying we won't do checkpoint. This flow literally says syncing
> > >>> and evicting cached objects, which opposed to what we'd like to drop all caches
> > >>> and restart fill_super again.
> > >>>
> > >>> Let me consider this as a final resolution.
> > >>
> > >> Jaegeuk,
> > >>
> > >> Still I want to ask, what kind of scenario we have to add retry logic in
> > >> fill_super for? As in android scenario, it must be extreme rare case that
> > >> system runs out-of-memory in boot time...at least, I didn't get any kind of
> > >> report like that.
> > >>
> > > Hi Chao,
> > 
> > Hi Sahitya,
> > 
> > Thanks for letting me know that, I git-blamed the code, and found the
> > original intention is like what you described:
> > 
> > commit ed2e621a95d704e6a4e904cc00524e8cbddda0c2
> > Author: Jaegeuk Kim <jaegeuk@kernel.org>
> > Date:   Fri Aug 8 15:37:41 2014 -0700
> > 
> >     f2fs: give a chance to mount again when encountering errors
> > 
> >     This patch gives another chance to try mount process when we encounter
> > an error.
> >     This makes an effect on the roll-forward recovery failures as well.
> > 
> > But I doubt that if we failed in recovery, maybe there is corruption in
> > this image, would it be better to fail the mount, and let user fsck it and
> > retry the mount? otherwise, the corruption may be expanded...

The problem was there was no way to recover roll-forward area by fsck. IOWs,
mount was failing all the time. I don't think roll-forward itself can corrupt
the image more. Please report, if you have any issue on this.

> > 
> 
> Hi Jaegeuk,
> 
> How do you think about this? If you think it is okay, then I will fix the
> sbi->extent_list corruption issue, by removing the retry logic. Otherwise,
> I will fix it in the extent handling as you have suggested earlier.

I'd like to keep retry logic, so could you please test what I suggested above?

Thanks,

> 
> Thanks,
> 
> > Thanks,
> > 
> > > 
> > > In my case, the first boot up has a failure in recovery as below -
> > > 
> > > F2FS-fs (mmcblk0p75): find_fsync_dnodes: detect looped node chain, blkaddr:1979471, next:1979472
> > > F2FS-fs (mmcblk0p75): Cannot recover all fsync data errno=-22
> > > 
> > > But in the second attempt, retry will be set to false and because of that
> > > recover_fsync_data() is skipped. This helped mount to be successful in
> > > the second attempt. 
> > > 
> > > Thanks,
> > > 
> > >> Thanks,
> > >>
> > >>>
> > >>> Thanks,
> > >>>
> > >>>>
> > >>>> Thanks,
> > >>>> Sahitya.
> > >>>>
> > >>>>> Thanks,
> > >>>>>
> > >>>>>> How about adding a condition in f2fs_may_extent_tree() when adding extents?
> > >>>>>> Likewise, if (shrinker is not registered) return false;
> > >>>>>>
> > >>>>>>
> > >>>>>>> +	shrink_dcache_sb(sb);
> > >>>>>>> +	evict_inodes(sb);
> > >>>>>>> +	f2fs_shrink_extent_tree(sbi, __count_extent_cache(sbi));
> > >>>>>>> +}
> > >>>>>>> +
> > >>>>>>>  static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> > >>>>>>>  {
> > >>>>>>>  	struct f2fs_sb_info *sbi;
> > >>>>>>> @@ -3402,6 +3412,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> > >>>>>>>  	 * falls into an infinite loop in f2fs_sync_meta_pages().
> > >>>>>>>  	 */
> > >>>>>>>  	truncate_inode_pages_final(META_MAPPING(sbi));
> > >>>>>>> +	/* cleanup recovery and quota inodes */
> > >>>>>>> +	f2fs_cleanup_inodes(sbi);
> > >>>>>>>  	f2fs_unregister_sysfs(sbi);
> > >>>>>>>  free_root_inode:
> > >>>>>>>  	dput(sb->s_root);
> > >>>>>>> @@ -3445,7 +3457,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> > >>>>>>>  	/* give only one another chance */
> > >>>>>>>  	if (retry) {
> > >>>>>>>  		retry = false;
> > >>>>>>> -		shrink_dcache_sb(sb);
> > >>>>>>>  		goto try_onemore;
> > >>>>>>>  	}
> > >>>>>>>  	return err;
> > >>>>>>> -- 
> > >>>>>>> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
> > >>>>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
> > >>>>>>
> > >>>>>> .
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>> -- 
> > >>>> --
> > >>>> Sent by a consultant of the Qualcomm Innovation Center, Inc.
> > >>>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
> > >>>
> > >>> .
> > >>>
> > >>
> > > 
> > 
> 
> -- 
> --
> Sent by a consultant of the Qualcomm Innovation Center, Inc.
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] f2fs: fix sbi->extent_list corruption issue
  2018-12-14 14:25                 ` Jaegeuk Kim
@ 2018-12-18 10:28                   ` Chao Yu
  2018-12-18 22:47                     ` Jaegeuk Kim
  0 siblings, 1 reply; 21+ messages in thread
From: Chao Yu @ 2018-12-18 10:28 UTC (permalink / raw)
  To: Jaegeuk Kim, Sahitya Tummala; +Cc: linux-f2fs-devel, linux-kernel

On 2018/12/14 22:25, Jaegeuk Kim wrote:
> On 12/14, Sahitya Tummala wrote:
>> On Wed, Dec 12, 2018 at 11:36:08AM +0800, Chao Yu wrote:
>>> On 2018/12/12 11:17, Sahitya Tummala wrote:
>>>> On Fri, Dec 07, 2018 at 05:47:31PM +0800, Chao Yu wrote:
>>>>> On 2018/12/1 4:33, Jaegeuk Kim wrote:
>>>>>> On 11/29, Sahitya Tummala wrote:
>>>>>>>
>>>>>>> On Tue, Nov 27, 2018 at 09:42:39AM +0800, Chao Yu wrote:
>>>>>>>> On 2018/11/27 8:30, Jaegeuk Kim wrote:
>>>>>>>>> On 11/26, Sahitya Tummala wrote:
>>>>>>>>>> When there is a failure in f2fs_fill_super() after/during
>>>>>>>>>> the recovery of fsync'd nodes, it frees the current sbi and
>>>>>>>>>> retries again. This time the mount is successful, but the files
>>>>>>>>>> that got recovered before retry, still holds the extent tree,
>>>>>>>>>> whose extent nodes list is corrupted since sbi and sbi->extent_list
>>>>>>>>>> is freed up. The list_del corruption issue is observed when the
>>>>>>>>>> file system is getting unmounted and when those recoverd files extent
>>>>>>>>>> node is being freed up in the below context.
>>>>>>>>>>
>>>>>>>>>> list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)
>>>>>>>>>> <...>
>>>>>>>>>> kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
>>>>>>>>>> task: fffffff1f46f2280 task.stack: ffffff8008068000
>>>>>>>>>> lr : __list_del_entry_valid+0x94/0xb4
>>>>>>>>>> pc : __list_del_entry_valid+0x94/0xb4
>>>>>>>>>> <...>
>>>>>>>>>> Call trace:
>>>>>>>>>> __list_del_entry_valid+0x94/0xb4
>>>>>>>>>> __release_extent_node+0xb0/0x114
>>>>>>>>>> __free_extent_tree+0x58/0x7c
>>>>>>>>>> f2fs_shrink_extent_tree+0xdc/0x3b0
>>>>>>>>>> f2fs_leave_shrinker+0x28/0x7c
>>>>>>>>>> f2fs_put_super+0xfc/0x1e0
>>>>>>>>>> generic_shutdown_super+0x70/0xf4
>>>>>>>>>> kill_block_super+0x2c/0x5c
>>>>>>>>>> kill_f2fs_super+0x44/0x50
>>>>>>>>>> deactivate_locked_super+0x60/0x8c
>>>>>>>>>> deactivate_super+0x68/0x74
>>>>>>>>>> cleanup_mnt+0x40/0x78
>>>>>>>>>> __cleanup_mnt+0x1c/0x28
>>>>>>>>>> task_work_run+0x48/0xd0
>>>>>>>>>> do_notify_resume+0x678/0xe98
>>>>>>>>>> work_pending+0x8/0x14
>>>>>>>>>>
>>>>>>>>>> Fix this by cleaning up inodes, extent tree and nodes of those
>>>>>>>>>> recovered files before freeing up sbi and before next retry.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
>>>>>>>>>> ---
>>>>>>>>>> v2:
>>>>>>>>>> -call evict_inodes() and f2fs_shrink_extent_tree() to cleanup inodes
>>>>>>>>>>
>>>>>>>>>>  fs/f2fs/f2fs.h     |  1 +
>>>>>>>>>>  fs/f2fs/shrinker.c |  2 +-
>>>>>>>>>>  fs/f2fs/super.c    | 13 ++++++++++++-
>>>>>>>>>>  3 files changed, 14 insertions(+), 2 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>>>>>>>>> index 1e03197..aaee63b 100644
>>>>>>>>>> --- a/fs/f2fs/f2fs.h
>>>>>>>>>> +++ b/fs/f2fs/f2fs.h
>>>>>>>>>> @@ -3407,6 +3407,7 @@ struct rb_entry *f2fs_lookup_rb_tree_ret(struct rb_root_cached *root,
>>>>>>>>>>  bool f2fs_check_rb_tree_consistence(struct f2fs_sb_info *sbi,
>>>>>>>>>>  						struct rb_root_cached *root);
>>>>>>>>>>  unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink);
>>>>>>>>>> +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi);
>>>>>>>>>>  bool f2fs_init_extent_tree(struct inode *inode, struct f2fs_extent *i_ext);
>>>>>>>>>>  void f2fs_drop_extent_tree(struct inode *inode);
>>>>>>>>>>  unsigned int f2fs_destroy_extent_node(struct inode *inode);
>>>>>>>>>> diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
>>>>>>>>>> index 9e13db9..7e3c13b 100644
>>>>>>>>>> --- a/fs/f2fs/shrinker.c
>>>>>>>>>> +++ b/fs/f2fs/shrinker.c
>>>>>>>>>> @@ -30,7 +30,7 @@ static unsigned long __count_free_nids(struct f2fs_sb_info *sbi)
>>>>>>>>>>  	return count > 0 ? count : 0;
>>>>>>>>>>  }
>>>>>>>>>>  
>>>>>>>>>> -static unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
>>>>>>>>>> +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
>>>>>>>>>>  {
>>>>>>>>>>  	return atomic_read(&sbi->total_zombie_tree) +
>>>>>>>>>>  				atomic_read(&sbi->total_ext_node);
>>>>>>>>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>>>>>>>>> index af58b2c..769e7b1 100644
>>>>>>>>>> --- a/fs/f2fs/super.c
>>>>>>>>>> +++ b/fs/f2fs/super.c
>>>>>>>>>> @@ -3016,6 +3016,16 @@ static void f2fs_tuning_parameters(struct f2fs_sb_info *sbi)
>>>>>>>>>>  	sbi->readdir_ra = 1;
>>>>>>>>>>  }
>>>>>>>>>>  
>>>>>>>>>> +static void f2fs_cleanup_inodes(struct f2fs_sb_info *sbi)
>>>>>>>>>> +{
>>>>>>>>>> +	struct super_block *sb = sbi->sb;
>>>>>>>>>> +
>>>>>>>>>> +	sync_filesystem(sb);
>>>>>>>>>
>>>>>>>>> This writes another checkpoint, which would not be what this retrial intended.
>>>>>>>>
>>>>>>>> Actually, checkpoint will not be triggered due to SBI_POR_DOING flag check
>>>>>>>> as below:
>>>>>>>>
>>>>>>>> int f2fs_sync_fs(struct super_block *sb, int sync)
>>>>>>>> {
>>>>>>>> ...
>>>>>>>> 	if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
>>>>>>>> 		return -EAGAIN;
>>>>>>>> ...
>>>>>>>> }
>>>>>>>>
>>>>>>>> And also all dirty data/node won't be persisted due to SBI_POR_DOING flag,
>>>>>>>> IIUC.
>>>>>>>>
>>>>>>>
>>>>>>> Thanks Chao for the clarification.
>>>>>>>
>>>>>>> Hi Jaegeuk,
>>>>>>>
>>>>>>> Do you still have any further concerns or comments on this patch?
>>>>>>
>>>>>> Could you try the below first?
>>>>>>
>>>>>> --  How about adding a condition in f2fs_may_extent_tree() when adding extents?
>>>>>> -- Likewise, if (shrinker is not registered) return false;
>>>>>>
>>>>>> If we can fix what you described directly, I don't want to rely on such the
>>>>>> assumptions saying we won't do checkpoint. This flow literally says syncing
>>>>>> and evicting cached objects, which opposed to what we'd like to drop all caches
>>>>>> and restart fill_super again.
>>>>>>
>>>>>> Let me consider this as a final resolution.
>>>>>
>>>>> Jaegeuk,
>>>>>
>>>>> Still I want to ask, what kind of scenario we have to add retry logic in
>>>>> fill_super for? As in android scenario, it must be extreme rare case that
>>>>> system runs out-of-memory in boot time...at least, I didn't get any kind of
>>>>> report like that.
>>>>>
>>>> Hi Chao,
>>>
>>> Hi Sahitya,
>>>
>>> Thanks for letting me know that, I git-blamed the code, and found the
>>> original intention is like what you described:
>>>
>>> commit ed2e621a95d704e6a4e904cc00524e8cbddda0c2
>>> Author: Jaegeuk Kim <jaegeuk@kernel.org>
>>> Date:   Fri Aug 8 15:37:41 2014 -0700
>>>
>>>     f2fs: give a chance to mount again when encountering errors
>>>
>>>     This patch gives another chance to try mount process when we encounter
>>> an error.
>>>     This makes an effect on the roll-forward recovery failures as well.
>>>
>>> But I doubt that if we failed in recovery, maybe there is corruption in
>>> this image, would it be better to fail the mount, and let user fsck it and
>>> retry the mount? otherwise, the corruption may be expanded...
> 
> The problem was there was no way to recover roll-forward area by fsck. IOWs,
> mount was failing all the time. I don't think roll-forward itself can corrupt

I got your concern, IMO, if mount fails, it will be better to let user
decide how to handle it.

If mount fails due to:

1) recovery, user can run fsck and/or try disable_roll_forward or
norecovery option in another mount;
2) -EINVAL caused by sanity, user can run fsck and retry mount.
3) -ENOMEM caused low memory in system, user can add more memory and retry
mount.
...

Thanks,

> the image more. Please report, if you have any issue on this.>
>>>
>>
>> Hi Jaegeuk,
>>
>> How do you think about this? If you think it is okay, then I will fix the
>> sbi->extent_list corruption issue, by removing the retry logic. Otherwise,
>> I will fix it in the extent handling as you have suggested earlier.
> 
> I'd like to keep retry logic, so could you please test what I suggested above?
> 
> Thanks,
> 
>>
>> Thanks,
>>
>>> Thanks,
>>>
>>>>
>>>> In my case, the first boot up has a failure in recovery as below -
>>>>
>>>> F2FS-fs (mmcblk0p75): find_fsync_dnodes: detect looped node chain, blkaddr:1979471, next:1979472
>>>> F2FS-fs (mmcblk0p75): Cannot recover all fsync data errno=-22
>>>>
>>>> But in the second attempt, retry will be set to false and because of that
>>>> recover_fsync_data() is skipped. This helped mount to be successful in
>>>> the second attempt. 
>>>>
>>>> Thanks,
>>>>
>>>>> Thanks,
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sahitya.
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>>> How about adding a condition in f2fs_may_extent_tree() when adding extents?
>>>>>>>>> Likewise, if (shrinker is not registered) return false;
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> +	shrink_dcache_sb(sb);
>>>>>>>>>> +	evict_inodes(sb);
>>>>>>>>>> +	f2fs_shrink_extent_tree(sbi, __count_extent_cache(sbi));
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>>  static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>>>>>>>>>  {
>>>>>>>>>>  	struct f2fs_sb_info *sbi;
>>>>>>>>>> @@ -3402,6 +3412,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>>>>>>>>>  	 * falls into an infinite loop in f2fs_sync_meta_pages().
>>>>>>>>>>  	 */
>>>>>>>>>>  	truncate_inode_pages_final(META_MAPPING(sbi));
>>>>>>>>>> +	/* cleanup recovery and quota inodes */
>>>>>>>>>> +	f2fs_cleanup_inodes(sbi);
>>>>>>>>>>  	f2fs_unregister_sysfs(sbi);
>>>>>>>>>>  free_root_inode:
>>>>>>>>>>  	dput(sb->s_root);
>>>>>>>>>> @@ -3445,7 +3457,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>>>>>>>>>  	/* give only one another chance */
>>>>>>>>>>  	if (retry) {
>>>>>>>>>>  		retry = false;
>>>>>>>>>> -		shrink_dcache_sb(sb);
>>>>>>>>>>  		goto try_onemore;
>>>>>>>>>>  	}
>>>>>>>>>>  	return err;
>>>>>>>>>> -- 
>>>>>>>>>> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
>>>>>>>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
>>>>>>>>>
>>>>>>>>> .
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> -- 
>>>>>>> --
>>>>>>> Sent by a consultant of the Qualcomm Innovation Center, Inc.
>>>>>>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
>>>>>>
>>>>>> .
>>>>>>
>>>>>
>>>>
>>>
>>
>> -- 
>> --
>> Sent by a consultant of the Qualcomm Innovation Center, Inc.
>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
> 
> .
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] f2fs: fix sbi->extent_list corruption issue
  2018-12-18 10:28                   ` Chao Yu
@ 2018-12-18 22:47                     ` Jaegeuk Kim
  2018-12-19  3:43                       ` Chao Yu
  0 siblings, 1 reply; 21+ messages in thread
From: Jaegeuk Kim @ 2018-12-18 22:47 UTC (permalink / raw)
  To: Chao Yu; +Cc: Sahitya Tummala, linux-f2fs-devel, linux-kernel

On 12/18, Chao Yu wrote:
> On 2018/12/14 22:25, Jaegeuk Kim wrote:
> > On 12/14, Sahitya Tummala wrote:
> >> On Wed, Dec 12, 2018 at 11:36:08AM +0800, Chao Yu wrote:
> >>> On 2018/12/12 11:17, Sahitya Tummala wrote:
> >>>> On Fri, Dec 07, 2018 at 05:47:31PM +0800, Chao Yu wrote:
> >>>>> On 2018/12/1 4:33, Jaegeuk Kim wrote:
> >>>>>> On 11/29, Sahitya Tummala wrote:
> >>>>>>>
> >>>>>>> On Tue, Nov 27, 2018 at 09:42:39AM +0800, Chao Yu wrote:
> >>>>>>>> On 2018/11/27 8:30, Jaegeuk Kim wrote:
> >>>>>>>>> On 11/26, Sahitya Tummala wrote:
> >>>>>>>>>> When there is a failure in f2fs_fill_super() after/during
> >>>>>>>>>> the recovery of fsync'd nodes, it frees the current sbi and
> >>>>>>>>>> retries again. This time the mount is successful, but the files
> >>>>>>>>>> that got recovered before retry, still holds the extent tree,
> >>>>>>>>>> whose extent nodes list is corrupted since sbi and sbi->extent_list
> >>>>>>>>>> is freed up. The list_del corruption issue is observed when the
> >>>>>>>>>> file system is getting unmounted and when those recoverd files extent
> >>>>>>>>>> node is being freed up in the below context.
> >>>>>>>>>>
> >>>>>>>>>> list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)
> >>>>>>>>>> <...>
> >>>>>>>>>> kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
> >>>>>>>>>> task: fffffff1f46f2280 task.stack: ffffff8008068000
> >>>>>>>>>> lr : __list_del_entry_valid+0x94/0xb4
> >>>>>>>>>> pc : __list_del_entry_valid+0x94/0xb4
> >>>>>>>>>> <...>
> >>>>>>>>>> Call trace:
> >>>>>>>>>> __list_del_entry_valid+0x94/0xb4
> >>>>>>>>>> __release_extent_node+0xb0/0x114
> >>>>>>>>>> __free_extent_tree+0x58/0x7c
> >>>>>>>>>> f2fs_shrink_extent_tree+0xdc/0x3b0
> >>>>>>>>>> f2fs_leave_shrinker+0x28/0x7c
> >>>>>>>>>> f2fs_put_super+0xfc/0x1e0
> >>>>>>>>>> generic_shutdown_super+0x70/0xf4
> >>>>>>>>>> kill_block_super+0x2c/0x5c
> >>>>>>>>>> kill_f2fs_super+0x44/0x50
> >>>>>>>>>> deactivate_locked_super+0x60/0x8c
> >>>>>>>>>> deactivate_super+0x68/0x74
> >>>>>>>>>> cleanup_mnt+0x40/0x78
> >>>>>>>>>> __cleanup_mnt+0x1c/0x28
> >>>>>>>>>> task_work_run+0x48/0xd0
> >>>>>>>>>> do_notify_resume+0x678/0xe98
> >>>>>>>>>> work_pending+0x8/0x14
> >>>>>>>>>>
> >>>>>>>>>> Fix this by cleaning up inodes, extent tree and nodes of those
> >>>>>>>>>> recovered files before freeing up sbi and before next retry.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
> >>>>>>>>>> ---
> >>>>>>>>>> v2:
> >>>>>>>>>> -call evict_inodes() and f2fs_shrink_extent_tree() to cleanup inodes
> >>>>>>>>>>
> >>>>>>>>>>  fs/f2fs/f2fs.h     |  1 +
> >>>>>>>>>>  fs/f2fs/shrinker.c |  2 +-
> >>>>>>>>>>  fs/f2fs/super.c    | 13 ++++++++++++-
> >>>>>>>>>>  3 files changed, 14 insertions(+), 2 deletions(-)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> >>>>>>>>>> index 1e03197..aaee63b 100644
> >>>>>>>>>> --- a/fs/f2fs/f2fs.h
> >>>>>>>>>> +++ b/fs/f2fs/f2fs.h
> >>>>>>>>>> @@ -3407,6 +3407,7 @@ struct rb_entry *f2fs_lookup_rb_tree_ret(struct rb_root_cached *root,
> >>>>>>>>>>  bool f2fs_check_rb_tree_consistence(struct f2fs_sb_info *sbi,
> >>>>>>>>>>  						struct rb_root_cached *root);
> >>>>>>>>>>  unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink);
> >>>>>>>>>> +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi);
> >>>>>>>>>>  bool f2fs_init_extent_tree(struct inode *inode, struct f2fs_extent *i_ext);
> >>>>>>>>>>  void f2fs_drop_extent_tree(struct inode *inode);
> >>>>>>>>>>  unsigned int f2fs_destroy_extent_node(struct inode *inode);
> >>>>>>>>>> diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
> >>>>>>>>>> index 9e13db9..7e3c13b 100644
> >>>>>>>>>> --- a/fs/f2fs/shrinker.c
> >>>>>>>>>> +++ b/fs/f2fs/shrinker.c
> >>>>>>>>>> @@ -30,7 +30,7 @@ static unsigned long __count_free_nids(struct f2fs_sb_info *sbi)
> >>>>>>>>>>  	return count > 0 ? count : 0;
> >>>>>>>>>>  }
> >>>>>>>>>>  
> >>>>>>>>>> -static unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
> >>>>>>>>>> +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
> >>>>>>>>>>  {
> >>>>>>>>>>  	return atomic_read(&sbi->total_zombie_tree) +
> >>>>>>>>>>  				atomic_read(&sbi->total_ext_node);
> >>>>>>>>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> >>>>>>>>>> index af58b2c..769e7b1 100644
> >>>>>>>>>> --- a/fs/f2fs/super.c
> >>>>>>>>>> +++ b/fs/f2fs/super.c
> >>>>>>>>>> @@ -3016,6 +3016,16 @@ static void f2fs_tuning_parameters(struct f2fs_sb_info *sbi)
> >>>>>>>>>>  	sbi->readdir_ra = 1;
> >>>>>>>>>>  }
> >>>>>>>>>>  
> >>>>>>>>>> +static void f2fs_cleanup_inodes(struct f2fs_sb_info *sbi)
> >>>>>>>>>> +{
> >>>>>>>>>> +	struct super_block *sb = sbi->sb;
> >>>>>>>>>> +
> >>>>>>>>>> +	sync_filesystem(sb);
> >>>>>>>>>
> >>>>>>>>> This writes another checkpoint, which would not be what this retrial intended.
> >>>>>>>>
> >>>>>>>> Actually, checkpoint will not be triggered due to SBI_POR_DOING flag check
> >>>>>>>> as below:
> >>>>>>>>
> >>>>>>>> int f2fs_sync_fs(struct super_block *sb, int sync)
> >>>>>>>> {
> >>>>>>>> ...
> >>>>>>>> 	if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
> >>>>>>>> 		return -EAGAIN;
> >>>>>>>> ...
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> And also all dirty data/node won't be persisted due to SBI_POR_DOING flag,
> >>>>>>>> IIUC.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Thanks Chao for the clarification.
> >>>>>>>
> >>>>>>> Hi Jaegeuk,
> >>>>>>>
> >>>>>>> Do you still have any further concerns or comments on this patch?
> >>>>>>
> >>>>>> Could you try the below first?
> >>>>>>
> >>>>>> --  How about adding a condition in f2fs_may_extent_tree() when adding extents?
> >>>>>> -- Likewise, if (shrinker is not registered) return false;
> >>>>>>
> >>>>>> If we can fix what you described directly, I don't want to rely on such the
> >>>>>> assumptions saying we won't do checkpoint. This flow literally says syncing
> >>>>>> and evicting cached objects, which opposed to what we'd like to drop all caches
> >>>>>> and restart fill_super again.
> >>>>>>
> >>>>>> Let me consider this as a final resolution.
> >>>>>
> >>>>> Jaegeuk,
> >>>>>
> >>>>> Still I want to ask, what kind of scenario we have to add retry logic in
> >>>>> fill_super for? As in android scenario, it must be extreme rare case that
> >>>>> system runs out-of-memory in boot time...at least, I didn't get any kind of
> >>>>> report like that.
> >>>>>
> >>>> Hi Chao,
> >>>
> >>> Hi Sahitya,
> >>>
> >>> Thanks for letting me know that, I git-blamed the code, and found the
> >>> original intention is like what you described:
> >>>
> >>> commit ed2e621a95d704e6a4e904cc00524e8cbddda0c2
> >>> Author: Jaegeuk Kim <jaegeuk@kernel.org>
> >>> Date:   Fri Aug 8 15:37:41 2014 -0700
> >>>
> >>>     f2fs: give a chance to mount again when encountering errors
> >>>
> >>>     This patch gives another chance to try mount process when we encounter
> >>> an error.
> >>>     This makes an effect on the roll-forward recovery failures as well.
> >>>
> >>> But I doubt that if we failed in recovery, maybe there is corruption in
> >>> this image, would it be better to fail the mount, and let user fsck it and
> >>> retry the mount? otherwise, the corruption may be expanded...
> > 
> > The problem was there was no way to recover roll-forward area by fsck. IOWs,
> > mount was failing all the time. I don't think roll-forward itself can corrupt
> 
> I got your concern, IMO, if mount fails, it will be better to let user
> decide how to handle it.

Roll-forward is not based on user decision, but f2fs does internally. So, I'm
in doubt we have to ask users on any failed case here.
I don't catch why we need to revert this which has been landed for a long time.

> 
> If mount fails due to:
> 
> 1) recovery, user can run fsck and/or try disable_roll_forward or
> norecovery option in another mount;
> 2) -EINVAL caused by sanity, user can run fsck and retry mount.
> 3) -ENOMEM caused low memory in system, user can add more memory and retry
> mount.
> ...
> 
> Thanks,
> 
> > the image more. Please report, if you have any issue on this.>
> >>>
> >>
> >> Hi Jaegeuk,
> >>
> >> How do you think about this? If you think it is okay, then I will fix the
> >> sbi->extent_list corruption issue, by removing the retry logic. Otherwise,
> >> I will fix it in the extent handling as you have suggested earlier.
> > 
> > I'd like to keep retry logic, so could you please test what I suggested above?
> > 
> > Thanks,
> > 
> >>
> >> Thanks,
> >>
> >>> Thanks,
> >>>
> >>>>
> >>>> In my case, the first boot up has a failure in recovery as below -
> >>>>
> >>>> F2FS-fs (mmcblk0p75): find_fsync_dnodes: detect looped node chain, blkaddr:1979471, next:1979472
> >>>> F2FS-fs (mmcblk0p75): Cannot recover all fsync data errno=-22
> >>>>
> >>>> But in the second attempt, retry will be set to false and because of that
> >>>> recover_fsync_data() is skipped. This helped mount to be successful in
> >>>> the second attempt. 
> >>>>
> >>>> Thanks,
> >>>>
> >>>>> Thanks,
> >>>>>
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Sahitya.
> >>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>>
> >>>>>>>>> How about adding a condition in f2fs_may_extent_tree() when adding extents?
> >>>>>>>>> Likewise, if (shrinker is not registered) return false;
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> +	shrink_dcache_sb(sb);
> >>>>>>>>>> +	evict_inodes(sb);
> >>>>>>>>>> +	f2fs_shrink_extent_tree(sbi, __count_extent_cache(sbi));
> >>>>>>>>>> +}
> >>>>>>>>>> +
> >>>>>>>>>>  static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >>>>>>>>>>  {
> >>>>>>>>>>  	struct f2fs_sb_info *sbi;
> >>>>>>>>>> @@ -3402,6 +3412,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >>>>>>>>>>  	 * falls into an infinite loop in f2fs_sync_meta_pages().
> >>>>>>>>>>  	 */
> >>>>>>>>>>  	truncate_inode_pages_final(META_MAPPING(sbi));
> >>>>>>>>>> +	/* cleanup recovery and quota inodes */
> >>>>>>>>>> +	f2fs_cleanup_inodes(sbi);
> >>>>>>>>>>  	f2fs_unregister_sysfs(sbi);
> >>>>>>>>>>  free_root_inode:
> >>>>>>>>>>  	dput(sb->s_root);
> >>>>>>>>>> @@ -3445,7 +3457,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >>>>>>>>>>  	/* give only one another chance */
> >>>>>>>>>>  	if (retry) {
> >>>>>>>>>>  		retry = false;
> >>>>>>>>>> -		shrink_dcache_sb(sb);
> >>>>>>>>>>  		goto try_onemore;
> >>>>>>>>>>  	}
> >>>>>>>>>>  	return err;
> >>>>>>>>>> -- 
> >>>>>>>>>> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
> >>>>>>>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
> >>>>>>>>>
> >>>>>>>>> .
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>> -- 
> >>>>>>> --
> >>>>>>> Sent by a consultant of the Qualcomm Innovation Center, Inc.
> >>>>>>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
> >>>>>>
> >>>>>> .
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >> -- 
> >> --
> >> Sent by a consultant of the Qualcomm Innovation Center, Inc.
> >> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
> > 
> > .
> > 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] f2fs: fix sbi->extent_list corruption issue
  2018-12-18 22:47                     ` Jaegeuk Kim
@ 2018-12-19  3:43                       ` Chao Yu
  2018-12-19 23:42                         ` Jaegeuk Kim
  0 siblings, 1 reply; 21+ messages in thread
From: Chao Yu @ 2018-12-19  3:43 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: Sahitya Tummala, linux-f2fs-devel, linux-kernel

On 2018/12/19 6:47, Jaegeuk Kim wrote:
> On 12/18, Chao Yu wrote:
>> On 2018/12/14 22:25, Jaegeuk Kim wrote:
>>> On 12/14, Sahitya Tummala wrote:
>>>> On Wed, Dec 12, 2018 at 11:36:08AM +0800, Chao Yu wrote:
>>>>> On 2018/12/12 11:17, Sahitya Tummala wrote:
>>>>>> On Fri, Dec 07, 2018 at 05:47:31PM +0800, Chao Yu wrote:
>>>>>>> On 2018/12/1 4:33, Jaegeuk Kim wrote:
>>>>>>>> On 11/29, Sahitya Tummala wrote:
>>>>>>>>>
>>>>>>>>> On Tue, Nov 27, 2018 at 09:42:39AM +0800, Chao Yu wrote:
>>>>>>>>>> On 2018/11/27 8:30, Jaegeuk Kim wrote:
>>>>>>>>>>> On 11/26, Sahitya Tummala wrote:
>>>>>>>>>>>> When there is a failure in f2fs_fill_super() after/during
>>>>>>>>>>>> the recovery of fsync'd nodes, it frees the current sbi and
>>>>>>>>>>>> retries again. This time the mount is successful, but the files
>>>>>>>>>>>> that got recovered before retry, still holds the extent tree,
>>>>>>>>>>>> whose extent nodes list is corrupted since sbi and sbi->extent_list
>>>>>>>>>>>> is freed up. The list_del corruption issue is observed when the
>>>>>>>>>>>> file system is getting unmounted and when those recoverd files extent
>>>>>>>>>>>> node is being freed up in the below context.
>>>>>>>>>>>>
>>>>>>>>>>>> list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)
>>>>>>>>>>>> <...>
>>>>>>>>>>>> kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
>>>>>>>>>>>> task: fffffff1f46f2280 task.stack: ffffff8008068000
>>>>>>>>>>>> lr : __list_del_entry_valid+0x94/0xb4
>>>>>>>>>>>> pc : __list_del_entry_valid+0x94/0xb4
>>>>>>>>>>>> <...>
>>>>>>>>>>>> Call trace:
>>>>>>>>>>>> __list_del_entry_valid+0x94/0xb4
>>>>>>>>>>>> __release_extent_node+0xb0/0x114
>>>>>>>>>>>> __free_extent_tree+0x58/0x7c
>>>>>>>>>>>> f2fs_shrink_extent_tree+0xdc/0x3b0
>>>>>>>>>>>> f2fs_leave_shrinker+0x28/0x7c
>>>>>>>>>>>> f2fs_put_super+0xfc/0x1e0
>>>>>>>>>>>> generic_shutdown_super+0x70/0xf4
>>>>>>>>>>>> kill_block_super+0x2c/0x5c
>>>>>>>>>>>> kill_f2fs_super+0x44/0x50
>>>>>>>>>>>> deactivate_locked_super+0x60/0x8c
>>>>>>>>>>>> deactivate_super+0x68/0x74
>>>>>>>>>>>> cleanup_mnt+0x40/0x78
>>>>>>>>>>>> __cleanup_mnt+0x1c/0x28
>>>>>>>>>>>> task_work_run+0x48/0xd0
>>>>>>>>>>>> do_notify_resume+0x678/0xe98
>>>>>>>>>>>> work_pending+0x8/0x14
>>>>>>>>>>>>
>>>>>>>>>>>> Fix this by cleaning up inodes, extent tree and nodes of those
>>>>>>>>>>>> recovered files before freeing up sbi and before next retry.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
>>>>>>>>>>>> ---
>>>>>>>>>>>> v2:
>>>>>>>>>>>> -call evict_inodes() and f2fs_shrink_extent_tree() to cleanup inodes
>>>>>>>>>>>>
>>>>>>>>>>>>  fs/f2fs/f2fs.h     |  1 +
>>>>>>>>>>>>  fs/f2fs/shrinker.c |  2 +-
>>>>>>>>>>>>  fs/f2fs/super.c    | 13 ++++++++++++-
>>>>>>>>>>>>  3 files changed, 14 insertions(+), 2 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>>>>>>>>>>> index 1e03197..aaee63b 100644
>>>>>>>>>>>> --- a/fs/f2fs/f2fs.h
>>>>>>>>>>>> +++ b/fs/f2fs/f2fs.h
>>>>>>>>>>>> @@ -3407,6 +3407,7 @@ struct rb_entry *f2fs_lookup_rb_tree_ret(struct rb_root_cached *root,
>>>>>>>>>>>>  bool f2fs_check_rb_tree_consistence(struct f2fs_sb_info *sbi,
>>>>>>>>>>>>  						struct rb_root_cached *root);
>>>>>>>>>>>>  unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink);
>>>>>>>>>>>> +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi);
>>>>>>>>>>>>  bool f2fs_init_extent_tree(struct inode *inode, struct f2fs_extent *i_ext);
>>>>>>>>>>>>  void f2fs_drop_extent_tree(struct inode *inode);
>>>>>>>>>>>>  unsigned int f2fs_destroy_extent_node(struct inode *inode);
>>>>>>>>>>>> diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
>>>>>>>>>>>> index 9e13db9..7e3c13b 100644
>>>>>>>>>>>> --- a/fs/f2fs/shrinker.c
>>>>>>>>>>>> +++ b/fs/f2fs/shrinker.c
>>>>>>>>>>>> @@ -30,7 +30,7 @@ static unsigned long __count_free_nids(struct f2fs_sb_info *sbi)
>>>>>>>>>>>>  	return count > 0 ? count : 0;
>>>>>>>>>>>>  }
>>>>>>>>>>>>  
>>>>>>>>>>>> -static unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
>>>>>>>>>>>> +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
>>>>>>>>>>>>  {
>>>>>>>>>>>>  	return atomic_read(&sbi->total_zombie_tree) +
>>>>>>>>>>>>  				atomic_read(&sbi->total_ext_node);
>>>>>>>>>>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>>>>>>>>>>> index af58b2c..769e7b1 100644
>>>>>>>>>>>> --- a/fs/f2fs/super.c
>>>>>>>>>>>> +++ b/fs/f2fs/super.c
>>>>>>>>>>>> @@ -3016,6 +3016,16 @@ static void f2fs_tuning_parameters(struct f2fs_sb_info *sbi)
>>>>>>>>>>>>  	sbi->readdir_ra = 1;
>>>>>>>>>>>>  }
>>>>>>>>>>>>  
>>>>>>>>>>>> +static void f2fs_cleanup_inodes(struct f2fs_sb_info *sbi)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +	struct super_block *sb = sbi->sb;
>>>>>>>>>>>> +
>>>>>>>>>>>> +	sync_filesystem(sb);
>>>>>>>>>>>
>>>>>>>>>>> This writes another checkpoint, which would not be what this retrial intended.
>>>>>>>>>>
>>>>>>>>>> Actually, checkpoint will not be triggered due to SBI_POR_DOING flag check
>>>>>>>>>> as below:
>>>>>>>>>>
>>>>>>>>>> int f2fs_sync_fs(struct super_block *sb, int sync)
>>>>>>>>>> {
>>>>>>>>>> ...
>>>>>>>>>> 	if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
>>>>>>>>>> 		return -EAGAIN;
>>>>>>>>>> ...
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> And also all dirty data/node won't be persisted due to SBI_POR_DOING flag,
>>>>>>>>>> IIUC.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks Chao for the clarification.
>>>>>>>>>
>>>>>>>>> Hi Jaegeuk,
>>>>>>>>>
>>>>>>>>> Do you still have any further concerns or comments on this patch?
>>>>>>>>
>>>>>>>> Could you try the below first?
>>>>>>>>
>>>>>>>> --  How about adding a condition in f2fs_may_extent_tree() when adding extents?
>>>>>>>> -- Likewise, if (shrinker is not registered) return false;
>>>>>>>>
>>>>>>>> If we can fix what you described directly, I don't want to rely on such the
>>>>>>>> assumptions saying we won't do checkpoint. This flow literally says syncing
>>>>>>>> and evicting cached objects, which opposed to what we'd like to drop all caches
>>>>>>>> and restart fill_super again.
>>>>>>>>
>>>>>>>> Let me consider this as a final resolution.
>>>>>>>
>>>>>>> Jaegeuk,
>>>>>>>
>>>>>>> Still I want to ask, what kind of scenario we have to add retry logic in
>>>>>>> fill_super for? As in android scenario, it must be extreme rare case that
>>>>>>> system runs out-of-memory in boot time...at least, I didn't get any kind of
>>>>>>> report like that.
>>>>>>>
>>>>>> Hi Chao,
>>>>>
>>>>> Hi Sahitya,
>>>>>
>>>>> Thanks for letting me know that, I git-blamed the code, and found the
>>>>> original intention is like what you described:
>>>>>
>>>>> commit ed2e621a95d704e6a4e904cc00524e8cbddda0c2
>>>>> Author: Jaegeuk Kim <jaegeuk@kernel.org>
>>>>> Date:   Fri Aug 8 15:37:41 2014 -0700
>>>>>
>>>>>     f2fs: give a chance to mount again when encountering errors
>>>>>
>>>>>     This patch gives another chance to try mount process when we encounter
>>>>> an error.
>>>>>     This makes an effect on the roll-forward recovery failures as well.
>>>>>
>>>>> But I doubt that if we failed in recovery, maybe there is corruption in
>>>>> this image, would it be better to fail the mount, and let user fsck it and
>>>>> retry the mount? otherwise, the corruption may be expanded...
>>>
>>> The problem was there was no way to recover roll-forward area by fsck. IOWs,
>>> mount was failing all the time. I don't think roll-forward itself can corrupt
>>
>> I got your concern, IMO, if mount fails, it will be better to let user
>> decide how to handle it.
> 
> Roll-forward is not based on user decision, but f2fs does internally. So, I'm

Yup, IMO without roll-forward, data may lose, and posix compliance can be
corrupted, f2fs should do roll-forward internally as possible as it can.

> in doubt we have to ask users on any failed case here.
> I don't catch why we need to revert this which has been landed for a long time.

Actually, what I mean is mount can fail due to different reason, but we
handle it with fixed retry method by dropping recovery, it may be not flexible.

For example, first fill_super fails due to no memory, then second
fill_super runs w/o recovery, if we succeed, we may lose fsynced data. I
don't think it make sense.

Thanks,

> 
>>
>> If mount fails due to:
>>
>> 1) recovery, user can run fsck and/or try disable_roll_forward or
>> norecovery option in another mount;
>> 2) -EINVAL caused by sanity, user can run fsck and retry mount.
>> 3) -ENOMEM caused low memory in system, user can add more memory and retry
>> mount.
>> ...
>>
>> Thanks,
>>
>>> the image more. Please report, if you have any issue on this.>
>>>>>
>>>>
>>>> Hi Jaegeuk,
>>>>
>>>> How do you think about this? If you think it is okay, then I will fix the
>>>> sbi->extent_list corruption issue, by removing the retry logic. Otherwise,
>>>> I will fix it in the extent handling as you have suggested earlier.
>>>
>>> I'd like to keep retry logic, so could you please test what I suggested above?
>>>
>>> Thanks,
>>>
>>>>
>>>> Thanks,
>>>>
>>>>> Thanks,
>>>>>
>>>>>>
>>>>>> In my case, the first boot up has a failure in recovery as below -
>>>>>>
>>>>>> F2FS-fs (mmcblk0p75): find_fsync_dnodes: detect looped node chain, blkaddr:1979471, next:1979472
>>>>>> F2FS-fs (mmcblk0p75): Cannot recover all fsync data errno=-22
>>>>>>
>>>>>> But in the second attempt, retry will be set to false and because of that
>>>>>> recover_fsync_data() is skipped. This helped mount to be successful in
>>>>>> the second attempt. 
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Sahitya.
>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>>> How about adding a condition in f2fs_may_extent_tree() when adding extents?
>>>>>>>>>>> Likewise, if (shrinker is not registered) return false;
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> +	shrink_dcache_sb(sb);
>>>>>>>>>>>> +	evict_inodes(sb);
>>>>>>>>>>>> +	f2fs_shrink_extent_tree(sbi, __count_extent_cache(sbi));
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>>  static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>>>>>>>>>>>  {
>>>>>>>>>>>>  	struct f2fs_sb_info *sbi;
>>>>>>>>>>>> @@ -3402,6 +3412,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>>>>>>>>>>>  	 * falls into an infinite loop in f2fs_sync_meta_pages().
>>>>>>>>>>>>  	 */
>>>>>>>>>>>>  	truncate_inode_pages_final(META_MAPPING(sbi));
>>>>>>>>>>>> +	/* cleanup recovery and quota inodes */
>>>>>>>>>>>> +	f2fs_cleanup_inodes(sbi);
>>>>>>>>>>>>  	f2fs_unregister_sysfs(sbi);
>>>>>>>>>>>>  free_root_inode:
>>>>>>>>>>>>  	dput(sb->s_root);
>>>>>>>>>>>> @@ -3445,7 +3457,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>>>>>>>>>>>  	/* give only one another chance */
>>>>>>>>>>>>  	if (retry) {
>>>>>>>>>>>>  		retry = false;
>>>>>>>>>>>> -		shrink_dcache_sb(sb);
>>>>>>>>>>>>  		goto try_onemore;
>>>>>>>>>>>>  	}
>>>>>>>>>>>>  	return err;
>>>>>>>>>>>> -- 
>>>>>>>>>>>> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
>>>>>>>>>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
>>>>>>>>>>>
>>>>>>>>>>> .
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>> --
>>>>>>>>> Sent by a consultant of the Qualcomm Innovation Center, Inc.
>>>>>>>>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
>>>>>>>>
>>>>>>>> .
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>> -- 
>>>> --
>>>> Sent by a consultant of the Qualcomm Innovation Center, Inc.
>>>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
>>>
>>> .
>>>
> 
> .
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] f2fs: fix sbi->extent_list corruption issue
  2018-12-19  3:43                       ` Chao Yu
@ 2018-12-19 23:42                         ` Jaegeuk Kim
  0 siblings, 0 replies; 21+ messages in thread
From: Jaegeuk Kim @ 2018-12-19 23:42 UTC (permalink / raw)
  To: Chao Yu; +Cc: Sahitya Tummala, linux-f2fs-devel, linux-kernel

On 12/19, Chao Yu wrote:
> On 2018/12/19 6:47, Jaegeuk Kim wrote:
> > On 12/18, Chao Yu wrote:
> >> On 2018/12/14 22:25, Jaegeuk Kim wrote:
> >>> On 12/14, Sahitya Tummala wrote:
> >>>> On Wed, Dec 12, 2018 at 11:36:08AM +0800, Chao Yu wrote:
> >>>>> On 2018/12/12 11:17, Sahitya Tummala wrote:
> >>>>>> On Fri, Dec 07, 2018 at 05:47:31PM +0800, Chao Yu wrote:
> >>>>>>> On 2018/12/1 4:33, Jaegeuk Kim wrote:
> >>>>>>>> On 11/29, Sahitya Tummala wrote:
> >>>>>>>>>
> >>>>>>>>> On Tue, Nov 27, 2018 at 09:42:39AM +0800, Chao Yu wrote:
> >>>>>>>>>> On 2018/11/27 8:30, Jaegeuk Kim wrote:
> >>>>>>>>>>> On 11/26, Sahitya Tummala wrote:
> >>>>>>>>>>>> When there is a failure in f2fs_fill_super() after/during
> >>>>>>>>>>>> the recovery of fsync'd nodes, it frees the current sbi and
> >>>>>>>>>>>> retries again. This time the mount is successful, but the files
> >>>>>>>>>>>> that got recovered before retry, still holds the extent tree,
> >>>>>>>>>>>> whose extent nodes list is corrupted since sbi and sbi->extent_list
> >>>>>>>>>>>> is freed up. The list_del corruption issue is observed when the
> >>>>>>>>>>>> file system is getting unmounted and when those recoverd files extent
> >>>>>>>>>>>> node is being freed up in the below context.
> >>>>>>>>>>>>
> >>>>>>>>>>>> list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)
> >>>>>>>>>>>> <...>
> >>>>>>>>>>>> kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
> >>>>>>>>>>>> task: fffffff1f46f2280 task.stack: ffffff8008068000
> >>>>>>>>>>>> lr : __list_del_entry_valid+0x94/0xb4
> >>>>>>>>>>>> pc : __list_del_entry_valid+0x94/0xb4
> >>>>>>>>>>>> <...>
> >>>>>>>>>>>> Call trace:
> >>>>>>>>>>>> __list_del_entry_valid+0x94/0xb4
> >>>>>>>>>>>> __release_extent_node+0xb0/0x114
> >>>>>>>>>>>> __free_extent_tree+0x58/0x7c
> >>>>>>>>>>>> f2fs_shrink_extent_tree+0xdc/0x3b0
> >>>>>>>>>>>> f2fs_leave_shrinker+0x28/0x7c
> >>>>>>>>>>>> f2fs_put_super+0xfc/0x1e0
> >>>>>>>>>>>> generic_shutdown_super+0x70/0xf4
> >>>>>>>>>>>> kill_block_super+0x2c/0x5c
> >>>>>>>>>>>> kill_f2fs_super+0x44/0x50
> >>>>>>>>>>>> deactivate_locked_super+0x60/0x8c
> >>>>>>>>>>>> deactivate_super+0x68/0x74
> >>>>>>>>>>>> cleanup_mnt+0x40/0x78
> >>>>>>>>>>>> __cleanup_mnt+0x1c/0x28
> >>>>>>>>>>>> task_work_run+0x48/0xd0
> >>>>>>>>>>>> do_notify_resume+0x678/0xe98
> >>>>>>>>>>>> work_pending+0x8/0x14
> >>>>>>>>>>>>
> >>>>>>>>>>>> Fix this by cleaning up inodes, extent tree and nodes of those
> >>>>>>>>>>>> recovered files before freeing up sbi and before next retry.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
> >>>>>>>>>>>> ---
> >>>>>>>>>>>> v2:
> >>>>>>>>>>>> -call evict_inodes() and f2fs_shrink_extent_tree() to cleanup inodes
> >>>>>>>>>>>>
> >>>>>>>>>>>>  fs/f2fs/f2fs.h     |  1 +
> >>>>>>>>>>>>  fs/f2fs/shrinker.c |  2 +-
> >>>>>>>>>>>>  fs/f2fs/super.c    | 13 ++++++++++++-
> >>>>>>>>>>>>  3 files changed, 14 insertions(+), 2 deletions(-)
> >>>>>>>>>>>>
> >>>>>>>>>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> >>>>>>>>>>>> index 1e03197..aaee63b 100644
> >>>>>>>>>>>> --- a/fs/f2fs/f2fs.h
> >>>>>>>>>>>> +++ b/fs/f2fs/f2fs.h
> >>>>>>>>>>>> @@ -3407,6 +3407,7 @@ struct rb_entry *f2fs_lookup_rb_tree_ret(struct rb_root_cached *root,
> >>>>>>>>>>>>  bool f2fs_check_rb_tree_consistence(struct f2fs_sb_info *sbi,
> >>>>>>>>>>>>  						struct rb_root_cached *root);
> >>>>>>>>>>>>  unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink);
> >>>>>>>>>>>> +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi);
> >>>>>>>>>>>>  bool f2fs_init_extent_tree(struct inode *inode, struct f2fs_extent *i_ext);
> >>>>>>>>>>>>  void f2fs_drop_extent_tree(struct inode *inode);
> >>>>>>>>>>>>  unsigned int f2fs_destroy_extent_node(struct inode *inode);
> >>>>>>>>>>>> diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
> >>>>>>>>>>>> index 9e13db9..7e3c13b 100644
> >>>>>>>>>>>> --- a/fs/f2fs/shrinker.c
> >>>>>>>>>>>> +++ b/fs/f2fs/shrinker.c
> >>>>>>>>>>>> @@ -30,7 +30,7 @@ static unsigned long __count_free_nids(struct f2fs_sb_info *sbi)
> >>>>>>>>>>>>  	return count > 0 ? count : 0;
> >>>>>>>>>>>>  }
> >>>>>>>>>>>>  
> >>>>>>>>>>>> -static unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
> >>>>>>>>>>>> +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
> >>>>>>>>>>>>  {
> >>>>>>>>>>>>  	return atomic_read(&sbi->total_zombie_tree) +
> >>>>>>>>>>>>  				atomic_read(&sbi->total_ext_node);
> >>>>>>>>>>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> >>>>>>>>>>>> index af58b2c..769e7b1 100644
> >>>>>>>>>>>> --- a/fs/f2fs/super.c
> >>>>>>>>>>>> +++ b/fs/f2fs/super.c
> >>>>>>>>>>>> @@ -3016,6 +3016,16 @@ static void f2fs_tuning_parameters(struct f2fs_sb_info *sbi)
> >>>>>>>>>>>>  	sbi->readdir_ra = 1;
> >>>>>>>>>>>>  }
> >>>>>>>>>>>>  
> >>>>>>>>>>>> +static void f2fs_cleanup_inodes(struct f2fs_sb_info *sbi)
> >>>>>>>>>>>> +{
> >>>>>>>>>>>> +	struct super_block *sb = sbi->sb;
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +	sync_filesystem(sb);
> >>>>>>>>>>>
> >>>>>>>>>>> This writes another checkpoint, which would not be what this retrial intended.
> >>>>>>>>>>
> >>>>>>>>>> Actually, checkpoint will not be triggered due to SBI_POR_DOING flag check
> >>>>>>>>>> as below:
> >>>>>>>>>>
> >>>>>>>>>> int f2fs_sync_fs(struct super_block *sb, int sync)
> >>>>>>>>>> {
> >>>>>>>>>> ...
> >>>>>>>>>> 	if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
> >>>>>>>>>> 		return -EAGAIN;
> >>>>>>>>>> ...
> >>>>>>>>>> }
> >>>>>>>>>>
> >>>>>>>>>> And also all dirty data/node won't be persisted due to SBI_POR_DOING flag,
> >>>>>>>>>> IIUC.
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks Chao for the clarification.
> >>>>>>>>>
> >>>>>>>>> Hi Jaegeuk,
> >>>>>>>>>
> >>>>>>>>> Do you still have any further concerns or comments on this patch?
> >>>>>>>>
> >>>>>>>> Could you try the below first?
> >>>>>>>>
> >>>>>>>> --  How about adding a condition in f2fs_may_extent_tree() when adding extents?
> >>>>>>>> -- Likewise, if (shrinker is not registered) return false;
> >>>>>>>>
> >>>>>>>> If we can fix what you described directly, I don't want to rely on such the
> >>>>>>>> assumptions saying we won't do checkpoint. This flow literally says syncing
> >>>>>>>> and evicting cached objects, which opposed to what we'd like to drop all caches
> >>>>>>>> and restart fill_super again.
> >>>>>>>>
> >>>>>>>> Let me consider this as a final resolution.
> >>>>>>>
> >>>>>>> Jaegeuk,
> >>>>>>>
> >>>>>>> Still I want to ask, what kind of scenario we have to add retry logic in
> >>>>>>> fill_super for? As in android scenario, it must be extreme rare case that
> >>>>>>> system runs out-of-memory in boot time...at least, I didn't get any kind of
> >>>>>>> report like that.
> >>>>>>>
> >>>>>> Hi Chao,
> >>>>>
> >>>>> Hi Sahitya,
> >>>>>
> >>>>> Thanks for letting me know that, I git-blamed the code, and found the
> >>>>> original intention is like what you described:
> >>>>>
> >>>>> commit ed2e621a95d704e6a4e904cc00524e8cbddda0c2
> >>>>> Author: Jaegeuk Kim <jaegeuk@kernel.org>
> >>>>> Date:   Fri Aug 8 15:37:41 2014 -0700
> >>>>>
> >>>>>     f2fs: give a chance to mount again when encountering errors
> >>>>>
> >>>>>     This patch gives another chance to try mount process when we encounter
> >>>>> an error.
> >>>>>     This makes an effect on the roll-forward recovery failures as well.
> >>>>>
> >>>>> But I doubt that if we failed in recovery, maybe there is corruption in
> >>>>> this image, would it be better to fail the mount, and let user fsck it and
> >>>>> retry the mount? otherwise, the corruption may be expanded...
> >>>
> >>> The problem was there was no way to recover roll-forward area by fsck. IOWs,
> >>> mount was failing all the time. I don't think roll-forward itself can corrupt
> >>
> >> I got your concern, IMO, if mount fails, it will be better to let user
> >> decide how to handle it.
> > 
> > Roll-forward is not based on user decision, but f2fs does internally. So, I'm
> 
> Yup, IMO without roll-forward, data may lose, and posix compliance can be
> corrupted, f2fs should do roll-forward internally as possible as it can.
> 
> > in doubt we have to ask users on any failed case here.
> > I don't catch why we need to revert this which has been landed for a long time.
> 
> Actually, what I mean is mount can fail due to different reason, but we
> handle it with fixed retry method by dropping recovery, it may be not flexible.
> 
> For example, first fill_super fails due to no memory, then second
> fill_super runs w/o recovery, if we succeed, we may lose fsynced data. I
> don't think it make sense.

Then, that's another issue, and yeah, we can prepare a patch for that.

> 
> Thanks,
> 
> > 
> >>
> >> If mount fails due to:
> >>
> >> 1) recovery, user can run fsck and/or try disable_roll_forward or
> >> norecovery option in another mount;
> >> 2) -EINVAL caused by sanity, user can run fsck and retry mount.
> >> 3) -ENOMEM caused low memory in system, user can add more memory and retry
> >> mount.
> >> ...
> >>
> >> Thanks,
> >>
> >>> the image more. Please report, if you have any issue on this.>
> >>>>>
> >>>>
> >>>> Hi Jaegeuk,
> >>>>
> >>>> How do you think about this? If you think it is okay, then I will fix the
> >>>> sbi->extent_list corruption issue, by removing the retry logic. Otherwise,
> >>>> I will fix it in the extent handling as you have suggested earlier.
> >>>
> >>> I'd like to keep retry logic, so could you please test what I suggested above?
> >>>
> >>> Thanks,
> >>>
> >>>>
> >>>> Thanks,
> >>>>
> >>>>> Thanks,
> >>>>>
> >>>>>>
> >>>>>> In my case, the first boot up has a failure in recovery as below -
> >>>>>>
> >>>>>> F2FS-fs (mmcblk0p75): find_fsync_dnodes: detect looped node chain, blkaddr:1979471, next:1979472
> >>>>>> F2FS-fs (mmcblk0p75): Cannot recover all fsync data errno=-22
> >>>>>>
> >>>>>> But in the second attempt, retry will be set to false and because of that
> >>>>>> recover_fsync_data() is skipped. This helped mount to be successful in
> >>>>>> the second attempt. 
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>>> Thanks,
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Sahitya.
> >>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>>
> >>>>>>>>>>> How about adding a condition in f2fs_may_extent_tree() when adding extents?
> >>>>>>>>>>> Likewise, if (shrinker is not registered) return false;
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> +	shrink_dcache_sb(sb);
> >>>>>>>>>>>> +	evict_inodes(sb);
> >>>>>>>>>>>> +	f2fs_shrink_extent_tree(sbi, __count_extent_cache(sbi));
> >>>>>>>>>>>> +}
> >>>>>>>>>>>> +
> >>>>>>>>>>>>  static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >>>>>>>>>>>>  {
> >>>>>>>>>>>>  	struct f2fs_sb_info *sbi;
> >>>>>>>>>>>> @@ -3402,6 +3412,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >>>>>>>>>>>>  	 * falls into an infinite loop in f2fs_sync_meta_pages().
> >>>>>>>>>>>>  	 */
> >>>>>>>>>>>>  	truncate_inode_pages_final(META_MAPPING(sbi));
> >>>>>>>>>>>> +	/* cleanup recovery and quota inodes */
> >>>>>>>>>>>> +	f2fs_cleanup_inodes(sbi);
> >>>>>>>>>>>>  	f2fs_unregister_sysfs(sbi);
> >>>>>>>>>>>>  free_root_inode:
> >>>>>>>>>>>>  	dput(sb->s_root);
> >>>>>>>>>>>> @@ -3445,7 +3457,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >>>>>>>>>>>>  	/* give only one another chance */
> >>>>>>>>>>>>  	if (retry) {
> >>>>>>>>>>>>  		retry = false;
> >>>>>>>>>>>> -		shrink_dcache_sb(sb);
> >>>>>>>>>>>>  		goto try_onemore;
> >>>>>>>>>>>>  	}
> >>>>>>>>>>>>  	return err;
> >>>>>>>>>>>> -- 
> >>>>>>>>>>>> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
> >>>>>>>>>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
> >>>>>>>>>>>
> >>>>>>>>>>> .
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> -- 
> >>>>>>>>> --
> >>>>>>>>> Sent by a consultant of the Qualcomm Innovation Center, Inc.
> >>>>>>>>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
> >>>>>>>>
> >>>>>>>> .
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>> -- 
> >>>> --
> >>>> Sent by a consultant of the Qualcomm Innovation Center, Inc.
> >>>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
> >>>
> >>> .
> >>>
> > 
> > .
> > 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] f2fs: fix sbi->extent_list corruption issue
  2018-11-26  4:47 [PATCH v2] f2fs: fix sbi->extent_list corruption issue Sahitya Tummala
  2018-11-26  7:17 ` Chao Yu
  2018-11-27  0:30 ` Jaegeuk Kim
@ 2019-01-04  8:05 ` Sahitya Tummala
  2019-01-04 20:33   ` Jaegeuk Kim
  2 siblings, 1 reply; 21+ messages in thread
From: Sahitya Tummala @ 2019-01-04  8:05 UTC (permalink / raw)
  To: Jaegeuk Kim, Chao Yu, linux-f2fs-devel; +Cc: linux-kernel

On Mon, Nov 26, 2018 at 10:17:20AM +0530, Sahitya Tummala wrote:
> When there is a failure in f2fs_fill_super() after/during
> the recovery of fsync'd nodes, it frees the current sbi and
> retries again. This time the mount is successful, but the files
> that got recovered before retry, still holds the extent tree,
> whose extent nodes list is corrupted since sbi and sbi->extent_list
> is freed up. The list_del corruption issue is observed when the
> file system is getting unmounted and when those recoverd files extent
> node is being freed up in the below context.
> 
> list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)
> <...>
> kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
> task: fffffff1f46f2280 task.stack: ffffff8008068000
> lr : __list_del_entry_valid+0x94/0xb4
> pc : __list_del_entry_valid+0x94/0xb4
> <...>
> Call trace:
> __list_del_entry_valid+0x94/0xb4
> __release_extent_node+0xb0/0x114
> __free_extent_tree+0x58/0x7c
> f2fs_shrink_extent_tree+0xdc/0x3b0
> f2fs_leave_shrinker+0x28/0x7c
> f2fs_put_super+0xfc/0x1e0
> generic_shutdown_super+0x70/0xf4
> kill_block_super+0x2c/0x5c
> kill_f2fs_super+0x44/0x50
> deactivate_locked_super+0x60/0x8c
> deactivate_super+0x68/0x74
> cleanup_mnt+0x40/0x78
> __cleanup_mnt+0x1c/0x28
> task_work_run+0x48/0xd0
> do_notify_resume+0x678/0xe98
> work_pending+0x8/0x14
> 
> Fix this by cleaning up inodes, extent tree and nodes of those
> recovered files before freeing up sbi and before next retry.
> 
Hi Jaegeuk, Chao,

I have observed another scenario where the similar list corruption issue
can happen with sbi->inode_list as well. If recover_fsync_data()
fails at some point in write_checkpoint() due to some error and if
those recovered inodes are still dirty, then after the mount is
successful, this issue is observed when that dirty inode is under 
writeback.

[   90.400500] list_del corruption. prev->next should be ffffffed1f566208, but was (null)
[   90.675349] Call trace:
[   90.677869]  __list_del_entry_valid+0x94/0xb4
[   90.682351]  remove_dirty_inode+0xac/0x114
[   90.686563]  __f2fs_write_data_pages+0x6a8/0x6c8
[   90.691302]  f2fs_write_data_pages+0x40/0x4c
[   90.695695]  do_writepages+0x80/0xf0
[   90.699372]  __writeback_single_inode+0xdc/0x4ac
[   90.704113]  writeback_sb_inodes+0x280/0x440
[   90.708501]  wb_writeback+0x1b8/0x3d0
[   90.712267]  wb_workfn+0x1a8/0x4d4
[   90.715765]  process_one_work+0x1c0/0x3d4
[   90.719883]  worker_thread+0x224/0x344
[   90.723739]  kthread+0x120/0x130
[   90.727055]  ret_from_fork+0x10/0x18

I think it is better to cleanup those inodes completely before freeing sbi
and before next retry as done in this patch. Would you like to re-consider
this patch for this new issue?

Please share your comments.

Thanks,

> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
> ---
> v2:
> -call evict_inodes() and f2fs_shrink_extent_tree() to cleanup inodes
> 
>  fs/f2fs/f2fs.h     |  1 +
>  fs/f2fs/shrinker.c |  2 +-
>  fs/f2fs/super.c    | 13 ++++++++++++-
>  3 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 1e03197..aaee63b 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -3407,6 +3407,7 @@ struct rb_entry *f2fs_lookup_rb_tree_ret(struct rb_root_cached *root,
>  bool f2fs_check_rb_tree_consistence(struct f2fs_sb_info *sbi,
>  						struct rb_root_cached *root);
>  unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink);
> +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi);
>  bool f2fs_init_extent_tree(struct inode *inode, struct f2fs_extent *i_ext);
>  void f2fs_drop_extent_tree(struct inode *inode);
>  unsigned int f2fs_destroy_extent_node(struct inode *inode);
> diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
> index 9e13db9..7e3c13b 100644
> --- a/fs/f2fs/shrinker.c
> +++ b/fs/f2fs/shrinker.c
> @@ -30,7 +30,7 @@ static unsigned long __count_free_nids(struct f2fs_sb_info *sbi)
>  	return count > 0 ? count : 0;
>  }
>  
> -static unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
> +unsigned long __count_extent_cache(struct f2fs_sb_info *sbi)
>  {
>  	return atomic_read(&sbi->total_zombie_tree) +
>  				atomic_read(&sbi->total_ext_node);
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index af58b2c..769e7b1 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -3016,6 +3016,16 @@ static void f2fs_tuning_parameters(struct f2fs_sb_info *sbi)
>  	sbi->readdir_ra = 1;
>  }
>  
> +static void f2fs_cleanup_inodes(struct f2fs_sb_info *sbi)
> +{
> +	struct super_block *sb = sbi->sb;
> +
> +	sync_filesystem(sb);
> +	shrink_dcache_sb(sb);
> +	evict_inodes(sb);
> +	f2fs_shrink_extent_tree(sbi, __count_extent_cache(sbi));
> +}
> +
>  static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>  {
>  	struct f2fs_sb_info *sbi;
> @@ -3402,6 +3412,8 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>  	 * falls into an infinite loop in f2fs_sync_meta_pages().
>  	 */
>  	truncate_inode_pages_final(META_MAPPING(sbi));
> +	/* cleanup recovery and quota inodes */
> +	f2fs_cleanup_inodes(sbi);
>  	f2fs_unregister_sysfs(sbi);
>  free_root_inode:
>  	dput(sb->s_root);
> @@ -3445,7 +3457,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>  	/* give only one another chance */
>  	if (retry) {
>  		retry = false;
> -		shrink_dcache_sb(sb);
>  		goto try_onemore;
>  	}
>  	return err;
> -- 
> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
> 

-- 
--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] f2fs: fix sbi->extent_list corruption issue
  2019-01-04  8:05 ` Sahitya Tummala
@ 2019-01-04 20:33   ` Jaegeuk Kim
  2019-01-07  2:25     ` Chao Yu
  0 siblings, 1 reply; 21+ messages in thread
From: Jaegeuk Kim @ 2019-01-04 20:33 UTC (permalink / raw)
  To: Sahitya Tummala; +Cc: Chao Yu, linux-f2fs-devel, linux-kernel

On 01/04, Sahitya Tummala wrote:
> On Mon, Nov 26, 2018 at 10:17:20AM +0530, Sahitya Tummala wrote:
> > When there is a failure in f2fs_fill_super() after/during
> > the recovery of fsync'd nodes, it frees the current sbi and
> > retries again. This time the mount is successful, but the files
> > that got recovered before retry, still holds the extent tree,
> > whose extent nodes list is corrupted since sbi and sbi->extent_list
> > is freed up. The list_del corruption issue is observed when the
> > file system is getting unmounted and when those recoverd files extent
> > node is being freed up in the below context.
> > 
> > list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)
> > <...>
> > kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
> > task: fffffff1f46f2280 task.stack: ffffff8008068000
> > lr : __list_del_entry_valid+0x94/0xb4
> > pc : __list_del_entry_valid+0x94/0xb4
> > <...>
> > Call trace:
> > __list_del_entry_valid+0x94/0xb4
> > __release_extent_node+0xb0/0x114
> > __free_extent_tree+0x58/0x7c
> > f2fs_shrink_extent_tree+0xdc/0x3b0
> > f2fs_leave_shrinker+0x28/0x7c
> > f2fs_put_super+0xfc/0x1e0
> > generic_shutdown_super+0x70/0xf4
> > kill_block_super+0x2c/0x5c
> > kill_f2fs_super+0x44/0x50
> > deactivate_locked_super+0x60/0x8c
> > deactivate_super+0x68/0x74
> > cleanup_mnt+0x40/0x78
> > __cleanup_mnt+0x1c/0x28
> > task_work_run+0x48/0xd0
> > do_notify_resume+0x678/0xe98
> > work_pending+0x8/0x14
> > 
> > Fix this by cleaning up inodes, extent tree and nodes of those
> > recovered files before freeing up sbi and before next retry.
> > 
> Hi Jaegeuk, Chao,
> 
> I have observed another scenario where the similar list corruption issue
> can happen with sbi->inode_list as well. If recover_fsync_data()
> fails at some point in write_checkpoint() due to some error and if
> those recovered inodes are still dirty, then after the mount is
> successful, this issue is observed when that dirty inode is under 
> writeback.

recover_fsync_data() does iget/iput in pair, and destroy_fsync_dnodes() drops
its dirty list and call iput(), when there is an error. So, after then, there'd
be no dirty inodes. If there's no error, checkpoint() flushes quota/dentry pages
in dirty inodes as well. Can we check where this dirty inode came from?

Oh, one sceanrio can be an error by f2fs_disable_checkpoint() which will do GC.

> 
> [   90.400500] list_del corruption. prev->next should be ffffffed1f566208, but was (null)
> [   90.675349] Call trace:
> [   90.677869]  __list_del_entry_valid+0x94/0xb4
> [   90.682351]  remove_dirty_inode+0xac/0x114
> [   90.686563]  __f2fs_write_data_pages+0x6a8/0x6c8
> [   90.691302]  f2fs_write_data_pages+0x40/0x4c
> [   90.695695]  do_writepages+0x80/0xf0
> [   90.699372]  __writeback_single_inode+0xdc/0x4ac
> [   90.704113]  writeback_sb_inodes+0x280/0x440
> [   90.708501]  wb_writeback+0x1b8/0x3d0
> [   90.712267]  wb_workfn+0x1a8/0x4d4
> [   90.715765]  process_one_work+0x1c0/0x3d4
> [   90.719883]  worker_thread+0x224/0x344
> [   90.723739]  kthread+0x120/0x130
> [   90.727055]  ret_from_fork+0x10/0x18
> 
> I think it is better to cleanup those inodes completely before freeing sbi
> and before next retry as done in this patch. Would you like to re-consider
> this patch for this new issue?

The patch was merged in mainline already.
Could you take a look at this patch?

From cb1d20e640402beed300c2bdce79311ee8a781ad Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim <jaegeuk@kernel.org>
Date: Fri, 4 Jan 2019 12:29:00 -0800
Subject: [PATCH] f2fs: sync filesystem after roll-forward recovery

Some works after roll-forward recovery can get an error which will release
all the data structures. Let's flush them in order to make it clean.

One possible corruption came from:

[   90.400500] list_del corruption. prev->next should be ffffffed1f566208, but was (null)
[   90.675349] Call trace:
[   90.677869]  __list_del_entry_valid+0x94/0xb4
[   90.682351]  remove_dirty_inode+0xac/0x114
[   90.686563]  __f2fs_write_data_pages+0x6a8/0x6c8
[   90.691302]  f2fs_write_data_pages+0x40/0x4c
[   90.695695]  do_writepages+0x80/0xf0
[   90.699372]  __writeback_single_inode+0xdc/0x4ac
[   90.704113]  writeback_sb_inodes+0x280/0x440
[   90.708501]  wb_writeback+0x1b8/0x3d0
[   90.712267]  wb_workfn+0x1a8/0x4d4
[   90.715765]  process_one_work+0x1c0/0x3d4
[   90.719883]  worker_thread+0x224/0x344
[   90.723739]  kthread+0x120/0x130
[   90.727055]  ret_from_fork+0x10/0x18

Reported-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 fs/f2fs/super.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 547cb7459be7..bb02186293a3 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -3357,7 +3357,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
 	if (test_opt(sbi, DISABLE_CHECKPOINT)) {
 		err = f2fs_disable_checkpoint(sbi);
 		if (err)
-			goto free_meta;
+			goto sync_free_meta;
 	} else if (is_set_ckpt_flags(sbi, CP_DISABLED_FLAG)) {
 		f2fs_enable_checkpoint(sbi);
 	}
@@ -3370,7 +3370,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
 		/* After POR, we can run background GC thread.*/
 		err = f2fs_start_gc_thread(sbi);
 		if (err)
-			goto free_meta;
+			goto sync_free_meta;
 	}
 	kvfree(options);
 
@@ -3392,6 +3392,10 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
 	f2fs_update_time(sbi, REQ_TIME);
 	return 0;
 
+sync_free_meta:
+	/* safe to flush all the data */
+	sync_filesystem(sbi->sb);
+
 free_meta:
 	/* flush dirty orphan inode objects */
 	f2fs_sync_inode_meta(sbi);
-- 
2.19.0.605.g01d371f741-goog


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] f2fs: fix sbi->extent_list corruption issue
  2019-01-04 20:33   ` Jaegeuk Kim
@ 2019-01-07  2:25     ` Chao Yu
  2019-01-09  4:38       ` Jaegeuk Kim
  0 siblings, 1 reply; 21+ messages in thread
From: Chao Yu @ 2019-01-07  2:25 UTC (permalink / raw)
  To: Jaegeuk Kim, Sahitya Tummala; +Cc: linux-f2fs-devel, linux-kernel

On 2019/1/5 4:33, Jaegeuk Kim wrote:
> On 01/04, Sahitya Tummala wrote:
>> On Mon, Nov 26, 2018 at 10:17:20AM +0530, Sahitya Tummala wrote:
>>> When there is a failure in f2fs_fill_super() after/during
>>> the recovery of fsync'd nodes, it frees the current sbi and
>>> retries again. This time the mount is successful, but the files
>>> that got recovered before retry, still holds the extent tree,
>>> whose extent nodes list is corrupted since sbi and sbi->extent_list
>>> is freed up. The list_del corruption issue is observed when the
>>> file system is getting unmounted and when those recoverd files extent
>>> node is being freed up in the below context.
>>>
>>> list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)
>>> <...>
>>> kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
>>> task: fffffff1f46f2280 task.stack: ffffff8008068000
>>> lr : __list_del_entry_valid+0x94/0xb4
>>> pc : __list_del_entry_valid+0x94/0xb4
>>> <...>
>>> Call trace:
>>> __list_del_entry_valid+0x94/0xb4
>>> __release_extent_node+0xb0/0x114
>>> __free_extent_tree+0x58/0x7c
>>> f2fs_shrink_extent_tree+0xdc/0x3b0
>>> f2fs_leave_shrinker+0x28/0x7c
>>> f2fs_put_super+0xfc/0x1e0
>>> generic_shutdown_super+0x70/0xf4
>>> kill_block_super+0x2c/0x5c
>>> kill_f2fs_super+0x44/0x50
>>> deactivate_locked_super+0x60/0x8c
>>> deactivate_super+0x68/0x74
>>> cleanup_mnt+0x40/0x78
>>> __cleanup_mnt+0x1c/0x28
>>> task_work_run+0x48/0xd0
>>> do_notify_resume+0x678/0xe98
>>> work_pending+0x8/0x14
>>>
>>> Fix this by cleaning up inodes, extent tree and nodes of those
>>> recovered files before freeing up sbi and before next retry.
>>>
>> Hi Jaegeuk, Chao,
>>
>> I have observed another scenario where the similar list corruption issue
>> can happen with sbi->inode_list as well. If recover_fsync_data()
>> fails at some point in write_checkpoint() due to some error and if
>> those recovered inodes are still dirty, then after the mount is
>> successful, this issue is observed when that dirty inode is under 
>> writeback.
> 
> recover_fsync_data() does iget/iput in pair, and destroy_fsync_dnodes() drops
> its dirty list and call iput(), when there is an error. So, after then, there'd
> be no dirty inodes. If there's no error, checkpoint() flushes quota/dentry pages
> in dirty inodes as well. Can we check where this dirty inode came from?

I guess it comes from:

f2fs_recover_fsync_data()

	/* Needed for iput() to work correctly and not trash data */
	sbi->sb->s_flags |= SB_ACTIVE;

iput_final()

	if (!drop && (sb->s_flags & SB_ACTIVE)) {
		inode_add_lru(inode);
		spin_unlock(&inode->i_lock);
		return;
	}

So dirty data in those inode can be remained after iput(), then meta/node
can be persisted during next checkpoint, if checkpoint failed due to error,
dirty inode remain in system. IIUC.

> 
> Oh, one sceanrio can be an error by f2fs_disable_checkpoint() which will do GC.
> 
>>
>> [   90.400500] list_del corruption. prev->next should be ffffffed1f566208, but was (null)
>> [   90.675349] Call trace:
>> [   90.677869]  __list_del_entry_valid+0x94/0xb4
>> [   90.682351]  remove_dirty_inode+0xac/0x114
>> [   90.686563]  __f2fs_write_data_pages+0x6a8/0x6c8
>> [   90.691302]  f2fs_write_data_pages+0x40/0x4c
>> [   90.695695]  do_writepages+0x80/0xf0
>> [   90.699372]  __writeback_single_inode+0xdc/0x4ac
>> [   90.704113]  writeback_sb_inodes+0x280/0x440
>> [   90.708501]  wb_writeback+0x1b8/0x3d0
>> [   90.712267]  wb_workfn+0x1a8/0x4d4
>> [   90.715765]  process_one_work+0x1c0/0x3d4
>> [   90.719883]  worker_thread+0x224/0x344
>> [   90.723739]  kthread+0x120/0x130
>> [   90.727055]  ret_from_fork+0x10/0x18
>>
>> I think it is better to cleanup those inodes completely before freeing sbi
>> and before next retry as done in this patch. Would you like to re-consider
>> this patch for this new issue?
> 
> The patch was merged in mainline already.
> Could you take a look at this patch?
> 
>>From cb1d20e640402beed300c2bdce79311ee8a781ad Mon Sep 17 00:00:00 2001
> From: Jaegeuk Kim <jaegeuk@kernel.org>
> Date: Fri, 4 Jan 2019 12:29:00 -0800
> Subject: [PATCH] f2fs: sync filesystem after roll-forward recovery

You mean android kernel mainline?

Thanks,

> 
> Some works after roll-forward recovery can get an error which will release
> all the data structures. Let's flush them in order to make it clean.
> 
> One possible corruption came from:
> 
> [   90.400500] list_del corruption. prev->next should be ffffffed1f566208, but was (null)
> [   90.675349] Call trace:
> [   90.677869]  __list_del_entry_valid+0x94/0xb4
> [   90.682351]  remove_dirty_inode+0xac/0x114
> [   90.686563]  __f2fs_write_data_pages+0x6a8/0x6c8
> [   90.691302]  f2fs_write_data_pages+0x40/0x4c
> [   90.695695]  do_writepages+0x80/0xf0
> [   90.699372]  __writeback_single_inode+0xdc/0x4ac
> [   90.704113]  writeback_sb_inodes+0x280/0x440
> [   90.708501]  wb_writeback+0x1b8/0x3d0
> [   90.712267]  wb_workfn+0x1a8/0x4d4
> [   90.715765]  process_one_work+0x1c0/0x3d4
> [   90.719883]  worker_thread+0x224/0x344
> [   90.723739]  kthread+0x120/0x130
> [   90.727055]  ret_from_fork+0x10/0x18
> 
> Reported-by: Sahitya Tummala <stummala@codeaurora.org>
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> ---
>  fs/f2fs/super.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 547cb7459be7..bb02186293a3 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -3357,7 +3357,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>  	if (test_opt(sbi, DISABLE_CHECKPOINT)) {
>  		err = f2fs_disable_checkpoint(sbi);
>  		if (err)
> -			goto free_meta;
> +			goto sync_free_meta;
>  	} else if (is_set_ckpt_flags(sbi, CP_DISABLED_FLAG)) {
>  		f2fs_enable_checkpoint(sbi);
>  	}
> @@ -3370,7 +3370,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>  		/* After POR, we can run background GC thread.*/
>  		err = f2fs_start_gc_thread(sbi);
>  		if (err)
> -			goto free_meta;
> +			goto sync_free_meta;
>  	}
>  	kvfree(options);
>  
> @@ -3392,6 +3392,10 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>  	f2fs_update_time(sbi, REQ_TIME);
>  	return 0;
>  
> +sync_free_meta:
> +	/* safe to flush all the data */
> +	sync_filesystem(sbi->sb);
> +
>  free_meta:
>  	/* flush dirty orphan inode objects */
>  	f2fs_sync_inode_meta(sbi);
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] f2fs: fix sbi->extent_list corruption issue
  2019-01-07  2:25     ` Chao Yu
@ 2019-01-09  4:38       ` Jaegeuk Kim
  2019-01-09  6:13         ` Chao Yu
  0 siblings, 1 reply; 21+ messages in thread
From: Jaegeuk Kim @ 2019-01-09  4:38 UTC (permalink / raw)
  To: Chao Yu; +Cc: Sahitya Tummala, linux-f2fs-devel, linux-kernel

On 01/07, Chao Yu wrote:
> On 2019/1/5 4:33, Jaegeuk Kim wrote:
> > On 01/04, Sahitya Tummala wrote:
> >> On Mon, Nov 26, 2018 at 10:17:20AM +0530, Sahitya Tummala wrote:
> >>> When there is a failure in f2fs_fill_super() after/during
> >>> the recovery of fsync'd nodes, it frees the current sbi and
> >>> retries again. This time the mount is successful, but the files
> >>> that got recovered before retry, still holds the extent tree,
> >>> whose extent nodes list is corrupted since sbi and sbi->extent_list
> >>> is freed up. The list_del corruption issue is observed when the
> >>> file system is getting unmounted and when those recoverd files extent
> >>> node is being freed up in the below context.
> >>>
> >>> list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)
> >>> <...>
> >>> kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
> >>> task: fffffff1f46f2280 task.stack: ffffff8008068000
> >>> lr : __list_del_entry_valid+0x94/0xb4
> >>> pc : __list_del_entry_valid+0x94/0xb4
> >>> <...>
> >>> Call trace:
> >>> __list_del_entry_valid+0x94/0xb4
> >>> __release_extent_node+0xb0/0x114
> >>> __free_extent_tree+0x58/0x7c
> >>> f2fs_shrink_extent_tree+0xdc/0x3b0
> >>> f2fs_leave_shrinker+0x28/0x7c
> >>> f2fs_put_super+0xfc/0x1e0
> >>> generic_shutdown_super+0x70/0xf4
> >>> kill_block_super+0x2c/0x5c
> >>> kill_f2fs_super+0x44/0x50
> >>> deactivate_locked_super+0x60/0x8c
> >>> deactivate_super+0x68/0x74
> >>> cleanup_mnt+0x40/0x78
> >>> __cleanup_mnt+0x1c/0x28
> >>> task_work_run+0x48/0xd0
> >>> do_notify_resume+0x678/0xe98
> >>> work_pending+0x8/0x14
> >>>
> >>> Fix this by cleaning up inodes, extent tree and nodes of those
> >>> recovered files before freeing up sbi and before next retry.
> >>>
> >> Hi Jaegeuk, Chao,
> >>
> >> I have observed another scenario where the similar list corruption issue
> >> can happen with sbi->inode_list as well. If recover_fsync_data()
> >> fails at some point in write_checkpoint() due to some error and if
> >> those recovered inodes are still dirty, then after the mount is
> >> successful, this issue is observed when that dirty inode is under 
> >> writeback.
> > 
> > recover_fsync_data() does iget/iput in pair, and destroy_fsync_dnodes() drops
> > its dirty list and call iput(), when there is an error. So, after then, there'd
> > be no dirty inodes. If there's no error, checkpoint() flushes quota/dentry pages
> > in dirty inodes as well. Can we check where this dirty inode came from?
> 
> I guess it comes from:
> 
> f2fs_recover_fsync_data()
> 
> 	/* Needed for iput() to work correctly and not trash data */
> 	sbi->sb->s_flags |= SB_ACTIVE;
> 
> iput_final()
> 
> 	if (!drop && (sb->s_flags & SB_ACTIVE)) {
> 		inode_add_lru(inode);
> 		spin_unlock(&inode->i_lock);
> 		return;
> 	}
> 
> So dirty data in those inode can be remained after iput(), then meta/node
> can be persisted during next checkpoint, if checkpoint failed due to error,
> dirty inode remain in system. IIUC.


749         err = recover_data(sbi, &inode_list, &tmp_inode_list, &dir_list);
750         if (!err)
751                 f2fs_bug_on(sbi, !list_empty(&inode_list));
752         else {
753                 /* restore s_flags to let iput() trash data */
754                 sbi->sb->s_flags = s_flags;
755         }

We deactivate sb before iput?

> 
> > 
> > Oh, one sceanrio can be an error by f2fs_disable_checkpoint() which will do GC.
> > 
> >>
> >> [   90.400500] list_del corruption. prev->next should be ffffffed1f566208, but was (null)
> >> [   90.675349] Call trace:
> >> [   90.677869]  __list_del_entry_valid+0x94/0xb4
> >> [   90.682351]  remove_dirty_inode+0xac/0x114
> >> [   90.686563]  __f2fs_write_data_pages+0x6a8/0x6c8
> >> [   90.691302]  f2fs_write_data_pages+0x40/0x4c
> >> [   90.695695]  do_writepages+0x80/0xf0
> >> [   90.699372]  __writeback_single_inode+0xdc/0x4ac
> >> [   90.704113]  writeback_sb_inodes+0x280/0x440
> >> [   90.708501]  wb_writeback+0x1b8/0x3d0
> >> [   90.712267]  wb_workfn+0x1a8/0x4d4
> >> [   90.715765]  process_one_work+0x1c0/0x3d4
> >> [   90.719883]  worker_thread+0x224/0x344
> >> [   90.723739]  kthread+0x120/0x130
> >> [   90.727055]  ret_from_fork+0x10/0x18
> >>
> >> I think it is better to cleanup those inodes completely before freeing sbi
> >> and before next retry as done in this patch. Would you like to re-consider
> >> this patch for this new issue?
> > 
> > The patch was merged in mainline already.
> > Could you take a look at this patch?
> > 
> >>From cb1d20e640402beed300c2bdce79311ee8a781ad Mon Sep 17 00:00:00 2001
> > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > Date: Fri, 4 Jan 2019 12:29:00 -0800
> > Subject: [PATCH] f2fs: sync filesystem after roll-forward recovery
> 
> You mean android kernel mainline?

I meant the previous patch was upstreamed. We need another patch to address this
second issue.

Thanks,

> 
> Thanks,
> 
> > 
> > Some works after roll-forward recovery can get an error which will release
> > all the data structures. Let's flush them in order to make it clean.
> > 
> > One possible corruption came from:
> > 
> > [   90.400500] list_del corruption. prev->next should be ffffffed1f566208, but was (null)
> > [   90.675349] Call trace:
> > [   90.677869]  __list_del_entry_valid+0x94/0xb4
> > [   90.682351]  remove_dirty_inode+0xac/0x114
> > [   90.686563]  __f2fs_write_data_pages+0x6a8/0x6c8
> > [   90.691302]  f2fs_write_data_pages+0x40/0x4c
> > [   90.695695]  do_writepages+0x80/0xf0
> > [   90.699372]  __writeback_single_inode+0xdc/0x4ac
> > [   90.704113]  writeback_sb_inodes+0x280/0x440
> > [   90.708501]  wb_writeback+0x1b8/0x3d0
> > [   90.712267]  wb_workfn+0x1a8/0x4d4
> > [   90.715765]  process_one_work+0x1c0/0x3d4
> > [   90.719883]  worker_thread+0x224/0x344
> > [   90.723739]  kthread+0x120/0x130
> > [   90.727055]  ret_from_fork+0x10/0x18
> > 
> > Reported-by: Sahitya Tummala <stummala@codeaurora.org>
> > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > ---
> >  fs/f2fs/super.c | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > index 547cb7459be7..bb02186293a3 100644
> > --- a/fs/f2fs/super.c
> > +++ b/fs/f2fs/super.c
> > @@ -3357,7 +3357,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >  	if (test_opt(sbi, DISABLE_CHECKPOINT)) {
> >  		err = f2fs_disable_checkpoint(sbi);
> >  		if (err)
> > -			goto free_meta;
> > +			goto sync_free_meta;
> >  	} else if (is_set_ckpt_flags(sbi, CP_DISABLED_FLAG)) {
> >  		f2fs_enable_checkpoint(sbi);
> >  	}
> > @@ -3370,7 +3370,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >  		/* After POR, we can run background GC thread.*/
> >  		err = f2fs_start_gc_thread(sbi);
> >  		if (err)
> > -			goto free_meta;
> > +			goto sync_free_meta;
> >  	}
> >  	kvfree(options);
> >  
> > @@ -3392,6 +3392,10 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >  	f2fs_update_time(sbi, REQ_TIME);
> >  	return 0;
> >  
> > +sync_free_meta:
> > +	/* safe to flush all the data */
> > +	sync_filesystem(sbi->sb);
> > +
> >  free_meta:
> >  	/* flush dirty orphan inode objects */
> >  	f2fs_sync_inode_meta(sbi);
> > 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] f2fs: fix sbi->extent_list corruption issue
  2019-01-09  4:38       ` Jaegeuk Kim
@ 2019-01-09  6:13         ` Chao Yu
  2019-01-10  6:39           ` Sahitya Tummala
  0 siblings, 1 reply; 21+ messages in thread
From: Chao Yu @ 2019-01-09  6:13 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: Sahitya Tummala, linux-f2fs-devel, linux-kernel

On 2019/1/9 12:38, Jaegeuk Kim wrote:
> On 01/07, Chao Yu wrote:
>> On 2019/1/5 4:33, Jaegeuk Kim wrote:
>>> On 01/04, Sahitya Tummala wrote:
>>>> On Mon, Nov 26, 2018 at 10:17:20AM +0530, Sahitya Tummala wrote:
>>>>> When there is a failure in f2fs_fill_super() after/during
>>>>> the recovery of fsync'd nodes, it frees the current sbi and
>>>>> retries again. This time the mount is successful, but the files
>>>>> that got recovered before retry, still holds the extent tree,
>>>>> whose extent nodes list is corrupted since sbi and sbi->extent_list
>>>>> is freed up. The list_del corruption issue is observed when the
>>>>> file system is getting unmounted and when those recoverd files extent
>>>>> node is being freed up in the below context.
>>>>>
>>>>> list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)
>>>>> <...>
>>>>> kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
>>>>> task: fffffff1f46f2280 task.stack: ffffff8008068000
>>>>> lr : __list_del_entry_valid+0x94/0xb4
>>>>> pc : __list_del_entry_valid+0x94/0xb4
>>>>> <...>
>>>>> Call trace:
>>>>> __list_del_entry_valid+0x94/0xb4
>>>>> __release_extent_node+0xb0/0x114
>>>>> __free_extent_tree+0x58/0x7c
>>>>> f2fs_shrink_extent_tree+0xdc/0x3b0
>>>>> f2fs_leave_shrinker+0x28/0x7c
>>>>> f2fs_put_super+0xfc/0x1e0
>>>>> generic_shutdown_super+0x70/0xf4
>>>>> kill_block_super+0x2c/0x5c
>>>>> kill_f2fs_super+0x44/0x50
>>>>> deactivate_locked_super+0x60/0x8c
>>>>> deactivate_super+0x68/0x74
>>>>> cleanup_mnt+0x40/0x78
>>>>> __cleanup_mnt+0x1c/0x28
>>>>> task_work_run+0x48/0xd0
>>>>> do_notify_resume+0x678/0xe98
>>>>> work_pending+0x8/0x14
>>>>>
>>>>> Fix this by cleaning up inodes, extent tree and nodes of those
>>>>> recovered files before freeing up sbi and before next retry.
>>>>>
>>>> Hi Jaegeuk, Chao,
>>>>
>>>> I have observed another scenario where the similar list corruption issue
>>>> can happen with sbi->inode_list as well. If recover_fsync_data()
>>>> fails at some point in write_checkpoint() due to some error and if
>>>> those recovered inodes are still dirty, then after the mount is
>>>> successful, this issue is observed when that dirty inode is under 
>>>> writeback.
>>>
>>> recover_fsync_data() does iget/iput in pair, and destroy_fsync_dnodes() drops
>>> its dirty list and call iput(), when there is an error. So, after then, there'd
>>> be no dirty inodes. If there's no error, checkpoint() flushes quota/dentry pages
>>> in dirty inodes as well. Can we check where this dirty inode came from?
>>
>> I guess it comes from:
>>
>> f2fs_recover_fsync_data()
>>
>> 	/* Needed for iput() to work correctly and not trash data */
>> 	sbi->sb->s_flags |= SB_ACTIVE;
>>
>> iput_final()
>>
>> 	if (!drop && (sb->s_flags & SB_ACTIVE)) {
>> 		inode_add_lru(inode);
>> 		spin_unlock(&inode->i_lock);
>> 		return;
>> 	}
>>
>> So dirty data in those inode can be remained after iput(), then meta/node
>> can be persisted during next checkpoint, if checkpoint failed due to error,
>> dirty inode remain in system. IIUC.
> 
> 
> 749         err = recover_data(sbi, &inode_list, &tmp_inode_list, &dir_list);
> 750         if (!err)
> 751                 f2fs_bug_on(sbi, !list_empty(&inode_list));
> 752         else {
> 753                 /* restore s_flags to let iput() trash data */
> 754                 sbi->sb->s_flags = s_flags;
> 755         }
> 
> We deactivate sb before iput?

We only restore s_flags in error path of recover_data? I remember Sahitya
said his case is encountering error in later checkpoint()?

Thanks,

> 
>>
>>>
>>> Oh, one sceanrio can be an error by f2fs_disable_checkpoint() which will do GC.
>>>
>>>>
>>>> [   90.400500] list_del corruption. prev->next should be ffffffed1f566208, but was (null)
>>>> [   90.675349] Call trace:
>>>> [   90.677869]  __list_del_entry_valid+0x94/0xb4
>>>> [   90.682351]  remove_dirty_inode+0xac/0x114
>>>> [   90.686563]  __f2fs_write_data_pages+0x6a8/0x6c8
>>>> [   90.691302]  f2fs_write_data_pages+0x40/0x4c
>>>> [   90.695695]  do_writepages+0x80/0xf0
>>>> [   90.699372]  __writeback_single_inode+0xdc/0x4ac
>>>> [   90.704113]  writeback_sb_inodes+0x280/0x440
>>>> [   90.708501]  wb_writeback+0x1b8/0x3d0
>>>> [   90.712267]  wb_workfn+0x1a8/0x4d4
>>>> [   90.715765]  process_one_work+0x1c0/0x3d4
>>>> [   90.719883]  worker_thread+0x224/0x344
>>>> [   90.723739]  kthread+0x120/0x130
>>>> [   90.727055]  ret_from_fork+0x10/0x18
>>>>
>>>> I think it is better to cleanup those inodes completely before freeing sbi
>>>> and before next retry as done in this patch. Would you like to re-consider
>>>> this patch for this new issue?
>>>
>>> The patch was merged in mainline already.
>>> Could you take a look at this patch?
>>>
>>> >From cb1d20e640402beed300c2bdce79311ee8a781ad Mon Sep 17 00:00:00 2001
>>> From: Jaegeuk Kim <jaegeuk@kernel.org>
>>> Date: Fri, 4 Jan 2019 12:29:00 -0800
>>> Subject: [PATCH] f2fs: sync filesystem after roll-forward recovery
>>
>> You mean android kernel mainline?
> 
> I meant the previous patch was upstreamed. We need another patch to address this
> second issue.
> 
> Thanks,
> 
>>
>> Thanks,
>>
>>>
>>> Some works after roll-forward recovery can get an error which will release
>>> all the data structures. Let's flush them in order to make it clean.
>>>
>>> One possible corruption came from:
>>>
>>> [   90.400500] list_del corruption. prev->next should be ffffffed1f566208, but was (null)
>>> [   90.675349] Call trace:
>>> [   90.677869]  __list_del_entry_valid+0x94/0xb4
>>> [   90.682351]  remove_dirty_inode+0xac/0x114
>>> [   90.686563]  __f2fs_write_data_pages+0x6a8/0x6c8
>>> [   90.691302]  f2fs_write_data_pages+0x40/0x4c
>>> [   90.695695]  do_writepages+0x80/0xf0
>>> [   90.699372]  __writeback_single_inode+0xdc/0x4ac
>>> [   90.704113]  writeback_sb_inodes+0x280/0x440
>>> [   90.708501]  wb_writeback+0x1b8/0x3d0
>>> [   90.712267]  wb_workfn+0x1a8/0x4d4
>>> [   90.715765]  process_one_work+0x1c0/0x3d4
>>> [   90.719883]  worker_thread+0x224/0x344
>>> [   90.723739]  kthread+0x120/0x130
>>> [   90.727055]  ret_from_fork+0x10/0x18
>>>
>>> Reported-by: Sahitya Tummala <stummala@codeaurora.org>
>>> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
>>> ---
>>>  fs/f2fs/super.c | 8 ++++++--
>>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>> index 547cb7459be7..bb02186293a3 100644
>>> --- a/fs/f2fs/super.c
>>> +++ b/fs/f2fs/super.c
>>> @@ -3357,7 +3357,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>>  	if (test_opt(sbi, DISABLE_CHECKPOINT)) {
>>>  		err = f2fs_disable_checkpoint(sbi);
>>>  		if (err)
>>> -			goto free_meta;
>>> +			goto sync_free_meta;
>>>  	} else if (is_set_ckpt_flags(sbi, CP_DISABLED_FLAG)) {
>>>  		f2fs_enable_checkpoint(sbi);
>>>  	}
>>> @@ -3370,7 +3370,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>>  		/* After POR, we can run background GC thread.*/
>>>  		err = f2fs_start_gc_thread(sbi);
>>>  		if (err)
>>> -			goto free_meta;
>>> +			goto sync_free_meta;
>>>  	}
>>>  	kvfree(options);
>>>  
>>> @@ -3392,6 +3392,10 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>>  	f2fs_update_time(sbi, REQ_TIME);
>>>  	return 0;
>>>  
>>> +sync_free_meta:
>>> +	/* safe to flush all the data */
>>> +	sync_filesystem(sbi->sb);
>>> +
>>>  free_meta:
>>>  	/* flush dirty orphan inode objects */
>>>  	f2fs_sync_inode_meta(sbi);
>>>
> 
> .
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] f2fs: fix sbi->extent_list corruption issue
  2019-01-09  6:13         ` Chao Yu
@ 2019-01-10  6:39           ` Sahitya Tummala
  0 siblings, 0 replies; 21+ messages in thread
From: Sahitya Tummala @ 2019-01-10  6:39 UTC (permalink / raw)
  To: Chao Yu; +Cc: Jaegeuk Kim, linux-f2fs-devel, linux-kernel

On Wed, Jan 09, 2019 at 02:13:02PM +0800, Chao Yu wrote:
> On 2019/1/9 12:38, Jaegeuk Kim wrote:
> > On 01/07, Chao Yu wrote:
> >> On 2019/1/5 4:33, Jaegeuk Kim wrote:
> >>> On 01/04, Sahitya Tummala wrote:
> >>>> On Mon, Nov 26, 2018 at 10:17:20AM +0530, Sahitya Tummala wrote:
> >>>>> When there is a failure in f2fs_fill_super() after/during
> >>>>> the recovery of fsync'd nodes, it frees the current sbi and
> >>>>> retries again. This time the mount is successful, but the files
> >>>>> that got recovered before retry, still holds the extent tree,
> >>>>> whose extent nodes list is corrupted since sbi and sbi->extent_list
> >>>>> is freed up. The list_del corruption issue is observed when the
> >>>>> file system is getting unmounted and when those recoverd files extent
> >>>>> node is being freed up in the below context.
> >>>>>
> >>>>> list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)
> >>>>> <...>
> >>>>> kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
> >>>>> task: fffffff1f46f2280 task.stack: ffffff8008068000
> >>>>> lr : __list_del_entry_valid+0x94/0xb4
> >>>>> pc : __list_del_entry_valid+0x94/0xb4
> >>>>> <...>
> >>>>> Call trace:
> >>>>> __list_del_entry_valid+0x94/0xb4
> >>>>> __release_extent_node+0xb0/0x114
> >>>>> __free_extent_tree+0x58/0x7c
> >>>>> f2fs_shrink_extent_tree+0xdc/0x3b0
> >>>>> f2fs_leave_shrinker+0x28/0x7c
> >>>>> f2fs_put_super+0xfc/0x1e0
> >>>>> generic_shutdown_super+0x70/0xf4
> >>>>> kill_block_super+0x2c/0x5c
> >>>>> kill_f2fs_super+0x44/0x50
> >>>>> deactivate_locked_super+0x60/0x8c
> >>>>> deactivate_super+0x68/0x74
> >>>>> cleanup_mnt+0x40/0x78
> >>>>> __cleanup_mnt+0x1c/0x28
> >>>>> task_work_run+0x48/0xd0
> >>>>> do_notify_resume+0x678/0xe98
> >>>>> work_pending+0x8/0x14
> >>>>>
> >>>>> Fix this by cleaning up inodes, extent tree and nodes of those
> >>>>> recovered files before freeing up sbi and before next retry.
> >>>>>
> >>>> Hi Jaegeuk, Chao,
> >>>>
> >>>> I have observed another scenario where the similar list corruption issue
> >>>> can happen with sbi->inode_list as well. If recover_fsync_data()
> >>>> fails at some point in write_checkpoint() due to some error and if
> >>>> those recovered inodes are still dirty, then after the mount is
> >>>> successful, this issue is observed when that dirty inode is under 
> >>>> writeback.
> >>>
> >>> recover_fsync_data() does iget/iput in pair, and destroy_fsync_dnodes() drops
> >>> its dirty list and call iput(), when there is an error. So, after then, there'd
> >>> be no dirty inodes. If there's no error, checkpoint() flushes quota/dentry pages
> >>> in dirty inodes as well. Can we check where this dirty inode came from?
> >>
> >> I guess it comes from:
> >>
> >> f2fs_recover_fsync_data()
> >>
> >> 	/* Needed for iput() to work correctly and not trash data */
> >> 	sbi->sb->s_flags |= SB_ACTIVE;
> >>
> >> iput_final()
> >>
> >> 	if (!drop && (sb->s_flags & SB_ACTIVE)) {
> >> 		inode_add_lru(inode);
> >> 		spin_unlock(&inode->i_lock);
> >> 		return;
> >> 	}
> >>
> >> So dirty data in those inode can be remained after iput(), then meta/node
> >> can be persisted during next checkpoint, if checkpoint failed due to error,
> >> dirty inode remain in system. IIUC.
> > 
> > 
> > 749         err = recover_data(sbi, &inode_list, &tmp_inode_list, &dir_list);
> > 750         if (!err)
> > 751                 f2fs_bug_on(sbi, !list_empty(&inode_list));
> > 752         else {
> > 753                 /* restore s_flags to let iput() trash data */
> > 754                 sbi->sb->s_flags = s_flags;
> > 755         }
> > 
> > We deactivate sb before iput?
> 
> We only restore s_flags in error path of recover_data? I remember Sahitya
> said his case is encountering error in later checkpoint()?
> 

Thanks Chao and Jaegeuk for taking a look at this.
I might be wrong earlier on the point of error.
The case that Jaegeuk has addressed below looks good for now.
https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev&id=cd2dcebde8b8088b8ad77184ee3a1e81fd3da6ba

> Thanks,
> 
> > 
> >>
> >>>
> >>> Oh, one sceanrio can be an error by f2fs_disable_checkpoint() which will do GC.
> >>>
> >>>>
> >>>> [   90.400500] list_del corruption. prev->next should be ffffffed1f566208, but was (null)
> >>>> [   90.675349] Call trace:
> >>>> [   90.677869]  __list_del_entry_valid+0x94/0xb4
> >>>> [   90.682351]  remove_dirty_inode+0xac/0x114
> >>>> [   90.686563]  __f2fs_write_data_pages+0x6a8/0x6c8
> >>>> [   90.691302]  f2fs_write_data_pages+0x40/0x4c
> >>>> [   90.695695]  do_writepages+0x80/0xf0
> >>>> [   90.699372]  __writeback_single_inode+0xdc/0x4ac
> >>>> [   90.704113]  writeback_sb_inodes+0x280/0x440
> >>>> [   90.708501]  wb_writeback+0x1b8/0x3d0
> >>>> [   90.712267]  wb_workfn+0x1a8/0x4d4
> >>>> [   90.715765]  process_one_work+0x1c0/0x3d4
> >>>> [   90.719883]  worker_thread+0x224/0x344
> >>>> [   90.723739]  kthread+0x120/0x130
> >>>> [   90.727055]  ret_from_fork+0x10/0x18
> >>>>
> >>>> I think it is better to cleanup those inodes completely before freeing sbi
> >>>> and before next retry as done in this patch. Would you like to re-consider
> >>>> this patch for this new issue?
> >>>
> >>> The patch was merged in mainline already.
> >>> Could you take a look at this patch?
> >>>
> >>> >From cb1d20e640402beed300c2bdce79311ee8a781ad Mon Sep 17 00:00:00 2001
> >>> From: Jaegeuk Kim <jaegeuk@kernel.org>
> >>> Date: Fri, 4 Jan 2019 12:29:00 -0800
> >>> Subject: [PATCH] f2fs: sync filesystem after roll-forward recovery
> >>
> >> You mean android kernel mainline?
> > 
> > I meant the previous patch was upstreamed. We need another patch to address this
> > second issue.
> > 
> > Thanks,
> > 
> >>
> >> Thanks,
> >>
> >>>
> >>> Some works after roll-forward recovery can get an error which will release
> >>> all the data structures. Let's flush them in order to make it clean.
> >>>
> >>> One possible corruption came from:
> >>>
> >>> [   90.400500] list_del corruption. prev->next should be ffffffed1f566208, but was (null)
> >>> [   90.675349] Call trace:
> >>> [   90.677869]  __list_del_entry_valid+0x94/0xb4
> >>> [   90.682351]  remove_dirty_inode+0xac/0x114
> >>> [   90.686563]  __f2fs_write_data_pages+0x6a8/0x6c8
> >>> [   90.691302]  f2fs_write_data_pages+0x40/0x4c
> >>> [   90.695695]  do_writepages+0x80/0xf0
> >>> [   90.699372]  __writeback_single_inode+0xdc/0x4ac
> >>> [   90.704113]  writeback_sb_inodes+0x280/0x440
> >>> [   90.708501]  wb_writeback+0x1b8/0x3d0
> >>> [   90.712267]  wb_workfn+0x1a8/0x4d4
> >>> [   90.715765]  process_one_work+0x1c0/0x3d4
> >>> [   90.719883]  worker_thread+0x224/0x344
> >>> [   90.723739]  kthread+0x120/0x130
> >>> [   90.727055]  ret_from_fork+0x10/0x18
> >>>
> >>> Reported-by: Sahitya Tummala <stummala@codeaurora.org>
> >>> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> >>> ---
> >>>  fs/f2fs/super.c | 8 ++++++--
> >>>  1 file changed, 6 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> >>> index 547cb7459be7..bb02186293a3 100644
> >>> --- a/fs/f2fs/super.c
> >>> +++ b/fs/f2fs/super.c
> >>> @@ -3357,7 +3357,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >>>  	if (test_opt(sbi, DISABLE_CHECKPOINT)) {
> >>>  		err = f2fs_disable_checkpoint(sbi);
> >>>  		if (err)
> >>> -			goto free_meta;
> >>> +			goto sync_free_meta;
> >>>  	} else if (is_set_ckpt_flags(sbi, CP_DISABLED_FLAG)) {
> >>>  		f2fs_enable_checkpoint(sbi);
> >>>  	}
> >>> @@ -3370,7 +3370,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >>>  		/* After POR, we can run background GC thread.*/
> >>>  		err = f2fs_start_gc_thread(sbi);
> >>>  		if (err)
> >>> -			goto free_meta;
> >>> +			goto sync_free_meta;
> >>>  	}
> >>>  	kvfree(options);
> >>>  
> >>> @@ -3392,6 +3392,10 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >>>  	f2fs_update_time(sbi, REQ_TIME);
> >>>  	return 0;
> >>>  
> >>> +sync_free_meta:
> >>> +	/* safe to flush all the data */
> >>> +	sync_filesystem(sbi->sb);
> >>> +
> >>>  free_meta:
> >>>  	/* flush dirty orphan inode objects */
> >>>  	f2fs_sync_inode_meta(sbi);
> >>>
> > 
> > .
> > 
> 

-- 
--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2019-01-10  6:39 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-26  4:47 [PATCH v2] f2fs: fix sbi->extent_list corruption issue Sahitya Tummala
2018-11-26  7:17 ` Chao Yu
2018-11-27  0:30 ` Jaegeuk Kim
2018-11-27  1:42   ` Chao Yu
2018-11-29  3:32     ` Sahitya Tummala
2018-11-30 20:33       ` Jaegeuk Kim
2018-12-07  9:47         ` Chao Yu
2018-12-12  3:17           ` Sahitya Tummala
2018-12-12  3:36             ` Chao Yu
2018-12-14  7:56               ` Sahitya Tummala
2018-12-14 14:25                 ` Jaegeuk Kim
2018-12-18 10:28                   ` Chao Yu
2018-12-18 22:47                     ` Jaegeuk Kim
2018-12-19  3:43                       ` Chao Yu
2018-12-19 23:42                         ` Jaegeuk Kim
2019-01-04  8:05 ` Sahitya Tummala
2019-01-04 20:33   ` Jaegeuk Kim
2019-01-07  2:25     ` Chao Yu
2019-01-09  4:38       ` Jaegeuk Kim
2019-01-09  6:13         ` Chao Yu
2019-01-10  6:39           ` Sahitya Tummala

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).