Re: [PATCH 1/2] f2fs: remove stale inode entry before eviction from gdirty_list

From: Ritesh Harjani <riteshh@codeaurora.org>
To: Chao Yu <yuchao0@huawei.com>, Jaegeuk Kim <jaegeuk@kernel.org>
Cc: linux-f2fs-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/2] f2fs: remove stale inode entry before eviction from gdirty_list
Date: Fri, 23 Nov 2018 16:37:30 +0530	[thread overview]
Message-ID: <b46e161b-a6b2-1fac-32bc-fab37fedbe90@codeaurora.org> (raw)
In-Reply-To: <c3cc8092-9093-f6d6-7150-48a782f9de50@huawei.com>

Hi Chao,

On 11/21/2018 9:00 AM, Chao Yu wrote:
> Hi Ritesh,
>
> On 2018/11/20 18:33, Ritesh Harjani wrote:
>> On 11/20/2018 2:33 AM, Jaegeuk Kim wrote:
>>> On 11/19, Ritesh Harjani wrote:
>>>> Hi Chao,
>>>>
>>>> On 11/19/2018 12:09 PM, Chao Yu wrote:
>>>>> Hi Ritesh,
>>>>>
>>>>> On 2018/11/19 14:02, Ritesh Harjani wrote:
>>>>>> This is seen when CP_ERROR_FLAG is not set & FS may be corrupted.
>>>>>> There is a case observed where dirty stale inode pointer data is still
>>>>>> present in the gdirty_list causing panic on access while doing
>>>>>> checkpoint operation.
>>>>>>
>>>>>> WARNING: CPU: 3 PID: 1827 at
>>>>>> kernel/msm-4.14/fs/f2fs/inode.c:567
>>>>>> f2fs_evict_inode+0x364/0x37c
>>>>>> <...>
>>>>>> [42246.776289] BUG: spinlock bad magic on CPU#4, 1245
>>>>>> [42246.782674] Unable to handle kernel paging request at virtual address 6b6b6b6b6b713b
>>>>>> <...>
>>>>>> [42246.896370] task: ffffffc0f0434080 task.stack: ffffff8023ea0000
>>>>>> [42246.902465] pc : spin_bug+0x80/0xb8
>>>>>> [42246.906055] lr : spin_bug+0x64/0xb8
>>>>>> <...>
>>>>>> [42247.122346] Call trace:
>>>>>> [42247.124876]  spin_bug+0x80/0xb8
>>>>>> [42247.128110]  do_raw_spin_lock+0xe8/0x118
>>>>>> [42247.132144]  _raw_spin_lock+0x24/0x30
>>>>>> [42247.135916]  igrab+0x20/0x6c
>>>>>> [42247.138894]  f2fs_sync_inode_meta+0x58/0xc0
>>>>>> [42247.143199]  write_checkpoint+0x1c4/0xecc
>>>>>> [42247.147322]  f2fs_sync_fs+0x118/0x170
>>>>>> [42247.151096]  f2fs_do_sync_file+0x4f0/0x798
>>>>>> [42247.155311]  f2fs_sync_file+0x54/0x6c
>>>>>> [42247.159087]  vfs_fsync_range+0x90/0xac
>>>>>> [42247.162950]  vfs_fsync+0x2c/0x38
>>>>>> [42247.166278]  do_fsync+0x3c/0x78
>>>>>> [42247.169515]  SyS_fdatasync+0x20/0x30
>>>>>>
>>>>>> Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
>>>>>> ---
>>>>>>     fs/f2fs/inode.c | 10 ++++++----
>>>>>>     1 file changed, 6 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
>>>>>> index 91ceee0..c57f636 100644
>>>>>> --- a/fs/f2fs/inode.c
>>>>>> +++ b/fs/f2fs/inode.c
>>>>>> @@ -702,11 +702,13 @@ void f2fs_evict_inode(struct inode *inode)
>>>>>>     	stat_dec_inline_dir(inode);
>>>>>>     	stat_dec_inline_inode(inode);
>>>>>> -	if (likely(!is_set_ckpt_flags(sbi, CP_ERROR_FLAG) &&
>>>>>> -				!is_sbi_flag_set(sbi, SBI_CP_DISABLED)))
>>>>>> -		f2fs_bug_on(sbi, is_inode_flag_set(inode, FI_DIRTY_INODE));
>>>>>> -	else
>>>>>> +	if (unlikely(is_inode_flag_set(inode, FI_DIRTY_INODE))) {
>>>>>>     		f2fs_inode_synced(inode);
>>>>>> +		f2fs_msg(sbi->sb, KERN_WARNING,
>>>>>> +			 "inconsistent dirty inode:%u entry found during eviction\n",
>>>>>> +			 inode->i_ino);
>>>>>> +		f2fs_bug_on(sbi, 1);
>>>>> IIRC, Jaegeuk added below condition to avoid f2fs_bug_on during doing test
>>>>> w/ checkpoint error injection, if we remove this, we may still encounter
>>>>> such problem.
>>>>>
>>>>> if (likely(!is_set_ckpt_flags(sbi, CP_ERROR_FLAG)))
>>>> Ok, agreed. Does below sounds good then?
>>>> The idea is to go ahead and call f2fs_inode_synced(inode) when the inode
>>>> FI_DIRTY_INODE flag is set irrespective of CP_ERROR_FLAG set or not. Because
>>>> otherwise there is a stale inode entry which will
>>>> remain in gdirty_list => causing kernel panic in checkpoint path.
>>>>
>>>> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
>>>> index 91ceee0..00915c2 100644
>>>> --- a/fs/f2fs/inode.c
>>>> +++ b/fs/f2fs/inode.c
>>>> @@ -702,11 +702,15 @@ void f2fs_evict_inode(struct inode *inode)
>>>>           stat_dec_inline_dir(inode);
>>>>           stat_dec_inline_inode(inode);
>>>>
>>>> -       if (likely(!is_set_ckpt_flags(sbi, CP_ERROR_FLAG) &&
>>>> -                               !is_sbi_flag_set(sbi, SBI_CP_DISABLED)))
>>>> -               f2fs_bug_on(sbi, is_inode_flag_set(inode, FI_DIRTY_INODE));
>>>> -       else
>>>> +       if (unlikely(is_inode_flag_set(inode, FI_DIRTY_INODE))) {
>>>>                   f2fs_inode_synced(inode);
>>>> +               f2fs_msg(sbi->sb, KERN_WARNING,
>>>> +                        "inconsistent dirty inode:%u entry found during
>>>> eviction\n",
>>>> +                        inode->i_ino);
>>>> +               if ((!is_set_ckpt_flags(sbi, CP_ERROR_FLAG) &&
>>>> +                               !is_sbi_flag_set(sbi, SBI_CP_DISABLED)))
>>>> +                       f2fs_bug_on(sbi, 1);
>>>> +       }
>>>>
>>>>> So I'd like to know what kind of case can make dirty inode during evict(),
>>>>> can you explain more?
>>>> Yes, we also could not get exact path about when this can happen. But below
>>>> are the parallel ongoing contexts when the issue is seen:-
>>>> We do suspect that there is something already wrong in the FS even before
>>>> when this issue occurred (due to "inconsistent node block" logs). This could
>>>> be due to some other underlying storage driver issue.
>>>> Not sure though.
>>>>
>>>> 1. unlinkat operation causing f2fs_evict_inode causing a warning (as
>>>> mentioned in the commit text).
>>>> 2. echo 3 > /proc/sys/vm/drop_caches.
>>>> 3. Rename operation on some file.
>>>> 4. vfs_fsync -> doing checkpointing => this causes a kernel panic since
>>>> stale inode entry is still present which got freed from f2fs_evict_inode
>>>> path.
>>>>
>>>> Some error logs which were seen long before this issue occurred.
>>>> [42219.089810] F2FS-fs (mmcblk0p75): inconsistent node block, nid:50018, node_footer[nid:50327,ino:50327,ofs:0,cpver:12146505996282773182,blkaddr:515484]
>>>> [42219.104281] F2FS-fs (mmcblk0p75): inconsistent node block, nid:49836, node_footer[nid:50228,ino:50228,ofs:0,cpver:10230247294256961017,blkaddr:1496346]
>>>> [42219.118723] F2FS-fs (mmcblk0p75): inconsistent node block, nid:50327, node_footer[nid:0,ino:0,ofs:0,cpver:0,blkaddr:0]
>>>> [42219.130782] F2FS-fs (mmcblk0p75): inconsistent node block, nid:50228, node_footer[nid:0,ino:0,ofs:0,cpver:0,blkaddr:0]
>>> I've concerned that this patch hides the bug and makes the partition being more
>>> corrupted. We have to figure out where such the node block were generated.
> I agree with Jaegeuk, we'd better to figure out root cause of this issue,
> instead of hiding it.
Sure I will try and see if I can get the disk dump to validate few things.
But this may take a while, since FS itself is not a suspect here, but 
this is due to some other subsystem.
On disabling those, we don't see any "inconsistent node block" logs.
>
>> This issue seems to be mostly due to problem from underlying storage
>> driver. As of now the suspect
>> is not from f2fs since we don't see this issue on a different
>> configuration of storage driver.
>> The above patch is mainly to fix a kernel crash when the system/FS has
>> actually gone bad due to some other subsystem causing corruption.
>> We thought it is a good fix to have because anyways if there is a dirty
>> inode found during eviction,
>> then we should remove any stale entries of it from RAM data structures,
>> before freeing the inode (with a warning msg).
> Thanks for the explanation, I can understand your concern, if you worry
> about further corruption, would it be better to stop checkpoint in addition
> for this case in qcom's kernel?
Sure. I was only thinking in case if there is exist a situation where 
stale inode entries are present,
instead of kernel panic, we can remove those entries while we can during 
f2fs_evict_inode and give a pr_warn msg.

>
>>> How many patches have you cherry-picked? Which kernel version are you using?
>> kernel 4.14. Yes we do have all the patches pulled from android aosp.
> You mean code under below tag?
>
> https://android.googlesource.com/kernel/common/+/ASB-2018-11-05_4.14
It is below tag:-
https://android.googlesource.com/kernel/common.git/+/refs/tags/ASB-2018-11-05_4.14-p

Thanks,
Ritesh
>
> Thanks,
>
>>> Have you enabled ICE?
>> Yes. ICE is enabled.
>>
>>
>> Thanks
>> Ritesh
>>
>>> Thanks,
>>>
>>>
>>>> Thanks,
>>>>
>>>>> Thanks,
>>>>>
>>>>>> +	}
>>>>>>     	/* ino == 0, if f2fs_new_inode() was failed t*/
>>>>>>     	if (inode->i_ino)
>>>>>>
>>
>> .
>>

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel