linux-kernel.vger.kernel.org archive mirror
* [PATCH] f2fs: avoid race condition for shrinker count
@ 2020-11-09 17:00 Jaegeuk Kim
  2020-11-10  2:15 ` [f2fs-dev] " Chao Yu
  2020-11-12  5:34 ` [PATCH v2] " Jaegeuk Kim
  0 siblings, 2 replies; 13+ messages in thread
From: Jaegeuk Kim @ 2020-11-09 17:00 UTC (permalink / raw)
  To: linux-kernel, linux-f2fs-devel, kernel-team; +Cc: Jaegeuk Kim, Light Hsieh

Light reported sometimes shrinker gets nat_cnt < dirty_nat_cnt resulting in
wrong do_shrinker work. Basically the two counts should not diverge like that.

So, I suspect this race condition:
- f2fs_try_to_free_nats            __flush_nat_entry_set
 nat_cnt=2, dirty_nat_cnt=2
                                   __clear_nat_cache_dirty
                                    spin_lock(nat_list_lock)
                                    list_move()
                                    spin_unlock(nat_list_lock)
 spin_lock(nat_list_lock)
 list_del()
 spin_unlock(nat_list_lock)
 nat_cnt=1, dirty_nat_cnt=2
                                   nat_cnt=1, dirty_nat_cnt=1

Reported-by: Light Hsieh <Light.Hsieh@mediatek.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 fs/f2fs/node.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 42394de6c7eb..e8ec65e40f06 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -269,11 +269,10 @@ static void __clear_nat_cache_dirty(struct f2fs_nm_info *nm_i,
 {
 	spin_lock(&nm_i->nat_list_lock);
 	list_move_tail(&ne->list, &nm_i->nat_entries);
-	spin_unlock(&nm_i->nat_list_lock);
-
 	set_nat_flag(ne, IS_DIRTY, false);
 	set->entry_cnt--;
 	nm_i->dirty_nat_cnt--;
+	spin_unlock(&nm_i->nat_list_lock);
 }
 
 static unsigned int __gang_lookup_nat_set(struct f2fs_nm_info *nm_i,
-- 
2.29.2.222.g5d2a92d10f8-goog


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [f2fs-dev] [PATCH] f2fs: avoid race condition for shrinker count
  2020-11-09 17:00 [PATCH] f2fs: avoid race condition for shrinker count Jaegeuk Kim
@ 2020-11-10  2:15 ` Chao Yu
  2020-11-10  4:19   ` Jaegeuk Kim
  2020-11-12  5:34 ` [PATCH v2] " Jaegeuk Kim
  1 sibling, 1 reply; 13+ messages in thread
From: Chao Yu @ 2020-11-10  2:15 UTC (permalink / raw)
  To: Jaegeuk Kim, linux-kernel, linux-f2fs-devel, kernel-team; +Cc: Light Hsieh

On 2020/11/10 1:00, Jaegeuk Kim wrote:
> Light reported sometimes shrinker gets nat_cnt < dirty_nat_cnt resulting in

I didn't get the problem clearly. Did you mean __count_nat_entries() will
give a wrong shrink count due to a race condition? Should there be a lock
held while reading these two variables?

> wrong do_shrinker work. Basically the two counts should not diverge like that.
> 
> So, I suspect this race condition:
> - f2fs_try_to_free_nats            __flush_nat_entry_set
>   nat_cnt=2, dirty_nat_cnt=2
>                                     __clear_nat_cache_dirty
>                                      spin_lock(nat_list_lock)
>                                      list_move()
>                                      spin_unlock(nat_list_lock)
>   spin_lock(nat_list_lock)
>   list_del()
>   spin_unlock(nat_list_lock)
>   nat_cnt=1, dirty_nat_cnt=2
>                                     nat_cnt=1, dirty_nat_cnt=1

nm_i->nat_cnt and nm_i->dirty_nat_cnt are protected by
nm_i->nat_tree_lock; I don't see why expanding the nat_list_lock range
will help, since there are still places where nat_list_lock doesn't
cover these two reference counts.

Thanks,

> 
> Reported-by: Light Hsieh <Light.Hsieh@mediatek.com>
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> ---
>   fs/f2fs/node.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> index 42394de6c7eb..e8ec65e40f06 100644
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -269,11 +269,10 @@ static void __clear_nat_cache_dirty(struct f2fs_nm_info *nm_i,
>   {
>   	spin_lock(&nm_i->nat_list_lock);
>   	list_move_tail(&ne->list, &nm_i->nat_entries);
> -	spin_unlock(&nm_i->nat_list_lock);
> -
>   	set_nat_flag(ne, IS_DIRTY, false);
>   	set->entry_cnt--;
>   	nm_i->dirty_nat_cnt--;
> +	spin_unlock(&nm_i->nat_list_lock);
>   }
>   
>   static unsigned int __gang_lookup_nat_set(struct f2fs_nm_info *nm_i,
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [f2fs-dev] [PATCH] f2fs: avoid race condition for shrinker count
  2020-11-10  2:15 ` [f2fs-dev] " Chao Yu
@ 2020-11-10  4:19   ` Jaegeuk Kim
  2020-11-10  6:04     ` Chao Yu
  0 siblings, 1 reply; 13+ messages in thread
From: Jaegeuk Kim @ 2020-11-10  4:19 UTC (permalink / raw)
  To: Chao Yu; +Cc: linux-kernel, linux-f2fs-devel, kernel-team, Light Hsieh

On 11/10, Chao Yu wrote:
> On 2020/11/10 1:00, Jaegeuk Kim wrote:
> > Light reported sometimes shrinker gets nat_cnt < dirty_nat_cnt resulting in
> 
> I didn't get the problem clearly. Did you mean __count_nat_entries() will
> give a wrong shrink count due to a race condition? Should there be a lock
> held while reading these two variables?
> 
> > wrong do_shrinker work. Basically the two counts should not diverge like that.
> > 
> > So, I suspect this race condition:
> > - f2fs_try_to_free_nats            __flush_nat_entry_set
> >   nat_cnt=2, dirty_nat_cnt=2
> >                                     __clear_nat_cache_dirty
> >                                      spin_lock(nat_list_lock)
> >                                      list_move()
> >                                      spin_unlock(nat_list_lock)
> >   spin_lock(nat_list_lock)
> >   list_del()
> >   spin_unlock(nat_list_lock)
> >   nat_cnt=1, dirty_nat_cnt=2
> >                                     nat_cnt=1, dirty_nat_cnt=1
> 
> nm_i->nat_cnt and nm_i->dirty_nat_cnt are protected by
> nm_i->nat_tree_lock; I don't see why expanding the nat_list_lock range
> will help, since there are still places where nat_list_lock doesn't
> cover these two reference counts.

Yeah, I missed nat_tree_lock, and indeed it should cover this. So, the problem
is the subtle case Light reported where nat_cnt < dirty_nat_cnt in shrink_count.
We may need to use nat_tree_lock in shrink_count?

> 
> Thanks,
> 
> > 
> > Reported-by: Light Hsieh <Light.Hsieh@mediatek.com>
> > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > ---
> >   fs/f2fs/node.c | 3 +--
> >   1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> > index 42394de6c7eb..e8ec65e40f06 100644
> > --- a/fs/f2fs/node.c
> > +++ b/fs/f2fs/node.c
> > @@ -269,11 +269,10 @@ static void __clear_nat_cache_dirty(struct f2fs_nm_info *nm_i,
> >   {
> >   	spin_lock(&nm_i->nat_list_lock);
> >   	list_move_tail(&ne->list, &nm_i->nat_entries);
> > -	spin_unlock(&nm_i->nat_list_lock);
> > -
> >   	set_nat_flag(ne, IS_DIRTY, false);
> >   	set->entry_cnt--;
> >   	nm_i->dirty_nat_cnt--;
> > +	spin_unlock(&nm_i->nat_list_lock);
> >   }
> >   static unsigned int __gang_lookup_nat_set(struct f2fs_nm_info *nm_i,
> > 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [f2fs-dev] [PATCH] f2fs: avoid race condition for shrinker count
  2020-11-10  4:19   ` Jaegeuk Kim
@ 2020-11-10  6:04     ` Chao Yu
  2020-11-12  5:31       ` Jaegeuk Kim
  0 siblings, 1 reply; 13+ messages in thread
From: Chao Yu @ 2020-11-10  6:04 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: linux-kernel, linux-f2fs-devel, kernel-team, Light Hsieh

On 2020/11/10 12:19, Jaegeuk Kim wrote:
> On 11/10, Chao Yu wrote:
>> On 2020/11/10 1:00, Jaegeuk Kim wrote:
>>> Light reported sometimes shrinker gets nat_cnt < dirty_nat_cnt resulting in
>>
>> I didn't get the problem clearly. Did you mean __count_nat_entries() will
>> give a wrong shrink count due to a race condition? Should there be a lock
>> held while reading these two variables?
>>
>>> wrong do_shrinker work. Basically the two counts should not diverge like that.
>>>
>>> So, I suspect this race condition:
>>> - f2fs_try_to_free_nats            __flush_nat_entry_set
>>>    nat_cnt=2, dirty_nat_cnt=2
>>>                                      __clear_nat_cache_dirty
>>>                                       spin_lock(nat_list_lock)
>>>                                       list_move()
>>>                                       spin_unlock(nat_list_lock)
>>>    spin_lock(nat_list_lock)
>>>    list_del()
>>>    spin_unlock(nat_list_lock)
>>>    nat_cnt=1, dirty_nat_cnt=2
>>>                                      nat_cnt=1, dirty_nat_cnt=1
>>
>> nm_i->nat_cnt and nm_i->dirty_nat_cnt are protected by
>> nm_i->nat_tree_lock; I don't see why expanding the nat_list_lock range
>> will help, since there are still places where nat_list_lock doesn't
>> cover these two reference counts.
> 
> Yeah, I missed nat_tree_lock, and indeed it should cover this. So, the problem
> is the subtle case Light reported where nat_cnt < dirty_nat_cnt in shrink_count.
> We may need to use nat_tree_lock in shrink_count?

change like this?

__count_nat_entries()

	down_read(&nm_i->nat_tree_lock);
	count = NM_I(sbi)->nat_cnt - NM_I(sbi)->dirty_nat_cnt;
	up_read(&nm_i->nat_tree_lock);

Thanks,

> 
>>
>> Thanks,
>>
>>>
>>> Reported-by: Light Hsieh <Light.Hsieh@mediatek.com>
>>> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
>>> ---
>>>    fs/f2fs/node.c | 3 +--
>>>    1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
>>> index 42394de6c7eb..e8ec65e40f06 100644
>>> --- a/fs/f2fs/node.c
>>> +++ b/fs/f2fs/node.c
>>> @@ -269,11 +269,10 @@ static void __clear_nat_cache_dirty(struct f2fs_nm_info *nm_i,
>>>    {
>>>    	spin_lock(&nm_i->nat_list_lock);
>>>    	list_move_tail(&ne->list, &nm_i->nat_entries);
>>> -	spin_unlock(&nm_i->nat_list_lock);
>>> -
>>>    	set_nat_flag(ne, IS_DIRTY, false);
>>>    	set->entry_cnt--;
>>>    	nm_i->dirty_nat_cnt--;
>>> +	spin_unlock(&nm_i->nat_list_lock);
>>>    }
>>>    static unsigned int __gang_lookup_nat_set(struct f2fs_nm_info *nm_i,
>>>
> .
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [f2fs-dev] [PATCH] f2fs: avoid race condition for shrinker count
  2020-11-10  6:04     ` Chao Yu
@ 2020-11-12  5:31       ` Jaegeuk Kim
  0 siblings, 0 replies; 13+ messages in thread
From: Jaegeuk Kim @ 2020-11-12  5:31 UTC (permalink / raw)
  To: Chao Yu; +Cc: linux-kernel, linux-f2fs-devel, kernel-team, Light Hsieh

On 11/10, Chao Yu wrote:
> On 2020/11/10 12:19, Jaegeuk Kim wrote:
> > On 11/10, Chao Yu wrote:
> > > On 2020/11/10 1:00, Jaegeuk Kim wrote:
> > > > Light reported sometimes shrinker gets nat_cnt < dirty_nat_cnt resulting in
> > > 
> > > I didn't get the problem clearly. Did you mean __count_nat_entries() will
> > > give a wrong shrink count due to a race condition? Should there be a lock
> > > held while reading these two variables?
> > > 
> > > > wrong do_shrinker work. Basically the two counts should not diverge like that.
> > > > 
> > > > So, I suspect this race condition:
> > > > - f2fs_try_to_free_nats            __flush_nat_entry_set
> > > >    nat_cnt=2, dirty_nat_cnt=2
> > > >                                      __clear_nat_cache_dirty
> > > >                                       spin_lock(nat_list_lock)
> > > >                                       list_move()
> > > >                                       spin_unlock(nat_list_lock)
> > > >    spin_lock(nat_list_lock)
> > > >    list_del()
> > > >    spin_unlock(nat_list_lock)
> > > >    nat_cnt=1, dirty_nat_cnt=2
> > > >                                      nat_cnt=1, dirty_nat_cnt=1
> > > 
> > > nm_i->nat_cnt and nm_i->dirty_nat_cnt are protected by
> > > nm_i->nat_tree_lock; I don't see why expanding the nat_list_lock range
> > > will help, since there are still places where nat_list_lock doesn't
> > > cover these two reference counts.
> > 
> > Yeah, I missed nat_tree_lock, and indeed it should cover this. So, the problem
> > is the subtle case Light reported where nat_cnt < dirty_nat_cnt in shrink_count.
> > We may need to use nat_tree_lock in shrink_count?
> 
> change like this?
> 

Yup.

> __count_nat_entries()
> 
> 	down_read(&nm_i->nat_tree_lock);
> 	count = NM_I(sbi)->nat_cnt - NM_I(sbi)->dirty_nat_cnt;
> 	up_read(&nm_i->nat_tree_lock);
> 
> Thanks,
> 
> > 
> > > 
> > > Thanks,
> > > 
> > > > 
> > > > Reported-by: Light Hsieh <Light.Hsieh@mediatek.com>
> > > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > > > ---
> > > >    fs/f2fs/node.c | 3 +--
> > > >    1 file changed, 1 insertion(+), 2 deletions(-)
> > > > 
> > > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> > > > index 42394de6c7eb..e8ec65e40f06 100644
> > > > --- a/fs/f2fs/node.c
> > > > +++ b/fs/f2fs/node.c
> > > > @@ -269,11 +269,10 @@ static void __clear_nat_cache_dirty(struct f2fs_nm_info *nm_i,
> > > >    {
> > > >    	spin_lock(&nm_i->nat_list_lock);
> > > >    	list_move_tail(&ne->list, &nm_i->nat_entries);
> > > > -	spin_unlock(&nm_i->nat_list_lock);
> > > > -
> > > >    	set_nat_flag(ne, IS_DIRTY, false);
> > > >    	set->entry_cnt--;
> > > >    	nm_i->dirty_nat_cnt--;
> > > > +	spin_unlock(&nm_i->nat_list_lock);
> > > >    }
> > > >    static unsigned int __gang_lookup_nat_set(struct f2fs_nm_info *nm_i,
> > > > 
> > .
> > 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] f2fs: avoid race condition for shrinker count
  2020-11-09 17:00 [PATCH] f2fs: avoid race condition for shrinker count Jaegeuk Kim
  2020-11-10  2:15 ` [f2fs-dev] " Chao Yu
@ 2020-11-12  5:34 ` Jaegeuk Kim
  2020-11-12  5:40   ` [f2fs-dev] [PATCH v3] " Jaegeuk Kim
  1 sibling, 1 reply; 13+ messages in thread
From: Jaegeuk Kim @ 2020-11-12  5:34 UTC (permalink / raw)
  To: linux-kernel, linux-f2fs-devel, kernel-team; +Cc: Light Hsieh

Light reported sometimes shrinker gets nat_cnt < dirty_nat_cnt resulting in
wrong do_shrinker work. Let's avoid getting stale data by using nat_tree_lock.

Reported-by: Light Hsieh <Light.Hsieh@mediatek.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 fs/f2fs/shrinker.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
index d66de5999a26..d42245ab07f4 100644
--- a/fs/f2fs/shrinker.c
+++ b/fs/f2fs/shrinker.c
@@ -18,7 +18,11 @@ static unsigned int shrinker_run_no;
 
 static unsigned long __count_nat_entries(struct f2fs_sb_info *sbi)
 {
-	long count = NM_I(sbi)->nat_cnt - NM_I(sbi)->dirty_nat_cnt;
+	long count;
+
+	down_read(&nm_i->nat_tree_lock);
+	count = NM_I(sbi)->nat_cnt - NM_I(sbi)->dirty_nat_cnt;
+	up_read(&nm_i->nat_tree_lock);
 
 	return count > 0 ? count : 0;
 }
-- 
2.29.2.299.gdc1121823c-goog


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [f2fs-dev] [PATCH v3] f2fs: avoid race condition for shrinker count
  2020-11-12  5:34 ` [PATCH v2] " Jaegeuk Kim
@ 2020-11-12  5:40   ` Jaegeuk Kim
  2020-11-12  6:57     ` Chao Yu
  2020-12-03  6:07     ` Jaegeuk Kim
  0 siblings, 2 replies; 13+ messages in thread
From: Jaegeuk Kim @ 2020-11-12  5:40 UTC (permalink / raw)
  To: linux-kernel, linux-f2fs-devel, kernel-team; +Cc: Light Hsieh

Light reported sometimes shrinker gets nat_cnt < dirty_nat_cnt resulting in
wrong do_shrinker work. Let's avoid getting stale data by using nat_tree_lock.

Reported-by: Light Hsieh <Light.Hsieh@mediatek.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
v3:
 - fix to use NM_I(sbi)

 fs/f2fs/shrinker.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
index d66de5999a26..555712ba06d8 100644
--- a/fs/f2fs/shrinker.c
+++ b/fs/f2fs/shrinker.c
@@ -18,7 +18,11 @@ static unsigned int shrinker_run_no;
 
 static unsigned long __count_nat_entries(struct f2fs_sb_info *sbi)
 {
-	long count = NM_I(sbi)->nat_cnt - NM_I(sbi)->dirty_nat_cnt;
+	long count;
+
+	down_read(&NM_I(sbi)->nat_tree_lock);
+	count = NM_I(sbi)->nat_cnt - NM_I(sbi)->dirty_nat_cnt;
+	up_read(&NM_I(sbi)->nat_tree_lock);
 
 	return count > 0 ? count : 0;
 }
-- 
2.29.2.299.gdc1121823c-goog


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [f2fs-dev] [PATCH v3] f2fs: avoid race condition for shrinker count
  2020-11-12  5:40   ` [f2fs-dev] [PATCH v3] " Jaegeuk Kim
@ 2020-11-12  6:57     ` Chao Yu
  2020-11-17 16:40       ` Jaegeuk Kim
  2020-12-03  6:07     ` Jaegeuk Kim
  1 sibling, 1 reply; 13+ messages in thread
From: Chao Yu @ 2020-11-12  6:57 UTC (permalink / raw)
  To: Jaegeuk Kim, linux-kernel, linux-f2fs-devel, kernel-team; +Cc: Light Hsieh

On 2020/11/12 13:40, Jaegeuk Kim wrote:
> Light reported sometimes shrinker gets nat_cnt < dirty_nat_cnt resulting in
> wrong do_shrinker work. Let's avoid getting stale data by using nat_tree_lock.
> 
> Reported-by: Light Hsieh <Light.Hsieh@mediatek.com>
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> ---
> v3:
>   - fix to use NM_I(sbi)
> 
>   fs/f2fs/shrinker.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
> index d66de5999a26..555712ba06d8 100644
> --- a/fs/f2fs/shrinker.c
> +++ b/fs/f2fs/shrinker.c
> @@ -18,7 +18,11 @@ static unsigned int shrinker_run_no;
>   
>   static unsigned long __count_nat_entries(struct f2fs_sb_info *sbi)
>   {
> -	long count = NM_I(sbi)->nat_cnt - NM_I(sbi)->dirty_nat_cnt;
> +	long count;
> +
> +	down_read(&NM_I(sbi)->nat_tree_lock);
> +	count = NM_I(sbi)->nat_cnt - NM_I(sbi)->dirty_nat_cnt;
> +	up_read(&NM_I(sbi)->nat_tree_lock);
>   
>   	return count > 0 ? count : 0;

Minor thing,

Just returning count is enough? Since count should never be smaller than 0.

Anyway,

Reviewed-by: Chao Yu <yuchao0@huawei.com>

Thanks,

>   }
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [f2fs-dev] [PATCH v3] f2fs: avoid race condition for shrinker count
  2020-11-12  6:57     ` Chao Yu
@ 2020-11-17 16:40       ` Jaegeuk Kim
  0 siblings, 0 replies; 13+ messages in thread
From: Jaegeuk Kim @ 2020-11-17 16:40 UTC (permalink / raw)
  To: Chao Yu; +Cc: linux-kernel, linux-f2fs-devel, kernel-team, Light Hsieh

On 11/12, Chao Yu wrote:
> On 2020/11/12 13:40, Jaegeuk Kim wrote:
> > Light reported sometimes shrinker gets nat_cnt < dirty_nat_cnt resulting in
> > wrong do_shrinker work. Let's avoid getting stale data by using nat_tree_lock.
> > 
> > Reported-by: Light Hsieh <Light.Hsieh@mediatek.com>
> > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > ---
> > v3:
> >   - fix to use NM_I(sbi)
> > 
> >   fs/f2fs/shrinker.c | 6 +++++-
> >   1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
> > index d66de5999a26..555712ba06d8 100644
> > --- a/fs/f2fs/shrinker.c
> > +++ b/fs/f2fs/shrinker.c
> > @@ -18,7 +18,11 @@ static unsigned int shrinker_run_no;
> >   static unsigned long __count_nat_entries(struct f2fs_sb_info *sbi)
> >   {
> > -	long count = NM_I(sbi)->nat_cnt - NM_I(sbi)->dirty_nat_cnt;
> > +	long count;
> > +
> > +	down_read(&NM_I(sbi)->nat_tree_lock);
> > +	count = NM_I(sbi)->nat_cnt - NM_I(sbi)->dirty_nat_cnt;
> > +	up_read(&NM_I(sbi)->nat_tree_lock);
> >   	return count > 0 ? count : 0;
> 
> Minor thing,
> 
> Just returning count is enough? Since count should never be smaller than 0.

Yeah, but let me keep this just in case.

> 
> Anyway,
> 
> Reviewed-by: Chao Yu <yuchao0@huawei.com>
> 
> Thanks,
> 
> >   }
> > 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [f2fs-dev] [PATCH v3] f2fs: avoid race condition for shrinker count
  2020-11-12  5:40   ` [f2fs-dev] [PATCH v3] " Jaegeuk Kim
  2020-11-12  6:57     ` Chao Yu
@ 2020-12-03  6:07     ` Jaegeuk Kim
  2020-12-03  6:55       ` Chao Yu
  1 sibling, 1 reply; 13+ messages in thread
From: Jaegeuk Kim @ 2020-12-03  6:07 UTC (permalink / raw)
  To: linux-kernel, linux-f2fs-devel, kernel-team; +Cc: Light Hsieh

On 11/11, Jaegeuk Kim wrote:
> Light reported sometimes shrinker gets nat_cnt < dirty_nat_cnt resulting in
> wrong do_shrinker work. Let's avoid getting stale data by using nat_tree_lock.
> 
> Reported-by: Light Hsieh <Light.Hsieh@mediatek.com>
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> ---
> v3:
>  - fix to use NM_I(sbi)
> 
>  fs/f2fs/shrinker.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
> index d66de5999a26..555712ba06d8 100644
> --- a/fs/f2fs/shrinker.c
> +++ b/fs/f2fs/shrinker.c
> @@ -18,7 +18,11 @@ static unsigned int shrinker_run_no;
>  
>  static unsigned long __count_nat_entries(struct f2fs_sb_info *sbi)
>  {
> -	long count = NM_I(sbi)->nat_cnt - NM_I(sbi)->dirty_nat_cnt;
> +	long count;
> +
> +	down_read(&NM_I(sbi)->nat_tree_lock);
> +	count = NM_I(sbi)->nat_cnt - NM_I(sbi)->dirty_nat_cnt;
> +	up_read(&NM_I(sbi)->nat_tree_lock);

I just found this can cause a kernel hang due to the following backtrace.
f2fs_shrink_count
shrink_slab
shrink_node
do_try_to_free_pages
try_to_free_pages
__alloc_pages_nodemask
alloc_pages_current
allocate_slab
__slab_alloc
__slab_alloc
kmem_cache_alloc
add_free_nid
f2fs_flush_nat_entries
f2fs_write_checkpoint

Let me just check like this.

From 971163330224449d90aac90957ea38f77d494f0f Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim <jaegeuk@kernel.org>
Date: Fri, 6 Nov 2020 13:22:05 -0800
Subject: [PATCH] f2fs: avoid race condition for shrinker count

Light reported sometimes shrinker gets nat_cnt < dirty_nat_cnt resulting in
wrong do_shrinker work. Let's avoid returning an insane overflowed value.

Reported-by: Light Hsieh <Light.Hsieh@mediatek.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 fs/f2fs/shrinker.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
index d66de5999a26..75b5b4aaed99 100644
--- a/fs/f2fs/shrinker.c
+++ b/fs/f2fs/shrinker.c
@@ -18,9 +18,9 @@ static unsigned int shrinker_run_no;
 
 static unsigned long __count_nat_entries(struct f2fs_sb_info *sbi)
 {
-	long count = NM_I(sbi)->nat_cnt - NM_I(sbi)->dirty_nat_cnt;
-
-	return count > 0 ? count : 0;
+	if (NM_I(sbi)->nat_cnt > NM_I(sbi)->dirty_nat_cnt)
+		return NM_I(sbi)->nat_cnt - NM_I(sbi)->dirty_nat_cnt;
+	return 0;
 }
 
 static unsigned long __count_free_nids(struct f2fs_sb_info *sbi)
-- 
2.29.2.454.gaff20da3a2-goog



>  
>  	return count > 0 ? count : 0;
>  }
> -- 
> 2.29.2.299.gdc1121823c-goog
> 
> 
> 
> _______________________________________________
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [f2fs-dev] [PATCH v3] f2fs: avoid race condition for shrinker count
  2020-12-03  6:07     ` Jaegeuk Kim
@ 2020-12-03  6:55       ` Chao Yu
  2020-12-03  7:55         ` Jaegeuk Kim
  0 siblings, 1 reply; 13+ messages in thread
From: Chao Yu @ 2020-12-03  6:55 UTC (permalink / raw)
  To: Jaegeuk Kim, linux-kernel, linux-f2fs-devel, kernel-team; +Cc: Light Hsieh

On 2020/12/3 14:07, Jaegeuk Kim wrote:
> On 11/11, Jaegeuk Kim wrote:
>> Light reported sometimes shrinker gets nat_cnt < dirty_nat_cnt resulting in
>> wrong do_shrinker work. Let's avoid getting stale data by using nat_tree_lock.
>>
>> Reported-by: Light Hsieh <Light.Hsieh@mediatek.com>
>> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
>> ---
>> v3:
>>   - fix to use NM_I(sbi)
>>
>>   fs/f2fs/shrinker.c | 6 +++++-
>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
>> index d66de5999a26..555712ba06d8 100644
>> --- a/fs/f2fs/shrinker.c
>> +++ b/fs/f2fs/shrinker.c
>> @@ -18,7 +18,11 @@ static unsigned int shrinker_run_no;
>>   
>>   static unsigned long __count_nat_entries(struct f2fs_sb_info *sbi)
>>   {
>> -	long count = NM_I(sbi)->nat_cnt - NM_I(sbi)->dirty_nat_cnt;
>> +	long count;
>> +
>> +	down_read(&NM_I(sbi)->nat_tree_lock);
>> +	count = NM_I(sbi)->nat_cnt - NM_I(sbi)->dirty_nat_cnt;
>> +	up_read(&NM_I(sbi)->nat_tree_lock);
> 
> I just found this can cause a kernel hang due to the following backtrace.
> f2fs_shrink_count
> shrink_slab
> shrink_node
> do_try_to_free_pages
> try_to_free_pages
> __alloc_pages_nodemask
> alloc_pages_current
> allocate_slab
> __slab_alloc
> __slab_alloc
> kmem_cache_alloc
> add_free_nid
> f2fs_flush_nat_entries
> f2fs_write_checkpoint

Oh, I missed that case.

> 
> Let me just check like this.
> 
>>From 971163330224449d90aac90957ea38f77d494f0f Mon Sep 17 00:00:00 2001
> From: Jaegeuk Kim <jaegeuk@kernel.org>
> Date: Fri, 6 Nov 2020 13:22:05 -0800
> Subject: [PATCH] f2fs: avoid race condition for shrinker count
> 
> Light reported sometimes shrinker gets nat_cnt < dirty_nat_cnt resulting in
> wrong do_shrinker work. Let's avoid returning an insane overflowed value.
> 
> Reported-by: Light Hsieh <Light.Hsieh@mediatek.com>
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> ---
>   fs/f2fs/shrinker.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
> index d66de5999a26..75b5b4aaed99 100644
> --- a/fs/f2fs/shrinker.c
> +++ b/fs/f2fs/shrinker.c
> @@ -18,9 +18,9 @@ static unsigned int shrinker_run_no;
>   
>   static unsigned long __count_nat_entries(struct f2fs_sb_info *sbi)
>   {
> -	long count = NM_I(sbi)->nat_cnt - NM_I(sbi)->dirty_nat_cnt;
> -
> -	return count > 0 ? count : 0;
> +	if (NM_I(sbi)->nat_cnt > NM_I(sbi)->dirty_nat_cnt)
> +		return NM_I(sbi)->nat_cnt - NM_I(sbi)->dirty_nat_cnt;

Hmm.. in the case that we are not in checkpoint progress, nat_cnt and dirty_nat_cnt
are mutable, so how can we make sure the calculation is non-negative after the
check condition? :(

Thanks

> +	return 0;
>   }
>   
>   static unsigned long __count_free_nids(struct f2fs_sb_info *sbi)
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [f2fs-dev] [PATCH v3] f2fs: avoid race condition for shrinker count
  2020-12-03  6:55       ` Chao Yu
@ 2020-12-03  7:55         ` Jaegeuk Kim
  2020-12-03  8:16           ` Chao Yu
  0 siblings, 1 reply; 13+ messages in thread
From: Jaegeuk Kim @ 2020-12-03  7:55 UTC (permalink / raw)
  To: Chao Yu; +Cc: linux-kernel, linux-f2fs-devel, kernel-team, Light Hsieh

On 12/03, Chao Yu wrote:
> On 2020/12/3 14:07, Jaegeuk Kim wrote:
> > On 11/11, Jaegeuk Kim wrote:
> > > Light reported sometimes shrinker gets nat_cnt < dirty_nat_cnt resulting in
> > > wrong do_shrinker work. Let's avoid getting stale data by using nat_tree_lock.
> > > 
> > > Reported-by: Light Hsieh <Light.Hsieh@mediatek.com>
> > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > > ---
> > > v3:
> > >   - fix to use NM_I(sbi)
> > > 
> > >   fs/f2fs/shrinker.c | 6 +++++-
> > >   1 file changed, 5 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
> > > index d66de5999a26..555712ba06d8 100644
> > > --- a/fs/f2fs/shrinker.c
> > > +++ b/fs/f2fs/shrinker.c
> > > @@ -18,7 +18,11 @@ static unsigned int shrinker_run_no;
> > >   static unsigned long __count_nat_entries(struct f2fs_sb_info *sbi)
> > >   {
> > > -	long count = NM_I(sbi)->nat_cnt - NM_I(sbi)->dirty_nat_cnt;
> > > +	long count;
> > > +
> > > +	down_read(&NM_I(sbi)->nat_tree_lock);
> > > +	count = NM_I(sbi)->nat_cnt - NM_I(sbi)->dirty_nat_cnt;
> > > +	up_read(&NM_I(sbi)->nat_tree_lock);
> > 
> > I just found this can cause a kernel hang due to the following backtrace.
> > f2fs_shrink_count
> > shrink_slab
> > shrink_node
> > do_try_to_free_pages
> > try_to_free_pages
> > __alloc_pages_nodemask
> > alloc_pages_current
> > allocate_slab
> > __slab_alloc
> > __slab_alloc
> > kmem_cache_alloc
> > add_free_nid
> > f2fs_flush_nat_entries
> > f2fs_write_checkpoint
> 
> Oh, I missed that case.
> 
> > 
> > Let me just check like this.
> > 
> > > From 971163330224449d90aac90957ea38f77d494f0f Mon Sep 17 00:00:00 2001
> > From: Jaegeuk Kim <jaegeuk@kernel.org>
> > Date: Fri, 6 Nov 2020 13:22:05 -0800
> > Subject: [PATCH] f2fs: avoid race condition for shrinker count
> > 
> > Light reported sometimes shrinker gets nat_cnt < dirty_nat_cnt resulting in
> > wrong do_shrinker work. Let's avoid returning an insane overflowed value.
> > 
> > Reported-by: Light Hsieh <Light.Hsieh@mediatek.com>
> > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > ---
> >   fs/f2fs/shrinker.c | 6 +++---
> >   1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
> > index d66de5999a26..75b5b4aaed99 100644
> > --- a/fs/f2fs/shrinker.c
> > +++ b/fs/f2fs/shrinker.c
> > @@ -18,9 +18,9 @@ static unsigned int shrinker_run_no;
> >   static unsigned long __count_nat_entries(struct f2fs_sb_info *sbi)
> >   {
> > -	long count = NM_I(sbi)->nat_cnt - NM_I(sbi)->dirty_nat_cnt;
> > -
> > -	return count > 0 ? count : 0;
> > +	if (NM_I(sbi)->nat_cnt > NM_I(sbi)->dirty_nat_cnt)
> > +		return NM_I(sbi)->nat_cnt - NM_I(sbi)->dirty_nat_cnt;
> 
> Hmm.. in the case that we are not in checkpoint progress, nat_cnt and dirty_nat_cnt
> is mutable, how can we make sure the calculation is non-negative after the check
> condition? :(

How about adding another variable to monitor it?

From bdc5a805487f0210df7ef4e85ce5a4f0471bca72 Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim <jaegeuk@kernel.org>
Date: Fri, 6 Nov 2020 13:22:05 -0800
Subject: [PATCH] f2fs: avoid race condition for shrinker count

Light reported sometimes shrinker gets nat_cnt < dirty_nat_cnt resulting in
wrong do_shrinker work. Let's avoid returning an insane overflowed value by
adding a single tracking value.

Reported-by: Light Hsieh <Light.Hsieh@mediatek.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 fs/f2fs/checkpoint.c |  2 +-
 fs/f2fs/debug.c      | 11 ++++++-----
 fs/f2fs/f2fs.h       | 10 ++++++++--
 fs/f2fs/node.c       | 29 ++++++++++++++++++-----------
 fs/f2fs/node.h       |  4 ++--
 fs/f2fs/shrinker.c   |  4 +---
 6 files changed, 36 insertions(+), 24 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 14ba1519639e..617d0f6b0836 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1619,7 +1619,7 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
 			goto out;
 		}
 
-		if (NM_I(sbi)->dirty_nat_cnt == 0 &&
+		if (NM_I(sbi)->nat_cnt[DIRTY_NAT] == 0 &&
 				SIT_I(sbi)->dirty_sentries == 0 &&
 				prefree_segments(sbi) == 0) {
 			f2fs_flush_sit_entries(sbi, cpc);
diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
index a8357fd4f5fa..197c914119da 100644
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -145,8 +145,8 @@ static void update_general_status(struct f2fs_sb_info *sbi)
 		si->node_pages = NODE_MAPPING(sbi)->nrpages;
 	if (sbi->meta_inode)
 		si->meta_pages = META_MAPPING(sbi)->nrpages;
-	si->nats = NM_I(sbi)->nat_cnt;
-	si->dirty_nats = NM_I(sbi)->dirty_nat_cnt;
+	si->nats = NM_I(sbi)->nat_cnt[TOTAL_NAT];
+	si->dirty_nats = NM_I(sbi)->nat_cnt[DIRTY_NAT];
 	si->sits = MAIN_SEGS(sbi);
 	si->dirty_sits = SIT_I(sbi)->dirty_sentries;
 	si->free_nids = NM_I(sbi)->nid_cnt[FREE_NID];
@@ -278,9 +278,10 @@ static void update_mem_info(struct f2fs_sb_info *sbi)
 	si->cache_mem += (NM_I(sbi)->nid_cnt[FREE_NID] +
 				NM_I(sbi)->nid_cnt[PREALLOC_NID]) *
 				sizeof(struct free_nid);
-	si->cache_mem += NM_I(sbi)->nat_cnt * sizeof(struct nat_entry);
-	si->cache_mem += NM_I(sbi)->dirty_nat_cnt *
-					sizeof(struct nat_entry_set);
+	si->cache_mem += NM_I(sbi)->nat_cnt[TOTAL_NAT] *
+				sizeof(struct nat_entry);
+	si->cache_mem += NM_I(sbi)->nat_cnt[DIRTY_NAT] *
+				sizeof(struct nat_entry_set);
 	si->cache_mem += si->inmem_pages * sizeof(struct inmem_pages);
 	for (i = 0; i < MAX_INO_ENTRY; i++)
 		si->cache_mem += sbi->im[i].ino_num * sizeof(struct ino_entry);
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 4417f791dbc6..78394ea91717 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -818,6 +818,13 @@ enum nid_state {
 	MAX_NID_STATE,
 };
 
+enum nat_state {
+	TOTAL_NAT,
+	DIRTY_NAT,
+	RECLAIMABLE_NAT,
+	MAX_NAT_STATE,
+};
+
 struct f2fs_nm_info {
 	block_t nat_blkaddr;		/* base disk address of NAT */
 	nid_t max_nid;			/* maximum possible node ids */
@@ -833,8 +840,7 @@ struct f2fs_nm_info {
 	struct rw_semaphore nat_tree_lock;	/* protect nat_tree_lock */
 	struct list_head nat_entries;	/* cached nat entry list (clean) */
 	spinlock_t nat_list_lock;	/* protect clean nat entry list */
-	unsigned int nat_cnt;		/* the # of cached nat entries */
-	unsigned int dirty_nat_cnt;	/* total num of nat entries in set */
+	unsigned int nat_cnt[MAX_NAT_STATE]; /* the # of cached nat entries */
 	unsigned int nat_blocks;	/* # of nat blocks */
 
 	/* free node ids management */
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 42394de6c7eb..e65d73293a3f 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -62,8 +62,8 @@ bool f2fs_available_free_memory(struct f2fs_sb_info *sbi, int type)
 				sizeof(struct free_nid)) >> PAGE_SHIFT;
 		res = mem_size < ((avail_ram * nm_i->ram_thresh / 100) >> 2);
 	} else if (type == NAT_ENTRIES) {
-		mem_size = (nm_i->nat_cnt * sizeof(struct nat_entry)) >>
-							PAGE_SHIFT;
+		mem_size = (nm_i->nat_cnt[TOTAL_NAT] *
+				sizeof(struct nat_entry)) >> PAGE_SHIFT;
 		res = mem_size < ((avail_ram * nm_i->ram_thresh / 100) >> 2);
 		if (excess_cached_nats(sbi))
 			res = false;
@@ -177,7 +177,8 @@ static struct nat_entry *__init_nat_entry(struct f2fs_nm_info *nm_i,
 	list_add_tail(&ne->list, &nm_i->nat_entries);
 	spin_unlock(&nm_i->nat_list_lock);
 
-	nm_i->nat_cnt++;
+	nm_i->nat_cnt[TOTAL_NAT]++;
+	nm_i->nat_cnt[RECLAIMABLE_NAT]++;
 	return ne;
 }
 
@@ -207,7 +208,8 @@ static unsigned int __gang_lookup_nat_cache(struct f2fs_nm_info *nm_i,
 static void __del_from_nat_cache(struct f2fs_nm_info *nm_i, struct nat_entry *e)
 {
 	radix_tree_delete(&nm_i->nat_root, nat_get_nid(e));
-	nm_i->nat_cnt--;
+	nm_i->nat_cnt[TOTAL_NAT]--;
+	nm_i->nat_cnt[RECLAIMABLE_NAT]--;
 	__free_nat_entry(e);
 }
 
@@ -253,7 +255,8 @@ static void __set_nat_cache_dirty(struct f2fs_nm_info *nm_i,
 	if (get_nat_flag(ne, IS_DIRTY))
 		goto refresh_list;
 
-	nm_i->dirty_nat_cnt++;
+	nm_i->nat_cnt[DIRTY_NAT]++;
+	nm_i->nat_cnt[RECLAIMABLE_NAT]--;
 	set_nat_flag(ne, IS_DIRTY, true);
 refresh_list:
 	spin_lock(&nm_i->nat_list_lock);
@@ -273,7 +276,8 @@ static void __clear_nat_cache_dirty(struct f2fs_nm_info *nm_i,
 
 	set_nat_flag(ne, IS_DIRTY, false);
 	set->entry_cnt--;
-	nm_i->dirty_nat_cnt--;
+	nm_i->nat_cnt[DIRTY_NAT]--;
+	nm_i->nat_cnt[RECLAIMABLE_NAT]++;
 }
 
 static unsigned int __gang_lookup_nat_set(struct f2fs_nm_info *nm_i,
@@ -2944,14 +2948,17 @@ int f2fs_flush_nat_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc)
 	LIST_HEAD(sets);
 	int err = 0;
 
-	/* during unmount, let's flush nat_bits before checking dirty_nat_cnt */
+	/*
+	 * during unmount, let's flush nat_bits before checking
+	 * nat_cnt[DIRTY_NAT].
+	 */
 	if (enabled_nat_bits(sbi, cpc)) {
 		down_write(&nm_i->nat_tree_lock);
 		remove_nats_in_journal(sbi);
 		up_write(&nm_i->nat_tree_lock);
 	}
 
-	if (!nm_i->dirty_nat_cnt)
+	if (!nm_i->nat_cnt[DIRTY_NAT])
 		return 0;
 
 	down_write(&nm_i->nat_tree_lock);
@@ -2962,7 +2969,8 @@ int f2fs_flush_nat_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc)
 	 * into nat entry set.
 	 */
 	if (enabled_nat_bits(sbi, cpc) ||
-		!__has_cursum_space(journal, nm_i->dirty_nat_cnt, NAT_JOURNAL))
+		!__has_cursum_space(journal,
+			nm_i->nat_cnt[DIRTY_NAT], NAT_JOURNAL))
 		remove_nats_in_journal(sbi);
 
 	while ((found = __gang_lookup_nat_set(nm_i,
@@ -3086,7 +3094,6 @@ static int init_node_manager(struct f2fs_sb_info *sbi)
 						F2FS_RESERVED_NODE_NUM;
 	nm_i->nid_cnt[FREE_NID] = 0;
 	nm_i->nid_cnt[PREALLOC_NID] = 0;
-	nm_i->nat_cnt = 0;
 	nm_i->ram_thresh = DEF_RAM_THRESHOLD;
 	nm_i->ra_nid_pages = DEF_RA_NID_PAGES;
 	nm_i->dirty_nats_ratio = DEF_DIRTY_NAT_RATIO_THRESHOLD;
@@ -3220,7 +3227,7 @@ void f2fs_destroy_node_manager(struct f2fs_sb_info *sbi)
 			__del_from_nat_cache(nm_i, natvec[idx]);
 		}
 	}
-	f2fs_bug_on(sbi, nm_i->nat_cnt);
+	f2fs_bug_on(sbi, nm_i->nat_cnt[TOTAL_NAT]);
 
 	/* destroy nat set cache */
 	nid = 0;
diff --git a/fs/f2fs/node.h b/fs/f2fs/node.h
index 69e5859e993c..f84541b57acb 100644
--- a/fs/f2fs/node.h
+++ b/fs/f2fs/node.h
@@ -126,13 +126,13 @@ static inline void raw_nat_from_node_info(struct f2fs_nat_entry *raw_ne,
 
 static inline bool excess_dirty_nats(struct f2fs_sb_info *sbi)
 {
-	return NM_I(sbi)->dirty_nat_cnt >= NM_I(sbi)->max_nid *
+	return NM_I(sbi)->nat_cnt[DIRTY_NAT] >= NM_I(sbi)->max_nid *
 					NM_I(sbi)->dirty_nats_ratio / 100;
 }
 
 static inline bool excess_cached_nats(struct f2fs_sb_info *sbi)
 {
-	return NM_I(sbi)->nat_cnt >= DEF_NAT_CACHE_THRESHOLD;
+	return NM_I(sbi)->nat_cnt[TOTAL_NAT] >= DEF_NAT_CACHE_THRESHOLD;
 }
 
 static inline bool excess_dirty_nodes(struct f2fs_sb_info *sbi)
diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
index d66de5999a26..dd3c3c7a90ec 100644
--- a/fs/f2fs/shrinker.c
+++ b/fs/f2fs/shrinker.c
@@ -18,9 +18,7 @@ static unsigned int shrinker_run_no;
 
 static unsigned long __count_nat_entries(struct f2fs_sb_info *sbi)
 {
-	long count = NM_I(sbi)->nat_cnt - NM_I(sbi)->dirty_nat_cnt;
-
-	return count > 0 ? count : 0;
+	return NM_I(sbi)->nat_cnt[RECLAIMABLE_NAT];
 }
 
 static unsigned long __count_free_nids(struct f2fs_sb_info *sbi)
-- 
2.29.2.454.gaff20da3a2-goog


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [f2fs-dev] [PATCH v3] f2fs: avoid race condition for shinker count
  2020-12-03  7:55         ` Jaegeuk Kim
@ 2020-12-03  8:16           ` Chao Yu
  0 siblings, 0 replies; 13+ messages in thread
From: Chao Yu @ 2020-12-03  8:16 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: linux-kernel, linux-f2fs-devel, kernel-team, Light Hsieh

On 2020/12/3 15:55, Jaegeuk Kim wrote:
> How about adding another variable to monitor it?

That makes more sense. :)

I've checked this, LGTM.

Reviewed-by: Chao Yu <yuchao0@huawei.com>

Thanks,

> 
> From bdc5a805487f0210df7ef4e85ce5a4f0471bca72 Mon Sep 17 00:00:00 2001
> From: Jaegeuk Kim <jaegeuk@kernel.org>
> Date: Fri, 6 Nov 2020 13:22:05 -0800
> Subject: [PATCH] f2fs: avoid race condition for shrinker count
> 
> Light reported that the shrinker sometimes sees nat_cnt < dirty_nat_cnt,
> resulting in wrong shrinker work. Let's avoid returning an insane overflowed
> value by adding a single tracking value.
> 
> Reported-by: Light Hsieh <Light.Hsieh@mediatek.com>
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> ---
>   fs/f2fs/checkpoint.c |  2 +-
>   fs/f2fs/debug.c      | 11 ++++++-----
>   fs/f2fs/f2fs.h       | 10 ++++++++--
>   fs/f2fs/node.c       | 29 ++++++++++++++++++-----------
>   fs/f2fs/node.h       |  4 ++--
>   fs/f2fs/shrinker.c   |  4 +---
>   6 files changed, 36 insertions(+), 24 deletions(-)
> 
> [v3 diff snipped; identical to the inline patch quoted in full above]

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-12-03  8:17 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-09 17:00 [PATCH] f2fs: avoid race condition for shinker count Jaegeuk Kim
2020-11-10  2:15 ` [f2fs-dev] " Chao Yu
2020-11-10  4:19   ` Jaegeuk Kim
2020-11-10  6:04     ` Chao Yu
2020-11-12  5:31       ` Jaegeuk Kim
2020-11-12  5:34 ` [PATCH v2] " Jaegeuk Kim
2020-11-12  5:40   ` [f2fs-dev] [PATCH v3] " Jaegeuk Kim
2020-11-12  6:57     ` Chao Yu
2020-11-17 16:40       ` Jaegeuk Kim
2020-12-03  6:07     ` Jaegeuk Kim
2020-12-03  6:55       ` Chao Yu
2020-12-03  7:55         ` Jaegeuk Kim
2020-12-03  8:16           ` Chao Yu
