linux-ext4.vger.kernel.org archive mirror
* [PATCH v3] ext4: Fix dead loop in ext4_mb_new_blocks
@ 2020-09-10  9:12 Ye Bin
  2020-09-10 16:17 ` Jan Kara
  0 siblings, 1 reply; 5+ messages in thread
From: Ye Bin @ 2020-09-10  9:12 UTC (permalink / raw)
  To: tytso, adilger.kernel, jack, linux-ext4; +Cc: Ye Bin

While testing disk offline/online with fsstress running, we found the fsstress
process stuck busy-looping in the running state:
kworker/u32:3-262   [004] ...1   140.787471: ext4_mb_discard_preallocations: dev 8,32 needed 114
....
kworker/u32:3-262   [004] ...1   140.787471: ext4_mb_discard_preallocations: dev 8,32 needed 114

ext4_mb_new_blocks
repeat:
	ext4_mb_discard_preallocations_should_retry(sb, ac, &seq)
		freed = ext4_mb_discard_preallocations
			ext4_mb_discard_group_preallocations
				this_cpu_inc(discard_pa_seq);
		---> freed == 0
		seq_retry = ext4_get_discard_pa_seq_sum
			for_each_possible_cpu(__cpu)
				__seq += per_cpu(discard_pa_seq, __cpu);
		if (seq_retry != *seq) {
			*seq = seq_retry;
			ret = true;
		}

As seen above, seq_retry is the sum of discard_pa_seq across every CPU. Even
when ext4_mb_discard_group_preallocations() returns zero, discard_pa_seq on
this CPU may still have been incremented, so the condition
"seq_retry != *seq" is always met and the retry loop never terminates.
To fix this problem, make ext4_mb_discard_group_preallocations() increment
discard_pa_seq only when it actually finds preallocations to discard.

Fixes: 07b5b8e1ac40 ("ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling")
Signed-off-by: Ye Bin <yebin10@huawei.com>
---
 fs/ext4/mballoc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index f386fe62727d..fd55264dc3fe 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4191,7 +4191,6 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
 	INIT_LIST_HEAD(&list);
 repeat:
 	ext4_lock_group(sb, group);
-	this_cpu_inc(discard_pa_seq);
 	list_for_each_entry_safe(pa, tmp,
 				&grp->bb_prealloc_list, pa_group_list) {
 		spin_lock(&pa->pa_lock);
@@ -4233,6 +4232,9 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
 		goto out;
 	}
 
+	/* only increment when we actually found preallocations to discard */
+	this_cpu_inc(discard_pa_seq);
+
 	/* now free all selected PAs */
 	list_for_each_entry_safe(pa, tmp, &list, u.pa_tmp_list) {
 
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v3] ext4: Fix dead loop in ext4_mb_new_blocks
  2020-09-10  9:12 [PATCH v3] ext4: Fix dead loop in ext4_mb_new_blocks Ye Bin
@ 2020-09-10 16:17 ` Jan Kara
  2020-09-11 13:20   ` Ritesh Harjani
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Kara @ 2020-09-10 16:17 UTC (permalink / raw)
  To: Ye Bin; +Cc: tytso, adilger.kernel, jack, linux-ext4, Ritesh Harjani

On Thu 10-09-20 17:12:52, Ye Bin wrote:
> As we test disk offline/online with running fsstress, we find fsstress
> process is keeping running state.
> kworker/u32:3-262   [004] ...1   140.787471: ext4_mb_discard_preallocations: dev 8,32 needed 114
> ....
> kworker/u32:3-262   [004] ...1   140.787471: ext4_mb_discard_preallocations: dev 8,32 needed 114
> 
> ext4_mb_new_blocks
> repeat:
> 	ext4_mb_discard_preallocations_should_retry(sb, ac, &seq)
> 		freed = ext4_mb_discard_preallocations
> 			ext4_mb_discard_group_preallocations
> 				this_cpu_inc(discard_pa_seq);
> 		---> freed == 0
> 		seq_retry = ext4_get_discard_pa_seq_sum
> 			for_each_possible_cpu(__cpu)
> 				__seq += per_cpu(discard_pa_seq, __cpu);
> 		if (seq_retry != *seq) {
> 			*seq = seq_retry;
> 			ret = true;
> 		}
> 
> As we see seq_retry is sum of discard_pa_seq every cpu, if
> ext4_mb_discard_group_preallocations return zero discard_pa_seq in this
> cpu maybe increase one, so condition "seq_retry != *seq" have always
> been met.
> To Fix this problem, in ext4_mb_discard_group_preallocations function increase
> discard_pa_seq only when it found preallocation to discard.
> 
> Fixes: 07b5b8e1ac40 ("ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling")
> Signed-off-by: Ye Bin <yebin10@huawei.com>

Thanks for the patch. One comment below.

> ---
>  fs/ext4/mballoc.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index f386fe62727d..fd55264dc3fe 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -4191,7 +4191,6 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
>  	INIT_LIST_HEAD(&list);
>  repeat:
>  	ext4_lock_group(sb, group);
> -	this_cpu_inc(discard_pa_seq);
>  	list_for_each_entry_safe(pa, tmp,
>  				&grp->bb_prealloc_list, pa_group_list) {
>  		spin_lock(&pa->pa_lock);
> @@ -4233,6 +4232,9 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
>  		goto out;
>  	}
>  
> +	/* only increase when find reallocation to discard */
> +	this_cpu_inc(discard_pa_seq);
> +

This is a good place to increment the counter but I think you also need to
handle the case:

        if (free < needed && busy) {
                busy = 0;
                ext4_unlock_group(sb, group);
                cond_resched();
                goto repeat;
        }

We can unlock the group here after removing some preallocations, and thus
other processes checking discard_pa_seq could miss that we did this. In fact,
I think the code is somewhat buggy here: we should also discard the extents
accumulated on "list" so far before unlocking the group. Ritesh?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3] ext4: Fix dead loop in ext4_mb_new_blocks
  2020-09-10 16:17 ` Jan Kara
@ 2020-09-11 13:20   ` Ritesh Harjani
  2020-09-14  3:21     ` yebin
  2020-09-14  8:20     ` Jan Kara
  0 siblings, 2 replies; 5+ messages in thread
From: Ritesh Harjani @ 2020-09-11 13:20 UTC (permalink / raw)
  To: Ye Bin; +Cc: Jan Kara, tytso, adilger.kernel, jack, linux-ext4

Hello Ye,

Please excuse me if there is something horribly wrong with my email
formatting. I received this laptop yesterday and am still setting up a
few things.

On 9/10/20 9:47 PM, Jan Kara wrote:
> On Thu 10-09-20 17:12:52, Ye Bin wrote:
>> As we test disk offline/online with running fsstress, we find fsstress
>> process is keeping running state.
>> kworker/u32:3-262   [004] ...1   140.787471: ext4_mb_discard_preallocations: dev 8,32 needed 114
>> ....
>> kworker/u32:3-262   [004] ...1   140.787471: ext4_mb_discard_preallocations: dev 8,32 needed 114
>>
>> ext4_mb_new_blocks
>> repeat:
>> 	ext4_mb_discard_preallocations_should_retry(sb, ac, &seq)
>> 		freed = ext4_mb_discard_preallocations
>> 			ext4_mb_discard_group_preallocations
>> 				this_cpu_inc(discard_pa_seq);
>> 		---> freed == 0
>> 		seq_retry = ext4_get_discard_pa_seq_sum
>> 			for_each_possible_cpu(__cpu)
>> 				__seq += per_cpu(discard_pa_seq, __cpu);
>> 		if (seq_retry != *seq) {
>> 			*seq = seq_retry;
>> 			ret = true;
>> 		}
>>
>> As we see seq_retry is sum of discard_pa_seq every cpu, if
>> ext4_mb_discard_group_preallocations return zero discard_pa_seq in this
>> cpu maybe increase one, so condition "seq_retry != *seq" have always
>> been met.
>> To Fix this problem, in ext4_mb_discard_group_preallocations function increase
>> discard_pa_seq only when it found preallocation to discard.
>>
>> Fixes: 07b5b8e1ac40 ("ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling")
>> Signed-off-by: Ye Bin <yebin10@huawei.com>
> 
> Thanks for the patch. One comment below.
> 
>> ---
>>   fs/ext4/mballoc.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index f386fe62727d..fd55264dc3fe 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -4191,7 +4191,6 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
>>   	INIT_LIST_HEAD(&list);
>>   repeat:
>>   	ext4_lock_group(sb, group);
>> -	this_cpu_inc(discard_pa_seq);
>>   	list_for_each_entry_safe(pa, tmp,
>>   				&grp->bb_prealloc_list, pa_group_list) {
>>   		spin_lock(&pa->pa_lock);
>> @@ -4233,6 +4232,9 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
>>   		goto out;
>>   	}
>>   
>> +	/* only increase when find reallocation to discard */
>> +	this_cpu_inc(discard_pa_seq);
>> +
> 
> This is a good place to increment the counter but I think you also need to
> handle the case:
> 
>          if (free < needed && busy) {
>                  busy = 0;
>                  ext4_unlock_group(sb, group);
>                  cond_resched();
>                  goto repeat;
>          }
> 
> We can unlock the group here after removing some preallocations and thus
> other processes checking discard_pa_seq could miss we did this. In fact I
> think the code is somewhat buggy here and we should also discard extents
> accumulated on "list" so far before unlocking the group. Ritesh?
> 

Hmm, even though this code does not discard those buffers before unlocking,
it has still removed them from grp->bb_prealloc_list and added them to the
local list. And since it will at some point get scheduled again and resume
from the repeat: label, functionally this should be OK. Am I missing
anything?

Although I agree that removing at least the current PAs before unlocking
the group may be a good idea, we should also check why it was done this way
in the first place.


I agree with Jan that we should increment discard_pa_seq only once we
actually have something to discard. I should have written a comment here to
explain why we did it this way; the commit message should have all the
history (since I have a habit of writing long commit msgs ;)

But IIRC, it was done this way because, if a parallel thread is discarding
all the preallocations, the current thread may return 0: it checks
list_empty(&grp->bb_prealloc_list) in
ext4_mb_discard_group_preallocations() and returns 0 directly.

And incrementing discard_pa_seq elsewhere may not help, because we move the
pa nodes from grp->bb_prealloc_list onto a local list and then operate on
that. Meanwhile one thread may come along, see that the group list is empty
and return 0, while some other thread is still discarding from its local
list. So the main problem is that the current code removes the pa from
grp->bb_prealloc_list and adds it to a local list; anyone who then finds
grp->bb_prealloc_list empty will directly return 0.

So, maybe we could do something like this then?

repeat:
	ext4_lock_group(sb, group);
-	this_cpu_inc(discard_pa_seq);
list_for_each_entry_safe(pa, tmp,
				&grp->bb_prealloc_list, pa_group_list) {<...>

+		if (!free)
+			this_cpu_inc(discard_pa_seq);	/* must happen before list_del(&pa->pa_group_list) */

		/* we can trust pa_free ... */
		free += pa->pa_free;
		spin_unlock(&pa->pa_lock);

		list_del(&pa->pa_group_list);
		list_add(&pa->u.pa_tmp_list, &list);
	}

I have some test cases around this, covering the cases that were failing;
in the world of parallelism you can't be 100% certain about corner cases
(like the one you just reported). But right now I don't have the box where
I kept them, due to some technical issues. I should be able to get to it by
next week; if not, I will set up my current machine for testing this
anyway.

-ritesh

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3] ext4: Fix dead loop in ext4_mb_new_blocks
  2020-09-11 13:20   ` Ritesh Harjani
@ 2020-09-14  3:21     ` yebin
  2020-09-14  8:20     ` Jan Kara
  1 sibling, 0 replies; 5+ messages in thread
From: yebin @ 2020-09-14  3:21 UTC (permalink / raw)
  To: Ritesh Harjani, Jan Kara; +Cc: tytso, adilger.kernel, jack, linux-ext4

In fact, we didn't free the available space, and other processes couldn't
get it even if they tried again.

Following your comments, I made two different revisions. Which one do you
think is better?
(1) Free PAs before repeating:
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 132c118d12e1..4ab76882350d 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4189,7 +4189,6 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
         INIT_LIST_HEAD(&list);
 repeat:
         ext4_lock_group(sb, group);
-       this_cpu_inc(discard_pa_seq);
         list_for_each_entry_safe(pa, tmp,
                                 &grp->bb_prealloc_list, pa_group_list) {
                 spin_lock(&pa->pa_lock);
@@ -4215,22 +4214,6 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
                 list_add(&pa->u.pa_tmp_list, &list);
         }

-       /* if we still need more blocks and some PAs were used, try again */
-       if (free < needed && busy) {
-               busy = 0;
-               ext4_unlock_group(sb, group);
-               cond_resched();
-               goto repeat;
-       }
-
-       /* found anything to free? */
-       if (list_empty(&list)) {
-               BUG_ON(free != 0);
-               mb_debug(sb, "Someone else may have freed PA for this group %u\n",
-                        group);
-               goto out;
-       }
-
         /* now free all selected PAs */
         list_for_each_entry_safe(pa, tmp, &list, u.pa_tmp_list) {

@@ -4248,6 +4231,14 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
                 call_rcu(&(pa)->u.pa_rcu, ext4_mb_pa_callback);
         }

+       /* if we still need more blocks and some PAs were used, try again */
+       if (free < needed && busy) {
+               busy = 0;
+               ext4_unlock_group(sb, group);
+               cond_resched();
+               goto repeat;
+       }
+
 out:
         ext4_unlock_group(sb, group);
         ext4_mb_unload_buddy(&e4b);
--

(2) Increment the counter in the retry path:
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 132c118d12e1..188772bbf679 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4189,7 +4189,6 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
         INIT_LIST_HEAD(&list);
 repeat:
         ext4_lock_group(sb, group);
-       this_cpu_inc(discard_pa_seq);
         list_for_each_entry_safe(pa, tmp,
                                 &grp->bb_prealloc_list, pa_group_list) {
                 spin_lock(&pa->pa_lock);
@@ -4217,6 +4216,8 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,

         /* if we still need more blocks and some PAs were used, try again */
         if (free < needed && busy) {
+               if (free)
+                       this_cpu_inc(discard_pa_seq);
                 busy = 0;
                 ext4_unlock_group(sb, group);
                 cond_resched();



On 2020/9/11 21:20, Ritesh Harjani wrote:
> Hello Ye,
>
> Please excuse if there is something horribly wrong with my email
> formatting. Have yesterday received this laptop and still setting up
> few things.
>
> On 9/10/20 9:47 PM, Jan Kara wrote:
>> On Thu 10-09-20 17:12:52, Ye Bin wrote:
>>> As we test disk offline/online with running fsstress, we find fsstress
>>> process is keeping running state.
>>> kworker/u32:3-262   [004] ...1   140.787471: 
>>> ext4_mb_discard_preallocations: dev 8,32 needed 114
>>> ....
>>> kworker/u32:3-262   [004] ...1   140.787471: 
>>> ext4_mb_discard_preallocations: dev 8,32 needed 114
>>>
>>> ext4_mb_new_blocks
>>> repeat:
>>>     ext4_mb_discard_preallocations_should_retry(sb, ac, &seq)
>>>         freed = ext4_mb_discard_preallocations
>>>             ext4_mb_discard_group_preallocations
>>>                 this_cpu_inc(discard_pa_seq);
>>>         ---> freed == 0
>>>         seq_retry = ext4_get_discard_pa_seq_sum
>>>             for_each_possible_cpu(__cpu)
>>>                 __seq += per_cpu(discard_pa_seq, __cpu);
>>>         if (seq_retry != *seq) {
>>>             *seq = seq_retry;
>>>             ret = true;
>>>         }
>>>
>>> As we see seq_retry is sum of discard_pa_seq every cpu, if
>>> ext4_mb_discard_group_preallocations return zero discard_pa_seq in this
>>> cpu maybe increase one, so condition "seq_retry != *seq" have always
>>> been met.
>>> To Fix this problem, in ext4_mb_discard_group_preallocations 
>>> function increase
>>> discard_pa_seq only when it found preallocation to discard.
>>>
>>> Fixes: 07b5b8e1ac40 ("ext4: mballoc: introduce pcpu seqcnt for 
>>> freeing PA to improve ENOSPC handling")
>>> Signed-off-by: Ye Bin <yebin10@huawei.com>
>>
>> Thanks for the patch. One comment below.
>>
>>> ---
>>>   fs/ext4/mballoc.c | 4 +++-
>>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>>> index f386fe62727d..fd55264dc3fe 100644
>>> --- a/fs/ext4/mballoc.c
>>> +++ b/fs/ext4/mballoc.c
>>> @@ -4191,7 +4191,6 @@ ext4_mb_discard_group_preallocations(struct 
>>> super_block *sb,
>>>       INIT_LIST_HEAD(&list);
>>>   repeat:
>>>       ext4_lock_group(sb, group);
>>> -    this_cpu_inc(discard_pa_seq);
>>>       list_for_each_entry_safe(pa, tmp,
>>>                   &grp->bb_prealloc_list, pa_group_list) {
>>>           spin_lock(&pa->pa_lock);
>>> @@ -4233,6 +4232,9 @@ ext4_mb_discard_group_preallocations(struct 
>>> super_block *sb,
>>>           goto out;
>>>       }
>>>   +    /* only increase when find reallocation to discard */
>>> +    this_cpu_inc(discard_pa_seq);
>>> +
>>
>> This is a good place to increment the counter but I think you also 
>> need to
>> handle the case:
>>
>>          if (free < needed && busy) {
>>                  busy = 0;
>>                  ext4_unlock_group(sb, group);
>>                  cond_resched();
>>                  goto repeat;
>>          }
>>
>> We can unlock the group here after removing some preallocations and thus
>> other processes checking discard_pa_seq could miss we did this. In 
>> fact I
>> think the code is somewhat buggy here and we should also discard extents
>> accumulated on "list" so far before unlocking the group. Ritesh?
>>
>
> mmm, so even though this code is not discarding those buffers b4
> unlocking, but it has still removed those from the grp->bb_prealloc_list
> and added it to the local list. And since it will at some point get
> scheduled and start operating from repeat: label so functionally wise
> this should be ok. Am I missing anything?
>
> Although I agree, that if we remove at least the current pa's before
> unlocking the group may be a good idea, but we should also check
> why was this done like this at the first place.
>
>
> I agree with Jan, that we should increment discard_pa_seq once we 
> actually have something
> to discard. I should have written a comment here to explain why we did 
> this here.
> But I think commit msg should have all the history (since I have a 
> habit of writing long commit msgs ;)
>
> But IIRC, it was done since in case if there is a parallel thread 
> which is discarding
> all the preallocations so the current thread may return 0 since it 
> checks the
> list_empty(&grp->bb_prealloc_list) check in 
> ext4_mb_discard_group_preallocations() and returns 0 directly.
>
> And why the discard_pa_seq counter at other places may not help since 
> we remove the pa nodes from
> grp->bb_prealloc_list into a local list and then start operating on
> that. So meanwhile some thread may comes and just checks that the list
> is empty and return 0 while some other thread may start discarding from
> it's local list.
> So I guess the main problem was that in the current code we remove
> the pa from grp->bb_prealloc_list and add it to local list. So if 
> someone else comes
> and checks that grp->bb_prealloc_list is empty then it will directly 
> return 0.
>
> So, maybe we could do something like this then?
>
> repeat:
>     ext4_lock_group(sb, group);
> -    this_cpu_inc(discard_pa_seq);
> list_for_each_entry_safe(pa, tmp,
>                 &grp->bb_prealloc_list, pa_group_list) {<...>
>
> +        if (!free)
> +            this_cpu_inc(discard_pa_seq);   // we should do this here 
> before calling list_del(&pa->pa_group_list);
>
>         /* we can trust pa_free ... */
>         free += pa->pa_free;
>         spin_unlock(&pa->pa_lock);
>
>         list_del(&pa->pa_group_list);
>         list_add(&pa->u.pa_tmp_list, &list);
>     }
>
> I have some test cases around this to test for cases which were
> failing. Since in world of parallelism you can't be 100% certain of some
> corner case (like this one you just reported).
> But, I don't have my other box rite now where I kept all of those -
> due to some technical issues. I think I should be able to get those by
> next week, if not, I anyways will setup my current machine for testing
> this.
>
> -ritesh
> .
>


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v3] ext4: Fix dead loop in ext4_mb_new_blocks
  2020-09-11 13:20   ` Ritesh Harjani
  2020-09-14  3:21     ` yebin
@ 2020-09-14  8:20     ` Jan Kara
  1 sibling, 0 replies; 5+ messages in thread
From: Jan Kara @ 2020-09-14  8:20 UTC (permalink / raw)
  To: Ritesh Harjani; +Cc: Ye Bin, Jan Kara, tytso, adilger.kernel, jack, linux-ext4


Hello Ritesh,

On Fri 11-09-20 18:50:22, Ritesh Harjani wrote:
> On 9/10/20 9:47 PM, Jan Kara wrote:
> > > ---
> > >   fs/ext4/mballoc.c | 4 +++-
> > >   1 file changed, 3 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> > > index f386fe62727d..fd55264dc3fe 100644
> > > --- a/fs/ext4/mballoc.c
> > > +++ b/fs/ext4/mballoc.c
> > > @@ -4191,7 +4191,6 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
> > >   	INIT_LIST_HEAD(&list);
> > >   repeat:
> > >   	ext4_lock_group(sb, group);
> > > -	this_cpu_inc(discard_pa_seq);
> > >   	list_for_each_entry_safe(pa, tmp,
> > >   				&grp->bb_prealloc_list, pa_group_list) {
> > >   		spin_lock(&pa->pa_lock);
> > > @@ -4233,6 +4232,9 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
> > >   		goto out;
> > >   	}
> > > +	/* only increase when find reallocation to discard */
> > > +	this_cpu_inc(discard_pa_seq);
> > > +
> > 
> > This is a good place to increment the counter but I think you also need to
> > handle the case:
> > 
> >          if (free < needed && busy) {
> >                  busy = 0;
> >                  ext4_unlock_group(sb, group);
> >                  cond_resched();
> >                  goto repeat;
> >          }
> > 
> > We can unlock the group here after removing some preallocations and thus
> > other processes checking discard_pa_seq could miss we did this. In fact I
> > think the code is somewhat buggy here and we should also discard extents
> > accumulated on "list" so far before unlocking the group. Ritesh?
> > 
> 
> mmm, so even though this code is not discarding those buffers b4
> unlocking, but it has still removed those from the grp->bb_prealloc_list
> and added it to the local list. And since it will at some point get
> scheduled and start operating from repeat: label so functionally wise
> this should be ok. Am I missing anything?

Yes, this function itself works correctly. But if we unlock the group before
discarding the preallocations we have accumulated on our local list, the
following can happen:

TASK0					TASK1
					ext4_mb_new_blocks()
					  - no free block found
					  ext4_mb_discard_preallocations_should_retry()
					    ext4_mb_discard_preallocations()
ext4_mb_discard_group_preallocations()
  ext4_lock_group(sb, group);
  this_cpu_inc(discard_pa_seq);
  scans group, moves some preallocations to local list
  if (free < needed && busy) {
    ext4_unlock_group(sb, group);
					      - finds no pa to discard
					    seq_retry = ext4_get_discard_pa_seq_sum()
					      - ok seq_retry != seq -> retry
					  - new search still didn't return
					    anything because preallocations
					    are sitting in TASK0 local list
					    and are not freed yet

> Although I agree, that if we remove at least the current pa's before
> unlocking the group may be a good idea, but we should also check
> why was this done like this at the first place.

If we don't care about a somewhat premature ENOSPC error, then not discarding
preallocations before unlocking the group is not a problem. I believe that's
why it was previously done like this.
 
> I agree with Jan, that we should increment discard_pa_seq once we
> actually have something to discard. I should have written a comment here
> to explain why we did this here.  But I think commit msg should have all
> the history (since I have a habit of writing long commit msgs ;)
> 
> But IIRC, it was done since in case if there is a parallel thread which
> is discarding all the preallocations so the current thread may return 0
> since it checks the list_empty(&grp->bb_prealloc_list) check in
> ext4_mb_discard_group_preallocations() and returns 0 directly.

Ah, OK. I forgot about the unlocked check for empty bb_prealloc_list.

> And why the discard_pa_seq counter at other places may not help since we
> remove the pa nodes from grp->bb_prealloc_list into a local list and then
> start operating on that. So meanwhile some thread may comes and just
> checks that the list is empty and return 0 while some other thread may
> start discarding from it's local list.  So I guess the main problem was
> that in the current code we remove the pa from grp->bb_prealloc_list and
> add it to local list. So if someone else comes and checks that
> grp->bb_prealloc_list is empty then it will directly return 0.
> 
> So, maybe we could do something like this then?
> 
> repeat:
> 	ext4_lock_group(sb, group);
> -	this_cpu_inc(discard_pa_seq);
> list_for_each_entry_safe(pa, tmp,
> 				&grp->bb_prealloc_list, pa_group_list) {<...>
> 
> +		if (!free)
> +			this_cpu_inc(discard_pa_seq);   // we should do this here before calling
> list_del(&pa->pa_group_list);

Yup, that looks good.

> 		/* we can trust pa_free ... */
> 		free += pa->pa_free;
> 		spin_unlock(&pa->pa_lock);
> 
> 		list_del(&pa->pa_group_list);
> 		list_add(&pa->u.pa_tmp_list, &list);
> 	}
> 
> I have some test cases around this to test for cases which were
> failing. Since in world of parallelism you can't be 100% certain of some
> corner case (like this one you just reported).
> But, I don't have my other box rite now where I kept all of those -
> due to some technical issues. I think I should be able to get those by
> next week, if not, I anyways will setup my current machine for testing
> this.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 5+ messages in thread

2020-09-10  9:12 [PATCH v3] ext4: Fix dead loop in ext4_mb_new_blocks Ye Bin
2020-09-10 16:17 ` Jan Kara
2020-09-11 13:20   ` Ritesh Harjani
2020-09-14  3:21     ` yebin
2020-09-14  8:20     ` Jan Kara
