* Re: [PATCH] raid5: avoid add sh->lru to different list
From: Xiao Ni @ 2020-01-09  7:02 UTC (permalink / raw)
  To: jgq516, liu.song.a23; +Cc: linux-raid, Guoqing Jiang



On 01/09/2020 12:30 AM, jgq516@gmail.com wrote:
> From: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
>
> release_stripe_plug adds sh->lru to the unplug list, then raid5_unplug
> iterates over the unplug list and releases each stripe in it. But
> sh->lru could end up on another list as well, because
> release_stripe_plug is lock free and nothing protects against the race.
>
> For example, the same sh could be handled by raid5_release_stripe,
> which is lock free too; it does one of two things when "sh->count == 1".
>
> 1. add sh to released_stripes.
>
> Or
>
> 2. go to slow path if sh is already set with ON_RELEASE_LIST.
>
> Either 1 or 2 eventually triggers do_release_stripe, and this
> function mainly moves sh->lru to different lists such as delayed_list,
> handle_list or temp_inactive_list.

Hi Guoqing

sh->count should be >= 1 before release_stripe_plug is called, because
the stripe was obtained from raid5_get_active_stripe. In your example,
if the same sh is handled by raid5_release_stripe, sh->count should be
>= 2, right? It is incremented whenever someone gets the same sh. So
sh->count can't be zero, and the stripe can't be released
before/in/after release_stripe_plug. Could you explain how sh->count
can be zero when release_stripe_plug is called?

When release_stripe_plug moves sh->lru to the unplug list, it checks
neither sh->count nor any lock. The point of the unplug method is to
avoid conf->device_lock. Is it enough to check sh->lru here? If it's
not empty, we can call raid5_release_stripe directly.
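
Something like this untested sketch in release_stripe_plug, just to
illustrate the idea:

	/* if sh->lru is already linked somewhere, don't touch it here */
	if (!list_empty(&sh->lru) ||
	    test_and_set_bit(STRIPE_ON_UNPLUG_LIST, &sh->state))
		raid5_release_stripe(sh);
	else
		list_add_tail(&sh->lru, &cb->list);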
>
> Then the same node could be on different lists, which causes
What's the meaning of "node" here? Is it the stripe?
> raid5_unplug to stick in "while (!list_empty(&cb->list))", and
> since spin_lock_irq(&conf->device_lock) was taken before, it
> causes:
You mean an infinite loop? In the loop, the stripe is removed from
cb->list, so the while loop can finish successfully, right?

>
> 1. hard lockup, as reported in [1], [2] and [3], since irqs are disabled.
>
> 2. raid5_get_active_stripe can't take device_lock and a calltrace
> appears.
>
> [<ffffffff81598060>] _raw_spin_lock+0x20/0x30
> [<ffffffffa0230b0a>] raid5_get_active_stripe+0x1da/0x5250 [raid456]
> [<ffffffff8112d165>] ? mempool_alloc_slab+0x15/0x20
> [<ffffffffa0231174>] raid5_get_active_stripe+0x844/0x5250 [raid456]
> [<ffffffff812d5574>] ? generic_make_request+0x24/0x2b0
> [<ffffffff810938b0>] ? wait_woken+0x90/0x90
> [<ffffffff814a2adc>] md_make_request+0xfc/0x250
> [<ffffffff812d5867>] submit_bio+0x67/0x150
>
> So, this commit tries to avoid the issue by making the changes below:
>
> 1. First, don't add sh->lru to cb->list if sh->count == 0.
>
> 2. Also check STRIPE_ON_UNPLUG_LIST in do_release_stripe to avoid
> list corruption, and use the lock versions of test_and_set_bit/clear_bit
> (though this should make no difference on x86).
>
> [1]. https://marc.info/?l=linux-raid&m=150348807422853&w=2
> [2]. https://marc.info/?l=linux-raid&m=146883211430999&w=2
> [3]. https://marc.info/?l=linux-raid&m=157434565331673&w=2
>
> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
> ---
> As you can see, this version is different from the previous one.
>
> The previous version was written because I thought ON_UNPLUG_LIST
> and ON_RELEASE_LIST were more or less exclusive, i.e. the sh should
> have only one of the flags set but not both, but perhaps
> that is not true ...
Yes. The stripe may be handled in other paths which call
raid5_release_stripe after finishing their job.
At that point the stripe may be on conf->released_stripes with sh->count == 1.
>
> Instead, since the goal is to avoid putting the same sh->lru on
> different lists, we can fix it from the other side. Before
> do_release_stripe moves sh->lru to another list (not cb->list),
> check whether the sh is already on the unplug list; if it is, wake
> up mddev->thread to trigger the plug path:
>
>    raid5d -> blk_finish_plug -> raid5_unplug ->
>    __release_stripe -> do_release_stripe
>
> Then raid5_unplug removes each sh from cb->list one by one, clears the
> ON_UNPLUG_LIST flag and releases the stripe. This ensures an sh on
> the unplug list is actually handled by the plug mechanism instead
> of by other paths.
>
> Comments and tests are welcome.
>
> The new changes were tested with "./test --raidtype=raid456" and
> seem good.
>
> Thanks,
> Guoqing
>
>   drivers/md/raid5.c | 24 ++++++++++++++++++++++--
>   1 file changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 223e97ab27e6..808b0bd18c8c 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -218,6 +218,25 @@ static void do_release_stripe(struct r5conf *conf, struct stripe_head *sh,
>   	BUG_ON(!list_empty(&sh->lru));
>   	BUG_ON(atomic_read(&conf->active_stripes)==0);
>   
> +	/*
> +	 * If stripe is on unplug list then original caller of __release_stripe
> +	 * is not raid5_unplug, so sh->lru is still in cb->list, don't risk to
> +	 * add lru to another list in do_release_stripe.
> +	 */
> +	if (!test_and_set_bit_lock(STRIPE_ON_UNPLUG_LIST, &sh->state))
> +		clear_bit_unlock(STRIPE_ON_UNPLUG_LIST, &sh->state);
> +	else {
> +		/*
> +		 * The sh is on unplug list, so increase count (because count
> +		 * is decrease before enter do_release_stripe), then trigger
is decreased before entering
> +		 * raid5d -> plug -> raid5_unplug -> __release_stripe to handle
> +		 * this stripe.
> +		 */
> +		atomic_inc(&sh->count);
> +		md_wakeup_thread(conf->mddev->thread);
> +		return;
> +	}
> +
>   	if (r5c_is_writeback(conf->log))
>   		for (i = sh->disks; i--; )
>   			if (test_bit(R5_InJournal, &sh->dev[i].flags))
> @@ -5441,7 +5460,7 @@ static void raid5_unplug(struct blk_plug_cb *blk_cb, bool from_schedule)
>   			 * is still in our list
>   			 */
>   			smp_mb__before_atomic();
> -			clear_bit(STRIPE_ON_UNPLUG_LIST, &sh->state);
> +			clear_bit_unlock(STRIPE_ON_UNPLUG_LIST, &sh->state);
>   			/*
>   			 * STRIPE_ON_RELEASE_LIST could be set here. In that
>   			 * case, the count is always > 1 here
> @@ -5481,7 +5500,8 @@ static void release_stripe_plug(struct mddev *mddev,
>   			INIT_LIST_HEAD(cb->temp_inactive_list + i);
>   	}
>   
> -	if (!test_and_set_bit(STRIPE_ON_UNPLUG_LIST, &sh->state))
> +	if (!test_and_set_bit_lock(STRIPE_ON_UNPLUG_LIST, &sh->state) &&
> +	    (atomic_read(&sh->count) != 0))
>   		list_add_tail(&sh->lru, &cb->list);
>   	else
>   		raid5_release_stripe(sh);


* Re: [PATCH] raid5: avoid add sh->lru to different list
From: Guoqing Jiang @ 2020-01-09 10:32 UTC (permalink / raw)
  To: Xiao Ni, jgq516, liu.song.a23; +Cc: linux-raid

Hi Xiao,

On 1/9/20 8:02 AM, Xiao Ni wrote:
>
>
> On 01/09/2020 12:30 AM, jgq516@gmail.com wrote:
>> From: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
>>
>> release_stripe_plug adds sh->lru to the unplug list, then raid5_unplug
>> iterates over the unplug list and releases each stripe in it. But
>> sh->lru could end up on another list as well, because
>> release_stripe_plug is lock free and nothing protects against the race.
>>
>> For example, the same sh could be handled by raid5_release_stripe,
>> which is lock free too; it does one of two things when "sh->count == 1".
>>
>> 1. add sh to released_stripes.
>>
>> Or
>>
>> 2. go to slow path if sh is already set with ON_RELEASE_LIST.
>>
>> Either 1 or 2 eventually triggers do_release_stripe, and this
>> function mainly moves sh->lru to different lists such as delayed_list,
>> handle_list or temp_inactive_list.
>
> Hi Guoqing
>
> sh->count should be >= 1 before release_stripe_plug is called,
> because the stripe was obtained from raid5_get_active_stripe. In your
> example, if the same sh is handled by raid5_release_stripe,
> sh->count should be >= 2, right?

I am not sure that is true: sh->count can be decreased in both
raid5_release_stripe (lockless) and __release_stripe (protected by
device_lock), so quite a lot of paths can decrease the count.

> It is incremented whenever someone gets the same sh. So sh->count
> can't be zero, and the stripe can't be released
> before/in/after release_stripe_plug.

After sh->lru is added to cb->list, is it possible that another
process decreases sh->count and adds the same sh->lru to another
list? I don't see what prevents that from happening.
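
For example, a hypothetical interleaving (just to illustrate the worry,
I have not proved this exact sequence):

    CPU0: release_stripe_plug           CPU1: raid5_release_stripe
    test_and_set_bit(ON_UNPLUG_LIST)
    list_add_tail(&sh->lru, &cb->list)
                                        sh->count drops to 0
                                        do_release_stripe()
                                          list_add_tail(&sh->lru, ...)

Then the same sh->lru would be linked into two lists (or the
BUG_ON(!list_empty(&sh->lru)) fires, depending on timing).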

> Could you explain how sh->count can be zero when release_stripe_plug
> is called?

Because raid5_get_active_stripe is lockless; imagine massive IOs
against the same sh, especially when multiple r5workers are active.
I am also not sure sh->count can guarantee that sh->lru lives on only
one list, so there could be a corner case which causes the lockup.

>
> When release_stripe_plug moves sh->lru to the unplug list, it checks
> neither sh->count nor any lock. The point of the unplug method is to
> avoid conf->device_lock. Is it enough to check sh->lru here? If it's
> not empty, we can call raid5_release_stripe directly.
>>
>> Then the same node could be on different lists, which causes
> What's the meaning of "node" here? Is it the stripe?

I meant the list node; maybe "entry" is more appropriate.

>> raid5_unplug to stick in "while (!list_empty(&cb->list))", and
>> since spin_lock_irq(&conf->device_lock) was taken before, it
>> causes:
> You mean an infinite loop? In the loop, the stripe is removed from
> cb->list, so the while loop can finish successfully, right?

Please see link [1]. If the same lru entry ends up linked into two
lists, concurrent list operations can corrupt cb->list so that it
never becomes empty, hence the hard lockup with irqs disabled.

>
>>
>> 1. hard lockup, as reported in [1], [2] and [3], since irqs are disabled.
>>
>> 2. raid5_get_active_stripe can't take device_lock and a calltrace
>> appears.
>>
>> [<ffffffff81598060>] _raw_spin_lock+0x20/0x30
>> [<ffffffffa0230b0a>] raid5_get_active_stripe+0x1da/0x5250 [raid456]
>> [<ffffffff8112d165>] ? mempool_alloc_slab+0x15/0x20
>> [<ffffffffa0231174>] raid5_get_active_stripe+0x844/0x5250 [raid456]
>> [<ffffffff812d5574>] ? generic_make_request+0x24/0x2b0
>> [<ffffffff810938b0>] ? wait_woken+0x90/0x90
>> [<ffffffff814a2adc>] md_make_request+0xfc/0x250
>> [<ffffffff812d5867>] submit_bio+0x67/0x150
>>
>> So, this commit tries to avoid the issue by making the changes below:
>>
>> 1. First, don't add sh->lru to cb->list if sh->count == 0.
>>
>> 2. Also check STRIPE_ON_UNPLUG_LIST in do_release_stripe to avoid
>> list corruption, and use the lock versions of test_and_set_bit/clear_bit
>> (though this should make no difference on x86).
>>
>> [1]. https://marc.info/?l=linux-raid&m=150348807422853&w=2
>> [2]. https://marc.info/?l=linux-raid&m=146883211430999&w=2
>> [3]. https://marc.info/?l=linux-raid&m=157434565331673&w=2
>>
>> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
>> ---
>> As you can see, this version is different from the previous one.
>>
>> The previous version was written because I thought ON_UNPLUG_LIST
>> and ON_RELEASE_LIST were more or less exclusive, i.e. the sh should
>> have only one of the flags set but not both, but perhaps
>> that is not true ...
> Yes. The stripe may be handled in other paths which call
> raid5_release_stripe after finishing their job.
> At that point the stripe may be on conf->released_stripes with sh->count == 1.

So I changed my mind :-). If the sh is on the unplug list, it should
be handled by raid5_unplug instead of by other callers; that seems more
natural from my point of view, regardless of whether one sh could live
on different lists simultaneously.

>>
>> Instead, since the goal is to avoid putting the same sh->lru on
>> different lists, we can fix it from the other side. Before
>> do_release_stripe moves sh->lru to another list (not cb->list),
>> check whether the sh is already on the unplug list; if it is, wake
>> up mddev->thread to trigger the plug path:
>>
>>    raid5d -> blk_finish_plug -> raid5_unplug ->
>>    __release_stripe -> do_release_stripe
>>
>> Then raid5_unplug removes each sh from cb->list one by one, clears the
>> ON_UNPLUG_LIST flag and releases the stripe. This ensures an sh on
>> the unplug list is actually handled by the plug mechanism instead
>> of by other paths.
>>
>> Comments and tests are welcome.
>>
>> The new changes were tested with "./test --raidtype=raid456" and
>> seem good.

Could you help test the change in your nvdimm environment (the more
scenarios covered, the better) to avoid regressions?

>>
>> Thanks,
>> Guoqing
>>
>>   drivers/md/raid5.c | 24 ++++++++++++++++++++++--
>>   1 file changed, 22 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index 223e97ab27e6..808b0bd18c8c 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -218,6 +218,25 @@ static void do_release_stripe(struct r5conf *conf, struct stripe_head *sh,
>>       BUG_ON(!list_empty(&sh->lru));
>>       BUG_ON(atomic_read(&conf->active_stripes)==0);
>>
>> +    /*
>> +     * If stripe is on unplug list then original caller of __release_stripe
>> +     * is not raid5_unplug, so sh->lru is still in cb->list, don't risk to
>> +     * add lru to another list in do_release_stripe.
>> +     */
>> +    if (!test_and_set_bit_lock(STRIPE_ON_UNPLUG_LIST, &sh->state))
>> +        clear_bit_unlock(STRIPE_ON_UNPLUG_LIST, &sh->state);
>> +    else {
>> +        /*
>> +         * The sh is on unplug list, so increase count (because count
>> +         * is decrease before enter do_release_stripe), then trigger
> is decreased before entering

Thanks for checking.

Cheers,
Guoqing


* Re: [PATCH] raid5: avoid add sh->lru to different list
From: Song Liu @ 2020-01-10  1:06 UTC (permalink / raw)
  To: Guoqing Jiang, Xiao Ni; +Cc: linux-raid, Guoqing Jiang

Hi Guoqing and Xiao,

Thanks for looking into this. I am still studying it. Some
questions below...

On Wed, Jan 8, 2020 at 8:30 AM <jgq516@gmail.com> wrote:
[...]
>
>  drivers/md/raid5.c | 24 ++++++++++++++++++++++--
>  1 file changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 223e97ab27e6..808b0bd18c8c 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -218,6 +218,25 @@ static void do_release_stripe(struct r5conf *conf, struct stripe_head *sh,
>         BUG_ON(!list_empty(&sh->lru));
>         BUG_ON(atomic_read(&conf->active_stripes)==0);
>
> +       /*
> +        * If stripe is on unplug list then original caller of __release_stripe
> +        * is not raid5_unplug, so sh->lru is still in cb->list, don't risk to
> +        * add lru to another list in do_release_stripe.
> +        */

Can we do the check before calling into do_release_stripe()?

> +       if (!test_and_set_bit_lock(STRIPE_ON_UNPLUG_LIST, &sh->state))
> +               clear_bit_unlock(STRIPE_ON_UNPLUG_LIST, &sh->state);
> +       else {
> +               /*
> +                * The sh is on unplug list, so increase count (because count
> +                * is decrease before enter do_release_stripe), then trigger
> +                * raid5d -> plug -> raid5_unplug -> __release_stripe to handle
> +                * this stripe.
> +                */
> +               atomic_inc(&sh->count);
> +               md_wakeup_thread(conf->mddev->thread);
> +               return;
> +       }
> +
>         if (r5c_is_writeback(conf->log))
>                 for (i = sh->disks; i--; )
>                         if (test_bit(R5_InJournal, &sh->dev[i].flags))
> @@ -5441,7 +5460,7 @@ static void raid5_unplug(struct blk_plug_cb *blk_cb, bool from_schedule)
>                          * is still in our list
>                          */
>                         smp_mb__before_atomic();
> -                       clear_bit(STRIPE_ON_UNPLUG_LIST, &sh->state);
> +                       clear_bit_unlock(STRIPE_ON_UNPLUG_LIST, &sh->state);
>                         /*
>                          * STRIPE_ON_RELEASE_LIST could be set here. In that
>                          * case, the count is always > 1 here
> @@ -5481,7 +5500,8 @@ static void release_stripe_plug(struct mddev *mddev,
>                         INIT_LIST_HEAD(cb->temp_inactive_list + i);
>         }
>
> -       if (!test_and_set_bit(STRIPE_ON_UNPLUG_LIST, &sh->state))
> +       if (!test_and_set_bit_lock(STRIPE_ON_UNPLUG_LIST, &sh->state) &&
> +           (atomic_read(&sh->count) != 0))

Do we really see sh->count == 0 here? If yes, I guess we should check
sh->count first, so that we reduce the cases where STRIPE_ON_UNPLUG_LIST
is set but the sh is not really on the unplug_list.
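
I.e. something like this (untested):

	if (atomic_read(&sh->count) != 0 &&
	    !test_and_set_bit_lock(STRIPE_ON_UNPLUG_LIST, &sh->state))
		list_add_tail(&sh->lru, &cb->list);
	else
		raid5_release_stripe(sh);

so the bit is normally only set on the path that actually adds the sh
to cb->list.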

>                 list_add_tail(&sh->lru, &cb->list);
>         else
>                 raid5_release_stripe(sh);
>

Overall, I think we should try our best to make STRIPE_ON_RELEASE_LIST
mean the stripe is on release_list, and STRIPE_ON_UNPLUG_LIST mean it is
on unplug_list. There will be some exceptions, but we should try to
minimize those cases.

Does this make sense?

Thanks,
Song


* Re: [PATCH] raid5: avoid add sh->lru to different list
From: Guoqing Jiang @ 2020-01-10 10:51 UTC (permalink / raw)
  To: Song Liu, Guoqing Jiang, Xiao Ni; +Cc: linux-raid

Hi Song,

On 1/10/20 2:06 AM, Song Liu wrote:
> Hi Guoqing and Xiao,
>
> Thanks for looking into this. I am still looking into this. Some
> questions below...

Thanks for looking into it.

> On Wed, Jan 8, 2020 at 8:30 AM <jgq516@gmail.com> wrote:
> [...]
>>   drivers/md/raid5.c | 24 ++++++++++++++++++++++--
>>   1 file changed, 22 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index 223e97ab27e6..808b0bd18c8c 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -218,6 +218,25 @@ static void do_release_stripe(struct r5conf *conf, struct stripe_head *sh,
>>          BUG_ON(!list_empty(&sh->lru));
>>          BUG_ON(atomic_read(&conf->active_stripes)==0);
>>
>> +       /*
>> +        * If stripe is on unplug list then original caller of __release_stripe
>> +        * is not raid5_unplug, so sh->lru is still in cb->list, don't risk to
>> +        * add lru to another list in do_release_stripe.
>> +        */
> Can we do the check before calling into do_release_stripe()?

Then we would need to add the check twice: once in raid5_release_stripe
and again in __release_stripe.
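
Both callers funnel into do_release_stripe, so the check would have to
be duplicated; a rough, untested sketch from my reading of the current
code:

	static void __release_stripe(struct r5conf *conf, struct stripe_head *sh,
				     struct list_head *temp_inactive_list)
	{
		if (atomic_dec_and_test(&sh->count)) {
			/* the unplug-list check would have to go here ... */
			do_release_stripe(conf, sh, temp_inactive_list);
		}
	}

and the same check again on the lockless slow path of
raid5_release_stripe, which seems easier to get wrong.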

>
>> +       if (!test_and_set_bit_lock(STRIPE_ON_UNPLUG_LIST, &sh->state))
>> +               clear_bit_unlock(STRIPE_ON_UNPLUG_LIST, &sh->state);
>> +       else {
>> +               /*
>> +                * The sh is on unplug list, so increase count (because count
>> +                * is decrease before enter do_release_stripe), then trigger
>> +                * raid5d -> plug -> raid5_unplug -> __release_stripe to handle
>> +                * this stripe.
>> +                */
>> +               atomic_inc(&sh->count);
>> +               md_wakeup_thread(conf->mddev->thread);
>> +               return;
>> +       }
>> +
>>          if (r5c_is_writeback(conf->log))
>>                  for (i = sh->disks; i--; )
>>                          if (test_bit(R5_InJournal, &sh->dev[i].flags))
>> @@ -5441,7 +5460,7 @@ static void raid5_unplug(struct blk_plug_cb *blk_cb, bool from_schedule)
>>                           * is still in our list
>>                           */
>>                          smp_mb__before_atomic();
>> -                       clear_bit(STRIPE_ON_UNPLUG_LIST, &sh->state);
>> +                       clear_bit_unlock(STRIPE_ON_UNPLUG_LIST, &sh->state);
>>                          /*
>>                           * STRIPE_ON_RELEASE_LIST could be set here. In that
>>                           * case, the count is always > 1 here
>> @@ -5481,7 +5500,8 @@ static void release_stripe_plug(struct mddev *mddev,
>>                          INIT_LIST_HEAD(cb->temp_inactive_list + i);
>>          }
>>
>> -       if (!test_and_set_bit(STRIPE_ON_UNPLUG_LIST, &sh->state))
>> +       if (!test_and_set_bit_lock(STRIPE_ON_UNPLUG_LIST, &sh->state) &&
>> +           (atomic_read(&sh->count) != 0))
> Do we really see sh->count == 0 here? If yes, I guess we should check
> sh->count first,
> so that we reduce the case where STRIPE_ON_UNPLUG_LIST is set, but the sh is
> not really on the unplug_list.

Yes, it could be, thanks. But with checking the count first, I am
worried about a potential race window which could still add sh->lru to
different lists:

1. release_stripe_plug -> atomic_read(&sh->count) != 0

2. do_release_stripe -> atomic_dec_and_test(&sh->count)
                        !test_bit(STRIPE_ON_UNPLUG_LIST, &sh->state)
                        list_add_tail(&sh->lru, ***_list);

3. release_stripe_plug -> !test_and_set_bit(STRIPE_ON_UNPLUG_LIST, &sh->state)
                          list_add_tail(&sh->lru, &cb->list);


IMO, the difficulty is that we need some kind of atomic operation to
check both sh->count and the ON_UNPLUG flag, but we want to stay
lockless as much as possible.

How about getting an extra count before adding sh to cb->list? I am
thinking that atomic_inc(&sh->count) and atomic_dec(&sh->count) should
be paired, right? Then do_release_stripe should not be triggered, in
theory. Something like:

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index d4d3b67ffbba..225bd9ca4e42 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -222,6 +222,24 @@ static void do_release_stripe(struct r5conf *conf, struct stripe_head *sh,
                for (i = sh->disks; i--; )
                        if (test_bit(R5_InJournal, &sh->dev[i].flags))
                                injournal++;
+
+       /*
+        * If stripe is on unplug list then original caller of __release_stripe
+        * is not raid5_unplug, so sh->lru is still in cb->list, don't risk to
+        * add lru to another list in do_release_stripe.
+        */
+       if (unlikely(test_bit(STRIPE_ON_UNPLUG_LIST, &sh->state))) {
+               /*
+                * The sh is on unplug list, so increase count (because count
+                * is decreased before entering do_release_stripe), then trigger
+                * raid5d -> plug -> raid5_unplug -> __release_stripe to handle
+                * this stripe.
+                */
+               atomic_inc(&sh->count);
+               md_wakeup_thread(conf->mddev->thread);
+               return;
+       }
+
        /*
         * In the following cases, the stripe cannot be released to cached
         * lists. Therefore, we make the stripe write out and set
@@ -5481,10 +5499,15 @@ static void release_stripe_plug(struct mddev *mddev,
                        INIT_LIST_HEAD(cb->temp_inactive_list + i);
        }

-       if (!test_and_set_bit(STRIPE_ON_UNPLUG_LIST, &sh->state))
+       /* hold extra count to avoid the running of do_release_stripe */
+       atomic_inc(&sh->count);
+       if (!test_and_set_bit(STRIPE_ON_UNPLUG_LIST, &sh->state)) {
                list_add_tail(&sh->lru, &cb->list);
-       else
+               atomic_dec(&sh->count);
+       } else {
+               atomic_dec(&sh->count);
                raid5_release_stripe(sh);
+       }
 }

>>                  list_add_tail(&sh->lru, &cb->list);
>>          else
>>                  raid5_release_stripe(sh);
>>
> Overall, I think we should try our best to let STRIPE_ON_RELEASE_LIST
> means it is
> on release_list, and STRIPE_ON_UNPLUG_LIST means it is on unplug_list. I think
> there will be some exceptions, but we should try to minimize those cases.
>
> Does this make sense?

Agreed, though unfortunately I don't see how to enforce that in the
code ...

Thanks,
Guoqing

