* raid1: freeze_array/wait_all_barriers deadlock
@ 2017-10-13 18:32 Nate Dailey
  2017-10-14 21:45 ` Coly Li
  0 siblings, 1 reply; 7+ messages in thread
From: Nate Dailey @ 2017-10-13 18:32 UTC (permalink / raw)
  To: linux-raid

I hit the following deadlock:

PID: 1819   TASK: ffff9ca137dd42c0  CPU: 35 COMMAND: "md125_raid1"
  #0 [ffffaba8c988fc18] __schedule at ffffffff8df6a84d
  #1 [ffffaba8c988fca8] schedule at ffffffff8df6ae86
  #2 [ffffaba8c988fcc0] freeze_array at ffffffffc017d866 [raid1]
  #3 [ffffaba8c988fd20] handle_read_error at ffffffffc017fda1 [raid1]
  #4 [ffffaba8c988fdd0] raid1d at ffffffffc01807d0 [raid1]
  #5 [ffffaba8c988fea0] md_thread at ffffffff8ddc2e92
  #6 [ffffaba8c988ff08] kthread at ffffffff8d8af739
  #7 [ffffaba8c988ff50] ret_from_fork at ffffffff8df70485

PID: 7812   TASK: ffff9ca11f451640  CPU: 3 COMMAND: "md125_resync"
  #0 [ffffaba8cb5d3b38] __schedule at ffffffff8df6a84d
  #1 [ffffaba8cb5d3bc8] schedule at ffffffff8df6ae86
  #2 [ffffaba8cb5d3be0] _wait_barrier at ffffffffc017cc81 [raid1]
  #3 [ffffaba8cb5d3c40] raid1_sync_request at ffffffffc017db5e [raid1]
  #4 [ffffaba8cb5d3d10] md_do_sync at ffffffff8ddc9799
  #5 [ffffaba8cb5d3ea0] md_thread at ffffffff8ddc2e92
  #6 [ffffaba8cb5d3f08] kthread at ffffffff8d8af739
  #7 [ffffaba8cb5d3f50] ret_from_fork at ffffffff8df70485

The second one is actually raid1_sync_request -> close_sync -> wait_all_barriers.

The problem is that wait_all_barriers increments all nr_pending buckets, but 
those have no corresponding nr_queued. If freeze_array is called in the middle 
of wait_all_barriers, it hangs waiting for nr_pending and nr_queued to line up. 
This never happens because an in-progress _wait_barrier also gets stuck due to 
the freeze.
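
For context, the check freeze_array() is stuck on looks roughly like this (a
simplified sketch of that era's drivers/md/raid1.c; details vary a bit between
kernel versions):

         spin_lock_irq(&conf->resync_lock);
         conf->array_frozen = 1;
         /* wait until every pending request in every bucket is also
          * accounted as queued, apart from the "extra" requests the
          * caller itself still holds */
         wait_event_lock_irq_cmd(conf->wait_barrier,
                                 get_unqueued_pending(conf) == extra,
                                 conf->resync_lock,
                                 flush_pending_writes(conf));
         spin_unlock_irq(&conf->resync_lock);

where get_unqueued_pending() sums nr_pending[idx] - nr_queued[idx] over all 
barrier buckets. With wait_all_barriers() holding extra nr_pending counts that 
are never matched by nr_queued, that condition can't become true.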

This was originally hit organically, but I was able to make it easier by 
inserting a 10ms delay before each _wait_barrier call in wait_all_barriers, and 
a 4 sec delay before handle_read_error's call to freeze_array. Then, I start 2 
dd processes reading from a raid1, start up a check, and pull a disk. Usually 
within 2 or 3 pulls I can hit the deadlock.
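
Roughly, the debug delays look like this (instrumentation only, not part of any 
fix; msleep()/ssleep() come from <linux/delay.h>):

         /* in wait_all_barriers(): widen the race window between buckets */
         for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++) {
                 msleep(10);
                 _wait_barrier(conf, idx);
         }

         /* in handle_read_error(), just before the array is frozen */
         ssleep(4);
         freeze_array(conf, 1);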

I came up with a change that seems to avoid this, by manipulating nr_queued in 
wait/allow_all_barriers (not suggesting that this is the best way, but it seems 
safe at least):

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index f3f3e40dc9d8..e34dfda1c629 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -994,8 +994,11 @@ static void wait_all_barriers(struct r1conf *conf)
  {
      int idx;

-    for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
+    for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++) {
          _wait_barrier(conf, idx);
+        atomic_inc(&conf->nr_queued[idx]);
+        wake_up(&conf->wait_barrier);
+    }
  }

  static void _allow_barrier(struct r1conf *conf, int idx)
@@ -1015,8 +1018,10 @@ static void allow_all_barriers(struct r1conf *conf)
  {
      int idx;

-    for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
+    for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++) {
+        atomic_dec(&conf->nr_queued[idx]);
          _allow_barrier(conf, idx);
+    }
  }

  /* conf->resync_lock should be held */



* Re: raid1: freeze_array/wait_all_barriers deadlock
  2017-10-13 18:32 raid1: freeze_array/wait_all_barriers deadlock Nate Dailey
@ 2017-10-14 21:45 ` Coly Li
  2017-10-16 12:58   ` Nate Dailey
  0 siblings, 1 reply; 7+ messages in thread
From: Coly Li @ 2017-10-14 21:45 UTC (permalink / raw)
  To: Nate Dailey; +Cc: linux-raid

On 2017/10/14 2:32 AM, Nate Dailey wrote:
> I hit the following deadlock:
> 
> PID: 1819   TASK: ffff9ca137dd42c0  CPU: 35 COMMAND: "md125_raid1"
>  #0 [ffffaba8c988fc18] __schedule at ffffffff8df6a84d
>  #1 [ffffaba8c988fca8] schedule at ffffffff8df6ae86
>  #2 [ffffaba8c988fcc0] freeze_array at ffffffffc017d866 [raid1]
>  #3 [ffffaba8c988fd20] handle_read_error at ffffffffc017fda1 [raid1]
>  #4 [ffffaba8c988fdd0] raid1d at ffffffffc01807d0 [raid1]
>  #5 [ffffaba8c988fea0] md_thread at ffffffff8ddc2e92
>  #6 [ffffaba8c988ff08] kthread at ffffffff8d8af739
>  #7 [ffffaba8c988ff50] ret_from_fork at ffffffff8df70485
> 
> PID: 7812   TASK: ffff9ca11f451640  CPU: 3 COMMAND: "md125_resync"
>  #0 [ffffaba8cb5d3b38] __schedule at ffffffff8df6a84d
>  #1 [ffffaba8cb5d3bc8] schedule at ffffffff8df6ae86
>  #2 [ffffaba8cb5d3be0] _wait_barrier at ffffffffc017cc81 [raid1]
>  #3 [ffffaba8cb5d3c40] raid1_sync_request at ffffffffc017db5e [raid1]
>  #4 [ffffaba8cb5d3d10] md_do_sync at ffffffff8ddc9799
>  #5 [ffffaba8cb5d3ea0] md_thread at ffffffff8ddc2e92
>  #6 [ffffaba8cb5d3f08] kthread at ffffffff8d8af739
>  #7 [ffffaba8cb5d3f50] ret_from_fork at ffffffff8df70485
> 
> The second one is actually raid1_sync_request -> close_sync ->
> wait_all_barriers.
> 
> The problem is that wait_all_barriers increments all nr_pending buckets,
> but those have no corresponding nr_queued. If freeze_array is called in
> the middle of wait_all_barriers, it hangs waiting for nr_pending and
> nr_queued to line up. This never happens because an in-progress
> _wait_barrier also gets stuck due to the freeze.
> 
> This was originally hit organically, but I was able to make it easier by
> inserting a 10ms delay before each _wait_barrier call in
> wait_all_barriers, and a 4 sec delay before handle_read_error's call to
> freeze_array. Then, I start 2 dd processes reading from a raid1, start
> up a check, and pull a disk. Usually within 2 or 3 pulls I can hit the
> deadlock.

Hi Nate,

Nice catch! Thanks for the debugging; I agree with your analysis of the
deadlock, neat :-)

> 
> I came up with a change that seems to avoid this, by manipulating
> nr_queued in wait/allow_all_barriers (not suggesting that this is the
> best way, but it seems safe at least):
> 

At first glance, I feel your fix works. But I worry that increasing and
decreasing nr_queued[idx] this way may introduce other races related to
the "get_unqueued_pending() == extra" check in freeze_array().

A solution I came up with when I wrote the barrier buckets was to add a
new wait_event_* routine, wait_event_lock_irq_cmd_timeout(), which wakes
up freeze_array() after a timeout, to avoid a deadlock.

The reasons why I didn't use it in the final version were:
- the routine name is too long
- some hidden deadlocks would never be triggered, because freeze_array()
would wake itself up.

For now, it seems wait_event_lock_irq_cmd_timeout() may have to be used
after all. Would you like to compose a patch with the new
wait_event_lock_irq_cmd_timeout() and a loop-after-timeout in
freeze_array()? Or, if you are busy, I can handle this.

Thanks in advance.

Coly Li

> [snip]


* Re: raid1: freeze_array/wait_all_barriers deadlock
  2017-10-14 21:45 ` Coly Li
@ 2017-10-16 12:58   ` Nate Dailey
  2017-10-16 14:43     ` Coly Li
  0 siblings, 1 reply; 7+ messages in thread
From: Nate Dailey @ 2017-10-16 12:58 UTC (permalink / raw)
  To: Coly Li; +Cc: linux-raid

Hi Coly, I'm not sure I understand the change you're proposing. Would it be 
something like the following?

         spin_lock_irq(&conf->resync_lock);
         conf->array_frozen = 1;
         raid1_log(conf->mddev, "wait freeze");
         while (get_unqueued_pending(conf) != extra) {
             wait_event_lock_irq_cmd_timeout(
                 conf->wait_barrier,
                 get_unqueued_pending(conf) == extra,
                 conf->resync_lock,
                 flush_pending_writes(conf),
                 timeout);
         }
         spin_unlock_irq(&conf->resync_lock);

On its own, I don't see how this would make any difference. Until array_frozen 
== 0, wait_all_barriers will continue to be blocked, which in turn will prevent 
the condition freeze_array is waiting on from ever becoming true.

Or should something else be done inside the new freeze_array loop that would 
allow wait_all_barriers to make progress?

Thanks,
Nate


On 10/14/2017 05:45 PM, Coly Li wrote:
> [snip]




* Re: raid1: freeze_array/wait_all_barriers deadlock
  2017-10-16 12:58   ` Nate Dailey
@ 2017-10-16 14:43     ` Coly Li
  2017-10-16 15:34       ` Nate Dailey
  0 siblings, 1 reply; 7+ messages in thread
From: Coly Li @ 2017-10-16 14:43 UTC (permalink / raw)
  To: Nate Dailey; +Cc: linux-raid

On 2017/10/16 8:58 PM, Nate Dailey wrote:
> Hi Coly, I'm not sure I understand the change you're proposing. Would it
> be something like the following?
> 
>         spin_lock_irq(&conf->resync_lock);
>         conf->array_frozen = 1;
>         raid1_log(conf->mddev, "wait freeze");
>         while (get_unqueued_pending(conf) != extra) {
>             wait_event_lock_irq_cmd_timeout(
>                 conf->wait_barrier,
>                 get_unqueued_pending(conf) == extra,
>                 conf->resync_lock,
>                 flush_pending_writes(conf),
>                 timeout);
>         }
>         spin_unlock_irq(&conf->resync_lock);
> 
> On its own, I don't see how this would make any difference. Until
> array_frozen == 0, wait_all_barriers will continue to be blocked, which
> in turn will prevent the condition freeze_array is waiting on from ever
> becoming true.

Hi Nate,

You are right, this idea does not help much; we need to find another
way.

> Or should something else be done inside the new freeze_array loop that
> would allow wait_all_barriers to make progress?

It seems wait_all_barriers() is only used in close_sync(), which is to
make sure all sync requests hit the platters before raid1_sync_request()
returns.
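
(For reference, close_sync() is currently roughly the following; a simplified
sketch, the resync buffer pool teardown may differ by kernel version:)

        static void close_sync(struct r1conf *conf)
        {
                wait_all_barriers(conf);
                allow_all_barriers(conf);

                mempool_destroy(conf->r1buf_pool);
                conf->r1buf_pool = NULL;
        }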

How about setting up a critical section in close_sync(), protected by
another lock? It might be something like this:

static void close_sync(struct r1conf *conf)
{
+	mutex_lock(&close_sync_lock);
        wait_all_barriers(conf);
+	mutex_unlock(&close_sync_lock);
        allow_all_barriers(conf);
	[snip]
}


static void freeze_array(struct r1conf *conf, int extra)
{
+	mutex_lock(&close_sync_lock);
        spin_lock_irq(&conf->resync_lock);
	[snip]
        spin_unlock_irq(&conf->resync_lock);
+	mutex_unlock(&close_sync_lock);
}

Then conf->array_frozen won't be set while wait_all_barriers() has only
partially iterated over the barrier buckets, and the deadlock can be avoided.

What do you think of this one?

Thanks.

Coly Li


[snip]



* Re: raid1: freeze_array/wait_all_barriers deadlock
  2017-10-16 14:43     ` Coly Li
@ 2017-10-16 15:34       ` Nate Dailey
  2017-10-16 16:49         ` Coly Li
  0 siblings, 1 reply; 7+ messages in thread
From: Nate Dailey @ 2017-10-16 15:34 UTC (permalink / raw)
  To: Coly Li; +Cc: linux-raid

Hi Coly, I'm not 100% sure that's going to work.

What I'm imagining is that close_sync gets the lock, so now 
raid1d/handle_read_error/freeze_array is blocked waiting for it.

Is it possible that raid1d might need to complete some sync IO from the 
retry_list, that close_sync/wait_all_barriers is waiting for? In that case, 
there would still be a deadlock.

Actually, what if close_sync just did this:

static void close_sync(struct r1conf *conf)
{
         int idx;

         for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++) {
                 _wait_barrier(conf, idx);
                 _allow_barrier(conf, idx);
         }
...

Is there any reason all the _wait_barriers have to complete before starting on 
_allow_barrier? If not, then I think this change would work (but will think 
about it more).

Nate



On 10/16/2017 10:43 AM, Coly Li wrote:
> [snip]



* Re: raid1: freeze_array/wait_all_barriers deadlock
  2017-10-16 15:34       ` Nate Dailey
@ 2017-10-16 16:49         ` Coly Li
  2017-10-16 17:30           ` Nate Dailey
  0 siblings, 1 reply; 7+ messages in thread
From: Coly Li @ 2017-10-16 16:49 UTC (permalink / raw)
  To: Nate Dailey; +Cc: linux-raid

On 2017/10/16 11:34 PM, Nate Dailey wrote:
> Hi Coly, I'm not 100% sure that's going to work.
> 
> What I'm imagining is that close_sync gets the lock, so now
> raid1d/handle_read_error/freeze_array is blocked waiting for it.
> 

Hi Nate,

Let me think about this code path. If freeze_array() is blocked on the
mutex in handle_read_error(), it means conf->array_frozen is still 0.
_wait_barrier() only waits if 1) barrier[idx] is > 0, or 2) array_frozen is
1. So this situation won't block _wait_barrier() or wait_read_barrier()
in all buckets.
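
(For reference, the fast path of _wait_barrier() is roughly the following;
a simplified sketch, the slow path and memory barriers are omitted:)

        atomic_inc(&conf->nr_pending[idx]);
        if (!READ_ONCE(conf->array_frozen) &&
            !atomic_read(&conf->barrier[idx]))
                return;    /* neither frozen nor behind a resync barrier */

        /* slow path: drop nr_pending[idx] again, bump nr_waiting[idx],
         * and sleep on conf->wait_barrier until array_frozen == 0 and
         * barrier[idx] == 0, then re-take nr_pending[idx] */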

Then the loop waiting on the barriers of all buckets won't be blocked if
freeze_array() is waiting on the regular I/O path. But if freeze_array()
happens on the resync I/O code path, a deadlock is possible.

Fortunately, it seems freeze_array() is not called on the resync I/O code
path, so the mutex seems to be safe so far.

But adding one more lock always makes people nervous, and makes the code
more complicated... I don't like this :(

> Is it possible that raid1d might need to complete some sync IO from the
> retry_list, that close_sync/wait_all_barriers is waiting for? In that
> case, there would still be a deadlock.
> 
> Actually, what if close_sync just did this:
> 
> static void close_sync(struct r1conf *conf)
> {
>         int idx;
> 
>         for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++) {
>                 _wait_barrier(conf, idx);
>                 _allow_barrier(conf, idx);

And you may remove wait_all_barriers() and allow_all_barriers(); we have
less code, yeah!

>         }
> ...
> 
> Is there any reason all the _wait_barriers have to complete before
> starting on _allow_barrier? If not, then I think this change would work
> (but will think about it more).
> 

No reason that _allow_barrier() should wait for all the _wait_barrier()
calls to complete here. wait_all_barriers() in close_sync() is to make
sure all in-flight resync I/O has finished. When it is called, no new
resync I/O will be generated; it is only used to wait for the in-flight
resync I/O to complete. So your modification works, and I feel it is safe.

And the above code is much simpler, I like it :-)

Thanks.

Coly Li


> [snip]



* Re: raid1: freeze_array/wait_all_barriers deadlock
  2017-10-16 16:49         ` Coly Li
@ 2017-10-16 17:30           ` Nate Dailey
  0 siblings, 0 replies; 7+ messages in thread
From: Nate Dailey @ 2017-10-16 17:30 UTC (permalink / raw)
  To: Coly Li; +Cc: linux-raid

Great, thanks!

I will test this and submit a patch if it works correctly.

Nate


On 10/16/2017 12:49 PM, Coly Li wrote:
> [snip]




