From: Coly Li <colyli@suse.de>
To: Nate Dailey <nate.dailey@stratus.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: raid1: freeze_array/wait_all_barriers deadlock
Date: Sun, 15 Oct 2017 05:45:12 +0800
Message-ID: <c63d6d36-b155-2e47-8b0d-d03d0916f410@suse.de>
In-Reply-To: <2e2fb33a-12a6-763a-bd8e-8dc4aa05fc13@stratus.com>

On 2017/10/14 2:32 AM, Nate Dailey wrote:
> I hit the following deadlock:
> 
> PID: 1819   TASK: ffff9ca137dd42c0  CPU: 35 COMMAND: "md125_raid1"
>  #0 [ffffaba8c988fc18] __schedule at ffffffff8df6a84d
>  #1 [ffffaba8c988fca8] schedule at ffffffff8df6ae86
>  #2 [ffffaba8c988fcc0] freeze_array at ffffffffc017d866 [raid1]
>  #3 [ffffaba8c988fd20] handle_read_error at ffffffffc017fda1 [raid1]
>  #4 [ffffaba8c988fdd0] raid1d at ffffffffc01807d0 [raid1]
>  #5 [ffffaba8c988fea0] md_thread at ffffffff8ddc2e92
>  #6 [ffffaba8c988ff08] kthread at ffffffff8d8af739
>  #7 [ffffaba8c988ff50] ret_from_fork at ffffffff8df70485
> 
> PID: 7812   TASK: ffff9ca11f451640  CPU: 3 COMMAND: "md125_resync"
>  #0 [ffffaba8cb5d3b38] __schedule at ffffffff8df6a84d
>  #1 [ffffaba8cb5d3bc8] schedule at ffffffff8df6ae86
>  #2 [ffffaba8cb5d3be0] _wait_barrier at ffffffffc017cc81 [raid1]
>  #3 [ffffaba8cb5d3c40] raid1_sync_request at ffffffffc017db5e [raid1]
>  #4 [ffffaba8cb5d3d10] md_do_sync at ffffffff8ddc9799
>  #5 [ffffaba8cb5d3ea0] md_thread at ffffffff8ddc2e92
>  #6 [ffffaba8cb5d3f08] kthread at ffffffff8d8af739
>  #7 [ffffaba8cb5d3f50] ret_from_fork at ffffffff8df70485
> 
> The second one is actually raid1_sync_request -> close_sync ->
> wait_all_barriers.
> 
> The problem is that wait_all_barriers increments all nr_pending buckets,
> but those have no corresponding nr_queued. If freeze_array is called in
> the middle of wait_all_barriers, it hangs waiting for nr_pending and
> nr_queued to line up. This never happens because an in-progress
> _wait_barrier also gets stuck due to the freeze.
> 
> This was originally hit organically, but I was able to make it easier to
> reproduce by inserting a 10ms delay before each _wait_barrier call in
> wait_all_barriers, and a 4 sec delay before handle_read_error's call to
> freeze_array. Then, I start 2 dd processes reading from a raid1, start
> up a check, and pull a disk. Usually within 2 or 3 pulls I can hit the
> deadlock.

Hi Nate,

Nice catch! Thanks for the debugging; I agree with your analysis of the
deadlock, neat :-)
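
For anyone following along, the pieces involved look roughly like this
(a simplified sketch of the barrier-bucket code from memory, not the
exact raid1.c source; details such as the cmd argument may differ):

/* Simplified sketch, not the exact raid1.c code */
static int get_unqueued_pending(struct r1conf *conf)
{
        int idx, ret = 0;

        /* requests that are pending but not parked on the retry queues */
        for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
                ret += atomic_read(&conf->nr_pending[idx]) -
                       atomic_read(&conf->nr_queued[idx]);
        return ret;
}

static void freeze_array(struct r1conf *conf, int extra)
{
        spin_lock_irq(&conf->resync_lock);
        conf->array_frozen = 1;
        /*
         * Blocks until every pending request is also queued (plus 'extra').
         * wait_all_barriers() bumps nr_pending[idx] in every bucket with no
         * matching nr_queued[idx], so once the resync thread is itself
         * stuck behind the freeze this condition can never become true.
         */
        wait_event_lock_irq_cmd(conf->wait_barrier,
                                get_unqueued_pending(conf) == extra,
                                conf->resync_lock,
                                flush_pending_writes(conf));
        spin_unlock_irq(&conf->resync_lock);
}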

> 
> I came up with a change that seems to avoid this, by manipulating
> nr_queued in wait/allow_all_barriers (not suggesting that this is the
> best way, but it seems safe at least):
> 

At first glance, I feel your fix works. But I worry that increasing and
decreasing nr_queued[idx] like this may introduce another race related
to the "get_unqueued_pending() == extra" check in freeze_array().

A solution I used when I wrote the barrier buckets was to add a new
wait_event_* routine, wait_event_lock_irq_cmd_timeout(), which wakes
freeze_array() up after a timeout to avoid a deadlock.
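
Something along these lines is what I mean, an illustrative sketch
modeled on the existing wait_event_interruptible_lock_irq_timeout() in
include/linux/wait.h; the name and exact form are only a proposal here,
not an existing kernel macro:

/* Illustrative sketch: a cmd + timeout variant of wait_event_lock_irq_cmd() */
#define __wait_event_lock_irq_cmd_timeout(wq_head, condition, lock, cmd, timeout) \
        ___wait_event(wq_head, ___wait_cond_timeout(condition),        \
                      TASK_UNINTERRUPTIBLE, 0, timeout,                \
                      spin_unlock_irq(&lock);                          \
                      cmd;                                             \
                      __ret = schedule_timeout(__ret);                 \
                      spin_lock_irq(&lock))

/*
 * Returns 0 if the timeout elapsed with the condition still false,
 * otherwise the remaining jiffies (at least 1).
 */
#define wait_event_lock_irq_cmd_timeout(wq_head, condition, lock, cmd, timeout) \
({                                                                      \
        long __ret = timeout;                                           \
        if (!___wait_cond_timeout(condition))                           \
                __ret = __wait_event_lock_irq_cmd_timeout(wq_head,      \
                                condition, lock, cmd, timeout);         \
        __ret;                                                          \
})

The cmd argument keeps the flush_pending_writes() call that
freeze_array() passes today, and the timeout bounds each sleep so
freeze_array() gets a chance to re-evaluate.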

The reasons why I didn't use it in the final version were:
- the routine name is too long
- some hidden deadlocks would never be triggered, because freeze_array()
would wake itself up after the timeout.

For now, it seems wait_event_lock_irq_cmd_timeout() may have to be used
after all. Would you like to compose a patch that adds the new
wait_event_lock_irq_cmd_timeout() and a loop-after-timeout in
freeze_array()? Or, if you are busy, I can handle this.
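
Roughly the shape I have in mind, purely as an untested illustration
(the timeout value is arbitrary, and what each timed-out iteration
should do beyond re-checking is exactly what still needs working out):

/* Untested illustration of the loop-after-timeout idea */
static void freeze_array(struct r1conf *conf, int extra)
{
        spin_lock_irq(&conf->resync_lock);
        conf->array_frozen = 1;
        /*
         * Instead of sleeping until the condition holds, wake up
         * periodically, re-run the cmd and re-check the condition.
         */
        while (!wait_event_lock_irq_cmd_timeout(conf->wait_barrier,
                                get_unqueued_pending(conf) == extra,
                                conf->resync_lock,
                                flush_pending_writes(conf),
                                HZ / 2))
                ;
        spin_unlock_irq(&conf->resync_lock);
}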

Thanks in advance.

Coly Li

> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index f3f3e40dc9d8..e34dfda1c629 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -994,8 +994,11 @@ static void wait_all_barriers(struct r1conf *conf)
>  {
>      int idx;
> 
> -    for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
> +    for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++) {
>          _wait_barrier(conf, idx);
> +        atomic_inc(&conf->nr_queued[idx]);
> +        wake_up(&conf->wait_barrier);
> +    }
>  }
> 
>  static void _allow_barrier(struct r1conf *conf, int idx)
> @@ -1015,8 +1018,10 @@ static void allow_all_barriers(struct r1conf *conf)
>  {
>      int idx;
> 
> -    for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
> +    for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++) {
> +        atomic_dec(&conf->nr_queued[idx]);
>          _allow_barrier(conf, idx);
> +    }
>  }
> 
>  /* conf->resync_lock should be held */

