* raid1: freeze_array/wait_all_barriers deadlock
@ 2017-10-13 18:32 Nate Dailey
  2017-10-14 21:45 ` Coly Li

From: Nate Dailey @ 2017-10-13 18:32 UTC
To: linux-raid

I hit the following deadlock:

PID: 1819   TASK: ffff9ca137dd42c0   CPU: 35   COMMAND: "md125_raid1"
 #0 [ffffaba8c988fc18] __schedule at ffffffff8df6a84d
 #1 [ffffaba8c988fca8] schedule at ffffffff8df6ae86
 #2 [ffffaba8c988fcc0] freeze_array at ffffffffc017d866 [raid1]
 #3 [ffffaba8c988fd20] handle_read_error at ffffffffc017fda1 [raid1]
 #4 [ffffaba8c988fdd0] raid1d at ffffffffc01807d0 [raid1]
 #5 [ffffaba8c988fea0] md_thread at ffffffff8ddc2e92
 #6 [ffffaba8c988ff08] kthread at ffffffff8d8af739
 #7 [ffffaba8c988ff50] ret_from_fork at ffffffff8df70485

PID: 7812   TASK: ffff9ca11f451640   CPU: 3   COMMAND: "md125_resync"
 #0 [ffffaba8cb5d3b38] __schedule at ffffffff8df6a84d
 #1 [ffffaba8cb5d3bc8] schedule at ffffffff8df6ae86
 #2 [ffffaba8cb5d3be0] _wait_barrier at ffffffffc017cc81 [raid1]
 #3 [ffffaba8cb5d3c40] raid1_sync_request at ffffffffc017db5e [raid1]
 #4 [ffffaba8cb5d3d10] md_do_sync at ffffffff8ddc9799
 #5 [ffffaba8cb5d3ea0] md_thread at ffffffff8ddc2e92
 #6 [ffffaba8cb5d3f08] kthread at ffffffff8d8af739
 #7 [ffffaba8cb5d3f50] ret_from_fork at ffffffff8df70485

The second stack is actually raid1_sync_request -> close_sync ->
wait_all_barriers.

The problem is that wait_all_barriers increments all nr_pending buckets, but
those increments have no corresponding nr_queued. If freeze_array is called in
the middle of wait_all_barriers, it hangs waiting for nr_pending and nr_queued
to line up. That never happens, because an in-progress _wait_barrier also gets
stuck due to the freeze.

This was originally hit organically, but I was able to make it easier to
reproduce by inserting a 10ms delay before each _wait_barrier call in
wait_all_barriers, and a 4 sec delay before handle_read_error's call to
freeze_array. Then I start 2 dd processes reading from a raid1, start up a
check, and pull a disk. Usually within 2 or 3 pulls I can hit the deadlock.

I came up with a change that seems to avoid this, by manipulating nr_queued in
wait/allow_all_barriers (not suggesting that this is the best way, but it
seems safe at least):

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index f3f3e40dc9d8..e34dfda1c629 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -994,8 +994,11 @@ static void wait_all_barriers(struct r1conf *conf)
 {
 	int idx;
 
-	for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
+	for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++) {
 		_wait_barrier(conf, idx);
+		atomic_inc(&conf->nr_queued[idx]);
+		wake_up(&conf->wait_barrier);
+	}
 }
 
 static void _allow_barrier(struct r1conf *conf, int idx)
@@ -1015,8 +1018,10 @@ static void allow_all_barriers(struct r1conf *conf)
 {
 	int idx;
 
-	for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
+	for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++) {
+		atomic_dec(&conf->nr_queued[idx]);
 		_allow_barrier(conf, idx);
+	}
 }
 
 /* conf->resync_lock should be held */
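For readers not familiar with the barrier-bucket code, a simplified sketch of
the accounting freeze_array waits on may help. It is reconstructed for
illustration from the 4.13/4.14-era drivers/md/raid1.c this thread refers to,
not copied verbatim from the kernel source.

/*
 * Simplified sketch: freeze_array() sleeps until every pending I/O in
 * every bucket is also queued (plus "extra" for the caller's own I/O).
 */
static int get_unqueued_pending(struct r1conf *conf)
{
	int idx, ret = 0;

	for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
		ret += atomic_read(&conf->nr_pending[idx]) -
		       atomic_read(&conf->nr_queued[idx]);
	return ret;
}

static void freeze_array(struct r1conf *conf, int extra)
{
	spin_lock_irq(&conf->resync_lock);
	conf->array_frozen = 1;
	/* Wait for pending and queued counts to line up across all buckets. */
	wait_event_lock_irq_cmd(conf->wait_barrier,
				get_unqueued_pending(conf) == extra,
				conf->resync_lock,
				flush_pending_writes(conf));
	spin_unlock_irq(&conf->resync_lock);
}

Each _wait_barrier() call made by wait_all_barriers() bumps nr_pending[idx]
with no matching nr_queued[idx]; once array_frozen is set, the next
_wait_barrier() in the loop sleeps, so the counts can never converge and the
two threads wait on each other forever.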
* Re: raid1: freeze_array/wait_all_barriers deadlock

From: Coly Li @ 2017-10-14 21:45 UTC
To: Nate Dailey; +Cc: linux-raid

On 2017/10/14 2:32 AM, Nate Dailey wrote:
> I hit the following deadlock:
>
> PID: 1819   TASK: ffff9ca137dd42c0   CPU: 35   COMMAND: "md125_raid1"
>  #0 [ffffaba8c988fc18] __schedule at ffffffff8df6a84d
>  #1 [ffffaba8c988fca8] schedule at ffffffff8df6ae86
>  #2 [ffffaba8c988fcc0] freeze_array at ffffffffc017d866 [raid1]
>  #3 [ffffaba8c988fd20] handle_read_error at ffffffffc017fda1 [raid1]
>  #4 [ffffaba8c988fdd0] raid1d at ffffffffc01807d0 [raid1]
>  #5 [ffffaba8c988fea0] md_thread at ffffffff8ddc2e92
>  #6 [ffffaba8c988ff08] kthread at ffffffff8d8af739
>  #7 [ffffaba8c988ff50] ret_from_fork at ffffffff8df70485
>
> PID: 7812   TASK: ffff9ca11f451640   CPU: 3   COMMAND: "md125_resync"
>  #0 [ffffaba8cb5d3b38] __schedule at ffffffff8df6a84d
>  #1 [ffffaba8cb5d3bc8] schedule at ffffffff8df6ae86
>  #2 [ffffaba8cb5d3be0] _wait_barrier at ffffffffc017cc81 [raid1]
>  #3 [ffffaba8cb5d3c40] raid1_sync_request at ffffffffc017db5e [raid1]
>  #4 [ffffaba8cb5d3d10] md_do_sync at ffffffff8ddc9799
>  #5 [ffffaba8cb5d3ea0] md_thread at ffffffff8ddc2e92
>  #6 [ffffaba8cb5d3f08] kthread at ffffffff8d8af739
>  #7 [ffffaba8cb5d3f50] ret_from_fork at ffffffff8df70485
>
> The second stack is actually raid1_sync_request -> close_sync ->
> wait_all_barriers.
>
> The problem is that wait_all_barriers increments all nr_pending buckets,
> but those increments have no corresponding nr_queued. If freeze_array is
> called in the middle of wait_all_barriers, it hangs waiting for nr_pending
> and nr_queued to line up. That never happens, because an in-progress
> _wait_barrier also gets stuck due to the freeze.
>
> This was originally hit organically, but I was able to make it easier to
> reproduce by inserting a 10ms delay before each _wait_barrier call in
> wait_all_barriers, and a 4 sec delay before handle_read_error's call to
> freeze_array. Then I start 2 dd processes reading from a raid1, start up a
> check, and pull a disk. Usually within 2 or 3 pulls I can hit the deadlock.

Hi Nate,

Nice catch! Thanks for the debugging; I agree with your analysis of the
deadlock, neat :-)

>
> I came up with a change that seems to avoid this, by manipulating nr_queued
> in wait/allow_all_barriers (not suggesting that this is the best way, but
> it seems safe at least):
>

At first glance, I feel your fix works. But I worry that manipulating
nr_queued[idx] this way may introduce other races around the
"get_unqueued_pending() == extra" condition in freeze_array().

A solution I used when I wrote the barrier buckets was to add a new
wait_event_* routine called wait_event_lock_irq_cmd_timeout(), which wakes up
freeze_array() after a timeout, to avoid a deadlock.

The reasons why I didn't use it in the final version were:
- the routine name is too long
- some hidden deadlocks would never be triggered (they would simply be
  masked), because freeze_array() would wake itself up after the timeout.

For now, it seems wait_event_lock_irq_cmd_timeout() may have to be used
again. Would you like to compose a patch with a new
wait_event_lock_irq_cmd_timeout() and a loop-after-timeout in freeze_array()?
Or, if you are busy, I can handle this.

Thanks in advance.
Coly Li

> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index f3f3e40dc9d8..e34dfda1c629 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -994,8 +994,11 @@ static void wait_all_barriers(struct r1conf *conf)
>  {
>  	int idx;
>  
> -	for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
> +	for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++) {
>  		_wait_barrier(conf, idx);
> +		atomic_inc(&conf->nr_queued[idx]);
> +		wake_up(&conf->wait_barrier);
> +	}
>  }
>  
>  static void _allow_barrier(struct r1conf *conf, int idx)
> @@ -1015,8 +1018,10 @@ static void allow_all_barriers(struct r1conf *conf)
>  {
>  	int idx;
>  
> -	for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
> +	for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++) {
> +		atomic_dec(&conf->nr_queued[idx]);
>  		_allow_barrier(conf, idx);
> +	}
>  }
>  
>  /* conf->resync_lock should be held */
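The wait_event_lock_irq_cmd_timeout() routine mentioned above never went into
mainline, so there is no canonical definition to quote. As a rough
illustration only, a hypothetical version modeled on the existing
wait_event_lock_irq_cmd() and __wait_event_lock_irq_timeout() helpers in
include/linux/wait.h might look like this:

/* Hypothetical sketch -- not a mainline macro.  Waits for "condition"
 * while holding "lock", runs "cmd" before each sleep, and returns 0 if
 * "timeout" jiffies elapse first, so the caller can recheck in a loop
 * instead of sleeping forever.
 */
#define wait_event_lock_irq_cmd_timeout(wq_head, condition, lock, cmd, timeout)\
({										\
	long __ret = timeout;							\
	if (!___wait_cond_timeout(condition))					\
		__ret = ___wait_event(wq_head,					\
				      ___wait_cond_timeout(condition),		\
				      TASK_UNINTERRUPTIBLE, 0, timeout,		\
				      spin_unlock_irq(&lock);			\
				      cmd;					\
				      __ret = schedule_timeout(__ret);		\
				      spin_lock_irq(&lock));			\
	__ret;									\
})

Whether freeze_array() could safely re-check and retry around such a timeout
is exactly what the rest of the thread discusses.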
* Re: raid1: freeze_array/wait_all_barriers deadlock

From: Nate Dailey @ 2017-10-16 12:58 UTC
To: Coly Li; +Cc: linux-raid

Hi Coly, I'm not sure I understand the change you're proposing. Would it be
something like the following?

	spin_lock_irq(&conf->resync_lock);
	conf->array_frozen = 1;
	raid1_log(conf->mddev, "wait freeze");
	while (get_unqueued_pending(conf) != extra) {
		wait_event_lock_irq_cmd_timeout(
			conf->wait_barrier,
			get_unqueued_pending(conf) == extra,
			conf->resync_lock,
			flush_pending_writes(conf),
			timeout);
	}
	spin_unlock_irq(&conf->resync_lock);

On its own, I don't see how this would make any difference. Until
array_frozen == 0, wait_all_barriers will continue to be blocked, which in
turn will prevent the condition freeze_array is waiting on from ever becoming
true.

Or should something else be done inside the new freeze_array loop that would
allow wait_all_barriers to make progress?

Thanks,
Nate

On 10/14/2017 05:45 PM, Coly Li wrote:
> On 2017/10/14 2:32 AM, Nate Dailey wrote:
> > I hit the following deadlock:
> >
> > PID: 1819   TASK: ffff9ca137dd42c0   CPU: 35   COMMAND: "md125_raid1"
> >  #0 [ffffaba8c988fc18] __schedule at ffffffff8df6a84d
> >  #1 [ffffaba8c988fca8] schedule at ffffffff8df6ae86
> >  #2 [ffffaba8c988fcc0] freeze_array at ffffffffc017d866 [raid1]
> >  #3 [ffffaba8c988fd20] handle_read_error at ffffffffc017fda1 [raid1]
> >  #4 [ffffaba8c988fdd0] raid1d at ffffffffc01807d0 [raid1]
> >  #5 [ffffaba8c988fea0] md_thread at ffffffff8ddc2e92
> >  #6 [ffffaba8c988ff08] kthread at ffffffff8d8af739
> >  #7 [ffffaba8c988ff50] ret_from_fork at ffffffff8df70485
> >
> > PID: 7812   TASK: ffff9ca11f451640   CPU: 3   COMMAND: "md125_resync"
> >  #0 [ffffaba8cb5d3b38] __schedule at ffffffff8df6a84d
> >  #1 [ffffaba8cb5d3bc8] schedule at ffffffff8df6ae86
> >  #2 [ffffaba8cb5d3be0] _wait_barrier at ffffffffc017cc81 [raid1]
> >  #3 [ffffaba8cb5d3c40] raid1_sync_request at ffffffffc017db5e [raid1]
> >  #4 [ffffaba8cb5d3d10] md_do_sync at ffffffff8ddc9799
> >  #5 [ffffaba8cb5d3ea0] md_thread at ffffffff8ddc2e92
> >  #6 [ffffaba8cb5d3f08] kthread at ffffffff8d8af739
> >  #7 [ffffaba8cb5d3f50] ret_from_fork at ffffffff8df70485
> >
> > The second stack is actually raid1_sync_request -> close_sync ->
> > wait_all_barriers.
> >
> > The problem is that wait_all_barriers increments all nr_pending buckets,
> > but those increments have no corresponding nr_queued. If freeze_array is
> > called in the middle of wait_all_barriers, it hangs waiting for
> > nr_pending and nr_queued to line up. That never happens, because an
> > in-progress _wait_barrier also gets stuck due to the freeze.
> >
> > This was originally hit organically, but I was able to make it easier to
> > reproduce by inserting a 10ms delay before each _wait_barrier call in
> > wait_all_barriers, and a 4 sec delay before handle_read_error's call to
> > freeze_array. Then I start 2 dd processes reading from a raid1, start up
> > a check, and pull a disk. Usually within 2 or 3 pulls I can hit the
> > deadlock.
>
> Hi Nate,
>
> Nice catch! Thanks for the debugging; I agree with your analysis of the
> deadlock, neat :-)
>
> >
> > I came up with a change that seems to avoid this, by manipulating
> > nr_queued in wait/allow_all_barriers (not suggesting that this is the
> > best way, but it seems safe at least):
> >
>
> At first glance, I feel your fix works.
> But I worry that manipulating nr_queued[idx] this way may introduce other
> races around the "get_unqueued_pending() == extra" condition in
> freeze_array().
>
> A solution I used when I wrote the barrier buckets was to add a new
> wait_event_* routine called wait_event_lock_irq_cmd_timeout(), which wakes
> up freeze_array() after a timeout, to avoid a deadlock.
>
> The reasons why I didn't use it in the final version were:
> - the routine name is too long
> - some hidden deadlocks would never be triggered (they would simply be
>   masked), because freeze_array() would wake itself up after the timeout.
>
> For now, it seems wait_event_lock_irq_cmd_timeout() may have to be used
> again. Would you like to compose a patch with a new
> wait_event_lock_irq_cmd_timeout() and a loop-after-timeout in
> freeze_array()? Or, if you are busy, I can handle this.
>
> Thanks in advance.
>
> Coly Li
>
> > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > index f3f3e40dc9d8..e34dfda1c629 100644
> > --- a/drivers/md/raid1.c
> > +++ b/drivers/md/raid1.c
> > @@ -994,8 +994,11 @@ static void wait_all_barriers(struct r1conf *conf)
> >  {
> >  	int idx;
> >  
> > -	for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
> > +	for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++) {
> >  		_wait_barrier(conf, idx);
> > +		atomic_inc(&conf->nr_queued[idx]);
> > +		wake_up(&conf->wait_barrier);
> > +	}
> >  }
> >  
> >  static void _allow_barrier(struct r1conf *conf, int idx)
> > @@ -1015,8 +1018,10 @@ static void allow_all_barriers(struct r1conf *conf)
> >  {
> >  	int idx;
> >  
> > -	for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
> > +	for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++) {
> > +		atomic_dec(&conf->nr_queued[idx]);
> >  		_allow_barrier(conf, idx);
> > +	}
> >  }
> >  
> >  /* conf->resync_lock should be held */
* Re: raid1: freeze_array/wait_all_barriers deadlock

From: Coly Li @ 2017-10-16 14:43 UTC
To: Nate Dailey; +Cc: linux-raid

On 2017/10/16 8:58 PM, Nate Dailey wrote:
> Hi Coly, I'm not sure I understand the change you're proposing. Would it be
> something like the following?
>
> 	spin_lock_irq(&conf->resync_lock);
> 	conf->array_frozen = 1;
> 	raid1_log(conf->mddev, "wait freeze");
> 	while (get_unqueued_pending(conf) != extra) {
> 		wait_event_lock_irq_cmd_timeout(
> 			conf->wait_barrier,
> 			get_unqueued_pending(conf) == extra,
> 			conf->resync_lock,
> 			flush_pending_writes(conf),
> 			timeout);
> 	}
> 	spin_unlock_irq(&conf->resync_lock);
>
> On its own, I don't see how this would make any difference. Until
> array_frozen == 0, wait_all_barriers will continue to be blocked, which in
> turn will prevent the condition freeze_array is waiting on from ever
> becoming true.

Hi Nate,

You are right, this idea does not help much; we need to find another way.

> Or should something else be done inside the new freeze_array loop that
> would allow wait_all_barriers to make progress?

It seems wait_all_barriers() is only used in close_sync(), which is there to
make sure all sync requests hit the platters before raid1_sync_request()
returns.

How about setting up a critical section in close_sync(), protected by another
lock? It might be something like this,

static void close_sync(struct r1conf *conf)
{
+	mutex_lock close_sync_lock;
	wait_all_barriers(conf);
+	mutex_unlock close_sync_lock;
	allow_all_barriers(conf);
	[snip]
}

static void freeze_array(struct r1conf *conf, int extra)
{
+	mutex_lock close_sync_lock
	spin_lock_irq(&conf->resync_lock);
	[snip]
	spin_unlock_irq(&conf->resync_lock);
+	mutex_unlock close_sync_lock
}

Then conf->array_frozen won't be set while wait_all_barriers() has partially
iterated over the barrier buckets, and the deadlock can be avoided.

What do you think of this one?

Thanks.

Coly Li

[snip]
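Rendered as real code, the critical section sketched above would look roughly
like the following. This is illustrative only: close_sync_lock is a
hypothetical new mutex that would have to be added to struct r1conf and
initialized at setup time; it is not an existing field.

/* Illustrative sketch of the proposed close_sync_lock, not a tested patch.
 * The mutex keeps freeze_array() from setting array_frozen while
 * wait_all_barriers() is partway through the buckets.
 */
static void close_sync(struct r1conf *conf)
{
	mutex_lock(&conf->close_sync_lock);	/* hypothetical new member */
	wait_all_barriers(conf);
	mutex_unlock(&conf->close_sync_lock);
	allow_all_barriers(conf);
	/* ... rest of close_sync() unchanged ... */
}

static void freeze_array(struct r1conf *conf, int extra)
{
	mutex_lock(&conf->close_sync_lock);
	spin_lock_irq(&conf->resync_lock);
	/* ... existing wait for get_unqueued_pending(conf) == extra ... */
	spin_unlock_irq(&conf->resync_lock);
	mutex_unlock(&conf->close_sync_lock);
}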
* Re: raid1: freeze_array/wait_all_barriers deadlock

From: Nate Dailey @ 2017-10-16 15:34 UTC
To: Coly Li; +Cc: linux-raid

Hi Coly, I'm not 100% sure that's going to work.

What I'm imagining is that close_sync gets the lock, so now
raid1d/handle_read_error/freeze_array is blocked waiting for it.

Is it possible that raid1d might need to complete some sync IO from the
retry_list that close_sync/wait_all_barriers is waiting for? In that case,
there would still be a deadlock.

Actually, what if close_sync just did this:

static void close_sync(struct r1conf *conf)
{
	int idx;

	for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++) {
		_wait_barrier(conf, idx);
		_allow_barrier(conf, idx);
	}
	...

Is there any reason all the _wait_barrier calls have to complete before
starting on _allow_barrier? If not, then I think this change would work (but
I will think about it more).

Nate

On 10/16/2017 10:43 AM, Coly Li wrote:
> On 2017/10/16 8:58 PM, Nate Dailey wrote:
>> Hi Coly, I'm not sure I understand the change you're proposing. Would it
>> be something like the following?
>>
>> 	spin_lock_irq(&conf->resync_lock);
>> 	conf->array_frozen = 1;
>> 	raid1_log(conf->mddev, "wait freeze");
>> 	while (get_unqueued_pending(conf) != extra) {
>> 		wait_event_lock_irq_cmd_timeout(
>> 			conf->wait_barrier,
>> 			get_unqueued_pending(conf) == extra,
>> 			conf->resync_lock,
>> 			flush_pending_writes(conf),
>> 			timeout);
>> 	}
>> 	spin_unlock_irq(&conf->resync_lock);
>>
>> On its own, I don't see how this would make any difference. Until
>> array_frozen == 0, wait_all_barriers will continue to be blocked, which in
>> turn will prevent the condition freeze_array is waiting on from ever
>> becoming true.
> Hi Nate,
>
> You are right, this idea does not help much; we need to find another way.
>
>> Or should something else be done inside the new freeze_array loop that
>> would allow wait_all_barriers to make progress?
> It seems wait_all_barriers() is only used in close_sync(), which is there
> to make sure all sync requests hit the platters before raid1_sync_request()
> returns.
>
> How about setting up a critical section in close_sync(), protected by
> another lock? It might be something like this,
>
> static void close_sync(struct r1conf *conf)
> {
> +	mutex_lock close_sync_lock;
> 	wait_all_barriers(conf);
> +	mutex_unlock close_sync_lock;
> 	allow_all_barriers(conf);
> 	[snip]
> }
>
>
> static void freeze_array(struct r1conf *conf, int extra)
> {
> +	mutex_lock close_sync_lock
> 	spin_lock_irq(&conf->resync_lock);
> 	[snip]
> 	spin_unlock_irq(&conf->resync_lock);
> +	mutex_unlock close_sync_lock
> }
>
> Then conf->array_frozen won't be set while wait_all_barriers() has partially
> iterated over the barrier buckets, and the deadlock can be avoided.
>
> What do you think of this one?
>
> Thanks.
>
> Coly Li
>
>
> [snip]
>
* Re: raid1: freeze_array/wait_all_barriers deadlock

From: Coly Li @ 2017-10-16 16:49 UTC
To: Nate Dailey; +Cc: linux-raid

On 2017/10/16 11:34 PM, Nate Dailey wrote:
> Hi Coly, I'm not 100% sure that's going to work.
>
> What I'm imagining is that close_sync gets the lock, so now
> raid1d/handle_read_error/freeze_array is blocked waiting for it.
>

Hi Nate,

Let me think about this code path. If freeze_array() is blocked by the mutex
in handle_read_error(), it means conf->array_frozen is still 0.
_wait_barrier() only waits if 1) barrier[idx] is > 0, or 2) array_frozen is 1.
So this situation won't block _wait_barrier() or wait_read_barrier() on any
bucket.

Then the loop that waits on the barrier for every bucket won't be blocked if
freeze_array() is waiting on the regular I/O path. But if freeze_array()
happened on the resync I/O code path, a deadlock would be possible.
Fortunately it seems freeze_array() is not called on the resync I/O code
path, so the mutex seems to be safe so far.

But adding one more lock always makes people nervous, and makes the code more
complicated... I don't like this :(

> Is it possible that raid1d might need to complete some sync IO from the
> retry_list that close_sync/wait_all_barriers is waiting for? In that case,
> there would still be a deadlock.
>
> Actually, what if close_sync just did this:
>
> static void close_sync(struct r1conf *conf)
> {
> 	int idx;
>
> 	for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++) {
> 		_wait_barrier(conf, idx);
> 		_allow_barrier(conf, idx);

And you may remove wait_all_barriers() and allow_all_barriers() as well; we
get less code, yeah!

> 	}
> 	...
>
> Is there any reason all the _wait_barrier calls have to complete before
> starting on _allow_barrier? If not, then I think this change would work
> (but I will think about it more).
>

No reason that _allow_barrier() should wait for all the _wait_barrier() calls
to complete here. wait_all_barriers() in close_sync() is there to make sure
all in-flight resync I/O has finished. When it is called, no new resync I/O
will be generated; it only has to wait for the in-flight resync I/O to
complete. So your modification works; I feel it is safe.

And the above code is much simpler, I like it :-)

Thanks.

Coly Li

> On 10/16/2017 10:43 AM, Coly Li wrote:
>> On 2017/10/16 8:58 PM, Nate Dailey wrote:
>>> Hi Coly, I'm not sure I understand the change you're proposing. Would it
>>> be something like the following?
>>>
>>> 	spin_lock_irq(&conf->resync_lock);
>>> 	conf->array_frozen = 1;
>>> 	raid1_log(conf->mddev, "wait freeze");
>>> 	while (get_unqueued_pending(conf) != extra) {
>>> 		wait_event_lock_irq_cmd_timeout(
>>> 			conf->wait_barrier,
>>> 			get_unqueued_pending(conf) == extra,
>>> 			conf->resync_lock,
>>> 			flush_pending_writes(conf),
>>> 			timeout);
>>> 	}
>>> 	spin_unlock_irq(&conf->resync_lock);
>>>
>>> On its own, I don't see how this would make any difference. Until
>>> array_frozen == 0, wait_all_barriers will continue to be blocked, which
>>> in turn will prevent the condition freeze_array is waiting on from ever
>>> becoming true.
>> Hi Nate,
>>
>> You are right, this idea does not help much; we need to find another way.
>>
>>> Or should something else be done inside the new freeze_array loop that
>>> would allow wait_all_barriers to make progress?
>> It seems wait_all_barriers() is only used in close_sync(), which is there
>> to make sure all sync requests hit the platters before
>> raid1_sync_request() returns.
>>
>> How about setting up a critical section in close_sync(), protected by
>> another lock? It might be something like this,
>>
>> static void close_sync(struct r1conf *conf)
>> {
>> +	mutex_lock close_sync_lock;
>> 	wait_all_barriers(conf);
>> +	mutex_unlock close_sync_lock;
>> 	allow_all_barriers(conf);
>> 	[snip]
>> }
>>
>>
>> static void freeze_array(struct r1conf *conf, int extra)
>> {
>> +	mutex_lock close_sync_lock
>> 	spin_lock_irq(&conf->resync_lock);
>> 	[snip]
>> 	spin_unlock_irq(&conf->resync_lock);
>> +	mutex_unlock close_sync_lock
>> }
>>
>> Then conf->array_frozen won't be set while wait_all_barriers() has
>> partially iterated over the barrier buckets, and the deadlock can be
>> avoided.
>>
>> What do you think of this one?
>>
>> Thanks.
>>
>> Coly Li
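With wait_all_barriers() and allow_all_barriers() removed as suggested above,
the resulting close_sync() would look roughly like this. The per-bucket loop
is taken from the discussion; the tail of the function (releasing the resync
buffer pool) is shown as it existed in that era's raid1.c and may differ in
detail from any final patch.

static void close_sync(struct r1conf *conf)
{
	int idx;

	/* Wait for the in-flight resync I/O in each bucket, then release the
	 * bucket again right away, so nr_pending[idx] never stays elevated
	 * across a concurrent freeze_array() call.
	 */
	for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++) {
		_wait_barrier(conf, idx);
		_allow_barrier(conf, idx);
	}

	mempool_destroy(conf->r1buf_pool);
	conf->r1buf_pool = NULL;
}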
* Re: raid1: freeze_array/wait_all_barriers deadlock

From: Nate Dailey @ 2017-10-16 17:30 UTC
To: Coly Li; +Cc: linux-raid

Great, thanks! I will test this and submit a patch if it works correctly.

Nate

On 10/16/2017 12:49 PM, Coly Li wrote:
> On 2017/10/16 11:34 PM, Nate Dailey wrote:
>> Hi Coly, I'm not 100% sure that's going to work.
>>
>> What I'm imagining is that close_sync gets the lock, so now
>> raid1d/handle_read_error/freeze_array is blocked waiting for it.
>>
> Hi Nate,
>
> Let me think about this code path. If freeze_array() is blocked by the
> mutex in handle_read_error(), it means conf->array_frozen is still 0.
> _wait_barrier() only waits if 1) barrier[idx] is > 0, or 2) array_frozen
> is 1. So this situation won't block _wait_barrier() or wait_read_barrier()
> on any bucket.
>
> Then the loop that waits on the barrier for every bucket won't be blocked
> if freeze_array() is waiting on the regular I/O path. But if freeze_array()
> happened on the resync I/O code path, a deadlock would be possible.
> Fortunately it seems freeze_array() is not called on the resync I/O code
> path, so the mutex seems to be safe so far.
>
> But adding one more lock always makes people nervous, and makes the code
> more complicated... I don't like this :(
>
>> Is it possible that raid1d might need to complete some sync IO from the
>> retry_list that close_sync/wait_all_barriers is waiting for? In that
>> case, there would still be a deadlock.
>>
>> Actually, what if close_sync just did this:
>>
>> static void close_sync(struct r1conf *conf)
>> {
>> 	int idx;
>>
>> 	for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++) {
>> 		_wait_barrier(conf, idx);
>> 		_allow_barrier(conf, idx);
> And you may remove wait_all_barriers() and allow_all_barriers() as well;
> we get less code, yeah!
>
>> 	}
>> 	...
>>
>> Is there any reason all the _wait_barrier calls have to complete before
>> starting on _allow_barrier? If not, then I think this change would work
>> (but I will think about it more).
>>
> No reason that _allow_barrier() should wait for all the _wait_barrier()
> calls to complete here. wait_all_barriers() in close_sync() is there to
> make sure all in-flight resync I/O has finished. When it is called, no new
> resync I/O will be generated; it only has to wait for the in-flight resync
> I/O to complete. So your modification works; I feel it is safe.
>
> And the above code is much simpler, I like it :-)
>
> Thanks.
>
> Coly Li
>
>
>> On 10/16/2017 10:43 AM, Coly Li wrote:
>>> On 2017/10/16 8:58 PM, Nate Dailey wrote:
>>>> Hi Coly, I'm not sure I understand the change you're proposing. Would
>>>> it be something like the following?
>>>>
>>>> 	spin_lock_irq(&conf->resync_lock);
>>>> 	conf->array_frozen = 1;
>>>> 	raid1_log(conf->mddev, "wait freeze");
>>>> 	while (get_unqueued_pending(conf) != extra) {
>>>> 		wait_event_lock_irq_cmd_timeout(
>>>> 			conf->wait_barrier,
>>>> 			get_unqueued_pending(conf) == extra,
>>>> 			conf->resync_lock,
>>>> 			flush_pending_writes(conf),
>>>> 			timeout);
>>>> 	}
>>>> 	spin_unlock_irq(&conf->resync_lock);
>>>>
>>>> On its own, I don't see how this would make any difference. Until
>>>> array_frozen == 0, wait_all_barriers will continue to be blocked, which
>>>> in turn will prevent the condition freeze_array is waiting on from ever
>>>> becoming true.
>>> Hi Nate,
>>>
>>> You are right, this idea does not help much; we need to find another
>>> way.
>>>
>>>> Or should something else be done inside the new freeze_array loop that
>>>> would allow wait_all_barriers to make progress?
>>> It seems wait_all_barriers() is only used in close_sync(), which is
>>> there to make sure all sync requests hit the platters before
>>> raid1_sync_request() returns.
>>>
>>> How about setting up a critical section in close_sync(), protected by
>>> another lock? It might be something like this,
>>>
>>> static void close_sync(struct r1conf *conf)
>>> {
>>> +	mutex_lock close_sync_lock;
>>> 	wait_all_barriers(conf);
>>> +	mutex_unlock close_sync_lock;
>>> 	allow_all_barriers(conf);
>>> 	[snip]
>>> }
>>>
>>>
>>> static void freeze_array(struct r1conf *conf, int extra)
>>> {
>>> +	mutex_lock close_sync_lock
>>> 	spin_lock_irq(&conf->resync_lock);
>>> 	[snip]
>>> 	spin_unlock_irq(&conf->resync_lock);
>>> +	mutex_unlock close_sync_lock
>>> }
>>>
>>> Then conf->array_frozen won't be set while wait_all_barriers() has
>>> partially iterated over the barrier buckets, and the deadlock can be
>>> avoided.
>>>
>>> What do you think of this one?
>>>
>>> Thanks.
>>>
>>> Coly Li
End of thread (newest message: 2017-10-16 17:30 UTC).

Thread overview: 7+ messages
2017-10-13 18:32 raid1: freeze_array/wait_all_barriers deadlock  Nate Dailey
2017-10-14 21:45 ` Coly Li
2017-10-16 12:58   ` Nate Dailey
2017-10-16 14:43     ` Coly Li
2017-10-16 15:34       ` Nate Dailey
2017-10-16 16:49         ` Coly Li
2017-10-16 17:30           ` Nate Dailey