* Races in sbitmap batched wakeups
@ 2022-06-16 17:21 Jan Kara
  2022-06-17  1:07 ` Ming Lei
  2022-06-17  1:40 ` Yu Kuai
  0 siblings, 2 replies; 10+ messages in thread
From: Jan Kara @ 2022-06-16 17:21 UTC (permalink / raw)
  To: Jens Axboe, Omar Sandoval; +Cc: linux-block

Hello!

I've been debugging some customer reports of tasks hanging (forever)
waiting for free tags when in fact all tags are free. After looking into it
for some time I think I know what is happening. First, keep in mind that
it concerns a device which uses shared tags. There are 127 tags available
and the number of active queues using these tags is easily 40 or more. So
the number of tags available for each device is rather small. Now I'm not
sure how batched wakeups can ever work in such situations, but maybe I'm
missing something.

So take for example a situation where two tags are available for a device
and both are currently in use. Now a process comes into blk_mq_get_tag(),
wants to allocate a tag, and goes to sleep. How can it ever be woken up if
wake_batch is 4? If the two IOs complete, sbitmap will get two wakeups but
that's not enough to trigger the batched wakeup to really wake up the
waiter...
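
To illustrate, the wakeup side roughly does the following -- a simplified
sketch of __sbq_wake_up() in lib/sbitmap.c from memory, not the exact
source:

static bool __sbq_wake_up(struct sbitmap_queue *sbq)
{
        struct sbq_wait_state *ws;
        int wait_cnt;

        /* Pick the next wait queue that has waiters queued. */
        ws = sbq_wake_ptr(sbq);
        if (!ws)
                return false;

        /* Each freed tag only decrements the per-queue counter... */
        wait_cnt = atomic_dec_return(&ws->wait_cnt);
        if (wait_cnt > 0)
                return false;

        /*
         * ...and only the call that reaches 0 refills the counter and
         * issues the batch of wakeups. Note that the waitqueue_active()
         * check inside sbq_wake_ptr() and this decrement are not atomic
         * wrt. concurrent callers.
         */
        atomic_set(&ws->wait_cnt, READ_ONCE(sbq->wake_batch));
        sbq_index_atomic_inc(&sbq->wake_index);
        wake_up_nr(&ws->wait, READ_ONCE(sbq->wake_batch));
        return false;
}

So with wake_batch equal to 4 and only two completions ever arriving,
wait_cnt never reaches 0 and wake_up_nr() is never called.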

Even if we have, say, 4 tags available, so in theory there should be enough
wakeups to fill the batch, there can be the following problem. So 4 tags
are in use, two processes come to blk_mq_get_tag() and sleep, one on wait
queue 0, one on wait queue 1. Now four IOs complete so
sbitmap_queue_wake_up() gets called 4 times and the fourth call decrements
wait_cnt to 0 so it ends up calling wake_up_nr(wq0, 4). Fine, one of the
waiters is woken up but the other one is still sleeping in wq1 and there
are not enough wakeups to fill the batch and wake it up. This is
essentially because three wakeups were lost on wq0, which didn't have
enough waiters to wake...

Finally, sbitmap_queue_wake_up() is racy: if two calls race with each
other, they can end up decrementing the wait_cnt of a wq which does not
have any process queued, which again effectively leads to lost wakeups and
possibly indefinitely sleeping tasks.

Now this all looks so broken that I'm unsure whether I'm missing something
fundamental. Also I'm not quite sure how to fix all this without basically
destroying the batched wakeup feature (but I haven't put much thought into
this yet). Comments?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: Races in sbitmap batched wakeups
  2022-06-16 17:21 Races in sbitmap batched wakeups Jan Kara
@ 2022-06-17  1:07 ` Ming Lei
  2022-06-17 10:50   ` Jan Kara
  2022-06-17  1:40 ` Yu Kuai
  1 sibling, 1 reply; 10+ messages in thread
From: Ming Lei @ 2022-06-17  1:07 UTC (permalink / raw)
  To: Jan Kara; +Cc: Jens Axboe, Omar Sandoval, linux-block

On Thu, Jun 16, 2022 at 07:21:02PM +0200, Jan Kara wrote:
> Hello!
> 
> I've been debugging some customer reports of tasks hanging (forever)
> waiting for free tags when in fact all tags are free. After looking into it
> for some time I think I know what it happening. First, keep in mind that
> it concerns a device which uses shared tags. There are 127 tags available
> and the number of active queues using these tags is easily 40 or more. So
> number of tags available for each device is rather small. Now I'm not sure
> how batched wakeups can ever work in such situations, but maybe I'm missing
> something.
> 
> So take for example a situation where two tags are available for a device,
> they are both currently used. Now a process comes into blk_mq_get_tag() and
> wants to allocate tag and goes to sleep. Now how can it ever be woken up if
> wake_batch is 4? If the two IOs complete, sbitmap will get two wakeups but
> that's not enough to trigger the batched wakeup to really wakeup the
> waiter...

commit 180dccb0dba4 ("blk-mq: fix tag_get wait task can't be awakened")
is supposed to address this kind of issue.
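
For reference, the recalculation that commit introduced works roughly like
this (a sketch from memory, details may differ from the in-tree code):

void sbitmap_queue_recalculate_wake_batch(struct sbitmap_queue *sbq,
                                          unsigned int users)
{
        unsigned int wake_batch;
        unsigned int min_batch;
        unsigned int depth = (sbq->sb.depth + users - 1) / users;

        /* Don't drop below a batch of 4 unless the whole map is tiny. */
        min_batch = sbq->sb.depth >= (4 * SBQ_WAIT_QUEUES) ? 4 : 1;

        /* Scale the batch with the per-queue share of the tags. */
        wake_batch = clamp_val(depth / SBQ_WAIT_QUEUES,
                               min_batch, SBQ_WAKE_BATCH);

        WRITE_ONCE(sbq->wake_batch, wake_batch);
}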

> 
> Even if we have say 4 tags available so in theory there should be enough
> wakeups to fill the batch, there can be the following problem. So 4 tags
> are in use, two processes come to blk_mq_get_tag() and sleep, one on wait
> queue 0, one on wait queue 1. Now four IOs complete so
> sbitmap_queue_wake_up() gets called 4 times and the fourth call decrements
> wait_cnt to 0 so it ends up calling wake_up_nr(wq0, 4). Fine, one of the
> waiters is woken up but the other one is still sleeping in wq1 and there
> are not enough wakeups to fill the batch and wake it up? This is
> essentially because we have lost three wakeups on wq0 because it didn't
> have enough waiters to wake...

But the following completions will wake up the waiter in wq1, given
there are more in-flight.

Thanks,
Ming



* Re: Races in sbitmap batched wakeups
  2022-06-16 17:21 Races in sbitmap batched wakeups Jan Kara
  2022-06-17  1:07 ` Ming Lei
@ 2022-06-17  1:40 ` Yu Kuai
  2022-06-17 11:31   ` Jan Kara
  1 sibling, 1 reply; 10+ messages in thread
From: Yu Kuai @ 2022-06-17  1:40 UTC (permalink / raw)
  To: Jan Kara, Jens Axboe, Omar Sandoval; +Cc: linux-block

On 2022/06/17 1:21, Jan Kara wrote:
> Hello!
> 
> I've been debugging some customer reports of tasks hanging (forever)
> waiting for free tags when in fact all tags are free. After looking into it
> for some time I think I know what it happening. First, keep in mind that
> it concerns a device which uses shared tags. There are 127 tags available
> and the number of active queues using these tags is easily 40 or more. So
> number of tags available for each device is rather small. Now I'm not sure
> how batched wakeups can ever work in such situations, but maybe I'm missing
> something.
> 
> So take for example a situation where two tags are available for a device,
> they are both currently used. Now a process comes into blk_mq_get_tag() and
> wants to allocate tag and goes to sleep. Now how can it ever be woken up if
> wake_batch is 4? If the two IOs complete, sbitmap will get two wakeups but
> that's not enough to trigger the batched wakeup to really wakeup the
> waiter...
> 
> Even if we have say 4 tags available so in theory there should be enough
> wakeups to fill the batch, there can be the following problem. So 4 tags
> are in use, two processes come to blk_mq_get_tag() and sleep, one on wait
> queue 0, one on wait queue 1. Now four IOs complete so
> sbitmap_queue_wake_up() gets called 4 times and the fourth call decrements
> wait_cnt to 0 so it ends up calling wake_up_nr(wq0, 4). Fine, one of the
> waiters is woken up but the other one is still sleeping in wq1 and there
> are not enough wakeups to fill the batch and wake it up? This is
> essentially because we have lost three wakeups on wq0 because it didn't
> have enough waiters to wake...
Hi,

From what I see, if tags are shared between multiple devices, wake_batch
should make sure that all waiters will be woken up:

For example:
there are 64 tags in total shared between two devices, then wake_batch is
4 (if both devices are active). If there are waiters, that means at least
32 tags are grabbed, thus the 8 wait queues are guaranteed to wake up at
least once after 32 tags are freed.
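
Spelling out that arithmetic (assuming wake_batch is computed from the
per-device share of tags divided by SBQ_WAIT_QUEUES; treat the exact
formula as an assumption):

        per-device depth = 64 / 2 = 32 tags
        wake_batch       = 32 / SBQ_WAIT_QUEUES = 32 / 8 = 4
        worst case       = 8 wait queues * 4 = 32 wakeups needed

and a waiter only appears once the device's 32 tags are all in use, so the
32 completions that must eventually arrive are enough to refill the batch
of every wait queue at least once.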

> 
> Finally, sbitmap_queue_wake_up() is racy and if two of them race together,
> they can end up decrementing wait_cnt of wq which does not have any process
> queued which again effectively leads to lost wakeups and possibly
> indefinitely sleeping tasks.
> 

BTW, I do think this implementation has some problems in concurrent
scenarios, as described in the following patch:

https://lore.kernel.org/lkml/20220415101053.554495-4-yukuai3@huawei.com/

> Now this all looks so broken that I'm unsure whether I'm not missing
> something fundamental. Also I'm not quite sure how to fix all this without
> basically destroying the batched wakeup feature (but I didn't put too much
> thought to this yet). Comments?
> 
> 								Honza
> 


* Re: Races in sbitmap batched wakeups
  2022-06-17  1:07 ` Ming Lei
@ 2022-06-17 10:50   ` Jan Kara
  0 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2022-06-17 10:50 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jan Kara, Jens Axboe, Omar Sandoval, linux-block

On Fri 17-06-22 09:07:18, Ming Lei wrote:
> On Thu, Jun 16, 2022 at 07:21:02PM +0200, Jan Kara wrote:
> > Hello!
> > 
> > I've been debugging some customer reports of tasks hanging (forever)
> > waiting for free tags when in fact all tags are free. After looking into it
> > for some time I think I know what it happening. First, keep in mind that
> > it concerns a device which uses shared tags. There are 127 tags available
> > and the number of active queues using these tags is easily 40 or more. So
> > number of tags available for each device is rather small. Now I'm not sure
> > how batched wakeups can ever work in such situations, but maybe I'm missing
> > something.
> > 
> > So take for example a situation where two tags are available for a device,
> > they are both currently used. Now a process comes into blk_mq_get_tag() and
> > wants to allocate tag and goes to sleep. Now how can it ever be woken up if
> > wake_batch is 4? If the two IOs complete, sbitmap will get two wakeups but
> > that's not enough to trigger the batched wakeup to really wakeup the
> > waiter...
> 
> commit 180dccb0dba4 ("blk-mq: fix tag_get wait task can't be awakened")
> is supposed for addressing this kind of issue.

I have observed the deadlock with the above fixes applied.

> > Even if we have say 4 tags available so in theory there should be enough
> > wakeups to fill the batch, there can be the following problem. So 4 tags
> > are in use, two processes come to blk_mq_get_tag() and sleep, one on wait
> > queue 0, one on wait queue 1. Now four IOs complete so
> > sbitmap_queue_wake_up() gets called 4 times and the fourth call decrements
> > wait_cnt to 0 so it ends up calling wake_up_nr(wq0, 4). Fine, one of the
> > waiters is woken up but the other one is still sleeping in wq1 and there
> > are not enough wakeups to fill the batch and wake it up? This is
> > essentially because we have lost three wakeups on wq0 because it didn't
> > have enough waiters to wake...
> 
> But the following completions will wake up the waiter in wq1, given
> there are more in-flight.

Well, there is only one more request in flight - from the unblocked waiter.
And once that request completes, it will generate just one wakeup which is
not enough to wake the waiter on wq1 because wake_batch is 4...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: Races in sbitmap batched wakeups
  2022-06-17  1:40 ` Yu Kuai
@ 2022-06-17 11:31   ` Jan Kara
  2022-06-17 12:50     ` Yu Kuai
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kara @ 2022-06-17 11:31 UTC (permalink / raw)
  To: Yu Kuai; +Cc: Jan Kara, Jens Axboe, Omar Sandoval, linux-block

Hi!

On Fri 17-06-22 09:40:11, Yu Kuai wrote:
> On 2022/06/17 1:21, Jan Kara wrote:
> > I've been debugging some customer reports of tasks hanging (forever)
> > waiting for free tags when in fact all tags are free. After looking into it
> > for some time I think I know what it happening. First, keep in mind that
> > it concerns a device which uses shared tags. There are 127 tags available
> > and the number of active queues using these tags is easily 40 or more. So
> > number of tags available for each device is rather small. Now I'm not sure
> > how batched wakeups can ever work in such situations, but maybe I'm missing
> > something.
> > 
> > So take for example a situation where two tags are available for a device,
> > they are both currently used. Now a process comes into blk_mq_get_tag() and
> > wants to allocate tag and goes to sleep. Now how can it ever be woken up if
> > wake_batch is 4? If the two IOs complete, sbitmap will get two wakeups but
> > that's not enough to trigger the batched wakeup to really wakeup the
> > waiter...
> > 
> > Even if we have say 4 tags available so in theory there should be enough
> > wakeups to fill the batch, there can be the following problem. So 4 tags
> > are in use, two processes come to blk_mq_get_tag() and sleep, one on wait
> > queue 0, one on wait queue 1. Now four IOs complete so
> > sbitmap_queue_wake_up() gets called 4 times and the fourth call decrements
> > wait_cnt to 0 so it ends up calling wake_up_nr(wq0, 4). Fine, one of the
> > waiters is woken up but the other one is still sleeping in wq1 and there
> > are not enough wakeups to fill the batch and wake it up? This is
> > essentially because we have lost three wakeups on wq0 because it didn't
> > have enough waiters to wake...
> 
> From what I see, if tags are shared for multiple devices, wake_batch
> should make sure that all waiter will be woke up:
> 
> For example:
> there are total 64 tags shared for two devices, then wake_batch is 4(if
> both devices are active).  If there are waiters, which means at least 32
> tags are grabed, thus 8 queues will ensure to wake up at least once
> after 32 tags are freed.

Well, yes, wake_batch is updated but as my example above shows it is not
enough to fix "wasted" wakeups.

> > Finally, sbitmap_queue_wake_up() is racy and if two of them race together,
> > they can end up decrementing wait_cnt of wq which does not have any process
> > queued which again effectively leads to lost wakeups and possibly
> > indefinitely sleeping tasks.
> > 
> 
> BTW, I do this implementation have some problems on concurrent
> scenario, as described in following patch:
> 
> https://lore.kernel.org/lkml/20220415101053.554495-4-yukuai3@huawei.com/

Yes, as far as I can see you have identified similar races as I point out
in this email. But I'm not sure whether your patch fixes all the
possibilities for lost wakeups...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: Races in sbitmap batched wakeups
  2022-06-17 11:31   ` Jan Kara
@ 2022-06-17 12:50     ` Yu Kuai
  2022-06-20 11:57       ` Jan Kara
  0 siblings, 1 reply; 10+ messages in thread
From: Yu Kuai @ 2022-06-17 12:50 UTC (permalink / raw)
  To: Jan Kara; +Cc: Jens Axboe, Omar Sandoval, linux-block

On 2022/06/17 19:31, Jan Kara wrote:
> Hi!
> 
> On Fri 17-06-22 09:40:11, Yu Kuai wrote:
>> On 2022/06/17 1:21, Jan Kara wrote:
>>> I've been debugging some customer reports of tasks hanging (forever)
>>> waiting for free tags when in fact all tags are free. After looking into it
>>> for some time I think I know what it happening. First, keep in mind that
>>> it concerns a device which uses shared tags. There are 127 tags available
>>> and the number of active queues using these tags is easily 40 or more. So
>>> number of tags available for each device is rather small. Now I'm not sure
>>> how batched wakeups can ever work in such situations, but maybe I'm missing
>>> something.
>>>
>>> So take for example a situation where two tags are available for a device,
>>> they are both currently used. Now a process comes into blk_mq_get_tag() and
>>> wants to allocate tag and goes to sleep. Now how can it ever be woken up if
>>> wake_batch is 4? If the two IOs complete, sbitmap will get two wakeups but
>>> that's not enough to trigger the batched wakeup to really wakeup the
>>> waiter...
>>>
>>> Even if we have say 4 tags available so in theory there should be enough
>>> wakeups to fill the batch, there can be the following problem. So 4 tags
>>> are in use, two processes come to blk_mq_get_tag() and sleep, one on wait
>>> queue 0, one on wait queue 1. Now four IOs complete so
>>> sbitmap_queue_wake_up() gets called 4 times and the fourth call decrements
>>> wait_cnt to 0 so it ends up calling wake_up_nr(wq0, 4). Fine, one of the
>>> waiters is woken up but the other one is still sleeping in wq1 and there
>>> are not enough wakeups to fill the batch and wake it up? This is
>>> essentially because we have lost three wakeups on wq0 because it didn't
>>> have enough waiters to wake...
>>
>>  From what I see, if tags are shared for multiple devices, wake_batch
>> should make sure that all waiter will be woke up:
>>
>> For example:
>> there are total 64 tags shared for two devices, then wake_batch is 4(if
>> both devices are active).  If there are waiters, which means at least 32
>> tags are grabed, thus 8 queues will ensure to wake up at least once
>> after 32 tags are freed.
> 
> Well, yes, wake_batch is updated but as my example above shows it is not
> enough to fix "wasted" wakeups.

Tags can be preempted, which means a new thread can be added to a
waitqueue only if there are no free tags.

With the above condition, I can't see how the following scenario could
exist (despite the wakeups that can be missed):

Only wake_batch tags are still in use, while multiple waitqueues are
still active.

If you think this is possible, can you share the initial conditions and
how it ends up in the problematic scenario?

Thanks,
Kuai
> 
>>> Finally, sbitmap_queue_wake_up() is racy and if two of them race together,
>>> they can end up decrementing wait_cnt of wq which does not have any process
>>> queued which again effectively leads to lost wakeups and possibly
>>> indefinitely sleeping tasks.
>>>
>>
>> BTW, I do this implementation have some problems on concurrent
>> scenario, as described in following patch:
>>
>> https://lore.kernel.org/lkml/20220415101053.554495-4-yukuai3@huawei.com/
> 
> Yes, as far as I can see you have identified similar races as I point out
> in this email. But I'm not sure whether your patch fixes all the
> possibilities for lost wakeups...
> 
> 								Honza
> 


* Re: Races in sbitmap batched wakeups
  2022-06-17 12:50     ` Yu Kuai
@ 2022-06-20 11:57       ` Jan Kara
  2022-06-20 13:11         ` Yu Kuai
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kara @ 2022-06-20 11:57 UTC (permalink / raw)
  To: Yu Kuai; +Cc: Jan Kara, Jens Axboe, Omar Sandoval, linux-block, Laibin Qiu

Hello!

On Fri 17-06-22 20:50:17, Yu Kuai wrote:
> On 2022/06/17 19:31, Jan Kara wrote:
> > On Fri 17-06-22 09:40:11, Yu Kuai wrote:
> > > On 2022/06/17 1:21, Jan Kara wrote:
> > > > I've been debugging some customer reports of tasks hanging (forever)
> > > > waiting for free tags when in fact all tags are free. After looking into it
> > > > for some time I think I know what it happening. First, keep in mind that
> > > > it concerns a device which uses shared tags. There are 127 tags available
> > > > and the number of active queues using these tags is easily 40 or more. So
> > > > number of tags available for each device is rather small. Now I'm not sure
> > > > how batched wakeups can ever work in such situations, but maybe I'm missing
> > > > something.
> > > > 
> > > > So take for example a situation where two tags are available for a device,
> > > > they are both currently used. Now a process comes into blk_mq_get_tag() and
> > > > wants to allocate tag and goes to sleep. Now how can it ever be woken up if
> > > > wake_batch is 4? If the two IOs complete, sbitmap will get two wakeups but
> > > > that's not enough to trigger the batched wakeup to really wakeup the
> > > > waiter...
> > > > 
> > > > Even if we have say 4 tags available so in theory there should be enough
> > > > wakeups to fill the batch, there can be the following problem. So 4 tags
> > > > are in use, two processes come to blk_mq_get_tag() and sleep, one on wait
> > > > queue 0, one on wait queue 1. Now four IOs complete so
> > > > sbitmap_queue_wake_up() gets called 4 times and the fourth call decrements
> > > > wait_cnt to 0 so it ends up calling wake_up_nr(wq0, 4). Fine, one of the
> > > > waiters is woken up but the other one is still sleeping in wq1 and there
> > > > are not enough wakeups to fill the batch and wake it up? This is
> > > > essentially because we have lost three wakeups on wq0 because it didn't
> > > > have enough waiters to wake...
> > > 
> > >  From what I see, if tags are shared for multiple devices, wake_batch
> > > should make sure that all waiter will be woke up:
> > > 
> > > For example:
> > > there are total 64 tags shared for two devices, then wake_batch is 4(if
> > > both devices are active).  If there are waiters, which means at least 32
> > > tags are grabed, thus 8 queues will ensure to wake up at least once
> > > after 32 tags are freed.
> > 
> > Well, yes, wake_batch is updated but as my example above shows it is not
> > enough to fix "wasted" wakeups.
> 
> Tags can be preempted, which means new thread can be added to waitqueue
> only if there are no free tags.

Yes.

> With the above condition, I can't think of any possibility how the
> following scenario can be existed(dispite the wake ups can be missed):
> 
> Only wake_batch tags are still in use, while multiple waitqueues are
> still active.
> 
> If you think this is possible, can you share the initial conditions and
> how does it end up to the problematic scenario?

Very easily AFAICT. I've described the scenario in my original email but
let me maybe write it here with more detail. Let's assume we have 4 tags
available for our device, wake_batch is 4, wait_index is 0, wait_cnt is 4
for all waitqueues. All four tags are currently in use.

Now task T1 comes, wants a new tag:
blk_mq_get_tag()
  bt_wait_ptr() -> gets ws 0, wait_index incremented to 1
  goes to sleep on ws 0

Now task T2 comes, wants a new tag:
blk_mq_get_tag()
  bt_wait_ptr() -> gets ws 1, wait_index incremented to 2
  goes to sleep on ws 1

Now all four requests complete, this generates 4 calls to
sbitmap_queue_wake_up() for ws 0, which decrements wait_cnt on ws 0 to 0
and we do wake_up_nr(&ws->wait, 4). This wakes T1.

T1 allocates a tag, does IO, IO completes. sbitmap_queue_wake_up() is
called for ws 1. wait_cnt is decremented to 3.

Now there's no IO in flight but we still have a task sleeping in ws 1.
Everything is stuck until someone submits more IO (which may never happen
because everything ends up waiting on completion of the IO that T2 does).

Note that I have given an example with 4 tags available but in principle
the problem can happen with up to (wake_batch - 1) * SBQ_WAIT_QUEUES tags.
Hum, having spelled it out so exactly, this particular problem should
probably be fixed by changing the code in
sbitmap_queue_recalculate_wake_batch() to not force wake_batch to 4 if
there is a higher number of total tags available?
Laibin? That would be at least a quick fix for this particular problem.
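
Something like the following is what I have in mind -- completely untested,
just to illustrate the idea of capping the batch by what a single queue's
share of the tags can actually generate, instead of forcing it up to 4:

void sbitmap_queue_recalculate_wake_batch(struct sbitmap_queue *sbq,
                                          unsigned int users)
{
        unsigned int depth = (sbq->sb.depth + users - 1) / users;
        unsigned int wake_batch;

        /*
         * Base the lower bound on the per-queue share of tags rather than
         * on the total depth, so that a queue which can only ever get a
         * handful of tags does not wait for a batch that cannot complete.
         */
        wake_batch = clamp_t(unsigned int, depth / SBQ_WAIT_QUEUES,
                             1, SBQ_WAKE_BATCH);

        WRITE_ONCE(sbq->wake_batch, wake_batch);
}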

And AFAICS this is an independent problem from the one you are fixing in your
patch...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: Races in sbitmap batched wakeups
  2022-06-20 11:57       ` Jan Kara
@ 2022-06-20 13:11         ` Yu Kuai
  2022-06-20 16:57           ` Jan Kara
  0 siblings, 1 reply; 10+ messages in thread
From: Yu Kuai @ 2022-06-20 13:11 UTC (permalink / raw)
  To: Jan Kara; +Cc: Jens Axboe, Omar Sandoval, linux-block, Laibin Qiu

On 2022/06/20 19:57, Jan Kara wrote:
> Hello!
> 
> On Fri 17-06-22 20:50:17, Yu Kuai wrote:
>> On 2022/06/17 19:31, Jan Kara wrote:
>>> On Fri 17-06-22 09:40:11, Yu Kuai wrote:
>>>> On 2022/06/17 1:21, Jan Kara wrote:
>>>>> I've been debugging some customer reports of tasks hanging (forever)
>>>>> waiting for free tags when in fact all tags are free. After looking into it
>>>>> for some time I think I know what it happening. First, keep in mind that
>>>>> it concerns a device which uses shared tags. There are 127 tags available
>>>>> and the number of active queues using these tags is easily 40 or more. So
>>>>> number of tags available for each device is rather small. Now I'm not sure
>>>>> how batched wakeups can ever work in such situations, but maybe I'm missing
>>>>> something.
>>>>>
>>>>> So take for example a situation where two tags are available for a device,
>>>>> they are both currently used. Now a process comes into blk_mq_get_tag() and
>>>>> wants to allocate tag and goes to sleep. Now how can it ever be woken up if
>>>>> wake_batch is 4? If the two IOs complete, sbitmap will get two wakeups but
>>>>> that's not enough to trigger the batched wakeup to really wakeup the
>>>>> waiter...
>>>>>
>>>>> Even if we have say 4 tags available so in theory there should be enough
>>>>> wakeups to fill the batch, there can be the following problem. So 4 tags
>>>>> are in use, two processes come to blk_mq_get_tag() and sleep, one on wait
>>>>> queue 0, one on wait queue 1. Now four IOs complete so
>>>>> sbitmap_queue_wake_up() gets called 4 times and the fourth call decrements
>>>>> wait_cnt to 0 so it ends up calling wake_up_nr(wq0, 4). Fine, one of the
>>>>> waiters is woken up but the other one is still sleeping in wq1 and there
>>>>> are not enough wakeups to fill the batch and wake it up? This is
>>>>> essentially because we have lost three wakeups on wq0 because it didn't
>>>>> have enough waiters to wake...
>>>>
>>>>   From what I see, if tags are shared for multiple devices, wake_batch
>>>> should make sure that all waiter will be woke up:
>>>>
>>>> For example:
>>>> there are total 64 tags shared for two devices, then wake_batch is 4(if
>>>> both devices are active).  If there are waiters, which means at least 32
>>>> tags are grabed, thus 8 queues will ensure to wake up at least once
>>>> after 32 tags are freed.
>>>
>>> Well, yes, wake_batch is updated but as my example above shows it is not
>>> enough to fix "wasted" wakeups.
>>
>> Tags can be preempted, which means new thread can be added to waitqueue
>> only if there are no free tags.
> 
> Yes.
> 
>> With the above condition, I can't think of any possibility how the
>> following scenario can be existed(dispite the wake ups can be missed):
>>
>> Only wake_batch tags are still in use, while multiple waitqueues are
>> still active.
>>
>> If you think this is possible, can you share the initial conditions and
>> how does it end up to the problematic scenario?
> 
> Very easily AFAICT. I've described the scenario in my original email but
> let me maybe write it here with more detail. Let's assume we have 4 tags
> available for our device, wake_batch is 4, wait_index is 0, wait_cnt is 4
> for all waitqueues. All four tags are currently in use.
> 
> Now task T1 comes, wants a new tag:
> blk_mq_get_tag()
>    bt_wait_ptr() -> gets ws 0, wait_index incremented to 1
>    goes to sleep on ws 0
> 
> Now task T2 comes, wants a new tag:
> blk_mq_get_tag()
>    bt_wait_ptr() -> gets ws 1, wait_index incremented to 2
>    goes to sleep on ws 1
> 
> Now all four requests complete, this generates 4 calls to
> sbitmap_queue_wake_up() for ws 0, which decrements wait_cnt on ws 0 to 0
> and we do wake_up_nr(&ws->wait, 4). This wakes T1.
> 
> T1 allocates a tag, does IO, IO completes. sbitmap_queue_wake_up() is
> called for ws 1. wait_cnt is decremented to 3.
> 
> Now there's no IO in flight but we still have task sleeping in ws 1.
> Everything is stuck until someone submits more IO (which may never happen
> because everything ends up waiting on completion of IO T2 does).
Hi, Jan

I assume that there should be at least 32 total tags, and at least 8
devices issuing IO, so that there are only 4 tags available for
the device (due to fair sharing)?

If so, IO from the other devices should trigger new wakeups.

Thanks,
Kuai
> 
> Note that I have given an example with 4 tags available but in principle
> the problem can happen with up to (wake_batch - 1) * SBQ_WAIT_QUEUES tags.
> Hum, spelling it out so exactly this particular problem should be probably
> fixed by changing the code in sbitmap_queue_recalculate_wake_batch() to not
> force wake_batch to 4 if there is higher number of total tags available?
> Laibin? That would be at least a quick fix for this particular problem.
> 
> And AFAICS this is independent problem from the one you are fixing in your
> patch...
> 
> 								Honza
> 


* Re: Races in sbitmap batched wakeups
  2022-06-20 13:11         ` Yu Kuai
@ 2022-06-20 16:57           ` Jan Kara
  2022-06-21  1:10             ` Yu Kuai
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kara @ 2022-06-20 16:57 UTC (permalink / raw)
  To: Yu Kuai; +Cc: Jan Kara, Jens Axboe, Omar Sandoval, linux-block, Laibin Qiu

On Mon 20-06-22 21:11:25, Yu Kuai wrote:
> On 2022/06/20 19:57, Jan Kara wrote:
> > Hello!
> > 
> > On Fri 17-06-22 20:50:17, Yu Kuai wrote:
> > > On 2022/06/17 19:31, Jan Kara wrote:
> > > > On Fri 17-06-22 09:40:11, Yu Kuai wrote:
> > > > > On 2022/06/17 1:21, Jan Kara wrote:
> > > > > > I've been debugging some customer reports of tasks hanging (forever)
> > > > > > waiting for free tags when in fact all tags are free. After looking into it
> > > > > > for some time I think I know what it happening. First, keep in mind that
> > > > > > it concerns a device which uses shared tags. There are 127 tags available
> > > > > > and the number of active queues using these tags is easily 40 or more. So
> > > > > > number of tags available for each device is rather small. Now I'm not sure
> > > > > > how batched wakeups can ever work in such situations, but maybe I'm missing
> > > > > > something.
> > > > > > 
> > > > > > So take for example a situation where two tags are available for a device,
> > > > > > they are both currently used. Now a process comes into blk_mq_get_tag() and
> > > > > > wants to allocate tag and goes to sleep. Now how can it ever be woken up if
> > > > > > wake_batch is 4? If the two IOs complete, sbitmap will get two wakeups but
> > > > > > that's not enough to trigger the batched wakeup to really wakeup the
> > > > > > waiter...
> > > > > > 
> > > > > > Even if we have say 4 tags available so in theory there should be enough
> > > > > > wakeups to fill the batch, there can be the following problem. So 4 tags
> > > > > > are in use, two processes come to blk_mq_get_tag() and sleep, one on wait
> > > > > > queue 0, one on wait queue 1. Now four IOs complete so
> > > > > > sbitmap_queue_wake_up() gets called 4 times and the fourth call decrements
> > > > > > wait_cnt to 0 so it ends up calling wake_up_nr(wq0, 4). Fine, one of the
> > > > > > waiters is woken up but the other one is still sleeping in wq1 and there
> > > > > > are not enough wakeups to fill the batch and wake it up? This is
> > > > > > essentially because we have lost three wakeups on wq0 because it didn't
> > > > > > have enough waiters to wake...
> > > > > 
> > > > >   From what I see, if tags are shared for multiple devices, wake_batch
> > > > > should make sure that all waiter will be woke up:
> > > > > 
> > > > > For example:
> > > > > there are total 64 tags shared for two devices, then wake_batch is 4(if
> > > > > both devices are active).  If there are waiters, which means at least 32
> > > > > tags are grabed, thus 8 queues will ensure to wake up at least once
> > > > > after 32 tags are freed.
> > > > 
> > > > Well, yes, wake_batch is updated but as my example above shows it is not
> > > > enough to fix "wasted" wakeups.
> > > 
> > > Tags can be preempted, which means new thread can be added to waitqueue
> > > only if there are no free tags.
> > 
> > Yes.
> > 
> > > With the above condition, I can't think of any possibility how the
> > > following scenario can be existed(dispite the wake ups can be missed):
> > > 
> > > Only wake_batch tags are still in use, while multiple waitqueues are
> > > still active.
> > > 
> > > If you think this is possible, can you share the initial conditions and
> > > how does it end up to the problematic scenario?
> > 
> > Very easily AFAICT. I've described the scenario in my original email but
> > let me maybe write it here with more detail. Let's assume we have 4 tags
> > available for our device, wake_batch is 4, wait_index is 0, wait_cnt is 4
> > for all waitqueues. All four tags are currently in use.
> > 
> > Now task T1 comes, wants a new tag:
> > blk_mq_get_tag()
> >    bt_wait_ptr() -> gets ws 0, wait_index incremented to 1
> >    goes to sleep on ws 0
> > 
> > Now task T2 comes, wants a new tag:
> > blk_mq_get_tag()
> >    bt_wait_ptr() -> gets ws 1, wait_index incremented to 2
> >    goes to sleep on ws 1
> > 
> > Now all four requests complete, this generates 4 calls to
> > sbitmap_queue_wake_up() for ws 0, which decrements wait_cnt on ws 0 to 0
> > and we do wake_up_nr(&ws->wait, 4). This wakes T1.
> > 
> > T1 allocates a tag, does IO, IO completes. sbitmap_queue_wake_up() is
> > called for ws 1. wait_cnt is decremented to 3.
> > 
> > Now there's no IO in flight but we still have task sleeping in ws 1.
> > Everything is stuck until someone submits more IO (which may never happen
> > because everything ends up waiting on completion of IO T2 does).
> 
> I assum that there should be at least 32 total tags, and at least 8
> device are issuing io, so that there are only 4 tags available for
> the devcie? (due to fair share).
> 
> If so, io from other devcies should trigger new wakeup.

Well, there isn't necessarily any IO going on on other devices, there may
be 4 tags used on our device, the rest are free but we are not allowed to
use them. Sure, eventually we should detect that the other devices are idle
and decrease tags->active_queues but that can take 30s or more...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: Races in sbitmap batched wakeups
  2022-06-20 16:57           ` Jan Kara
@ 2022-06-21  1:10             ` Yu Kuai
  0 siblings, 0 replies; 10+ messages in thread
From: Yu Kuai @ 2022-06-21  1:10 UTC (permalink / raw)
  To: Jan Kara; +Cc: Jens Axboe, Omar Sandoval, linux-block, Laibin Qiu



On 2022/06/21 0:57, Jan Kara wrote:
> On Mon 20-06-22 21:11:25, Yu Kuai wrote:
>> On 2022/06/20 19:57, Jan Kara wrote:
>>> Hello!
>>>
>>> On Fri 17-06-22 20:50:17, Yu Kuai wrote:
>>>> On 2022/06/17 19:31, Jan Kara wrote:
>>>>> On Fri 17-06-22 09:40:11, Yu Kuai wrote:
>>>>>> On 2022/06/17 1:21, Jan Kara wrote:
>>>>>>> I've been debugging some customer reports of tasks hanging (forever)
>>>>>>> waiting for free tags when in fact all tags are free. After looking into it
>>>>>>> for some time I think I know what it happening. First, keep in mind that
>>>>>>> it concerns a device which uses shared tags. There are 127 tags available
>>>>>>> and the number of active queues using these tags is easily 40 or more. So
>>>>>>> number of tags available for each device is rather small. Now I'm not sure
>>>>>>> how batched wakeups can ever work in such situations, but maybe I'm missing
>>>>>>> something.
>>>>>>>
>>>>>>> So take for example a situation where two tags are available for a device,
>>>>>>> they are both currently used. Now a process comes into blk_mq_get_tag() and
>>>>>>> wants to allocate tag and goes to sleep. Now how can it ever be woken up if
>>>>>>> wake_batch is 4? If the two IOs complete, sbitmap will get two wakeups but
>>>>>>> that's not enough to trigger the batched wakeup to really wakeup the
>>>>>>> waiter...
>>>>>>>
>>>>>>> Even if we have say 4 tags available so in theory there should be enough
>>>>>>> wakeups to fill the batch, there can be the following problem. So 4 tags
>>>>>>> are in use, two processes come to blk_mq_get_tag() and sleep, one on wait
>>>>>>> queue 0, one on wait queue 1. Now four IOs complete so
>>>>>>> sbitmap_queue_wake_up() gets called 4 times and the fourth call decrements
>>>>>>> wait_cnt to 0 so it ends up calling wake_up_nr(wq0, 4). Fine, one of the
>>>>>>> waiters is woken up but the other one is still sleeping in wq1 and there
>>>>>>> are not enough wakeups to fill the batch and wake it up? This is
>>>>>>> essentially because we have lost three wakeups on wq0 because it didn't
>>>>>>> have enough waiters to wake...
>>>>>>
>>>>>>    From what I see, if tags are shared for multiple devices, wake_batch
>>>>>> should make sure that all waiter will be woke up:
>>>>>>
>>>>>> For example:
>>>>>> there are total 64 tags shared for two devices, then wake_batch is 4(if
>>>>>> both devices are active).  If there are waiters, which means at least 32
>>>>>> tags are grabed, thus 8 queues will ensure to wake up at least once
>>>>>> after 32 tags are freed.
>>>>>
>>>>> Well, yes, wake_batch is updated but as my example above shows it is not
>>>>> enough to fix "wasted" wakeups.
>>>>
>>>> Tags can be preempted, which means new thread can be added to waitqueue
>>>> only if there are no free tags.
>>>
>>> Yes.
>>>
>>>> With the above condition, I can't think of any possibility how the
>>>> following scenario can be existed(dispite the wake ups can be missed):
>>>>
>>>> Only wake_batch tags are still in use, while multiple waitqueues are
>>>> still active.
>>>>
>>>> If you think this is possible, can you share the initial conditions and
>>>> how does it end up to the problematic scenario?
>>>
>>> Very easily AFAICT. I've described the scenario in my original email but
>>> let me maybe write it here with more detail. Let's assume we have 4 tags
>>> available for our device, wake_batch is 4, wait_index is 0, wait_cnt is 4
>>> for all waitqueues. All four tags are currently in use.
>>>
>>> Now task T1 comes, wants a new tag:
>>> blk_mq_get_tag()
>>>     bt_wait_ptr() -> gets ws 0, wait_index incremented to 1
>>>     goes to sleep on ws 0
>>>
>>> Now task T2 comes, wants a new tag:
>>> blk_mq_get_tag()
>>>     bt_wait_ptr() -> gets ws 1, wait_index incremented to 2
>>>     goes to sleep on ws 1
>>>
>>> Now all four requests complete, this generates 4 calls to
>>> sbitmap_queue_wake_up() for ws 0, which decrements wait_cnt on ws 0 to 0
>>> and we do wake_up_nr(&ws->wait, 4). This wakes T1.
>>>
>>> T1 allocates a tag, does IO, IO completes. sbitmap_queue_wake_up() is
>>> called for ws 1. wait_cnt is decremented to 3.
>>>
>>> Now there's no IO in flight but we still have task sleeping in ws 1.
>>> Everything is stuck until someone submits more IO (which may never happen
>>> because everything ends up waiting on completion of IO T2 does).
>>
>> I assum that there should be at least 32 total tags, and at least 8
>> device are issuing io, so that there are only 4 tags available for
>> the devcie? (due to fair share).
>>
>> If so, io from other devcies should trigger new wakeup.
> 
> Well, there isn't necessarily any IO going on on other devices, there may
> be 4 tags used on our device, the rest is free but we are not allowed to
> use them. Sure eventually we should detect other devices are idle and
> decrease tags->active_queues but that can take 30s or more...
> 
It's right that a queue can be active while there is no IO; blk_mq_tag_idle()
will not be called until such a queue hasn't issued any IO for a period of
time. Thus I do think the IO will not hang forever; however, it can be
improved so that the IO doesn't have to rely on other devices to be woken
up.

BTW, I tried to improve the way tags can be wasted for shared tags a long
time ago:

https://lore.kernel.org/all/20201226102808.2534966-1-yukuai3@huawei.com/

Thanks,
Kuai
> 								Honza
> 

