[PATCHSET] blk-mq: reimplement timeout handling

* [PATCHSET] blk-mq: reimplement timeout handling
@ 2017-12-09 19:25 Tejun Heo
  2017-12-09 19:25 ` [PATCH 1/6] blk-mq: protect completion path with RCU Tejun Heo
                   ` (8 more replies)
  0 siblings, 9 replies; 26+ messages in thread
From: Tejun Heo @ 2017-12-09 19:25 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, oleg, peterz, kernel-team, osandov

Currently, blk-mq timeout path synchronizes against the usual
issue/completion path using a complex scheme involving atomic
bitflags, REQ_ATOM_*, memory barriers and subtle memory coherence
rules.  Unfortunatley, it contains quite a few holes.

It's pretty easy to make blk_mq_check_expired() terminate a later
instance of a request.  If we induce 5 sec delay before
time_after_eq() test in blk_mq_check_expired(), shorten the timeout to
2s, and issue back-to-back large IOs, blk-mq starts timing out
requests spuriously pretty quickly.  Nothing actually timed out.  It
just made the call on a recycle instance of a request and then
terminated a later instance long after the original instance finished.
The scenario isn't theoretical either.

This patchset replaces the broken synchronization mechanism with a RCU
and generation number based one.  Please read the patch description of
the second path for more details.

Oleg, Peter, I'd really appreciate if you guys can go over the
reported breakages and the new implementation.

This patchset contains the following six patches.

 0001-blk-mq-protect-completion-path-with-RCU.patch
 0002-blk-mq-replace-timeout-synchronization-with-a-RCU-an.patch
 0003-blk-mq-use-blk_mq_rq_state-instead-of-testing-REQ_AT.patch
 0004-blk-mq-make-blk_abort_request-trigger-timeout-path.patch
 0005-blk-mq-remove-REQ_ATOM_COMPLETE-usages-from-blk-mq.patch
 0006-blk-mq-remove-REQ_ATOM_STARTED.patch

and is available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git blk-mq-timeout

diffstat follows.  Thanks.

 block/blk-core.c       |    2 
 block/blk-mq-debugfs.c |    4 
 block/blk-mq.c         |  246 +++++++++++++++++++++++++++----------------------
 block/blk-mq.h         |   48 +++++++++
 block/blk-timeout.c    |    9 -
 block/blk.h            |    7 -
 include/linux/blk-mq.h |    1 
 include/linux/blkdev.h |   23 ++++
 8 files changed, 218 insertions(+), 122 deletions(-)

--
tejun

^ permalink raw reply	[flat|nested] 26+ messages in thread