* [PATCHSET v4] blk-mq: reimplement timeout handling
@ 2018-01-08 19:15 Tejun Heo
  2018-01-08 19:15 ` [PATCH 1/8] blk-mq: move hctx lock/unlock into a helper Tejun Heo
                   ` (7 more replies)
  0 siblings, 8 replies; 41+ messages in thread
From: Tejun Heo @ 2018-01-08 19:15 UTC (permalink / raw)
  To: jack, axboe, clm, jbacik
  Cc: kernel-team, linux-kernel, linux-btrfs, peterz, jianchao.w.wang,
	Bart.VanAssche, linux-block

Hello,

Changes from [v3]

- Rebased on top of for-4.16/block.

- Integrated Jens's hctx_[un]lock() factoring patch and refreshed the
  patches accordingly.

- Added comment explaining the use of hctx_lock() instead of
  rcu_read_lock() in completion path.

Changes from [v2]

- Possible extended looping around seqcount and u64_stats_sync fixed.

- Misplaced MQ_RQ_IDLE state setting fixed.

- RQF_MQ_TIMEOUT_EXPIRED added to prevent firing the same timeout
  multiple times.

- s/queue_rq_srcu/srcu/ patch added.

- Other misc changes.

Changes from [v1]

- BLK_EH_RESET_TIMER handling fixed.

- s/request->gstate_seqc/request->gstate_seq/

- READ_ONCE() added to blk_mq_rq_update_state().

- Removed left over blk_clear_rq_complete() invocation from
  blk_mq_rq_timed_out().

Currently, blk-mq timeout path synchronizes against the usual
issue/completion path using a complex scheme involving atomic
bitflags, REQ_ATOM_*, memory barriers and subtle memory coherence
rules.  Unfortunately, it contains quite a few holes.

It's pretty easy to make blk_mq_check_expired() terminate a later
instance of a request.  If we induce a 5 second delay before the
time_after_eq() test in blk_mq_check_expired(), shorten the timeout to
2s, and issue back-to-back large IOs, blk-mq starts timing out
requests spuriously pretty quickly.  Nothing actually timed out.  The
timeout path simply made its decision on a recycled instance of a
request and then terminated a later instance long after the original
instance had finished.  The scenario isn't theoretical either.
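
For reference, the reproduction described above boils down to roughly
the following instrumentation.  This is only an illustration of the
experiment, not part of the series; the 5 second delay artificially
widens the race window, and blk_queue_rq_timeout() is just one way of
shortening the timeout on a test queue.

  /* in blk_mq_check_expired(), before the deadline test */
  msleep(5000);	/* crude: widen the race window (timeout work runs in process context) */
  if (time_after_eq(jiffies, deadline)) {
  	...
  }

  /* when setting up the test queue, shorten the timeout to 2s */
  blk_queue_rq_timeout(q, 2 * HZ);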

This patchset replaces the broken synchronization mechanism with an RCU
and generation number based one.  Please read the description of the
third patch for more details.

This patchset contains the following eight patches.

0001-blk-mq-move-hctx-lock-unlock-into-a-helper.patch
0002-blk-mq-protect-completion-path-with-RCU.patch
0003-blk-mq-replace-timeout-synchronization-with-a-RCU-an.patch
0004-blk-mq-use-blk_mq_rq_state-instead-of-testing-REQ_AT.patch
0005-blk-mq-make-blk_abort_request-trigger-timeout-path.patch
0006-blk-mq-remove-REQ_ATOM_COMPLETE-usages-from-blk-mq.patch
0007-blk-mq-remove-REQ_ATOM_STARTED.patch
0008-blk-mq-rename-blk_mq_hw_ctx-queue_rq_srcu-to-srcu.patch

and is available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git blk-mq-timeout-v4

diffstat follows.  Thanks.

 block/blk-core.c       |    2 
 block/blk-mq-debugfs.c |    4 
 block/blk-mq.c         |  340 ++++++++++++++++++++++++++++---------------------
 block/blk-mq.h         |   49 ++++++-
 block/blk-timeout.c    |   11 -
 block/blk.h            |    7 -
 include/linux/blk-mq.h |    3 
 include/linux/blkdev.h |   25 +++
 8 files changed, 283 insertions(+), 158 deletions(-)

--
tejun

[v3] http://lkml.kernel.org/r/20171216120726.517153-1-tj@kernel.org
[v2] http://lkml.kernel.org/r/20171010155441.753966-1-tj@kernel.org
[v1] http://lkml.kernel.org/r/20171209192525.982030-1-tj@kernel.org

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 1/8] blk-mq: move hctx lock/unlock into a helper
  2018-01-08 19:15 [PATCHSET v4] blk-mq: reimplement timeout handling Tejun Heo
@ 2018-01-08 19:15 ` Tejun Heo
  2018-01-08 19:24     ` Bart Van Assche
  2018-01-08 19:15 ` [PATCH 2/8] blk-mq: protect completion path with RCU Tejun Heo
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 41+ messages in thread
From: Tejun Heo @ 2018-01-08 19:15 UTC (permalink / raw)
  To: jack, axboe, clm, jbacik
  Cc: kernel-team, linux-kernel, linux-btrfs, peterz, jianchao.w.wang,
	Bart.VanAssche, linux-block, Tejun Heo

From: Jens Axboe <axboe@kernel.dk>

Move the RCU vs SRCU logic into lock/unlock helpers, which makes
the actual functional bits within the locked region much easier
to read.

tj: Reordered in front of timeout revamp patches and added the missing
    blk_mq_run_hw_queue() conversion.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-mq.c | 66 ++++++++++++++++++++++++++++------------------------------
 1 file changed, 32 insertions(+), 34 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 111e1aa..ddc9261 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -557,6 +557,22 @@ static void __blk_mq_complete_request(struct request *rq)
 	put_cpu();
 }
 
+static void hctx_unlock(struct blk_mq_hw_ctx *hctx, int srcu_idx)
+{
+	if (!(hctx->flags & BLK_MQ_F_BLOCKING))
+		rcu_read_unlock();
+	else
+		srcu_read_unlock(hctx->queue_rq_srcu, srcu_idx);
+}
+
+static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
+{
+	if (!(hctx->flags & BLK_MQ_F_BLOCKING))
+		rcu_read_lock();
+	else
+		*srcu_idx = srcu_read_lock(hctx->queue_rq_srcu);
+}
+
 /**
  * blk_mq_complete_request - end I/O on a request
  * @rq:		the request being processed
@@ -1214,17 +1230,11 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	 */
 	WARN_ON_ONCE(in_interrupt());
 
-	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
-		rcu_read_lock();
-		blk_mq_sched_dispatch_requests(hctx);
-		rcu_read_unlock();
-	} else {
-		might_sleep();
+	might_sleep_if(hctx->flags & BLK_MQ_F_BLOCKING);
 
-		srcu_idx = srcu_read_lock(hctx->queue_rq_srcu);
-		blk_mq_sched_dispatch_requests(hctx);
-		srcu_read_unlock(hctx->queue_rq_srcu, srcu_idx);
-	}
+	hctx_lock(hctx, &srcu_idx);
+	blk_mq_sched_dispatch_requests(hctx);
+	hctx_unlock(hctx, srcu_idx);
 }
 
 /*
@@ -1296,17 +1306,10 @@ bool blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 	 * And queue will be rerun in blk_mq_unquiesce_queue() if it is
 	 * quiesced.
 	 */
-	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
-		rcu_read_lock();
-		need_run = !blk_queue_quiesced(hctx->queue) &&
-			blk_mq_hctx_has_pending(hctx);
-		rcu_read_unlock();
-	} else {
-		srcu_idx = srcu_read_lock(hctx->queue_rq_srcu);
-		need_run = !blk_queue_quiesced(hctx->queue) &&
-			blk_mq_hctx_has_pending(hctx);
-		srcu_read_unlock(hctx->queue_rq_srcu, srcu_idx);
-	}
+	hctx_lock(hctx, &srcu_idx);
+	need_run = !blk_queue_quiesced(hctx->queue) &&
+		blk_mq_hctx_has_pending(hctx);
+	hctx_unlock(hctx, srcu_idx);
 
 	if (need_run) {
 		__blk_mq_delay_run_hw_queue(hctx, async, 0);
@@ -1618,7 +1621,7 @@ static blk_qc_t request_to_qc_t(struct blk_mq_hw_ctx *hctx, struct request *rq)
 
 static void __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 					struct request *rq,
-					blk_qc_t *cookie, bool may_sleep)
+					blk_qc_t *cookie)
 {
 	struct request_queue *q = rq->q;
 	struct blk_mq_queue_data bd = {
@@ -1668,25 +1671,20 @@ static void __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 	}
 
 insert:
-	blk_mq_sched_insert_request(rq, false, run_queue, false, may_sleep);
+	blk_mq_sched_insert_request(rq, false, run_queue, false,
+					hctx->flags & BLK_MQ_F_BLOCKING);
 }
 
 static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 		struct request *rq, blk_qc_t *cookie)
 {
-	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
-		rcu_read_lock();
-		__blk_mq_try_issue_directly(hctx, rq, cookie, false);
-		rcu_read_unlock();
-	} else {
-		unsigned int srcu_idx;
+	int srcu_idx;
 
-		might_sleep();
+	might_sleep_if(hctx->flags & BLK_MQ_F_BLOCKING);
 
-		srcu_idx = srcu_read_lock(hctx->queue_rq_srcu);
-		__blk_mq_try_issue_directly(hctx, rq, cookie, true);
-		srcu_read_unlock(hctx->queue_rq_srcu, srcu_idx);
-	}
+	hctx_lock(hctx, &srcu_idx);
+	__blk_mq_try_issue_directly(hctx, rq, cookie);
+	hctx_unlock(hctx, srcu_idx);
 }
 
 static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 2/8] blk-mq: protect completion path with RCU
  2018-01-08 19:15 [PATCHSET v4] blk-mq: reimplement timeout handling Tejun Heo
  2018-01-08 19:15 ` [PATCH 1/8] blk-mq: move hctx lock/unlock into a helper Tejun Heo
@ 2018-01-08 19:15 ` Tejun Heo
  2018-01-08 19:57   ` Holger Hoffstätte
                     ` (2 more replies)
  2018-01-08 19:15 ` [PATCH 3/8] blk-mq: replace timeout synchronization with a RCU and generation based scheme Tejun Heo
                   ` (5 subsequent siblings)
  7 siblings, 3 replies; 41+ messages in thread
From: Tejun Heo @ 2018-01-08 19:15 UTC (permalink / raw)
  To: jack, axboe, clm, jbacik
  Cc: kernel-team, linux-kernel, linux-btrfs, peterz, jianchao.w.wang,
	Bart.VanAssche, linux-block, Tejun Heo

Currently, blk-mq protects only the issue path with RCU.  This patch
puts the completion path under the same RCU protection.  This will be
used to synchronize issue/completion against timeout by later patches,
which will also add the comments.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-mq.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index ddc9261..6741c3e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -584,11 +584,16 @@ static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
 void blk_mq_complete_request(struct request *rq)
 {
 	struct request_queue *q = rq->q;
+	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, rq->mq_ctx->cpu);
+	int srcu_idx;
 
 	if (unlikely(blk_should_fake_timeout(q)))
 		return;
+
+	hctx_lock(hctx, &srcu_idx);
 	if (!blk_mark_rq_complete(rq))
 		__blk_mq_complete_request(rq);
+	hctx_unlock(hctx, srcu_idx);
 }
 EXPORT_SYMBOL(blk_mq_complete_request);
 
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 3/8] blk-mq: replace timeout synchronization with a RCU and generation based scheme
  2018-01-08 19:15 [PATCHSET v4] blk-mq: reimplement timeout handling Tejun Heo
  2018-01-08 19:15 ` [PATCH 1/8] blk-mq: move hctx lock/unlock into a helper Tejun Heo
  2018-01-08 19:15 ` [PATCH 2/8] blk-mq: protect completion path with RCU Tejun Heo
@ 2018-01-08 19:15 ` Tejun Heo
  2018-01-08 21:06     ` Bart Van Assche
  2018-01-08 23:29     ` Bart Van Assche
  2018-01-08 19:15 ` [PATCH 4/8] blk-mq: use blk_mq_rq_state() instead of testing REQ_ATOM_COMPLETE Tejun Heo
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 41+ messages in thread
From: Tejun Heo @ 2018-01-08 19:15 UTC (permalink / raw)
  To: jack, axboe, clm, jbacik
  Cc: kernel-team, linux-kernel, linux-btrfs, peterz, jianchao.w.wang,
	Bart.VanAssche, linux-block, Tejun Heo, Christoph Hellwig

Currently, blk-mq timeout path synchronizes against the usual
issue/completion path using a complex scheme involving atomic
bitflags, REQ_ATOM_*, memory barriers and subtle memory coherence
rules.  Unfortunately, it contains quite a few holes.

There's a complex dance around REQ_ATOM_STARTED and REQ_ATOM_COMPLETE
between the issue/completion and timeout paths; however, they don't
have a synchronization point across request recycle instances and it
isn't clear what the barriers add.  blk_mq_check_expired() can easily
read STARTED from the N-2'th instance, the deadline from the N-1'th,
and run blk_mark_rq_complete() against the Nth instance.

In fact, it's pretty easy to make blk_mq_check_expired() terminate a
later instance of a request.  If we induce a 5 second delay before the
time_after_eq() test in blk_mq_check_expired(), shorten the timeout to
2s, and issue back-to-back large IOs, blk-mq starts timing out
requests spuriously pretty quickly.  Nothing actually timed out.  The
timeout path simply made its decision on a recycled instance of a
request and then terminated a later instance long after the original
instance had finished.  The scenario isn't theoretical either.

This patch replaces the broken synchronization mechanism with an RCU
and generation number based one.

1. Each request has a u64 generation + state value, which can be
   updated only by the request owner.  Whenever a request becomes
   in-flight, the generation number gets bumped up too.  This provides
   the basis for the timeout path to distinguish different recycle
   instances of the request.

   Also, marking a request in-flight and setting its deadline are
   protected with a seqcount so that the timeout path can fetch both
   values coherently.

2. The timeout path fetches the generation, state and deadline.  If
   the verdict is timeout, it records the generation into a dedicated
   request abortion field and then waits for an RCU grace period.

3. The completion path is also protected by RCU (from the previous
   patch) and checks whether the current generation number and state
   match the abortion field.  If so, it skips completion.

4. The timeout path, after RCU wait, scans requests again and
   terminates the ones whose generation and state still match the ones
   requested for abortion.

   By now, the timeout path knows that either it lost the race (the
   generation number and state have changed) or the completion path
   will yield to it, and it can safely time out the request.

While it's more lines of code, it's conceptually simpler, doesn't
depend on direct use of subtle memory ordering or coherence, and
hopefully doesn't terminate the wrong instance.
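
To make the above flow easier to follow, here is a compressed sketch
of the scheme in C-like pseudocode.  It is only an illustration: the
helper and field names are taken from the patch below, but the locking
details (preemption disabling around the seqcount write, and the
u64_stats_sync protection of ->aborted_gstate needed on 32-bit) are
omitted.

  /* issue path: sole writer of ->gstate and ->deadline */
  write_seqcount_begin(&rq->gstate_seq);
  rq->gstate = ((rq->gstate & ~MQ_RQ_STATE_MASK) + MQ_RQ_GEN_INC)
  		| MQ_RQ_IN_FLIGHT;	/* bump generation, mark in-flight */
  rq->deadline = jiffies + rq->timeout;
  write_seqcount_end(&rq->gstate_seq);

  /* timeout path, step 2: take a coherent snapshot, claim expired rqs */
  do {
  	start = read_seqcount_begin(&rq->gstate_seq);
  	gstate = rq->gstate;
  	deadline = rq->deadline;
  } while (read_seqcount_retry(&rq->gstate_seq, start));

  if ((gstate & MQ_RQ_STATE_MASK) == MQ_RQ_IN_FLIGHT &&
      time_after_eq(jiffies, deadline))
  	rq->aborted_gstate = gstate;	/* claim this recycle instance */

  /* step 3 runs in the completion path: complete only if
   * rq->aborted_gstate != rq->gstate, i.e. timeout didn't claim us */

  /* timeout path, step 4: after synchronize_rcu()/synchronize_srcu() */
  if (rq->gstate == rq->aborted_gstate)
  	blk_mq_rq_timed_out(rq, reserved);	/* we won; time it out */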

While this change makes REQ_ATOM_COMPLETE synchronization unnecessary
between issue/complete and timeout paths, REQ_ATOM_COMPLETE isn't
removed yet as it's still used in other places.  Future patches will
move all state tracking to the new mechanism and remove all bitops in
the hot paths.

v2: - Fixed BLK_EH_RESET_TIMER handling as pointed out by Jianchao.
    - s/request->gstate_seqc/request->gstate_seq/ as suggested by Peter.
    - READ_ONCE() added in blk_mq_rq_update_state() as suggested by Peter.

v3: - Fixed possible extended seqcount / u64_stats_sync read looping
      spotted by Peter.
    - MQ_RQ_IDLE was incorrectly being set in complete_request instead
      of free_request.  Fixed.

v4: - Rebased on top of hctx_lock() refactoring patch.
    - Added comment explaining the use of hctx_lock() in completion path.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: "jianchao.wang" <jianchao.w.wang@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
---
 block/blk-core.c       |   2 +
 block/blk-mq.c         | 223 +++++++++++++++++++++++++++++++++----------------
 block/blk-mq.h         |  46 ++++++++++
 block/blk-timeout.c    |   2 +-
 block/blk.h            |   6 --
 include/linux/blk-mq.h |   1 +
 include/linux/blkdev.h |  23 +++++
 7 files changed, 224 insertions(+), 79 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 2e0d041..f843ae4f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -126,6 +126,8 @@ void blk_rq_init(struct request_queue *q, struct request *rq)
 	rq->start_time = jiffies;
 	set_start_time_ns(rq);
 	rq->part = NULL;
+	seqcount_init(&rq->gstate_seq);
+	u64_stats_init(&rq->aborted_gstate_sync);
 }
 EXPORT_SYMBOL(blk_rq_init);
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6741c3e..6587f0c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -483,6 +483,7 @@ void blk_mq_free_request(struct request *rq)
 	if (blk_rq_rl(rq))
 		blk_put_rl(blk_rq_rl(rq));
 
+	blk_mq_rq_update_state(rq, MQ_RQ_IDLE);
 	clear_bit(REQ_ATOM_STARTED, &rq->atomic_flags);
 	clear_bit(REQ_ATOM_POLL_SLEPT, &rq->atomic_flags);
 	if (rq->tag != -1)
@@ -530,6 +531,8 @@ static void __blk_mq_complete_request(struct request *rq)
 	bool shared = false;
 	int cpu;
 
+	WARN_ON_ONCE(blk_mq_rq_state(rq) != MQ_RQ_IN_FLIGHT);
+
 	if (rq->internal_tag != -1)
 		blk_mq_sched_completed_request(rq);
 	if (rq->rq_flags & RQF_STATS) {
@@ -573,6 +576,30 @@ static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
 		*srcu_idx = srcu_read_lock(hctx->queue_rq_srcu);
 }
 
+static void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	u64_stats_update_begin(&rq->aborted_gstate_sync);
+	rq->aborted_gstate = gstate;
+	u64_stats_update_end(&rq->aborted_gstate_sync);
+	local_irq_restore(flags);
+}
+
+static u64 blk_mq_rq_aborted_gstate(struct request *rq)
+{
+	unsigned int start;
+	u64 aborted_gstate;
+
+	do {
+		start = u64_stats_fetch_begin(&rq->aborted_gstate_sync);
+		aborted_gstate = rq->aborted_gstate;
+	} while (u64_stats_fetch_retry(&rq->aborted_gstate_sync, start));
+
+	return aborted_gstate;
+}
+
 /**
  * blk_mq_complete_request - end I/O on a request
  * @rq:		the request being processed
@@ -590,8 +617,20 @@ void blk_mq_complete_request(struct request *rq)
 	if (unlikely(blk_should_fake_timeout(q)))
 		return;
 
+	/*
+	 * If @rq->aborted_gstate equals the current instance, timeout is
+	 * claiming @rq and we lost.  This is synchronized through
+	 * hctx_lock().  See blk_mq_timeout_work() for details.
+	 *
+	 * Completion path never blocks and we can directly use RCU here
+	 * instead of hctx_lock() which can be either RCU or SRCU.
+	 * However, that would complicate paths which want to synchronize
+	 * against us.  Let stay in sync with the issue path so that
+	 * hctx_lock() covers both issue and completion paths.
+	 */
 	hctx_lock(hctx, &srcu_idx);
-	if (!blk_mark_rq_complete(rq))
+	if (blk_mq_rq_aborted_gstate(rq) != rq->gstate &&
+	    !blk_mark_rq_complete(rq))
 		__blk_mq_complete_request(rq);
 	hctx_unlock(hctx, srcu_idx);
 }
@@ -617,34 +656,32 @@ void blk_mq_start_request(struct request *rq)
 		wbt_issue(q->rq_wb, &rq->issue_stat);
 	}
 
-	blk_add_timer(rq);
-
+	WARN_ON_ONCE(blk_mq_rq_state(rq) != MQ_RQ_IDLE);
 	WARN_ON_ONCE(test_bit(REQ_ATOM_STARTED, &rq->atomic_flags));
 
 	/*
-	 * Mark us as started and clear complete. Complete might have been
-	 * set if requeue raced with timeout, which then marked it as
-	 * complete. So be sure to clear complete again when we start
-	 * the request, otherwise we'll ignore the completion event.
+	 * Mark @rq in-flight which also advances the generation number,
+	 * and register for timeout.  Protect with a seqcount to allow the
+	 * timeout path to read both @rq->gstate and @rq->deadline
+	 * coherently.
 	 *
-	 * Ensure that ->deadline is visible before we set STARTED, such that
-	 * blk_mq_check_expired() is guaranteed to observe our ->deadline when
-	 * it observes STARTED.
+	 * This is the only place where a request is marked in-flight.  If
+	 * the timeout path reads an in-flight @rq->gstate, the
+	 * @rq->deadline it reads together under @rq->gstate_seq is
+	 * guaranteed to be the matching one.
 	 */
-	smp_wmb();
+	preempt_disable();
+	write_seqcount_begin(&rq->gstate_seq);
+
+	blk_mq_rq_update_state(rq, MQ_RQ_IN_FLIGHT);
+	blk_add_timer(rq);
+
+	write_seqcount_end(&rq->gstate_seq);
+	preempt_enable();
+
 	set_bit(REQ_ATOM_STARTED, &rq->atomic_flags);
-	if (test_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags)) {
-		/*
-		 * Coherence order guarantees these consecutive stores to a
-		 * single variable propagate in the specified order. Thus the
-		 * clear_bit() is ordered _after_ the set bit. See
-		 * blk_mq_check_expired().
-		 *
-		 * (the bits must be part of the same byte for this to be
-		 * true).
-		 */
+	if (test_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags))
 		clear_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
-	}
 
 	if (q->dma_drain_size && blk_rq_bytes(rq)) {
 		/*
@@ -677,6 +714,7 @@ static void __blk_mq_requeue_request(struct request *rq)
 	blk_mq_sched_requeue_request(rq);
 
 	if (test_and_clear_bit(REQ_ATOM_STARTED, &rq->atomic_flags)) {
+		blk_mq_rq_update_state(rq, MQ_RQ_IDLE);
 		if (q->dma_drain_size && blk_rq_bytes(rq))
 			rq->nr_phys_segments--;
 	}
@@ -774,6 +812,7 @@ EXPORT_SYMBOL(blk_mq_tag_to_rq);
 struct blk_mq_timeout_data {
 	unsigned long next;
 	unsigned int next_set;
+	unsigned int nr_expired;
 };
 
 void blk_mq_rq_timed_out(struct request *req, bool reserved)
@@ -801,6 +840,12 @@ void blk_mq_rq_timed_out(struct request *req, bool reserved)
 		__blk_mq_complete_request(req);
 		break;
 	case BLK_EH_RESET_TIMER:
+		/*
+		 * As nothing prevents from completion happening while
+		 * ->aborted_gstate is set, this may lead to ignored
+		 * completions and further spurious timeouts.
+		 */
+		blk_mq_rq_update_aborted_gstate(req, 0);
 		blk_add_timer(req);
 		blk_clear_rq_complete(req);
 		break;
@@ -816,50 +861,51 @@ static void blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
 		struct request *rq, void *priv, bool reserved)
 {
 	struct blk_mq_timeout_data *data = priv;
-	unsigned long deadline;
+	unsigned long gstate, deadline;
+	int start;
+
+	might_sleep();
 
 	if (!test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
 		return;
 
-	/*
-	 * Ensures that if we see STARTED we must also see our
-	 * up-to-date deadline, see blk_mq_start_request().
-	 */
-	smp_rmb();
-
-	deadline = READ_ONCE(rq->deadline);
+	/* read coherent snapshots of @rq->state_gen and @rq->deadline */
+	while (true) {
+		start = read_seqcount_begin(&rq->gstate_seq);
+		gstate = READ_ONCE(rq->gstate);
+		deadline = rq->deadline;
+		if (!read_seqcount_retry(&rq->gstate_seq, start))
+			break;
+		cond_resched();
+	}
 
-	/*
-	 * The rq being checked may have been freed and reallocated
-	 * out already here, we avoid this race by checking rq->deadline
-	 * and REQ_ATOM_COMPLETE flag together:
-	 *
-	 * - if rq->deadline is observed as new value because of
-	 *   reusing, the rq won't be timed out because of timing.
-	 * - if rq->deadline is observed as previous value,
-	 *   REQ_ATOM_COMPLETE flag won't be cleared in reuse path
-	 *   because we put a barrier between setting rq->deadline
-	 *   and clearing the flag in blk_mq_start_request(), so
-	 *   this rq won't be timed out too.
-	 */
-	if (time_after_eq(jiffies, deadline)) {
-		if (!blk_mark_rq_complete(rq)) {
-			/*
-			 * Again coherence order ensures that consecutive reads
-			 * from the same variable must be in that order. This
-			 * ensures that if we see COMPLETE clear, we must then
-			 * see STARTED set and we'll ignore this timeout.
-			 *
-			 * (There's also the MB implied by the test_and_clear())
-			 */
-			blk_mq_rq_timed_out(rq, reserved);
-		}
+	/* if in-flight && overdue, mark for abortion */
+	if ((gstate & MQ_RQ_STATE_MASK) == MQ_RQ_IN_FLIGHT &&
+	    time_after_eq(jiffies, deadline)) {
+		blk_mq_rq_update_aborted_gstate(rq, gstate);
+		data->nr_expired++;
+		hctx->nr_expired++;
 	} else if (!data->next_set || time_after(data->next, deadline)) {
 		data->next = deadline;
 		data->next_set = 1;
 	}
 }
 
+static void blk_mq_terminate_expired(struct blk_mq_hw_ctx *hctx,
+		struct request *rq, void *priv, bool reserved)
+{
+	/*
+	 * We marked @rq->aborted_gstate and waited for RCU.  If there were
+	 * completions that we lost to, they would have finished and
+	 * updated @rq->gstate by now; otherwise, the completion path is
+	 * now guaranteed to see @rq->aborted_gstate and yield.  If
+	 * @rq->aborted_gstate still matches @rq->gstate, @rq is ours.
+	 */
+	if (READ_ONCE(rq->gstate) == rq->aborted_gstate &&
+	    !blk_mark_rq_complete(rq))
+		blk_mq_rq_timed_out(rq, reserved);
+}
+
 static void blk_mq_timeout_work(struct work_struct *work)
 {
 	struct request_queue *q =
@@ -867,7 +913,9 @@ static void blk_mq_timeout_work(struct work_struct *work)
 	struct blk_mq_timeout_data data = {
 		.next		= 0,
 		.next_set	= 0,
+		.nr_expired	= 0,
 	};
+	struct blk_mq_hw_ctx *hctx;
 	int i;
 
 	/* A deadlock might occur if a request is stuck requiring a
@@ -886,14 +934,40 @@ static void blk_mq_timeout_work(struct work_struct *work)
 	if (!percpu_ref_tryget(&q->q_usage_counter))
 		return;
 
+	/* scan for the expired ones and set their ->aborted_gstate */
 	blk_mq_queue_tag_busy_iter(q, blk_mq_check_expired, &data);
 
+	if (data.nr_expired) {
+		bool has_rcu = false;
+
+		/*
+		 * Wait till everyone sees ->aborted_gstate.  The
+		 * sequential waits for SRCUs aren't ideal.  If this ever
+		 * becomes a problem, we can add per-hw_ctx rcu_head and
+		 * wait in parallel.
+		 */
+		queue_for_each_hw_ctx(q, hctx, i) {
+			if (!hctx->nr_expired)
+				continue;
+
+			if (!(hctx->flags & BLK_MQ_F_BLOCKING))
+				has_rcu = true;
+			else
+				synchronize_srcu(hctx->queue_rq_srcu);
+
+			hctx->nr_expired = 0;
+		}
+		if (has_rcu)
+			synchronize_rcu();
+
+		/* terminate the ones we won */
+		blk_mq_queue_tag_busy_iter(q, blk_mq_terminate_expired, NULL);
+	}
+
 	if (data.next_set) {
 		data.next = blk_rq_timeout(round_jiffies_up(data.next));
 		mod_timer(&q->timeout, data.next);
 	} else {
-		struct blk_mq_hw_ctx *hctx;
-
 		queue_for_each_hw_ctx(q, hctx, i) {
 			/* the hctx may be unmapped, so check it here */
 			if (blk_mq_hw_queue_mapped(hctx))
@@ -1893,6 +1967,22 @@ static size_t order_to_size(unsigned int order)
 	return (size_t)PAGE_SIZE << order;
 }
 
+static int blk_mq_init_request(struct blk_mq_tag_set *set, struct request *rq,
+			       unsigned int hctx_idx, int node)
+{
+	int ret;
+
+	if (set->ops->init_request) {
+		ret = set->ops->init_request(set, rq, hctx_idx, node);
+		if (ret)
+			return ret;
+	}
+
+	seqcount_init(&rq->gstate_seq);
+	u64_stats_init(&rq->aborted_gstate_sync);
+	return 0;
+}
+
 int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 		     unsigned int hctx_idx, unsigned int depth)
 {
@@ -1954,12 +2044,9 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 			struct request *rq = p;
 
 			tags->static_rqs[i] = rq;
-			if (set->ops->init_request) {
-				if (set->ops->init_request(set, rq, hctx_idx,
-						node)) {
-					tags->static_rqs[i] = NULL;
-					goto fail;
-				}
+			if (blk_mq_init_request(set, rq, hctx_idx, node)) {
+				tags->static_rqs[i] = NULL;
+				goto fail;
 			}
 
 			p += rq_size;
@@ -2098,9 +2185,7 @@ static int blk_mq_init_hctx(struct request_queue *q,
 	if (!hctx->fq)
 		goto sched_exit_hctx;
 
-	if (set->ops->init_request &&
-	    set->ops->init_request(set, hctx->fq->flush_rq, hctx_idx,
-				   node))
+	if (blk_mq_init_request(set, hctx->fq->flush_rq, hctx_idx, node))
 		goto free_fq;
 
 	if (hctx->flags & BLK_MQ_F_BLOCKING)
@@ -3018,12 +3103,6 @@ static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
 
 static int __init blk_mq_init(void)
 {
-	/*
-	 * See comment in block/blk.h rq_atomic_flags enum
-	 */
-	BUILD_BUG_ON((REQ_ATOM_STARTED / BITS_PER_BYTE) !=
-			(REQ_ATOM_COMPLETE / BITS_PER_BYTE));
-
 	cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
 				blk_mq_hctx_notify_dead);
 	return 0;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 6c7c3ff..cf01f6f 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -27,6 +27,19 @@ struct blk_mq_ctx {
 	struct kobject		kobj;
 } ____cacheline_aligned_in_smp;
 
+/*
+ * Bits for request->gstate.  The lower two bits carry MQ_RQ_* state value
+ * and the upper bits the generation number.
+ */
+enum mq_rq_state {
+	MQ_RQ_IDLE		= 0,
+	MQ_RQ_IN_FLIGHT		= 1,
+
+	MQ_RQ_STATE_BITS	= 2,
+	MQ_RQ_STATE_MASK	= (1 << MQ_RQ_STATE_BITS) - 1,
+	MQ_RQ_GEN_INC		= 1 << MQ_RQ_STATE_BITS,
+};
+
 void blk_mq_freeze_queue(struct request_queue *q);
 void blk_mq_free_queue(struct request_queue *q);
 int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr);
@@ -85,6 +98,39 @@ extern void blk_mq_rq_timed_out(struct request *req, bool reserved);
 
 void blk_mq_release(struct request_queue *q);
 
+/**
+ * blk_mq_rq_state() - read the current MQ_RQ_* state of a request
+ * @rq: target request.
+ */
+static inline int blk_mq_rq_state(struct request *rq)
+{
+	return READ_ONCE(rq->gstate) & MQ_RQ_STATE_MASK;
+}
+
+/**
+ * blk_mq_rq_update_state() - set the current MQ_RQ_* state of a request
+ * @rq: target request.
+ * @state: new state to set.
+ *
+ * Set @rq's state to @state.  The caller is responsible for ensuring that
+ * there are no other updaters.  A request can transition into IN_FLIGHT
+ * only from IDLE and doing so increments the generation number.
+ */
+static inline void blk_mq_rq_update_state(struct request *rq,
+					  enum mq_rq_state state)
+{
+	u64 old_val = READ_ONCE(rq->gstate);
+	u64 new_val = (old_val & ~MQ_RQ_STATE_MASK) | state;
+
+	if (state == MQ_RQ_IN_FLIGHT) {
+		WARN_ON_ONCE((old_val & MQ_RQ_STATE_MASK) != MQ_RQ_IDLE);
+		new_val += MQ_RQ_GEN_INC;
+	}
+
+	/* avoid exposing interim values */
+	WRITE_ONCE(rq->gstate, new_val);
+}
+
 static inline struct blk_mq_ctx *__blk_mq_get_ctx(struct request_queue *q,
 					   unsigned int cpu)
 {
diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index 764ecf9..6427be7 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -208,7 +208,7 @@ void blk_add_timer(struct request *req)
 	if (!req->timeout)
 		req->timeout = q->rq_timeout;
 
-	WRITE_ONCE(req->deadline, jiffies + req->timeout);
+	req->deadline = jiffies + req->timeout;
 
 	/*
 	 * Only the non-mq case needs to add the request to a protected list.
diff --git a/block/blk.h b/block/blk.h
index 3f14469..9cb2739 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -123,12 +123,6 @@ void blk_account_io_done(struct request *req);
  * Internal atomic flags for request handling
  */
 enum rq_atomic_flags {
-	/*
-	 * Keep these two bits first - not because we depend on the
-	 * value of them, but we do depend on them being in the same
-	 * byte of storage to ensure ordering on writes. Keeping them
-	 * first will achieve that nicely.
-	 */
 	REQ_ATOM_COMPLETE = 0,
 	REQ_ATOM_STARTED,
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 95c9a5c..460798d 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -51,6 +51,7 @@ struct blk_mq_hw_ctx {
 	unsigned int		queue_num;
 
 	atomic_t		nr_active;
+	unsigned int		nr_expired;
 
 	struct hlist_node	cpuhp_dead;
 	struct kobject		kobj;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 46e606f..2206607 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -27,6 +27,8 @@
 #include <linux/percpu-refcount.h>
 #include <linux/scatterlist.h>
 #include <linux/blkzoned.h>
+#include <linux/seqlock.h>
+#include <linux/u64_stats_sync.h>
 
 struct module;
 struct scsi_ioctl_command;
@@ -230,6 +232,27 @@ struct request {
 
 	unsigned short write_hint;
 
+	/*
+	 * On blk-mq, the lower bits of ->gstate carry the MQ_RQ_* state
+	 * value and the upper bits the generation number which is
+	 * monotonically incremented and used to distinguish the reuse
+	 * instances.
+	 *
+	 * ->gstate_seq allows updates to ->gstate and other fields
+	 * (currently ->deadline) during request start to be read
+	 * atomically from the timeout path, so that it can operate on a
+	 * coherent set of information.
+	 */
+	seqcount_t gstate_seq;
+	u64 gstate;
+
+	/*
+	 * ->aborted_gstate is used by the timeout to claim a specific
+	 * recycle instance of this request.  See blk_mq_timeout_work().
+	 */
+	struct u64_stats_sync aborted_gstate_sync;
+	u64 aborted_gstate;
+
 	unsigned long deadline;
 	struct list_head timeout_list;
 
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 4/8] blk-mq: use blk_mq_rq_state() instead of testing REQ_ATOM_COMPLETE
  2018-01-08 19:15 [PATCHSET v4] blk-mq: reimplement timeout handling Tejun Heo
                   ` (2 preceding siblings ...)
  2018-01-08 19:15 ` [PATCH 3/8] blk-mq: replace timeout synchronization with a RCU and generation based scheme Tejun Heo
@ 2018-01-08 19:15 ` Tejun Heo
  2018-01-08 22:03     ` Bart Van Assche
  2018-01-08 19:15 ` [PATCH 5/8] blk-mq: make blk_abort_request() trigger timeout path Tejun Heo
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 41+ messages in thread
From: Tejun Heo @ 2018-01-08 19:15 UTC (permalink / raw)
  To: jack, axboe, clm, jbacik
  Cc: kernel-team, linux-kernel, linux-btrfs, peterz, jianchao.w.wang,
	Bart.VanAssche, linux-block, Tejun Heo

blk_mq_check_inflight() and blk_mq_poll_hybrid_sleep() test
REQ_ATOM_COMPLETE to determine the request state.  Both uses are
speculative and we can test REQ_ATOM_STARTED and blk_mq_rq_state() for
equivalent results.  Replace the tests.  This will allow removing
REQ_ATOM_COMPLETE usages from blk-mq.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-mq.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6587f0c..41bfd27 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -95,8 +95,7 @@ static void blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx,
 {
 	struct mq_inflight *mi = priv;
 
-	if (test_bit(REQ_ATOM_STARTED, &rq->atomic_flags) &&
-	    !test_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags)) {
+	if (blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT) {
 		/*
 		 * index[0] counts the specific partition that was asked
 		 * for. index[1] counts the ones that are active on the
@@ -3017,7 +3016,8 @@ static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
 
 	hrtimer_init_sleeper(&hs, current);
 	do {
-		if (test_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags))
+		if (test_bit(REQ_ATOM_STARTED, &rq->atomic_flags) &&
+		    blk_mq_rq_state(rq) != MQ_RQ_IN_FLIGHT)
 			break;
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		hrtimer_start_expires(&hs.timer, mode);
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 5/8] blk-mq: make blk_abort_request() trigger timeout path
  2018-01-08 19:15 [PATCHSET v4] blk-mq: reimplement timeout handling Tejun Heo
                   ` (3 preceding siblings ...)
  2018-01-08 19:15 ` [PATCH 4/8] blk-mq: use blk_mq_rq_state() instead of testing REQ_ATOM_COMPLETE Tejun Heo
@ 2018-01-08 19:15 ` Tejun Heo
  2018-01-08 22:10     ` Bart Van Assche
  2018-01-08 19:15 ` [PATCH 6/8] blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mq Tejun Heo
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 41+ messages in thread
From: Tejun Heo @ 2018-01-08 19:15 UTC (permalink / raw)
  To: jack, axboe, clm, jbacik
  Cc: kernel-team, linux-kernel, linux-btrfs, peterz, jianchao.w.wang,
	Bart.VanAssche, linux-block, Tejun Heo, Asai Thambi SP,
	Stefan Haberland, Jan Hoeppner

With issue/complete and timeout paths now using the generation number
and state based synchronization, blk_abort_request() is the only one
which depends on REQ_ATOM_COMPLETE for arbitrating completion.

There's no reason for blk_abort_request() to be a completely separate
path.  This patch makes blk_abort_request() piggyback on the timeout
path instead of trying to terminate the request directly.

This removes the last dependency on REQ_ATOM_COMPLETE in blk-mq.

Note that this makes blk_abort_request() asynchronous: it initiates
abortion but the actual termination will happen after a short while,
even when the caller owns the request.  AFAICS, SCSI and ATA should be
fine with that, and I think mtip32xx and dasd should be safe too, but
I'm not completely sure.  It'd be great if people who know those
drivers could take a look.
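
From a caller's point of view the new behavior is roughly the
following.  This is only an illustrative sketch of the changed
semantics, not code from the series:

  blk_abort_request(rq);
  /*
   * The request is NOT terminated at this point.  Its deadline has
   * been set to "now" and the queue's timeout timer kicked, so the
   * timeout work will run the ->timeout handling for this request
   * shortly.  Callers must not assume the request is dead on return.
   */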

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Asai Thambi SP <asamymuthupa@micron.com>
Cc: Stefan Haberland <sth@linux.vnet.ibm.com>
Cc: Jan Hoeppner <hoeppner@linux.vnet.ibm.com>
---
 block/blk-mq.c      | 2 +-
 block/blk-mq.h      | 2 --
 block/blk-timeout.c | 8 ++++----
 3 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 41bfd27..bed4cb8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -814,7 +814,7 @@ struct blk_mq_timeout_data {
 	unsigned int nr_expired;
 };
 
-void blk_mq_rq_timed_out(struct request *req, bool reserved)
+static void blk_mq_rq_timed_out(struct request *req, bool reserved)
 {
 	const struct blk_mq_ops *ops = req->q->mq_ops;
 	enum blk_eh_timer_return ret = BLK_EH_RESET_TIMER;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index cf01f6f..6b2d616 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -94,8 +94,6 @@ extern int blk_mq_sysfs_register(struct request_queue *q);
 extern void blk_mq_sysfs_unregister(struct request_queue *q);
 extern void blk_mq_hctx_kobj_init(struct blk_mq_hw_ctx *hctx);
 
-extern void blk_mq_rq_timed_out(struct request *req, bool reserved);
-
 void blk_mq_release(struct request_queue *q);
 
 /**
diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index 6427be7..d580af3 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -156,12 +156,12 @@ void blk_timeout_work(struct work_struct *work)
  */
 void blk_abort_request(struct request *req)
 {
-	if (blk_mark_rq_complete(req))
-		return;
-
 	if (req->q->mq_ops) {
-		blk_mq_rq_timed_out(req, false);
+		req->deadline = jiffies;
+		mod_timer(&req->q->timeout, 0);
 	} else {
+		if (blk_mark_rq_complete(req))
+			return;
 		blk_delete_timer(req);
 		blk_rq_timed_out(req);
 	}
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 6/8] blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mq
  2018-01-08 19:15 [PATCHSET v4] blk-mq: reimplement timeout handling Tejun Heo
                   ` (4 preceding siblings ...)
  2018-01-08 19:15 ` [PATCH 5/8] blk-mq: make blk_abort_request() trigger timeout path Tejun Heo
@ 2018-01-08 19:15 ` Tejun Heo
  2018-01-08 23:36     ` Bart Van Assche
  2018-01-08 19:15 ` [PATCH 7/8] blk-mq: remove REQ_ATOM_STARTED Tejun Heo
  2018-01-08 19:15 ` [PATCH 8/8] blk-mq: rename blk_mq_hw_ctx->queue_rq_srcu to ->srcu Tejun Heo
  7 siblings, 1 reply; 41+ messages in thread
From: Tejun Heo @ 2018-01-08 19:15 UTC (permalink / raw)
  To: jack, axboe, clm, jbacik
  Cc: kernel-team, linux-kernel, linux-btrfs, peterz, jianchao.w.wang,
	Bart.VanAssche, linux-block, Tejun Heo

After the recent updates to use generation number and state based
synchronization, blk-mq no longer depends on REQ_ATOM_COMPLETE except
to avoid firing the same timeout multiple times.

Remove all REQ_ATOM_COMPLETE usages and use a new rq_flags flag,
RQF_MQ_TIMEOUT_EXPIRED, for that purpose instead.  This also removes
atomic bitops from the hot paths.

v2: Removed blk_clear_rq_complete() from blk_mq_rq_timed_out().

v3: Added RQF_MQ_TIMEOUT_EXPIRED flag.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: "jianchao.wang" <jianchao.w.wang@oracle.com>
---
 block/blk-mq.c         | 15 +++++++--------
 block/blk-timeout.c    |  1 +
 include/linux/blkdev.h |  2 ++
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index bed4cb8..050d982 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -628,8 +628,7 @@ void blk_mq_complete_request(struct request *rq)
 	 * hctx_lock() covers both issue and completion paths.
 	 */
 	hctx_lock(hctx, &srcu_idx);
-	if (blk_mq_rq_aborted_gstate(rq) != rq->gstate &&
-	    !blk_mark_rq_complete(rq))
+	if (blk_mq_rq_aborted_gstate(rq) != rq->gstate)
 		__blk_mq_complete_request(rq);
 	hctx_unlock(hctx, srcu_idx);
 }
@@ -679,8 +678,6 @@ void blk_mq_start_request(struct request *rq)
 	preempt_enable();
 
 	set_bit(REQ_ATOM_STARTED, &rq->atomic_flags);
-	if (test_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags))
-		clear_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
 
 	if (q->dma_drain_size && blk_rq_bytes(rq)) {
 		/*
@@ -831,6 +828,8 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved)
 	if (!test_bit(REQ_ATOM_STARTED, &req->atomic_flags))
 		return;
 
+	req->rq_flags |= RQF_MQ_TIMEOUT_EXPIRED;
+
 	if (ops->timeout)
 		ret = ops->timeout(req, reserved);
 
@@ -846,7 +845,6 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved)
 		 */
 		blk_mq_rq_update_aborted_gstate(req, 0);
 		blk_add_timer(req);
-		blk_clear_rq_complete(req);
 		break;
 	case BLK_EH_NOT_HANDLED:
 		break;
@@ -865,7 +863,8 @@ static void blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
 
 	might_sleep();
 
-	if (!test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
+	if ((rq->rq_flags & RQF_MQ_TIMEOUT_EXPIRED) ||
+	    !test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
 		return;
 
 	/* read coherent snapshots of @rq->state_gen and @rq->deadline */
@@ -900,8 +899,8 @@ static void blk_mq_terminate_expired(struct blk_mq_hw_ctx *hctx,
 	 * now guaranteed to see @rq->aborted_gstate and yield.  If
 	 * @rq->aborted_gstate still matches @rq->gstate, @rq is ours.
 	 */
-	if (READ_ONCE(rq->gstate) == rq->aborted_gstate &&
-	    !blk_mark_rq_complete(rq))
+	if (!(rq->rq_flags & RQF_MQ_TIMEOUT_EXPIRED) &&
+	    READ_ONCE(rq->gstate) == rq->aborted_gstate)
 		blk_mq_rq_timed_out(rq, reserved);
 }
 
diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index d580af3..25c4ffa 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -209,6 +209,7 @@ void blk_add_timer(struct request *req)
 		req->timeout = q->rq_timeout;
 
 	req->deadline = jiffies + req->timeout;
+	req->rq_flags &= ~RQF_MQ_TIMEOUT_EXPIRED;
 
 	/*
 	 * Only the non-mq case needs to add the request to a protected list.
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 2206607..e2a4afc 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -125,6 +125,8 @@ typedef __u32 __bitwise req_flags_t;
 #define RQF_SPECIAL_PAYLOAD	((__force req_flags_t)(1 << 18))
 /* The per-zone write lock is held for this request */
 #define RQF_ZONE_WRITE_LOCKED	((__force req_flags_t)(1 << 19))
+/* timeout is expired */
+#define RQF_MQ_TIMEOUT_EXPIRED	((__force req_flags_t)(1 << 20))
 
 /* flags that prevent us from merging requests: */
 #define RQF_NOMERGE_FLAGS \
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 7/8] blk-mq: remove REQ_ATOM_STARTED
  2018-01-08 19:15 [PATCHSET v4] blk-mq: reimplement timeout handling Tejun Heo
                   ` (5 preceding siblings ...)
  2018-01-08 19:15 ` [PATCH 6/8] blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mq Tejun Heo
@ 2018-01-08 19:15 ` Tejun Heo
  2018-01-08 23:47     ` Bart Van Assche
  2018-01-08 19:15 ` [PATCH 8/8] blk-mq: rename blk_mq_hw_ctx->queue_rq_srcu to ->srcu Tejun Heo
  7 siblings, 1 reply; 41+ messages in thread
From: Tejun Heo @ 2018-01-08 19:15 UTC (permalink / raw)
  To: jack, axboe, clm, jbacik
  Cc: kernel-team, linux-kernel, linux-btrfs, peterz, jianchao.w.wang,
	Bart.VanAssche, linux-block, Tejun Heo

After the recent updates to use generation number and state based
synchronization, we can easily replace the REQ_ATOM_STARTED usages by
adding an extra state that distinguishes requests which have been
completed but not yet freed.

Add MQ_RQ_COMPLETE and replace REQ_ATOM_STARTED usages with
blk_mq_rq_state() tests.  REQ_ATOM_STARTED no longer has any users
left and is removed.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-mq-debugfs.c |  4 +---
 block/blk-mq.c         | 37 ++++++++-----------------------------
 block/blk-mq.h         |  1 +
 block/blk.h            |  1 -
 4 files changed, 10 insertions(+), 33 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index b56a4f3..8adc837 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -271,7 +271,6 @@ static const char *const cmd_flag_name[] = {
 #define RQF_NAME(name) [ilog2((__force u32)RQF_##name)] = #name
 static const char *const rqf_name[] = {
 	RQF_NAME(SORTED),
-	RQF_NAME(STARTED),
 	RQF_NAME(QUEUED),
 	RQF_NAME(SOFTBARRIER),
 	RQF_NAME(FLUSH_SEQ),
@@ -295,7 +294,6 @@ static const char *const rqf_name[] = {
 #define RQAF_NAME(name) [REQ_ATOM_##name] = #name
 static const char *const rqaf_name[] = {
 	RQAF_NAME(COMPLETE),
-	RQAF_NAME(STARTED),
 	RQAF_NAME(POLL_SLEPT),
 };
 #undef RQAF_NAME
@@ -409,7 +407,7 @@ static void hctx_show_busy_rq(struct request *rq, void *data, bool reserved)
 	const struct show_busy_params *params = data;
 
 	if (blk_mq_map_queue(rq->q, rq->mq_ctx->cpu) == params->hctx &&
-	    test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
+	    blk_mq_rq_state(rq) != MQ_RQ_IDLE)
 		__blk_mq_debugfs_rq_show(params->m,
 					 list_entry_rq(&rq->queuelist));
 }
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 050d982..54ecb2c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -483,7 +483,6 @@ void blk_mq_free_request(struct request *rq)
 		blk_put_rl(blk_rq_rl(rq));
 
 	blk_mq_rq_update_state(rq, MQ_RQ_IDLE);
-	clear_bit(REQ_ATOM_STARTED, &rq->atomic_flags);
 	clear_bit(REQ_ATOM_POLL_SLEPT, &rq->atomic_flags);
 	if (rq->tag != -1)
 		blk_mq_put_tag(hctx, hctx->tags, ctx, rq->tag);
@@ -531,6 +530,7 @@ static void __blk_mq_complete_request(struct request *rq)
 	int cpu;
 
 	WARN_ON_ONCE(blk_mq_rq_state(rq) != MQ_RQ_IN_FLIGHT);
+	blk_mq_rq_update_state(rq, MQ_RQ_COMPLETE);
 
 	if (rq->internal_tag != -1)
 		blk_mq_sched_completed_request(rq);
@@ -636,7 +636,7 @@ EXPORT_SYMBOL(blk_mq_complete_request);
 
 int blk_mq_request_started(struct request *rq)
 {
-	return test_bit(REQ_ATOM_STARTED, &rq->atomic_flags);
+	return blk_mq_rq_state(rq) != MQ_RQ_IDLE;
 }
 EXPORT_SYMBOL_GPL(blk_mq_request_started);
 
@@ -655,7 +655,6 @@ void blk_mq_start_request(struct request *rq)
 	}
 
 	WARN_ON_ONCE(blk_mq_rq_state(rq) != MQ_RQ_IDLE);
-	WARN_ON_ONCE(test_bit(REQ_ATOM_STARTED, &rq->atomic_flags));
 
 	/*
 	 * Mark @rq in-flight which also advances the generation number,
@@ -677,8 +676,6 @@ void blk_mq_start_request(struct request *rq)
 	write_seqcount_end(&rq->gstate_seq);
 	preempt_enable();
 
-	set_bit(REQ_ATOM_STARTED, &rq->atomic_flags);
-
 	if (q->dma_drain_size && blk_rq_bytes(rq)) {
 		/*
 		 * Make sure space for the drain appears.  We know we can do
@@ -691,13 +688,9 @@ void blk_mq_start_request(struct request *rq)
 EXPORT_SYMBOL(blk_mq_start_request);
 
 /*
- * When we reach here because queue is busy, REQ_ATOM_COMPLETE
- * flag isn't set yet, so there may be race with timeout handler,
- * but given rq->deadline is just set in .queue_rq() under
- * this situation, the race won't be possible in reality because
- * rq->timeout should be set as big enough to cover the window
- * between blk_mq_start_request() called from .queue_rq() and
- * clearing REQ_ATOM_STARTED here.
+ * When we reach here because queue is busy, it's safe to change the state
+ * to IDLE without checking @rq->aborted_gstate because we should still be
+ * holding the RCU read lock and thus protected against timeout.
  */
 static void __blk_mq_requeue_request(struct request *rq)
 {
@@ -709,7 +702,7 @@ static void __blk_mq_requeue_request(struct request *rq)
 	wbt_requeue(q->rq_wb, &rq->issue_stat);
 	blk_mq_sched_requeue_request(rq);
 
-	if (test_and_clear_bit(REQ_ATOM_STARTED, &rq->atomic_flags)) {
+	if (blk_mq_rq_state(rq) != MQ_RQ_IDLE) {
 		blk_mq_rq_update_state(rq, MQ_RQ_IDLE);
 		if (q->dma_drain_size && blk_rq_bytes(rq))
 			rq->nr_phys_segments--;
@@ -816,18 +809,6 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved)
 	const struct blk_mq_ops *ops = req->q->mq_ops;
 	enum blk_eh_timer_return ret = BLK_EH_RESET_TIMER;
 
-	/*
-	 * We know that complete is set at this point. If STARTED isn't set
-	 * anymore, then the request isn't active and the "timeout" should
-	 * just be ignored. This can happen due to the bitflag ordering.
-	 * Timeout first checks if STARTED is set, and if it is, assumes
-	 * the request is active. But if we race with completion, then
-	 * both flags will get cleared. So check here again, and ignore
-	 * a timeout event with a request that isn't active.
-	 */
-	if (!test_bit(REQ_ATOM_STARTED, &req->atomic_flags))
-		return;
-
 	req->rq_flags |= RQF_MQ_TIMEOUT_EXPIRED;
 
 	if (ops->timeout)
@@ -863,8 +844,7 @@ static void blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
 
 	might_sleep();
 
-	if ((rq->rq_flags & RQF_MQ_TIMEOUT_EXPIRED) ||
-	    !test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
+	if (rq->rq_flags & RQF_MQ_TIMEOUT_EXPIRED)
 		return;
 
 	/* read coherent snapshots of @rq->state_gen and @rq->deadline */
@@ -3015,8 +2995,7 @@ static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
 
 	hrtimer_init_sleeper(&hs, current);
 	do {
-		if (test_bit(REQ_ATOM_STARTED, &rq->atomic_flags) &&
-		    blk_mq_rq_state(rq) != MQ_RQ_IN_FLIGHT)
+		if (blk_mq_rq_state(rq) == MQ_RQ_COMPLETE)
 			break;
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		hrtimer_start_expires(&hs.timer, mode);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 6b2d616..8591a54 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -34,6 +34,7 @@ struct blk_mq_ctx {
 enum mq_rq_state {
 	MQ_RQ_IDLE		= 0,
 	MQ_RQ_IN_FLIGHT		= 1,
+	MQ_RQ_COMPLETE		= 2,
 
 	MQ_RQ_STATE_BITS	= 2,
 	MQ_RQ_STATE_MASK	= (1 << MQ_RQ_STATE_BITS) - 1,
diff --git a/block/blk.h b/block/blk.h
index 9cb2739..a68dbe3 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -124,7 +124,6 @@ void blk_account_io_done(struct request *req);
  */
 enum rq_atomic_flags {
 	REQ_ATOM_COMPLETE = 0,
-	REQ_ATOM_STARTED,
 
 	REQ_ATOM_POLL_SLEPT,
 };
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 8/8] blk-mq: rename blk_mq_hw_ctx->queue_rq_srcu to ->srcu
  2018-01-08 19:15 [PATCHSET v4] blk-mq: reimplement timeout handling Tejun Heo
                   ` (6 preceding siblings ...)
  2018-01-08 19:15 ` [PATCH 7/8] blk-mq: remove REQ_ATOM_STARTED Tejun Heo
@ 2018-01-08 19:15 ` Tejun Heo
  2018-01-08 23:48     ` Bart Van Assche
  7 siblings, 1 reply; 41+ messages in thread
From: Tejun Heo @ 2018-01-08 19:15 UTC (permalink / raw)
  To: jack, axboe, clm, jbacik
  Cc: kernel-team, linux-kernel, linux-btrfs, peterz, jianchao.w.wang,
	Bart.VanAssche, linux-block, Tejun Heo

The RCU protection has been expanded to cover both queueing and
completion paths making ->queue_rq_srcu a misnomer.  Rename it to
->srcu as suggested by Bart.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Bart Van Assche <Bart.VanAssche@wdc.com>
---
 block/blk-mq.c         | 14 +++++++-------
 include/linux/blk-mq.h |  2 +-
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 54ecb2c..071f396 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -219,7 +219,7 @@ void blk_mq_quiesce_queue(struct request_queue *q)
 
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (hctx->flags & BLK_MQ_F_BLOCKING)
-			synchronize_srcu(hctx->queue_rq_srcu);
+			synchronize_srcu(hctx->srcu);
 		else
 			rcu = true;
 	}
@@ -564,7 +564,7 @@ static void hctx_unlock(struct blk_mq_hw_ctx *hctx, int srcu_idx)
 	if (!(hctx->flags & BLK_MQ_F_BLOCKING))
 		rcu_read_unlock();
 	else
-		srcu_read_unlock(hctx->queue_rq_srcu, srcu_idx);
+		srcu_read_unlock(hctx->srcu, srcu_idx);
 }
 
 static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
@@ -572,7 +572,7 @@ static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
 	if (!(hctx->flags & BLK_MQ_F_BLOCKING))
 		rcu_read_lock();
 	else
-		*srcu_idx = srcu_read_lock(hctx->queue_rq_srcu);
+		*srcu_idx = srcu_read_lock(hctx->srcu);
 }
 
 static void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate)
@@ -931,7 +931,7 @@ static void blk_mq_timeout_work(struct work_struct *work)
 			if (!(hctx->flags & BLK_MQ_F_BLOCKING))
 				has_rcu = true;
 			else
-				synchronize_srcu(hctx->queue_rq_srcu);
+				synchronize_srcu(hctx->srcu);
 
 			hctx->nr_expired = 0;
 		}
@@ -2094,7 +2094,7 @@ static void blk_mq_exit_hctx(struct request_queue *q,
 		set->ops->exit_hctx(hctx, hctx_idx);
 
 	if (hctx->flags & BLK_MQ_F_BLOCKING)
-		cleanup_srcu_struct(hctx->queue_rq_srcu);
+		cleanup_srcu_struct(hctx->srcu);
 
 	blk_mq_remove_cpuhp(hctx);
 	blk_free_flush_queue(hctx->fq);
@@ -2167,7 +2167,7 @@ static int blk_mq_init_hctx(struct request_queue *q,
 		goto free_fq;
 
 	if (hctx->flags & BLK_MQ_F_BLOCKING)
-		init_srcu_struct(hctx->queue_rq_srcu);
+		init_srcu_struct(hctx->srcu);
 
 	blk_mq_debugfs_register_hctx(q, hctx);
 
@@ -2456,7 +2456,7 @@ static int blk_mq_hw_ctx_size(struct blk_mq_tag_set *tag_set)
 {
 	int hw_ctx_size = sizeof(struct blk_mq_hw_ctx);
 
-	BUILD_BUG_ON(ALIGN(offsetof(struct blk_mq_hw_ctx, queue_rq_srcu),
+	BUILD_BUG_ON(ALIGN(offsetof(struct blk_mq_hw_ctx, srcu),
 			   __alignof__(struct blk_mq_hw_ctx)) !=
 		     sizeof(struct blk_mq_hw_ctx));
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 460798d..8efcf49 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -66,7 +66,7 @@ struct blk_mq_hw_ctx {
 #endif
 
 	/* Must be the last member - see also blk_mq_hw_ctx_size(). */
-	struct srcu_struct	queue_rq_srcu[0];
+	struct srcu_struct	srcu[0];
 };
 
 struct blk_mq_tag_set {
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/8] blk-mq: move hctx lock/unlock into a helper
@ 2018-01-08 19:24     ` Bart Van Assche
  0 siblings, 0 replies; 41+ messages in thread
From: Bart Van Assche @ 2018-01-08 19:24 UTC (permalink / raw)
  To: jbacik, tj, jack, clm, axboe
  Cc: kernel-team, linux-kernel, peterz, linux-btrfs, linux-block,
	jianchao.w.wang

On Mon, 2018-01-08 at 11:15 -0800, Tejun Heo wrote:
> +static void hctx_unlock(struct blk_mq_hw_ctx *hctx, int srcu_idx)
> +{
> +	if (!(hctx->flags & BLK_MQ_F_BLOCKING))
> +		rcu_read_unlock();
> +	else
> +		srcu_read_unlock(hctx->queue_rq_srcu, srcu_idx);
> +}
> +
> +static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
> +{
> +	if (!(hctx->flags & BLK_MQ_F_BLOCKING))
> +		rcu_read_lock();
> +	else
> +		*srcu_idx = srcu_read_lock(hctx->queue_rq_srcu);
> +}

A minor comment: please consider reordering these two functions such that the
lock function appears first and the unlock function second. Anyway:

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/8] blk-mq: protect completion path with RCU
  2018-01-08 19:15 ` [PATCH 2/8] blk-mq: protect completion path with RCU Tejun Heo
@ 2018-01-08 19:57   ` Holger Hoffstätte
  2018-01-08 20:15     ` Jens Axboe
  2018-01-09  7:08   ` Hannes Reinecke
  2018-01-09 16:12     ` Bart Van Assche
  2 siblings, 1 reply; 41+ messages in thread
From: Holger Hoffstätte @ 2018-01-08 19:57 UTC (permalink / raw)
  To: Tejun Heo, jack, axboe, clm, jbacik
  Cc: kernel-team, linux-kernel, linux-btrfs, peterz, jianchao.w.wang,
	Bart.VanAssche, linux-block

On 01/08/18 20:15, Tejun Heo wrote:
> Currently, blk-mq protects only the issue path with RCU.  This patch
> puts the completion path under the same RCU protection.  This will be
> used to synchronize issue/completion against timeout by later patches,
> which will also add the comments.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
>  block/blk-mq.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index ddc9261..6741c3e 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -584,11 +584,16 @@ static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
>  void blk_mq_complete_request(struct request *rq)
>  {
>  	struct request_queue *q = rq->q;
> +	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, rq->mq_ctx->cpu);
> +	int srcu_idx;
>  
>  	if (unlikely(blk_should_fake_timeout(q)))
>  		return;
> +
> +	hctx_lock(hctx, &srcu_idx);
>  	if (!blk_mark_rq_complete(rq))
>  		__blk_mq_complete_request(rq);
> +	hctx_unlock(hctx, srcu_idx);
>  }
>  EXPORT_SYMBOL(blk_mq_complete_request);

So I've had v3 running fine with 4.14++ and when I first tried Jens'
additional helpers on top, I got a bunch of warnings which I didn't
investigate further at the time. Now they are back since the helpers
moved into patch #1, and #2 correctly says:

..
block/blk-mq.c: In function ‘blk_mq_complete_request’:
./include/linux/srcu.h:175:2: warning: ‘srcu_idx’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  __srcu_read_unlock(sp, idx);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~
block/blk-mq.c:587:6: note: ‘srcu_idx’ was declared here
  int srcu_idx;
      ^~~~~~~~
..etc.

This is with gcc 7.2.0.

I understand that this is a somewhat-false positive since the lock always
precedes the unlock & writes to the value, but can we properly initialize
or annotate this?

cheers
Holger

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/8] blk-mq: protect completion path with RCU
  2018-01-08 19:57   ` Holger Hoffstätte
@ 2018-01-08 20:15     ` Jens Axboe
  2018-01-08 22:55       ` Jens Axboe
  0 siblings, 1 reply; 41+ messages in thread
From: Jens Axboe @ 2018-01-08 20:15 UTC (permalink / raw)
  To: Holger Hoffstätte, Tejun Heo, jack, clm, jbacik
  Cc: kernel-team, linux-kernel, linux-btrfs, peterz, jianchao.w.wang,
	Bart.VanAssche, linux-block

On 1/8/18 12:57 PM, Holger Hoffstätte wrote:
> On 01/08/18 20:15, Tejun Heo wrote:
>> Currently, blk-mq protects only the issue path with RCU.  This patch
>> puts the completion path under the same RCU protection.  This will be
>> used to synchronize issue/completion against timeout by later patches,
>> which will also add the comments.
>>
>> Signed-off-by: Tejun Heo <tj@kernel.org>
>> ---
>>  block/blk-mq.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index ddc9261..6741c3e 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -584,11 +584,16 @@ static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
>>  void blk_mq_complete_request(struct request *rq)
>>  {
>>  	struct request_queue *q = rq->q;
>> +	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, rq->mq_ctx->cpu);
>> +	int srcu_idx;
>>  
>>  	if (unlikely(blk_should_fake_timeout(q)))
>>  		return;
>> +
>> +	hctx_lock(hctx, &srcu_idx);
>>  	if (!blk_mark_rq_complete(rq))
>>  		__blk_mq_complete_request(rq);
>> +	hctx_unlock(hctx, srcu_idx);
>>  }
>>  EXPORT_SYMBOL(blk_mq_complete_request);
> 
> So I've had v3 running fine with 4.14++ and when I first tried Jens'
> additional helpers on top, I got a bunch of warnings which I didn't
> investigate further at the time. Now they are back since the helpers
> moved into patch #1 and #2 correctly says:
> 
> ..
> block/blk-mq.c: In function ‘blk_mq_complete_request’:
> ./include/linux/srcu.h:175:2: warning: ‘srcu_idx’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>   __srcu_read_unlock(sp, idx);
>   ^~~~~~~~~~~~~~~~~~~~~~~~~~~
> block/blk-mq.c:587:6: note: ‘srcu_idx’ was declared here
>   int srcu_idx;
>       ^~~~~~~~
> ..etc.
> 
> This is with gcc 7.2.0.
> 
> I understand that this is a somewhat-false positive since the lock always
> precedes the unlock & writes to the value, but can we properly initialize
> or annotate this?

It's not a somewhat false positive, it's a false positive. I haven't seen
that bogus warning with the compiler I'm running:

gcc (Ubuntu 7.2.0-1ubuntu1~16.04) 7.2.0

and

gcc (GCC) 7.2.0

Neither of them throw the warning.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/8] blk-mq: replace timeout synchronization with a RCU and generation based scheme
@ 2018-01-08 21:06     ` Bart Van Assche
  0 siblings, 0 replies; 41+ messages in thread
From: Bart Van Assche @ 2018-01-08 21:06 UTC (permalink / raw)
  To: jbacik, tj, jack, clm, axboe
  Cc: kernel-team, linux-kernel, peterz, linux-btrfs, linux-block,
	jianchao.w.wang, hch

On Mon, 2018-01-08 at 11:15 -0800, Tejun Heo wrote:
> +static void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate)
> +{
> +	unsigned long flags;
> +
> +	local_irq_save(flags);
> +	u64_stats_update_begin(&rq->aborted_gstate_sync);
> +	rq->aborted_gstate = gstate;
> +	u64_stats_update_end(&rq->aborted_gstate_sync);
> +	local_irq_restore(flags);
> +}

Please add a comment that explains the purpose of local_irq_save() and
local_irq_restore(). Please also explain why you chose to disable interrupts
instead of disabling preemption. I think that disabling preemption should be
sufficient since this is the only code that updates rq->aborted_gstate and
since this function is always called from thread context.

> @@ -801,6 +840,12 @@ void blk_mq_rq_timed_out(struct request *req, bool reserved)
>  		__blk_mq_complete_request(req);
>  		break;
>  	case BLK_EH_RESET_TIMER:
> +		/*
> +		 * As nothing prevents from completion happening while
> +		 * ->aborted_gstate is set, this may lead to ignored
> +		 * completions and further spurious timeouts.
> +		 */
> +		blk_mq_rq_update_aborted_gstate(req, 0);
>  		blk_add_timer(req);
>  		blk_clear_rq_complete(req);
>  		break;

Is the race that the comment refers to addressed by one of the later patches?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 4/8] blk-mq: use blk_mq_rq_state() instead of testing REQ_ATOM_COMPLETE
@ 2018-01-08 22:03     ` Bart Van Assche
  0 siblings, 0 replies; 41+ messages in thread
From: Bart Van Assche @ 2018-01-08 22:03 UTC (permalink / raw)
  To: jbacik, tj, jack, clm, axboe
  Cc: kernel-team, linux-kernel, peterz, linux-btrfs, linux-block,
	jianchao.w.wang

On Mon, 2018-01-08 at 11:15 -0800, Tejun Heo wrote:
> blk_mq_check_inflight() and blk_mq_poll_hybrid_sleep() test
> REQ_ATOM_COMPLETE to determine the request state.  Both uses are
> speculative and we can test REQ_ATOM_STARTED and blk_mq_rq_state() for
> equivalent results.  Replace the tests.  This will allow removing
> REQ_ATOM_COMPLETE usages from blk-mq.

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 5/8] blk-mq: make blk_abort_request() trigger timeout path
@ 2018-01-08 22:10     ` Bart Van Assche
  0 siblings, 0 replies; 41+ messages in thread
From: Bart Van Assche @ 2018-01-08 22:10 UTC (permalink / raw)
  To: jbacik, tj, jack, clm, axboe
  Cc: hoeppner, linux-kernel, peterz, linux-block, sth, kernel-team,
	asamymuthupa, linux-btrfs, jianchao.w.wang

On Mon, 2018-01-08 at 11:15 -0800, Tejun Heo wrote:
> @@ -156,12 +156,12 @@ void blk_timeout_work(struct work_struct *work)
>   */
>  void blk_abort_request(struct request *req)
>  {
> -	if (blk_mark_rq_complete(req))
> -		return;
> -
>  	if (req->q->mq_ops) {
> -		blk_mq_rq_timed_out(req, false);
> +		req->deadline = jiffies;
> +		mod_timer(&req->q->timeout, 0);
>  	} else {
> +		if (blk_mark_rq_complete(req))
> +			return;
>  		blk_delete_timer(req);
>  		blk_rq_timed_out(req);
>  	}

Other req->deadline writes are protected by preempt_disable(),
write_seqcount_begin(&rq->gstate_seq), write_seqcount_end(&rq->gstate_seq)
and preempt_enable(). I think it's fine that the above req->deadline store
does not have that protection but I also think that that deserves a comment.
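(For reference, the protection being referred to is what patch 3/8 in this
series wraps around marking a request in-flight -- roughly the following,
with blk_add_timer() being the call that actually stores ->deadline:)

	preempt_disable();
	write_seqcount_begin(&rq->gstate_seq);

	blk_mq_rq_update_state(rq, MQ_RQ_IN_FLIGHT);
	blk_add_timer(rq);	/* sets rq->deadline */

	write_seqcount_end(&rq->gstate_seq);
	preempt_enable();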

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/8] blk-mq: protect completion path with RCU
  2018-01-08 20:15     ` Jens Axboe
@ 2018-01-08 22:55       ` Jens Axboe
  2018-01-08 23:27         ` Holger Hoffstätte
  0 siblings, 1 reply; 41+ messages in thread
From: Jens Axboe @ 2018-01-08 22:55 UTC (permalink / raw)
  To: Holger Hoffstätte, Tejun Heo, jack, clm, jbacik
  Cc: kernel-team, linux-kernel, linux-btrfs, peterz, jianchao.w.wang,
	Bart.VanAssche, linux-block

On 1/8/18 1:15 PM, Jens Axboe wrote:
> On 1/8/18 12:57 PM, Holger Hoffstätte wrote:
>> On 01/08/18 20:15, Tejun Heo wrote:
>>> Currently, blk-mq protects only the issue path with RCU.  This patch
>>> puts the completion path under the same RCU protection.  This will be
>>> used to synchronize issue/completion against timeout by later patches,
>>> which will also add the comments.
>>>
>>> Signed-off-by: Tejun Heo <tj@kernel.org>
>>> ---
>>>  block/blk-mq.c | 5 +++++
>>>  1 file changed, 5 insertions(+)
>>>
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index ddc9261..6741c3e 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -584,11 +584,16 @@ static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
>>>  void blk_mq_complete_request(struct request *rq)
>>>  {
>>>  	struct request_queue *q = rq->q;
>>> +	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, rq->mq_ctx->cpu);
>>> +	int srcu_idx;
>>>  
>>>  	if (unlikely(blk_should_fake_timeout(q)))
>>>  		return;
>>> +
>>> +	hctx_lock(hctx, &srcu_idx);
>>>  	if (!blk_mark_rq_complete(rq))
>>>  		__blk_mq_complete_request(rq);
>>> +	hctx_unlock(hctx, srcu_idx);
>>>  }
>>>  EXPORT_SYMBOL(blk_mq_complete_request);
>>
>> So I've had v3 running fine with 4.14++ and when I first tried Jens'
>> additional helpers on top, I got a bunch of warnings which I didn't
>> investigate further at the time. Now they are back since the helpers
>> moved into patch #1 and #2 correctly says:
>>
>> ..
>> block/blk-mq.c: In function ‘blk_mq_complete_request’:
>> ./include/linux/srcu.h:175:2: warning: ‘srcu_idx’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>>   __srcu_read_unlock(sp, idx);
>>   ^~~~~~~~~~~~~~~~~~~~~~~~~~~
>> block/blk-mq.c:587:6: note: ‘srcu_idx’ was declared here
>>   int srcu_idx;
>>       ^~~~~~~~
>> ..etc.
>>
>> This is with gcc 7.2.0.
>>
>> I understand that this is a somewhat-false positive since the lock always
>> precedes the unlock & writes to the value, but can we properly initialize
>> or annotate this?
> 
> It's not a somewhat false positive, it's a false positive. I haven't seen
> that bogus warning with the compiler I'm running:
> 
> gcc (Ubuntu 7.2.0-1ubuntu1~16.04) 7.2.0
> 
> and
> 
> gcc (GCC) 7.2.0
> 
> Neither of them throw the warning.

Are you on non-x86? Really bothers me to have to add a work-around
for something that's obviously a false positive.

I forget if we have some gcc/compiler annotation for this, otherwise
the good old

int srcu_idx = srcu_idx;

should get the job done.
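
For the sake of illustration, that would look roughly like the below in
blk_mq_complete_request() (and the same in any other hctx_lock() caller that
trips the warning) -- the body is just the function from patch 2/8 above with
the initializer added; the self-assignment only silences -Wmaybe-uninitialized
and doesn't change the generated code:

void blk_mq_complete_request(struct request *rq)
{
	struct request_queue *q = rq->q;
	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, rq->mq_ctx->cpu);
	int srcu_idx = srcu_idx;	/* shut up the bogus gcc warning */

	if (unlikely(blk_should_fake_timeout(q)))
		return;

	hctx_lock(hctx, &srcu_idx);
	if (!blk_mark_rq_complete(rq))
		__blk_mq_complete_request(rq);
	hctx_unlock(hctx, srcu_idx);
}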

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/8] blk-mq: protect completion path with RCU
  2018-01-08 22:55       ` Jens Axboe
@ 2018-01-08 23:27         ` Holger Hoffstätte
  2018-01-08 23:33           ` Holger Hoffstätte
  0 siblings, 1 reply; 41+ messages in thread
From: Holger Hoffstätte @ 2018-01-08 23:27 UTC (permalink / raw)
  To: Jens Axboe, Tejun Heo, jack, clm, jbacik
  Cc: kernel-team, linux-kernel, linux-btrfs, peterz, jianchao.w.wang,
	Bart.VanAssche, linux-block

On 01/08/18 23:55, Jens Axboe wrote:
> On 1/8/18 1:15 PM, Jens Axboe wrote:
>> On 1/8/18 12:57 PM, Holger Hoffstätte wrote:
>>> On 01/08/18 20:15, Tejun Heo wrote:
>>>> Currently, blk-mq protects only the issue path with RCU.  This patch
>>>> puts the completion path under the same RCU protection.  This will be
>>>> used to synchronize issue/completion against timeout by later patches,
>>>> which will also add the comments.
>>>>
>>>> Signed-off-by: Tejun Heo <tj@kernel.org>
>>>> ---
>>>>  block/blk-mq.c | 5 +++++
>>>>  1 file changed, 5 insertions(+)
>>>>
>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>> index ddc9261..6741c3e 100644
>>>> --- a/block/blk-mq.c
>>>> +++ b/block/blk-mq.c
>>>> @@ -584,11 +584,16 @@ static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
>>>>  void blk_mq_complete_request(struct request *rq)
>>>>  {
>>>>  	struct request_queue *q = rq->q;
>>>> +	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, rq->mq_ctx->cpu);
>>>> +	int srcu_idx;
>>>>  
>>>>  	if (unlikely(blk_should_fake_timeout(q)))
>>>>  		return;
>>>> +
>>>> +	hctx_lock(hctx, &srcu_idx);
>>>>  	if (!blk_mark_rq_complete(rq))
>>>>  		__blk_mq_complete_request(rq);
>>>> +	hctx_unlock(hctx, srcu_idx);
>>>>  }
>>>>  EXPORT_SYMBOL(blk_mq_complete_request);
>>>
>>> So I've had v3 running fine with 4.14++ and when I first tried Jens'
>>> additional helpers on top, I got a bunch of warnings which I didn't
>>> investigate further at the time. Now they are back since the helpers
>>> moved into patch #1 and #2 correctly says:
>>>
>>> ..
>>> block/blk-mq.c: In function ‘blk_mq_complete_request’:
>>> ./include/linux/srcu.h:175:2: warning: ‘srcu_idx’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>>>   __srcu_read_unlock(sp, idx);
>>>   ^~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> block/blk-mq.c:587:6: note: ‘srcu_idx’ was declared here
>>>   int srcu_idx;
>>>       ^~~~~~~~
>>> ..etc.
>>>
>>> This is with gcc 7.2.0.
>>>
>>> I understand that this is a somewhat-false positive since the lock always
>>> precedes the unlock & writes to the value, but can we properly initialize
>>> or annotate this?
>>
>> It's not a somewhat false positive, it's a false positive. I haven't seen
>> that bogus warning with the compiler I'm running:
>>
>> gcc (Ubuntu 7.2.0-1ubuntu1~16.04) 7.2.0
>>
>> and
>>
>> gcc (GCC) 7.2.0
>>
>> Neither of them throw the warning.
> 
> Are you on non-x86? Really bothers me to have to add a work-around
> for something that's obviously a false positive.

No, plain old x86-64 on Gentoo (*without* crazy hardening or extras).
But don't bother spending too much time on this, I can live with the
warning even though I found it curious. Not even sure why you don't
see this; according to git -Wmaybe-uninitialized is part of the
"extrawarn" make flags since ~2016.

Apparently gcc can't see that the first branch in the lock-helper
implies that the first branch in the unlock helper will also be taken
unconditionally (and how could it? nested branch condition elision?),
so it concludes srcu_idx "may" be used uninitialized in unlock, and
that's not generally wrong. It just happens to be in this case.

> I forget if we have some gcc/compiler annotation for this, otherwise

I actually went looking for gcc bugs, pragmas, annotations and whatnot;
there are many bugs depending on optimizer, the #pragmas don't work
and a potential __attribute__(initialized) was only discussed, but
apparently never implemented. \o/

> the good old
> 
> int srcu_idx = srcu_idx;
> 
> should get the job done.

(Narrator: It didn't.)

Isn't there some magic value that will never be used by regular
operations? 0 or -1? It should not matter much since it will be
consistently overwritten or left alone.

cheers
Holger

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/8] blk-mq: replace timeout synchronization with a RCU and generation based scheme
@ 2018-01-08 23:29     ` Bart Van Assche
  0 siblings, 0 replies; 41+ messages in thread
From: Bart Van Assche @ 2018-01-08 23:29 UTC (permalink / raw)
  To: jbacik, tj, jack, clm, axboe
  Cc: kernel-team, linux-kernel, peterz, linux-btrfs, linux-block,
	jianchao.w.wang, hch

On Mon, 2018-01-08 at 11:15 -0800, Tejun Heo wrote:
> @@ -230,6 +232,27 @@ struct request {
>  
>  	unsigned short write_hint;
>  
> +	/*
> +	 * On blk-mq, the lower bits of ->gstate carry the MQ_RQ_* state
> +	 * value and the upper bits the generation number which is
> +	 * monotonically incremented and used to distinguish the reuse
> +	 * instances.
> +	 *
> +	 * ->gstate_seq allows updates to ->gstate and other fields
> +	 * (currently ->deadline) during request start to be read
> +	 * atomically from the timeout path, so that it can operate on a
> +	 * coherent set of information.
> +	 */
> +	seqcount_t gstate_seq;
> +	u64 gstate;
> +
> +	/*
> +	 * ->aborted_gstate is used by the timeout to claim a specific
> +	 * recycle instance of this request.  See blk_mq_timeout_work().
> +	 */
> +	struct u64_stats_sync aborted_gstate_sync;
> +	u64 aborted_gstate;
> +
>  	unsigned long deadline;
>  	struct list_head timeout_list;

Does "gstate" perhaps stand for "generation number and state"? If so, please
mention this in one of the above comments.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/8] blk-mq: protect completion path with RCU
  2018-01-08 23:27         ` Holger Hoffstätte
@ 2018-01-08 23:33           ` Holger Hoffstätte
  0 siblings, 0 replies; 41+ messages in thread
From: Holger Hoffstätte @ 2018-01-08 23:33 UTC (permalink / raw)
  To: Jens Axboe, Tejun Heo, jack, clm, jbacik
  Cc: kernel-team, linux-kernel, linux-btrfs, peterz, jianchao.w.wang,
	Bart.VanAssche, linux-block

On 01/09/18 00:27, Holger Hoffstätte wrote:
> On 01/08/18 23:55, Jens Axboe wrote:
>> the good old
>>
>> int srcu_idx = srcu_idx;
>>
>> should get the job done.
> 
> (Narrator: It didn't.)

Narrator: we retract our previous statement and apologize for the
confusion. It works fine when you correctly replace all occurrences,
not just one. <:)

Holger

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 6/8] blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mq
@ 2018-01-08 23:36     ` Bart Van Assche
  0 siblings, 0 replies; 41+ messages in thread
From: Bart Van Assche @ 2018-01-08 23:36 UTC (permalink / raw)
  To: jbacik, tj, jack, clm, axboe
  Cc: kernel-team, linux-kernel, peterz, linux-btrfs, linux-block,
	jianchao.w.wang

On Mon, 2018-01-08 at 11:15 -0800, Tejun Heo wrote:
> After the recent updates to use generation number and state based
> synchronization, blk-mq no longer depends on REQ_ATOM_COMPLETE except
> to avoid firing the same timeout multiple times.
> 
> Remove all REQ_ATOM_COMPLETE usages and use a new rq_flags flag
> RQF_MQ_TIMEOUT_EXPIRED to avoid firing the same timeout multiple
> times.  This removes atomic bitops from hot paths too.
> 
> v2: Removed blk_clear_rq_complete() from blk_mq_rq_timed_out().
> 
> v3: Added RQF_MQ_TIMEOUT_EXPIRED flag.

I think it's unfortunate that this patch spreads the request state over two
struct request members, namely the lower two bits of gstate and the
RQF_MQ_TIMEOUT_EXPIRED flag in req->rq_flags. Please consider dropping the
RQF_MQ_TIMEOUT_EXPIRED flag and representing the "timeout has been processed"
state as a MQ_RQ_* state in gstate. That would avoid having to review every
state update for a possibly missing RQF_MQ_TIMEOUT_EXPIRED update.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 7/8] blk-mq: remove REQ_ATOM_STARTED
@ 2018-01-08 23:47     ` Bart Van Assche
  0 siblings, 0 replies; 41+ messages in thread
From: Bart Van Assche @ 2018-01-08 23:47 UTC (permalink / raw)
  To: jbacik, tj, jack, clm, axboe
  Cc: kernel-team, linux-kernel, peterz, linux-btrfs, linux-block,
	jianchao.w.wang

On Mon, 2018-01-08 at 11:15 -0800, Tejun Heo wrote:
> After the recent updates to use generation number and state based
> synchronization, we can easily replace REQ_ATOM_STARTED usages by
> adding an extra state to distinguish completed but not yet freed
> state.
> 
> Add MQ_RQ_COMPLETE and replace REQ_ATOM_STARTED usages with
> blk_mq_rq_state() tests.  REQ_ATOM_STARTED no longer has any users
> left and is removed.

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 8/8] blk-mq: rename blk_mq_hw_ctx->queue_rq_srcu to ->srcu
@ 2018-01-08 23:48     ` Bart Van Assche
  0 siblings, 0 replies; 41+ messages in thread
From: Bart Van Assche @ 2018-01-08 23:48 UTC (permalink / raw)
  To: jbacik, tj, jack, clm, axboe
  Cc: kernel-team, linux-kernel, peterz, linux-btrfs, linux-block,
	jianchao.w.wang

On Mon, 2018-01-08 at 11:15 -0800, Tejun Heo wrote:
> The RCU protection has been expanded to cover both queueing and
> completion paths making ->queue_rq_srcu a misnomer.  Rename it to
> ->srcu as suggested by Bart.

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/8] blk-mq: protect completion path with RCU
  2018-01-08 19:15 ` [PATCH 2/8] blk-mq: protect completion path with RCU Tejun Heo
  2018-01-08 19:57   ` Holger Hoffstätte
@ 2018-01-09  7:08   ` Hannes Reinecke
  2018-01-09 15:22     ` Jens Axboe
  2018-01-09 16:12     ` Bart Van Assche
  2 siblings, 1 reply; 41+ messages in thread
From: Hannes Reinecke @ 2018-01-09  7:08 UTC (permalink / raw)
  To: Tejun Heo, jack, axboe, clm, jbacik
  Cc: kernel-team, linux-kernel, linux-btrfs, peterz, jianchao.w.wang,
	Bart.VanAssche, linux-block

On 01/08/2018 08:15 PM, Tejun Heo wrote:
> Currently, blk-mq protects only the issue path with RCU.  This patch
> puts the completion path under the same RCU protection.  This will be
> used to synchronize issue/completion against timeout by later patches,
> which will also add the comments.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
>  block/blk-mq.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index ddc9261..6741c3e 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -584,11 +584,16 @@ static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
>  void blk_mq_complete_request(struct request *rq)
>  {
>  	struct request_queue *q = rq->q;
> +	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, rq->mq_ctx->cpu);
> +	int srcu_idx;
>  
>  	if (unlikely(blk_should_fake_timeout(q)))
>  		return;
> +
> +	hctx_lock(hctx, &srcu_idx);
>  	if (!blk_mark_rq_complete(rq))
>  		__blk_mq_complete_request(rq);
> +	hctx_unlock(hctx, srcu_idx);
>  }
>  EXPORT_SYMBOL(blk_mq_complete_request);
>  
> 
Hmm. Why do we need to call blk_mq_map_queue() here?
Is there a chance that we end up with a _different_ hctx on completion
than that one used for submission?
If not, why can't we just keep a pointer to the hctx in struct request?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/8] blk-mq: protect completion path with RCU
  2018-01-09  7:08   ` Hannes Reinecke
@ 2018-01-09 15:22     ` Jens Axboe
  0 siblings, 0 replies; 41+ messages in thread
From: Jens Axboe @ 2018-01-09 15:22 UTC (permalink / raw)
  To: Hannes Reinecke, Tejun Heo, jack, axboe, clm, jbacik
  Cc: kernel-team, linux-kernel, linux-btrfs, peterz, jianchao.w.wang,
	Bart.VanAssche, linux-block

On 1/9/18 12:08 AM, Hannes Reinecke wrote:
> On 01/08/2018 08:15 PM, Tejun Heo wrote:
>> Currently, blk-mq protects only the issue path with RCU.  This patch
>> puts the completion path under the same RCU protection.  This will be
>> used to synchronize issue/completion against timeout by later patches,
>> which will also add the comments.
>>
>> Signed-off-by: Tejun Heo <tj@kernel.org>
>> ---
>>  block/blk-mq.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index ddc9261..6741c3e 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -584,11 +584,16 @@ static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
>>  void blk_mq_complete_request(struct request *rq)
>>  {
>>  	struct request_queue *q = rq->q;
>> +	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, rq->mq_ctx->cpu);
>> +	int srcu_idx;
>>  
>>  	if (unlikely(blk_should_fake_timeout(q)))
>>  		return;
>> +
>> +	hctx_lock(hctx, &srcu_idx);
>>  	if (!blk_mark_rq_complete(rq))
>>  		__blk_mq_complete_request(rq);
>> +	hctx_unlock(hctx, srcu_idx);
>>  }
>>  EXPORT_SYMBOL(blk_mq_complete_request);
>>  
>>
> Hmm. Why do we need to call blk_mq_map_queue() here?
> Is there a chance that we end up with a _different_ hctx on completion
> than that one used for submission?
> If not, why can't we just keep a pointer to the hctx in struct request?

Mapping is the right thing to do. We cache the software queue, which
allows us to map back to the same hardware queue. There would be no
point in storing both; the mapping is very cheap. No point in bloating
the request with redundant information.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/8] blk-mq: replace timeout synchronization with a RCU and generation based scheme
  2018-01-08 21:06     ` Bart Van Assche
  (?)
@ 2018-01-09 15:46     ` tj
  -1 siblings, 0 replies; 41+ messages in thread
From: tj @ 2018-01-09 15:46 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: jbacik, jack, clm, axboe, kernel-team, linux-kernel, peterz,
	linux-btrfs, linux-block, jianchao.w.wang, hch

On Mon, Jan 08, 2018 at 09:06:55PM +0000, Bart Van Assche wrote:
> On Mon, 2018-01-08 at 11:15 -0800, Tejun Heo wrote:
> > +static void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate)
> > +{
> > +	unsigned long flags;
> > +
> > +	local_irq_save(flags);
> > +	u64_stats_update_begin(&rq->aborted_gstate_sync);
> > +	rq->aborted_gstate = gstate;
> > +	u64_stats_update_end(&rq->aborted_gstate_sync);
> > +	local_irq_restore(flags);
> > +}
> 
> Please add a comment that explains the purpose of local_irq_save() and
> local_irq_restore(). Please also explain why you chose to disable interrupts

Will do.

> instead of disabling preemption. I think that disabling preemption should be
> sufficient since this is the only code that updates rq->aborted_gstate and
> since this function is always called from thread context.

blk_mq_complete_request() can read it from the irq context.  If that
happens between update_begin and end, it'll end up looping infinitely.
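
To spell it out, roughly (this only bites on configs where u64_stats_sync
is actually backed by a seqcount, e.g. 32bit SMP; on 64bit the fetch is a
plain read):

	/* writer, timeout (thread) path */
	u64_stats_update_begin(&rq->aborted_gstate_sync);	/* seq -> odd */
	rq->aborted_gstate = gstate;
	u64_stats_update_end(&rq->aborted_gstate_sync);		/* seq -> even */

	/* reader, may run from irq context via blk_mq_complete_request() */
	do {
		start = u64_stats_fetch_begin(&rq->aborted_gstate_sync);
		aborted_gstate = rq->aborted_gstate;
	} while (u64_stats_fetch_retry(&rq->aborted_gstate_sync, start));

If the irq hits the same CPU between update_begin and update_end, the reader
keeps seeing an odd sequence count and spins forever, which is why the writer
runs with irqs disabled.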

> > @@ -801,6 +840,12 @@ void blk_mq_rq_timed_out(struct request *req, bool reserved)
> >  		__blk_mq_complete_request(req);
> >  		break;
> >  	case BLK_EH_RESET_TIMER:
> > +		/*
> > +		 * As nothing prevents from completion happening while
> > +		 * ->aborted_gstate is set, this may lead to ignored
> > +		 * completions and further spurious timeouts.
> > +		 */
> > +		blk_mq_rq_update_aborted_gstate(req, 0);
> >  		blk_add_timer(req);
> >  		blk_clear_rq_complete(req);
> >  		break;
> 
> Is the race that the comment refers to addressed by one of the later patches?

No, but it's not a new race.  It has always been there and I suppose
doesn't lead to practical problems - the race window is pretty small
and the effect isn't critical.  I'm just documenting the existing race
condition.  Will note that in the description.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/8] blk-mq: replace timeout synchronization with a RCU and generation based scheme
  2018-01-08 23:29     ` Bart Van Assche
  (?)
@ 2018-01-09 15:46     ` tj
  -1 siblings, 0 replies; 41+ messages in thread
From: tj @ 2018-01-09 15:46 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: jbacik, jack, clm, axboe, kernel-team, linux-kernel, peterz,
	linux-btrfs, linux-block, jianchao.w.wang, hch

On Mon, Jan 08, 2018 at 11:29:11PM +0000, Bart Van Assche wrote:
> Does "gstate" perhaps stand for "generation number and state"? If so, please
> mention this in one of the above comments.

Yeah, will do.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 5/8] blk-mq: make blk_abort_request() trigger timeout path
  2018-01-08 22:10     ` Bart Van Assche
  (?)
@ 2018-01-09 16:02     ` tj
  -1 siblings, 0 replies; 41+ messages in thread
From: tj @ 2018-01-09 16:02 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: jbacik, jack, clm, axboe, hoeppner, linux-kernel, peterz,
	linux-block, sth, kernel-team, asamymuthupa, linux-btrfs,
	jianchao.w.wang

On Mon, Jan 08, 2018 at 10:10:01PM +0000, Bart Van Assche wrote:
> Other req->deadline writes are protected by preempt_disable(),
> write_seqcount_begin(&rq->gstate_seq), write_seqcount_end(&rq->gstate_seq)
> and preempt_enable(). I think it's fine that the above req->deadline store
> does not have that protection but I also think that that deserves a comment.

Will add.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/8] blk-mq: protect completion path with RCU
@ 2018-01-09 16:12     ` Bart Van Assche
  0 siblings, 0 replies; 41+ messages in thread
From: Bart Van Assche @ 2018-01-09 16:12 UTC (permalink / raw)
  To: jbacik, tj, jack, clm, axboe
  Cc: kernel-team, linux-kernel, peterz, linux-btrfs, linux-block,
	jianchao.w.wang

On Mon, 2018-01-08 at 11:15 -0800, Tejun Heo wrote:
> Currently, blk-mq protects only the issue path with RCU.  This patch
> puts the completion path under the same RCU protection.  This will be
> used to synchronize issue/completion against timeout by later patches,
> which will also add the comments.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
>  block/blk-mq.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index ddc9261..6741c3e 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -584,11 +584,16 @@ static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
>  void blk_mq_complete_request(struct request *rq)
>  {
>  	struct request_queue *q = rq->q;
> +	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, rq->mq_ctx->cpu);
> +	int srcu_idx;
>  
>  	if (unlikely(blk_should_fake_timeout(q)))
>  		return;
> +
> +	hctx_lock(hctx, &srcu_idx);
>  	if (!blk_mark_rq_complete(rq))
>  		__blk_mq_complete_request(rq);
> +	hctx_unlock(hctx, srcu_idx);
>  }
>  EXPORT_SYMBOL(blk_mq_complete_request);

Hello Tejun,

I'm concerned about the additional CPU cycles needed for the new blk_mq_map_queue()
call, although I know this call is cheap. Would the timeout code really get that
much more complicated if the hctx_lock() and hctx_unlock() calls would be changed
into rcu_read_lock() and rcu_read_unlock() calls? Would it be sufficient if
"if (has_rcu) synchronize_rcu();" would be changed into "synchronize_rcu();" in
blk_mq_timeout_work()?
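
(To illustrate the alternative I have in mind -- untested, just a sketch:
since the completion path never sleeps it could use plain RCU regardless of
BLK_MQ_F_BLOCKING,

	rcu_read_lock();
	if (!blk_mark_rq_complete(rq))
		__blk_mq_complete_request(rq);
	rcu_read_unlock();

and blk_mq_timeout_work() would then call synchronize_rcu() unconditionally,
in addition to the synchronize_srcu() on the BLK_MQ_F_BLOCKING hctxs that
covers the issue path.)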

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/8] blk-mq: protect completion path with RCU
  2018-01-09 16:12     ` Bart Van Assche
  (?)
@ 2018-01-09 16:17     ` Jens Axboe
  -1 siblings, 0 replies; 41+ messages in thread
From: Jens Axboe @ 2018-01-09 16:17 UTC (permalink / raw)
  To: Bart Van Assche, jbacik, tj, jack, clm
  Cc: kernel-team, linux-kernel, peterz, linux-btrfs, linux-block,
	jianchao.w.wang

On 1/9/18 9:12 AM, Bart Van Assche wrote:
> On Mon, 2018-01-08 at 11:15 -0800, Tejun Heo wrote:
>> Currently, blk-mq protects only the issue path with RCU.  This patch
>> puts the completion path under the same RCU protection.  This will be
>> used to synchronize issue/completion against timeout by later patches,
>> which will also add the comments.
>>
>> Signed-off-by: Tejun Heo <tj@kernel.org>
>> ---
>>  block/blk-mq.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index ddc9261..6741c3e 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -584,11 +584,16 @@ static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
>>  void blk_mq_complete_request(struct request *rq)
>>  {
>>  	struct request_queue *q = rq->q;
>> +	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, rq->mq_ctx->cpu);
>> +	int srcu_idx;
>>  
>>  	if (unlikely(blk_should_fake_timeout(q)))
>>  		return;
>> +
>> +	hctx_lock(hctx, &srcu_idx);
>>  	if (!blk_mark_rq_complete(rq))
>>  		__blk_mq_complete_request(rq);
>> +	hctx_unlock(hctx, srcu_idx);
>>  }
>>  EXPORT_SYMBOL(blk_mq_complete_request);
> 
> Hello Tejun,
> 
> I'm concerned about the additional CPU cycles needed for the new blk_mq_map_queue()
> call, although I know this call is cheap. Would the timeout code really get that
> much more complicated if the hctx_lock() and hctx_unlock() calls would be changed
> into rcu_read_lock() and rcu_read_unlock() calls? Would it be sufficient if
> "if (has_rcu) synchronize_rcu();" would be changed into "synchronize_rcu();" in
> blk_mq_timeout_work()?

It's a non-concern, imho. We do queue mapping all over the place for a variety
of reasons; it's total noise, especially since we're calling [s]rcu anyway
after that.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/8] blk-mq: protect completion path with RCU
  2018-01-09 16:12     ` Bart Van Assche
  (?)
  (?)
@ 2018-01-09 16:19     ` tj
  2018-01-09 16:22       ` Jens Axboe
  -1 siblings, 1 reply; 41+ messages in thread
From: tj @ 2018-01-09 16:19 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: jbacik, jack, clm, axboe, kernel-team, linux-kernel, peterz,
	linux-btrfs, linux-block, jianchao.w.wang

Hello, Bart.

On Tue, Jan 09, 2018 at 04:12:40PM +0000, Bart Van Assche wrote:
> I'm concerned about the additional CPU cycles needed for the new blk_mq_map_queue()
> call, although I know this call is cheap. Would the timeout code really get that

So, if that is really a concern, let's cache that mapping instead of
changing synchronization rules for that.

> much more complicated if the hctx_lock() and hctx_unlock() calls would be changed
> into rcu_read_lock() and rcu_read_unlock() calls? Would it be sufficient if
> "if (has_rcu) synchronize_rcu();" would be changed into "synchronize_rcu();" in
> blk_mq_timeout_work()?

Code-wise, it won't be too much extra code but I think diverging the
sync methods between issue and completion paths is more fragile and
likely to invite confusions and mistakes in the future.  We have the
normal path (issue&completion) synchronizing against the exception
path (timeout).  I think it's best to keep the sync constructs aligned
with that conceptual picture.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/8] blk-mq: protect completion path with RCU
  2018-01-09 16:19     ` tj
@ 2018-01-09 16:22       ` Jens Axboe
  0 siblings, 0 replies; 41+ messages in thread
From: Jens Axboe @ 2018-01-09 16:22 UTC (permalink / raw)
  To: tj, Bart Van Assche
  Cc: jbacik, jack, clm, kernel-team, linux-kernel, peterz,
	linux-btrfs, linux-block, jianchao.w.wang

On 1/9/18 9:19 AM, tj@kernel.org wrote:
> Hello, Bart.
> 
> On Tue, Jan 09, 2018 at 04:12:40PM +0000, Bart Van Assche wrote:
>> I'm concerned about the additional CPU cycles needed for the new blk_mq_map_queue()
>> call, although I know this call is cheap. Would the timeout code really get that
> 
> So, if that is really a concern, let's cache that mapping instead of
> changing synchronization rules for that.
> 
>> much more complicated if the hctx_lock() and hctx_unlock() calls would be changed
>> into rcu_read_lock() and rcu_read_unlock() calls? Would it be sufficient if
>> "if (has_rcu) synchronize_rcu();" would be changed into "synchronize_rcu();" in
>> blk_mq_timeout_work()?
> 
> Code-wise, it won't be too much extra code but I think diverging the
> sync methods between issue and completion paths is more fragile and
> likely to invite confusions and mistakes in the future.  We have the
> normal path (issue&completion) synchronizing against the exception
> path (timeout).  I think it's best to keep the sync constructs aligned
> with that conceptual picture.

Guys, the map call is FINE. We do it at least once per request anyway,
usually multiple times. If it's too expensive, then THAT is what needs
fixing, not adding an arbitrary caching of that mapping. Look at the
actual code:

	return q->queue_hw_ctx[q->mq_map[cpu]];

that's it. It's just a double index.

So let's put this thing to rest right now - the map call is perfectly
fine, especially since the alternatives are either bloating struct
request or making the code less maintainable.

Foot -> down.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 3/8] blk-mq: replace timeout synchronization with a RCU and generation based scheme
  2018-01-09 16:29 [PATCHSET v5] blk-mq: reimplement timeout handling Tejun Heo
@ 2018-01-09 16:29 ` Tejun Heo
  0 siblings, 0 replies; 41+ messages in thread
From: Tejun Heo @ 2018-01-09 16:29 UTC (permalink / raw)
  To: jack, axboe, clm, jbacik
  Cc: kernel-team, linux-kernel, linux-btrfs, peterz, jianchao.w.wang,
	Bart.VanAssche, linux-block, Tejun Heo, Christoph Hellwig

Currently, blk-mq timeout path synchronizes against the usual
issue/completion path using a complex scheme involving atomic
bitflags, REQ_ATOM_*, memory barriers and subtle memory coherence
rules.  Unfortunately, it contains quite a few holes.

There's a complex dance around REQ_ATOM_STARTED and
REQ_ATOM_COMPLETE between the issue/completion and timeout paths; however,
they don't share a synchronization point across request recycle
instances and it isn't clear what the barriers add.
blk_mq_check_expired() can easily read STARTED from N-2'th iteration,
deadline from N-1'th, blk_mark_rq_complete() against Nth instance.

In fact, it's pretty easy to make blk_mq_check_expired() terminate a
later instance of a request.  If we induce 5 sec delay before
time_after_eq() test in blk_mq_check_expired(), shorten the timeout to
2s, and issue back-to-back large IOs, blk-mq starts timing out
requests spuriously pretty quickly.  Nothing actually timed out.  It
just made the call on a recycle instance of a request and then
terminated a later instance long after the original instance finished.
The scenario isn't theoretical either.

This patch replaces the broken synchronization mechanism with a RCU
and generation number based one.

1. Each request has a u64 generation + state value, which can be
   updated only by the request owner.  Whenever a request becomes
   in-flight, the generation number gets bumped up too.  This provides
   the basis for the timeout path to distinguish different recycle
   instances of the request.

   Also, marking a request in-flight and setting its deadline are
   protected with a seqcount so that the timeout path can fetch both
   values coherently.

2. The timeout path fetches the generation, state and deadline.  If
   the verdict is timeout, it records the generation into a dedicated
   request abortion field and does RCU wait.

3. The completion path is also protected by RCU (from the previous
   patch) and checks whether the current generation number and state
   match the abortion field.  If so, it skips completion.

4. The timeout path, after RCU wait, scans requests again and
   terminates the ones whose generation and state still match the ones
   requested for abortion.

   By now, the timeout path knows that either the generation number
   and state changed if it lost the race or the completion will yield
   to it and can safely timeout the request.

While it's more lines of code, it's conceptually simpler, doesn't
depend on direct use of subtle memory ordering or coherence, and
hopefully doesn't terminate the wrong instance.

While this change makes REQ_ATOM_COMPLETE synchronization unnecessary
between issue/complete and timeout paths, REQ_ATOM_COMPLETE isn't
removed yet as it's still used in other places.  Future patches will
move all state tracking to the new mechanism and remove all bitops in
the hot paths.

Note that this patch adds a comment explaining a race condition in
BLK_EH_RESET_TIMER path.  The race has always been there and this
patch doesn't change it.  It's just documenting the existing race.

v2: - Fixed BLK_EH_RESET_TIMER handling as pointed out by Jianchao.
    - s/request->gstate_seqc/request->gstate_seq/ as suggested by Peter.
    - READ_ONCE() added in blk_mq_rq_update_state() as suggested by Peter.

v3: - Fixed possible extended seqcount / u64_stats_sync read looping
      spotted by Peter.
    - MQ_RQ_IDLE was incorrectly being set in complete_request instead
      of free_request.  Fixed.

v4: - Rebased on top of hctx_lock() refactoring patch.
    - Added comment explaining the use of hctx_lock() in completion path.

v5: - Added comments requested by Bart.
    - Note the addition of BLK_EH_RESET_TIMER race condition in the
      commit message.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: "jianchao.wang" <jianchao.w.wang@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <Bart.VanAssche@wdc.com>
---
 block/blk-core.c       |   2 +
 block/blk-mq.c         | 229 +++++++++++++++++++++++++++++++++----------------
 block/blk-mq.h         |  46 ++++++++++
 block/blk-timeout.c    |   2 +-
 block/blk.h            |   6 --
 include/linux/blk-mq.h |   1 +
 include/linux/blkdev.h |  23 +++++
 7 files changed, 230 insertions(+), 79 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 2e0d041..f843ae4f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -126,6 +126,8 @@ void blk_rq_init(struct request_queue *q, struct request *rq)
 	rq->start_time = jiffies;
 	set_start_time_ns(rq);
 	rq->part = NULL;
+	seqcount_init(&rq->gstate_seq);
+	u64_stats_init(&rq->aborted_gstate_sync);
 }
 EXPORT_SYMBOL(blk_rq_init);
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6741c3e..052fee5 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -483,6 +483,7 @@ void blk_mq_free_request(struct request *rq)
 	if (blk_rq_rl(rq))
 		blk_put_rl(blk_rq_rl(rq));
 
+	blk_mq_rq_update_state(rq, MQ_RQ_IDLE);
 	clear_bit(REQ_ATOM_STARTED, &rq->atomic_flags);
 	clear_bit(REQ_ATOM_POLL_SLEPT, &rq->atomic_flags);
 	if (rq->tag != -1)
@@ -530,6 +531,8 @@ static void __blk_mq_complete_request(struct request *rq)
 	bool shared = false;
 	int cpu;
 
+	WARN_ON_ONCE(blk_mq_rq_state(rq) != MQ_RQ_IN_FLIGHT);
+
 	if (rq->internal_tag != -1)
 		blk_mq_sched_completed_request(rq);
 	if (rq->rq_flags & RQF_STATS) {
@@ -573,6 +576,36 @@ static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
 		*srcu_idx = srcu_read_lock(hctx->queue_rq_srcu);
 }
 
+static void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate)
+{
+	unsigned long flags;
+
+	/*
+	 * blk_mq_rq_aborted_gstate() is used from the completion path and
+	 * can thus be called from irq context.  u64_stats_fetch in the
+	 * middle of update on the same CPU leads to lockup.  Disable irq
+	 * while updating.
+	 */
+	local_irq_save(flags);
+	u64_stats_update_begin(&rq->aborted_gstate_sync);
+	rq->aborted_gstate = gstate;
+	u64_stats_update_end(&rq->aborted_gstate_sync);
+	local_irq_restore(flags);
+}
+
+static u64 blk_mq_rq_aborted_gstate(struct request *rq)
+{
+	unsigned int start;
+	u64 aborted_gstate;
+
+	do {
+		start = u64_stats_fetch_begin(&rq->aborted_gstate_sync);
+		aborted_gstate = rq->aborted_gstate;
+	} while (u64_stats_fetch_retry(&rq->aborted_gstate_sync, start));
+
+	return aborted_gstate;
+}
+
 /**
  * blk_mq_complete_request - end I/O on a request
  * @rq:		the request being processed
@@ -590,8 +623,20 @@ void blk_mq_complete_request(struct request *rq)
 	if (unlikely(blk_should_fake_timeout(q)))
 		return;
 
+	/*
+	 * If @rq->aborted_gstate equals the current instance, timeout is
+	 * claiming @rq and we lost.  This is synchronized through
+	 * hctx_lock().  See blk_mq_timeout_work() for details.
+	 *
+	 * Completion path never blocks and we can directly use RCU here
+	 * instead of hctx_lock() which can be either RCU or SRCU.
+	 * However, that would complicate paths which want to synchronize
+	 * against us.  Let's stay in sync with the issue path so that
+	 * hctx_lock() covers both issue and completion paths.
+	 */
 	hctx_lock(hctx, &srcu_idx);
-	if (!blk_mark_rq_complete(rq))
+	if (blk_mq_rq_aborted_gstate(rq) != rq->gstate &&
+	    !blk_mark_rq_complete(rq))
 		__blk_mq_complete_request(rq);
 	hctx_unlock(hctx, srcu_idx);
 }
@@ -617,34 +662,32 @@ void blk_mq_start_request(struct request *rq)
 		wbt_issue(q->rq_wb, &rq->issue_stat);
 	}
 
-	blk_add_timer(rq);
-
+	WARN_ON_ONCE(blk_mq_rq_state(rq) != MQ_RQ_IDLE);
 	WARN_ON_ONCE(test_bit(REQ_ATOM_STARTED, &rq->atomic_flags));
 
 	/*
-	 * Mark us as started and clear complete. Complete might have been
-	 * set if requeue raced with timeout, which then marked it as
-	 * complete. So be sure to clear complete again when we start
-	 * the request, otherwise we'll ignore the completion event.
+	 * Mark @rq in-flight which also advances the generation number,
+	 * and register for timeout.  Protect with a seqcount to allow the
+	 * timeout path to read both @rq->gstate and @rq->deadline
+	 * coherently.
 	 *
-	 * Ensure that ->deadline is visible before we set STARTED, such that
-	 * blk_mq_check_expired() is guaranteed to observe our ->deadline when
-	 * it observes STARTED.
+	 * This is the only place where a request is marked in-flight.  If
+	 * the timeout path reads an in-flight @rq->gstate, the
+	 * @rq->deadline it reads together under @rq->gstate_seq is
+	 * guaranteed to be the matching one.
 	 */
-	smp_wmb();
+	preempt_disable();
+	write_seqcount_begin(&rq->gstate_seq);
+
+	blk_mq_rq_update_state(rq, MQ_RQ_IN_FLIGHT);
+	blk_add_timer(rq);
+
+	write_seqcount_end(&rq->gstate_seq);
+	preempt_enable();
+
 	set_bit(REQ_ATOM_STARTED, &rq->atomic_flags);
-	if (test_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags)) {
-		/*
-		 * Coherence order guarantees these consecutive stores to a
-		 * single variable propagate in the specified order. Thus the
-		 * clear_bit() is ordered _after_ the set bit. See
-		 * blk_mq_check_expired().
-		 *
-		 * (the bits must be part of the same byte for this to be
-		 * true).
-		 */
+	if (test_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags))
 		clear_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
-	}
 
 	if (q->dma_drain_size && blk_rq_bytes(rq)) {
 		/*
@@ -677,6 +720,7 @@ static void __blk_mq_requeue_request(struct request *rq)
 	blk_mq_sched_requeue_request(rq);
 
 	if (test_and_clear_bit(REQ_ATOM_STARTED, &rq->atomic_flags)) {
+		blk_mq_rq_update_state(rq, MQ_RQ_IDLE);
 		if (q->dma_drain_size && blk_rq_bytes(rq))
 			rq->nr_phys_segments--;
 	}
@@ -774,6 +818,7 @@ EXPORT_SYMBOL(blk_mq_tag_to_rq);
 struct blk_mq_timeout_data {
 	unsigned long next;
 	unsigned int next_set;
+	unsigned int nr_expired;
 };
 
 void blk_mq_rq_timed_out(struct request *req, bool reserved)
@@ -801,6 +846,12 @@ void blk_mq_rq_timed_out(struct request *req, bool reserved)
 		__blk_mq_complete_request(req);
 		break;
 	case BLK_EH_RESET_TIMER:
+		/*
+		 * As nothing prevents completions from happening while
+		 * ->aborted_gstate is set, this may lead to ignored
+		 * completions and further spurious timeouts.
+		 */
+		blk_mq_rq_update_aborted_gstate(req, 0);
 		blk_add_timer(req);
 		blk_clear_rq_complete(req);
 		break;
@@ -816,50 +867,51 @@ static void blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
 		struct request *rq, void *priv, bool reserved)
 {
 	struct blk_mq_timeout_data *data = priv;
-	unsigned long deadline;
+	unsigned long gstate, deadline;
+	int start;
+
+	might_sleep();
 
 	if (!test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
 		return;
 
-	/*
-	 * Ensures that if we see STARTED we must also see our
-	 * up-to-date deadline, see blk_mq_start_request().
-	 */
-	smp_rmb();
-
-	deadline = READ_ONCE(rq->deadline);
+	/* read coherent snapshots of @rq->gstate and @rq->deadline */
+	while (true) {
+		start = read_seqcount_begin(&rq->gstate_seq);
+		gstate = READ_ONCE(rq->gstate);
+		deadline = rq->deadline;
+		if (!read_seqcount_retry(&rq->gstate_seq, start))
+			break;
+		cond_resched();
+	}
 
-	/*
-	 * The rq being checked may have been freed and reallocated
-	 * out already here, we avoid this race by checking rq->deadline
-	 * and REQ_ATOM_COMPLETE flag together:
-	 *
-	 * - if rq->deadline is observed as new value because of
-	 *   reusing, the rq won't be timed out because of timing.
-	 * - if rq->deadline is observed as previous value,
-	 *   REQ_ATOM_COMPLETE flag won't be cleared in reuse path
-	 *   because we put a barrier between setting rq->deadline
-	 *   and clearing the flag in blk_mq_start_request(), so
-	 *   this rq won't be timed out too.
-	 */
-	if (time_after_eq(jiffies, deadline)) {
-		if (!blk_mark_rq_complete(rq)) {
-			/*
-			 * Again coherence order ensures that consecutive reads
-			 * from the same variable must be in that order. This
-			 * ensures that if we see COMPLETE clear, we must then
-			 * see STARTED set and we'll ignore this timeout.
-			 *
-			 * (There's also the MB implied by the test_and_clear())
-			 */
-			blk_mq_rq_timed_out(rq, reserved);
-		}
+	/* if in-flight && overdue, mark for abortion */
+	if ((gstate & MQ_RQ_STATE_MASK) == MQ_RQ_IN_FLIGHT &&
+	    time_after_eq(jiffies, deadline)) {
+		blk_mq_rq_update_aborted_gstate(rq, gstate);
+		data->nr_expired++;
+		hctx->nr_expired++;
 	} else if (!data->next_set || time_after(data->next, deadline)) {
 		data->next = deadline;
 		data->next_set = 1;
 	}
 }
 
+static void blk_mq_terminate_expired(struct blk_mq_hw_ctx *hctx,
+		struct request *rq, void *priv, bool reserved)
+{
+	/*
+	 * We marked @rq->aborted_gstate and waited for an RCU grace
+	 * period.  If a completion raced us and won, it has finished and
+	 * updated @rq->gstate by now; otherwise, the completion path is
+	 * now guaranteed to see @rq->aborted_gstate and yield.  If
+	 * @rq->aborted_gstate still matches @rq->gstate, @rq is ours.
+	 */
+	if (READ_ONCE(rq->gstate) == rq->aborted_gstate &&
+	    !blk_mark_rq_complete(rq))
+		blk_mq_rq_timed_out(rq, reserved);
+}
+
 static void blk_mq_timeout_work(struct work_struct *work)
 {
 	struct request_queue *q =
@@ -867,7 +919,9 @@ static void blk_mq_timeout_work(struct work_struct *work)
 	struct blk_mq_timeout_data data = {
 		.next		= 0,
 		.next_set	= 0,
+		.nr_expired	= 0,
 	};
+	struct blk_mq_hw_ctx *hctx;
 	int i;
 
 	/* A deadlock might occur if a request is stuck requiring a
@@ -886,14 +940,40 @@ static void blk_mq_timeout_work(struct work_struct *work)
 	if (!percpu_ref_tryget(&q->q_usage_counter))
 		return;
 
+	/* scan for the expired ones and set their ->aborted_gstate */
 	blk_mq_queue_tag_busy_iter(q, blk_mq_check_expired, &data);
 
+	if (data.nr_expired) {
+		bool has_rcu = false;
+
+		/*
+		 * Wait till everyone sees ->aborted_gstate.  The
+		 * sequential waits for SRCUs aren't ideal.  If this ever
+		 * becomes a problem, we can add per-hw_ctx rcu_head and
+		 * wait in parallel.
+		 */
+		queue_for_each_hw_ctx(q, hctx, i) {
+			if (!hctx->nr_expired)
+				continue;
+
+			if (!(hctx->flags & BLK_MQ_F_BLOCKING))
+				has_rcu = true;
+			else
+				synchronize_srcu(hctx->queue_rq_srcu);
+
+			hctx->nr_expired = 0;
+		}
+		if (has_rcu)
+			synchronize_rcu();
+
+		/* terminate the ones we won */
+		blk_mq_queue_tag_busy_iter(q, blk_mq_terminate_expired, NULL);
+	}
+
 	if (data.next_set) {
 		data.next = blk_rq_timeout(round_jiffies_up(data.next));
 		mod_timer(&q->timeout, data.next);
 	} else {
-		struct blk_mq_hw_ctx *hctx;
-
 		queue_for_each_hw_ctx(q, hctx, i) {
 			/* the hctx may be unmapped, so check it here */
 			if (blk_mq_hw_queue_mapped(hctx))
@@ -1893,6 +1973,22 @@ static size_t order_to_size(unsigned int order)
 	return (size_t)PAGE_SIZE << order;
 }
 
+static int blk_mq_init_request(struct blk_mq_tag_set *set, struct request *rq,
+			       unsigned int hctx_idx, int node)
+{
+	int ret;
+
+	if (set->ops->init_request) {
+		ret = set->ops->init_request(set, rq, hctx_idx, node);
+		if (ret)
+			return ret;
+	}
+
+	seqcount_init(&rq->gstate_seq);
+	u64_stats_init(&rq->aborted_gstate_sync);
+	return 0;
+}
+
 int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 		     unsigned int hctx_idx, unsigned int depth)
 {
@@ -1954,12 +2050,9 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 			struct request *rq = p;
 
 			tags->static_rqs[i] = rq;
-			if (set->ops->init_request) {
-				if (set->ops->init_request(set, rq, hctx_idx,
-						node)) {
-					tags->static_rqs[i] = NULL;
-					goto fail;
-				}
+			if (blk_mq_init_request(set, rq, hctx_idx, node)) {
+				tags->static_rqs[i] = NULL;
+				goto fail;
 			}
 
 			p += rq_size;
@@ -2098,9 +2191,7 @@ static int blk_mq_init_hctx(struct request_queue *q,
 	if (!hctx->fq)
 		goto sched_exit_hctx;
 
-	if (set->ops->init_request &&
-	    set->ops->init_request(set, hctx->fq->flush_rq, hctx_idx,
-				   node))
+	if (blk_mq_init_request(set, hctx->fq->flush_rq, hctx_idx, node))
 		goto free_fq;
 
 	if (hctx->flags & BLK_MQ_F_BLOCKING)
@@ -3018,12 +3109,6 @@ static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
 
 static int __init blk_mq_init(void)
 {
-	/*
-	 * See comment in block/blk.h rq_atomic_flags enum
-	 */
-	BUILD_BUG_ON((REQ_ATOM_STARTED / BITS_PER_BYTE) !=
-			(REQ_ATOM_COMPLETE / BITS_PER_BYTE));
-
 	cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
 				blk_mq_hctx_notify_dead);
 	return 0;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 6c7c3ff..cf01f6f 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -27,6 +27,19 @@ struct blk_mq_ctx {
 	struct kobject		kobj;
 } ____cacheline_aligned_in_smp;
 
+/*
+ * Bits for request->gstate.  The lower two bits carry MQ_RQ_* state value
+ * and the upper bits the generation number.
+ */
+enum mq_rq_state {
+	MQ_RQ_IDLE		= 0,
+	MQ_RQ_IN_FLIGHT		= 1,
+
+	MQ_RQ_STATE_BITS	= 2,
+	MQ_RQ_STATE_MASK	= (1 << MQ_RQ_STATE_BITS) - 1,
+	MQ_RQ_GEN_INC		= 1 << MQ_RQ_STATE_BITS,
+};
+
 void blk_mq_freeze_queue(struct request_queue *q);
 void blk_mq_free_queue(struct request_queue *q);
 int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr);
@@ -85,6 +98,39 @@ extern void blk_mq_rq_timed_out(struct request *req, bool reserved);
 
 void blk_mq_release(struct request_queue *q);
 
+/**
+ * blk_mq_rq_state() - read the current MQ_RQ_* state of a request
+ * @rq: target request.
+ */
+static inline int blk_mq_rq_state(struct request *rq)
+{
+	return READ_ONCE(rq->gstate) & MQ_RQ_STATE_MASK;
+}
+
+/**
+ * blk_mq_rq_update_state() - set the current MQ_RQ_* state of a request
+ * @rq: target request.
+ * @state: new state to set.
+ *
+ * Set @rq's state to @state.  The caller is responsible for ensuring that
+ * there are no other updaters.  A request can transition into IN_FLIGHT
+ * only from IDLE and doing so increments the generation number.
+ */
+static inline void blk_mq_rq_update_state(struct request *rq,
+					  enum mq_rq_state state)
+{
+	u64 old_val = READ_ONCE(rq->gstate);
+	u64 new_val = (old_val & ~MQ_RQ_STATE_MASK) | state;
+
+	if (state == MQ_RQ_IN_FLIGHT) {
+		WARN_ON_ONCE((old_val & MQ_RQ_STATE_MASK) != MQ_RQ_IDLE);
+		new_val += MQ_RQ_GEN_INC;
+	}
+
+	/* avoid exposing interim values */
+	WRITE_ONCE(rq->gstate, new_val);
+}
+
 static inline struct blk_mq_ctx *__blk_mq_get_ctx(struct request_queue *q,
 					   unsigned int cpu)
 {
diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index 764ecf9..6427be7 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -208,7 +208,7 @@ void blk_add_timer(struct request *req)
 	if (!req->timeout)
 		req->timeout = q->rq_timeout;
 
-	WRITE_ONCE(req->deadline, jiffies + req->timeout);
+	req->deadline = jiffies + req->timeout;
 
 	/*
 	 * Only the non-mq case needs to add the request to a protected list.
diff --git a/block/blk.h b/block/blk.h
index 3f14469..9cb2739 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -123,12 +123,6 @@ void blk_account_io_done(struct request *req);
  * Internal atomic flags for request handling
  */
 enum rq_atomic_flags {
-	/*
-	 * Keep these two bits first - not because we depend on the
-	 * value of them, but we do depend on them being in the same
-	 * byte of storage to ensure ordering on writes. Keeping them
-	 * first will achieve that nicely.
-	 */
 	REQ_ATOM_COMPLETE = 0,
 	REQ_ATOM_STARTED,
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 95c9a5c..460798d 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -51,6 +51,7 @@ struct blk_mq_hw_ctx {
 	unsigned int		queue_num;
 
 	atomic_t		nr_active;
+	unsigned int		nr_expired;
 
 	struct hlist_node	cpuhp_dead;
 	struct kobject		kobj;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 46e606f..ae563d0 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -27,6 +27,8 @@
 #include <linux/percpu-refcount.h>
 #include <linux/scatterlist.h>
 #include <linux/blkzoned.h>
+#include <linux/seqlock.h>
+#include <linux/u64_stats_sync.h>
 
 struct module;
 struct scsi_ioctl_command;
@@ -230,6 +232,27 @@ struct request {
 
 	unsigned short write_hint;
 
+	/*
+	 * On blk-mq, ->gstate (generation number and state) packs two
+	 * fields: the lower bits carry the MQ_RQ_* state value and the
+	 * upper bits carry the generation number, which is monotonically
+	 * incremented and used to distinguish recycle instances.
+	 *
+	 * ->gstate_seq allows updates to ->gstate and other fields
+	 * (currently ->deadline) during request start to be read
+	 * atomically from the timeout path, so that it can operate on a
+	 * coherent set of information.
+	 */
+	seqcount_t gstate_seq;
+	u64 gstate;
+
+	/*
+	 * ->aborted_gstate is used by the timeout path to claim a specific
+	 * recycle instance of this request.  See blk_mq_timeout_work().
+	 */
+	struct u64_stats_sync aborted_gstate_sync;
+	u64 aborted_gstate;
+
 	unsigned long deadline;
 	struct list_head timeout_list;
 
-- 
2.9.5
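
To make the timeout/completion handshake in the patch easier to follow,
here is a minimal user-space walk-through of the two possible orderings.
It is an illustrative model, not kernel code: the fake_rq struct and
helper names are made up, the RCU/SRCU grace period between marking
->aborted_gstate and terminating is only hinted at in comments, and
completion followed by re-issue is modelled as a single generation bump.

/* user-space model of the ->aborted_gstate claim protocol; illustrative only */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct fake_rq {
	uint64_t gstate;		/* generation << 2 | MQ_RQ_* state */
	uint64_t aborted_gstate;	/* snapshot written by the timeout path */
};

/* timeout path, step 1: claim the instance that was seen as expired
 * (cf. blk_mq_check_expired() -> blk_mq_rq_update_aborted_gstate()) */
static void timeout_mark(struct fake_rq *rq, uint64_t snapshot)
{
	rq->aborted_gstate = snapshot;
}

/* completion path: yield if the timeout already claimed this instance,
 * otherwise complete and model re-issue as a generation bump */
static bool complete_rq(struct fake_rq *rq)
{
	if (rq->aborted_gstate == rq->gstate)
		return false;			/* lost to the timeout */
	rq->gstate += 1 << 2;			/* recycled instance */
	return true;
}

/* timeout path, step 2, after an RCU/SRCU grace period
 * (cf. blk_mq_terminate_expired()): terminate only what was actually won */
static bool timeout_terminate(struct fake_rq *rq)
{
	return rq->aborted_gstate == rq->gstate;
}

int main(void)
{
	struct fake_rq a = { .gstate = (1 << 2) | 1 };	/* gen 1, IN_FLIGHT */
	struct fake_rq b = { .gstate = (1 << 2) | 1 };

	/* ordering 1: the timeout claims first, a later completion yields */
	timeout_mark(&a, a.gstate);
	printf("completion ran=%d timeout owns rq=%d\n",
	       complete_rq(&a), timeout_terminate(&a));	/* 0 1 */

	/* ordering 2: a completion slips in before the claim lands; the
	 * stale snapshot no longer matches, so the timeout backs off */
	uint64_t snap = b.gstate;
	complete_rq(&b);
	timeout_mark(&b, snap);
	printf("timeout owns recycled rq=%d\n", timeout_terminate(&b));	/* 0 */
	return 0;
}

In the real code the grace period between the two timeout steps is what
guarantees that any completion which started before the claim has either
finished and advanced ->gstate (ordering 2) or will observe
->aborted_gstate and yield (ordering 1).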
