* [stable-4.14 00/23] block/scsi multiqueue performance enhancement and
@ 2018-07-23 13:24 Jack Wang
  2018-07-23 13:24 ` [stable-4.14 01/23] blk-mq-sched: move actual dispatching into one helper Jack Wang
                   ` (23 more replies)
  0 siblings, 24 replies; 35+ messages in thread
From: Jack Wang @ 2018-07-23 13:24 UTC (permalink / raw)
  To: gregkh, stable

Hi Greg,
 
Please consider this patchset, which includes block/scsi multiqueue
performance enhancements and bugfixes.

We've run multiple benchmarks and various tests for over a week; the
results look good.

These patches are also included in Oracle UEK5.

They are almost all clean cherry-picks; only two patches needed minor
adjustment.

They apply cleanly on top of 4.14.57.

Jens Axboe (3):
  Revert "blk-mq: don't handle TAG_SHARED in restart"
  blk-mq: fix issue with shared tag queue re-running
  blk-mq: only run the hardware queue if IO is pending

Jianchao Wang (1):
  blk-mq: put the driver tag of nxt rq before first one is requeued

Ming Lei (19):
  blk-mq-sched: move actual dispatching into one helper
  blk-mq: introduce .get_budget and .put_budget in blk_mq_ops
  sbitmap: introduce __sbitmap_for_each_set()
  blk-mq-sched: improve dispatching from sw queue
  scsi: allow passing in null rq to scsi_prep_state_check()
  scsi: implement .get_budget and .put_budget for blk-mq
  SCSI: don't get target/host busy_count in scsi_mq_get_budget()
  blk-mq: don't handle TAG_SHARED in restart
  blk-mq: don't restart queue when .get_budget returns BLK_STS_RESOURCE
  blk-mq: don't handle failure in .get_budget
  blk-flush: don't run queue for requests bypassing flush
  block: pass 'run_queue' to blk_mq_request_bypass_insert
  blk-flush: use blk_mq_request_bypass_insert()
  blk-mq-sched: decide how to handle flush rq via RQF_FLUSH_SEQ
  blk-mq: move blk_mq_put_driver_tag*() into blk-mq.h
  blk-mq: don't allocate driver tag upfront for flush rq
  blk-mq: put driver tag if dispatch budget can't be got
  blk-mq: quiesce queue during switching io sched and updating
    nr_requests
  scsi: core: run queue if SCSI device queue isn't ready and queue is
    idle

 block/blk-core.c        |   2 +-
 block/blk-flush.c       |  37 +++++--
 block/blk-mq-debugfs.c  |   1 -
 block/blk-mq-sched.c    | 203 ++++++++++++++++++++++-------------
 block/blk-mq.c          | 278 +++++++++++++++++++++++++++---------------------
 block/blk-mq.h          |  58 +++++++++-
 block/elevator.c        |   2 +
 drivers/scsi/scsi_lib.c |  53 ++++++---
 include/linux/blk-mq.h  |  20 +++-
 include/linux/sbitmap.h |  64 ++++++++---
 10 files changed, 475 insertions(+), 243 deletions(-)

-- 
2.7.4

* [stable-4.14 01/23] blk-mq-sched: move actual dispatching into one helper
  2018-07-23 13:24 [stable-4.14 00/23] block/scsi multiqueue performance enhancement and Jack Wang
@ 2018-07-23 13:24 ` Jack Wang
  2018-07-23 14:31   ` Jens Axboe
  2018-07-23 13:24 ` [stable-4.14 02/23] blk-mq: introduce .get_budget and .put_budget in blk_mq_ops Jack Wang
                   ` (22 subsequent siblings)
  23 siblings, 1 reply; 35+ messages in thread
From: Jack Wang @ 2018-07-23 13:24 UTC (permalink / raw)
  To: gregkh, stable; +Cc: Ming Lei, Jens Axboe, Jack Wang

From: Ming Lei <ming.lei@redhat.com>

commit caf8eb0d604a0eaeb8111eb4d36853a6d08eebe7 upstream

This makes it easy to support dispatching from the sw queue in the
following patch.

No functional change.

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Suggested-by: Christoph Hellwig <hch@lst.de> # for simplifying dispatch logic
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 block/blk-mq-sched.c | 43 ++++++++++++++++++++++++-------------------
 1 file changed, 24 insertions(+), 19 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index eca011fdfa0e..be29ba849408 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -89,12 +89,26 @@ static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
 	return false;
 }
 
+static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
+{
+	struct request_queue *q = hctx->queue;
+	struct elevator_queue *e = q->elevator;
+	LIST_HEAD(rq_list);
+
+	do {
+		struct request *rq = e->type->ops.mq.dispatch_request(hctx);
+
+		if (!rq)
+			break;
+		list_add(&rq->queuelist, &rq_list);
+	} while (blk_mq_dispatch_rq_list(q, &rq_list));
+}
+
 void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 	struct elevator_queue *e = q->elevator;
 	const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
-	bool do_sched_dispatch = true;
 	LIST_HEAD(rq_list);
 
 	/* RCU or SRCU read lock is needed before checking quiesced flag */
@@ -122,30 +136,21 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	 * scheduler, we can no longer merge or sort them. So it's best to
 	 * leave them there for as long as we can. Mark the hw queue as
 	 * needing a restart in that case.
+	 *
+	 * We want to dispatch from the scheduler if there was nothing
+	 * on the dispatch list or we were able to dispatch from the
+	 * dispatch list.
 	 */
 	if (!list_empty(&rq_list)) {
 		blk_mq_sched_mark_restart_hctx(hctx);
-		do_sched_dispatch = blk_mq_dispatch_rq_list(q, &rq_list);
-	} else if (!has_sched_dispatch) {
+		if (blk_mq_dispatch_rq_list(q, &rq_list) && has_sched_dispatch)
+			blk_mq_do_dispatch_sched(hctx);
+	} else if (has_sched_dispatch) {
+		blk_mq_do_dispatch_sched(hctx);
+	} else {
 		blk_mq_flush_busy_ctxs(hctx, &rq_list);
 		blk_mq_dispatch_rq_list(q, &rq_list);
 	}
-
-	/*
-	 * We want to dispatch from the scheduler if there was nothing
-	 * on the dispatch list or we were able to dispatch from the
-	 * dispatch list.
-	 */
-	if (do_sched_dispatch && has_sched_dispatch) {
-		do {
-			struct request *rq;
-
-			rq = e->type->ops.mq.dispatch_request(hctx);
-			if (!rq)
-				break;
-			list_add(&rq->queuelist, &rq_list);
-		} while (blk_mq_dispatch_rq_list(q, &rq_list));
-	}
 }
 
 bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
-- 
2.7.4

* [stable-4.14 02/23] blk-mq: introduce .get_budget and .put_budget in blk_mq_ops
  2018-07-23 13:24 [stable-4.14 00/23] block/scsi multiqueue performance enhancement and Jack Wang
  2018-07-23 13:24 ` [stable-4.14 01/23] blk-mq-sched: move actual dispatching into one helper Jack Wang
@ 2018-07-23 13:24 ` Jack Wang
  2018-07-23 13:24 ` [stable-4.14 03/23] sbitmap: introduce __sbitmap_for_each_set() Jack Wang
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 35+ messages in thread
From: Jack Wang @ 2018-07-23 13:24 UTC (permalink / raw)
  To: gregkh, stable; +Cc: Ming Lei, Jens Axboe, Jack Wang

From: Ming Lei <ming.lei@redhat.com>

commit de1482974080ec9ef414bf048b2646b246b63f6e upstream

For SCSI devices, there is often a per-request-queue depth, which needs
to be respected before queuing one request.

Currently blk-mq always dequeues the request first, then calls
.queue_rq() to dispatch the request to the LLD. One obvious issue with
this approach is that I/O merging may not be successful, because when
the per-request-queue depth can't be respected, .queue_rq() has to
return BLK_STS_RESOURCE, and the request then has to stay on the
hctx->dispatch list. This means it never gets a chance to be merged
with other IO.

This patch introduces the .get_budget and .put_budget callbacks in
blk_mq_ops, so we can try to reserve budget before dequeuing a request.
If the budget for queueing I/O can't be satisfied, we don't need to
dequeue the request at all, and it can be left in the IO scheduler
queue for more merging opportunities.
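
As a rough illustration of this contract, here is a minimal userspace C
model (not kernel code): budget is reserved before a request is
dequeued, so a saturated device leaves requests in the scheduler queue,
where they remain mergeable. QUEUE_DEPTH, in_flight and the helper
bodies are illustrative assumptions; only the callback names mirror the
patch.

#include <stdbool.h>
#include <stdio.h>

#define QUEUE_DEPTH 2             /* models a small per-request-queue depth */

static int in_flight;             /* models sdev->device_busy */

static bool get_budget(void)
{
        if (in_flight >= QUEUE_DEPTH)
                return false;     /* maps to BLK_STS_RESOURCE above */
        in_flight++;
        return true;
}

static void put_budget(void)
{
        in_flight--;
}

int main(void)
{
        int queued = 5;           /* requests sitting in the IO scheduler */

        /* dispatch until the budget runs out; the rest stay mergeable */
        while (queued > 0 && get_budget()) {
                queued--;
                printf("dispatched, in flight: %d\n", in_flight);
        }
        printf("%d requests left queued (still mergeable)\n", queued);

        put_budget();             /* a completion releases its budget */
        return 0;
}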

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 block/blk-mq-sched.c   | 55 +++++++++++++++++++++++++++++++++++++++++---------
 block/blk-mq-sched.h   |  2 +-
 block/blk-mq.c         | 43 ++++++++++++++++++++++++++++++++++-----
 block/blk-mq.h         | 20 +++++++++++++++++-
 include/linux/blk-mq.h | 11 ++++++++++
 5 files changed, 114 insertions(+), 17 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index be29ba849408..8e525e66a0d9 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -89,31 +89,57 @@ static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
 	return false;
 }
 
-static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
+/* return true if hctx need to run again */
+static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 	struct elevator_queue *e = q->elevator;
 	LIST_HEAD(rq_list);
 
 	do {
-		struct request *rq = e->type->ops.mq.dispatch_request(hctx);
+		struct request *rq;
+		blk_status_t ret;
 
-		if (!rq)
+		if (e->type->ops.mq.has_work &&
+				!e->type->ops.mq.has_work(hctx))
 			break;
+
+		ret = blk_mq_get_dispatch_budget(hctx);
+		if (ret == BLK_STS_RESOURCE)
+			return true;
+
+		rq = e->type->ops.mq.dispatch_request(hctx);
+		if (!rq) {
+			blk_mq_put_dispatch_budget(hctx);
+			break;
+		} else if (ret != BLK_STS_OK) {
+			blk_mq_end_request(rq, ret);
+			continue;
+		}
+
+		/*
+		 * Now this rq owns the budget which has to be released
+		 * if this rq won't be queued to driver via .queue_rq()
+		 * in blk_mq_dispatch_rq_list().
+		 */
 		list_add(&rq->queuelist, &rq_list);
-	} while (blk_mq_dispatch_rq_list(q, &rq_list));
+	} while (blk_mq_dispatch_rq_list(q, &rq_list, true));
+
+	return false;
 }
 
-void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
+/* return true if hw queue need to be run again */
+bool blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 	struct elevator_queue *e = q->elevator;
 	const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
 	LIST_HEAD(rq_list);
+	bool run_queue = false;
 
 	/* RCU or SRCU read lock is needed before checking quiesced flag */
 	if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
-		return;
+		return false;
 
 	hctx->run++;
 
@@ -143,14 +169,23 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	 */
 	if (!list_empty(&rq_list)) {
 		blk_mq_sched_mark_restart_hctx(hctx);
-		if (blk_mq_dispatch_rq_list(q, &rq_list) && has_sched_dispatch)
-			blk_mq_do_dispatch_sched(hctx);
+		if (blk_mq_dispatch_rq_list(q, &rq_list, false) &&
+				has_sched_dispatch)
+			run_queue = blk_mq_do_dispatch_sched(hctx);
 	} else if (has_sched_dispatch) {
-		blk_mq_do_dispatch_sched(hctx);
+		run_queue = blk_mq_do_dispatch_sched(hctx);
 	} else {
 		blk_mq_flush_busy_ctxs(hctx, &rq_list);
-		blk_mq_dispatch_rq_list(q, &rq_list);
+		blk_mq_dispatch_rq_list(q, &rq_list, false);
 	}
+
+	if (run_queue && !blk_mq_sched_needs_restart(hctx) &&
+			!test_bit(BLK_MQ_S_TAG_WAITING, &hctx->state)) {
+		blk_mq_sched_mark_restart_hctx(hctx);
+		return true;
+	}
+
+	return false;
 }
 
 bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index ba1d1418a96d..1ccfb8027cfc 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -23,7 +23,7 @@ void blk_mq_sched_insert_requests(struct request_queue *q,
 				  struct blk_mq_ctx *ctx,
 				  struct list_head *list, bool run_queue_async);
 
-void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx);
+bool blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx);
 
 int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e);
 void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 49979c095f31..ad5ae7192a64 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1037,7 +1037,8 @@ static bool blk_mq_dispatch_wait_add(struct blk_mq_hw_ctx *hctx)
 	return true;
 }
 
-bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
+bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
+		bool got_budget)
 {
 	struct blk_mq_hw_ctx *hctx;
 	struct request *rq;
@@ -1046,6 +1047,8 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
 	if (list_empty(list))
 		return false;
 
+	WARN_ON(!list_is_singular(list) && got_budget);
+
 	/*
 	 * Now process all the entries, sending them to the driver.
 	 */
@@ -1063,16 +1066,30 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
 			 * The initial allocation attempt failed, so we need to
 			 * rerun the hardware queue when a tag is freed.
 			 */
-			if (!blk_mq_dispatch_wait_add(hctx))
+			if (!blk_mq_dispatch_wait_add(hctx)) {
+				if (got_budget)
+					blk_mq_put_dispatch_budget(hctx);
 				break;
+			}
 
 			/*
 			 * It's possible that a tag was freed in the window
 			 * between the allocation failure and adding the
 			 * hardware queue to the wait queue.
 			 */
-			if (!blk_mq_get_driver_tag(rq, &hctx, false))
+			if (!blk_mq_get_driver_tag(rq, &hctx, false)) {
+				if (got_budget)
+					blk_mq_put_dispatch_budget(hctx);
+				break;
+			}
+		}
+
+		if (!got_budget) {
+			ret = blk_mq_get_dispatch_budget(hctx);
+			if (ret == BLK_STS_RESOURCE)
 				break;
+			if (ret != BLK_STS_OK)
+				goto fail_rq;
 		}
 
 		list_del_init(&rq->queuelist);
@@ -1100,6 +1117,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
 			break;
 		}
 
+ fail_rq:
 		if (unlikely(ret != BLK_STS_OK)) {
 			errors++;
 			blk_mq_end_request(rq, BLK_STS_IOERR);
@@ -1158,6 +1176,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
 static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 {
 	int srcu_idx;
+	bool run_queue;
 
 	/*
 	 * We should be running this queue from one of the CPUs that
@@ -1192,15 +1211,18 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 
 	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
 		rcu_read_lock();
-		blk_mq_sched_dispatch_requests(hctx);
+		run_queue = blk_mq_sched_dispatch_requests(hctx);
 		rcu_read_unlock();
 	} else {
 		might_sleep();
 
 		srcu_idx = srcu_read_lock(hctx->queue_rq_srcu);
-		blk_mq_sched_dispatch_requests(hctx);
+		run_queue = blk_mq_sched_dispatch_requests(hctx);
 		srcu_read_unlock(hctx->queue_rq_srcu, srcu_idx);
 	}
+
+	if (run_queue)
+		blk_mq_run_hw_queue(hctx, true);
 }
 
 /*
@@ -1593,6 +1615,13 @@ static void __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 	if (!blk_mq_get_driver_tag(rq, NULL, false))
 		goto insert;
 
+	ret = blk_mq_get_dispatch_budget(hctx);
+	if (ret == BLK_STS_RESOURCE) {
+		blk_mq_put_driver_tag(rq);
+		goto insert;
+	} else if (ret != BLK_STS_OK)
+		goto fail_rq;
+
 	new_cookie = request_to_qc_t(hctx, rq);
 
 	/*
@@ -1609,6 +1638,7 @@ static void __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 		__blk_mq_requeue_request(rq);
 		goto insert;
 	default:
+ fail_rq:
 		*cookie = BLK_QC_T_NONE;
 		blk_mq_end_request(rq, ret);
 		return;
@@ -2615,6 +2645,9 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
 	if (!set->ops->queue_rq)
 		return -EINVAL;
 
+	if (!set->ops->get_budget ^ !set->ops->put_budget)
+		return -EINVAL;
+
 	if (set->queue_depth > BLK_MQ_MAX_DEPTH) {
 		pr_info("blk-mq: reduced tag depth to %u\n",
 			BLK_MQ_MAX_DEPTH);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 877237e09083..0c8ae91a7b5b 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -31,7 +31,7 @@ void blk_mq_freeze_queue(struct request_queue *q);
 void blk_mq_free_queue(struct request_queue *q);
 int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr);
 void blk_mq_wake_waiters(struct request_queue *q);
-bool blk_mq_dispatch_rq_list(struct request_queue *, struct list_head *);
+bool blk_mq_dispatch_rq_list(struct request_queue *, struct list_head *, bool);
 void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list);
 bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx);
 bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
@@ -140,4 +140,22 @@ void blk_mq_in_flight(struct request_queue *q, struct hd_struct *part,
 void blk_mq_in_flight_rw(struct request_queue *q, struct hd_struct *part,
 			 unsigned int inflight[2]);
 
+static inline void blk_mq_put_dispatch_budget(struct blk_mq_hw_ctx *hctx)
+{
+	struct request_queue *q = hctx->queue;
+
+	if (q->mq_ops->put_budget)
+		q->mq_ops->put_budget(hctx);
+}
+
+static inline blk_status_t blk_mq_get_dispatch_budget(
+		struct blk_mq_hw_ctx *hctx)
+{
+	struct request_queue *q = hctx->queue;
+
+	if (q->mq_ops->get_budget)
+		return q->mq_ops->get_budget(hctx);
+	return BLK_STS_OK;
+}
+
 #endif
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 994cbb0f7ffc..b7da1ded07fc 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -91,6 +91,8 @@ struct blk_mq_queue_data {
 
 typedef blk_status_t (queue_rq_fn)(struct blk_mq_hw_ctx *,
 		const struct blk_mq_queue_data *);
+typedef blk_status_t (get_budget_fn)(struct blk_mq_hw_ctx *);
+typedef void (put_budget_fn)(struct blk_mq_hw_ctx *);
 typedef enum blk_eh_timer_return (timeout_fn)(struct request *, bool);
 typedef int (init_hctx_fn)(struct blk_mq_hw_ctx *, void *, unsigned int);
 typedef void (exit_hctx_fn)(struct blk_mq_hw_ctx *, unsigned int);
@@ -113,6 +115,15 @@ struct blk_mq_ops {
 	queue_rq_fn		*queue_rq;
 
 	/*
+	 * Reserve budget before queue request, once .queue_rq is
+	 * run, it is driver's responsibility to release the
+	 * reserved budget. Also we have to handle failure case
+	 * of .get_budget for avoiding I/O deadlock.
+	 */
+	get_budget_fn		*get_budget;
+	put_budget_fn		*put_budget;
+
+	/*
 	 * Called on request timeout
 	 */
 	timeout_fn		*timeout;
-- 
2.7.4

* [stable-4.14 03/23] sbitmap: introduce __sbitmap_for_each_set()
  2018-07-23 13:24 [stable-4.14 00/23] block/scsi multiqueue performance enhancement and Jack Wang
  2018-07-23 13:24 ` [stable-4.14 01/23] blk-mq-sched: move actual dispatching into one helper Jack Wang
  2018-07-23 13:24 ` [stable-4.14 02/23] blk-mq: introduce .get_budget and .put_budget in blk_mq_ops Jack Wang
@ 2018-07-23 13:24 ` Jack Wang
  2018-07-23 13:24 ` [stable-4.14 04/23] blk-mq-sched: improve dispatching from sw queue Jack Wang
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 35+ messages in thread
From: Jack Wang @ 2018-07-23 13:24 UTC (permalink / raw)
  To: gregkh, stable; +Cc: Ming Lei, Omar Sandoval, Jens Axboe, Jack Wang

From: Ming Lei <ming.lei@redhat.com>

commit 7930d0a00ff5dbcc80f793d1a7a6b8de4e591f1a upstream

For blk-mq, we need to be able to iterate software queues starting
from any queue in a round robin fashion, so introduce this helper.
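
A minimal userspace sketch (an assumption-laden model, not the sbitmap
implementation) of the iteration semantics this helper provides: start
at an arbitrary bit, wrap past the end, and visit every set bit exactly
once. A single unsigned long stands in for the real multi-word sbitmap.

#include <stdio.h>

#define NBITS 16

static void for_each_set_from(unsigned long map, unsigned int start,
                              void (*fn)(unsigned int))
{
        unsigned int scanned, nr;

        for (scanned = 0; scanned < NBITS; scanned++) {
                nr = (start + scanned) % NBITS;   /* wrap around the end */
                if (map & (1UL << nr))
                        fn(nr);
        }
}

static void show(unsigned int nr)
{
        printf("set bit: %u\n", nr);
}

int main(void)
{
        /* bits 1, 4 and 13 set; starting at bit 5 visits 13, 1, 4 */
        for_each_set_from((1UL << 1) | (1UL << 4) | (1UL << 13), 5, show);
        return 0;
}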

Reviewed-by: Omar Sandoval <osandov@fb.com>
Cc: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 include/linux/sbitmap.h | 64 ++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 47 insertions(+), 17 deletions(-)

diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
index a1904aadbc45..0dcc60e820de 100644
--- a/include/linux/sbitmap.h
+++ b/include/linux/sbitmap.h
@@ -211,10 +211,14 @@ bool sbitmap_any_bit_set(const struct sbitmap *sb);
  */
 bool sbitmap_any_bit_clear(const struct sbitmap *sb);
 
+#define SB_NR_TO_INDEX(sb, bitnr) ((bitnr) >> (sb)->shift)
+#define SB_NR_TO_BIT(sb, bitnr) ((bitnr) & ((1U << (sb)->shift) - 1U))
+
 typedef bool (*sb_for_each_fn)(struct sbitmap *, unsigned int, void *);
 
 /**
- * sbitmap_for_each_set() - Iterate over each set bit in a &struct sbitmap.
+ * __sbitmap_for_each_set() - Iterate over each set bit in a &struct sbitmap.
+ * @start: Where to start the iteration.
  * @sb: Bitmap to iterate over.
  * @fn: Callback. Should return true to continue or false to break early.
  * @data: Pointer to pass to callback.
@@ -222,35 +226,61 @@ typedef bool (*sb_for_each_fn)(struct sbitmap *, unsigned int, void *);
  * This is inline even though it's non-trivial so that the function calls to the
  * callback will hopefully get optimized away.
  */
-static inline void sbitmap_for_each_set(struct sbitmap *sb, sb_for_each_fn fn,
-					void *data)
+static inline void __sbitmap_for_each_set(struct sbitmap *sb,
+					  unsigned int start,
+					  sb_for_each_fn fn, void *data)
 {
-	unsigned int i;
+	unsigned int index;
+	unsigned int nr;
+	unsigned int scanned = 0;
 
-	for (i = 0; i < sb->map_nr; i++) {
-		struct sbitmap_word *word = &sb->map[i];
-		unsigned int off, nr;
+	if (start >= sb->depth)
+		start = 0;
+	index = SB_NR_TO_INDEX(sb, start);
+	nr = SB_NR_TO_BIT(sb, start);
 
-		if (!word->word)
-			continue;
+	while (scanned < sb->depth) {
+		struct sbitmap_word *word = &sb->map[index];
+		unsigned int depth = min_t(unsigned int, word->depth - nr,
+					   sb->depth - scanned);
 
-		nr = 0;
-		off = i << sb->shift;
+		scanned += depth;
+		if (!word->word)
+			goto next;
+
+		/*
+		 * On the first iteration of the outer loop, we need to add the
+		 * bit offset back to the size of the word for find_next_bit().
+		 * On all other iterations, nr is zero, so this is a noop.
+		 */
+		depth += nr;
 		while (1) {
-			nr = find_next_bit(&word->word, word->depth, nr);
-			if (nr >= word->depth)
+			nr = find_next_bit(&word->word, depth, nr);
+			if (nr >= depth)
 				break;
-
-			if (!fn(sb, off + nr, data))
+			if (!fn(sb, (index << sb->shift) + nr, data))
 				return;
 
 			nr++;
 		}
+next:
+		nr = 0;
+		if (++index >= sb->map_nr)
+			index = 0;
 	}
 }
 
-#define SB_NR_TO_INDEX(sb, bitnr) ((bitnr) >> (sb)->shift)
-#define SB_NR_TO_BIT(sb, bitnr) ((bitnr) & ((1U << (sb)->shift) - 1U))
+/**
+ * sbitmap_for_each_set() - Iterate over each set bit in a &struct sbitmap.
+ * @sb: Bitmap to iterate over.
+ * @fn: Callback. Should return true to continue or false to break early.
+ * @data: Pointer to pass to callback.
+ */
+static inline void sbitmap_for_each_set(struct sbitmap *sb, sb_for_each_fn fn,
+					void *data)
+{
+	__sbitmap_for_each_set(sb, 0, fn, data);
+}
 
 static inline unsigned long *__sbitmap_word(struct sbitmap *sb,
 					    unsigned int bitnr)
-- 
2.7.4

* [stable-4.14 04/23] blk-mq-sched: improve dispatching from sw queue
  2018-07-23 13:24 [stable-4.14 00/23] block/scsi multiqueue performance enhancement and Jack Wang
                   ` (2 preceding siblings ...)
  2018-07-23 13:24 ` [stable-4.14 03/23] sbitmap: introduce __sbitmap_for_each_set() Jack Wang
@ 2018-07-23 13:24 ` Jack Wang
  2018-07-23 13:24 ` [stable-4.14 05/23] scsi: allow passing in null rq to scsi_prep_state_check() Jack Wang
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 35+ messages in thread
From: Jack Wang @ 2018-07-23 13:24 UTC (permalink / raw)
  To: gregkh, stable; +Cc: Ming Lei, Jens Axboe, Jack Wang

From: Ming Lei <ming.lei@redhat.com>

commit b347689ffbca745ac457ee27400ce1affd571c6f upstream

SCSI devices use a host-wide tagset, and the shared driver tag space is
often quite big. However, there is also a queue depth for each LUN
(.cmd_per_lun), which is often small; for example, on both lpfc and
qla2xxx, .cmd_per_lun is just 3.

So lots of requests may stay in the sw queue, and we always flush all
of those belonging to the same hw queue and dispatch them all to the
driver. Unfortunately it is easy to make the queue busy because of the
small .cmd_per_lun. Once these requests are flushed out, they have to
stay on hctx->dispatch, no bio merge can happen on them, and sequential
IO performance is harmed.

This patch introduces blk_mq_dequeue_from_ctx() for dequeuing one
request from a sw queue, so that we can dispatch requests in the
scheduler's way. We can then avoid dequeuing too many requests from the
sw queue, since we don't flush ->dispatch completely.

This patch improves dispatching from the sw queue by using the
.get_budget and .put_budget callbacks.
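
A small userspace model (not kernel code) of the round-robin,
one-request-at-a-time dequeue described above; the three sw queues, the
budget of 3 and the helper names are illustrative, while dispatch_from
mirrors the new hctx->dispatch_from resume point.

#include <stdio.h>

#define NR_CTX 3

static int sw_queue[NR_CTX] = { 2, 1, 2 };   /* queued rqs per sw queue */
static int dispatch_from;                    /* resume point for fairness */

/* dequeue one request round-robin; return ctx index or -1 if all empty */
static int dequeue_one(void)
{
        for (int i = 0; i < NR_CTX; i++) {
                int ctx = (dispatch_from + i) % NR_CTX;

                if (sw_queue[ctx] > 0) {
                        sw_queue[ctx]--;
                        dispatch_from = (ctx + 1) % NR_CTX;
                        return ctx;
                }
        }
        return -1;
}

int main(void)
{
        int budget = 3;           /* models a small .cmd_per_lun */
        int ctx;

        while (budget-- > 0 && (ctx = dequeue_one()) >= 0)
                printf("dispatched rq from ctx %d\n", ctx);
        /* remaining requests stay in their sw queues, still mergeable */
        return 0;
}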

Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 block/blk-mq-sched.c   | 74 ++++++++++++++++++++++++++++++++++++++++++++++++--
 block/blk-mq.c         | 39 ++++++++++++++++++++++++++
 block/blk-mq.h         |  2 ++
 include/linux/blk-mq.h |  2 ++
 4 files changed, 114 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 8e525e66a0d9..df8581bb0a37 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -128,6 +128,61 @@ static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 	return false;
 }
 
+static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
+					  struct blk_mq_ctx *ctx)
+{
+	unsigned idx = ctx->index_hw;
+
+	if (++idx == hctx->nr_ctx)
+		idx = 0;
+
+	return hctx->ctxs[idx];
+}
+
+/* return true if hctx need to run again */
+static bool blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
+{
+	struct request_queue *q = hctx->queue;
+	LIST_HEAD(rq_list);
+	struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
+
+	do {
+		struct request *rq;
+		blk_status_t ret;
+
+		if (!sbitmap_any_bit_set(&hctx->ctx_map))
+			break;
+
+		ret = blk_mq_get_dispatch_budget(hctx);
+		if (ret == BLK_STS_RESOURCE)
+			return true;
+
+		rq = blk_mq_dequeue_from_ctx(hctx, ctx);
+		if (!rq) {
+			blk_mq_put_dispatch_budget(hctx);
+			break;
+		} else if (ret != BLK_STS_OK) {
+			blk_mq_end_request(rq, ret);
+			continue;
+		}
+
+		/*
+		 * Now this rq owns the budget which has to be released
+		 * if this rq won't be queued to driver via .queue_rq()
+		 * in blk_mq_dispatch_rq_list().
+		 */
+		list_add(&rq->queuelist, &rq_list);
+
+		/* round robin for fair dispatch */
+		ctx = blk_mq_next_ctx(hctx, rq->mq_ctx);
+
+	} while (blk_mq_dispatch_rq_list(q, &rq_list, true));
+
+	WRITE_ONCE(hctx->dispatch_from, ctx);
+
+	return false;
+}
+
 /* return true if hw queue need to be run again */
 bool blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 {
@@ -169,11 +224,24 @@ bool blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	 */
 	if (!list_empty(&rq_list)) {
 		blk_mq_sched_mark_restart_hctx(hctx);
-		if (blk_mq_dispatch_rq_list(q, &rq_list, false) &&
-				has_sched_dispatch)
-			run_queue = blk_mq_do_dispatch_sched(hctx);
+		if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
+			if (has_sched_dispatch)
+				run_queue = blk_mq_do_dispatch_sched(hctx);
+			else
+				run_queue = blk_mq_do_dispatch_ctx(hctx);
+		}
 	} else if (has_sched_dispatch) {
 		run_queue = blk_mq_do_dispatch_sched(hctx);
+	} else if (q->mq_ops->get_budget) {
+		/*
+		 * If we need to get budget before queuing request, we
+		 * dequeue request one by one from sw queue for avoiding
+		 * to mess up I/O merge when dispatch runs out of resource.
+		 *
+		 * TODO: get more budgets, and dequeue more requests in
+		 * one time.
+		 */
+		run_queue = blk_mq_do_dispatch_ctx(hctx);
 	} else {
 		blk_mq_flush_busy_ctxs(hctx, &rq_list);
 		blk_mq_dispatch_rq_list(q, &rq_list, false);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index ad5ae7192a64..3bc6de8b82c0 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -903,6 +903,45 @@ void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list)
 }
 EXPORT_SYMBOL_GPL(blk_mq_flush_busy_ctxs);
 
+struct dispatch_rq_data {
+	struct blk_mq_hw_ctx *hctx;
+	struct request *rq;
+};
+
+static bool dispatch_rq_from_ctx(struct sbitmap *sb, unsigned int bitnr,
+		void *data)
+{
+	struct dispatch_rq_data *dispatch_data = data;
+	struct blk_mq_hw_ctx *hctx = dispatch_data->hctx;
+	struct blk_mq_ctx *ctx = hctx->ctxs[bitnr];
+
+	spin_lock(&ctx->lock);
+	if (unlikely(!list_empty(&ctx->rq_list))) {
+		dispatch_data->rq = list_entry_rq(ctx->rq_list.next);
+		list_del_init(&dispatch_data->rq->queuelist);
+		if (list_empty(&ctx->rq_list))
+			sbitmap_clear_bit(sb, bitnr);
+	}
+	spin_unlock(&ctx->lock);
+
+	return !dispatch_data->rq;
+}
+
+struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx,
+					struct blk_mq_ctx *start)
+{
+	unsigned off = start ? start->index_hw : 0;
+	struct dispatch_rq_data data = {
+		.hctx = hctx,
+		.rq   = NULL,
+	};
+
+	__sbitmap_for_each_set(&hctx->ctx_map, off,
+			       dispatch_rq_from_ctx, &data);
+
+	return data.rq;
+}
+
 static inline unsigned int queued_to_index(unsigned int queued)
 {
 	if (!queued)
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 0c8ae91a7b5b..ef0f7c979f64 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -36,6 +36,8 @@ void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list);
 bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx);
 bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
 				bool wait);
+struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx,
+					struct blk_mq_ctx *start);
 
 /*
  * Internal helpers for allocating/freeing the request map
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index b7da1ded07fc..03439f10464f 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -31,6 +31,8 @@ struct blk_mq_hw_ctx {
 
 	struct sbitmap		ctx_map;
 
+	struct blk_mq_ctx	*dispatch_from;
+
 	struct blk_mq_ctx	**ctxs;
 	unsigned int		nr_ctx;
 
-- 
2.7.4

* [stable-4.14 05/23] scsi: allow passing in null rq to scsi_prep_state_check()
  2018-07-23 13:24 [stable-4.14 00/23] block/scsi multiqueue performance enhancement and Jack Wang
                   ` (3 preceding siblings ...)
  2018-07-23 13:24 ` [stable-4.14 04/23] blk-mq-sched: improve dispatching from sw queue Jack Wang
@ 2018-07-23 13:24 ` Jack Wang
  2018-07-23 13:24 ` [stable-4.14 06/23] scsi: implement .get_budget and .put_budget for blk-mq Jack Wang
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 35+ messages in thread
From: Jack Wang @ 2018-07-23 13:24 UTC (permalink / raw)
  To: gregkh, stable; +Cc: Ming Lei, Jens Axboe, Jack Wang

From: Ming Lei <ming.lei@redhat.com>

commit aeec77629a4ac6f8c248f3a82e80d4170a881f22 upstream

In the following patch, we will implement scsi_mq_get_budget(), which
needs to call scsi_prep_state_check() while the rq hasn't been dequeued
yet.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 drivers/scsi/scsi_lib.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index bfd8f12d4e9a..0ecc74421b1f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1331,7 +1331,7 @@ scsi_prep_state_check(struct scsi_device *sdev, struct request *req)
 			/*
 			 * If the devices is blocked we defer normal commands.
 			 */
-			if (!(req->rq_flags & RQF_PREEMPT))
+			if (req && !(req->rq_flags & RQF_PREEMPT))
 				ret = BLKPREP_DEFER;
 			break;
 		default:
@@ -1340,7 +1340,7 @@ scsi_prep_state_check(struct scsi_device *sdev, struct request *req)
 			 * special commands.  In particular any user initiated
 			 * command is not allowed.
 			 */
-			if (!(req->rq_flags & RQF_PREEMPT))
+			if (req && !(req->rq_flags & RQF_PREEMPT))
 				ret = BLKPREP_KILL;
 			break;
 		}
-- 
2.7.4

* [stable-4.14 06/23] scsi: implement .get_budget and .put_budget for blk-mq
  2018-07-23 13:24 [stable-4.14 00/23] block/scsi multiqueue performance enhancement and Jack Wang
                   ` (4 preceding siblings ...)
  2018-07-23 13:24 ` [stable-4.14 05/23] scsi: allow passing in null rq to scsi_prep_state_check() Jack Wang
@ 2018-07-23 13:24 ` Jack Wang
  2018-07-23 13:24 ` [stable-4.14 07/23] SCSI: don't get target/host busy_count in scsi_mq_get_budget() Jack Wang
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 35+ messages in thread
From: Jack Wang @ 2018-07-23 13:24 UTC (permalink / raw)
  To: gregkh, stable; +Cc: Ming Lei, Jens Axboe, Jack Wang

From: Ming Lei <ming.lei@redhat.com>

commit 0df21c86bdbfd17dec9ab898312af9bfb74d5d86 upstream

We need to tell blk-mq to reserve resources before queuing one request,
so implement these two callbacks. Then blk-mq can avoid dequeuing
requests too early, and IO merging can be improved a lot.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[jwang: fix conflict in scsi_lib.c]
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 drivers/scsi/scsi_lib.c | 75 ++++++++++++++++++++++++++++++++++---------------
 1 file changed, 52 insertions(+), 23 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 0ecc74421b1f..c3f9d6d7a724 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1970,25 +1970,32 @@ static void scsi_mq_done(struct scsi_cmnd *cmd)
 	blk_mq_complete_request(cmd->request);
 }
 
-static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
-			 const struct blk_mq_queue_data *bd)
+static void scsi_mq_put_budget(struct blk_mq_hw_ctx *hctx)
 {
-	struct request *req = bd->rq;
-	struct request_queue *q = req->q;
+	struct request_queue *q = hctx->queue;
+	struct scsi_device *sdev = q->queuedata;
+	struct Scsi_Host *shost = sdev->host;
+
+	scsi_dec_host_busy(shost);
+	if (scsi_target(sdev)->can_queue > 0)
+		atomic_dec(&scsi_target(sdev)->target_busy);
+	atomic_dec(&sdev->device_busy);
+	put_device(&sdev->sdev_gendev);
+}
+
+static blk_status_t scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx)
+{
+	struct request_queue *q = hctx->queue;
 	struct scsi_device *sdev = q->queuedata;
 	struct Scsi_Host *shost = sdev->host;
-	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req);
 	blk_status_t ret;
-	int reason;
 
-	ret = prep_to_mq(scsi_prep_state_check(sdev, req));
-	if (ret != BLK_STS_OK)
-		goto out;
+	ret = prep_to_mq(scsi_prep_state_check(sdev, NULL));
+	if (ret == BLK_STS_RESOURCE || ret != BLK_STS_OK)
+		return ret;
 
-	ret = BLK_STS_RESOURCE;
 	if (!get_device(&sdev->sdev_gendev))
 		goto out;
-
 	if (!scsi_dev_queue_ready(q, sdev))
 		goto out_put_device;
 	if (!scsi_target_queue_ready(shost, sdev))
@@ -1996,10 +2003,38 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 	if (!scsi_host_queue_ready(q, shost, sdev))
 		goto out_dec_target_busy;
 
+	return BLK_STS_OK;
+
+out_dec_target_busy:
+	if (scsi_target(sdev)->can_queue > 0)
+		atomic_dec(&scsi_target(sdev)->target_busy);
+out_dec_device_busy:
+	atomic_dec(&sdev->device_busy);
+out_put_device:
+	put_device(&sdev->sdev_gendev);
+out:
+	return BLK_STS_RESOURCE;
+}
+
+static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
+			 const struct blk_mq_queue_data *bd)
+{
+	struct request *req = bd->rq;
+	struct request_queue *q = req->q;
+	struct scsi_device *sdev = q->queuedata;
+	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req);
+	blk_status_t ret;
+	int reason;
+
+	ret = prep_to_mq(scsi_prep_state_check(sdev, req));
+	if (ret != BLK_STS_OK)
+		goto out_put_budget;
+
+	ret = BLK_STS_RESOURCE;
 	if (!(req->rq_flags & RQF_DONTPREP)) {
 		ret = prep_to_mq(scsi_mq_prep_fn(req));
 		if (ret != BLK_STS_OK)
-			goto out_dec_host_busy;
+			goto out_put_budget;
 		req->rq_flags |= RQF_DONTPREP;
 	} else {
 		blk_mq_start_request(req);
@@ -2017,21 +2052,13 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 	if (reason) {
 		scsi_set_blocked(cmd, reason);
 		ret = BLK_STS_RESOURCE;
-		goto out_dec_host_busy;
+		goto out_put_budget;
 	}
 
 	return BLK_STS_OK;
 
-out_dec_host_busy:
-	scsi_dec_host_busy(shost);
-out_dec_target_busy:
-	if (scsi_target(sdev)->can_queue > 0)
-		atomic_dec(&scsi_target(sdev)->target_busy);
-out_dec_device_busy:
-	atomic_dec(&sdev->device_busy);
-out_put_device:
-	put_device(&sdev->sdev_gendev);
-out:
+out_put_budget:
+	scsi_mq_put_budget(hctx);
 	switch (ret) {
 	case BLK_STS_OK:
 		break;
@@ -2237,6 +2264,8 @@ struct request_queue *scsi_old_alloc_queue(struct scsi_device *sdev)
 }
 
 static const struct blk_mq_ops scsi_mq_ops = {
+	.get_budget	= scsi_mq_get_budget,
+	.put_budget	= scsi_mq_put_budget,
 	.queue_rq	= scsi_queue_rq,
 	.complete	= scsi_softirq_done,
 	.timeout	= scsi_timeout,
-- 
2.7.4

* [stable-4.14 07/23] SCSI: don't get target/host busy_count in scsi_mq_get_budget()
  2018-07-23 13:24 [stable-4.14 00/23] block/scsi multiqueue performance enhancement and Jack Wang
                   ` (5 preceding siblings ...)
  2018-07-23 13:24 ` [stable-4.14 06/23] scsi: implement .get_budget and .put_budget for blk-mq Jack Wang
@ 2018-07-23 13:24 ` Jack Wang
  2018-07-23 13:24 ` [stable-4.14 08/23] blk-mq: don't handle TAG_SHARED in restart Jack Wang
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 35+ messages in thread
From: Jack Wang @ 2018-07-23 13:24 UTC (permalink / raw)
  To: gregkh, stable; +Cc: Ming Lei, Jens Axboe, Jack Wang

From: Ming Lei <ming.lei@redhat.com>

commit 826a70a08b1210bbfdbda812ab43eb986e25b5c2 upstream

It is very expensive to atomic_inc/atomic_dec the host-wide counter
host->busy_count, and this should have been avoided via blk-mq's
mechanism for getting a driver tag, which uses the more efficient
sbitmap queue.

Also, we don't check atomic_read(&sdev->device_busy) in
scsi_mq_get_budget() and don't run the queue if the counter becomes
zero, so an IO hang may be caused if all requests are completed just
before the current SCSI device is added to shost->starved_list.
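
A hedged userspace sketch of the split this patch makes: .get_budget
touches only the cheap per-device counter, while the contended
target/host counters are taken in .queue_rq, after the driver-tag
sbitmap has already throttled concurrency. All counters, limits and
helper names here are illustrative assumptions.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_int device_busy;    /* per-LUN, cheap to bounce */
static atomic_int host_busy;      /* host-wide, heavily contended */
#define DEVICE_DEPTH 3
#define HOST_DEPTH   8

static bool get_budget(void)      /* after this patch: device only */
{
        if (atomic_fetch_add(&device_busy, 1) >= DEVICE_DEPTH) {
                atomic_fetch_sub(&device_busy, 1);
                return false;
        }
        return true;              /* no host-wide atomics on this path */
}

static bool queue_rq(void)        /* target/host checks moved here */
{
        if (atomic_fetch_add(&host_busy, 1) >= HOST_DEPTH) {
                atomic_fetch_sub(&host_busy, 1);
                return false;     /* the BLK_STS_RESOURCE path */
        }
        return true;
}

int main(void)
{
        if (get_budget() && queue_rq())
                printf("dispatched: device_busy=%d host_busy=%d\n",
                       atomic_load(&device_busy), atomic_load(&host_busy));
        return 0;
}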

Fixes: 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for blk-mq")
Reported-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[jwang: fix conflict in scsi_lib.c]
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 drivers/scsi/scsi_lib.c | 29 +++++++++++++----------------
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index c3f9d6d7a724..51265d2cfbb4 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1974,11 +1974,7 @@ static void scsi_mq_put_budget(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 	struct scsi_device *sdev = q->queuedata;
-	struct Scsi_Host *shost = sdev->host;
 
-	scsi_dec_host_busy(shost);
-	if (scsi_target(sdev)->can_queue > 0)
-		atomic_dec(&scsi_target(sdev)->target_busy);
 	atomic_dec(&sdev->device_busy);
 	put_device(&sdev->sdev_gendev);
 }
@@ -1987,7 +1983,6 @@ static blk_status_t scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 	struct scsi_device *sdev = q->queuedata;
-	struct Scsi_Host *shost = sdev->host;
 	blk_status_t ret;
 
 	ret = prep_to_mq(scsi_prep_state_check(sdev, NULL));
@@ -1998,18 +1993,9 @@ static blk_status_t scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx)
 		goto out;
 	if (!scsi_dev_queue_ready(q, sdev))
 		goto out_put_device;
-	if (!scsi_target_queue_ready(shost, sdev))
-		goto out_dec_device_busy;
-	if (!scsi_host_queue_ready(q, shost, sdev))
-		goto out_dec_target_busy;
 
 	return BLK_STS_OK;
 
-out_dec_target_busy:
-	if (scsi_target(sdev)->can_queue > 0)
-		atomic_dec(&scsi_target(sdev)->target_busy);
-out_dec_device_busy:
-	atomic_dec(&sdev->device_busy);
 out_put_device:
 	put_device(&sdev->sdev_gendev);
 out:
@@ -2022,6 +2008,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 	struct request *req = bd->rq;
 	struct request_queue *q = req->q;
 	struct scsi_device *sdev = q->queuedata;
+	struct Scsi_Host *shost = sdev->host;
 	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req);
 	blk_status_t ret;
 	int reason;
@@ -2031,10 +2018,15 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 		goto out_put_budget;
 
 	ret = BLK_STS_RESOURCE;
+	if (!scsi_target_queue_ready(shost, sdev))
+		goto out_put_budget;
+	if (!scsi_host_queue_ready(q, shost, sdev))
+		goto out_dec_target_busy;
+
 	if (!(req->rq_flags & RQF_DONTPREP)) {
 		ret = prep_to_mq(scsi_mq_prep_fn(req));
 		if (ret != BLK_STS_OK)
-			goto out_put_budget;
+			goto out_dec_host_busy;
 		req->rq_flags |= RQF_DONTPREP;
 	} else {
 		blk_mq_start_request(req);
@@ -2052,11 +2044,16 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 	if (reason) {
 		scsi_set_blocked(cmd, reason);
 		ret = BLK_STS_RESOURCE;
-		goto out_put_budget;
+		goto out_dec_host_busy;
 	}
 
 	return BLK_STS_OK;
 
+out_dec_host_busy:
+	scsi_dec_host_busy(shost);
+out_dec_target_busy:
+	if (scsi_target(sdev)->can_queue > 0)
+		atomic_dec(&scsi_target(sdev)->target_busy);
 out_put_budget:
 	scsi_mq_put_budget(hctx);
 	switch (ret) {
-- 
2.7.4

* [stable-4.14 08/23] blk-mq: don't handle TAG_SHARED in restart
  2018-07-23 13:24 [stable-4.14 00/23] block/scsi multiqueue performance enhancement and Jack Wang
                   ` (6 preceding siblings ...)
  2018-07-23 13:24 ` [stable-4.14 07/23] SCSI: don't get target/host busy_count in scsi_mq_get_budget() Jack Wang
@ 2018-07-23 13:24 ` Jack Wang
  2018-07-23 13:24 ` [stable-4.14 09/23] blk-mq: don't restart queue when .get_budget returns BLK_STS_RESOURCE Jack Wang
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 35+ messages in thread
From: Jack Wang @ 2018-07-23 13:24 UTC (permalink / raw)
  To: gregkh, stable; +Cc: Ming Lei, Jens Axboe, Jack Wang

From: Ming Lei <ming.lei@redhat.com>

commit 358a3a6bccb74da9d63a26b2dd5f09f1e9970e0b upstream

Now restart is used in the following cases, and TAG_SHARED is for
SCSI only.

1) .get_budget() returns BLK_STS_RESOURCE
- if a resource at the target/host level isn't satisfied, this SCSI
device will be added to shost->starved_list, and the whole queue will
be rerun (via SCSI's built-in RESTART) in scsi_end_request() after any
request initiated from this host/target is completed. Note that
host-level resources can't be an issue for blk-mq at all.

- the same is true if a resource at the queue level isn't satisfied.

- if there is no outstanding request on this queue, then SCSI's RESTART
can't work (and neither can blk-mq's); the queue will be run after
SCSI_QUEUE_DELAY, and finally all starved sdevs will be handled by
SCSI's RESTART when this request is finished.

2) scsi_dispatch_cmd() returns BLK_STS_RESOURCE
- if there is no in-progress request on this queue, the queue will be
run after SCSI_QUEUE_DELAY

- otherwise, SCSI's RESTART covers the rerun.

3) blk_mq_get_driver_tag() failed
- BLK_MQ_S_TAG_WAITING covers the cross-queue RESTART for driver tag
allocation.

In one word, SCSI's built-in RESTART is enough to cover the queue
rerun, and we don't need to pay special attention to TAG_SHARED wrt.
restart.

In my test on scsi_debug (8 LUNs), this patch improves IOPS by 20% ~
30% when running I/O on these 8 LUNs concurrently.

Also, Roman Pen reported that the current RESTART is very expensive,
especially when there are lots of LUNs attached to one host; in his
test, RESTART cut IOPS in half.
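
The built-in RESTART argument can be modeled in a few lines of
userspace C (illustrative only; run_queue() and complete_one() are not
kernel APIs): as long as at least one request is in flight, its
completion reruns the queue, so no separate shared-tag restart
bookkeeping is needed.

#include <stdio.h>

static int queued = 4;    /* requests waiting in the scheduler */
static int in_flight;
#define DEPTH 2           /* per-device resource limit */

static void run_queue(void)
{
        while (queued > 0 && in_flight < DEPTH) {
                queued--;
                in_flight++;
                printf("dispatch: in_flight=%d queued=%d\n",
                       in_flight, queued);
        }
}

static void complete_one(void)
{
        in_flight--;
        run_queue();      /* models scsi_end_request() rerunning the queue */
}

int main(void)
{
        run_queue();              /* fills up to DEPTH; the rest wait */
        while (in_flight > 0)
                complete_one();   /* every completion restarts dispatch */
        return 0;
}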

Fixes: https://marc.info/?l=linux-kernel&m=150832216727524&w=2
Fixes: 6d8c6c0f97ad ("blk-mq: Restart a single queue if tag sets are shared")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 block/blk-mq-sched.c | 78 +++-------------------------------------------------
 1 file changed, 4 insertions(+), 74 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index df8581bb0a37..daab27feb653 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -68,25 +68,17 @@ static void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx)
 		set_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
 }
 
-static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
+void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
 {
 	if (!test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
-		return false;
-
-	if (hctx->flags & BLK_MQ_F_TAG_SHARED) {
-		struct request_queue *q = hctx->queue;
+		return;
 
-		if (test_and_clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
-			atomic_dec(&q->shared_hctx_restart);
-	} else
-		clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
+	clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
 
 	if (blk_mq_hctx_has_pending(hctx)) {
 		blk_mq_run_hw_queue(hctx, true);
-		return true;
+		return;
 	}
-
-	return false;
 }
 
 /* return true if hctx need to run again */
@@ -385,68 +377,6 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
 	return true;
 }
 
-/**
- * list_for_each_entry_rcu_rr - iterate in a round-robin fashion over rcu list
- * @pos:    loop cursor.
- * @skip:   the list element that will not be examined. Iteration starts at
- *          @skip->next.
- * @head:   head of the list to examine. This list must have at least one
- *          element, namely @skip.
- * @member: name of the list_head structure within typeof(*pos).
- */
-#define list_for_each_entry_rcu_rr(pos, skip, head, member)		\
-	for ((pos) = (skip);						\
-	     (pos = (pos)->member.next != (head) ? list_entry_rcu(	\
-			(pos)->member.next, typeof(*pos), member) :	\
-	      list_entry_rcu((pos)->member.next->next, typeof(*pos), member)), \
-	     (pos) != (skip); )
-
-/*
- * Called after a driver tag has been freed to check whether a hctx needs to
- * be restarted. Restarts @hctx if its tag set is not shared. Restarts hardware
- * queues in a round-robin fashion if the tag set of @hctx is shared with other
- * hardware queues.
- */
-void blk_mq_sched_restart(struct blk_mq_hw_ctx *const hctx)
-{
-	struct blk_mq_tags *const tags = hctx->tags;
-	struct blk_mq_tag_set *const set = hctx->queue->tag_set;
-	struct request_queue *const queue = hctx->queue, *q;
-	struct blk_mq_hw_ctx *hctx2;
-	unsigned int i, j;
-
-	if (set->flags & BLK_MQ_F_TAG_SHARED) {
-		/*
-		 * If this is 0, then we know that no hardware queues
-		 * have RESTART marked. We're done.
-		 */
-		if (!atomic_read(&queue->shared_hctx_restart))
-			return;
-
-		rcu_read_lock();
-		list_for_each_entry_rcu_rr(q, queue, &set->tag_list,
-					   tag_set_list) {
-			queue_for_each_hw_ctx(q, hctx2, i)
-				if (hctx2->tags == tags &&
-				    blk_mq_sched_restart_hctx(hctx2))
-					goto done;
-		}
-		j = hctx->queue_num + 1;
-		for (i = 0; i < queue->nr_hw_queues; i++, j++) {
-			if (j == queue->nr_hw_queues)
-				j = 0;
-			hctx2 = queue->queue_hw_ctx[j];
-			if (hctx2->tags == tags &&
-			    blk_mq_sched_restart_hctx(hctx2))
-				break;
-		}
-done:
-		rcu_read_unlock();
-	} else {
-		blk_mq_sched_restart_hctx(hctx);
-	}
-}
-
 /*
  * Add flush/fua to the queue. If we fail getting a driver tag, then
  * punt to the requeue list. Requeue will re-invoke us from a context
-- 
2.7.4

* [stable-4.14 09/23] blk-mq: don't restart queue when .get_budget returns BLK_STS_RESOURCE
  2018-07-23 13:24 [stable-4.14 00/23] block/scsi multiqueue performance enhancement and Jack Wang
                   ` (7 preceding siblings ...)
  2018-07-23 13:24 ` [stable-4.14 08/23] blk-mq: don't handle TAG_SHARED in restart Jack Wang
@ 2018-07-23 13:24 ` Jack Wang
  2018-07-23 13:24 ` [stable-4.14 10/23] blk-mq: don't handle failure in .get_budget Jack Wang
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 35+ messages in thread
From: Jack Wang @ 2018-07-23 13:24 UTC (permalink / raw)
  To: gregkh, stable; +Cc: Ming Lei, Jens Axboe, Jack Wang

From: Ming Lei <ming.lei@redhat.com>

commit 1f460b63d4b37f504d8d0affc2cd492eb005ea97 upstream

SCSI restarts its queue in scsi_end_request() automatically, so we
don't need to handle this case in blk-mq.

Especially since no request will be dequeued in this case, we needn't
worry about an IO hang caused by restart vs. dispatch.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 block/blk-mq-sched.c | 45 ++++++++++++++++++++-------------------------
 block/blk-mq-sched.h |  2 +-
 block/blk-mq.c       |  8 ++------
 3 files changed, 23 insertions(+), 32 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index daab27feb653..7775f6b12fa9 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -81,8 +81,12 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
 	}
 }
 
-/* return true if hctx need to run again */
-static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
+/*
+ * Only SCSI implements .get_budget and .put_budget, and SCSI restarts
+ * its queue by itself in its completion handler, so we don't need to
+ * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
+ */
+static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 	struct elevator_queue *e = q->elevator;
@@ -98,7 +102,7 @@ static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 
 		ret = blk_mq_get_dispatch_budget(hctx);
 		if (ret == BLK_STS_RESOURCE)
-			return true;
+			break;
 
 		rq = e->type->ops.mq.dispatch_request(hctx);
 		if (!rq) {
@@ -116,8 +120,6 @@ static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 		 */
 		list_add(&rq->queuelist, &rq_list);
 	} while (blk_mq_dispatch_rq_list(q, &rq_list, true));
-
-	return false;
 }
 
 static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
@@ -131,8 +133,12 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
 	return hctx->ctxs[idx];
 }
 
-/* return true if hctx need to run again */
-static bool blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
+/*
+ * Only SCSI implements .get_budget and .put_budget, and SCSI restarts
+ * its queue by itself in its completion handler, so we don't need to
+ * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
+ */
+static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 	LIST_HEAD(rq_list);
@@ -147,7 +153,7 @@ static bool blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
 
 		ret = blk_mq_get_dispatch_budget(hctx);
 		if (ret == BLK_STS_RESOURCE)
-			return true;
+			break;
 
 		rq = blk_mq_dequeue_from_ctx(hctx, ctx);
 		if (!rq) {
@@ -171,22 +177,19 @@ static bool blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
 	} while (blk_mq_dispatch_rq_list(q, &rq_list, true));
 
 	WRITE_ONCE(hctx->dispatch_from, ctx);
-
-	return false;
 }
 
 /* return true if hw queue need to be run again */
-bool blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
+void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 	struct elevator_queue *e = q->elevator;
 	const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
 	LIST_HEAD(rq_list);
-	bool run_queue = false;
 
 	/* RCU or SRCU read lock is needed before checking quiesced flag */
 	if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
-		return false;
+		return;
 
 	hctx->run++;
 
@@ -218,12 +221,12 @@ bool blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 		blk_mq_sched_mark_restart_hctx(hctx);
 		if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
 			if (has_sched_dispatch)
-				run_queue = blk_mq_do_dispatch_sched(hctx);
+				blk_mq_do_dispatch_sched(hctx);
 			else
-				run_queue = blk_mq_do_dispatch_ctx(hctx);
+				blk_mq_do_dispatch_ctx(hctx);
 		}
 	} else if (has_sched_dispatch) {
-		run_queue = blk_mq_do_dispatch_sched(hctx);
+		blk_mq_do_dispatch_sched(hctx);
 	} else if (q->mq_ops->get_budget) {
 		/*
 		 * If we need to get budget before queuing request, we
@@ -233,19 +236,11 @@ bool blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 		 * TODO: get more budgets, and dequeue more requests in
 		 * one time.
 		 */
-		run_queue = blk_mq_do_dispatch_ctx(hctx);
+		blk_mq_do_dispatch_ctx(hctx);
 	} else {
 		blk_mq_flush_busy_ctxs(hctx, &rq_list);
 		blk_mq_dispatch_rq_list(q, &rq_list, false);
 	}
-
-	if (run_queue && !blk_mq_sched_needs_restart(hctx) &&
-			!test_bit(BLK_MQ_S_TAG_WAITING, &hctx->state)) {
-		blk_mq_sched_mark_restart_hctx(hctx);
-		return true;
-	}
-
-	return false;
 }
 
 bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 1ccfb8027cfc..ba1d1418a96d 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -23,7 +23,7 @@ void blk_mq_sched_insert_requests(struct request_queue *q,
 				  struct blk_mq_ctx *ctx,
 				  struct list_head *list, bool run_queue_async);
 
-bool blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx);
+void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx);
 
 int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e);
 void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3bc6de8b82c0..b9dac21f35f6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1215,7 +1215,6 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 {
 	int srcu_idx;
-	bool run_queue;
 
 	/*
 	 * We should be running this queue from one of the CPUs that
@@ -1250,18 +1249,15 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 
 	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
 		rcu_read_lock();
-		run_queue = blk_mq_sched_dispatch_requests(hctx);
+		blk_mq_sched_dispatch_requests(hctx);
 		rcu_read_unlock();
 	} else {
 		might_sleep();
 
 		srcu_idx = srcu_read_lock(hctx->queue_rq_srcu);
-		run_queue = blk_mq_sched_dispatch_requests(hctx);
+		blk_mq_sched_dispatch_requests(hctx);
 		srcu_read_unlock(hctx->queue_rq_srcu, srcu_idx);
 	}
-
-	if (run_queue)
-		blk_mq_run_hw_queue(hctx, true);
 }
 
 /*
-- 
2.7.4

* [stable-4.14 10/23] blk-mq: don't handle failure in .get_budget
  2018-07-23 13:24 [stable-4.14 00/23] block/scsi multiqueue performance enhancement and Jack Wang
                   ` (8 preceding siblings ...)
  2018-07-23 13:24 ` [stable-4.14 09/23] blk-mq: don't restart queue when .get_budget returns BLK_STS_RESOURCE Jack Wang
@ 2018-07-23 13:24 ` Jack Wang
  2018-07-23 13:24 ` [stable-4.14 11/23] blk-mq: put the driver tag of nxt rq before first one is requeued Jack Wang
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 35+ messages in thread
From: Jack Wang @ 2018-07-23 13:24 UTC (permalink / raw)
  To: gregkh, stable; +Cc: Ming Lei, Jens Axboe, Jack Wang

From: Ming Lei <ming.lei@redhat.com>

commit 88022d7201e96b43f1754b0358fc6bcd8dbdcde1 upstream

It is enough to just check if we can get the budget via .get_budget(),
and we don't need to deal with device state changes in .get_budget().

For SCSI, one issue to be fixed is that we have to call
scsi_mq_uninit_cmd() to free allocated resources if the SCSI device
fails to handle the request. It isn't enough to simply call
blk_mq_end_request() to do that if the request is marked as
RQF_DONTPREP.
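
A tiny sketch of the simplified contract after this patch (illustrative
userspace C, not the kernel implementation): .get_budget returns a
plain bool, so the dispatch loop has only two outcomes, dispatch the
request or leave it queued.

#include <stdbool.h>
#include <stdio.h>

static int depth = 1;     /* illustrative remaining budget */

static bool get_budget(void)
{
        return depth-- > 0;   /* no error status to propagate anymore */
}

int main(void)
{
        printf(get_budget() ? "dispatch rq\n" : "rq stays queued\n");
        printf(get_budget() ? "dispatch rq\n" : "rq stays queued\n");
        return 0;
}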

Fixes: 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for blk-mq")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 block/blk-mq-sched.c    | 14 ++------------
 block/blk-mq.c          | 17 ++++-------------
 block/blk-mq.h          |  5 ++---
 drivers/scsi/scsi_lib.c | 11 +++--------
 include/linux/blk-mq.h  |  2 +-
 5 files changed, 12 insertions(+), 37 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 7775f6b12fa9..13a27d4d1671 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -94,23 +94,18 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 
 	do {
 		struct request *rq;
-		blk_status_t ret;
 
 		if (e->type->ops.mq.has_work &&
 				!e->type->ops.mq.has_work(hctx))
 			break;
 
-		ret = blk_mq_get_dispatch_budget(hctx);
-		if (ret == BLK_STS_RESOURCE)
+		if (!blk_mq_get_dispatch_budget(hctx))
 			break;
 
 		rq = e->type->ops.mq.dispatch_request(hctx);
 		if (!rq) {
 			blk_mq_put_dispatch_budget(hctx);
 			break;
-		} else if (ret != BLK_STS_OK) {
-			blk_mq_end_request(rq, ret);
-			continue;
 		}
 
 		/*
@@ -146,22 +141,17 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
 
 	do {
 		struct request *rq;
-		blk_status_t ret;
 
 		if (!sbitmap_any_bit_set(&hctx->ctx_map))
 			break;
 
-		ret = blk_mq_get_dispatch_budget(hctx);
-		if (ret == BLK_STS_RESOURCE)
+		if (!blk_mq_get_dispatch_budget(hctx))
 			break;
 
 		rq = blk_mq_dequeue_from_ctx(hctx, ctx);
 		if (!rq) {
 			blk_mq_put_dispatch_budget(hctx);
 			break;
-		} else if (ret != BLK_STS_OK) {
-			blk_mq_end_request(rq, ret);
-			continue;
 		}
 
 		/*
diff --git a/block/blk-mq.c b/block/blk-mq.c
index b9dac21f35f6..94cf2c91f767 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1123,13 +1123,8 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 			}
 		}
 
-		if (!got_budget) {
-			ret = blk_mq_get_dispatch_budget(hctx);
-			if (ret == BLK_STS_RESOURCE)
-				break;
-			if (ret != BLK_STS_OK)
-				goto fail_rq;
-		}
+		if (!got_budget && !blk_mq_get_dispatch_budget(hctx))
+			break;
 
 		list_del_init(&rq->queuelist);
 
@@ -1156,7 +1151,6 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 			break;
 		}
 
- fail_rq:
 		if (unlikely(ret != BLK_STS_OK)) {
 			errors++;
 			blk_mq_end_request(rq, BLK_STS_IOERR);
@@ -1650,12 +1644,10 @@ static void __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 	if (!blk_mq_get_driver_tag(rq, NULL, false))
 		goto insert;
 
-	ret = blk_mq_get_dispatch_budget(hctx);
-	if (ret == BLK_STS_RESOURCE) {
+	if (!blk_mq_get_dispatch_budget(hctx)) {
 		blk_mq_put_driver_tag(rq);
 		goto insert;
-	} else if (ret != BLK_STS_OK)
-		goto fail_rq;
+	}
 
 	new_cookie = request_to_qc_t(hctx, rq);
 
@@ -1673,7 +1665,6 @@ static void __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 		__blk_mq_requeue_request(rq);
 		goto insert;
 	default:
- fail_rq:
 		*cookie = BLK_QC_T_NONE;
 		blk_mq_end_request(rq, ret);
 		return;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index ef0f7c979f64..93dde7054a28 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -150,14 +150,13 @@ static inline void blk_mq_put_dispatch_budget(struct blk_mq_hw_ctx *hctx)
 		q->mq_ops->put_budget(hctx);
 }
 
-static inline blk_status_t blk_mq_get_dispatch_budget(
-		struct blk_mq_hw_ctx *hctx)
+static inline bool blk_mq_get_dispatch_budget(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 
 	if (q->mq_ops->get_budget)
 		return q->mq_ops->get_budget(hctx);
-	return BLK_STS_OK;
+	return true;
 }
 
 #endif
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 51265d2cfbb4..9e242583ab83 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1979,27 +1979,22 @@ static void scsi_mq_put_budget(struct blk_mq_hw_ctx *hctx)
 	put_device(&sdev->sdev_gendev);
 }
 
-static blk_status_t scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx)
+static bool scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 	struct scsi_device *sdev = q->queuedata;
-	blk_status_t ret;
-
-	ret = prep_to_mq(scsi_prep_state_check(sdev, NULL));
-	if (ret == BLK_STS_RESOURCE || ret != BLK_STS_OK)
-		return ret;
 
 	if (!get_device(&sdev->sdev_gendev))
 		goto out;
 	if (!scsi_dev_queue_ready(q, sdev))
 		goto out_put_device;
 
-	return BLK_STS_OK;
+	return true;
 
 out_put_device:
 	put_device(&sdev->sdev_gendev);
 out:
-	return BLK_STS_RESOURCE;
+	return false;
 }
 
 static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 03439f10464f..4564bd216431 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -93,7 +93,7 @@ struct blk_mq_queue_data {
 
 typedef blk_status_t (queue_rq_fn)(struct blk_mq_hw_ctx *,
 		const struct blk_mq_queue_data *);
-typedef blk_status_t (get_budget_fn)(struct blk_mq_hw_ctx *);
+typedef bool (get_budget_fn)(struct blk_mq_hw_ctx *);
 typedef void (put_budget_fn)(struct blk_mq_hw_ctx *);
 typedef enum blk_eh_timer_return (timeout_fn)(struct request *, bool);
 typedef int (init_hctx_fn)(struct blk_mq_hw_ctx *, void *, unsigned int);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [stable-4.14 11/23] blk-mq: put the driver tag of nxt rq before first one is requeued
  2018-07-23 13:24 [stable-4.14 00/23] block/scsi multiqueue performance enhancement and Jack Wang
                   ` (9 preceding siblings ...)
  2018-07-23 13:24 ` [stable-4.14 10/23] blk-mq: don't handle failure in .get_budget Jack Wang
@ 2018-07-23 13:24 ` Jack Wang
  2018-07-23 13:24 ` [stable-4.14 12/23] blk-flush: don't run queue for requests bypassing flush Jack Wang
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 35+ messages in thread
From: Jack Wang @ 2018-07-23 13:24 UTC (permalink / raw)
  To: gregkh, stable; +Cc: Jianchao Wang, Jens Axboe, Jack Wang

From: Jianchao Wang <jianchao.w.wang@oracle.com>

commit 6d6f167ce74158903e7fc20dfbecf89c71aa1c00 upstream

When freeing the driver tag of the next rq with an I/O scheduler
configured, we get the first entry of the list. However, this can
race with the requeue of a request, and we end up getting the wrong
request from the head of the list. Free the driver tag of the next rq
before the failed one is requeued in the failure branch of the
queue_rq callback.
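
For clarity, a sketch of the fixed ordering in the BLK_STS_RESOURCE
branch (an annotated version of the hunk below):

	ret = q->mq_ops->queue_rq(hctx, &bd);
	if (ret == BLK_STS_RESOURCE) {
		/* 1) put the driver tag of the *next* rq first ... */
		if (!list_empty(list)) {
			nxt = list_first_entry(list, struct request,
					       queuelist);
			blk_mq_put_driver_tag(nxt);
		}
		/*
		 * 2) ... then requeue the failed rq at the list head,
		 * so list_first_entry() above can't pick it up.
		 */
		list_add(&rq->queuelist, list);
		__blk_mq_requeue_request(rq);
		break;
	}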

Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 block/blk-mq.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 94cf2c91f767..8bef73f9151d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1080,7 +1080,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 		bool got_budget)
 {
 	struct blk_mq_hw_ctx *hctx;
-	struct request *rq;
+	struct request *rq, *nxt;
 	int errors, queued;
 
 	if (list_empty(list))
@@ -1137,14 +1137,20 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 		if (list_empty(list))
 			bd.last = true;
 		else {
-			struct request *nxt;
-
 			nxt = list_first_entry(list, struct request, queuelist);
 			bd.last = !blk_mq_get_driver_tag(nxt, NULL, false);
 		}
 
 		ret = q->mq_ops->queue_rq(hctx, &bd);
 		if (ret == BLK_STS_RESOURCE) {
+			/*
+			 * If an I/O scheduler has been configured and we got a
+			 * driver tag for the next request already, free it again.
+			 */
+			if (!list_empty(list)) {
+				nxt = list_first_entry(list, struct request, queuelist);
+				blk_mq_put_driver_tag(nxt);
+			}
 			blk_mq_put_driver_tag_hctx(hctx, rq);
 			list_add(&rq->queuelist, list);
 			__blk_mq_requeue_request(rq);
@@ -1167,13 +1173,6 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 	 * that is where we will continue on next queue run.
 	 */
 	if (!list_empty(list)) {
-		/*
-		 * If an I/O scheduler has been configured and we got a driver
-		 * tag for the next request already, free it again.
-		 */
-		rq = list_first_entry(list, struct request, queuelist);
-		blk_mq_put_driver_tag(rq);
-
 		spin_lock(&hctx->lock);
 		list_splice_init(list, &hctx->dispatch);
 		spin_unlock(&hctx->lock);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [stable-4.14 12/23] blk-flush: don't run queue for requests bypassing flush
  2018-07-23 13:24 [stable-4.14 00/23] block/scsi multiqueue performance enhancement and Jack Wang
                   ` (10 preceding siblings ...)
  2018-07-23 13:24 ` [stable-4.14 11/23] blk-mq: put the driver tag of nxt rq before first one is requeued Jack Wang
@ 2018-07-23 13:24 ` Jack Wang
  2018-07-23 13:24 ` [stable-4.14 13/23] block: pass 'run_queue' to blk_mq_request_bypass_insert Jack Wang
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 35+ messages in thread
From: Jack Wang @ 2018-07-23 13:24 UTC (permalink / raw)
  To: gregkh, stable; +Cc: Ming Lei, Jens Axboe, Jack Wang

From: Ming Lei <ming.lei@redhat.com>

commit 9c71c83c857e7a84a5be5a56ea88da7098f51db8 upstream

blk_insert_flush() should only insert the request, since a queue run
always follows it.

In the case of bypassing flush, we don't need to run the queue because
every blk_insert_flush() is followed by one queue run.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 block/blk-flush.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 4938bec8cfef..81bd1a843043 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -463,7 +463,7 @@ void blk_insert_flush(struct request *rq)
 	if ((policy & REQ_FSEQ_DATA) &&
 	    !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
 		if (q->mq_ops)
-			blk_mq_sched_insert_request(rq, false, true, false, false);
+			blk_mq_sched_insert_request(rq, false, false, false, false);
 		else
 			list_add_tail(&rq->queuelist, &q->queue_head);
 		return;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [stable-4.14 13/23] block: pass 'run_queue' to blk_mq_request_bypass_insert
  2018-07-23 13:24 [stable-4.14 00/23] block/scsi multiqueue performance enhancement and Jack Wang
                   ` (11 preceding siblings ...)
  2018-07-23 13:24 ` [stable-4.14 12/23] blk-flush: don't run queue for requests bypassing flush Jack Wang
@ 2018-07-23 13:24 ` Jack Wang
  2018-07-23 13:24 ` [stable-4.14 14/23] blk-flush: use blk_mq_request_bypass_insert() Jack Wang
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 35+ messages in thread
From: Jack Wang @ 2018-07-23 13:24 UTC (permalink / raw)
  To: gregkh, stable; +Cc: Ming Lei, Jens Axboe, Jack Wang

From: Ming Lei <ming.lei@redhat.com>

commit b0850297c749ea79a5717d597931366b3d7f4b09 upstream

The block flush machinery needs this function without running the
queue, so add a parameter controlling whether we run it or not.
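
For reference, the two intended usages after this change (a sketch
based on this series):

	/* remote insert (blk_insert_cloned_request): kick the queue */
	blk_mq_request_bypass_insert(rq, true);

	/* flush machinery: the caller takes care of running the queue */
	blk_mq_request_bypass_insert(rq, false);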

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 block/blk-core.c | 2 +-
 block/blk-mq.c   | 5 +++--
 block/blk-mq.h   | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 68bae6338ad4..297e1745f099 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2364,7 +2364,7 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
 		 * bypass a potential scheduler on the bottom device for
 		 * insert.
 		 */
-		blk_mq_request_bypass_insert(rq);
+		blk_mq_request_bypass_insert(rq, true);
 		return BLK_STS_OK;
 	}
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8bef73f9151d..1f515346f94e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1496,7 +1496,7 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
  * Should only be used carefully, when the caller knows we want to
  * bypass a potential IO scheduler on the target device.
  */
-void blk_mq_request_bypass_insert(struct request *rq)
+void blk_mq_request_bypass_insert(struct request *rq, bool run_queue)
 {
 	struct blk_mq_ctx *ctx = rq->mq_ctx;
 	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q, ctx->cpu);
@@ -1505,7 +1505,8 @@ void blk_mq_request_bypass_insert(struct request *rq)
 	list_add_tail(&rq->queuelist, &hctx->dispatch);
 	spin_unlock(&hctx->lock);
 
-	blk_mq_run_hw_queue(hctx, false);
+	if (run_queue)
+		blk_mq_run_hw_queue(hctx, false);
 }
 
 void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 93dde7054a28..1819db23b038 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -57,7 +57,7 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
  */
 void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 				bool at_head);
-void blk_mq_request_bypass_insert(struct request *rq);
+void blk_mq_request_bypass_insert(struct request *rq, bool run_queue);
 void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
 				struct list_head *list);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [stable-4.14 14/23] blk-flush: use blk_mq_request_bypass_insert()
  2018-07-23 13:24 [stable-4.14 00/23] block/scsi multiqueue performance enhancement and Jack Wang
                   ` (12 preceding siblings ...)
  2018-07-23 13:24 ` [stable-4.14 13/23] block: pass 'run_queue' to blk_mq_request_bypass_insert Jack Wang
@ 2018-07-23 13:24 ` Jack Wang
  2018-07-23 13:24 ` [stable-4.14 15/23] blk-mq-sched: decide how to handle flush rq via RQF_FLUSH_SEQ Jack Wang
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 35+ messages in thread
From: Jack Wang @ 2018-07-23 13:24 UTC (permalink / raw)
  To: gregkh, stable; +Cc: Ming Lei, Jens Axboe, Jack Wang

From: Ming Lei <ming.lei@redhat.com>

commit 598906f814280762157629ba8833bf5cb11def74 upstream

In the following patch, we will use RQF_FLUSH_SEQ to decide:

1) if the flag isn't set, the flush rq needs to be inserted via
blk_insert_flush()

2) otherwise, the flush rq needs to be dispatched directly since
it is in the flush machinery now.

So we use blk_mq_request_bypass_insert() for requests bypassing the
flush machinery, just like the legacy path did.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 block/blk-flush.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 81bd1a843043..a9773d2075ac 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -463,7 +463,7 @@ void blk_insert_flush(struct request *rq)
 	if ((policy & REQ_FSEQ_DATA) &&
 	    !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
 		if (q->mq_ops)
-			blk_mq_sched_insert_request(rq, false, false, false, false);
+			blk_mq_request_bypass_insert(rq, false);
 		else
 			list_add_tail(&rq->queuelist, &q->queue_head);
 		return;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [stable-4.14 15/23] blk-mq-sched: decide how to handle flush rq via RQF_FLUSH_SEQ
  2018-07-23 13:24 [stable-4.14 00/23] block/scsi multiqueue performance enhancement and Jack Wang
                   ` (13 preceding siblings ...)
  2018-07-23 13:24 ` [stable-4.14 14/23] blk-flush: use blk_mq_request_bypass_insert() Jack Wang
@ 2018-07-23 13:24 ` Jack Wang
  2018-07-23 13:24 ` [stable-4.14 16/23] blk-mq: move blk_mq_put_driver_tag*() into blk-mq.h Jack Wang
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 35+ messages in thread
From: Jack Wang @ 2018-07-23 13:24 UTC (permalink / raw)
  To: gregkh, stable; +Cc: Ming Lei, Jens Axboe, Jack Wang

From: Ming Lei <ming.lei@redhat.com>

commit a6a252e6491443c1c18eab7e254daee63d4a7a04 upstream

In case of an IO scheduler we always pre-allocate one driver tag before
calling blk_insert_flush(), and the flush request will be marked as
RQF_FLUSH_SEQ once it is in the flush machinery.

So if RQF_FLUSH_SEQ isn't set, we call blk_insert_flush() to handle
the request, otherwise the flush request is dispatched to the
->dispatch list directly.

This is a preparation patch for not preallocating a driver tag for flush
requests, and for not treating flush requests as a special case. This is
similar to what the legacy path does.
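
A condensed sketch of the resulting insert-time decision (names as in
the hunks below):

	if (rq->rq_flags & RQF_FLUSH_SEQ) {
		/* rq is inside the flush machinery: dispatch directly */
		spin_lock(&hctx->lock);
		list_add(&rq->queuelist, &hctx->dispatch);
		spin_unlock(&hctx->lock);
	} else if (op_is_flush(rq->cmd_flags)) {
		/* fresh flush/fua rq: hand it to the flush machinery */
		blk_mq_sched_insert_flush(hctx, rq, can_block);
	}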

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 block/blk-mq-sched.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 13a27d4d1671..e7094f44afaf 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -345,21 +345,23 @@ void blk_mq_sched_request_inserted(struct request *rq)
 EXPORT_SYMBOL_GPL(blk_mq_sched_request_inserted);
 
 static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
+				       bool has_sched,
 				       struct request *rq)
 {
-	if (rq->tag == -1) {
+	/* dispatch flush rq directly */
+	if (rq->rq_flags & RQF_FLUSH_SEQ) {
+		spin_lock(&hctx->lock);
+		list_add(&rq->queuelist, &hctx->dispatch);
+		spin_unlock(&hctx->lock);
+		return true;
+	}
+
+	if (has_sched) {
 		rq->rq_flags |= RQF_SORTED;
-		return false;
+		WARN_ON(rq->tag != -1);
 	}
 
-	/*
-	 * If we already have a real request tag, send directly to
-	 * the dispatch list.
-	 */
-	spin_lock(&hctx->lock);
-	list_add(&rq->queuelist, &hctx->dispatch);
-	spin_unlock(&hctx->lock);
-	return true;
+	return false;
 }
 
 /*
@@ -385,12 +387,13 @@ void blk_mq_sched_insert_request(struct request *rq, bool at_head,
 	struct blk_mq_ctx *ctx = rq->mq_ctx;
 	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
 
-	if (rq->tag == -1 && op_is_flush(rq->cmd_flags)) {
+	/* flush rq in flush machinery need to be dispatched directly */
+	if (!(rq->rq_flags & RQF_FLUSH_SEQ) && op_is_flush(rq->cmd_flags)) {
 		blk_mq_sched_insert_flush(hctx, rq, can_block);
 		return;
 	}
 
-	if (e && blk_mq_sched_bypass_insert(hctx, rq))
+	if (blk_mq_sched_bypass_insert(hctx, !!e, rq))
 		goto run;
 
 	if (e && e->type->ops.mq.insert_requests) {
@@ -428,7 +431,7 @@ void blk_mq_sched_insert_requests(struct request_queue *q,
 		list_for_each_entry_safe(rq, next, list, queuelist) {
 			if (WARN_ON_ONCE(rq->tag != -1)) {
 				list_del_init(&rq->queuelist);
-				blk_mq_sched_bypass_insert(hctx, rq);
+				blk_mq_sched_bypass_insert(hctx, true, rq);
 			}
 		}
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [stable-4.14 16/23] blk-mq: move blk_mq_put_driver_tag*() into blk-mq.h
  2018-07-23 13:24 [stable-4.14 00/23] block/scsi multiqueue performance enhancement and Jack Wang
                   ` (14 preceding siblings ...)
  2018-07-23 13:24 ` [stable-4.14 15/23] blk-mq-sched: decide how to handle flush rq via RQF_FLUSH_SEQ Jack Wang
@ 2018-07-23 13:24 ` Jack Wang
  2018-07-23 13:24 ` [stable-4.14 17/23] blk-mq: don't allocate driver tag upfront for flush rq Jack Wang
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 35+ messages in thread
From: Jack Wang @ 2018-07-23 13:24 UTC (permalink / raw)
  To: gregkh, stable; +Cc: Ming Lei, Jens Axboe, Jack Wang

From: Ming Lei <ming.lei@redhat.com>

commit 244c65a3ccaa06fd15cc940315606674d3108b2f upstream

We need this helper to put the driver tag for a flush rq, since the
following patch will stop sharing the tag in the flush request
sequence when an I/O scheduler is applied.
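
For reference, a minimal usage sketch from the flush path (as wired up
in the next patch of this series):

	hctx = blk_mq_map_queue(q, ctx->cpu);
	if (q->elevator)
		blk_mq_put_driver_tag_hctx(hctx, rq);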

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 block/blk-mq.c | 32 --------------------------------
 block/blk-mq.h | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 32 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 1f515346f94e..aff2ada523c3 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -982,38 +982,6 @@ bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
 	return rq->tag != -1;
 }
 
-static void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
-				    struct request *rq)
-{
-	blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, rq->tag);
-	rq->tag = -1;
-
-	if (rq->rq_flags & RQF_MQ_INFLIGHT) {
-		rq->rq_flags &= ~RQF_MQ_INFLIGHT;
-		atomic_dec(&hctx->nr_active);
-	}
-}
-
-static void blk_mq_put_driver_tag_hctx(struct blk_mq_hw_ctx *hctx,
-				       struct request *rq)
-{
-	if (rq->tag == -1 || rq->internal_tag == -1)
-		return;
-
-	__blk_mq_put_driver_tag(hctx, rq);
-}
-
-static void blk_mq_put_driver_tag(struct request *rq)
-{
-	struct blk_mq_hw_ctx *hctx;
-
-	if (rq->tag == -1 || rq->internal_tag == -1)
-		return;
-
-	hctx = blk_mq_map_queue(rq->q, rq->mq_ctx->cpu);
-	__blk_mq_put_driver_tag(hctx, rq);
-}
-
 /*
  * If we fail getting a driver tag because all the driver tags are already
  * assigned and on the dispatch list, BUT the first entry does not have a
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 1819db23b038..a5c2b46951e4 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -3,6 +3,7 @@
 #define INT_BLK_MQ_H
 
 #include "blk-stat.h"
+#include "blk-mq-tag.h"
 
 struct blk_mq_tag_set;
 
@@ -159,4 +160,36 @@ static inline bool blk_mq_get_dispatch_budget(struct blk_mq_hw_ctx *hctx)
 	return true;
 }
 
+static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
+					   struct request *rq)
+{
+	blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, rq->tag);
+	rq->tag = -1;
+
+	if (rq->rq_flags & RQF_MQ_INFLIGHT) {
+		rq->rq_flags &= ~RQF_MQ_INFLIGHT;
+		atomic_dec(&hctx->nr_active);
+	}
+}
+
+static inline void blk_mq_put_driver_tag_hctx(struct blk_mq_hw_ctx *hctx,
+				       struct request *rq)
+{
+	if (rq->tag == -1 || rq->internal_tag == -1)
+		return;
+
+	__blk_mq_put_driver_tag(hctx, rq);
+}
+
+static inline void blk_mq_put_driver_tag(struct request *rq)
+{
+	struct blk_mq_hw_ctx *hctx;
+
+	if (rq->tag == -1 || rq->internal_tag == -1)
+		return;
+
+	hctx = blk_mq_map_queue(rq->q, rq->mq_ctx->cpu);
+	__blk_mq_put_driver_tag(hctx, rq);
+}
+
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [stable-4.14 17/23] blk-mq: don't allocate driver tag upfront for flush rq
  2018-07-23 13:24 [stable-4.14 00/23] block/scsi multiqueue performance enhancement and Jack Wang
                   ` (15 preceding siblings ...)
  2018-07-23 13:24 ` [stable-4.14 16/23] blk-mq: move blk_mq_put_driver_tag*() into blk-mq.h Jack Wang
@ 2018-07-23 13:24 ` Jack Wang
  2018-07-23 13:24 ` [stable-4.14 18/23] Revert "blk-mq: don't handle TAG_SHARED in restart" Jack Wang
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 35+ messages in thread
From: Jack Wang @ 2018-07-23 13:24 UTC (permalink / raw)
  To: gregkh, stable; +Cc: Ming Lei, Jens Axboe, Jack Wang

From: Ming Lei <ming.lei@redhat.com>

commit 923218f6166a84688973acdc39094f3bee1e9ad4 upstream

The idea behind it is simple:

1) for the none scheduler, the driver tag has to be borrowed for the
   flush rq, otherwise we may run out of tags and cause an IO hang. And
   get/put of the driver tag is actually a noop for none, so reordering
   tags isn't necessary at all.

2) for a real I/O scheduler, we need not allocate a driver tag upfront
   for the flush rq. It works just fine to follow the same approach as
   for normal requests: allocate the driver tag for each rq just before
   calling ->queue_rq().

One driver-visible change is that the driver tag isn't shared in the
flush request sequence. That won't be a problem, since we always do
that in the legacy path.

Then the flush rq need not be treated specially wrt. get/put of the
driver tag. This cleans up the code - for instance,
reorder_tags_to_front() can be removed, and we needn't worry about
request ordering in the dispatch list for avoiding I/O deadlock.

Also we have to put the driver tag before requeueing.
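
A condensed sketch of the two tag-borrowing cases in blk_kick_flush()
(see the blk-flush.c hunk below):

	if (!q->elevator) {
		/* none: borrow the first rq's driver tag for flush_rq */
		fq->orig_rq = first_rq;
		flush_rq->tag = first_rq->tag;
	} else {
		/* IO scheduler: borrow only the scheduler tag */
		flush_rq->internal_tag = first_rq->internal_tag;
	}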

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 block/blk-flush.c    | 35 ++++++++++++++++++++++++++---------
 block/blk-mq-sched.c | 42 +++++-------------------------------------
 block/blk-mq.c       | 41 ++++++-----------------------------------
 3 files changed, 37 insertions(+), 81 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index a9773d2075ac..f17170675917 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -231,8 +231,13 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
 		/* release the tag's ownership to the req cloned from */
 		spin_lock_irqsave(&fq->mq_flush_lock, flags);
 		hctx = blk_mq_map_queue(q, flush_rq->mq_ctx->cpu);
-		blk_mq_tag_set_rq(hctx, flush_rq->tag, fq->orig_rq);
-		flush_rq->tag = -1;
+		if (!q->elevator) {
+			blk_mq_tag_set_rq(hctx, flush_rq->tag, fq->orig_rq);
+			flush_rq->tag = -1;
+		} else {
+			blk_mq_put_driver_tag_hctx(hctx, flush_rq);
+			flush_rq->internal_tag = -1;
+		}
 	}
 
 	running = &fq->flush_queue[fq->flush_running_idx];
@@ -318,19 +323,26 @@ static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq)
 	blk_rq_init(q, flush_rq);
 
 	/*
-	 * Borrow tag from the first request since they can't
-	 * be in flight at the same time. And acquire the tag's
-	 * ownership for flush req.
+	 * In case of none scheduler, borrow tag from the first request
+	 * since they can't be in flight at the same time. And acquire
+	 * the tag's ownership for flush req.
+	 *
+	 * In case of IO scheduler, flush rq need to borrow scheduler tag
+	 * just for cheating put/get driver tag.
 	 */
 	if (q->mq_ops) {
 		struct blk_mq_hw_ctx *hctx;
 
 		flush_rq->mq_ctx = first_rq->mq_ctx;
-		flush_rq->tag = first_rq->tag;
-		fq->orig_rq = first_rq;
 
-		hctx = blk_mq_map_queue(q, first_rq->mq_ctx->cpu);
-		blk_mq_tag_set_rq(hctx, first_rq->tag, flush_rq);
+		if (!q->elevator) {
+			fq->orig_rq = first_rq;
+			flush_rq->tag = first_rq->tag;
+			hctx = blk_mq_map_queue(q, first_rq->mq_ctx->cpu);
+			blk_mq_tag_set_rq(hctx, first_rq->tag, flush_rq);
+		} else {
+			flush_rq->internal_tag = first_rq->internal_tag;
+		}
 	}
 
 	flush_rq->cmd_flags = REQ_OP_FLUSH | REQ_PREFLUSH;
@@ -394,6 +406,11 @@ static void mq_flush_data_end_io(struct request *rq, blk_status_t error)
 
 	hctx = blk_mq_map_queue(q, ctx->cpu);
 
+	if (q->elevator) {
+		WARN_ON(rq->tag < 0);
+		blk_mq_put_driver_tag_hctx(hctx, rq);
+	}
+
 	/*
 	 * After populating an empty queue, kick it to avoid stall.  Read
 	 * the comment in flush_end_io().
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index e7094f44afaf..01a43fed6b8c 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -356,29 +356,12 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
 		return true;
 	}
 
-	if (has_sched) {
+	if (has_sched)
 		rq->rq_flags |= RQF_SORTED;
-		WARN_ON(rq->tag != -1);
-	}
 
 	return false;
 }
 
-/*
- * Add flush/fua to the queue. If we fail getting a driver tag, then
- * punt to the requeue list. Requeue will re-invoke us from a context
- * that's safe to block from.
- */
-static void blk_mq_sched_insert_flush(struct blk_mq_hw_ctx *hctx,
-				      struct request *rq, bool can_block)
-{
-	if (blk_mq_get_driver_tag(rq, &hctx, can_block)) {
-		blk_insert_flush(rq);
-		blk_mq_run_hw_queue(hctx, true);
-	} else
-		blk_mq_add_to_requeue_list(rq, false, true);
-}
-
 void blk_mq_sched_insert_request(struct request *rq, bool at_head,
 				 bool run_queue, bool async, bool can_block)
 {
@@ -389,10 +372,12 @@ void blk_mq_sched_insert_request(struct request *rq, bool at_head,
 
 	/* flush rq in flush machinery need to be dispatched directly */
 	if (!(rq->rq_flags & RQF_FLUSH_SEQ) && op_is_flush(rq->cmd_flags)) {
-		blk_mq_sched_insert_flush(hctx, rq, can_block);
-		return;
+		blk_insert_flush(rq);
+		goto run;
 	}
 
+	WARN_ON(e && (rq->tag != -1));
+
 	if (blk_mq_sched_bypass_insert(hctx, !!e, rq))
 		goto run;
 
@@ -419,23 +404,6 @@ void blk_mq_sched_insert_requests(struct request_queue *q,
 	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
 	struct elevator_queue *e = hctx->queue->elevator;
 
-	if (e) {
-		struct request *rq, *next;
-
-		/*
-		 * We bypass requests that already have a driver tag assigned,
-		 * which should only be flushes. Flushes are only ever inserted
-		 * as single requests, so we shouldn't ever hit the
-		 * WARN_ON_ONCE() below (but let's handle it just in case).
-		 */
-		list_for_each_entry_safe(rq, next, list, queuelist) {
-			if (WARN_ON_ONCE(rq->tag != -1)) {
-				list_del_init(&rq->queuelist);
-				blk_mq_sched_bypass_insert(hctx, true, rq);
-			}
-		}
-	}
-
 	if (e && e->type->ops.mq.insert_requests)
 		e->type->ops.mq.insert_requests(hctx, list, false);
 	else
diff --git a/block/blk-mq.c b/block/blk-mq.c
index aff2ada523c3..b490a1c2acc8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -655,6 +655,8 @@ static void __blk_mq_requeue_request(struct request *rq)
 {
 	struct request_queue *q = rq->q;
 
+	blk_mq_put_driver_tag(rq);
+
 	trace_block_rq_requeue(q, rq);
 	wbt_requeue(q->rq_wb, &rq->issue_stat);
 
@@ -982,30 +984,6 @@ bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
 	return rq->tag != -1;
 }
 
-/*
- * If we fail getting a driver tag because all the driver tags are already
- * assigned and on the dispatch list, BUT the first entry does not have a
- * tag, then we could deadlock. For that case, move entries with assigned
- * driver tags to the front, leaving the set of tagged requests in the
- * same order, and the untagged set in the same order.
- */
-static bool reorder_tags_to_front(struct list_head *list)
-{
-	struct request *rq, *tmp, *first = NULL;
-
-	list_for_each_entry_safe_reverse(rq, tmp, list, queuelist) {
-		if (rq == first)
-			break;
-		if (rq->tag != -1) {
-			list_move(&rq->queuelist, list);
-			if (!first)
-				first = rq;
-		}
-	}
-
-	return first != NULL;
-}
-
 static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode, int flags,
 				void *key)
 {
@@ -1066,9 +1044,6 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 
 		rq = list_first_entry(list, struct request, queuelist);
 		if (!blk_mq_get_driver_tag(rq, &hctx, false)) {
-			if (!queued && reorder_tags_to_front(list))
-				continue;
-
 			/*
 			 * The initial allocation attempt failed, so we need to
 			 * rerun the hardware queue when a tag is freed.
@@ -1119,7 +1094,6 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 				nxt = list_first_entry(list, struct request, queuelist);
 				blk_mq_put_driver_tag(nxt);
 			}
-			blk_mq_put_driver_tag_hctx(hctx, rq);
 			list_add(&rq->queuelist, list);
 			__blk_mq_requeue_request(rq);
 			break;
@@ -1706,13 +1680,10 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 	if (unlikely(is_flush_fua)) {
 		blk_mq_put_ctx(data.ctx);
 		blk_mq_bio_to_request(rq, bio);
-		if (q->elevator) {
-			blk_mq_sched_insert_request(rq, false, true, true,
-					true);
-		} else {
-			blk_insert_flush(rq);
-			blk_mq_run_hw_queue(data.hctx, true);
-		}
+
+		/* bypass scheduler for flush rq */
+		blk_insert_flush(rq);
+		blk_mq_run_hw_queue(data.hctx, true);
 	} else if (plug && q->nr_hw_queues == 1) {
 		struct request *last = NULL;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [stable-4.14 18/23] Revert "blk-mq: don't handle TAG_SHARED in restart"
  2018-07-23 13:24 [stable-4.14 00/23] block/scsi multiqueue performance enhancement and Jack Wang
                   ` (16 preceding siblings ...)
  2018-07-23 13:24 ` [stable-4.14 17/23] blk-mq: don't allocate driver tag upfront for flush rq Jack Wang
@ 2018-07-23 13:24 ` Jack Wang
  2018-07-23 13:24 ` [stable-4.14 19/23] blk-mq: put driver tag if dispatch budget can't be got Jack Wang
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 35+ messages in thread
From: Jack Wang @ 2018-07-23 13:24 UTC (permalink / raw)
  To: gregkh, stable; +Cc: Jens Axboe, Jack Wang

From: Jens Axboe <axboe@kernel.dk>

commit 05b79413946d8b2b58999ea1ae844b6fc3c54f61 upstream

This reverts commit 358a3a6bccb74da9d63a26b2dd5f09f1e9970e0b.

We have cases that aren't covered 100% in the drivers, so for now
we have to retain the shared tag restart loops.
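
For reference, the restored round-robin restart walks all queues
sharing the tag set (a condensed sketch of the hunk below):

	rcu_read_lock();
	list_for_each_entry_rcu_rr(q, queue, &set->tag_list, tag_set_list) {
		queue_for_each_hw_ctx(q, hctx2, i)
			if (hctx2->tags == tags &&
			    blk_mq_sched_restart_hctx(hctx2))
				goto done;
	}
done:
	rcu_read_unlock();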

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 block/blk-mq-sched.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 74 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 01a43fed6b8c..6f4bdb8209f7 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -68,17 +68,25 @@ static void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx)
 		set_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
 }
 
-void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
+static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
 {
 	if (!test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
-		return;
+		return false;
 
-	clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
+	if (hctx->flags & BLK_MQ_F_TAG_SHARED) {
+		struct request_queue *q = hctx->queue;
+
+		if (test_and_clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
+			atomic_dec(&q->shared_hctx_restart);
+	} else
+		clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
 
 	if (blk_mq_hctx_has_pending(hctx)) {
 		blk_mq_run_hw_queue(hctx, true);
-		return;
+		return true;
 	}
+
+	return false;
 }
 
 /*
@@ -362,6 +370,68 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
 	return false;
 }
 
+/**
+ * list_for_each_entry_rcu_rr - iterate in a round-robin fashion over rcu list
+ * @pos:    loop cursor.
+ * @skip:   the list element that will not be examined. Iteration starts at
+ *          @skip->next.
+ * @head:   head of the list to examine. This list must have at least one
+ *          element, namely @skip.
+ * @member: name of the list_head structure within typeof(*pos).
+ */
+#define list_for_each_entry_rcu_rr(pos, skip, head, member)		\
+	for ((pos) = (skip);						\
+	     (pos = (pos)->member.next != (head) ? list_entry_rcu(	\
+			(pos)->member.next, typeof(*pos), member) :	\
+	      list_entry_rcu((pos)->member.next->next, typeof(*pos), member)), \
+	     (pos) != (skip); )
+
+/*
+ * Called after a driver tag has been freed to check whether a hctx needs to
+ * be restarted. Restarts @hctx if its tag set is not shared. Restarts hardware
+ * queues in a round-robin fashion if the tag set of @hctx is shared with other
+ * hardware queues.
+ */
+void blk_mq_sched_restart(struct blk_mq_hw_ctx *const hctx)
+{
+	struct blk_mq_tags *const tags = hctx->tags;
+	struct blk_mq_tag_set *const set = hctx->queue->tag_set;
+	struct request_queue *const queue = hctx->queue, *q;
+	struct blk_mq_hw_ctx *hctx2;
+	unsigned int i, j;
+
+	if (set->flags & BLK_MQ_F_TAG_SHARED) {
+		/*
+		 * If this is 0, then we know that no hardware queues
+		 * have RESTART marked. We're done.
+		 */
+		if (!atomic_read(&queue->shared_hctx_restart))
+			return;
+
+		rcu_read_lock();
+		list_for_each_entry_rcu_rr(q, queue, &set->tag_list,
+					   tag_set_list) {
+			queue_for_each_hw_ctx(q, hctx2, i)
+				if (hctx2->tags == tags &&
+				    blk_mq_sched_restart_hctx(hctx2))
+					goto done;
+		}
+		j = hctx->queue_num + 1;
+		for (i = 0; i < queue->nr_hw_queues; i++, j++) {
+			if (j == queue->nr_hw_queues)
+				j = 0;
+			hctx2 = queue->queue_hw_ctx[j];
+			if (hctx2->tags == tags &&
+			    blk_mq_sched_restart_hctx(hctx2))
+				break;
+		}
+done:
+		rcu_read_unlock();
+	} else {
+		blk_mq_sched_restart_hctx(hctx);
+	}
+}
+
 void blk_mq_sched_insert_request(struct request *rq, bool at_head,
 				 bool run_queue, bool async, bool can_block)
 {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [stable-4.14 19/23] blk-mq: put driver tag if dispatch budget can't be got
  2018-07-23 13:24 [stable-4.14 00/23] block/scsi multiqueue performance enhancement and Jack Wang
                   ` (17 preceding siblings ...)
  2018-07-23 13:24 ` [stable-4.14 18/23] Revert "blk-mq: don't handle TAG_SHARED in restart" Jack Wang
@ 2018-07-23 13:24 ` Jack Wang
  2018-07-23 13:24 ` [stable-4.14 20/23] blk-mq: fix issue with shared tag queue re-running Jack Wang
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 35+ messages in thread
From: Jack Wang @ 2018-07-23 13:24 UTC (permalink / raw)
  To: gregkh, stable; +Cc: Ming Lei, Jens Axboe, Jack Wang

From: Ming Lei <ming.lei@redhat.com>

commit 0c6af1ccd5fd9ac640aef01c8de0043837451a04 upstream

We have to put the driver tag if the dispatch budget can't be obtained,
otherwise it might cause an IO deadlock, especially when the number of
tags is very small.
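
Annotated, the fixed dispatch path reads (a sketch of the hunk below):

	if (!got_budget && !blk_mq_get_dispatch_budget(hctx)) {
		/* no budget: return the driver tag so others can use it */
		blk_mq_put_driver_tag(rq);
		break;
	}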

Fixes: de1482974080 ("blk-mq: introduce .get_budget and .put_budget in blk_mq_ops")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 block/blk-mq.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index b490a1c2acc8..657cfde50cda 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1066,8 +1066,10 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 			}
 		}
 
-		if (!got_budget && !blk_mq_get_dispatch_budget(hctx))
+		if (!got_budget && !blk_mq_get_dispatch_budget(hctx)) {
+			blk_mq_put_driver_tag(rq);
 			break;
+		}
 
 		list_del_init(&rq->queuelist);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [stable-4.14 20/23] blk-mq: fix issue with shared tag queue re-running
  2018-07-23 13:24 [stable-4.14 00/23] block/scsi multiqueue performance enhancement and Jack Wang
                   ` (18 preceding siblings ...)
  2018-07-23 13:24 ` [stable-4.14 19/23] blk-mq: put driver tag if dispatch budget can't be got Jack Wang
@ 2018-07-23 13:24 ` Jack Wang
  2018-07-23 13:24 ` [stable-4.14 21/23] blk-mq: only run the hardware queue if IO is pending Jack Wang
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 35+ messages in thread
From: Jack Wang @ 2018-07-23 13:24 UTC (permalink / raw)
  To: gregkh, stable; +Cc: Jens Axboe, Jack Wang

From: Jens Axboe <axboe@kernel.dk>

commit eb619fdb2d4cb8b3d3419e9113921e87e7daf557 upstream

This patch attempts to make the case of hctx re-running on driver tag
failure more robust. Without this patch, it's pretty easy to trigger a
stall condition with shared tags. An example is using null_blk like
this:

modprobe null_blk queue_mode=2 nr_devices=4 shared_tags=1 submit_queues=1 hw_queue_depth=1

which sets up 4 devices, sharing the same tag set with a depth of 1.
Running a fio job ala:

[global]
bs=4k
rw=randread
norandommap
direct=1
ioengine=libaio
iodepth=4

[nullb0]
filename=/dev/nullb0
[nullb1]
filename=/dev/nullb1
[nullb2]
filename=/dev/nullb2
[nullb3]
filename=/dev/nullb3

will inevitably end with one or more threads being stuck waiting for a
scheduler tag. That IO is then stuck forever, until someone else
triggers a run of the queue.

Ensure that we always re-run the hardware queue if the driver tag we
were waiting for got freed before we added our leftover request entries
back on the dispatch list.
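
The core of the fix is the classic add-then-recheck waitqueue pattern
(a condensed sketch with locking elided; see blk_mq_dispatch_wait_add()
below):

	add_wait_queue(&ws->wait, wait);
	/* re-check: a tag may have been freed while we queued up */
	if (!blk_mq_get_driver_tag(rq, hctx, false))
		return false;	/* stay queued; the wakeup re-runs us */
	/* got a tag after all: step aside so others get the wakeup */
	list_del_init(&wait->entry);
	return true;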

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Tested-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 block/blk-mq-debugfs.c |  1 -
 block/blk-mq.c         | 85 ++++++++++++++++++++++++++++----------------------
 include/linux/blk-mq.h |  5 ++-
 3 files changed, 50 insertions(+), 41 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index d95439154556..fe880e194cdb 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -180,7 +180,6 @@ static const char *const hctx_state_name[] = {
 	HCTX_STATE_NAME(STOPPED),
 	HCTX_STATE_NAME(TAG_ACTIVE),
 	HCTX_STATE_NAME(SCHED_RESTART),
-	HCTX_STATE_NAME(TAG_WAITING),
 	HCTX_STATE_NAME(START_ON_RUN),
 };
 #undef HCTX_STATE_NAME
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 657cfde50cda..7b45f20ae7c1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -984,49 +984,64 @@ bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
 	return rq->tag != -1;
 }
 
-static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode, int flags,
-				void *key)
+static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
+				int flags, void *key)
 {
 	struct blk_mq_hw_ctx *hctx;
 
 	hctx = container_of(wait, struct blk_mq_hw_ctx, dispatch_wait);
 
-	list_del(&wait->entry);
-	clear_bit_unlock(BLK_MQ_S_TAG_WAITING, &hctx->state);
+	list_del_init(&wait->entry);
 	blk_mq_run_hw_queue(hctx, true);
 	return 1;
 }
 
-static bool blk_mq_dispatch_wait_add(struct blk_mq_hw_ctx *hctx)
+static bool blk_mq_dispatch_wait_add(struct blk_mq_hw_ctx **hctx,
+				     struct request *rq)
 {
+	struct blk_mq_hw_ctx *this_hctx = *hctx;
+	wait_queue_entry_t *wait = &this_hctx->dispatch_wait;
 	struct sbq_wait_state *ws;
 
+	if (!list_empty_careful(&wait->entry))
+		return false;
+
+	spin_lock(&this_hctx->lock);
+	if (!list_empty(&wait->entry)) {
+		spin_unlock(&this_hctx->lock);
+		return false;
+	}
+
+	ws = bt_wait_ptr(&this_hctx->tags->bitmap_tags, this_hctx);
+	add_wait_queue(&ws->wait, wait);
+
 	/*
-	 * The TAG_WAITING bit serves as a lock protecting hctx->dispatch_wait.
-	 * The thread which wins the race to grab this bit adds the hardware
-	 * queue to the wait queue.
+	 * It's possible that a tag was freed in the window between the
+	 * allocation failure and adding the hardware queue to the wait
+	 * queue.
 	 */
-	if (test_bit(BLK_MQ_S_TAG_WAITING, &hctx->state) ||
-	    test_and_set_bit_lock(BLK_MQ_S_TAG_WAITING, &hctx->state))
+	if (!blk_mq_get_driver_tag(rq, hctx, false)) {
+		spin_unlock(&this_hctx->lock);
 		return false;
-
-	init_waitqueue_func_entry(&hctx->dispatch_wait, blk_mq_dispatch_wake);
-	ws = bt_wait_ptr(&hctx->tags->bitmap_tags, hctx);
+	}
 
 	/*
-	 * As soon as this returns, it's no longer safe to fiddle with
-	 * hctx->dispatch_wait, since a completion can wake up the wait queue
-	 * and unlock the bit.
+	 * We got a tag, remove ourselves from the wait queue to ensure
+	 * someone else gets the wakeup.
 	 */
-	add_wait_queue(&ws->wait, &hctx->dispatch_wait);
+	spin_lock_irq(&ws->wait.lock);
+	list_del_init(&wait->entry);
+	spin_unlock_irq(&ws->wait.lock);
+	spin_unlock(&this_hctx->lock);
 	return true;
 }
 
 bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
-		bool got_budget)
+			     bool got_budget)
 {
 	struct blk_mq_hw_ctx *hctx;
 	struct request *rq, *nxt;
+	bool no_tag = false;
 	int errors, queued;
 
 	if (list_empty(list))
@@ -1046,22 +1061,15 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 		if (!blk_mq_get_driver_tag(rq, &hctx, false)) {
 			/*
 			 * The initial allocation attempt failed, so we need to
-			 * rerun the hardware queue when a tag is freed.
+			 * rerun the hardware queue when a tag is freed. The
+			 * waitqueue takes care of that. If the queue is run
+			 * before we add this entry back on the dispatch list,
+			 * we'll re-run it below.
 			 */
-			if (!blk_mq_dispatch_wait_add(hctx)) {
-				if (got_budget)
-					blk_mq_put_dispatch_budget(hctx);
-				break;
-			}
-
-			/*
-			 * It's possible that a tag was freed in the window
-			 * between the allocation failure and adding the
-			 * hardware queue to the wait queue.
-			 */
-			if (!blk_mq_get_driver_tag(rq, &hctx, false)) {
+			if (!blk_mq_dispatch_wait_add(&hctx, rq)) {
 				if (got_budget)
 					blk_mq_put_dispatch_budget(hctx);
+				no_tag = true;
 				break;
 			}
 		}
@@ -1126,10 +1134,10 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 		 * it is no longer set that means that it was cleared by another
 		 * thread and hence that a queue rerun is needed.
 		 *
-		 * If TAG_WAITING is set that means that an I/O scheduler has
-		 * been configured and another thread is waiting for a driver
-		 * tag. To guarantee fairness, do not rerun this hardware queue
-		 * but let the other thread grab the driver tag.
+		 * If 'no_tag' is set, that means that we failed getting
+		 * a driver tag with an I/O scheduler attached. If our dispatch
+		 * waitqueue is no longer active, ensure that we run the queue
+		 * AFTER adding our entries back to the list.
 		 *
 		 * If no I/O scheduler has been configured it is possible that
 		 * the hardware queue got stopped and restarted before requests
@@ -1141,8 +1149,8 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 		 *   returning BLK_STS_RESOURCE. Two exceptions are scsi-mq
 		 *   and dm-rq.
 		 */
-		if (!blk_mq_sched_needs_restart(hctx) &&
-		    !test_bit(BLK_MQ_S_TAG_WAITING, &hctx->state))
+		if (!blk_mq_sched_needs_restart(hctx) ||
+		    (no_tag && list_empty_careful(&hctx->dispatch_wait.entry)))
 			blk_mq_run_hw_queue(hctx, true);
 	}
 
@@ -2029,6 +2037,9 @@ static int blk_mq_init_hctx(struct request_queue *q,
 
 	hctx->nr_ctx = 0;
 
+	init_waitqueue_func_entry(&hctx->dispatch_wait, blk_mq_dispatch_wake);
+	INIT_LIST_HEAD(&hctx->dispatch_wait.entry);
+
 	if (set->ops->init_hctx &&
 	    set->ops->init_hctx(hctx, set->driver_data, hctx_idx))
 		goto free_bitmap;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 4564bd216431..cc51f940aec4 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -36,7 +36,7 @@ struct blk_mq_hw_ctx {
 	struct blk_mq_ctx	**ctxs;
 	unsigned int		nr_ctx;
 
-	wait_queue_entry_t		dispatch_wait;
+	wait_queue_entry_t	dispatch_wait;
 	atomic_t		wait_index;
 
 	struct blk_mq_tags	*tags;
@@ -182,8 +182,7 @@ enum {
 	BLK_MQ_S_STOPPED	= 0,
 	BLK_MQ_S_TAG_ACTIVE	= 1,
 	BLK_MQ_S_SCHED_RESTART	= 2,
-	BLK_MQ_S_TAG_WAITING	= 3,
-	BLK_MQ_S_START_ON_RUN	= 4,
+	BLK_MQ_S_START_ON_RUN	= 3,
 
 	BLK_MQ_MAX_DEPTH	= 10240,
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [stable-4.14 21/23] blk-mq: only run the hardware queue if IO is pending
  2018-07-23 13:24 [stable-4.14 00/23] block/scsi multiqueue performance enhancement and Jack Wang
                   ` (19 preceding siblings ...)
  2018-07-23 13:24 ` [stable-4.14 20/23] blk-mq: fix issue with shared tag queue re-running Jack Wang
@ 2018-07-23 13:24 ` Jack Wang
  2018-07-23 13:24 ` [stable-4.14 22/23] blk-mq: quiesce queue during switching io sched and updating nr_requests Jack Wang
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 35+ messages in thread
From: Jack Wang @ 2018-07-23 13:24 UTC (permalink / raw)
  To: gregkh, stable; +Cc: Jens Axboe, Jack Wang

From: Jens Axboe <axboe@kernel.dk>

commit 79f720a751cad613620d0237e3b44f89f4a69181 upstream

Currently we are inconsistent in when we decide to run the queue. Using
blk_mq_run_hw_queues() we check if the hctx has pending IO before
running it, but we don't do that from the individual queue run function,
blk_mq_run_hw_queue(). This potentially results in a lot of extra and
pointless queue runs on flush requests and (much worse) in tag
starvation situations. This is observable just by looking at top output,
with lots of kworkers active. For the !async runs, it just adds to the
CPU overhead of blk-mq.

Move the has-pending check into the run function instead of having
callers do it.
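
Since blk_mq_run_hw_queue() now reports whether anything was pending,
the shared-tag restart path shrinks to (a sketch of the blk-mq-sched.c
hunk below):

	return blk_mq_run_hw_queue(hctx, true);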

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 block/blk-mq-sched.c   |  7 +------
 block/blk-mq.c         | 18 +++++++++++-------
 block/blk-mq.h         |  2 --
 include/linux/blk-mq.h |  2 +-
 4 files changed, 13 insertions(+), 16 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 6f4bdb8209f7..c117bd8fd1f6 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -81,12 +81,7 @@ static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
 	} else
 		clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
 
-	if (blk_mq_hctx_has_pending(hctx)) {
-		blk_mq_run_hw_queue(hctx, true);
-		return true;
-	}
-
-	return false;
+	return blk_mq_run_hw_queue(hctx, true);
 }
 
 /*
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 7b45f20ae7c1..b3f15d9c7c5a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -60,10 +60,10 @@ static int blk_mq_poll_stats_bkt(const struct request *rq)
 /*
  * Check if any of the ctx's have pending work in this hardware queue
  */
-bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
+static bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
 {
-	return sbitmap_any_bit_set(&hctx->ctx_map) ||
-			!list_empty_careful(&hctx->dispatch) ||
+	return !list_empty_careful(&hctx->dispatch) ||
+		sbitmap_any_bit_set(&hctx->ctx_map) ||
 			blk_mq_sched_has_work(hctx);
 }
 
@@ -1261,9 +1261,14 @@ void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
 }
 EXPORT_SYMBOL(blk_mq_delay_run_hw_queue);
 
-void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
+bool blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 {
-	__blk_mq_delay_run_hw_queue(hctx, async, 0);
+	if (blk_mq_hctx_has_pending(hctx)) {
+		__blk_mq_delay_run_hw_queue(hctx, async, 0);
+		return true;
+	}
+
+	return false;
 }
 EXPORT_SYMBOL(blk_mq_run_hw_queue);
 
@@ -1273,8 +1278,7 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
 	int i;
 
 	queue_for_each_hw_ctx(q, hctx, i) {
-		if (!blk_mq_hctx_has_pending(hctx) ||
-		    blk_mq_hctx_stopped(hctx))
+		if (blk_mq_hctx_stopped(hctx))
 			continue;
 
 		blk_mq_run_hw_queue(hctx, async);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index a5c2b46951e4..1b849a163e9d 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -27,14 +27,12 @@ struct blk_mq_ctx {
 	struct kobject		kobj;
 } ____cacheline_aligned_in_smp;
 
-void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
 void blk_mq_freeze_queue(struct request_queue *q);
 void blk_mq_free_queue(struct request_queue *q);
 int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr);
 void blk_mq_wake_waiters(struct request_queue *q);
 bool blk_mq_dispatch_rq_list(struct request_queue *, struct list_head *, bool);
 void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list);
-bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx);
 bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
 				bool wait);
 struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx,
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index cc51f940aec4..d0162ea97078 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -261,7 +261,7 @@ void blk_mq_start_stopped_hw_queues(struct request_queue *q, bool async);
 void blk_mq_quiesce_queue(struct request_queue *q);
 void blk_mq_unquiesce_queue(struct request_queue *q);
 void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs);
-void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
+bool blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
 void blk_mq_run_hw_queues(struct request_queue *q, bool async);
 void blk_mq_delay_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs);
 void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [stable-4.14 22/23] blk-mq: quiesce queue during switching io sched and updating nr_requests
  2018-07-23 13:24 [stable-4.14 00/23] block/scsi multiqueue performance enhancement and Jack Wang
                   ` (20 preceding siblings ...)
  2018-07-23 13:24 ` [stable-4.14 21/23] blk-mq: only run the hardware queue if IO is pending Jack Wang
@ 2018-07-23 13:24 ` Jack Wang
  2018-07-23 13:24 ` [stable-4.14 23/23] scsi: core: run queue if SCSI device queue isn't ready and queue is idle Jack Wang
  2018-07-23 13:33 ` [stable-4.14 00/23] block/scsi multiqueue performance enhancement and Greg KH
  23 siblings, 0 replies; 35+ messages in thread
From: Jack Wang @ 2018-07-23 13:24 UTC (permalink / raw)
  To: gregkh, stable; +Cc: Ming Lei, Jens Axboe, Jack Wang

From: Ming Lei <ming.lei@redhat.com>

commit 24f5a90f0d13a97b51aa79f468143fafea4246bb upstream

Dispatch may still be in progress after the queue is frozen, so we have
to quiesce the queue before switching the IO scheduler and updating
nr_requests.

Also when switching io schedulers, blk_mq_run_hw_queue() may still be
called somewhere (such as from nvme_reset_work()), and the io
scheduler's per-hctx data may not be set up yet, causing an oops even
inside blk_mq_hctx_has_pending(), since it can be run just between:

        ret = e->ops.mq.init_sched(q, e);
AND
        ret = e->ops.mq.init_hctx(hctx, i)

inside blk_mq_init_sched().

This basically reverts commit 7a148c2fcff83309 ("block: don't call
blk_mq_quiesce_queue() after queue is frozen"), and makes sure
blk_mq_hctx_has_pending() won't be called if the queue is quiesced.
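
The resulting pattern around both the scheduler switch and the
nr_requests update is (condensed from the hunks below):

	blk_mq_freeze_queue(q);
	blk_mq_quiesce_queue(q);	/* also wait out in-flight dispatch */

	/* ... switch elevator / update nr_requests ... */

	blk_mq_unquiesce_queue(q);
	blk_mq_unfreeze_queue(q);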

Reviewed-by: Christoph Hellwig <hch@lst.de>
Fixes: 7a148c2fcff83309 ("block: don't call blk_mq_quiesce_queue() after queue is frozen")
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 block/blk-mq.c   | 27 ++++++++++++++++++++++++++-
 block/elevator.c |  2 ++
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index b3f15d9c7c5a..7d396b77e5e8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1263,7 +1263,30 @@ EXPORT_SYMBOL(blk_mq_delay_run_hw_queue);
 
 bool blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 {
-	if (blk_mq_hctx_has_pending(hctx)) {
+	int srcu_idx;
+	bool need_run;
+
+	/*
+	 * When queue is quiesced, we may be switching io scheduler, or
+	 * updating nr_hw_queues, or other things, and we can't run queue
+	 * any more, even __blk_mq_hctx_has_pending() can't be called safely.
+	 *
+	 * And queue will be rerun in blk_mq_unquiesce_queue() if it is
+	 * quiesced.
+	 */
+	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
+		rcu_read_lock();
+		need_run = !blk_queue_quiesced(hctx->queue) &&
+			blk_mq_hctx_has_pending(hctx);
+		rcu_read_unlock();
+	} else {
+		srcu_idx = srcu_read_lock(hctx->queue_rq_srcu);
+		need_run = !blk_queue_quiesced(hctx->queue) &&
+			blk_mq_hctx_has_pending(hctx);
+		srcu_read_unlock(hctx->queue_rq_srcu, srcu_idx);
+	}
+
+	if (need_run) {
 		__blk_mq_delay_run_hw_queue(hctx, async, 0);
 		return true;
 	}
@@ -2710,6 +2733,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 		return -EINVAL;
 
 	blk_mq_freeze_queue(q);
+	blk_mq_quiesce_queue(q);
 
 	ret = 0;
 	queue_for_each_hw_ctx(q, hctx, i) {
@@ -2734,6 +2758,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 	if (!ret)
 		q->nr_requests = nr;
 
+	blk_mq_unquiesce_queue(q);
 	blk_mq_unfreeze_queue(q);
 
 	return ret;
diff --git a/block/elevator.c b/block/elevator.c
index 153926a90901..229eb4b5604c 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -951,6 +951,7 @@ static int elevator_switch_mq(struct request_queue *q,
 	int ret;
 
 	blk_mq_freeze_queue(q);
+	blk_mq_quiesce_queue(q);
 
 	if (q->elevator) {
 		if (q->elevator->registered)
@@ -977,6 +978,7 @@ static int elevator_switch_mq(struct request_queue *q,
 		blk_add_trace_msg(q, "elv switch: none");
 
 out:
+	blk_mq_unquiesce_queue(q);
 	blk_mq_unfreeze_queue(q);
 	return ret;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [stable-4.14 23/23] scsi: core: run queue if SCSI device queue isn't ready and queue is idle
  2018-07-23 13:24 [stable-4.14 00/23] block/scsi multiqueue performance enhancement and Jack Wang
                   ` (21 preceding siblings ...)
  2018-07-23 13:24 ` [stable-4.14 22/23] blk-mq: quiesce queue during switching io sched and updating nr_requests Jack Wang
@ 2018-07-23 13:24 ` Jack Wang
  2018-07-23 13:33 ` [stable-4.14 00/23] block/scsi multiqueue performance enhancement and Greg KH
  23 siblings, 0 replies; 35+ messages in thread
From: Jack Wang @ 2018-07-23 13:24 UTC (permalink / raw)
  To: gregkh, stable; +Cc: Ming Lei, Martin K . Petersen, Jack Wang

From: Ming Lei <ming.lei@redhat.com>

commit 7e70aa789d4a0c89dbfbd2c8a974a4df717475ec upstream

Before commit 0df21c86bdbf ("scsi: implement .get_budget and .put_budget
for blk-mq"), we would run the queue after 3ms if the queue was idle and
the SCSI device queue wasn't ready, which was done when handling
BLK_STS_RESOURCE. Since commit 0df21c86bdbf was introduced, the queue
won't be run any more in this situation.

An IO hang is observed when a timeout happens, and this patch fixes the
IO hang issue by running the queue after a delay in
scsi_dev_queue_ready, just like non-mq. This issue can be triggered by
the following script[1].

There is another issue which can be covered by running the idle queue:
when .get_budget() is called on a request coming from
hctx->dispatch_list, if one request just completes during .get_budget(),
we can't depend on SCSI's restart to make progress any more. This patch
fixes that race too.

With this patch, we basically recover the previous behaviour (before
commit 0df21c86bdbf) of handling an idle queue when running out of
resources.
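
The fix itself is small: when the budget can't be got and the device is
idle, schedule a delayed re-run (a sketch of the scsi_lib.c hunk below;
SCSI_QUEUE_DELAY is 3ms):

	if (atomic_read(&sdev->device_busy) == 0 &&
	    !scsi_device_blocked(sdev))
		blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY);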

[1] script for test/verify SCSI timeout
rmmod scsi_debug
modprobe scsi_debug max_queue=1

DEVICE=`ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/* | head -1 | xargs basename`
DISK_DIR=`ls -d /sys/block/$DEVICE/device/scsi_disk/*`

echo "using scsi device $DEVICE"
echo "-1" >/sys/bus/pseudo/drivers/scsi_debug/every_nth
echo "temporary write through" >$DISK_DIR/cache_type
echo "128" >/sys/bus/pseudo/drivers/scsi_debug/opts
echo none > /sys/block/$DEVICE/queue/scheduler
dd if=/dev/$DEVICE of=/dev/null bs=1M iflag=direct count=1 &
sleep 5
echo "0" >/sys/bus/pseudo/drivers/scsi_debug/opts
wait
echo "SUCCESS"

Fixes: 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for blk-mq")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 drivers/scsi/scsi_lib.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 9e242583ab83..175981058396 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1994,6 +1994,8 @@ static bool scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx)
 out_put_device:
 	put_device(&sdev->sdev_gendev);
 out:
+	if (atomic_read(&sdev->device_busy) == 0 && !scsi_device_blocked(sdev))
+		blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY);
 	return false;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [stable-4.14 00/23] block/scsi multiqueue performance enhancement and
  2018-07-23 13:24 [stable-4.14 00/23] block/scsi multiqueue performance enhancement and Jack Wang
                   ` (22 preceding siblings ...)
  2018-07-23 13:24 ` [stable-4.14 23/23] scsi: core: run queue if SCSI device queue isn't ready and queue is idle Jack Wang
@ 2018-07-23 13:33 ` Greg KH
  2018-07-23 15:00     ` Jack Wang
  23 siblings, 1 reply; 35+ messages in thread
From: Greg KH @ 2018-07-23 13:33 UTC (permalink / raw)
  To: Jack Wang; +Cc: stable

On Mon, Jul 23, 2018 at 03:24:22PM +0200, Jack Wang wrote:
> Hi Greg,
>  
> Please consider this patchset, which include block/scsi multiqueue performance
> enhancement and bugfix.

What exactly is the performance enhancement?  How can you measure it?
How did you measure it?

Which one(s) are the bugfixes in this series?  Why not just submit those
separately as they should be "obvious fixes", right?

And why didn't you cc: the subsystem maintainer and mailing list here?
I need their ack before I can take these.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [stable-4.14 01/23] blk-mq-sched: move actual dispatching into one helper
  2018-07-23 13:24 ` [stable-4.14 01/23] blk-mq-sched: move actual dispatching into one helper Jack Wang
@ 2018-07-23 14:31   ` Jens Axboe
  2018-07-23 14:36     ` Greg KH
  0 siblings, 1 reply; 35+ messages in thread
From: Jens Axboe @ 2018-07-23 14:31 UTC (permalink / raw)
  To: Jack Wang, gregkh, stable; +Cc: Ming Lei, Jack Wang

On 7/23/18 7:24 AM, Jack Wang wrote:
> From: Ming Lei <ming.lei@redhat.com>
> 
> commit caf8eb0d604a0eaeb8111eb4d36853a6d08eebe7 upstream
> 
> So that it becomes easy to support to dispatch from sw queue in the
> following patch.
> 
> No functional change.

What is this huge series of 23 patches?!

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [stable-4.14 01/23] blk-mq-sched: move actual dispatching into one helper
  2018-07-23 14:31   ` Jens Axboe
@ 2018-07-23 14:36     ` Greg KH
  0 siblings, 0 replies; 35+ messages in thread
From: Greg KH @ 2018-07-23 14:36 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Jack Wang, stable, Ming Lei, Jack Wang

On Mon, Jul 23, 2018 at 08:31:31AM -0600, Jens Axboe wrote:
> On 7/23/18 7:24 AM, Jack Wang wrote:
> > From: Ming Lei <ming.lei@redhat.com>
> > 
> > commit caf8eb0d604a0eaeb8111eb4d36853a6d08eebe7 upstream
> > 
> > So that it becomes easy to support to dispatch from sw queue in the
> > following patch.
> > 
> > No functional change.
> 
> What is this huge series of 23 patches?!

I already complained that you were not cc:ed on the 00/23 email :)

greg k-h

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [stable-4.14 00/23] block/scsi multiqueue performance enhancement and
@ 2018-07-23 15:00     ` Jack Wang
  0 siblings, 0 replies; 35+ messages in thread
From: Jack Wang @ 2018-07-23 15:00 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-scsi, linux-block, Jens Axboe
  Cc: Wang Jinpu, stable, Jinpu Wang

Hi Greg,

Thanks for the quick reply. Please see my replies inline.

Greg KH <gregkh@linuxfoundation.org> wrote on Mon, 23 Jul 2018 at 15:34:
>
> On Mon, Jul 23, 2018 at 03:24:22PM +0200, Jack Wang wrote:
> > Hi Greg,
> >
> > Please consider this patchset, which include block/scsi multiqueue performance
> > enhancement and bugfix.
>
> What exactly is the performance enhancement?  How can you measure it?
> How did you measure it?
I'm testing on SRP/IBNBD using fio; with the patchset I've seen +10% on
a mixed IO load and a 50% improvement on small IO (bs=512).
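
For reference, a minimal fio invocation along these lines could look
like the sketch below (an illustrative sketch only; /dev/sdX is a
placeholder, since the actual job file isn't shown in this thread):

  fio --name=smallio --filename=/dev/sdX --direct=1 --ioengine=libaio \
      --rw=randread --bs=512 --iodepth=32 --numjobs=4 \
      --runtime=60 --time_based --group_reporting

Run it once on the unpatched and once on the patched kernel and compare
the reported IOPS.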

>
> Which one(s) are the bugfixes in this series?  Why not just submit those
> separately as they should be "obvious fixes", right?
We had a queue stall problem on shared hctx restart, which my colleague
Roman reported upstream:
https://lkml.org/lkml/2017/10/18/263
That patch wasn't taken upstream; instead a bigger patchset from Ming
Lei was accepted, with a revert and further bugfixes later on.

The following patches are meant to improve performance on scsi-mq, and
also contain the attempt to fix the queue stall problem,
 358a3a6bccb7 ("blk-mq: don't handle TAG_SHARED in restart"):
  blk-mq-sched: move actual dispatching into one helper
  blk-mq: introduce .get_budget and .put_budget in blk_mq_ops
  sbitmap: introduce __sbitmap_for_each_set()
  blk-mq-sched: improve dispatching from sw queue
  scsi: allow passing in null rq to scsi_prep_state_check()
  scsi: implement .get_budget and .put_budget for blk-mq
  SCSI: don't get target/host busy_count in scsi_mq_get_budget()
  blk-mq: don't handle TAG_SHARED in restart
  blk-mq: don't restart queue when .get_budget returns BLK_STS_RESOURCE
  blk-mq: don't handle failure in .get_budget

The following are independent fixes:
blk-mq: quiesce queue during switching io sched and updating nr_requests
blk-mq: fix issue with shared tag queue re-running

These are cleanups for flush request handling:
  blk-flush: don't run queue for requests bypassing flush
  block: pass 'run_queue' to blk_mq_request_bypass_insert
  blk-flush: use blk_mq_request_bypass_insert()
  blk-mq-sched: decide how to handle flush rq via RQF_FLUSH_SEQ
  blk-mq: move blk_mq_put_driver_tag*() into blk-mq.h
  blk-mq: don't allocate driver tag upfront for flush rq

The rest are follow-up fixes.
>
> And why didn't you cc: the subsystem maintainer and mailing list here?
> I need their ack before I could take these.
Sorry, my mistake; I've now added Jens, linux-block and linux-scsi to
the reply.

Sorry for the confusion.
>
> thanks,
>
> greg k-h

Regards,
Jack Wang

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [stable-4.14 00/23] block/scsi multiqueue performance enhancement and
  2018-07-23 15:00     ` Jack Wang
@ 2018-07-23 15:05     ` Jens Axboe
  2018-07-23 15:28         ` Jack Wang
  -1 siblings, 1 reply; 35+ messages in thread
From: Jens Axboe @ 2018-07-23 15:05 UTC (permalink / raw)
  To: Jack Wang, Greg Kroah-Hartman, linux-scsi, linux-block
  Cc: Wang Jinpu, stable, Jinpu Wang

On 7/23/18 9:00 AM, Jack Wang wrote:
> Hi Greg,
> 
> Thanks for quick reply. Please see reply inline.
> 
> 
> 
> Greg KH <gregkh@linuxfoundation.org> 于2018年7月23日周一 下午3:34写道:
>>
>> On Mon, Jul 23, 2018 at 03:24:22PM +0200, Jack Wang wrote:
>>> Hi Greg,
>>>
>>> Please consider this patchset, which include block/scsi multiqueue performance
>>> enhancement and bugfix.
>>
>> What exactly is the performance enhancement?  How can you measure it?
>> How did you measure it?
> I'm testing on SRP/IBNBD using fio, I've seen +10% with mix IO load,
> and 50% improvement on small IO (bs=512.) with the patchset.

Big nak on backporting this huge series; it's a lot of core changes.
It's way beyond the scope of a stable fix backport.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [stable-4.14 00/23] block/scsi multiqueue performance enhancement and
@ 2018-07-23 15:28         ` Jack Wang
  0 siblings, 0 replies; 35+ messages in thread
From: Jack Wang @ 2018-07-23 15:28 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Greg Kroah-Hartman, linux-scsi, linux-block, Wang Jinpu, stable,
	Jinpu Wang

Jens Axboe <axboe@kernel.dk> wrote on Mon, 23 Jul 2018 at 17:05:
>
> On 7/23/18 9:00 AM, Jack Wang wrote:
> > Hi Greg,
> >
> > Thanks for quick reply. Please see reply inline.
> >
> >
> >
> > Greg KH <gregkh@linuxfoundation.org> wrote on Mon, 23 Jul 2018 at 15:34:
> >>
> >> On Mon, Jul 23, 2018 at 03:24:22PM +0200, Jack Wang wrote:
> >>> Hi Greg,
> >>>
> >>> Please consider this patchset, which include block/scsi multiqueue performance
> >>> enhancement and bugfix.
> >>
> >> What exactly is the performance enhancement?  How can you measure it?
> >> How did you measure it?
> > I'm testing on SRP/IBNBD using fio, I've seen +10% with mix IO load,
> > and 50% improvement on small IO (bs=512.) with the patchset.
>
> Big nak on backporting this huge series, it's a lot of core changes.
> It's way beyond the scope of a stable fix backport.
>
> --
> Jens Axboe
>
OK, could you shed some light on how we could fix the queue stall
problem on 4.14?
My colleague Roman reported it upstream:
https://lkml.org/lkml/2017/10/18/263

It's still there in the latest 4.14.

Thanks,
Jack

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [stable-4.14 00/23] block/scsi multiqueue performance enhancement and
  2018-07-23 15:28         ` Jack Wang
@ 2018-07-23 15:31         ` Jens Axboe
  2018-07-23 15:37             ` Jack Wang
  -1 siblings, 1 reply; 35+ messages in thread
From: Jens Axboe @ 2018-07-23 15:31 UTC (permalink / raw)
  To: Jack Wang
  Cc: Greg Kroah-Hartman, linux-scsi, linux-block, Wang Jinpu, stable,
	Jinpu Wang

On 7/23/18 9:28 AM, Jack Wang wrote:
> Jens Axboe <axboe@kernel.dk> 于2018年7月23日周一 下午5:05写道:
>>
>> On 7/23/18 9:00 AM, Jack Wang wrote:
>>> Hi Greg,
>>>
>>> Thanks for quick reply. Please see reply inline.
>>>
>>>
>>>
>>> Greg KH <gregkh@linuxfoundation.org> 于2018年7月23日周一 下午3:34写道:
>>>>
>>>> On Mon, Jul 23, 2018 at 03:24:22PM +0200, Jack Wang wrote:
>>>>> Hi Greg,
>>>>>
>>>>> Please consider this patchset, which include block/scsi multiqueue performance
>>>>> enhancement and bugfix.
>>>>
>>>> What exactly is the performance enhancement?  How can you measure it?
>>>> How did you measure it?
>>> I'm testing on SRP/IBNBD using fio, I've seen +10% with mix IO load,
>>> and 50% improvement on small IO (bs=512.) with the patchset.
>>
>> Big nak on backporting this huge series, it's a lot of core changes.
>> It's way beyond the scope of a stable fix backport.
>>
>> --
>> Jens Axboe
>>
> OK, could you shed light on how could we fix the queue stall problem on 4.14?
> My colleague Roman reported to upstream:
> https://lkml.org/lkml/2017/10/18/263
> 
> It's still there on latest 4.14.

The proposed patch is a helluva lot simpler than doing a 23-patch
selective backport.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [stable-4.14 00/23] block/scsi multiqueue performance enhancement and
@ 2018-07-23 15:37             ` Jack Wang
  0 siblings, 0 replies; 35+ messages in thread
From: Jack Wang @ 2018-07-23 15:37 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Greg Kroah-Hartman, linux-scsi, linux-block, Wang Jinpu, stable,
	Jinpu Wang

Jens Axboe <axboe@kernel.dk> wrote on Mon, 23 Jul 2018 at 17:31:
>
> On 7/23/18 9:28 AM, Jack Wang wrote:
> > Jens Axboe <axboe@kernel.dk> wrote on Mon, 23 Jul 2018 at 17:05:
> >>
> >> On 7/23/18 9:00 AM, Jack Wang wrote:
> >>> Hi Greg,
> >>>
> >>> Thanks for quick reply. Please see reply inline.
> >>>
> >>>
> >>>
> >>> Greg KH <gregkh@linuxfoundation.org> wrote on Mon, 23 Jul 2018 at 15:34:
> >>>>
> >>>> On Mon, Jul 23, 2018 at 03:24:22PM +0200, Jack Wang wrote:
> >>>>> Hi Greg,
> >>>>>
> >>>>> Please consider this patchset, which include block/scsi multiqueue performance
> >>>>> enhancement and bugfix.
> >>>>
> >>>> What exactly is the performance enhancement?  How can you measure it?
> >>>> How did you measure it?
> >>> I'm testing on SRP/IBNBD using fio, I've seen +10% with mix IO load,
> >>> and 50% improvement on small IO (bs=512.) with the patchset.
> >>
> >> Big nak on backporting this huge series, it's a lot of core changes.
> >> It's way beyond the scope of a stable fix backport.
> >>
> >> --
> >> Jens Axboe
> >>
> > OK, could you shed light on how could we fix the queue stall problem on 4.14?
> > My colleague Roman reported to upstream:
> > https://lkml.org/lkml/2017/10/18/263
> >
> > It's still there on latest 4.14.
>
> The proposed patch is a helluva lot simpler than doing a 23 patch selective
> backport.
>
> --
> Jens Axboe
>
Do you think it makes sense to port Roman's patch to stable 4.14, so
that others would also benefit?

On the other hand, the stable tree only takes patches accepted into
Linus' tree.

Thanks,
Jack

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2018-07-23 16:39 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-23 13:24 [stable-4.14 00/23] block/scsi multiqueue performance enhancement and Jack Wang
2018-07-23 13:24 ` [stable-4.14 01/23] blk-mq-sched: move actual dispatching into one helper Jack Wang
2018-07-23 14:31   ` Jens Axboe
2018-07-23 14:36     ` Greg KH
2018-07-23 13:24 ` [stable-4.14 02/23] blk-mq: introduce .get_budget and .put_budget in blk_mq_ops Jack Wang
2018-07-23 13:24 ` [stable-4.14 03/23] sbitmap: introduce __sbitmap_for_each_set() Jack Wang
2018-07-23 13:24 ` [stable-4.14 04/23] blk-mq-sched: improve dispatching from sw queue Jack Wang
2018-07-23 13:24 ` [stable-4.14 05/23] scsi: allow passing in null rq to scsi_prep_state_check() Jack Wang
2018-07-23 13:24 ` [stable-4.14 06/23] scsi: implement .get_budget and .put_budget for blk-mq Jack Wang
2018-07-23 13:24 ` [stable-4.14 07/23] SCSI: don't get target/host busy_count in scsi_mq_get_budget() Jack Wang
2018-07-23 13:24 ` [stable-4.14 08/23] blk-mq: don't handle TAG_SHARED in restart Jack Wang
2018-07-23 13:24 ` [stable-4.14 09/23] blk-mq: don't restart queue when .get_budget returns BLK_STS_RESOURCE Jack Wang
2018-07-23 13:24 ` [stable-4.14 10/23] blk-mq: don't handle failure in .get_budget Jack Wang
2018-07-23 13:24 ` [stable-4.14 11/23] blk-mq: put the driver tag of nxt rq before first one is requeued Jack Wang
2018-07-23 13:24 ` [stable-4.14 12/23] blk-flush: don't run queue for requests bypassing flush Jack Wang
2018-07-23 13:24 ` [stable-4.14 13/23] block: pass 'run_queue' to blk_mq_request_bypass_insert Jack Wang
2018-07-23 13:24 ` [stable-4.14 14/23] blk-flush: use blk_mq_request_bypass_insert() Jack Wang
2018-07-23 13:24 ` [stable-4.14 15/23] blk-mq-sched: decide how to handle flush rq via RQF_FLUSH_SEQ Jack Wang
2018-07-23 13:24 ` [stable-4.14 16/23] blk-mq: move blk_mq_put_driver_tag*() into blk-mq.h Jack Wang
2018-07-23 13:24 ` [stable-4.14 17/23] blk-mq: don't allocate driver tag upfront for flush rq Jack Wang
2018-07-23 13:24 ` [stable-4.14 18/23] Revert "blk-mq: don't handle TAG_SHARED in restart" Jack Wang
2018-07-23 13:24 ` [stable-4.14 19/23] blk-mq: put driver tag if dispatch budget can't be got Jack Wang
2018-07-23 13:24 ` [stable-4.14 20/23] blk-mq: fix issue with shared tag queue re-running Jack Wang
2018-07-23 13:24 ` [stable-4.14 21/23] blk-mq: only run the hardware queue if IO is pending Jack Wang
2018-07-23 13:24 ` [stable-4.14 22/23] blk-mq: quiesce queue during switching io sched and updating nr_requests Jack Wang
2018-07-23 13:24 ` [stable-4.14 23/23] scsi: core: run queue if SCSI device queue isn't ready and queue is idle Jack Wang
2018-07-23 13:33 ` [stable-4.14 00/23] block/scsi multiqueue performance enhancement and Greg KH
2018-07-23 15:00   ` Jack Wang
2018-07-23 15:05     ` Jens Axboe
2018-07-23 15:28       ` Jack Wang
2018-07-23 15:31         ` Jens Axboe
2018-07-23 15:37           ` Jack Wang
