All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/5] Avoid that scsi-mq queue processing stalls
@ 2017-04-06 18:10 Bart Van Assche
  2017-04-06 18:10 ` [PATCH v3 1/5] blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list Bart Van Assche
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Bart Van Assche @ 2017-04-06 18:10 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Bart Van Assche

Hello Jens,

The five patches in this patch series fix the queue lockup I reported
recently on the linux-block mailing list. Please consider these patches
for inclusion in the upstream kernel.

Thanks,

Bart.

Changes between v2 and v3:
- Removed the blk_mq_ops.restart_hctx function pointer again.
- Modified blk_mq_sched_restart_queues() such that only a single hardware
  queue is restarted instead of multiple if hardware queues are shared.
- Introduced a new function in the block layer, namely
  blk_mq_delay_run_hw_queue().  

Changes between v1 and v2:
- Reworked scsi_restart_queues() such that it no longer takes the SCSI
  host lock.
- Added two patches - one for exporting blk_mq_sched_restart_hctx() and
  another one to make iterating with RCU over blk_mq_tag_set.tag_list safe.

Bart Van Assche (5):
  blk-mq: Make it safe to use RCU to iterate over
    blk_mq_tag_set.tag_list
  blk-mq: Restart a single queue if tag sets are shared
  blk-mq: Clarify comments in blk_mq_dispatch_rq_list()
  blk-mq: Introduce blk_mq_delay_run_hw_queue()
  scsi: Avoid that SCSI queues get stuck

 block/blk-mq-sched.c    | 60 +++++++++++++++++++++++++++++++++++--------
 block/blk-mq-sched.h    | 16 +-----------
 block/blk-mq.c          | 68 +++++++++++++++++++++++++++++++++++++++----------
 drivers/scsi/scsi_lib.c |  6 ++---
 include/linux/blk-mq.h  |  2 ++
 include/linux/blkdev.h  |  1 -
 6 files changed, 111 insertions(+), 42 deletions(-)

-- 
2.12.0

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3 1/5] blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list
  2017-04-06 18:10 [PATCH v3 0/5] Avoid that scsi-mq queue processing stalls Bart Van Assche
@ 2017-04-06 18:10 ` Bart Van Assche
  2017-04-07  9:46   ` Ming Lei
  2017-04-06 18:10 ` [PATCH v3 2/5] blk-mq: Restart a single queue if tag sets are shared Bart Van Assche
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Bart Van Assche @ 2017-04-06 18:10 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Bart Van Assche, Christoph Hellwig, Hannes Reinecke

Since the next patch in this series will use RCU to iterate over
tag_list, make this safe. Add lockdep_assert_held() statements
in functions that iterate over tag_list to make clear that using
list_for_each_entry() instead of list_for_each_entry_rcu() is
fine in these functions.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
---
 block/blk-mq.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f7cd3208bcdf..b5580b09b4a5 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2076,6 +2076,8 @@ static void blk_mq_update_tag_set_depth(struct blk_mq_tag_set *set, bool shared)
 {
 	struct request_queue *q;
 
+	lockdep_assert_held(&set->tag_list_lock);
+
 	list_for_each_entry(q, &set->tag_list, tag_set_list) {
 		blk_mq_freeze_queue(q);
 		queue_set_hctx_shared(q, shared);
@@ -2096,6 +2098,8 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
 		blk_mq_update_tag_set_depth(set, false);
 	}
 	mutex_unlock(&set->tag_list_lock);
+
+	synchronize_rcu();
 }
 
 static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,
@@ -2601,6 +2605,8 @@ void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues)
 {
 	struct request_queue *q;
 
+	lockdep_assert_held(&set->tag_list_lock);
+
 	if (nr_hw_queues > nr_cpu_ids)
 		nr_hw_queues = nr_cpu_ids;
 	if (nr_hw_queues < 1 || nr_hw_queues == set->nr_hw_queues)
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 2/5] blk-mq: Restart a single queue if tag sets are shared
  2017-04-06 18:10 [PATCH v3 0/5] Avoid that scsi-mq queue processing stalls Bart Van Assche
  2017-04-06 18:10 ` [PATCH v3 1/5] blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list Bart Van Assche
@ 2017-04-06 18:10 ` Bart Van Assche
  2017-04-06 19:12   ` Jens Axboe
  2017-04-06 18:10 ` [PATCH v3 3/5] blk-mq: Clarify comments in blk_mq_dispatch_rq_list() Bart Van Assche
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Bart Van Assche @ 2017-04-06 18:10 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Bart Van Assche, Christoph Hellwig, Hannes Reinecke

To improve scalability, if hardware queues are shared, restart
a single hardware queue in round-robin fashion. Rename
blk_mq_sched_restart_queues() to reflect the new semantics.
Remove blk_mq_sched_mark_restart_queue() because this function
has no callers. Remove flag QUEUE_FLAG_RESTART because this
patch removes the code that uses this flag.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
---
 block/blk-mq-sched.c   | 60 +++++++++++++++++++++++++++++++++++++++++---------
 block/blk-mq-sched.h   | 16 +-------------
 block/blk-mq.c         |  2 +-
 include/linux/blkdev.h |  1 -
 4 files changed, 52 insertions(+), 27 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 09af8ff18719..8f553775d3ed 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -317,25 +317,65 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
 	return true;
 }
 
-static void blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
+static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
 {
 	if (test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state)) {
 		clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
-		if (blk_mq_hctx_has_pending(hctx))
+		if (blk_mq_hctx_has_pending(hctx)) {
 			blk_mq_run_hw_queue(hctx, true);
+			return true;
+		}
 	}
+	return false;
 }
 
-void blk_mq_sched_restart_queues(struct blk_mq_hw_ctx *hctx)
+/**
+ * list_for_each_entry_rcu_rr - iterate in a round-robin fashion over rcu list
+ * @pos:    loop cursor.
+ * @skip:   the list element that will not be examined. Iteration starts at
+ *          @skip->next.
+ * @head:   head of the list to examine. This list must have at least one
+ *          element, namely @skip.
+ * @member: name of the list_head structure within typeof(*pos).
+ */
+#define list_for_each_entry_rcu_rr(pos, skip, head, member)		\
+	for ((pos) = (skip);						\
+	     (pos = (pos)->member.next != (head) ? list_entry_rcu(	\
+			(pos)->member.next, typeof(*pos), member) :	\
+	      list_entry_rcu((pos)->member.next->next, typeof(*pos), member)), \
+	     (pos) != (skip); )
+
+/**
+ * Called after a driver tag has been freed to check whether a hctx needs to
+ * be restarted. Restarts @hctx if its tag set is not shared. Restarts hardware
+ * queues in a round-robin fashion if the tag set of @hctx is shared with other
+ * hardware queues.
+ */
+void blk_mq_sched_restart(struct blk_mq_hw_ctx *const hctx)
 {
-	struct request_queue *q = hctx->queue;
-	unsigned int i;
-
-	if (test_bit(QUEUE_FLAG_RESTART, &q->queue_flags)) {
-		if (test_and_clear_bit(QUEUE_FLAG_RESTART, &q->queue_flags)) {
-			queue_for_each_hw_ctx(q, hctx, i)
-				blk_mq_sched_restart_hctx(hctx);
+	struct blk_mq_tags *const tags = hctx->tags;
+	struct blk_mq_tag_set *const set = hctx->queue->tag_set;
+	struct request_queue *const queue = hctx->queue, *q;
+	struct blk_mq_hw_ctx *h;
+	unsigned int i, j;
+
+	if (set->flags & BLK_MQ_F_TAG_SHARED) {
+		rcu_read_lock();
+		list_for_each_entry_rcu_rr(q, queue, &set->tag_list,
+					   tag_set_list) {
+			queue_for_each_hw_ctx(q, h, i)
+				if (h->tags == tags &&
+				    blk_mq_sched_restart_hctx(h))
+					goto done;
+		}
+		for (i = 0; i < queue->nr_hw_queues; i++) {
+			j = (i + hctx->queue_num + 1) % queue->nr_hw_queues;
+			h = queue->queue_hw_ctx[j];
+			if (h->tags == tags && blk_mq_sched_restart_hctx(h))
+				break;
 		}
+done:
+		rcu_read_unlock();
 	} else {
 		blk_mq_sched_restart_hctx(hctx);
 	}
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index a75b16b123f7..4e3fc2a40207 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -19,7 +19,7 @@ bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
 				struct request **merged_request);
 bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio);
 bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq);
-void blk_mq_sched_restart_queues(struct blk_mq_hw_ctx *hctx);
+void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx);
 
 void blk_mq_sched_insert_request(struct request *rq, bool at_head,
 				 bool run_queue, bool async, bool can_block);
@@ -131,20 +131,6 @@ static inline void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx)
 		set_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
 }
 
-/*
- * Mark a hardware queue and the request queue it belongs to as needing a
- * restart.
- */
-static inline void blk_mq_sched_mark_restart_queue(struct blk_mq_hw_ctx *hctx)
-{
-	struct request_queue *q = hctx->queue;
-
-	if (!test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
-		set_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
-	if (!test_bit(QUEUE_FLAG_RESTART, &q->queue_flags))
-		set_bit(QUEUE_FLAG_RESTART, &q->queue_flags);
-}
-
 static inline bool blk_mq_sched_needs_restart(struct blk_mq_hw_ctx *hctx)
 {
 	return test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index b5580b09b4a5..dc83aec338d9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -351,7 +351,7 @@ void __blk_mq_finish_request(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
 		blk_mq_put_tag(hctx, hctx->tags, ctx, rq->tag);
 	if (sched_tag != -1)
 		blk_mq_sched_completed_request(hctx, rq);
-	blk_mq_sched_restart_queues(hctx);
+	blk_mq_sched_restart(hctx);
 	blk_queue_exit(q);
 }
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3cf241b0814d..dc6c8d39d462 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -615,7 +615,6 @@ struct request_queue {
 #define QUEUE_FLAG_FLUSH_NQ    25	/* flush not queueuable */
 #define QUEUE_FLAG_DAX         26	/* device supports DAX */
 #define QUEUE_FLAG_STATS       27	/* track rq completion times */
-#define QUEUE_FLAG_RESTART     28	/* queue needs restart at completion */
 #define QUEUE_FLAG_POLL_STATS  29	/* collecting stats for hybrid polling */
 #define QUEUE_FLAG_REGISTERED  30	/* queue has been registered to a disk */
 
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 3/5] blk-mq: Clarify comments in blk_mq_dispatch_rq_list()
  2017-04-06 18:10 [PATCH v3 0/5] Avoid that scsi-mq queue processing stalls Bart Van Assche
  2017-04-06 18:10 ` [PATCH v3 1/5] blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list Bart Van Assche
  2017-04-06 18:10 ` [PATCH v3 2/5] blk-mq: Restart a single queue if tag sets are shared Bart Van Assche
@ 2017-04-06 18:10 ` Bart Van Assche
  2017-04-06 18:10 ` [PATCH v3 4/5] blk-mq: Introduce blk_mq_delay_run_hw_queue() Bart Van Assche
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Bart Van Assche @ 2017-04-06 18:10 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Bart Van Assche, Omar Sandoval, Christoph Hellwig,
	Hannes Reinecke

The blk_mq_dispatch_rq_list() implementation got modified several
times but the comments in that function were not updated every
time. Since it is nontrivial what is going on, update the comments
in blk_mq_dispatch_rq_list().

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
---
 block/blk-mq.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index dc83aec338d9..4db48ad76878 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1063,8 +1063,8 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list)
 	 */
 	if (!list_empty(list)) {
 		/*
-		 * If we got a driver tag for the next request already,
-		 * free it again.
+		 * If an I/O scheduler has been configured and we got a driver
+		 * tag for the next request already, free it again.
 		 */
 		rq = list_first_entry(list, struct request, queuelist);
 		blk_mq_put_driver_tag(rq);
@@ -1074,16 +1074,24 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list)
 		spin_unlock(&hctx->lock);
 
 		/*
-		 * the queue is expected stopped with BLK_MQ_RQ_QUEUE_BUSY, but
-		 * it's possible the queue is stopped and restarted again
-		 * before this. Queue restart will dispatch requests. And since
-		 * requests in rq_list aren't added into hctx->dispatch yet,
-		 * the requests in rq_list might get lost.
+		 * If SCHED_RESTART was set by the caller of this function and
+		 * it is no longer set that means that it was cleared by another
+		 * thread and hence that a queue rerun is needed.
 		 *
-		 * blk_mq_run_hw_queue() already checks the STOPPED bit
+		 * If TAG_WAITING is set that means that an I/O scheduler has
+		 * been configured and another thread is waiting for a driver
+		 * tag. To guarantee fairness, do not rerun this hardware queue
+		 * but let the other thread grab the driver tag.
 		 *
-		 * If RESTART or TAG_WAITING is set, then let completion restart
-		 * the queue instead of potentially looping here.
+		 * If no I/O scheduler has been configured it is possible that
+		 * the hardware queue got stopped and restarted before requests
+		 * were pushed back onto the dispatch list. Rerun the queue to
+		 * avoid starvation. Notes:
+		 * - blk_mq_run_hw_queue() checks whether or not a queue has
+		 *   been stopped before rerunning a queue.
+		 * - Some but not all block drivers stop a queue before
+		 *   returning BLK_MQ_RQ_QUEUE_BUSY. Two exceptions are scsi-mq
+		 *   and dm-rq.
 		 */
 		if (!blk_mq_sched_needs_restart(hctx) &&
 		    !test_bit(BLK_MQ_S_TAG_WAITING, &hctx->state))
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 4/5] blk-mq: Introduce blk_mq_delay_run_hw_queue()
  2017-04-06 18:10 [PATCH v3 0/5] Avoid that scsi-mq queue processing stalls Bart Van Assche
                   ` (2 preceding siblings ...)
  2017-04-06 18:10 ` [PATCH v3 3/5] blk-mq: Clarify comments in blk_mq_dispatch_rq_list() Bart Van Assche
@ 2017-04-06 18:10 ` Bart Van Assche
  2017-04-07 10:23   ` Ming Lei
  2017-04-06 18:10 ` [PATCH v3 5/5] scsi: Avoid that SCSI queues get stuck Bart Van Assche
  2017-04-07  9:41 ` [PATCH v3 0/5] Avoid that scsi-mq queue processing stalls Ming Lei
  5 siblings, 1 reply; 17+ messages in thread
From: Bart Van Assche @ 2017-04-06 18:10 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Bart Van Assche, Christoph Hellwig, Hannes Reinecke,
	Long Li, K . Y . Srinivasan

Introduce a function that runs a hardware queue unconditionally
after a delay. Note: there is already a function that stops and
restarts a hardware queue after a delay, namely blk_mq_delay_queue().

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Long Li <longli@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
---
 block/blk-mq.c         | 32 ++++++++++++++++++++++++++++++--
 include/linux/blk-mq.h |  2 ++
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4db48ad76878..36a80ab6fff8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1146,7 +1146,8 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
 	return hctx->next_cpu;
 }
 
-void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
+static void __blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async,
+					unsigned long msecs)
 {
 	if (unlikely(blk_mq_hctx_stopped(hctx) ||
 		     !blk_mq_hw_queue_mapped(hctx)))
@@ -1163,7 +1164,24 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 		put_cpu();
 	}
 
-	kblockd_schedule_work_on(blk_mq_hctx_next_cpu(hctx), &hctx->run_work);
+	if (msecs == 0)
+		kblockd_schedule_work_on(blk_mq_hctx_next_cpu(hctx),
+					 &hctx->run_work);
+	else
+		kblockd_schedule_delayed_work_on(blk_mq_hctx_next_cpu(hctx),
+						 &hctx->delayed_run_work,
+						 msecs_to_jiffies(msecs));
+}
+
+void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
+{
+	__blk_mq_delay_run_hw_queue(hctx, true, msecs);
+}
+EXPORT_SYMBOL(blk_mq_delay_run_hw_queue);
+
+void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
+{
+	__blk_mq_delay_run_hw_queue(hctx, async, 0);
 }
 
 void blk_mq_run_hw_queues(struct request_queue *q, bool async)
@@ -1266,6 +1284,15 @@ static void blk_mq_run_work_fn(struct work_struct *work)
 	__blk_mq_run_hw_queue(hctx);
 }
 
+static void blk_mq_delayed_run_work_fn(struct work_struct *work)
+{
+	struct blk_mq_hw_ctx *hctx;
+
+	hctx = container_of(work, struct blk_mq_hw_ctx, delayed_run_work.work);
+
+	__blk_mq_run_hw_queue(hctx);
+}
+
 static void blk_mq_delay_work_fn(struct work_struct *work)
 {
 	struct blk_mq_hw_ctx *hctx;
@@ -1866,6 +1893,7 @@ static int blk_mq_init_hctx(struct request_queue *q,
 		node = hctx->numa_node = set->numa_node;
 
 	INIT_WORK(&hctx->run_work, blk_mq_run_work_fn);
+	INIT_DELAYED_WORK(&hctx->delayed_run_work, blk_mq_delayed_run_work_fn);
 	INIT_DELAYED_WORK(&hctx->delay_work, blk_mq_delay_work_fn);
 	spin_lock_init(&hctx->lock);
 	INIT_LIST_HEAD(&hctx->dispatch);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index bdea90d75274..b90c3d5766cd 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -51,6 +51,7 @@ struct blk_mq_hw_ctx {
 
 	atomic_t		nr_active;
 
+	struct delayed_work	delayed_run_work;
 	struct delayed_work	delay_work;
 
 	struct hlist_node	cpuhp_dead;
@@ -236,6 +237,7 @@ void blk_mq_stop_hw_queues(struct request_queue *q);
 void blk_mq_start_hw_queues(struct request_queue *q);
 void blk_mq_start_stopped_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
 void blk_mq_start_stopped_hw_queues(struct request_queue *q, bool async);
+void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs);
 void blk_mq_run_hw_queues(struct request_queue *q, bool async);
 void blk_mq_delay_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs);
 void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 5/5] scsi: Avoid that SCSI queues get stuck
  2017-04-06 18:10 [PATCH v3 0/5] Avoid that scsi-mq queue processing stalls Bart Van Assche
                   ` (3 preceding siblings ...)
  2017-04-06 18:10 ` [PATCH v3 4/5] blk-mq: Introduce blk_mq_delay_run_hw_queue() Bart Van Assche
@ 2017-04-06 18:10 ` Bart Van Assche
  2017-04-07  9:41 ` [PATCH v3 0/5] Avoid that scsi-mq queue processing stalls Ming Lei
  5 siblings, 0 replies; 17+ messages in thread
From: Bart Van Assche @ 2017-04-06 18:10 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Bart Van Assche, Martin K . Petersen,
	James Bottomley, Christoph Hellwig, Hannes Reinecke,
	Sagi Grimberg, Long Li, K . Y . Srinivasan

commit 52d7f1b5c2f3 ("blk-mq: Avoid that requeueing starts stopped
queues") removed the blk_mq_stop_hw_queue() call from scsi_queue_rq()
for the BLK_MQ_RQ_QUEUE_BUSY case. Hence change all calls to functions
that are intended to rerun a busy queue such that these examine all
hardware queues instead of only stopped queues.

Since no other functions than scsi_internal_device_block() and
scsi_internal_device_unblock() should ever stop or restart a SCSI
queue, change the blk_mq_delay_queue() call into a
blk_mq_delay_run_hw_queue() call.

Fixes: commit 52d7f1b5c2f3 ("blk-mq: Avoid that requeueing starts stopped queues")
Fixes: commit 7e79dadce222 ("blk-mq: stop hardware queue in blk_mq_delay_queue()")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Long Li <longli@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
---
 drivers/scsi/scsi_lib.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 11972d1075f1..7bc4513bf4e4 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -496,7 +496,7 @@ static void scsi_run_queue(struct request_queue *q)
 		scsi_starved_list_run(sdev->host);
 
 	if (q->mq_ops)
-		blk_mq_start_stopped_hw_queues(q, false);
+		blk_mq_run_hw_queues(q, false);
 	else
 		blk_run_queue(q);
 }
@@ -667,7 +667,7 @@ static bool scsi_end_request(struct request *req, int error,
 		    !list_empty(&sdev->host->starved_list))
 			kblockd_schedule_work(&sdev->requeue_work);
 		else
-			blk_mq_start_stopped_hw_queues(q, true);
+			blk_mq_run_hw_queues(q, true);
 	} else {
 		unsigned long flags;
 
@@ -1974,7 +1974,7 @@ static int scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 	case BLK_MQ_RQ_QUEUE_BUSY:
 		if (atomic_read(&sdev->device_busy) == 0 &&
 		    !scsi_device_blocked(sdev))
-			blk_mq_delay_queue(hctx, SCSI_QUEUE_DELAY);
+			blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY);
 		break;
 	case BLK_MQ_RQ_QUEUE_ERROR:
 		/*
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/5] blk-mq: Restart a single queue if tag sets are shared
  2017-04-06 18:10 ` [PATCH v3 2/5] blk-mq: Restart a single queue if tag sets are shared Bart Van Assche
@ 2017-04-06 19:12   ` Jens Axboe
  2017-04-06 19:21     ` Jens Axboe
  0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2017-04-06 19:12 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-block, Christoph Hellwig, Hannes Reinecke

On 04/06/2017 12:10 PM, Bart Van Assche wrote:
> +		for (i = 0; i < queue->nr_hw_queues; i++) {
> +			j = (i + hctx->queue_num + 1) % queue->nr_hw_queues;
> +			h = queue->queue_hw_ctx[j];
> +			if (h->tags == tags && blk_mq_sched_restart_hctx(h))
> +				break;

I'm pretty sure that doing:

	j = i + hctx->queue_num + 1;;
	for (i = 0; i < queue->nr_hw_queues; i++, j++) {
		if (j == queue->nr_hw_queues)
			j = 0;
		h = queue->queue_hw_ctx[j];
		if (h->tags == tags && blk_mq_sched_restart_hctx(h))
			break;
	}

would be considerably more efficient than a modulo for each loop. And
let's rename 'h' to 'hctx', readability is much better that way.
Especially since you have both i and j as iterators.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/5] blk-mq: Restart a single queue if tag sets are shared
  2017-04-06 19:12   ` Jens Axboe
@ 2017-04-06 19:21     ` Jens Axboe
  2017-04-07 15:12       ` Bart Van Assche
  0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2017-04-06 19:21 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-block, Christoph Hellwig, Hannes Reinecke

On 04/06/2017 01:12 PM, Jens Axboe wrote:
> On 04/06/2017 12:10 PM, Bart Van Assche wrote:
>> +		for (i = 0; i < queue->nr_hw_queues; i++) {
>> +			j = (i + hctx->queue_num + 1) % queue->nr_hw_queues;
>> +			h = queue->queue_hw_ctx[j];
>> +			if (h->tags == tags && blk_mq_sched_restart_hctx(h))
>> +				break;
> 
> I'm pretty sure that doing:
> 
> 	j = i + hctx->queue_num + 1;;

And 'i' too many there of course:

 	j = hctx->queue_num + 1;;

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/5] Avoid that scsi-mq queue processing stalls
  2017-04-06 18:10 [PATCH v3 0/5] Avoid that scsi-mq queue processing stalls Bart Van Assche
                   ` (4 preceding siblings ...)
  2017-04-06 18:10 ` [PATCH v3 5/5] scsi: Avoid that SCSI queues get stuck Bart Van Assche
@ 2017-04-07  9:41 ` Ming Lei
  2017-04-07 15:18   ` Bart Van Assche
  5 siblings, 1 reply; 17+ messages in thread
From: Ming Lei @ 2017-04-07  9:41 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Jens Axboe, linux-block

On Thu, Apr 06, 2017 at 11:10:45AM -0700, Bart Van Assche wrote:
> Hello Jens,
> 
> The five patches in this patch series fix the queue lockup I reported
> recently on the linux-block mailing list. Please consider these patches
> for inclusion in the upstream kernel.

I read the commit log of the 5 patches, looks not found descriptions
about root cause of the queue lockup, so could you explain a bit about
the reason behind?

Thanks,
Ming

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/5] blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list
  2017-04-06 18:10 ` [PATCH v3 1/5] blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list Bart Van Assche
@ 2017-04-07  9:46   ` Ming Lei
  2017-04-07 15:15     ` Bart Van Assche
  0 siblings, 1 reply; 17+ messages in thread
From: Ming Lei @ 2017-04-07  9:46 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Hannes Reinecke

On Thu, Apr 06, 2017 at 11:10:46AM -0700, Bart Van Assche wrote:
> Since the next patch in this series will use RCU to iterate over
> tag_list, make this safe. Add lockdep_assert_held() statements
> in functions that iterate over tag_list to make clear that using
> list_for_each_entry() instead of list_for_each_entry_rcu() is
> fine in these functions.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Hannes Reinecke <hare@suse.com>
> ---
>  block/blk-mq.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index f7cd3208bcdf..b5580b09b4a5 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2076,6 +2076,8 @@ static void blk_mq_update_tag_set_depth(struct blk_mq_tag_set *set, bool shared)
>  {
>  	struct request_queue *q;
>  
> +	lockdep_assert_held(&set->tag_list_lock);
> +
>  	list_for_each_entry(q, &set->tag_list, tag_set_list) {
>  		blk_mq_freeze_queue(q);
>  		queue_set_hctx_shared(q, shared);
> @@ -2096,6 +2098,8 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
>  		blk_mq_update_tag_set_depth(set, false);
>  	}
>  	mutex_unlock(&set->tag_list_lock);
> +
> +	synchronize_rcu();

Looks synchronize_rcu() is only needed in deletion path, so it can
be moved to blk_mq_del_queue_tag_set().

Also list_del_init/list_add_tail() need to be replaced with RCU
safe version in functions operating &set->tag_list.

>  }
>  
>  static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,
> @@ -2601,6 +2605,8 @@ void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues)
>  {
>  	struct request_queue *q;
>  
> +	lockdep_assert_held(&set->tag_list_lock);
> +
>  	if (nr_hw_queues > nr_cpu_ids)
>  		nr_hw_queues = nr_cpu_ids;
>  	if (nr_hw_queues < 1 || nr_hw_queues == set->nr_hw_queues)
> -- 
> 2.12.0
> 

-- 
Ming

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 4/5] blk-mq: Introduce blk_mq_delay_run_hw_queue()
  2017-04-06 18:10 ` [PATCH v3 4/5] blk-mq: Introduce blk_mq_delay_run_hw_queue() Bart Van Assche
@ 2017-04-07 10:23   ` Ming Lei
  2017-04-07 15:13     ` Bart Van Assche
  0 siblings, 1 reply; 17+ messages in thread
From: Ming Lei @ 2017-04-07 10:23 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Hannes Reinecke,
	Long Li, K . Y . Srinivasan

On Thu, Apr 06, 2017 at 11:10:49AM -0700, Bart Van Assche wrote:
> Introduce a function that runs a hardware queue unconditionally

I appreciate if some background can be provided for this function,
like fixing some issues?


Thanks,
Ming

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/5] blk-mq: Restart a single queue if tag sets are shared
  2017-04-06 19:21     ` Jens Axboe
@ 2017-04-07 15:12       ` Bart Van Assche
  0 siblings, 0 replies; 17+ messages in thread
From: Bart Van Assche @ 2017-04-07 15:12 UTC (permalink / raw)
  To: axboe; +Cc: hch, linux-block, hare

On Thu, 2017-04-06 at 13:21 -0600, Jens Axboe wrote:
> On 04/06/2017 01:12 PM, Jens Axboe wrote:
> > On 04/06/2017 12:10 PM, Bart Van Assche wrote:
> > > +		for (i =3D 0; i < queue->nr_hw_queues; i++) {
> > > +			j =3D (i + hctx->queue_num + 1) % queue->nr_hw_queues;
> > > +			h =3D queue->queue_hw_ctx[j];
> > > +			if (h->tags =3D=3D tags && blk_mq_sched_restart_hctx(h))
> > > +				break;
> >=20
> > I'm pretty sure that doing:
> >=20
> > 	j =3D i + hctx->queue_num + 1;;
>=20
> And 'i' too many there of course:
>=20
>  	j =3D hctx->queue_num + 1;;

Hello Jens,

Thanks for the feedback. I will implement this change and retest this patch
series.

Bart.=

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 4/5] blk-mq: Introduce blk_mq_delay_run_hw_queue()
  2017-04-07 10:23   ` Ming Lei
@ 2017-04-07 15:13     ` Bart Van Assche
  0 siblings, 0 replies; 17+ messages in thread
From: Bart Van Assche @ 2017-04-07 15:13 UTC (permalink / raw)
  To: ming.lei; +Cc: hch, linux-block, hare, longli, axboe, kys

On Fri, 2017-04-07 at 18:23 +0800, Ming Lei wrote:
> On Thu, Apr 06, 2017 at 11:10:49AM -0700, Bart Van Assche wrote:
> > Introduce a function that runs a hardware queue unconditionally
>=20
> I appreciate if some background can be provided for this function,
> like fixing some issues?

Hello Ming,

Have you noticed that patch 5/5 in this series uses this function?

Bart.=

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/5] blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list
  2017-04-07  9:46   ` Ming Lei
@ 2017-04-07 15:15     ` Bart Van Assche
  0 siblings, 0 replies; 17+ messages in thread
From: Bart Van Assche @ 2017-04-07 15:15 UTC (permalink / raw)
  To: ming.lei; +Cc: hch, linux-block, hare, axboe

On Fri, 2017-04-07 at 17:46 +0800, Ming Lei wrote:
> On Thu, Apr 06, 2017 at 11:10:46AM -0700, Bart Van Assche wrote:
> > Since the next patch in this series will use RCU to iterate over
> > tag_list, make this safe. Add lockdep_assert_held() statements
> > in functions that iterate over tag_list to make clear that using
> > list_for_each_entry() instead of list_for_each_entry_rcu() is
> > fine in these functions.
> >=20
> > Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Hannes Reinecke <hare@suse.com>
> > ---
> >  block/blk-mq.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >=20
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index f7cd3208bcdf..b5580b09b4a5 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -2076,6 +2076,8 @@ static void blk_mq_update_tag_set_depth(struct bl=
k_mq_tag_set *set, bool shared)
> >  {
> >  	struct request_queue *q;
> > =20
> > +	lockdep_assert_held(&set->tag_list_lock);
> > +
> >  	list_for_each_entry(q, &set->tag_list, tag_set_list) {
> >  		blk_mq_freeze_queue(q);
> >  		queue_set_hctx_shared(q, shared);
> > @@ -2096,6 +2098,8 @@ static void blk_mq_del_queue_tag_set(struct reque=
st_queue *q)
> >  		blk_mq_update_tag_set_depth(set, false);
> >  	}
> >  	mutex_unlock(&set->tag_list_lock);
> > +
> > +	synchronize_rcu();
>=20
> Looks synchronize_rcu() is only needed in deletion path, so it can
> be moved to blk_mq_del_queue_tag_set().
>=20
> Also list_del_init/list_add_tail() need to be replaced with RCU
> safe version in functions operating &set->tag_list.

Hello Ming,

I will replace list_del_init() / list_add_tail() by their RCU equivalents.

Regarding synchronize_rcu(): have you noticed that that call has been added=
 to
blk_mq_del_queue_tag_set(), the function you requested to move that call to=
?

Bart.=

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/5] Avoid that scsi-mq queue processing stalls
  2017-04-07  9:41 ` [PATCH v3 0/5] Avoid that scsi-mq queue processing stalls Ming Lei
@ 2017-04-07 15:18   ` Bart Van Assche
  2017-04-07 15:34     ` Ming Lei
  0 siblings, 1 reply; 17+ messages in thread
From: Bart Van Assche @ 2017-04-07 15:18 UTC (permalink / raw)
  To: ming.lei; +Cc: linux-block, axboe

On Fri, 2017-04-07 at 17:41 +0800, Ming Lei wrote:
> On Thu, Apr 06, 2017 at 11:10:45AM -0700, Bart Van Assche wrote:
> > Hello Jens,
> >=20
> > The five patches in this patch series fix the queue lockup I reported
> > recently on the linux-block mailing list. Please consider these patches
> > for inclusion in the upstream kernel.
>=20
> I read the commit log of the 5 patches, looks not found descriptions
> about root cause of the queue lockup, so could you explain a bit about
> the reason behind?

Hello Ming,

If a .queue_rq() function returns BLK_MQ_RQ_QUEUE_BUSY then the block
driver that implements that function is responsible for rerunning the
hardware queue once requests can be queued successfully again. That is
not the case today for the SCSI core. Patch 5/5 ensures that hardware
queues for which scsi_queue_rq() has returned "busy" are rerun after
the "busy" condition has been cleared. This is why patch 5/5 fixes a
queue lockup.

Bart.=

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/5] Avoid that scsi-mq queue processing stalls
  2017-04-07 15:18   ` Bart Van Assche
@ 2017-04-07 15:34     ` Ming Lei
  2017-04-07 15:46       ` Bart Van Assche
  0 siblings, 1 reply; 17+ messages in thread
From: Ming Lei @ 2017-04-07 15:34 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-block, axboe

On Fri, Apr 07, 2017 at 03:18:19PM +0000, Bart Van Assche wrote:
> On Fri, 2017-04-07 at 17:41 +0800, Ming Lei wrote:
> > On Thu, Apr 06, 2017 at 11:10:45AM -0700, Bart Van Assche wrote:
> > > Hello Jens,
> > > 
> > > The five patches in this patch series fix the queue lockup I reported
> > > recently on the linux-block mailing list. Please consider these patches
> > > for inclusion in the upstream kernel.
> > 
> > I read the commit log of the 5 patches, looks not found descriptions
> > about root cause of the queue lockup, so could you explain a bit about
> > the reason behind?
> 
> Hello Ming,
> 
> If a .queue_rq() function returns BLK_MQ_RQ_QUEUE_BUSY then the block
> driver that implements that function is responsible for rerunning the
> hardware queue once requests can be queued successfully again. That is
> not the case today for the SCSI core. Patch 5/5 ensures that hardware

The current .queue_rq() will call blk_mq_delay_queue() if QUEUE_BUSY is
returned, and once request is completed, the queue will be restarted
by blk_mq_start_stopped_hw_queues() in scsi_end_request(). This way
sounds OK in theory. And I just try to understand the specific reason
which causes the lockup, but still not get it.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/5] Avoid that scsi-mq queue processing stalls
  2017-04-07 15:34     ` Ming Lei
@ 2017-04-07 15:46       ` Bart Van Assche
  0 siblings, 0 replies; 17+ messages in thread
From: Bart Van Assche @ 2017-04-07 15:46 UTC (permalink / raw)
  To: ming.lei; +Cc: linux-block, axboe

On Fri, 2017-04-07 at 23:34 +0800, Ming Lei wrote:
> On Fri, Apr 07, 2017 at 03:18:19PM +0000, Bart Van Assche wrote:
> > On Fri, 2017-04-07 at 17:41 +0800, Ming Lei wrote:
> > > On Thu, Apr 06, 2017 at 11:10:45AM -0700, Bart Van Assche wrote:
> > > > Hello Jens,
> > > >=20
> > > > The five patches in this patch series fix the queue lockup I report=
ed
> > > > recently on the linux-block mailing list. Please consider these pat=
ches
> > > > for inclusion in the upstream kernel.
> > >=20
> > > I read the commit log of the 5 patches, looks not found descriptions
> > > about root cause of the queue lockup, so could you explain a bit abou=
t
> > > the reason behind?
> >=20
> > Hello Ming,
> >=20
> > If a .queue_rq() function returns BLK_MQ_RQ_QUEUE_BUSY then the block
> > driver that implements that function is responsible for rerunning the
> > hardware queue once requests can be queued successfully again. That is
> > not the case today for the SCSI core. Patch 5/5 ensures that hardware
>=20
> The current .queue_rq() will call blk_mq_delay_queue() if QUEUE_BUSY is
> returned, and once request is completed, the queue will be restarted
> by blk_mq_start_stopped_hw_queues() in scsi_end_request(). This way
> sounds OK in theory. And I just try to understand the specific reason
> which causes the lockup, but still not get it.

Hello Ming,

blk_mq_delay_queue() stops and restarts a hardware queue after a delay has
expired. If the SCSI core calls blk_mq_start_stopped_hw_queues() after that
delay has expired no queues will be restarted. This is why patch 5/5 change=
s
two blk_mq_start_stopped_hw_queues() calls into two blk_mq_run_hw_queues()
calls.

Bart.=

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2017-04-07 15:46 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-06 18:10 [PATCH v3 0/5] Avoid that scsi-mq queue processing stalls Bart Van Assche
2017-04-06 18:10 ` [PATCH v3 1/5] blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list Bart Van Assche
2017-04-07  9:46   ` Ming Lei
2017-04-07 15:15     ` Bart Van Assche
2017-04-06 18:10 ` [PATCH v3 2/5] blk-mq: Restart a single queue if tag sets are shared Bart Van Assche
2017-04-06 19:12   ` Jens Axboe
2017-04-06 19:21     ` Jens Axboe
2017-04-07 15:12       ` Bart Van Assche
2017-04-06 18:10 ` [PATCH v3 3/5] blk-mq: Clarify comments in blk_mq_dispatch_rq_list() Bart Van Assche
2017-04-06 18:10 ` [PATCH v3 4/5] blk-mq: Introduce blk_mq_delay_run_hw_queue() Bart Van Assche
2017-04-07 10:23   ` Ming Lei
2017-04-07 15:13     ` Bart Van Assche
2017-04-06 18:10 ` [PATCH v3 5/5] scsi: Avoid that SCSI queues get stuck Bart Van Assche
2017-04-07  9:41 ` [PATCH v3 0/5] Avoid that scsi-mq queue processing stalls Ming Lei
2017-04-07 15:18   ` Bart Van Assche
2017-04-07 15:34     ` Ming Lei
2017-04-07 15:46       ` Bart Van Assche

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.