* [PATCH 0/5] blk-mq: various blk-mq/blk-mq-sched fixes
@ 2017-01-26 19:48 Jens Axboe
  2017-01-26 19:48 ` [PATCH 1/5] blk-mq: improve scheduler queue sync/async running Jens Axboe
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Jens Axboe @ 2017-01-26 19:48 UTC (permalink / raw)
  To: linux-block; +Cc: bart.vanassche, hch, osandov, paolo.valente, hare

I've been diving into the problems that Hannes reported: the stalls
related to shared tag maps and multiple hardware queues, and the
cases where we get suboptimal merging.

This series attempts to fix those issues.

Hannes, this should be identical to what I sent you as a bundled-up
patch earlier today. It would be great if you could run this through
your testing.

The patches are against the for-4.11/block branch.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/5] blk-mq: improve scheduler queue sync/async running
  2017-01-26 19:48 [PATCH 0/5] blk-mq: various blk-mq/blk-mq-sched fixes Jens Axboe
@ 2017-01-26 19:48 ` Jens Axboe
  2017-01-26 19:59   ` Omar Sandoval
  2017-01-26 19:48 ` [PATCH 2/5] blk-mq: fix potential race in queue restart and driver tag allocation Jens Axboe
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2017-01-26 19:48 UTC (permalink / raw)
  To: linux-block; +Cc: bart.vanassche, hch, osandov, paolo.valente, hare, Jens Axboe

We'll use the same criteria for whether we need to run the queue sync
or async when we have a scheduler, as we do without one.

Signed-off-by: Jens Axboe <axboe@fb.com>
---
 block/blk-mq.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 84d13b5cafd0..223b9b080820 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1476,7 +1476,8 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 	if (q->elevator) {
 		blk_mq_put_ctx(data.ctx);
 		blk_mq_bio_to_request(rq, bio);
-		blk_mq_sched_insert_request(rq, false, true, true);
+		blk_mq_sched_insert_request(rq, false, true,
+						!is_sync || is_flush_fua);
 		goto done;
 	}
 	if (!blk_mq_merge_queue_io(data.hctx, data.ctx, rq, bio)) {
@@ -1585,7 +1586,8 @@ static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio)
 	if (q->elevator) {
 		blk_mq_put_ctx(data.ctx);
 		blk_mq_bio_to_request(rq, bio);
-		blk_mq_sched_insert_request(rq, false, true, true);
+		blk_mq_sched_insert_request(rq, false, true,
+						!is_sync || is_flush_fua);
 		goto done;
 	}
 	if (!blk_mq_merge_queue_io(data.hctx, data.ctx, rq, bio)) {
-- 
2.7.4


* [PATCH 2/5] blk-mq: fix potential race in queue restart and driver tag allocation
  2017-01-26 19:48 [PATCH 0/5] blk-mq: various blk-mq/blk-mq-sched fixes Jens Axboe
  2017-01-26 19:48 ` [PATCH 1/5] blk-mq: improve scheduler queue sync/async running Jens Axboe
@ 2017-01-26 19:48 ` Jens Axboe
  2017-01-26 19:52   ` Jens Axboe
  2017-01-26 19:48 ` [PATCH 3/5] blk-mq: release driver tag on a requeue event Jens Axboe
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2017-01-26 19:48 UTC (permalink / raw)
  To: linux-block; +Cc: bart.vanassche, hch, osandov, paolo.valente, hare, Jens Axboe

Once we mark the queue as needing a restart, re-check if we can
get a driver tag. This fixes a theoretical issue where the needed
IO completes _after_ blk_mq_get_driver_tag() fails, but before we
manage to set the restart bit.

Signed-off-by: Jens Axboe <axboe@fb.com>
---
 block/blk-mq.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 223b9b080820..8041ad330289 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -928,7 +928,16 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list)
 		if (!blk_mq_get_driver_tag(rq, &hctx, false)) {
 			if (!queued && reorder_tags_to_front(list))
 				continue;
+
+			/*
+			 * We failed getting a driver tag. Mark the queue(s)
+			 * as needing a restart. Retry getting a tag again,
+			 * in case the needed IO completed right before we
+			 * marked the queue as needing a restart.
+			 */
 			blk_mq_sched_mark_restart(hctx);
+			if (!blk_mq_get_driver_tag(rq, &hctx, false))
+				break;
 			break;
 		}
 		list_del_init(&rq->queuelist);
-- 
2.7.4


* [PATCH 3/5] blk-mq: release driver tag on a requeue event
  2017-01-26 19:48 [PATCH 0/5] blk-mq: various blk-mq/blk-mq-sched fixes Jens Axboe
  2017-01-26 19:48 ` [PATCH 1/5] blk-mq: improve scheduler queue sync/async running Jens Axboe
  2017-01-26 19:48 ` [PATCH 2/5] blk-mq: fix potential race in queue restart and driver tag allocation Jens Axboe
@ 2017-01-26 19:48 ` Jens Axboe
  2017-01-26 20:06   ` Omar Sandoval
  2017-01-26 19:48 ` [PATCH 4/5] blk-mq-sched: fix starvation for multiple hardware queues and shared tags Jens Axboe
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2017-01-26 19:48 UTC (permalink / raw)
  To: linux-block; +Cc: bart.vanassche, hch, osandov, paolo.valente, hare, Jens Axboe

We don't want to hold on to the driver tag across a requeue when
we have a scheduler attached.

Signed-off-by: Jens Axboe <axboe@fb.com>
---
 block/blk-mq.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8041ad330289..089b2eedca4f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -879,6 +879,21 @@ static bool blk_mq_get_driver_tag(struct request *rq,
 	return false;
 }
 
+static void blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
+				  struct request *rq)
+{
+	if (rq->tag == -1 || rq->internal_tag == -1)
+		return;
+
+	blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, rq->tag);
+	rq->tag = -1;
+
+	if (rq->rq_flags & RQF_MQ_INFLIGHT) {
+		rq->rq_flags &= ~RQF_MQ_INFLIGHT;
+		atomic_dec(&hctx->nr_active);
+	}
+}
+
 /*
  * If we fail getting a driver tag because all the driver tags are already
  * assigned and on the dispatch list, BUT the first entry does not have a
@@ -952,6 +967,7 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list)
 			queued++;
 			break;
 		case BLK_MQ_RQ_QUEUE_BUSY:
+			blk_mq_put_driver_tag(hctx, rq);
 			list_add(&rq->queuelist, list);
 			__blk_mq_requeue_request(rq);
 			break;
-- 
2.7.4


* [PATCH 4/5] blk-mq-sched: fix starvation for multiple hardware queues and shared tags
  2017-01-26 19:48 [PATCH 0/5] blk-mq: various blk-mq/blk-mq-sched fixes Jens Axboe
                   ` (2 preceding siblings ...)
  2017-01-26 19:48 ` [PATCH 3/5] blk-mq: release driver tag on a requeue event Jens Axboe
@ 2017-01-26 19:48 ` Jens Axboe
  2017-01-26 20:25   ` Omar Sandoval
  2017-01-26 19:48 ` [PATCH 5/5] blk-mq-sched: change ->dispatch_requests() to ->dispatch_request() Jens Axboe
  2017-01-27 12:10 ` [PATCH 0/5] blk-mq: various blk-mq/blk-mq-sched fixes Hannes Reinecke
  5 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2017-01-26 19:48 UTC (permalink / raw)
  To: linux-block; +Cc: bart.vanassche, hch, osandov, paolo.valente, hare, Jens Axboe

If we have both multiple hardware queues and shared tag map between
devices, we need to ensure that we propagate the hardware queue
restart bit higher up. This is because we can get into a situation
where we don't have any IO pending on a hardware queue, yet we fail
getting a tag to start new IO. If that happens, it's not enough to
mark the hardware queue as needing a restart; we need to bubble
that up to the higher level queue as well.

Signed-off-by: Jens Axboe <axboe@fb.com>
---
 block/blk-mq-sched.c   | 28 ++++++++++++++++++++++++++++
 block/blk-mq-sched.h   | 15 +++++++++------
 block/blk-mq.c         |  3 ++-
 block/blk-mq.h         |  1 +
 include/linux/blkdev.h |  1 +
 5 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 56b92db944ae..69502ff89f3a 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -300,6 +300,34 @@ bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx, struct request *rq)
 }
 EXPORT_SYMBOL_GPL(blk_mq_sched_bypass_insert);
 
+static void blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
+{
+	if (test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state)) {
+		clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
+		if (blk_mq_hctx_has_pending(hctx))
+			blk_mq_run_hw_queue(hctx, true);
+	}
+}
+
+void blk_mq_sched_restart_queues(struct blk_mq_hw_ctx *hctx)
+{
+	unsigned int i;
+
+	if (!(hctx->flags & BLK_MQ_F_TAG_SHARED))
+		blk_mq_sched_restart_hctx(hctx);
+	else {
+		struct request_queue *q = hctx->queue;
+
+		if (!test_bit(QUEUE_FLAG_RESTART, &q->queue_flags))
+			return;
+
+		clear_bit(QUEUE_FLAG_RESTART, &q->queue_flags);
+
+		queue_for_each_hw_ctx(q, hctx, i)
+			blk_mq_sched_restart_hctx(hctx);
+	}
+}
+
 static void blk_mq_sched_free_tags(struct blk_mq_tag_set *set,
 				   struct blk_mq_hw_ctx *hctx,
 				   unsigned int hctx_idx)
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 6b465bc7014c..becbc7840364 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -19,6 +19,7 @@ bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx, struct request *rq);
 bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio);
 bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio);
 bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq);
+void blk_mq_sched_restart_queues(struct blk_mq_hw_ctx *hctx);
 
 void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx);
 void blk_mq_sched_move_to_dispatch(struct blk_mq_hw_ctx *hctx,
@@ -123,11 +124,6 @@ blk_mq_sched_completed_request(struct blk_mq_hw_ctx *hctx, struct request *rq)
 	BUG_ON(rq->internal_tag == -1);
 
 	blk_mq_put_tag(hctx, hctx->sched_tags, rq->mq_ctx, rq->internal_tag);
-
-	if (test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state)) {
-		clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
-		blk_mq_run_hw_queue(hctx, true);
-	}
 }
 
 static inline void blk_mq_sched_started_request(struct request *rq)
@@ -160,8 +156,15 @@ static inline bool blk_mq_sched_has_work(struct blk_mq_hw_ctx *hctx)
 
 static inline void blk_mq_sched_mark_restart(struct blk_mq_hw_ctx *hctx)
 {
-	if (!test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
+	if (!test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state)) {
 		set_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
+		if (hctx->flags & BLK_MQ_F_TAG_SHARED) {
+			struct request_queue *q = hctx->queue;
+
+			if (!test_bit(QUEUE_FLAG_RESTART, &q->queue_flags))
+				set_bit(QUEUE_FLAG_RESTART, &q->queue_flags);
+		}
+	}
 }
 
 static inline bool blk_mq_sched_needs_restart(struct blk_mq_hw_ctx *hctx)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 089b2eedca4f..fcb5f9f445f7 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -40,7 +40,7 @@ static LIST_HEAD(all_q_list);
 /*
  * Check if any of the ctx's have pending work in this hardware queue
  */
-static bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
+bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
 {
 	return sbitmap_any_bit_set(&hctx->ctx_map) ||
 			!list_empty_careful(&hctx->dispatch) ||
@@ -345,6 +345,7 @@ void __blk_mq_finish_request(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
 		blk_mq_put_tag(hctx, hctx->tags, ctx, rq->tag);
 	if (sched_tag != -1)
 		blk_mq_sched_completed_request(hctx, rq);
+	blk_mq_sched_restart_queues(hctx);
 	blk_queue_exit(q);
 }
 
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 0c7c034d9ddd..d19b0e75a129 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -33,6 +33,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr);
 void blk_mq_wake_waiters(struct request_queue *q);
 bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *, struct list_head *);
 void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list);
+bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx);
 
 /*
  * Internal helpers for allocating/freeing the request map
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 25564857f5f8..73bcd201a9b7 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -602,6 +602,7 @@ struct request_queue {
 #define QUEUE_FLAG_FLUSH_NQ    25	/* flush not queueuable */
 #define QUEUE_FLAG_DAX         26	/* device supports DAX */
 #define QUEUE_FLAG_STATS       27	/* track rq completion times */
+#define QUEUE_FLAG_RESTART     28
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_STACKABLE)	|	\
-- 
2.7.4


* [PATCH 5/5] blk-mq-sched: change ->dispatch_requests() to ->dispatch_request()
  2017-01-26 19:48 [PATCH 0/5] blk-mq: various blk-mq/blk-mq-sched fixes Jens Axboe
                   ` (3 preceding siblings ...)
  2017-01-26 19:48 ` [PATCH 4/5] blk-mq-sched: fix starvation for multiple hardware queues and shared tags Jens Axboe
@ 2017-01-26 19:48 ` Jens Axboe
  2017-01-26 20:54   ` Omar Sandoval
  2017-01-27 12:10 ` [PATCH 0/5] blk-mq: various blk-mq/blk-mq-sched fixes Hannes Reinecke
  5 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2017-01-26 19:48 UTC (permalink / raw)
  To: linux-block; +Cc: bart.vanassche, hch, osandov, paolo.valente, hare, Jens Axboe

When we invoke dispatch_requests(), the scheduler empties everything
into the passed in list. This isn't always a good thing, since it
means that we remove items that we could have potentially merged
with.

Change the function to dispatch a single request at a time. If
we do that, we can back off exactly at the point where the device
can't consume more IO, and leave the rest with the scheduler for
better merging and future dispatch decision making.

Signed-off-by: Jens Axboe <axboe@fb.com>
---
 block/blk-mq-sched.c     | 23 +++++++++++++++--------
 block/blk-mq.c           |  2 +-
 block/mq-deadline.c      | 10 ++++++----
 include/linux/elevator.h |  2 +-
 4 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 69502ff89f3a..3136696f4991 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -200,15 +200,22 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	 * leave them there for as long as we can. Mark the hw queue as
 	 * needing a restart in that case.
 	 */
-	if (list_empty(&rq_list)) {
-		if (e && e->type->ops.mq.dispatch_requests)
-			e->type->ops.mq.dispatch_requests(hctx, &rq_list);
-		else
-			blk_mq_flush_busy_ctxs(hctx, &rq_list);
-	} else
+	if (!list_empty(&rq_list)) {
 		blk_mq_sched_mark_restart(hctx);
-
-	blk_mq_dispatch_rq_list(hctx, &rq_list);
+		blk_mq_dispatch_rq_list(hctx, &rq_list);
+	} else if (!e || !e->type->ops.mq.dispatch_request) {
+		blk_mq_flush_busy_ctxs(hctx, &rq_list);
+		blk_mq_dispatch_rq_list(hctx, &rq_list);
+	} else {
+		do {
+			struct request *rq;
+
+			rq = e->type->ops.mq.dispatch_request(hctx);
+			if (!rq)
+				break;
+			list_add(&rq->queuelist, &rq_list);
+		} while (blk_mq_dispatch_rq_list(hctx, &rq_list));
+	}
 }
 
 void blk_mq_sched_move_to_dispatch(struct blk_mq_hw_ctx *hctx,
diff --git a/block/blk-mq.c b/block/blk-mq.c
index fcb5f9f445f7..ee20f43f8e83 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -999,7 +999,7 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list)
 	 */
 	if (!list_empty(list)) {
 		spin_lock(&hctx->lock);
-		list_splice(list, &hctx->dispatch);
+		list_splice_init(list, &hctx->dispatch);
 		spin_unlock(&hctx->lock);
 
 		/*
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index a01986d7b6fb..d93ec713fa62 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -287,14 +287,16 @@ static struct request *__dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	return rq;
 }
 
-static void dd_dispatch_requests(struct blk_mq_hw_ctx *hctx,
-				 struct list_head *rq_list)
+static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 {
 	struct deadline_data *dd = hctx->queue->elevator->elevator_data;
+	struct request *rq;
 
 	spin_lock(&dd->lock);
-	blk_mq_sched_move_to_dispatch(hctx, rq_list, __dd_dispatch_request);
+	rq = __dd_dispatch_request(hctx);
 	spin_unlock(&dd->lock);
+
+	return rq;
 }
 
 static void dd_exit_queue(struct elevator_queue *e)
@@ -517,7 +519,7 @@ static struct elv_fs_entry deadline_attrs[] = {
 static struct elevator_type mq_deadline = {
 	.ops.mq = {
 		.insert_requests	= dd_insert_requests,
-		.dispatch_requests	= dd_dispatch_requests,
+		.dispatch_request	= dd_dispatch_request,
 		.next_request		= elv_rb_latter_request,
 		.former_request		= elv_rb_former_request,
 		.bio_merge		= dd_bio_merge,
diff --git a/include/linux/elevator.h b/include/linux/elevator.h
index ecb96fd67c6d..b5825c4f06f7 100644
--- a/include/linux/elevator.h
+++ b/include/linux/elevator.h
@@ -92,7 +92,7 @@ struct elevator_mq_ops {
 	struct request *(*get_request)(struct request_queue *, unsigned int, struct blk_mq_alloc_data *);
 	void (*put_request)(struct request *);
 	void (*insert_requests)(struct blk_mq_hw_ctx *, struct list_head *, bool);
-	void (*dispatch_requests)(struct blk_mq_hw_ctx *, struct list_head *);
+	struct request *(*dispatch_request)(struct blk_mq_hw_ctx *);
 	bool (*has_work)(struct blk_mq_hw_ctx *);
 	void (*completed_request)(struct blk_mq_hw_ctx *, struct request *);
 	void (*started_request)(struct request *);
-- 
2.7.4


* Re: [PATCH 2/5] blk-mq: fix potential race in queue restart and driver tag allocation
  2017-01-26 19:48 ` [PATCH 2/5] blk-mq: fix potential race in queue restart and driver tag allocation Jens Axboe
@ 2017-01-26 19:52   ` Jens Axboe
  2017-01-26 20:04     ` Omar Sandoval
  0 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2017-01-26 19:52 UTC (permalink / raw)
  To: linux-block; +Cc: bart.vanassche, hch, osandov, paolo.valente, hare

On 01/26/2017 12:48 PM, Jens Axboe wrote:
> Once we mark the queue as needing a restart, re-check if we can
> get a driver tag. This fixes a theoretical issue where the needed
> IO completes _after_ blk_mq_get_driver_tag() fails, but before we
> manage to set the restart bit.
> 
> Signed-off-by: Jens Axboe <axboe@fb.com>
> ---
>  block/blk-mq.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 223b9b080820..8041ad330289 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -928,7 +928,16 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list)
>  		if (!blk_mq_get_driver_tag(rq, &hctx, false)) {
>  			if (!queued && reorder_tags_to_front(list))
>  				continue;
> +
> +			/*
> +			 * We failed getting a driver tag. Mark the queue(s)
> +			 * as needing a restart. Retry getting a tag again,
> +			 * in case the needed IO completed right before we
> +			 * marked the queue as needing a restart.
> +			 */
>  			blk_mq_sched_mark_restart(hctx);
> +			if (!blk_mq_get_driver_tag(rq, &hctx, false))
> +				break;
>  			break;
>  		}
>  		list_del_init(&rq->queuelist);

I screwed this up when splitting up the patchset; that last break needs to
be removed as well, of course. Updated below:


From 9d68cf9232c06a793e305d10b6d655df4beae928 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe@fb.com>
Date: Thu, 26 Jan 2017 12:50:36 -0700
Subject: [PATCH 1/4] blk-mq: fix potential race in queue restart and driver
 tag allocation

Once we mark the queue as needing a restart, re-check if we can
get a driver tag. This fixes a theoretical issue where the needed
IO completes _after_ blk_mq_get_driver_tag() fails, but before we
manage to set the restart bit.

Signed-off-by: Jens Axboe <axboe@fb.com>
---
 block/blk-mq.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 223b9b080820..5cf013d87c2e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -928,8 +928,16 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list)
 		if (!blk_mq_get_driver_tag(rq, &hctx, false)) {
 			if (!queued && reorder_tags_to_front(list))
 				continue;
+
+			/*
+			 * We failed getting a driver tag. Mark the queue(s)
+			 * as needing a restart. Retry getting a tag again,
+			 * in case the needed IO completed right before we
+			 * marked the queue as needing a restart.
+			 */
 			blk_mq_sched_mark_restart(hctx);
-			break;
+			if (!blk_mq_get_driver_tag(rq, &hctx, false))
+				break;
 		}
 		list_del_init(&rq->queuelist);
 
-- 
2.7.4


-- 
Jens Axboe

* Re: [PATCH 1/5] blk-mq: improve scheduler queue sync/async running
  2017-01-26 19:48 ` [PATCH 1/5] blk-mq: improve scheduler queue sync/async running Jens Axboe
@ 2017-01-26 19:59   ` Omar Sandoval
  0 siblings, 0 replies; 18+ messages in thread
From: Omar Sandoval @ 2017-01-26 19:59 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, bart.vanassche, hch, osandov, paolo.valente, hare

On Thu, Jan 26, 2017 at 12:48:14PM -0700, Jens Axboe wrote:
> We'll use the same criteria for whether we need to run the queue sync
> or async when we have a scheduler, as we do without one.

Reviewed-by: Omar Sandoval <osandov@fb.com>

> Signed-off-by: Jens Axboe <axboe@fb.com>
> ---
>  block/blk-mq.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)

* Re: [PATCH 2/5] blk-mq: fix potential race in queue restart and driver tag allocation
  2017-01-26 19:52   ` Jens Axboe
@ 2017-01-26 20:04     ` Omar Sandoval
  0 siblings, 0 replies; 18+ messages in thread
From: Omar Sandoval @ 2017-01-26 20:04 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, bart.vanassche, hch, osandov, paolo.valente, hare

On Thu, Jan 26, 2017 at 12:52:15PM -0700, Jens Axboe wrote:
> I screwed this up when splitting up the patchset; that last break needs to
> be removed as well, of course. Updated below:
> 
> 
> From 9d68cf9232c06a793e305d10b6d655df4beae928 Mon Sep 17 00:00:00 2001
> From: Jens Axboe <axboe@fb.com>
> Date: Thu, 26 Jan 2017 12:50:36 -0700
> Subject: [PATCH 1/4] blk-mq: fix potential race in queue restart and driver
>  tag allocation
> 
> Once we mark the queue as needing a restart, re-check if we can
> get a driver tag. This fixes a theoretical issue where the needed
> IO completes _after_ blk_mq_get_driver_tag() fails, but before we
> manage to set the restart bit.

Reviewed-by: Omar Sandoval <osandov@fb.com>

> Signed-off-by: Jens Axboe <axboe@fb.com>
> ---
>  block/blk-mq.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)

* Re: [PATCH 3/5] blk-mq: release driver tag on a requeue event
  2017-01-26 19:48 ` [PATCH 3/5] blk-mq: release driver tag on a requeue event Jens Axboe
@ 2017-01-26 20:06   ` Omar Sandoval
  0 siblings, 0 replies; 18+ messages in thread
From: Omar Sandoval @ 2017-01-26 20:06 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, bart.vanassche, hch, osandov, paolo.valente, hare

On Thu, Jan 26, 2017 at 12:48:16PM -0700, Jens Axboe wrote:
> We don't want to hold on to the driver tag across a requeue when
> we have a scheduler attached.

Reviewed-by: Omar Sandoval <osandov@fb.com>

> Signed-off-by: Jens Axboe <axboe@fb.com>
> ---
>  block/blk-mq.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)

* Re: [PATCH 4/5] blk-mq-sched: fix starvation for multiple hardware queues and shared tags
  2017-01-26 19:48 ` [PATCH 4/5] blk-mq-sched: fix starvation for multiple hardware queues and shared tags Jens Axboe
@ 2017-01-26 20:25   ` Omar Sandoval
  2017-01-26 20:26     ` Jens Axboe
  0 siblings, 1 reply; 18+ messages in thread
From: Omar Sandoval @ 2017-01-26 20:25 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, bart.vanassche, hch, osandov, paolo.valente, hare

On Thu, Jan 26, 2017 at 12:48:17PM -0700, Jens Axboe wrote:
> If we have both multiple hardware queues and shared tag map between
> devices, we need to ensure that we propagate the hardware queue
> restart bit higher up. This is because we can get into a situation
> where we don't have any IO pending on a hardware queue, yet we fail
> getting a tag to start new IO. If that happens, it's not enough to
> mark the hardware queue as needing a restart; we need to bubble
> that up to the higher level queue as well.

One minor nit below. Otherwise, makes sense.

Reviewed-by: Omar Sandoval <osandov@fb.com>

> Signed-off-by: Jens Axboe <axboe@fb.com>
> ---
>  block/blk-mq-sched.c   | 28 ++++++++++++++++++++++++++++
>  block/blk-mq-sched.h   | 15 +++++++++------
>  block/blk-mq.c         |  3 ++-
>  block/blk-mq.h         |  1 +
>  include/linux/blkdev.h |  1 +
>  5 files changed, 41 insertions(+), 7 deletions(-)

> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 25564857f5f8..73bcd201a9b7 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -602,6 +602,7 @@ struct request_queue {
>  #define QUEUE_FLAG_FLUSH_NQ    25	/* flush not queueuable */
>  #define QUEUE_FLAG_DAX         26	/* device supports DAX */
>  #define QUEUE_FLAG_STATS       27	/* track rq completion times */
> +#define QUEUE_FLAG_RESTART     28

All of the other queue flags have a comment, could you add one here,
too?

>  #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
>  				 (1 << QUEUE_FLAG_STACKABLE)	|	\
> -- 
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-block" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: [PATCH 4/5] blk-mq-sched: fix starvation for multiple hardware queues and shared tags
  2017-01-26 20:25   ` Omar Sandoval
@ 2017-01-26 20:26     ` Jens Axboe
  0 siblings, 0 replies; 18+ messages in thread
From: Jens Axboe @ 2017-01-26 20:26 UTC (permalink / raw)
  To: Omar Sandoval
  Cc: linux-block, bart.vanassche, hch, osandov, paolo.valente, hare

On 01/26/2017 01:25 PM, Omar Sandoval wrote:
> On Thu, Jan 26, 2017 at 12:48:17PM -0700, Jens Axboe wrote:
>> If we have both multiple hardware queues and shared tag map between
>> devices, we need to ensure that we propagate the hardware queue
>> restart bit higher up. This is because we can get into a situation
>> where we don't have any IO pending on a hardware queue, yet we fail
>> getting a tag to start new IO. If that happens, it's not enough to
>> mark the hardware queue as needing a restart; we need to bubble
>> that up to the higher level queue as well.
> 
> One minor nit below. Otherwise, makes sense.
> 
> Reviewed-by: Omar Sandoval <osandov@fb.com>
> 
>> Signed-off-by: Jens Axboe <axboe@fb.com>
>> ---
>>  block/blk-mq-sched.c   | 28 ++++++++++++++++++++++++++++
>>  block/blk-mq-sched.h   | 15 +++++++++------
>>  block/blk-mq.c         |  3 ++-
>>  block/blk-mq.h         |  1 +
>>  include/linux/blkdev.h |  1 +
>>  5 files changed, 41 insertions(+), 7 deletions(-)
> 
>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>> index 25564857f5f8..73bcd201a9b7 100644
>> --- a/include/linux/blkdev.h
>> +++ b/include/linux/blkdev.h
>> @@ -602,6 +602,7 @@ struct request_queue {
>>  #define QUEUE_FLAG_FLUSH_NQ    25	/* flush not queueuable */
>>  #define QUEUE_FLAG_DAX         26	/* device supports DAX */
>>  #define QUEUE_FLAG_STATS       27	/* track rq completion times */
>> +#define QUEUE_FLAG_RESTART     28
> 
> All of the other queue flags have a comment, could you add one here,
> too?

Definitely, I'll add an appropriate comment.

-- 
Jens Axboe

* Re: [PATCH 5/5] blk-mq-sched: change ->dispatch_requests() to ->dispatch_request()
  2017-01-26 19:48 ` [PATCH 5/5] blk-mq-sched: change ->dispatch_requests() to ->dispatch_request() Jens Axboe
@ 2017-01-26 20:54   ` Omar Sandoval
  2017-01-26 20:59     ` Jens Axboe
  0 siblings, 1 reply; 18+ messages in thread
From: Omar Sandoval @ 2017-01-26 20:54 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, bart.vanassche, hch, osandov, paolo.valente, hare

On Thu, Jan 26, 2017 at 12:48:18PM -0700, Jens Axboe wrote:
> When we invoke dispatch_requests(), the scheduler empties everything
> into the passed in list. This isn't always a good thing, since it
> means that we remove items that we could have potentially merged
> with.
> 
> Change the function to dispatch a single request at a time. If
> we do that, we can back off exactly at the point where the device
> can't consume more IO, and leave the rest with the scheduler for
> better merging and future dispatch decision making.

Hmm, I think I previously changed this from ->dispatch_request() to
->dispatch_requests() to support schedulers using software queues. My
current mq-token stuff doesn't have a ->dispatch_requests() hook anymore
after the sched_tags rework, but I think I'm going to need it again
soon. Having the scheduler do blk_mq_flush_busy_ctxs() into its own
private list and then handing the requests out one-by-one kinda sucks.
(Plus, deferred issue wouldn't work with this, but it's not implemented,
anyways :)

One idea: what if we have the scheduler get the driver tags inside of
its ->dispatch_requests()? For example, __dd_dispatch_request() could
first check whether it has a request to dispatch and then try to grab a
driver tag. If it succeeds, it dispatches the request, and if it
doesn't, it marks itself as needing restart.

With that, the scheduler will only return requests ready for
->queue_rq(), meaning we could get rid of the list reshuffling in
blk_mq_dispatch_rq_list().

> Signed-off-by: Jens Axboe <axboe@fb.com>
> ---
>  block/blk-mq-sched.c     | 23 +++++++++++++++--------
>  block/blk-mq.c           |  2 +-
>  block/mq-deadline.c      | 10 ++++++----
>  include/linux/elevator.h |  2 +-
>  4 files changed, 23 insertions(+), 14 deletions(-)

* Re: [PATCH 5/5] blk-mq-sched: change ->dispatch_requests() to ->dispatch_request()
  2017-01-26 20:54   ` Omar Sandoval
@ 2017-01-26 20:59     ` Jens Axboe
  2017-01-26 21:11       ` Omar Sandoval
  0 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2017-01-26 20:59 UTC (permalink / raw)
  To: Omar Sandoval
  Cc: linux-block, bart.vanassche, hch, osandov, paolo.valente, hare

On 01/26/2017 01:54 PM, Omar Sandoval wrote:
> On Thu, Jan 26, 2017 at 12:48:18PM -0700, Jens Axboe wrote:
>> When we invoke dispatch_requests(), the scheduler empties everything
>> into the passed in list. This isn't always a good thing, since it
>> means that we remove items that we could have potentially merged
>> with.
>>
>> Change the function to dispatch a single request at a time. If
>> we do that, we can back off exactly at the point where the device
>> can't consume more IO, and leave the rest with the scheduler for
>> better merging and future dispatch decision making.
> 
> Hmm, I think I previously changed this from ->dispatch_request() to
> ->dispatch_requests() to support schedulers using software queues. My
> current mq-token stuff doesn't have a ->dispatch_requests() hook anymore
> after the sched_tags rework, but I think I'm going to need it again
> soon. Having the scheduler do blk_mq_flush_busy_ctxs() into its own
> private list and then handing the requests out one-by-one kinda sucks.
> (Plus, deferred issue wouldn't work with this, but it's not implemented,
> anyways :)
> 
> One idea: what if we have the scheduler get the driver tags inside of
> its ->dispatch_requests()? For example, __dd_dispatch_request() could
> first check whether it has a request to dispatch and then try to grab a
> driver tag. If it succeeds, it dispatches the request, and if it
> doesn't, it marks itself as needing restart.
> 
> With that, the scheduler will only return requests ready for
> ->queue_rq(), meaning we could get rid of the list reshuffling in
> blk_mq_dispatch_rq_list().

That'd work for the driver tags, but it would not work for the case
where the driver returns BUSY. So we'd only be covering some of the
cases. That may or may not matter... Hard to say. I appreciate having
the hook that dispatches them all for efficiency reasons, but from a
scheduler point of view, sending off one is generally simpler and it'll
be better for rotational storage since we depend so heavily on merging
to get good performance there.

I'm definitely open to alternatives. We can keep the dispatch_requests()
and pass in a dispatch count, ramping up the dispatch count or something
like that. Seems a bit fragile though, and hard to get right.


-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] blk-mq-sched: change ->dispatch_requests() to ->dispatch_request()
  2017-01-26 20:59     ` Jens Axboe
@ 2017-01-26 21:11       ` Omar Sandoval
  2017-01-26 21:15         ` Jens Axboe
  0 siblings, 1 reply; 18+ messages in thread
From: Omar Sandoval @ 2017-01-26 21:11 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, bart.vanassche, hch, osandov, paolo.valente, hare

On Thu, Jan 26, 2017 at 01:59:23PM -0700, Jens Axboe wrote:
> On 01/26/2017 01:54 PM, Omar Sandoval wrote:
> > On Thu, Jan 26, 2017 at 12:48:18PM -0700, Jens Axboe wrote:
> >> When we invoke dispatch_requests(), the scheduler empties everything
> >> into the passed in list. This isn't always a good thing, since it
> >> means that we remove items that we could have potentially merged
> >> with.
> >>
> >> Change the function to dispatch a single request at a time. If
> >> we do that, we can back off exactly at the point where the device
> >> can't consume more IO, and leave the rest with the scheduler for
> >> better merging and future dispatch decision making.
> > 
> > Hmm, I think I previously changed this from ->dispatch_request() to
> > ->dispatch_requests() to support schedulers using software queues. My
> > current mq-token stuff doesn't have a ->dispatch_requests() hook anymore
> > after the sched_tags rework, but I think I'm going to need it again
> > soon. Having the scheduler do blk_mq_flush_busy_ctxs() into its own
> > private list and then handing the requests out one-by-one kinda sucks.
> > (Plus, deferred issue wouldn't work with this, but it's not implemented,
> > anyways :)
> > 
> > One idea: what if we have the scheduler get the driver tags inside of
> > its ->dispatch_requests()? For example, __dd_dispatch_request() could
> > first check whether it has a request to dispatch and then try to grab a
> > driver tag. If it succeeds, it dispatches the request, and if it
> > doesn't, it marks itself as needing restart.
> > 
> > With that, the scheduler will only return requests ready for
> > ->queue_rq(), meaning we could get rid of the list reshuffling in
> > blk_mq_dispatch_rq_list().
> 
> That'd work for the driver tags, but it would not work for the case
> where the driver returns BUSY. So we'd only be covering some of the
> cases. That may or may not matter... Hard to say.

Right, I didn't think of that case.

> I appreciate having
> the hook that dispatches them all for efficiency reasons, but from a
> scheduler point of view, sending off one is generally simpler and it'll
> be better for rotational storage since we depend so heavily on merging
> to get good performance there.
> 
> I'm definitely open to alternatives. We can keep the dispatch_requests()
> and pass in a dispatch count, ramping up the dispatch count or something
> like that. Seems a bit fragile though, and hard to get right.

Yeah, beyond having a way to shove requests back into the scheduler, I
can't think of a good way to reconcile it. I guess we can go with this,
and I'll figure something out for the software queue case.

Begrudgingly-reviewed-by: Omar Sandoval <osandov@fb.com>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] blk-mq-sched: change ->dispatch_requests() to ->dispatch_request()
  2017-01-26 21:11       ` Omar Sandoval
@ 2017-01-26 21:15         ` Jens Axboe
  0 siblings, 0 replies; 18+ messages in thread
From: Jens Axboe @ 2017-01-26 21:15 UTC (permalink / raw)
  To: Omar Sandoval
  Cc: linux-block, bart.vanassche, hch, osandov, paolo.valente, hare

On 01/26/2017 02:11 PM, Omar Sandoval wrote:
> On Thu, Jan 26, 2017 at 01:59:23PM -0700, Jens Axboe wrote:
>> On 01/26/2017 01:54 PM, Omar Sandoval wrote:
>>> On Thu, Jan 26, 2017 at 12:48:18PM -0700, Jens Axboe wrote:
>>>> When we invoke dispatch_requests(), the scheduler empties everything
>>>> into the passed in list. This isn't always a good thing, since it
>>>> means that we remove items that we could have potentially merged
>>>> with.
>>>>
>>>> Change the function to dispatch a single request at a time. If
>>>> we do that, we can back off exactly at the point where the device
>>>> can't consume more IO, and leave the rest with the scheduler for
>>>> better merging and future dispatch decision making.
>>>
>>> Hmm, I think I previously changed this from ->dispatch_request() to
>>> ->dispatch_requests() to support schedulers using software queues. My
>>> current mq-token stuff doesn't have a ->dispatch_requests() hook anymore
>>> after the sched_tags rework, but I think I'm going to need it again
>>> soon. Having the scheduler do blk_mq_flush_busy_ctxs() into its own
>>> private list and then handing the requests out one-by-one kinda sucks.
>>> (Plus, deferred issue wouldn't work with this, but it's not implemented,
>>> anyways :)
>>>
>>> One idea: what if we have the scheduler get the driver tags inside of
>>> its ->dispatch_requests()? For example, __dd_dispatch_request() could
>>> first check whether it has a request to dispatch and then try to grab a
>>> driver tag. If it succeeds, it dispatches the request, and if it
>>> doesn't, it marks itself as needing restart.
>>>
>>> With that, the scheduler will only return requests ready for
>>> ->queue_rq(), meaning we could get rid of the list reshuffling in
>>> blk_mq_dispatch_rq_list().
>>
>> That'd work for the driver tags, but it would not work for the case
>> where the driver returns BUSY. So we'd only be covering some of the
>> cases. That may or may not matter... Hard to say.
> 
> Right, I didn't think of that case.
> 
>> I appreciate having
>> the hook that dispatches them all for efficiency reasons, but from a
>> scheduler point of view, sending off one is generally simpler and it'll
>> be better for rotational storage since we depend so heavily on merging
>> to get good performance there.
>>
>> I'm definitely open to alternatives. We can keep the dispatch_requests()
>> and pass in a dispatch count, ramping up the dispatch count or something
>> like that. Seems a bit fragile though, and hard to get right.
> 
> Yeah, beyond having a way to shove requests back into the scheduler, I
> can't think of a good way to reconcile it. I guess we can go with this,
> and I'll figure something out for the software queue case.

I considered that case as well, but I think it'd be less efficient
than pulling one at a time. Obviously for the case where we NEVER
get BUSY or share tags, we can more easily pull the whole thing. But
for common code...

> Begrudgingly-reviewed-by: Omar Sandoval <osandov@fb.com>

We can always change this again, if we come up with something more
efficient that also gets performance where we want it on shared tags +
multiple queues. :-)

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/5] blk-mq: various blk-mq/blk-mq-sched fixes
  2017-01-26 19:48 [PATCH 0/5] blk-mq: various blk-mq/blk-mq-sched fixes Jens Axboe
                   ` (4 preceding siblings ...)
  2017-01-26 19:48 ` [PATCH 5/5] blk-mq-sched: change ->dispatch_requests() to ->dispatch_request() Jens Axboe
@ 2017-01-27 12:10 ` Hannes Reinecke
  2017-01-27 14:40   ` Jens Axboe
  5 siblings, 1 reply; 18+ messages in thread
From: Hannes Reinecke @ 2017-01-27 12:10 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: bart.vanassche, hch, osandov, paolo.valente, hare

On 01/26/2017 08:48 PM, Jens Axboe wrote:
> I've been diving into the problems that Hannes reported, both the
> stalls related to shared tag maps and multiple hardware queues,
> but also the cases where we get suboptimal merging.
> 
> This series attempts to fix that.
> 
> Hannes, this should be identical to what I sent you as a bundled up
> patch earlier today. Would be great if you could run this through
> your testing.
> 
> The patches are against for-4.11/block
> 
Yep, full success:

4k  seq read : io=59850MB, bw=997.36MB/s, iops=255241, runt= 60028msec
4k rand read : io=399676KB, bw=6127.9KB/s, iops=1531, runt= 65231msec
4k seq write: io=12194MB, bw=207576KB/s, iops=51893, runt= 60155msec
4k rand write: io=111100KB, bw=1315.7KB/s, iops=328, runt= 84445msec

Although I'm getting a lockdep splat:
------------[ cut here ]------------
WARNING: CPU: 4 PID: 3211 at kernel/locking/lockdep.c:3514 lock_release+
DEBUG_LOCKS_WARN_ON(depth <= 0)
Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs l
uhci_hcd ahci ehci_hcd libahci ttm crc32c_intel serio_raw hpsa usbcore
CPU: 4 PID: 3211 Comm: fio Not tainted 4.10.0-rc3+ #598
Hardware name: HP ProLiant DL380p Gen8, BIOS P70 09/18/2013
Call Trace:
 dump_stack+0x85/0xc9
 __warn+0xd1/0xf0
 ? aio_write+0x118/0x170
 warn_slowpath_fmt+0x4f/0x60
 lock_release+0x2a7/0x490
 ? blkdev_write_iter+0x89/0xd0
 aio_write+0x138/0x170
 do_io_submit+0x4d2/0x8f0
 ? do_io_submit+0x413/0x8f0
 SyS_io_submit+0x10/0x20
 entry_SYSCALL_64_fastpath+0x23/0xc6

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/5] blk-mq: various blk-mq/blk-mq-sched fixes
  2017-01-27 12:10 ` [PATCH 0/5] blk-mq: various blk-mq/blk-mq-sched fixes Hannes Reinecke
@ 2017-01-27 14:40   ` Jens Axboe
  0 siblings, 0 replies; 18+ messages in thread
From: Jens Axboe @ 2017-01-27 14:40 UTC (permalink / raw)
  To: Hannes Reinecke, linux-block
  Cc: bart.vanassche, hch, osandov, paolo.valente, hare

On 01/27/2017 05:10 AM, Hannes Reinecke wrote:
> On 01/26/2017 08:48 PM, Jens Axboe wrote:
>> I've been diving into the problems that Hannes reported, both the
>> stalls related to shared tag maps and multiple hardware queues,
>> but also the cases where we get suboptimal merging.
>>
>> This series attempts to fix that.
>>
>> Hannes, this should be identical to what I sent you as a bundled up
>> patch earlier today. Would be great if you could run this through
>> your testing.
>>
>> The patches are against for-4.11/block
>>
> Yep, full success:
> 
> 4k  seq read : io=59850MB, bw=997.36MB/s, iops=255241, runt= 60028msec
> 4k rand read : io=399676KB, bw=6127.9KB/s, iops=1531, runt= 65231msec
> 4k seq write: io=12194MB, bw=207576KB/s, iops=51893, runt= 60155msec
> 4k rand write: io=111100KB, bw=1315.7KB/s, iops=328, runt= 84445msec

Perfect, do you want to add a tested-by or something?

And how does performance compare to !mq with deadline or similar?

> Although I'm getting a lockdep splat:
> ------------[ cut here ]------------
> WARNING: CPU: 4 PID: 3211 at kernel/locking/lockdep.c:3514 lock_release+
> DEBUG_LOCKS_WARN_ON(depth <= 0)
> Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs l
> uhci_hcd ahci ehci_hcd libahci ttm crc32c_intel serio_raw hpsa usbcore
> CPU: 4 PID: 3211 Comm: fio Not tainted 4.10.0-rc3+ #598
> Hardware name: HP ProLiant DL380p Gen8, BIOS P70 09/18/2013

That's a known issue in the base that I have, you're supposed to run
it with master pulled in. Annoying, sorry, but it's not my patches.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-01-27 14:40 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-26 19:48 [PATCH 0/5] blk-mq: various blk-mq/blk-mq-sched fixes Jens Axboe
2017-01-26 19:48 ` [PATCH 1/5] blk-mq: improve scheduler queue sync/async running Jens Axboe
2017-01-26 19:59   ` Omar Sandoval
2017-01-26 19:48 ` [PATCH 2/5] blk-mq: fix potential race in queue restart and driver tag allocation Jens Axboe
2017-01-26 19:52   ` Jens Axboe
2017-01-26 20:04     ` Omar Sandoval
2017-01-26 19:48 ` [PATCH 3/5] blk-mq: release driver tag on a requeue event Jens Axboe
2017-01-26 20:06   ` Omar Sandoval
2017-01-26 19:48 ` [PATCH 4/5] blk-mq-sched: fix starvation for multiple hardware queues and shared tags Jens Axboe
2017-01-26 20:25   ` Omar Sandoval
2017-01-26 20:26     ` Jens Axboe
2017-01-26 19:48 ` [PATCH 5/5] blk-mq-sched: change ->dispatch_requests() to ->dispatch_request() Jens Axboe
2017-01-26 20:54   ` Omar Sandoval
2017-01-26 20:59     ` Jens Axboe
2017-01-26 21:11       ` Omar Sandoval
2017-01-26 21:15         ` Jens Axboe
2017-01-27 12:10 ` [PATCH 0/5] blk-mq: various blk-mq/blk-mq-sched fixes Hannes Reinecke
2017-01-27 14:40   ` Jens Axboe
