All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched
@ 2017-04-15 12:38 Ming Lei
  2017-04-15 12:38 ` [PATCH 1/4] block: respect BLK_MQ_F_NO_SCHED Ming Lei
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Ming Lei @ 2017-04-15 12:38 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Christoph Hellwig, Omar Sandoval, Jozef Mikovic, Ming Lei

The 1st patch enhances BLK_MQ_F_NO_SCHED so that we can't change/
show available io schedulers on devices which don't support io
scheduler.

The 2nd patch passes BLK_MQ_F_NO_SCHED for avoiding one regression
on mtip32xx, which is introduced by blk-mq io scheduler.

The last two patches introduce BLK_MQ_F_SCHED_USE_HW_TAG so that
we can allow to use hardware tag for scheduler, then mq-deadline
can work well on mtip32xx. Even though other devices with enough
hardware tag space can benefit from this feature too.

The 1st two patches aims on v4.11, and the last two are for
v4.12.


Thanks,
Ming

Ming Lei (4):
  block: respect BLK_MQ_F_NO_SCHED
  mtip32xx: pass BLK_MQ_F_NO_SCHED
  blk-mq: introduce BLK_MQ_F_SCHED_USE_HW_TAG
  mtip32xx: use BLK_MQ_F_USE_SCHED_TAG

 block/blk-mq-sched.c              | 10 +++++++++-
 block/blk-mq.c                    | 35 +++++++++++++++++++++++++++++------
 block/elevator.c                  | 12 ++++++++++--
 drivers/block/mtip32xx/mtip32xx.c |  2 +-
 include/linux/blk-mq.h            |  1 +
 5 files changed, 50 insertions(+), 10 deletions(-)

-- 
2.9.3

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/4] block: respect BLK_MQ_F_NO_SCHED
  2017-04-15 12:38 [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched Ming Lei
@ 2017-04-15 12:38 ` Ming Lei
  2017-04-15 12:38 ` [PATCH 2/4] mtip32xx: pass BLK_MQ_F_NO_SCHED Ming Lei
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: Ming Lei @ 2017-04-15 12:38 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Christoph Hellwig, Omar Sandoval, Jozef Mikovic, Ming Lei

If one driver claims that it doesn't support io scheduler via
BLK_MQ_F_NO_SCHED, we should not allow to change and show the
availabe io schedulers.

This patch adds check to enhance this behaviour.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/elevator.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/block/elevator.c b/block/elevator.c
index dbeecf7be719..4d9084a14c10 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -1098,12 +1098,20 @@ int elevator_change(struct request_queue *q, const char *name)
 }
 EXPORT_SYMBOL(elevator_change);
 
+static inline bool elv_support_iosched(struct request_queue *q)
+{
+	if (q->mq_ops && q->tag_set && (q->tag_set->flags &
+				BLK_MQ_F_NO_SCHED))
+		return false;
+	return true;
+}
+
 ssize_t elv_iosched_store(struct request_queue *q, const char *name,
 			  size_t count)
 {
 	int ret;
 
-	if (!(q->mq_ops || q->request_fn))
+	if (!(q->mq_ops || q->request_fn) || !elv_support_iosched(q))
 		return count;
 
 	ret = __elevator_change(q, name);
@@ -1135,7 +1143,7 @@ ssize_t elv_iosched_show(struct request_queue *q, char *name)
 			len += sprintf(name+len, "[%s] ", elv->elevator_name);
 			continue;
 		}
-		if (__e->uses_mq && q->mq_ops)
+		if (__e->uses_mq && q->mq_ops && elv_support_iosched(q))
 			len += sprintf(name+len, "%s ", __e->elevator_name);
 		else if (!__e->uses_mq && !q->mq_ops)
 			len += sprintf(name+len, "%s ", __e->elevator_name);
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/4] mtip32xx: pass BLK_MQ_F_NO_SCHED
  2017-04-15 12:38 [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched Ming Lei
  2017-04-15 12:38 ` [PATCH 1/4] block: respect BLK_MQ_F_NO_SCHED Ming Lei
@ 2017-04-15 12:38 ` Ming Lei
  2017-04-15 12:38 ` [PATCH 3/4] blk-mq: introduce BLK_MQ_F_SCHED_USE_HW_TAG Ming Lei
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: Ming Lei @ 2017-04-15 12:38 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Christoph Hellwig, Omar Sandoval, Jozef Mikovic, Ming Lei

The recent introduced MQ IO scheduler breaks mtip32xx in the
following way.

mtip32xx use the 'request_index' passed to .init_request() as
hardware tag index for initializing hardware queue, and it
actually require that rq->tag is always same with 'request_index'
passed to .init_request(). Current blk-mq IO scheduler can't
guarantee this point, so this patch passes BLK_MQ_F_NO_SCHED
and at least make mtip32xx working.

This patch fixes the following strange hardware failure. The
issue can be triggered easily when doing I/O with mq-deadline
enabled.

[  186.972578] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 32993
[  186.972578] {1}[Hardware Error]: event severity: fatal
[  186.972579] {1}[Hardware Error]:  Error 0, type: fatal
[  186.972580] {1}[Hardware Error]:   section_type: PCIe error
[  186.972580] {1}[Hardware Error]:   port_type: 0, PCIe end point
[  186.972581] {1}[Hardware Error]:   version: 1.0
[  186.972581] {1}[Hardware Error]:   command: 0x0407, status: 0x0010
[  186.972582] {1}[Hardware Error]:   device_id: 0000:07:00.0
[  186.972582] {1}[Hardware Error]:   slot: 4
[  186.972583] {1}[Hardware Error]:   secondary_bus: 0x00
[  186.972583] {1}[Hardware Error]:   vendor_id: 0x1344, device_id: 0x5150
[  186.972584] {1}[Hardware Error]:   class_code: 008001
[  186.972585] Kernel panic - not syncing: Fatal hardware error!

Reported-by: Jozef Mikovic <jmikovic@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/mtip32xx/mtip32xx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
index 05e3e664ea1b..4e344246c8dd 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -3969,7 +3969,7 @@ static int mtip_block_initialize(struct driver_data *dd)
 	dd->tags.reserved_tags = 1;
 	dd->tags.cmd_size = sizeof(struct mtip_cmd);
 	dd->tags.numa_node = dd->numa_node;
-	dd->tags.flags = BLK_MQ_F_SHOULD_MERGE;
+	dd->tags.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_NO_SCHED;
 	dd->tags.driver_data = dd;
 	dd->tags.timeout = MTIP_NCQ_CMD_TIMEOUT_MS;
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/4] blk-mq: introduce BLK_MQ_F_SCHED_USE_HW_TAG
  2017-04-15 12:38 [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched Ming Lei
  2017-04-15 12:38 ` [PATCH 1/4] block: respect BLK_MQ_F_NO_SCHED Ming Lei
  2017-04-15 12:38 ` [PATCH 2/4] mtip32xx: pass BLK_MQ_F_NO_SCHED Ming Lei
@ 2017-04-15 12:38 ` Ming Lei
  2017-04-15 12:38 ` [PATCH 4/4] mtip32xx: use BLK_MQ_F_USE_SCHED_TAG Ming Lei
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: Ming Lei @ 2017-04-15 12:38 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Christoph Hellwig, Omar Sandoval, Jozef Mikovic, Ming Lei

Some drivers, for example of mtip32xx, use the 'request_index'
passed to .init_request() as hardware tag index for initializing
hardware queue, and these drivers actually require that rq->tag
is always same with 'request_index' passed to .init_request().

After blk-mq I/O scheduler is in, the driver tag is allocated
during dispatching, and the allocated driver tag can't be same
with I/O scheduler's tag, so blk-mq I/O scheduler breaks these
devices, like mtip32xx.

This patch introduces BLK_MQ_F_SCHED_USE_HW_TAG flag, and just
allocate hardware tag for scheduler directly, then we can address
mtip32xx's issue.

On the other hand, this feature should make blk-mq io scheduler
more efficient than current way if the hardware tag space is big
enough, because we can save one tag allocation/release.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.c   | 10 +++++++++-
 block/blk-mq.c         | 35 +++++++++++++++++++++++++++++------
 include/linux/blk-mq.h |  1 +
 3 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 9e3c0f92851b..1ff4b61135bc 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -83,7 +83,12 @@ struct request *blk_mq_sched_get_request(struct request_queue *q,
 		data->hctx = blk_mq_map_queue(q, data->ctx->cpu);
 
 	if (e) {
-		data->flags |= BLK_MQ_REQ_INTERNAL;
+		/*
+		 * If BLK_MQ_F_SCHED_USE_HW_TAG is set, we use hardware
+		 * tag as scheduler tag.
+		 */
+		if (!(data->hctx->flags & BLK_MQ_F_SCHED_USE_HW_TAG))
+			data->flags |= BLK_MQ_REQ_INTERNAL;
 
 		/*
 		 * Flush requests are special and go directly to the
@@ -445,6 +450,9 @@ static int blk_mq_sched_alloc_tags(struct request_queue *q,
 	struct blk_mq_tag_set *set = q->tag_set;
 	int ret;
 
+	if (hctx->flags & BLK_MQ_F_SCHED_USE_HW_TAG)
+		return 0;
+
 	hctx->sched_tags = blk_mq_alloc_rq_map(set, hctx_idx, q->nr_requests,
 					       set->reserved_tags);
 	if (!hctx->sched_tags)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index e536dacfae4c..ac6245bdbc8c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -247,9 +247,19 @@ struct request *__blk_mq_alloc_request(struct blk_mq_alloc_data *data,
 				rq->rq_flags = RQF_MQ_INFLIGHT;
 				atomic_inc(&data->hctx->nr_active);
 			}
-			rq->tag = tag;
-			rq->internal_tag = -1;
-			data->hctx->tags->rqs[rq->tag] = rq;
+			data->hctx->tags->rqs[tag] = rq;
+
+			/*
+			 * If we use hw tag for scheduling, postpone setting
+			 * rq->tag in blk_mq_get_driver_tag().
+			 */
+			if (data->hctx->flags & BLK_MQ_F_SCHED_USE_HW_TAG) {
+				rq->tag = -1;
+				rq->internal_tag = tag;
+			} else {
+				rq->tag = tag;
+				rq->internal_tag = -1;
+			}
 		}
 
 		blk_mq_rq_ctx_init(data->q, data->ctx, rq, op);
@@ -349,7 +359,7 @@ void __blk_mq_finish_request(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
 	clear_bit(REQ_ATOM_POLL_SLEPT, &rq->atomic_flags);
 	if (rq->tag != -1)
 		blk_mq_put_tag(hctx, hctx->tags, ctx, rq->tag);
-	if (sched_tag != -1)
+	if (sched_tag != -1 && !(hctx->flags & BLK_MQ_F_SCHED_USE_HW_TAG))
 		blk_mq_sched_completed_request(hctx, rq);
 	blk_mq_sched_restart(hctx);
 	blk_queue_exit(q);
@@ -866,6 +876,12 @@ bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
 	if (rq->tag != -1)
 		goto done;
 
+	/* we buffered driver tag in rq->internal_tag */
+	if (data.hctx->flags & BLK_MQ_F_SCHED_USE_HW_TAG) {
+		rq->tag = rq->internal_tag;
+		goto done;
+	}
+
 	if (blk_mq_tag_is_reserved(data.hctx->sched_tags, rq->internal_tag))
 		data.flags |= BLK_MQ_REQ_RESERVED;
 
@@ -887,9 +903,15 @@ bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
 static void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
 				    struct request *rq)
 {
-	blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, rq->tag);
+	unsigned tag = rq->tag;
+
 	rq->tag = -1;
 
+	if (hctx->flags & BLK_MQ_F_SCHED_USE_HW_TAG)
+		return;
+
+	blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, tag);
+
 	if (rq->rq_flags & RQF_MQ_INFLIGHT) {
 		rq->rq_flags &= ~RQF_MQ_INFLIGHT;
 		atomic_dec(&hctx->nr_active);
@@ -2852,7 +2874,8 @@ bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
 		blk_flush_plug_list(plug, false);
 
 	hctx = q->queue_hw_ctx[blk_qc_t_to_queue_num(cookie)];
-	if (!blk_qc_t_is_internal(cookie))
+	if (!blk_qc_t_is_internal(cookie) || (hctx->flags &
+			BLK_MQ_F_SCHED_USE_HW_TAG))
 		rq = blk_mq_tag_to_rq(hctx->tags, blk_qc_t_to_tag(cookie));
 	else
 		rq = blk_mq_tag_to_rq(hctx->sched_tags, blk_qc_t_to_tag(cookie));
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index b90c3d5766cd..be605a05f340 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -154,6 +154,7 @@ enum {
 	BLK_MQ_F_SG_MERGE	= 1 << 2,
 	BLK_MQ_F_BLOCKING	= 1 << 5,
 	BLK_MQ_F_NO_SCHED	= 1 << 6,
+	BLK_MQ_F_SCHED_USE_HW_TAG	= 1 << 7,
 	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
 	BLK_MQ_F_ALLOC_POLICY_BITS = 1,
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 4/4] mtip32xx: use BLK_MQ_F_USE_SCHED_TAG
  2017-04-15 12:38 [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched Ming Lei
                   ` (2 preceding siblings ...)
  2017-04-15 12:38 ` [PATCH 3/4] blk-mq: introduce BLK_MQ_F_SCHED_USE_HW_TAG Ming Lei
@ 2017-04-15 12:38 ` Ming Lei
  2017-04-16 16:03 ` [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched Ming Lei
  2017-04-19 20:17 ` Jens Axboe
  5 siblings, 0 replies; 25+ messages in thread
From: Ming Lei @ 2017-04-15 12:38 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Christoph Hellwig, Omar Sandoval, Jozef Mikovic, Ming Lei

This patch applys the new introduced flag of BLK_MQ_F_USE_SCHED_TAG
to make mq-deadline working on mtip32xx. With this flag, we can
allocate hardware tag for scheduler, then mtip32xx can work well.

Also mtip32xx has 256 queue depth, which is same with the default
value of q->nr_requests, so in theory performance loss won't happen.

Finally BLK_MQ_F_NO_SCHED isn't necessary any more.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/mtip32xx/mtip32xx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
index 4e344246c8dd..203b18a9eff0 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -3969,7 +3969,7 @@ static int mtip_block_initialize(struct driver_data *dd)
 	dd->tags.reserved_tags = 1;
 	dd->tags.cmd_size = sizeof(struct mtip_cmd);
 	dd->tags.numa_node = dd->numa_node;
-	dd->tags.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_NO_SCHED;
+	dd->tags.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SCHED_USE_HW_TAG;
 	dd->tags.driver_data = dd;
 	dd->tags.timeout = MTIP_NCQ_CMD_TIMEOUT_MS;
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched
  2017-04-15 12:38 [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched Ming Lei
                   ` (3 preceding siblings ...)
  2017-04-15 12:38 ` [PATCH 4/4] mtip32xx: use BLK_MQ_F_USE_SCHED_TAG Ming Lei
@ 2017-04-16 16:03 ` Ming Lei
  2017-04-17 17:30   ` Omar Sandoval
  2017-04-19 20:17 ` Jens Axboe
  5 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2017-04-16 16:03 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Christoph Hellwig, Omar Sandoval, Jozef Mikovic

On Sat, Apr 15, 2017 at 08:38:21PM +0800, Ming Lei wrote:
> The 1st patch enhances BLK_MQ_F_NO_SCHED so that we can't change/
> show available io schedulers on devices which don't support io
> scheduler.
> 
> The 2nd patch passes BLK_MQ_F_NO_SCHED for avoiding one regression
> on mtip32xx, which is introduced by blk-mq io scheduler.
> 
> The last two patches introduce BLK_MQ_F_SCHED_USE_HW_TAG so that
> we can allow to use hardware tag for scheduler, then mq-deadline
> can work well on mtip32xx. Even though other devices with enough
> hardware tag space can benefit from this feature too.
> 
> The 1st two patches aims on v4.11, and the last two are for
> v4.12.

Please ignore this patchset, and I will post another serial for
mtip32xx fix.

thanks,
Ming

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched
  2017-04-16 16:03 ` [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched Ming Lei
@ 2017-04-17 17:30   ` Omar Sandoval
  2017-04-19  1:10     ` Ming Lei
  0 siblings, 1 reply; 25+ messages in thread
From: Omar Sandoval @ 2017-04-17 17:30 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Omar Sandoval, Jozef Mikovic

On Mon, Apr 17, 2017 at 12:03:53AM +0800, Ming Lei wrote:
> On Sat, Apr 15, 2017 at 08:38:21PM +0800, Ming Lei wrote:
> > The 1st patch enhances BLK_MQ_F_NO_SCHED so that we can't change/
> > show available io schedulers on devices which don't support io
> > scheduler.
> > 
> > The 2nd patch passes BLK_MQ_F_NO_SCHED for avoiding one regression
> > on mtip32xx, which is introduced by blk-mq io scheduler.
> > 
> > The last two patches introduce BLK_MQ_F_SCHED_USE_HW_TAG so that
> > we can allow to use hardware tag for scheduler, then mq-deadline
> > can work well on mtip32xx. Even though other devices with enough
> > hardware tag space can benefit from this feature too.
> > 
> > The 1st two patches aims on v4.11, and the last two are for
> > v4.12.
> 
> Please ignore this patchset, and I will post another serial for
> mtip32xx fix.
> 
> thanks,
> Ming

Regardless of the mtip32xx fix, I actually wanted to skip the scheduler
tags when possible just for performance reasons. I don't think that
should be a device-specific setting. We can let the I/O scheduler decide
how many tags it wants, and if the device has at least that many tags,
just use the hardware tags directly. I'll probably look at doing that
for 4.13, we'll see if it actually makes a performance difference.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched
  2017-04-17 17:30   ` Omar Sandoval
@ 2017-04-19  1:10     ` Ming Lei
  0 siblings, 0 replies; 25+ messages in thread
From: Ming Lei @ 2017-04-19  1:10 UTC (permalink / raw)
  To: Omar Sandoval
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Omar Sandoval, Jozef Mikovic

On Mon, Apr 17, 2017 at 10:30:46AM -0700, Omar Sandoval wrote:
> On Mon, Apr 17, 2017 at 12:03:53AM +0800, Ming Lei wrote:
> > On Sat, Apr 15, 2017 at 08:38:21PM +0800, Ming Lei wrote:
> > > The 1st patch enhances BLK_MQ_F_NO_SCHED so that we can't change/
> > > show available io schedulers on devices which don't support io
> > > scheduler.
> > > 
> > > The 2nd patch passes BLK_MQ_F_NO_SCHED for avoiding one regression
> > > on mtip32xx, which is introduced by blk-mq io scheduler.
> > > 
> > > The last two patches introduce BLK_MQ_F_SCHED_USE_HW_TAG so that
> > > we can allow to use hardware tag for scheduler, then mq-deadline
> > > can work well on mtip32xx. Even though other devices with enough
> > > hardware tag space can benefit from this feature too.
> > > 
> > > The 1st two patches aims on v4.11, and the last two are for
> > > v4.12.
> > 
> > Please ignore this patchset, and I will post another serial for
> > mtip32xx fix.
> > 
> > thanks,
> > Ming
> 
> Regardless of the mtip32xx fix, I actually wanted to skip the scheduler
> tags when possible just for performance reasons. I don't think that
> should be a device-specific setting. We can let the I/O scheduler decide
> how many tags it wants, and if the device has at least that many tags,
> just use the hardware tags directly. I'll probably look at doing that
> for 4.13, we'll see if it actually makes a performance difference.

Yes, I thought about that too, the following policy may be applied at
default:

	- if queue depth is not less than q->nr_requests
	and
	- the tag space isn't shared

We still can keep the flag of BLK_MQ_F_SCHED_USE_HW_TAG so that
drivers can decide if they want to do that, since I guess it may
help some shared tag cases too.

Now the mtip32xx fix has been sent out already, I will post this
patchset of using hw tag for scheduler out soon for review.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched
  2017-04-15 12:38 [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched Ming Lei
                   ` (4 preceding siblings ...)
  2017-04-16 16:03 ` [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched Ming Lei
@ 2017-04-19 20:17 ` Jens Axboe
  2017-04-20  0:44   ` Ming Lei
  5 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2017-04-19 20:17 UTC (permalink / raw)
  To: Ming Lei, linux-block; +Cc: Christoph Hellwig, Omar Sandoval, Jozef Mikovic

On 04/15/2017 06:38 AM, Ming Lei wrote:
> The 1st patch enhances BLK_MQ_F_NO_SCHED so that we can't change/
> show available io schedulers on devices which don't support io
> scheduler.
> 
> The 2nd patch passes BLK_MQ_F_NO_SCHED for avoiding one regression
> on mtip32xx, which is introduced by blk-mq io scheduler.

I've added 1-2 for 4.11 for now, and then let's plan on getting
a real fix done for 4.12.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched
  2017-04-19 20:17 ` Jens Axboe
@ 2017-04-20  0:44   ` Ming Lei
  2017-04-20  0:55     ` Jens Axboe
  0 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2017-04-20  0:44 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Omar Sandoval, Jozef Mikovic

On Wed, Apr 19, 2017 at 02:17:45PM -0600, Jens Axboe wrote:
> On 04/15/2017 06:38 AM, Ming Lei wrote:
> > The 1st patch enhances BLK_MQ_F_NO_SCHED so that we can't change/
> > show available io schedulers on devices which don't support io
> > scheduler.
> > 
> > The 2nd patch passes BLK_MQ_F_NO_SCHED for avoiding one regression
> > on mtip32xx, which is introduced by blk-mq io scheduler.
> 
> I've added 1-2 for 4.11 for now, and then let's plan on getting
> a real fix done for 4.12.

The real fix has been posted out already:

	http://marc.info/?t=149256196700001&r=1&w=2

Could you consider that for 4.12?

Thanks,
Ming

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched
  2017-04-20  0:44   ` Ming Lei
@ 2017-04-20  0:55     ` Jens Axboe
  2017-04-20  1:03       ` Ming Lei
  0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2017-04-20  0:55 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-block, Christoph Hellwig, Omar Sandoval, Jozef Mikovic

On 04/19/2017 06:44 PM, Ming Lei wrote:
> On Wed, Apr 19, 2017 at 02:17:45PM -0600, Jens Axboe wrote:
>> On 04/15/2017 06:38 AM, Ming Lei wrote:
>>> The 1st patch enhances BLK_MQ_F_NO_SCHED so that we can't change/
>>> show available io schedulers on devices which don't support io
>>> scheduler.
>>>
>>> The 2nd patch passes BLK_MQ_F_NO_SCHED for avoiding one regression
>>> on mtip32xx, which is introduced by blk-mq io scheduler.
>>
>> I've added 1-2 for 4.11 for now, and then let's plan on getting
>> a real fix done for 4.12.
> 
> The real fix has been posted out already:
> 
> 	http://marc.info/?t=149256196700001&r=1&w=2
> 
> Could you consider that for 4.12?

I didn't particularly like that approach. And we are really late
in the 4.11 cycle, and your first two patches fit the bill for
how we should fix it in the interim.

I'll take a closer look for 4.12, I want to get this fixed
for real, obviously.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched
  2017-04-20  0:55     ` Jens Axboe
@ 2017-04-20  1:03       ` Ming Lei
  2017-04-20  4:54         ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2017-04-20  1:03 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Omar Sandoval, Jozef Mikovic

On Wed, Apr 19, 2017 at 06:55:30PM -0600, Jens Axboe wrote:
> On 04/19/2017 06:44 PM, Ming Lei wrote:
> > On Wed, Apr 19, 2017 at 02:17:45PM -0600, Jens Axboe wrote:
> >> On 04/15/2017 06:38 AM, Ming Lei wrote:
> >>> The 1st patch enhances BLK_MQ_F_NO_SCHED so that we can't change/
> >>> show available io schedulers on devices which don't support io
> >>> scheduler.
> >>>
> >>> The 2nd patch passes BLK_MQ_F_NO_SCHED for avoiding one regression
> >>> on mtip32xx, which is introduced by blk-mq io scheduler.
> >>
> >> I've added 1-2 for 4.11 for now, and then let's plan on getting
> >> a real fix done for 4.12.
> > 
> > The real fix has been posted out already:
> > 
> > 	http://marc.info/?t=149256196700001&r=1&w=2
> > 
> > Could you consider that for 4.12?
> 
> I didn't particularly like that approach. And we are really late
> in the 4.11 cycle, and your first two patches fit the bill for
> how we should fix it in the interim.

If we don't need to reserve tag for internal command, I am happy
to fix it in the interim. However, the mtip32xx driver has to
ask blk-mq to reserve the tag zero for internal command. Then
if we can't allow the driver to use that reserved tag, it looks
quite awkward, doesn't it?

> I'll take a closer look for 4.12, I want to get this fixed
> for real, obviously.

OK, thanks, let's discuss on solutions.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched
  2017-04-20  1:03       ` Ming Lei
@ 2017-04-20  4:54         ` Christoph Hellwig
  2017-04-20  8:30           ` Ming Lei
  2017-04-20 16:10           ` Jens Axboe
  0 siblings, 2 replies; 25+ messages in thread
From: Christoph Hellwig @ 2017-04-20  4:54 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Omar Sandoval, Jozef Mikovic

On Thu, Apr 20, 2017 at 09:03:47AM +0800, Ming Lei wrote:
> If we don't need to reserve tag for internal command, I am happy
> to fix it in the interim. However, the mtip32xx driver has to
> ask blk-mq to reserve the tag zero for internal command. Then
> if we can't allow the driver to use that reserved tag, it looks
> quite awkward, doesn't it?

It doesn't.  Just offset the hardware tags by 1 from the blk-mq tags.

But I'm pretty sure the hardware wouldn't require a specific tag
for the internal ops.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched
  2017-04-20  4:54         ` Christoph Hellwig
@ 2017-04-20  8:30           ` Ming Lei
  2017-04-26 10:48             ` Ming Lei
  2017-04-20 16:10           ` Jens Axboe
  1 sibling, 1 reply; 25+ messages in thread
From: Ming Lei @ 2017-04-20  8:30 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-block, Omar Sandoval, Jozef Mikovic

Hi Christoph,

On Wed, Apr 19, 2017 at 09:54:08PM -0700, Christoph Hellwig wrote:
> On Thu, Apr 20, 2017 at 09:03:47AM +0800, Ming Lei wrote:
> > If we don't need to reserve tag for internal command, I am happy
> > to fix it in the interim. However, the mtip32xx driver has to
> > ask blk-mq to reserve the tag zero for internal command. Then
> > if we can't allow the driver to use that reserved tag, it looks
> > quite awkward, doesn't it?
> 
> It doesn't.  Just offset the hardware tags by 1 from the blk-mq tags.
> 
> But I'm pretty sure the hardware wouldn't require a specific tag
> for the internal ops.

>From mtip32xx code, the tag 0 is used to send internal command only, and
not used for submitting normal request.

>From code of mtip_issue_non_ncq_command() and mtip_issue_ncq_command(),
even same hardware address is writen to when issuing non-ncq and ncq
commands. So I think the internal ops and normal requests share one same
tag space on mtip32xx.

Could you explain it a bit why you think the hw doesn't need the tag
for internal ops in this case?

Thanks,
Ming

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched
  2017-04-20  4:54         ` Christoph Hellwig
  2017-04-20  8:30           ` Ming Lei
@ 2017-04-20 16:10           ` Jens Axboe
  2017-04-21 10:54             ` Christoph Hellwig
  1 sibling, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2017-04-20 16:10 UTC (permalink / raw)
  To: Christoph Hellwig, Ming Lei; +Cc: linux-block, Omar Sandoval, Jozef Mikovic

On 04/19/2017 10:54 PM, Christoph Hellwig wrote:
> On Thu, Apr 20, 2017 at 09:03:47AM +0800, Ming Lei wrote:
>> If we don't need to reserve tag for internal command, I am happy
>> to fix it in the interim. However, the mtip32xx driver has to
>> ask blk-mq to reserve the tag zero for internal command. Then
>> if we can't allow the driver to use that reserved tag, it looks
>> quite awkward, doesn't it?
> 
> It doesn't.  Just offset the hardware tags by 1 from the blk-mq tags.
> 
> But I'm pretty sure the hardware wouldn't require a specific tag
> for the internal ops.

Right, the hardware doesn't require it at all, it's just an oddity
of the driver that it hardwares tag 0 as an internal tag. That could
definitely get cleaned up.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched
  2017-04-20 16:10           ` Jens Axboe
@ 2017-04-21 10:54             ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2017-04-21 10:54 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, Ming Lei, linux-block, Omar Sandoval, Jozef Mikovic

On Thu, Apr 20, 2017 at 10:10:59AM -0600, Jens Axboe wrote:
> Right, the hardware doesn't require it at all, it's just an oddity
> of the driver that it hardwares tag 0 as an internal tag. That could
> definitely get cleaned up.

That's exactly what I suspect, especially as the hardware is AHCI
on stereoids.  But I'd need a data sheet to confirm my suspicion.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched
  2017-04-20  8:30           ` Ming Lei
@ 2017-04-26 10:48             ` Ming Lei
  2017-04-26 18:15               ` Jens Axboe
  2017-04-27  6:32               ` Christoph Hellwig
  0 siblings, 2 replies; 25+ messages in thread
From: Ming Lei @ 2017-04-26 10:48 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-block, Omar Sandoval, Jozef Mikovic

Hi Christoph,

On Thu, Apr 20, 2017 at 04:30:42PM +0800, Ming Lei wrote:
> Hi Christoph,
> 
> On Wed, Apr 19, 2017 at 09:54:08PM -0700, Christoph Hellwig wrote:
> > On Thu, Apr 20, 2017 at 09:03:47AM +0800, Ming Lei wrote:
> > > If we don't need to reserve tag for internal command, I am happy
> > > to fix it in the interim. However, the mtip32xx driver has to
> > > ask blk-mq to reserve the tag zero for internal command. Then
> > > if we can't allow the driver to use that reserved tag, it looks
> > > quite awkward, doesn't it?
> > 
> > It doesn't.  Just offset the hardware tags by 1 from the blk-mq tags.
> > 
> > But I'm pretty sure the hardware wouldn't require a specific tag
> > for the internal ops.
> 
> From mtip32xx code, the tag 0 is used to send internal command only, and
> not used for submitting normal request.
> 
> From code of mtip_issue_non_ncq_command() and mtip_issue_ncq_command(),
> even same hardware address is writen to when issuing non-ncq and ncq
> commands. So I think the internal ops and normal requests share one same
> tag space on mtip32xx.
> 
> Could you explain it a bit why you think the hw doesn't need the tag
> for internal ops in this case?

Gentle ping...

Looks there are some choices for this issue:

1) if internal ops uses independent tag space
- we need to clean up the mtip32xx driver

2) if internal ops shares tag space with normal request
- export blk_mq_get_driver_tag() for mtip32xx

or

- use private way to send internal ops, and the drawback
is that we still need to reserve tag and the reserved tag
can't be used at all.

>From mtip32xx source code, I can see both share same tag space.
When I try to search hw manual via google, not succeed yet.

Christoph, could you share us your findings about if internal ops
uses independent tag space or not?

Thanks,
Ming

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched
  2017-04-26 10:48             ` Ming Lei
@ 2017-04-26 18:15               ` Jens Axboe
  2017-04-26 18:22                 ` Jens Axboe
  2017-04-27  6:32               ` Christoph Hellwig
  1 sibling, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2017-04-26 18:15 UTC (permalink / raw)
  To: Ming Lei, Christoph Hellwig; +Cc: linux-block, Omar Sandoval, Jozef Mikovic

On 04/26/2017 03:48 AM, Ming Lei wrote:
> Hi Christoph,
> 
> On Thu, Apr 20, 2017 at 04:30:42PM +0800, Ming Lei wrote:
>> Hi Christoph,
>>
>> On Wed, Apr 19, 2017 at 09:54:08PM -0700, Christoph Hellwig wrote:
>>> On Thu, Apr 20, 2017 at 09:03:47AM +0800, Ming Lei wrote:
>>>> If we don't need to reserve tag for internal command, I am happy
>>>> to fix it in the interim. However, the mtip32xx driver has to
>>>> ask blk-mq to reserve the tag zero for internal command. Then
>>>> if we can't allow the driver to use that reserved tag, it looks
>>>> quite awkward, doesn't it?
>>>
>>> It doesn't.  Just offset the hardware tags by 1 from the blk-mq tags.
>>>
>>> But I'm pretty sure the hardware wouldn't require a specific tag
>>> for the internal ops.
>>
>> From mtip32xx code, the tag 0 is used to send internal command only, and
>> not used for submitting normal request.
>>
>> From code of mtip_issue_non_ncq_command() and mtip_issue_ncq_command(),
>> even same hardware address is writen to when issuing non-ncq and ncq
>> commands. So I think the internal ops and normal requests share one same
>> tag space on mtip32xx.
>>
>> Could you explain it a bit why you think the hw doesn't need the tag
>> for internal ops in this case?
> 
> Gentle ping...
> 
> Looks there are some choices for this issue:
> 
> 1) if internal ops uses independent tag space
> - we need to clean up the mtip32xx driver
> 
> 2) if internal ops shares tag space with normal request
> - export blk_mq_get_driver_tag() for mtip32xx
> 
> or
> 
> - use private way to send internal ops, and the drawback
> is that we still need to reserve tag and the reserved tag
> can't be used at all.
> 
> From mtip32xx source code, I can see both share same tag space.
> When I try to search hw manual via google, not succeed yet.
> 
> Christoph, could you share us your findings about if internal ops
> uses independent tag space or not?

Why aren't we just bypassing for RESERVED? If someone is asking
for an reserved tag, ensure that we use the right tag pool (sched
vs normal) and sub pool (reserved vs normal). Then insert just
bypasses the scheduler, like it already does if we have a driver
tag assigned already.

I've got a few of these cards, but traveling. I'll wire up something
and test it end this week.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched
  2017-04-26 18:15               ` Jens Axboe
@ 2017-04-26 18:22                 ` Jens Axboe
  2017-04-27  3:14                   ` Ming Lei
  0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2017-04-26 18:22 UTC (permalink / raw)
  To: Ming Lei, Christoph Hellwig; +Cc: linux-block, Omar Sandoval, Jozef Mikovic

On 04/26/2017 11:15 AM, Jens Axboe wrote:
> On 04/26/2017 03:48 AM, Ming Lei wrote:
>> Hi Christoph,
>>
>> On Thu, Apr 20, 2017 at 04:30:42PM +0800, Ming Lei wrote:
>>> Hi Christoph,
>>>
>>> On Wed, Apr 19, 2017 at 09:54:08PM -0700, Christoph Hellwig wrote:
>>>> On Thu, Apr 20, 2017 at 09:03:47AM +0800, Ming Lei wrote:
>>>>> If we don't need to reserve tag for internal command, I am happy
>>>>> to fix it in the interim. However, the mtip32xx driver has to
>>>>> ask blk-mq to reserve the tag zero for internal command. Then
>>>>> if we can't allow the driver to use that reserved tag, it looks
>>>>> quite awkward, doesn't it?
>>>>
>>>> It doesn't.  Just offset the hardware tags by 1 from the blk-mq tags.
>>>>
>>>> But I'm pretty sure the hardware wouldn't require a specific tag
>>>> for the internal ops.
>>>
>>> From mtip32xx code, the tag 0 is used to send internal command only, and
>>> not used for submitting normal request.
>>>
>>> From code of mtip_issue_non_ncq_command() and mtip_issue_ncq_command(),
>>> even same hardware address is writen to when issuing non-ncq and ncq
>>> commands. So I think the internal ops and normal requests share one same
>>> tag space on mtip32xx.
>>>
>>> Could you explain it a bit why you think the hw doesn't need the tag
>>> for internal ops in this case?
>>
>> Gentle ping...
>>
>> Looks there are some choices for this issue:
>>
>> 1) if internal ops uses independent tag space
>> - we need to clean up the mtip32xx driver
>>
>> 2) if internal ops shares tag space with normal request
>> - export blk_mq_get_driver_tag() for mtip32xx
>>
>> or
>>
>> - use private way to send internal ops, and the drawback
>> is that we still need to reserve tag and the reserved tag
>> can't be used at all.
>>
>> From mtip32xx source code, I can see both share same tag space.
>> When I try to search hw manual via google, not succeed yet.
>>
>> Christoph, could you share us your findings about if internal ops
>> uses independent tag space or not?
> 
> Why aren't we just bypassing for RESERVED? If someone is asking
> for an reserved tag, ensure that we use the right tag pool (sched
> vs normal) and sub pool (reserved vs normal). Then insert just
> bypasses the scheduler, like it already does if we have a driver
> tag assigned already.
> 
> I've got a few of these cards, but traveling. I'll wire up something
> and test it end this week.

Something like this. Totally untested.

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index c974a1bbf4cb..11109b5b64b6 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -119,7 +119,11 @@ struct request *blk_mq_sched_get_request(struct request_queue *q,
 	if (likely(!data->hctx))
 		data->hctx = blk_mq_map_queue(q, data->ctx->cpu);
 
-	if (e) {
+	/*
+	 * For a reserved tag, allocate a normal request since we might
+	 * have driver dependencies on the value of the internal tag.
+	 */
+	if (e && !(data->flags & BLK_MQ_REQ_RESERVED)) {
 		data->flags |= BLK_MQ_REQ_INTERNAL;
 
 		/*

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched
  2017-04-26 18:22                 ` Jens Axboe
@ 2017-04-27  3:14                   ` Ming Lei
  2017-04-27 13:49                     ` Jens Axboe
  0 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2017-04-27  3:14 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, linux-block, Omar Sandoval, Jozef Mikovic

On Wed, Apr 26, 2017 at 11:22:43AM -0700, Jens Axboe wrote:
> On 04/26/2017 11:15 AM, Jens Axboe wrote:
> > On 04/26/2017 03:48 AM, Ming Lei wrote:
> >> Hi Christoph,
> >>
> >> On Thu, Apr 20, 2017 at 04:30:42PM +0800, Ming Lei wrote:
> >>> Hi Christoph,
> >>>
> >>> On Wed, Apr 19, 2017 at 09:54:08PM -0700, Christoph Hellwig wrote:
> >>>> On Thu, Apr 20, 2017 at 09:03:47AM +0800, Ming Lei wrote:
> >>>>> If we don't need to reserve tag for internal command, I am happy
> >>>>> to fix it in the interim. However, the mtip32xx driver has to
> >>>>> ask blk-mq to reserve the tag zero for internal command. Then
> >>>>> if we can't allow the driver to use that reserved tag, it looks
> >>>>> quite awkward, doesn't it?
> >>>>
> >>>> It doesn't.  Just offset the hardware tags by 1 from the blk-mq tags.
> >>>>
> >>>> But I'm pretty sure the hardware wouldn't require a specific tag
> >>>> for the internal ops.
> >>>
> >>> From mtip32xx code, the tag 0 is used to send internal command only, and
> >>> not used for submitting normal request.
> >>>
> >>> From code of mtip_issue_non_ncq_command() and mtip_issue_ncq_command(),
> >>> even same hardware address is writen to when issuing non-ncq and ncq
> >>> commands. So I think the internal ops and normal requests share one same
> >>> tag space on mtip32xx.
> >>>
> >>> Could you explain it a bit why you think the hw doesn't need the tag
> >>> for internal ops in this case?
> >>
> >> Gentle ping...
> >>
> >> Looks there are some choices for this issue:
> >>
> >> 1) if internal ops uses independent tag space
> >> - we need to clean up the mtip32xx driver
> >>
> >> 2) if internal ops shares tag space with normal request
> >> - export blk_mq_get_driver_tag() for mtip32xx
> >>
> >> or
> >>
> >> - use private way to send internal ops, and the drawback
> >> is that we still need to reserve tag and the reserved tag
> >> can't be used at all.
> >>
> >> From mtip32xx source code, I can see both share same tag space.
> >> When I try to search hw manual via google, not succeed yet.
> >>
> >> Christoph, could you share us your findings about if internal ops
> >> uses independent tag space or not?
> > 
> > Why aren't we just bypassing for RESERVED? If someone is asking
> > for an reserved tag, ensure that we use the right tag pool (sched
> > vs normal) and sub pool (reserved vs normal). Then insert just
> > bypasses the scheduler, like it already does if we have a driver
> > tag assigned already.
> > 
> > I've got a few of these cards, but traveling. I'll wire up something
> > and test it end this week.
> 
> Something like this. Totally untested.
> 
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index c974a1bbf4cb..11109b5b64b6 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -119,7 +119,11 @@ struct request *blk_mq_sched_get_request(struct request_queue *q,
>  	if (likely(!data->hctx))
>  		data->hctx = blk_mq_map_queue(q, data->ctx->cpu);
>  
> -	if (e) {
> +	/*
> +	 * For a reserved tag, allocate a normal request since we might
> +	 * have driver dependencies on the value of the internal tag.
> +	 */
> +	if (e && !(data->flags & BLK_MQ_REQ_RESERVED)) {
>  		data->flags |= BLK_MQ_REQ_INTERNAL;
>  
>  		/*

This one looks reasonable, since reserved rqs often needn't scheduling,
also works well on mtip32xx with another patch[1] together.

Tested-by: Ming Lei <ming.lei@redhat.com>

[1] http://marc.info/?l=linux-block&m=149256194703481&w=2

Thanks,
Ming

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched
  2017-04-26 10:48             ` Ming Lei
  2017-04-26 18:15               ` Jens Axboe
@ 2017-04-27  6:32               ` Christoph Hellwig
  1 sibling, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2017-04-27  6:32 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Omar Sandoval, Jozef Mikovic

On Wed, Apr 26, 2017 at 06:48:43PM +0800, Ming Lei wrote:
> Looks there are some choices for this issue:
> 
> 1) if internal ops uses independent tag space
> - we need to clean up the mtip32xx driver
> 
> 2) if internal ops shares tag space with normal request
> - export blk_mq_get_driver_tag() for mtip32xx
> 
> or
> 
> - use private way to send internal ops, and the drawback
> is that we still need to reserve tag and the reserved tag
> can't be used at all.

The fundamental problem in mtip is that it uses the block layer to
allocate requests, but not to issue them.  Either it needs to switch to
use blk_execute_rq(_nowait) to issue the command, or it needs to stop
using blk_mq_alloc_request to allocatate the internal commands.
Everything else is a layering violation just asking for more problems
down the road.

> 
> >From mtip32xx source code, I can see both share same tag space.
> When I try to search hw manual via google, not succeed yet.
> 
> Christoph, could you share us your findings about if internal ops
> uses independent tag space or not?

As said earlier in the thread - mtip32xx is a beefed up AHCI design,
and the internal commands issued are bog standard ATA commands.  So
I would be very surprised if the hardware requires a specific tag.
The way the driver is written it might very well expect a tag to be
set aside, though.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched
  2017-04-27  3:14                   ` Ming Lei
@ 2017-04-27 13:49                     ` Jens Axboe
  2017-04-27 15:20                       ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2017-04-27 13:49 UTC (permalink / raw)
  To: Ming Lei; +Cc: Christoph Hellwig, linux-block, Omar Sandoval, Jozef Mikovic

On 04/26/2017 08:14 PM, Ming Lei wrote:
> On Wed, Apr 26, 2017 at 11:22:43AM -0700, Jens Axboe wrote:
>> On 04/26/2017 11:15 AM, Jens Axboe wrote:
>>> On 04/26/2017 03:48 AM, Ming Lei wrote:
>>>> Hi Christoph,
>>>>
>>>> On Thu, Apr 20, 2017 at 04:30:42PM +0800, Ming Lei wrote:
>>>>> Hi Christoph,
>>>>>
>>>>> On Wed, Apr 19, 2017 at 09:54:08PM -0700, Christoph Hellwig wrote:
>>>>>> On Thu, Apr 20, 2017 at 09:03:47AM +0800, Ming Lei wrote:
>>>>>>> If we don't need to reserve tag for internal command, I am happy
>>>>>>> to fix it in the interim. However, the mtip32xx driver has to
>>>>>>> ask blk-mq to reserve the tag zero for internal command. Then
>>>>>>> if we can't allow the driver to use that reserved tag, it looks
>>>>>>> quite awkward, doesn't it?
>>>>>>
>>>>>> It doesn't.  Just offset the hardware tags by 1 from the blk-mq tags.
>>>>>>
>>>>>> But I'm pretty sure the hardware wouldn't require a specific tag
>>>>>> for the internal ops.
>>>>>
>>>>> From mtip32xx code, the tag 0 is used to send internal command only, and
>>>>> not used for submitting normal request.
>>>>>
>>>>> From code of mtip_issue_non_ncq_command() and mtip_issue_ncq_command(),
>>>>> even same hardware address is writen to when issuing non-ncq and ncq
>>>>> commands. So I think the internal ops and normal requests share one same
>>>>> tag space on mtip32xx.
>>>>>
>>>>> Could you explain it a bit why you think the hw doesn't need the tag
>>>>> for internal ops in this case?
>>>>
>>>> Gentle ping...
>>>>
>>>> Looks there are some choices for this issue:
>>>>
>>>> 1) if internal ops uses independent tag space
>>>> - we need to clean up the mtip32xx driver
>>>>
>>>> 2) if internal ops shares tag space with normal request
>>>> - export blk_mq_get_driver_tag() for mtip32xx
>>>>
>>>> or
>>>>
>>>> - use private way to send internal ops, and the drawback
>>>> is that we still need to reserve tag and the reserved tag
>>>> can't be used at all.
>>>>
>>>> From mtip32xx source code, I can see both share same tag space.
>>>> When I try to search hw manual via google, not succeed yet.
>>>>
>>>> Christoph, could you share us your findings about if internal ops
>>>> uses independent tag space or not?
>>>
>>> Why aren't we just bypassing for RESERVED? If someone is asking
>>> for an reserved tag, ensure that we use the right tag pool (sched
>>> vs normal) and sub pool (reserved vs normal). Then insert just
>>> bypasses the scheduler, like it already does if we have a driver
>>> tag assigned already.
>>>
>>> I've got a few of these cards, but traveling. I'll wire up something
>>> and test it end this week.
>>
>> Something like this. Totally untested.
>>
>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
>> index c974a1bbf4cb..11109b5b64b6 100644
>> --- a/block/blk-mq-sched.c
>> +++ b/block/blk-mq-sched.c
>> @@ -119,7 +119,11 @@ struct request *blk_mq_sched_get_request(struct request_queue *q,
>>  	if (likely(!data->hctx))
>>  		data->hctx = blk_mq_map_queue(q, data->ctx->cpu);
>>  
>> -	if (e) {
>> +	/*
>> +	 * For a reserved tag, allocate a normal request since we might
>> +	 * have driver dependencies on the value of the internal tag.
>> +	 */
>> +	if (e && !(data->flags & BLK_MQ_REQ_RESERVED)) {
>>  		data->flags |= BLK_MQ_REQ_INTERNAL;
>>  
>>  		/*
> 
> This one looks reasonable, since reserved rqs often needn't scheduling,
> also works well on mtip32xx with another patch[1] together.
> 
> Tested-by: Ming Lei <ming.lei@redhat.com>

Thanks Ming, I've added that patch and your patch to initialize the cmd
at runtime. Queued up for 4.12. Once 4.12 opens and it gets merged with
master, I'll queue up a revert of:

commit 4981d04dd8f1ab19e2cce008da556d7f099b6e68
Author: Ming Lei <ming.lei@redhat.com>
Date:   Sat Apr 15 20:38:23 2017 +0800

    mtip32xx: pass BLK_MQ_F_NO_SCHED

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched
  2017-04-27 13:49                     ` Jens Axboe
@ 2017-04-27 15:20                       ` Christoph Hellwig
  2017-04-27 15:46                         ` Jens Axboe
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2017-04-27 15:20 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ming Lei, Christoph Hellwig, linux-block, Omar Sandoval, Jozef Mikovic

On Thu, Apr 27, 2017 at 06:49:43AM -0700, Jens Axboe wrote:
> Thanks Ming, I've added that patch and your patch to initialize the cmd
> at runtime.

I really think this is the wrong fix, and doing these sort of band-aids
just will lead to further problems done the road.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched
  2017-04-27 15:20                       ` Christoph Hellwig
@ 2017-04-27 15:46                         ` Jens Axboe
  2017-04-27 21:40                           ` Jens Axboe
  0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2017-04-27 15:46 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Ming Lei, linux-block, Omar Sandoval, Jozef Mikovic

On 04/27/2017 09:20 AM, Christoph Hellwig wrote:
> On Thu, Apr 27, 2017 at 06:49:43AM -0700, Jens Axboe wrote:
>> Thanks Ming, I've added that patch and your patch to initialize the cmd
>> at runtime.
> 
> I really think this is the wrong fix, and doing these sort of band-aids
> just will lead to further problems done the road.

Ming's patch is just fine, mine is a bit more of a hack. I'll throw some
cycles at fixing up mtip32xx to not have a hard wired internal tag since
it doesn't need to, and then we can drop my one-liner again.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched
  2017-04-27 15:46                         ` Jens Axboe
@ 2017-04-27 21:40                           ` Jens Axboe
  0 siblings, 0 replies; 25+ messages in thread
From: Jens Axboe @ 2017-04-27 21:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Ming Lei, linux-block, Omar Sandoval, Jozef Mikovic

On 04/27/2017 09:46 AM, Jens Axboe wrote:
> On 04/27/2017 09:20 AM, Christoph Hellwig wrote:
>> On Thu, Apr 27, 2017 at 06:49:43AM -0700, Jens Axboe wrote:
>>> Thanks Ming, I've added that patch and your patch to initialize the cmd
>>> at runtime.
>>
>> I really think this is the wrong fix, and doing these sort of band-aids
>> just will lead to further problems done the road.
> 
> Ming's patch is just fine, mine is a bit more of a hack. I'll throw some
> cycles at fixing up mtip32xx to not have a hard wired internal tag since
> it doesn't need to, and then we can drop my one-liner again.

Something like this, to keep it simple. This allows us to drop the
requirement that RESERVED requests bypass the normal insertion logic,
and converts mtip32xx to go through queue_rq for internal commands.
Quiesce of NCQ tags is done with busy backoff from queue_rq.

We don't really care about the value of the tag at this point, but
since we KNOW it's a reserved request, I did not rewrite the fact
that we hard code 0 as the internal tag. I think it's safer this
way.

Compiles...


diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 8b361e1..27c6746 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -82,11 +82,7 @@ struct request *blk_mq_sched_get_request(struct request_queue *q,
 	if (likely(!data->hctx))
 		data->hctx = blk_mq_map_queue(q, data->ctx->cpu);
 
-	/*
-	 * For a reserved tag, allocate a normal request since we might
-	 * have driver dependencies on the value of the internal tag.
-	 */
-	if (e && !(data->flags & BLK_MQ_REQ_RESERVED)) {
+	if (e) {
 		data->flags |= BLK_MQ_REQ_INTERNAL;
 
 		/*
@@ -104,6 +100,8 @@ struct request *blk_mq_sched_get_request(struct request_queue *q,
 	}
 
 	if (rq) {
+		if (data->flags & BLK_MQ_REQ_RESERVED)
+			rq->rq_flags |= RQF_RESERVED;
 		if (!op_is_flush(op)) {
 			rq->elv.icq = NULL;
 			if (e && e->type->icq_cache)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index b75ef23..0168b27 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -268,6 +268,9 @@ struct request *__blk_mq_alloc_request(struct blk_mq_alloc_data *data,
 			data->hctx->tags->rqs[rq->tag] = rq;
 		}
 
+		if (data->flags & BLK_MQ_REQ_RESERVED)
+			rq->rq_flags |= RQF_RESERVED;
+
 		blk_mq_rq_ctx_init(data->q, data->ctx, rq, op);
 		return rq;
 	}
diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
index 02804cc..ed287ed 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -195,13 +195,10 @@ static struct mtip_cmd *mtip_get_int_command(struct driver_data *dd)
 	if (mtip_check_surprise_removal(dd->pdev))
 		return NULL;
 
-	rq = blk_mq_alloc_request(dd->queue, 0, BLK_MQ_REQ_RESERVED);
+	rq = blk_mq_alloc_request(dd->queue, REQ_OP_DRV_IN, BLK_MQ_REQ_RESERVED);
 	if (IS_ERR(rq))
 		return NULL;
 
-	/* Internal cmd isn't submitted via .queue_rq */
-	mtip_init_cmd_header(rq);
-
 	return blk_mq_rq_to_pdu(rq);
 }
 
@@ -609,11 +606,6 @@ static void mtip_completion(struct mtip_port *port,
 	complete(waiting);
 }
 
-static void mtip_null_completion(struct mtip_port *port,
-			    int tag, struct mtip_cmd *command, int status)
-{
-}
-
 static int mtip_read_log_page(struct mtip_port *port, u8 page, u16 *buffer,
 				dma_addr_t buffer_dma, unsigned int sectors);
 static int mtip_get_smart_attr(struct mtip_port *port, unsigned int id,
@@ -1035,53 +1027,54 @@ static bool mtip_pause_ncq(struct mtip_port *port,
 	return false;
 }
 
+static bool mtip_commands_active(struct mtip_port *port)
+{
+	unsigned int n;
+	unsigned int active = 1;
+
+	/*
+	 * Ignore s_active bit 0 of array element 0.
+	 * This bit will always be set
+	 */
+	active = readl(port->s_active[0]) & 0xFFFFFFFE;
+	for (n = 1; n < port->dd->slot_groups; n++)
+		active |= readl(port->s_active[n]);
+
+	return active != 0;
+}
+
 /*
  * Wait for port to quiesce
  *
  * @port    Pointer to port data structure
  * @timeout Max duration to wait (ms)
- * @atomic  gfp_t flag to indicate blockable context or not
+ * @can_block flag to indicate blockable context or not
  *
  * return value
  *	0	Success
  *	-EBUSY  Commands still active
  */
-static int mtip_quiesce_io(struct mtip_port *port, unsigned long timeout,
-								gfp_t atomic)
+static int mtip_quiesce_io(struct mtip_port *port, unsigned long timeout)
 {
+	bool active = true;
 	unsigned long to;
-	unsigned int n;
-	unsigned int active = 1;
 
 	blk_mq_stop_hw_queues(port->dd->queue);
 
 	to = jiffies + msecs_to_jiffies(timeout);
 	do {
 		if (test_bit(MTIP_PF_SVC_THD_ACTIVE_BIT, &port->flags) &&
-			test_bit(MTIP_PF_ISSUE_CMDS_BIT, &port->flags) &&
-			atomic == GFP_KERNEL) {
+			test_bit(MTIP_PF_ISSUE_CMDS_BIT, &port->flags)) {
 			msleep(20);
 			continue; /* svc thd is actively issuing commands */
 		}
 
-		if (atomic == GFP_KERNEL)
-			msleep(100);
-		else {
-			cpu_relax();
-			udelay(100);
-		}
+		msleep(100);
 
 		if (mtip_check_surprise_removal(port->dd->pdev))
 			goto err_fault;
 
-		/*
-		 * Ignore s_active bit 0 of array element 0.
-		 * This bit will always be set
-		 */
-		active = readl(port->s_active[0]) & 0xFFFFFFFE;
-		for (n = 1; n < port->dd->slot_groups; n++)
-			active |= readl(port->s_active[n]);
-
+		active = mtip_commands_active(port);
 		if (!active)
 			break;
 	} while (time_before(jiffies, to));
@@ -1093,6 +1086,13 @@ static int mtip_quiesce_io(struct mtip_port *port, unsigned long timeout,
 	return -EFAULT;
 }
 
+struct mtip_int_cmd {
+	int fis_len;
+	dma_addr_t buffer;
+	int buf_len;
+	u32 opts;
+};
+
 /*
  * Execute an internal command and wait for the completion.
  *
@@ -1117,13 +1117,18 @@ static int mtip_exec_internal_command(struct mtip_port *port,
 					dma_addr_t buffer,
 					int buf_len,
 					u32 opts,
-					gfp_t atomic,
 					unsigned long timeout)
 {
-	struct mtip_cmd_sg *command_sg;
 	DECLARE_COMPLETION_ONSTACK(wait);
 	struct mtip_cmd *int_cmd;
 	struct driver_data *dd = port->dd;
+	struct request *rq;
+	struct mtip_int_cmd icmd = {
+		.fis_len = fis_len,
+		.buffer = buffer,
+		.buf_len = buf_len,
+		.opts = opts
+	};
 	int rv = 0;
 	unsigned long start;
 
@@ -1138,6 +1143,8 @@ static int mtip_exec_internal_command(struct mtip_port *port,
 		dbg_printk(MTIP_DRV_NAME "Unable to allocate tag for PIO cmd\n");
 		return -EFAULT;
 	}
+	rq = blk_mq_rq_from_pdu(int_cmd);
+	rq->end_io_data = &icmd;
 
 	set_bit(MTIP_PF_IC_ACTIVE_BIT, &port->flags);
 
@@ -1146,133 +1153,53 @@ static int mtip_exec_internal_command(struct mtip_port *port,
 
 	clear_bit(MTIP_PF_DM_ACTIVE_BIT, &port->flags);
 
-	if (atomic == GFP_KERNEL) {
-		if (fis->command != ATA_CMD_STANDBYNOW1) {
-			/* wait for io to complete if non atomic */
-			if (mtip_quiesce_io(port,
-				MTIP_QUIESCE_IO_TIMEOUT_MS, atomic) < 0) {
-				dev_warn(&dd->pdev->dev,
-					"Failed to quiesce IO\n");
-				mtip_put_int_command(dd, int_cmd);
-				clear_bit(MTIP_PF_IC_ACTIVE_BIT, &port->flags);
-				wake_up_interruptible(&port->svc_wait);
-				return -EBUSY;
-			}
-		}
-
-		/* Set the completion function and data for the command. */
-		int_cmd->comp_data = &wait;
-		int_cmd->comp_func = mtip_completion;
-
-	} else {
-		/* Clear completion - we're going to poll */
-		int_cmd->comp_data = NULL;
-		int_cmd->comp_func = mtip_null_completion;
-	}
+	/* Set the completion function and data for the command. */
+	int_cmd->comp_data = &wait;
+	int_cmd->comp_func = mtip_completion;
 
 	/* Copy the command to the command table */
 	memcpy(int_cmd->command, fis, fis_len*4);
 
-	/* Populate the SG list */
-	int_cmd->command_header->opts =
-		 __force_bit2int cpu_to_le32(opts | fis_len);
-	if (buf_len) {
-		command_sg = int_cmd->command + AHCI_CMD_TBL_HDR_SZ;
-
-		command_sg->info =
-			__force_bit2int cpu_to_le32((buf_len-1) & 0x3FFFFF);
-		command_sg->dba	=
-			__force_bit2int cpu_to_le32(buffer & 0xFFFFFFFF);
-		command_sg->dba_upper =
-			__force_bit2int cpu_to_le32((buffer >> 16) >> 16);
-
-		int_cmd->command_header->opts |=
-			__force_bit2int cpu_to_le32((1 << 16));
-	}
-
-	/* Populate the command header */
-	int_cmd->command_header->byte_count = 0;
-
 	start = jiffies;
 
-	/* Issue the command to the hardware */
-	mtip_issue_non_ncq_command(port, MTIP_TAG_INTERNAL);
-
-	if (atomic == GFP_KERNEL) {
-		/* Wait for the command to complete or timeout. */
-		if ((rv = wait_for_completion_interruptible_timeout(
-				&wait,
-				msecs_to_jiffies(timeout))) <= 0) {
-
-			if (rv == -ERESTARTSYS) { /* interrupted */
-				dev_err(&dd->pdev->dev,
-					"Internal command [%02X] was interrupted after %u ms\n",
-					fis->command,
-					jiffies_to_msecs(jiffies - start));
-				rv = -EINTR;
-				goto exec_ic_exit;
-			} else if (rv == 0) /* timeout */
-				dev_err(&dd->pdev->dev,
-					"Internal command did not complete [%02X] within timeout of  %lu ms\n",
-					fis->command, timeout);
-			else
-				dev_err(&dd->pdev->dev,
-					"Internal command [%02X] wait returned code [%d] after %lu ms - unhandled\n",
-					fis->command, rv, timeout);
-
-			if (mtip_check_surprise_removal(dd->pdev) ||
-				test_bit(MTIP_DDF_REMOVE_PENDING_BIT,
-						&dd->dd_flag)) {
-				dev_err(&dd->pdev->dev,
-					"Internal command [%02X] wait returned due to SR\n",
-					fis->command);
-				rv = -ENXIO;
-				goto exec_ic_exit;
-			}
-			mtip_device_reset(dd); /* recover from timeout issue */
-			rv = -EAGAIN;
+	/* insert request and run queue */
+	blk_execute_rq_nowait(rq->q, NULL, rq, true, NULL);
+
+	/* Wait for the command to complete or timeout. */
+	rv = wait_for_completion_interruptible_timeout(&wait,
+						msecs_to_jiffies(timeout));
+	if (rv < 0) {
+		if (rv == -ERESTARTSYS) { /* interrupted */
+			dev_err(&dd->pdev->dev,
+				"Internal command [%02X] was interrupted after %u ms\n",
+				fis->command,
+				jiffies_to_msecs(jiffies - start));
+			rv = -EINTR;
 			goto exec_ic_exit;
-		}
-	} else {
-		u32 hba_stat, port_stat;
-
-		/* Spin for <timeout> checking if command still outstanding */
-		timeout = jiffies + msecs_to_jiffies(timeout);
-		while ((readl(port->cmd_issue[MTIP_TAG_INTERNAL])
-				& (1 << MTIP_TAG_INTERNAL))
-				&& time_before(jiffies, timeout)) {
-			if (mtip_check_surprise_removal(dd->pdev)) {
-				rv = -ENXIO;
-				goto exec_ic_exit;
-			}
-			if ((fis->command != ATA_CMD_STANDBYNOW1) &&
-				test_bit(MTIP_DDF_REMOVE_PENDING_BIT,
-						&dd->dd_flag)) {
-				rv = -ENXIO;
-				goto exec_ic_exit;
-			}
-			port_stat = readl(port->mmio + PORT_IRQ_STAT);
-			if (!port_stat)
-				continue;
+		} else if (rv == 0) /* timeout */
+			dev_err(&dd->pdev->dev,
+				"Internal command did not complete [%02X] within timeout of  %lu ms\n",
+				fis->command, timeout);
+		else
+			dev_err(&dd->pdev->dev,
+				"Internal command [%02X] wait returned code [%d] after %lu ms - unhandled\n",
+				fis->command, rv, timeout);
 
-			if (port_stat & PORT_IRQ_ERR) {
-				dev_err(&dd->pdev->dev,
-					"Internal command [%02X] failed\n",
-					fis->command);
-				mtip_device_reset(dd);
-				rv = -EIO;
-				goto exec_ic_exit;
-			} else {
-				writel(port_stat, port->mmio + PORT_IRQ_STAT);
-				hba_stat = readl(dd->mmio + HOST_IRQ_STAT);
-				if (hba_stat)
-					writel(hba_stat,
-						dd->mmio + HOST_IRQ_STAT);
-			}
-			break;
+		if (mtip_check_surprise_removal(dd->pdev) ||
+			test_bit(MTIP_DDF_REMOVE_PENDING_BIT,
+					&dd->dd_flag)) {
+			dev_err(&dd->pdev->dev,
+				"Internal command [%02X] wait returned due to SR\n",
+				fis->command);
+			rv = -ENXIO;
+			goto exec_ic_exit;
 		}
+		mtip_device_reset(dd); /* recover from timeout issue */
+		rv = -EAGAIN;
+		goto exec_ic_exit;
 	}
 
+	rv = 0;
 	if (readl(port->cmd_issue[MTIP_TAG_INTERNAL])
 			& (1 << MTIP_TAG_INTERNAL)) {
 		rv = -ENXIO;
@@ -1290,7 +1217,6 @@ static int mtip_exec_internal_command(struct mtip_port *port,
 		return rv;
 	}
 	wake_up_interruptible(&port->svc_wait);
-
 	return rv;
 }
 
@@ -1391,7 +1317,6 @@ static int mtip_get_identify(struct mtip_port *port, void __user *user_buffer)
 				port->identify_dma,
 				sizeof(u16) * ATA_ID_WORDS,
 				0,
-				GFP_KERNEL,
 				MTIP_INT_CMD_TIMEOUT_MS)
 				< 0) {
 		rv = -1;
@@ -1477,7 +1402,6 @@ static int mtip_standby_immediate(struct mtip_port *port)
 					0,
 					0,
 					0,
-					GFP_ATOMIC,
 					timeout);
 	dbg_printk(MTIP_DRV_NAME "Time taken to complete standby cmd: %d ms\n",
 			jiffies_to_msecs(jiffies - start));
@@ -1523,7 +1447,6 @@ static int mtip_read_log_page(struct mtip_port *port, u8 page, u16 *buffer,
 					buffer_dma,
 					sectors * ATA_SECT_SIZE,
 					0,
-					GFP_ATOMIC,
 					MTIP_INT_CMD_TIMEOUT_MS);
 }
 
@@ -1558,7 +1481,6 @@ static int mtip_get_smart_data(struct mtip_port *port, u8 *buffer,
 					buffer_dma,
 					ATA_SECT_SIZE,
 					0,
-					GFP_ATOMIC,
 					15000);
 }
 
@@ -1686,7 +1608,6 @@ static int mtip_send_trim(struct driver_data *dd, unsigned int lba,
 					dma_addr,
 					ATA_SECT_SIZE,
 					0,
-					GFP_KERNEL,
 					MTIP_TRIM_TIMEOUT_MS) < 0)
 		rv = -EIO;
 
@@ -1850,7 +1771,6 @@ static int exec_drive_task(struct mtip_port *port, u8 *command)
 				 0,
 				 0,
 				 0,
-				 GFP_KERNEL,
 				 to) < 0) {
 		return -1;
 	}
@@ -1946,7 +1866,6 @@ static int exec_drive_command(struct mtip_port *port, u8 *command,
 				 (xfer_sz ? dma_addr : 0),
 				 (xfer_sz ? ATA_SECT_SIZE * xfer_sz : 0),
 				 0,
-				 GFP_KERNEL,
 				 to)
 				 < 0) {
 		rv = -EFAULT;
@@ -2189,7 +2108,6 @@ static int exec_drive_taskfile(struct driver_data *dd,
 				 dma_buffer,
 				 transfer_size,
 				 0,
-				 GFP_KERNEL,
 				 timeout) < 0) {
 		err = -EIO;
 		goto abort;
@@ -3825,6 +3743,44 @@ static bool mtip_check_unal_depth(struct blk_mq_hw_ctx *hctx,
 	return false;
 }
 
+static int mtip_issue_reserved_cmd(struct blk_mq_hw_ctx *hctx,
+				   struct request *rq)
+{
+	struct driver_data *dd = hctx->queue->queuedata;
+	struct mtip_int_cmd *icmd = rq->end_io_data;
+	struct mtip_cmd *cmd = blk_mq_rq_to_pdu(rq);
+	struct mtip_cmd_sg *command_sg;
+
+	if (mtip_commands_active(dd->port))
+		return BLK_MQ_RQ_QUEUE_BUSY;
+
+	rq->end_io_data = NULL;
+
+	/* Populate the SG list */
+	cmd->command_header->opts =
+		 __force_bit2int cpu_to_le32(icmd->opts | icmd->fis_len);
+	if (icmd->buf_len) {
+		command_sg = cmd->command + AHCI_CMD_TBL_HDR_SZ;
+
+		command_sg->info =
+			__force_bit2int cpu_to_le32((icmd->buf_len-1) & 0x3FFFFF);
+		command_sg->dba	=
+			__force_bit2int cpu_to_le32(icmd->buffer & 0xFFFFFFFF);
+		command_sg->dba_upper =
+			__force_bit2int cpu_to_le32((icmd->buffer >> 16) >> 16);
+
+		cmd->command_header->opts |=
+			__force_bit2int cpu_to_le32((1 << 16));
+	}
+
+	/* Populate the command header */
+	cmd->command_header->byte_count = 0;
+
+	blk_mq_start_request(rq);
+	mtip_issue_non_ncq_command(dd->port, rq->tag);
+	return BLK_MQ_RQ_QUEUE_OK;
+}
+
 static int mtip_queue_rq(struct blk_mq_hw_ctx *hctx,
 			 const struct blk_mq_queue_data *bd)
 {
@@ -3833,6 +3789,9 @@ static int mtip_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 	mtip_init_cmd_header(rq);
 
+	if (rq->rq_flags & RQF_RESERVED)
+		return mtip_issue_reserved_cmd(hctx, rq);
+
 	if (unlikely(mtip_check_unal_depth(hctx, rq)))
 		return BLK_MQ_RQ_QUEUE_BUSY;
 
@@ -3982,7 +3941,7 @@ static int mtip_block_initialize(struct driver_data *dd)
 	dd->tags.reserved_tags = 1;
 	dd->tags.cmd_size = sizeof(struct mtip_cmd);
 	dd->tags.numa_node = dd->numa_node;
-	dd->tags.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_NO_SCHED;
+	dd->tags.flags = BLK_MQ_F_SHOULD_MERGE;
 	dd->tags.driver_data = dd;
 	dd->tags.timeout = MTIP_NCQ_CMD_TIMEOUT_MS;
 
@@ -4168,11 +4127,9 @@ static int mtip_block_remove(struct driver_data *dd)
 		 * Explicitly wait here for IOs to quiesce,
 		 * as mtip_standby_drive usually won't wait for IOs.
 		 */
-		if (!mtip_quiesce_io(dd->port, MTIP_QUIESCE_IO_TIMEOUT_MS,
-								GFP_KERNEL))
+		if (!mtip_quiesce_io(dd->port, MTIP_QUIESCE_IO_TIMEOUT_MS))
 			mtip_standby_drive(dd);
-	}
-	else
+	} else
 		dev_info(&dd->pdev->dev, "device %s surprise removal\n",
 						dd->disk->disk_name);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ba3884f..c246de5 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -120,6 +120,8 @@ typedef __u32 __bitwise req_flags_t;
 /* Look at ->special_vec for the actual data payload instead of the
    bio chain. */
 #define RQF_SPECIAL_PAYLOAD	((__force req_flags_t)(1 << 18))
+/* Request came from the reserved tags/pool */
+#define RQF_RESERVED		((__force req_flags_t)(1 << 19))
 
 /* flags that prevent us from merging requests: */
 #define RQF_NOMERGE_FLAGS \

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2017-04-27 21:40 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-15 12:38 [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched Ming Lei
2017-04-15 12:38 ` [PATCH 1/4] block: respect BLK_MQ_F_NO_SCHED Ming Lei
2017-04-15 12:38 ` [PATCH 2/4] mtip32xx: pass BLK_MQ_F_NO_SCHED Ming Lei
2017-04-15 12:38 ` [PATCH 3/4] blk-mq: introduce BLK_MQ_F_SCHED_USE_HW_TAG Ming Lei
2017-04-15 12:38 ` [PATCH 4/4] mtip32xx: use BLK_MQ_F_USE_SCHED_TAG Ming Lei
2017-04-16 16:03 ` [PATCH 0/4] blk-mq-sched: allow to use hw tag for sched Ming Lei
2017-04-17 17:30   ` Omar Sandoval
2017-04-19  1:10     ` Ming Lei
2017-04-19 20:17 ` Jens Axboe
2017-04-20  0:44   ` Ming Lei
2017-04-20  0:55     ` Jens Axboe
2017-04-20  1:03       ` Ming Lei
2017-04-20  4:54         ` Christoph Hellwig
2017-04-20  8:30           ` Ming Lei
2017-04-26 10:48             ` Ming Lei
2017-04-26 18:15               ` Jens Axboe
2017-04-26 18:22                 ` Jens Axboe
2017-04-27  3:14                   ` Ming Lei
2017-04-27 13:49                     ` Jens Axboe
2017-04-27 15:20                       ` Christoph Hellwig
2017-04-27 15:46                         ` Jens Axboe
2017-04-27 21:40                           ` Jens Axboe
2017-04-27  6:32               ` Christoph Hellwig
2017-04-20 16:10           ` Jens Axboe
2017-04-21 10:54             ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.