linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/2] improve io scheduler callback triggering
@ 2021-08-27 12:41 Niklas Cassel
  2021-08-27 12:41 ` [RFC PATCH 1/2] blk-mq: don't call callbacks for requests that bypassed the scheduler Niklas Cassel
  2021-08-27 12:41 ` [RFC PATCH 2/2] Revert "mq-deadline: Fix request accounting" Niklas Cassel
  0 siblings, 2 replies; 7+ messages in thread
From: Niklas Cassel @ 2021-08-27 12:41 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Bart Van Assche, Damien Le Moal, Paolo Valente, Niklas Cassel,
	linux-block, linux-kernel

From: Niklas Cassel <niklas.cassel@wdc.com>

Recently, there were some extra "dummy" data added into mq-deadline
rq->elv.priv, that was there for the solve purpose of working around
shortcomings in blk-mq (which called the scheduler callbacks even
for requests that bypassed the scheduler).

Fix blk-mq to behave in a more sane way with regards to scheduler
callback triggering, and remove the now unneeded rq->elv.priv magic.

BFQ should be able to perform a similar cleanup, and should be able
to drop this comment from the code:
https://github.com/torvalds/linux/blob/master/block/bfq-iosched.c#L6456-L6462

Considering that I'm quite unfamiliar with the BFQ code, I'd rather
someone who is a bit more familiar with BFQ performs that cleanup.


Kind regards,
Niklas

Niklas Cassel (2):
  blk-mq: don't call callbacks for requests that bypassed the scheduler
  Revert "mq-deadline: Fix request accounting"

 block/blk-mq-sched.c   | 20 ++++++++++++++++++++
 block/blk-mq.c         | 13 -------------
 block/mq-deadline.c    | 16 +++++-----------
 include/linux/blkdev.h |  3 ++-
 4 files changed, 27 insertions(+), 25 deletions(-)

-- 
2.31.1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [RFC PATCH 1/2] blk-mq: don't call callbacks for requests that bypassed the scheduler
  2021-08-27 12:41 [RFC PATCH 0/2] improve io scheduler callback triggering Niklas Cassel
@ 2021-08-27 12:41 ` Niklas Cassel
  2021-08-27 13:28   ` Ming Lei
  2021-08-27 12:41 ` [RFC PATCH 2/2] Revert "mq-deadline: Fix request accounting" Niklas Cassel
  1 sibling, 1 reply; 7+ messages in thread
From: Niklas Cassel @ 2021-08-27 12:41 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Bart Van Assche, Damien Le Moal, Paolo Valente, Niklas Cassel,
	linux-block, linux-kernel

From: Niklas Cassel <niklas.cassel@wdc.com>

Currently, __blk_mq_alloc_request() calls ops.prepare_request and sets
RQF_ELVPRIV.

Therefore, (if the request is not a flush) the RQF_ELVPRIV flag will be
set for the request in blk_mq_submit_bio(), regardless if the request
was submitted to a scheduler, or bypassed the scheduler.

Later, blk_mq_free_request() checks if the RQF_ELVPRIV flag is set,
if it is, the ops.finish_request callback will be called.

The problem with this is that the finish_request scheduler callback
will be called for requests that bypassed the scheduler.

Fix this by calling the scheduler ops.prepare_request callback, and
set the RQF_ELVPRIV flag only immediately before calling the insert
callback.

This way, we can reuse the flag, and we don't need to add any
additional checks in blk_mq_sched_requeue_request() and
blk_mq_free_request(), since they already only do the callback
if RQF_ELVPRIV is set, and the existing .prepare_request callbacks
should still work without needing modifications.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
 block/blk-mq-sched.c   | 20 ++++++++++++++++++++
 block/blk-mq.c         | 13 -------------
 include/linux/blkdev.h |  3 ++-
 3 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 0f006cabfd91..eacacb7088c1 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -466,6 +466,14 @@ void blk_mq_sched_insert_request(struct request *rq, bool at_head,
 	if (e) {
 		LIST_HEAD(list);
 
+		rq->elv.icq = NULL;
+		if (e && e->type->ops.prepare_request) {
+			if (e->type->icq_cache)
+				blk_mq_sched_assign_ioc(rq);
+
+			e->type->ops.prepare_request(rq);
+			rq->rq_flags |= RQF_ELVPRIV;
+		}
 		list_add(&rq->queuelist, &list);
 		e->type->ops.insert_requests(hctx, &list, at_head);
 	} else {
@@ -495,6 +503,18 @@ void blk_mq_sched_insert_requests(struct blk_mq_hw_ctx *hctx,
 
 	e = hctx->queue->elevator;
 	if (e) {
+		struct request *rq;
+
+		list_for_each_entry(rq, list, queuelist) {
+			rq->elv.icq = NULL;
+			if (e && e->type->ops.prepare_request) {
+				if (e->type->icq_cache)
+					blk_mq_sched_assign_ioc(rq);
+
+				e->type->ops.prepare_request(rq);
+				rq->rq_flags |= RQF_ELVPRIV;
+			}
+		}
 		e->type->ops.insert_requests(hctx, list, false);
 	} else {
 		/*
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 9d4fdc2be88a..3527dd9fd10e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -328,19 +328,6 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 	data->ctx->rq_dispatched[op_is_sync(data->cmd_flags)]++;
 	refcount_set(&rq->ref, 1);
 
-	if (!op_is_flush(data->cmd_flags)) {
-		struct elevator_queue *e = data->q->elevator;
-
-		rq->elv.icq = NULL;
-		if (e && e->type->ops.prepare_request) {
-			if (e->type->icq_cache)
-				blk_mq_sched_assign_ioc(rq);
-
-			e->type->ops.prepare_request(rq);
-			rq->rq_flags |= RQF_ELVPRIV;
-		}
-	}
-
 	data->hctx->queued++;
 	return rq;
 }
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 2e12320cb121..a5047c7e9448 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -81,7 +81,8 @@ typedef __u32 __bitwise req_flags_t;
 #define RQF_FAILED		((__force req_flags_t)(1 << 10))
 /* don't warn about errors */
 #define RQF_QUIET		((__force req_flags_t)(1 << 11))
-/* elevator private data attached */
+/* The request has been inserted to an elevator, and thus has private
+   data attached */
 #define RQF_ELVPRIV		((__force req_flags_t)(1 << 12))
 /* account into disk and partition IO statistics */
 #define RQF_IO_STAT		((__force req_flags_t)(1 << 13))
-- 
2.31.1

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [RFC PATCH 2/2] Revert "mq-deadline: Fix request accounting"
  2021-08-27 12:41 [RFC PATCH 0/2] improve io scheduler callback triggering Niklas Cassel
  2021-08-27 12:41 ` [RFC PATCH 1/2] blk-mq: don't call callbacks for requests that bypassed the scheduler Niklas Cassel
@ 2021-08-27 12:41 ` Niklas Cassel
  1 sibling, 0 replies; 7+ messages in thread
From: Niklas Cassel @ 2021-08-27 12:41 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Bart Van Assche, Damien Le Moal, Paolo Valente, Niklas Cassel,
	linux-block, linux-kernel

From: Niklas Cassel <niklas.cassel@wdc.com>

This reverts commit b6d2b054e8baaee53fd2d4854c63cbf0f2c6262a.

There is no longer any need to perform any workaround private data magic
in order to track if a request was inserted to the scheduler or not.

blk-mq will no longer call .finish_request() for requests that were
never inserted to the scheduler.

This patch also didn't handle requeues properly, so it would have
required additional private data magic if it wasn't reverted.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
 block/mq-deadline.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 36920670dccc..a5d171b54f8e 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -682,7 +682,6 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 
 	prio = ioprio_class_to_prio[ioprio_class];
 	dd_count(dd, inserted, prio);
-	rq->elv.priv[0] = (void *)(uintptr_t)1;
 
 	if (blk_mq_sched_try_insert_merge(q, rq, &free)) {
 		blk_mq_free_requests(&free);
@@ -731,10 +730,12 @@ static void dd_insert_requests(struct blk_mq_hw_ctx *hctx,
 	spin_unlock(&dd->lock);
 }
 
-/* Callback from inside blk_mq_rq_ctx_init(). */
+/*
+ * Nothing to do here. This is defined only to ensure that .finish_request
+ * method is called upon request completion.
+ */
 static void dd_prepare_request(struct request *rq)
 {
-	rq->elv.priv[0] = NULL;
 }
 
 /*
@@ -761,14 +762,7 @@ static void dd_finish_request(struct request *rq)
 	const enum dd_prio prio = ioprio_class_to_prio[ioprio_class];
 	struct dd_per_prio *per_prio = &dd->per_prio[prio];
 
-	/*
-	 * The block layer core may call dd_finish_request() without having
-	 * called dd_insert_requests(). Hence only update statistics for
-	 * requests for which dd_insert_requests() has been called. See also
-	 * blk_mq_request_bypass_insert().
-	 */
-	if (rq->elv.priv[0])
-		dd_count(dd, completed, prio);
+	dd_count(dd, completed, prio);
 
 	if (blk_queue_is_zoned(q)) {
 		unsigned long flags;
-- 
2.31.1

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH 1/2] blk-mq: don't call callbacks for requests that bypassed the scheduler
  2021-08-27 12:41 ` [RFC PATCH 1/2] blk-mq: don't call callbacks for requests that bypassed the scheduler Niklas Cassel
@ 2021-08-27 13:28   ` Ming Lei
  2021-08-30  9:48     ` Niklas Cassel
  0 siblings, 1 reply; 7+ messages in thread
From: Ming Lei @ 2021-08-27 13:28 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Jens Axboe, Bart Van Assche, Damien Le Moal, Paolo Valente,
	linux-block, linux-kernel

On Fri, Aug 27, 2021 at 12:41:31PM +0000, Niklas Cassel wrote:
> From: Niklas Cassel <niklas.cassel@wdc.com>
> 
> Currently, __blk_mq_alloc_request() calls ops.prepare_request and sets
> RQF_ELVPRIV.
> 
> Therefore, (if the request is not a flush) the RQF_ELVPRIV flag will be
> set for the request in blk_mq_submit_bio(), regardless if the request
> was submitted to a scheduler, or bypassed the scheduler.
> 
> Later, blk_mq_free_request() checks if the RQF_ELVPRIV flag is set,
> if it is, the ops.finish_request callback will be called.
> 
> The problem with this is that the finish_request scheduler callback
> will be called for requests that bypassed the scheduler.
> 
> Fix this by calling the scheduler ops.prepare_request callback, and
> set the RQF_ELVPRIV flag only immediately before calling the insert
> callback.

One request could be inserted more than one times, such as requeue,
however __blk_mq_alloc_request() is just run once, so is it fine to
call ->prepare_request more than one time for same request?

Or I am wondering why not call ->prepare_request when the following
check is true?

	if (e && e->type->ops.prepare_request && !op_is_flush(data->cmd_flags) &&
		!blk_op_is_passthrough(data->cmd_flags))
		e->type->ops.prepare_request()




Thanks, 
Ming


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH 1/2] blk-mq: don't call callbacks for requests that bypassed the scheduler
  2021-08-27 13:28   ` Ming Lei
@ 2021-08-30  9:48     ` Niklas Cassel
  2021-08-30 10:11       ` Ming Lei
  0 siblings, 1 reply; 7+ messages in thread
From: Niklas Cassel @ 2021-08-30  9:48 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Bart Van Assche, Damien Le Moal, Paolo Valente,
	linux-block, linux-kernel

On Fri, Aug 27, 2021 at 09:28:07PM +0800, Ming Lei wrote:
> On Fri, Aug 27, 2021 at 12:41:31PM +0000, Niklas Cassel wrote:
> > From: Niklas Cassel <niklas.cassel@wdc.com>
> > 
> > Currently, __blk_mq_alloc_request() calls ops.prepare_request and sets
> > RQF_ELVPRIV.
> > 
> > Therefore, (if the request is not a flush) the RQF_ELVPRIV flag will be
> > set for the request in blk_mq_submit_bio(), regardless if the request
> > was submitted to a scheduler, or bypassed the scheduler.
> > 
> > Later, blk_mq_free_request() checks if the RQF_ELVPRIV flag is set,
> > if it is, the ops.finish_request callback will be called.
> > 
> > The problem with this is that the finish_request scheduler callback
> > will be called for requests that bypassed the scheduler.
> > 
> > Fix this by calling the scheduler ops.prepare_request callback, and
> > set the RQF_ELVPRIV flag only immediately before calling the insert
> > callback.
> 
> One request could be inserted more than one times, such as requeue,
> however __blk_mq_alloc_request() is just run once, so is it fine to
> call ->prepare_request more than one time for same request?

Calling ->prepare_request multiple times is fine.
All the different I/O schedulers (BFQ, mq-deadline, kyber)
simply use .prepare_request to clear/set elv->priv to a fixed value.

> 
> Or I am wondering why not call ->prepare_request when the following
> check is true?
> 
> 	if (e && e->type->ops.prepare_request && !op_is_flush(data->cmd_flags) &&
> 		!blk_op_is_passthrough(data->cmd_flags))
> 		e->type->ops.prepare_request()


That might work, and might be a nicer solution indeed.

If a request got plugged, it will be inserted to the scheduler through
blk_flush_plug_list() -> blk_mq_flush_plug_list() -> blk_mq_sched_insert_requests()
which will insert them unconditionally.
In this case. we know that !op_is_flush() (because if it was, blk_mq_submit_bio()
would have inserted directly.)


If we didn't plug, we do blk_mq_sched_insert_request(), which will add it if
blk_mq_sched_bypass_insert() returns false:

blk_mq_sched_bypass_insert() is defined as:

        if ((rq->rq_flags & RQF_FLUSH_SEQ) || blk_rq_is_passthrough(rq))
                return true;
Also in this case. we know that !op_is_flush() (blk_mq_submit_bio() would have
inserted directly.)


So, we could easily add && !blk_op_is_passthrough(data->cmd_flags) to the
->prepare_request condition in blk_mq_rq_ctx_init() like you suggested,
but since the bypass condition also seems to look at RQF_FLUSH_SEQ, wouldn't
we need to add RQF_FLUSH_SEQ to the condition in blk_mq_rq_ctx_init() as well?

This flag is set after blk_mq_rq_ctx_init(). Are we sure that RQF_FLUSH_SEQ
flag will only be set for a request which op_is_flush() returned true?

(If so, then only adding  && !blk_op_is_passthrough(data->cmd_flags) should
be fine.)


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH 1/2] blk-mq: don't call callbacks for requests that bypassed the scheduler
  2021-08-30  9:48     ` Niklas Cassel
@ 2021-08-30 10:11       ` Ming Lei
  2021-09-07 12:59         ` Niklas Cassel
  0 siblings, 1 reply; 7+ messages in thread
From: Ming Lei @ 2021-08-30 10:11 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Jens Axboe, Bart Van Assche, Damien Le Moal, Paolo Valente,
	linux-block, linux-kernel

On Mon, Aug 30, 2021 at 09:48:06AM +0000, Niklas Cassel wrote:
> On Fri, Aug 27, 2021 at 09:28:07PM +0800, Ming Lei wrote:
> > On Fri, Aug 27, 2021 at 12:41:31PM +0000, Niklas Cassel wrote:
> > > From: Niklas Cassel <niklas.cassel@wdc.com>
> > > 
> > > Currently, __blk_mq_alloc_request() calls ops.prepare_request and sets
> > > RQF_ELVPRIV.
> > > 
> > > Therefore, (if the request is not a flush) the RQF_ELVPRIV flag will be
> > > set for the request in blk_mq_submit_bio(), regardless if the request
> > > was submitted to a scheduler, or bypassed the scheduler.
> > > 
> > > Later, blk_mq_free_request() checks if the RQF_ELVPRIV flag is set,
> > > if it is, the ops.finish_request callback will be called.
> > > 
> > > The problem with this is that the finish_request scheduler callback
> > > will be called for requests that bypassed the scheduler.
> > > 
> > > Fix this by calling the scheduler ops.prepare_request callback, and
> > > set the RQF_ELVPRIV flag only immediately before calling the insert
> > > callback.
> > 
> > One request could be inserted more than one times, such as requeue,
> > however __blk_mq_alloc_request() is just run once, so is it fine to
> > call ->prepare_request more than one time for same request?
> 
> Calling ->prepare_request multiple times is fine.
> All the different I/O schedulers (BFQ, mq-deadline, kyber)
> simply use .prepare_request to clear/set elv->priv to a fixed value.
> 
> > 
> > Or I am wondering why not call ->prepare_request when the following
> > check is true?
> > 
> > 	if (e && e->type->ops.prepare_request && !op_is_flush(data->cmd_flags) &&
> > 		!blk_op_is_passthrough(data->cmd_flags))
> > 		e->type->ops.prepare_request()
> 
> 
> That might work, and might be a nicer solution indeed.
> 
> If a request got plugged, it will be inserted to the scheduler through
> blk_flush_plug_list() -> blk_mq_flush_plug_list() -> blk_mq_sched_insert_requests()
> which will insert them unconditionally.
> In this case. we know that !op_is_flush() (because if it was, blk_mq_submit_bio()
> would have inserted directly.)
> 
> 
> If we didn't plug, we do blk_mq_sched_insert_request(), which will add it if
> blk_mq_sched_bypass_insert() returns false:
> 
> blk_mq_sched_bypass_insert() is defined as:
> 
>         if ((rq->rq_flags & RQF_FLUSH_SEQ) || blk_rq_is_passthrough(rq))
>                 return true;
> Also in this case. we know that !op_is_flush() (blk_mq_submit_bio() would have
> inserted directly.)
> 
> 
> So, we could easily add && !blk_op_is_passthrough(data->cmd_flags) to the
> ->prepare_request condition in blk_mq_rq_ctx_init() like you suggested,
> but since the bypass condition also seems to look at RQF_FLUSH_SEQ, wouldn't
> we need to add RQF_FLUSH_SEQ to the condition in blk_mq_rq_ctx_init() as well?
> 
> This flag is set after blk_mq_rq_ctx_init(). Are we sure that RQF_FLUSH_SEQ
> flag will only be set for a request which op_is_flush() returned true?
> 
> (If so, then only adding  && !blk_op_is_passthrough(data->cmd_flags) should
> be fine.)

BTW, what I meant is the following change, is it fine?

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 0a33d16a7298..f98f8cc05644 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -327,20 +327,6 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 
 	data->ctx->rq_dispatched[op_is_sync(data->cmd_flags)]++;
 	refcount_set(&rq->ref, 1);
-
-	if (!op_is_flush(data->cmd_flags)) {
-		struct elevator_queue *e = data->q->elevator;
-
-		rq->elv.icq = NULL;
-		if (e && e->type->ops.prepare_request) {
-			if (e->type->icq_cache)
-				blk_mq_sched_assign_ioc(rq);
-
-			e->type->ops.prepare_request(rq);
-			rq->rq_flags |= RQF_ELVPRIV;
-		}
-	}
-
 	data->hctx->queued++;
 	return rq;
 }
@@ -359,17 +345,25 @@ static struct request *__blk_mq_alloc_request(struct blk_mq_alloc_data *data)
 	if (data->cmd_flags & REQ_NOWAIT)
 		data->flags |= BLK_MQ_REQ_NOWAIT;
 
-	if (e) {
+	if (e && !op_is_flush(data->cmd_flags) &&
+			!blk_op_is_passthrough(data->cmd_flags)) {
 		/*
 		 * Flush/passthrough requests are special and go directly to the
 		 * dispatch list. Don't include reserved tags in the
 		 * limiting, as it isn't useful.
 		 */
-		if (!op_is_flush(data->cmd_flags) &&
-		    !blk_op_is_passthrough(data->cmd_flags) &&
-		    e->type->ops.limit_depth &&
-		    !(data->flags & BLK_MQ_REQ_RESERVED))
+		if (e->type->ops.limit_depth &&
+			    !(data->flags & BLK_MQ_REQ_RESERVED))
 			e->type->ops.limit_depth(data->cmd_flags, data);
+
+		rq->elv.icq = NULL;
+		if (e->type->ops.prepare_request) {
+			if (e->type->icq_cache)
+				blk_mq_sched_assign_ioc(rq);
+
+			e->type->ops.prepare_request(rq);
+			rq->rq_flags |= RQF_ELVPRIV;
+		}
 	}
 
 retry:

Thanks, 
Ming


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH 1/2] blk-mq: don't call callbacks for requests that bypassed the scheduler
  2021-08-30 10:11       ` Ming Lei
@ 2021-09-07 12:59         ` Niklas Cassel
  0 siblings, 0 replies; 7+ messages in thread
From: Niklas Cassel @ 2021-09-07 12:59 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Bart Van Assche, Damien Le Moal, Paolo Valente,
	linux-block, linux-kernel

On Mon, Aug 30, 2021 at 06:11:12PM +0800, Ming Lei wrote:
> On Mon, Aug 30, 2021 at 09:48:06AM +0000, Niklas Cassel wrote:
> > On Fri, Aug 27, 2021 at 09:28:07PM +0800, Ming Lei wrote:
> > > On Fri, Aug 27, 2021 at 12:41:31PM +0000, Niklas Cassel wrote:
> > > > From: Niklas Cassel <niklas.cassel@wdc.com>
> > > > 
> > > > Currently, __blk_mq_alloc_request() calls ops.prepare_request and sets
> > > > RQF_ELVPRIV.
> > > > 
> > > > Therefore, (if the request is not a flush) the RQF_ELVPRIV flag will be
> > > > set for the request in blk_mq_submit_bio(), regardless if the request
> > > > was submitted to a scheduler, or bypassed the scheduler.
> > > > 
> > > > Later, blk_mq_free_request() checks if the RQF_ELVPRIV flag is set,
> > > > if it is, the ops.finish_request callback will be called.
> > > > 
> > > > The problem with this is that the finish_request scheduler callback
> > > > will be called for requests that bypassed the scheduler.
> > > > 
> > > > Fix this by calling the scheduler ops.prepare_request callback, and
> > > > set the RQF_ELVPRIV flag only immediately before calling the insert
> > > > callback.
> > > 
> > > One request could be inserted more than one times, such as requeue,
> > > however __blk_mq_alloc_request() is just run once, so is it fine to
> > > call ->prepare_request more than one time for same request?
> > 
> > Calling ->prepare_request multiple times is fine.
> > All the different I/O schedulers (BFQ, mq-deadline, kyber)
> > simply use .prepare_request to clear/set elv->priv to a fixed value.
> > 
> > > 
> > > Or I am wondering why not call ->prepare_request when the following
> > > check is true?
> > > 
> > > 	if (e && e->type->ops.prepare_request && !op_is_flush(data->cmd_flags) &&
> > > 		!blk_op_is_passthrough(data->cmd_flags))
> > > 		e->type->ops.prepare_request()
> > 
> > 
> > That might work, and might be a nicer solution indeed.
> > 
> > If a request got plugged, it will be inserted to the scheduler through
> > blk_flush_plug_list() -> blk_mq_flush_plug_list() -> blk_mq_sched_insert_requests()
> > which will insert them unconditionally.
> > In this case. we know that !op_is_flush() (because if it was, blk_mq_submit_bio()
> > would have inserted directly.)
> > 
> > 
> > If we didn't plug, we do blk_mq_sched_insert_request(), which will add it if
> > blk_mq_sched_bypass_insert() returns false:
> > 
> > blk_mq_sched_bypass_insert() is defined as:
> > 
> >         if ((rq->rq_flags & RQF_FLUSH_SEQ) || blk_rq_is_passthrough(rq))
> >                 return true;
> > Also in this case. we know that !op_is_flush() (blk_mq_submit_bio() would have
> > inserted directly.)
> > 
> > 
> > So, we could easily add && !blk_op_is_passthrough(data->cmd_flags) to the
> > ->prepare_request condition in blk_mq_rq_ctx_init() like you suggested,
> > but since the bypass condition also seems to look at RQF_FLUSH_SEQ, wouldn't
> > we need to add RQF_FLUSH_SEQ to the condition in blk_mq_rq_ctx_init() as well?
> > 
> > This flag is set after blk_mq_rq_ctx_init(). Are we sure that RQF_FLUSH_SEQ
> > flag will only be set for a request which op_is_flush() returned true?
> > 
> > (If so, then only adding  && !blk_op_is_passthrough(data->cmd_flags) should
> > be fine.)
> 
> BTW, what I meant is the following change, is it fine?
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 0a33d16a7298..f98f8cc05644 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -327,20 +327,6 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
>  
>  	data->ctx->rq_dispatched[op_is_sync(data->cmd_flags)]++;
>  	refcount_set(&rq->ref, 1);
> -
> -	if (!op_is_flush(data->cmd_flags)) {
> -		struct elevator_queue *e = data->q->elevator;
> -
> -		rq->elv.icq = NULL;
> -		if (e && e->type->ops.prepare_request) {
> -			if (e->type->icq_cache)
> -				blk_mq_sched_assign_ioc(rq);
> -
> -			e->type->ops.prepare_request(rq);
> -			rq->rq_flags |= RQF_ELVPRIV;
> -		}
> -	}
> -
>  	data->hctx->queued++;
>  	return rq;
>  }
> @@ -359,17 +345,25 @@ static struct request *__blk_mq_alloc_request(struct blk_mq_alloc_data *data)
>  	if (data->cmd_flags & REQ_NOWAIT)
>  		data->flags |= BLK_MQ_REQ_NOWAIT;
>  
> -	if (e) {
> +	if (e && !op_is_flush(data->cmd_flags) &&
> +			!blk_op_is_passthrough(data->cmd_flags)) {
>  		/*
>  		 * Flush/passthrough requests are special and go directly to the
>  		 * dispatch list. Don't include reserved tags in the
>  		 * limiting, as it isn't useful.
>  		 */
> -		if (!op_is_flush(data->cmd_flags) &&
> -		    !blk_op_is_passthrough(data->cmd_flags) &&
> -		    e->type->ops.limit_depth &&
> -		    !(data->flags & BLK_MQ_REQ_RESERVED))
> +		if (e->type->ops.limit_depth &&
> +			    !(data->flags & BLK_MQ_REQ_RESERVED))
>  			e->type->ops.limit_depth(data->cmd_flags, data);
> +
> +		rq->elv.icq = NULL;
> +		if (e->type->ops.prepare_request) {
> +			if (e->type->icq_cache)
> +				blk_mq_sched_assign_ioc(rq);
> +
> +			e->type->ops.prepare_request(rq);
> +			rq->rq_flags |= RQF_ELVPRIV;
> +		}
>  	}
>  
>  retry:
> 

Hello Ming,


Sorry for the delayed reply.

Your patch does not compile, because rq is not defined in this function.

Another problem seems to be that in __blk_mq_alloc_request(), at the end
of the function, calls blk_mq_rq_ctx_init(), which will unconditionally
set rq->rq_flags = 0;


The simple patch:

--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -328,7 +328,8 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
        data->ctx->rq_dispatched[op_is_sync(data->cmd_flags)]++;
        refcount_set(&rq->ref, 1);
 
-       if (!op_is_flush(data->cmd_flags)) {
+       if (!op_is_flush(data->cmd_flags) &&
+           !blk_op_is_passthrough(data->cmd_flags)) {
                struct elevator_queue *e = data->q->elevator;
 
                rq->elv.icq = NULL;



Does appear to solve the problem.

My only worry was RQF_FLUSH_SEQ flag, but as far as I can tell, it is
only ever set for a request that which op_is_flush() returned true.


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-09-07 12:59 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-27 12:41 [RFC PATCH 0/2] improve io scheduler callback triggering Niklas Cassel
2021-08-27 12:41 ` [RFC PATCH 1/2] blk-mq: don't call callbacks for requests that bypassed the scheduler Niklas Cassel
2021-08-27 13:28   ` Ming Lei
2021-08-30  9:48     ` Niklas Cassel
2021-08-30 10:11       ` Ming Lei
2021-09-07 12:59         ` Niklas Cassel
2021-08-27 12:41 ` [RFC PATCH 2/2] Revert "mq-deadline: Fix request accounting" Niklas Cassel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).