* [RFC PATCH 0/2] improve io scheduler callback triggering
@ 2021-08-27 12:41 Niklas Cassel
  2021-08-27 12:41 ` [RFC PATCH 1/2] blk-mq: don't call callbacks for requests that bypassed the scheduler Niklas Cassel
  2021-08-27 12:41 ` [RFC PATCH 2/2] Revert "mq-deadline: Fix request accounting" Niklas Cassel
  0 siblings, 2 replies; 7+ messages in thread
From: Niklas Cassel @ 2021-08-27 12:41 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Bart Van Assche, Damien Le Moal, Paolo Valente, Niklas Cassel,
	linux-block, linux-kernel

From: Niklas Cassel <niklas.cassel@wdc.com>

Recently, some extra "dummy" data was added to mq-deadline's
rq->elv.priv, for the sole purpose of working around shortcomings
in blk-mq (which calls the scheduler callbacks even for requests
that bypassed the scheduler).

Fix blk-mq to behave more sanely with regard to scheduler callback
triggering, and remove the now unneeded rq->elv.priv magic.

BFQ should be able to perform a similar cleanup, and should be able
to drop this comment from the code:
https://github.com/torvalds/linux/blob/master/block/bfq-iosched.c#L6456-L6462

Considering that I'm quite unfamiliar with the BFQ code, I'd rather
have someone who is a bit more familiar with BFQ perform that cleanup.


Kind regards,
Niklas

Niklas Cassel (2):
  blk-mq: don't call callbacks for requests that bypassed the scheduler
  Revert "mq-deadline: Fix request accounting"

 block/blk-mq-sched.c   | 20 ++++++++++++++++++++
 block/blk-mq.c         | 13 -------------
 block/mq-deadline.c    | 16 +++++-----------
 include/linux/blkdev.h |  3 ++-
 4 files changed, 27 insertions(+), 25 deletions(-)

-- 
2.31.1


* [RFC PATCH 1/2] blk-mq: don't call callbacks for requests that bypassed the scheduler
  2021-08-27 12:41 [RFC PATCH 0/2] improve io scheduler callback triggering Niklas Cassel
@ 2021-08-27 12:41 ` Niklas Cassel
  2021-08-27 13:28   ` Ming Lei
  2021-08-27 12:41 ` [RFC PATCH 2/2] Revert "mq-deadline: Fix request accounting" Niklas Cassel
  1 sibling, 1 reply; 7+ messages in thread
From: Niklas Cassel @ 2021-08-27 12:41 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Bart Van Assche, Damien Le Moal, Paolo Valente, Niklas Cassel,
	linux-block, linux-kernel

From: Niklas Cassel <niklas.cassel@wdc.com>

Currently, __blk_mq_alloc_request() calls ops.prepare_request and sets
RQF_ELVPRIV.

Therefore, (if the request is not a flush) the RQF_ELVPRIV flag will be
set for the request in blk_mq_submit_bio(), regardless of whether the
request was submitted to a scheduler or bypassed the scheduler.

Later, blk_mq_free_request() checks if the RQF_ELVPRIV flag is set;
if it is, the ops.finish_request callback will be called.
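
For reference, the relevant check in blk_mq_free_request() looks
roughly like this (a sketch; exact details may differ per version):

	struct elevator_queue *e = rq->q->elevator;

	if (rq->rq_flags & RQF_ELVPRIV) {
		if (e && e->type->ops.finish_request)
			e->type->ops.finish_request(rq);
		if (rq->elv.icq) {
			put_io_context(rq->elv.icq->ioc);
			rq->elv.icq = NULL;
		}
	}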

The problem with this is that the finish_request scheduler callback
will be called for requests that bypassed the scheduler.

Fix this by calling the scheduler ops.prepare_request callback, and
setting the RQF_ELVPRIV flag, only immediately before calling the
insert callback.

This way, we can reuse the flag, and we don't need to add any
additional checks in blk_mq_sched_requeue_request() and
blk_mq_free_request(), since they already only do the callback
if RQF_ELVPRIV is set, and the existing .prepare_request callbacks
should still work without needing modifications.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
 block/blk-mq-sched.c   | 20 ++++++++++++++++++++
 block/blk-mq.c         | 13 -------------
 include/linux/blkdev.h |  3 ++-
 3 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 0f006cabfd91..eacacb7088c1 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -466,6 +466,14 @@ void blk_mq_sched_insert_request(struct request *rq, bool at_head,
 	if (e) {
 		LIST_HEAD(list);
 
+		rq->elv.icq = NULL;
+		if (e && e->type->ops.prepare_request) {
+			if (e->type->icq_cache)
+				blk_mq_sched_assign_ioc(rq);
+
+			e->type->ops.prepare_request(rq);
+			rq->rq_flags |= RQF_ELVPRIV;
+		}
 		list_add(&rq->queuelist, &list);
 		e->type->ops.insert_requests(hctx, &list, at_head);
 	} else {
@@ -495,6 +503,18 @@ void blk_mq_sched_insert_requests(struct blk_mq_hw_ctx *hctx,
 
 	e = hctx->queue->elevator;
 	if (e) {
+		struct request *rq;
+
+		list_for_each_entry(rq, list, queuelist) {
+			rq->elv.icq = NULL;
+			if (e && e->type->ops.prepare_request) {
+				if (e->type->icq_cache)
+					blk_mq_sched_assign_ioc(rq);
+
+				e->type->ops.prepare_request(rq);
+				rq->rq_flags |= RQF_ELVPRIV;
+			}
+		}
 		e->type->ops.insert_requests(hctx, list, false);
 	} else {
 		/*
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 9d4fdc2be88a..3527dd9fd10e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -328,19 +328,6 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 	data->ctx->rq_dispatched[op_is_sync(data->cmd_flags)]++;
 	refcount_set(&rq->ref, 1);
 
-	if (!op_is_flush(data->cmd_flags)) {
-		struct elevator_queue *e = data->q->elevator;
-
-		rq->elv.icq = NULL;
-		if (e && e->type->ops.prepare_request) {
-			if (e->type->icq_cache)
-				blk_mq_sched_assign_ioc(rq);
-
-			e->type->ops.prepare_request(rq);
-			rq->rq_flags |= RQF_ELVPRIV;
-		}
-	}
-
 	data->hctx->queued++;
 	return rq;
 }
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 2e12320cb121..a5047c7e9448 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -81,7 +81,8 @@ typedef __u32 __bitwise req_flags_t;
 #define RQF_FAILED		((__force req_flags_t)(1 << 10))
 /* don't warn about errors */
 #define RQF_QUIET		((__force req_flags_t)(1 << 11))
-/* elevator private data attached */
+/* The request has been inserted into an elevator, and thus has private
+   data attached */
 #define RQF_ELVPRIV		((__force req_flags_t)(1 << 12))
 /* account into disk and partition IO statistics */
 #define RQF_IO_STAT		((__force req_flags_t)(1 << 13))
-- 
2.31.1


* [RFC PATCH 2/2] Revert "mq-deadline: Fix request accounting"
  2021-08-27 12:41 [RFC PATCH 0/2] improve io scheduler callback triggering Niklas Cassel
  2021-08-27 12:41 ` [RFC PATCH 1/2] blk-mq: don't call callbacks for requests that bypassed the scheduler Niklas Cassel
@ 2021-08-27 12:41 ` Niklas Cassel
  1 sibling, 0 replies; 7+ messages in thread
From: Niklas Cassel @ 2021-08-27 12:41 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Bart Van Assche, Damien Le Moal, Paolo Valente, Niklas Cassel,
	linux-block, linux-kernel

From: Niklas Cassel <niklas.cassel@wdc.com>

This reverts commit b6d2b054e8baaee53fd2d4854c63cbf0f2c6262a.

There is no longer any need to perform any private data magic as a
workaround to track if a request was inserted into the scheduler or not.

blk-mq will no longer call .finish_request() for requests that were
never inserted into the scheduler.

The reverted commit also didn't handle requeues properly, so it would
have required additional private data magic had it not been reverted.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
 block/mq-deadline.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 36920670dccc..a5d171b54f8e 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -682,7 +682,6 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 
 	prio = ioprio_class_to_prio[ioprio_class];
 	dd_count(dd, inserted, prio);
-	rq->elv.priv[0] = (void *)(uintptr_t)1;
 
 	if (blk_mq_sched_try_insert_merge(q, rq, &free)) {
 		blk_mq_free_requests(&free);
@@ -731,10 +730,12 @@ static void dd_insert_requests(struct blk_mq_hw_ctx *hctx,
 	spin_unlock(&dd->lock);
 }
 
-/* Callback from inside blk_mq_rq_ctx_init(). */
+/*
+ * Nothing to do here. This is defined only to ensure that .finish_request
+ * method is called upon request completion.
+ */
 static void dd_prepare_request(struct request *rq)
 {
-	rq->elv.priv[0] = NULL;
 }
 
 /*
@@ -761,14 +762,7 @@ static void dd_finish_request(struct request *rq)
 	const enum dd_prio prio = ioprio_class_to_prio[ioprio_class];
 	struct dd_per_prio *per_prio = &dd->per_prio[prio];
 
-	/*
-	 * The block layer core may call dd_finish_request() without having
-	 * called dd_insert_requests(). Hence only update statistics for
-	 * requests for which dd_insert_requests() has been called. See also
-	 * blk_mq_request_bypass_insert().
-	 */
-	if (rq->elv.priv[0])
-		dd_count(dd, completed, prio);
+	dd_count(dd, completed, prio);
 
 	if (blk_queue_is_zoned(q)) {
 		unsigned long flags;
-- 
2.31.1


* Re: [RFC PATCH 1/2] blk-mq: don't call callbacks for requests that bypassed the scheduler
  2021-08-27 12:41 ` [RFC PATCH 1/2] blk-mq: don't call callbacks for requests that bypassed the scheduler Niklas Cassel
@ 2021-08-27 13:28   ` Ming Lei
  2021-08-30  9:48     ` Niklas Cassel
  0 siblings, 1 reply; 7+ messages in thread
From: Ming Lei @ 2021-08-27 13:28 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Jens Axboe, Bart Van Assche, Damien Le Moal, Paolo Valente,
	linux-block, linux-kernel

On Fri, Aug 27, 2021 at 12:41:31PM +0000, Niklas Cassel wrote:
> From: Niklas Cassel <niklas.cassel@wdc.com>
> 
> Currently, __blk_mq_alloc_request() calls ops.prepare_request and sets
> RQF_ELVPRIV.
> 
> Therefore, (if the request is not a flush) the RQF_ELVPRIV flag will be
> set for the request in blk_mq_submit_bio(), regardless of whether the
> request was submitted to a scheduler or bypassed the scheduler.
> 
> Later, blk_mq_free_request() checks if the RQF_ELVPRIV flag is set;
> if it is, the ops.finish_request callback will be called.
> 
> The problem with this is that the finish_request scheduler callback
> will be called for requests that bypassed the scheduler.
> 
> Fix this by calling the scheduler ops.prepare_request callback, and
> setting the RQF_ELVPRIV flag, only immediately before calling the
> insert callback.

A request could be inserted more than once, such as on requeue;
however, __blk_mq_alloc_request() is only run once, so is it fine to
call ->prepare_request more than once for the same request?

Or, I am wondering: why not call ->prepare_request when the following
check is true?

	if (e && e->type->ops.prepare_request && !op_is_flush(data->cmd_flags) &&
		!blk_op_is_passthrough(data->cmd_flags))
		e->type->ops.prepare_request()




Thanks, 
Ming



* Re: [RFC PATCH 1/2] blk-mq: don't call callbacks for requests that bypassed the scheduler
  2021-08-27 13:28   ` Ming Lei
@ 2021-08-30  9:48     ` Niklas Cassel
  2021-08-30 10:11       ` Ming Lei
  0 siblings, 1 reply; 7+ messages in thread
From: Niklas Cassel @ 2021-08-30  9:48 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Bart Van Assche, Damien Le Moal, Paolo Valente,
	linux-block, linux-kernel

On Fri, Aug 27, 2021 at 09:28:07PM +0800, Ming Lei wrote:
> On Fri, Aug 27, 2021 at 12:41:31PM +0000, Niklas Cassel wrote:
> > From: Niklas Cassel <niklas.cassel@wdc.com>
> > 
> > Currently, __blk_mq_alloc_request() calls ops.prepare_request and sets
> > RQF_ELVPRIV.
> > 
> > Therefore, (if the request is not a flush) the RQF_ELVPRIV flag will be
> > set for the request in blk_mq_submit_bio(), regardless of whether the
> > request was submitted to a scheduler or bypassed the scheduler.
> > 
> > Later, blk_mq_free_request() checks if the RQF_ELVPRIV flag is set;
> > if it is, the ops.finish_request callback will be called.
> > 
> > The problem with this is that the finish_request scheduler callback
> > will be called for requests that bypassed the scheduler.
> > 
> > Fix this by calling the scheduler ops.prepare_request callback, and
> > setting the RQF_ELVPRIV flag, only immediately before calling the
> > insert callback.
> 
> A request could be inserted more than once, such as on requeue;
> however, __blk_mq_alloc_request() is only run once, so is it fine to
> call ->prepare_request more than once for the same request?

Calling ->prepare_request multiple times is fine.
All the different I/O schedulers (BFQ, mq-deadline, kyber)
simply use .prepare_request to clear/set elv->priv to a fixed value.
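
For example, mq-deadline's current .prepare_request (which patch 2/2
reverts back to an empty function) is simply:

	static void dd_prepare_request(struct request *rq)
	{
		rq->elv.priv[0] = NULL;
	}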

> 
> Or, I am wondering: why not call ->prepare_request when the following
> check is true?
> 
> 	if (e && e->type->ops.prepare_request && !op_is_flush(data->cmd_flags) &&
> 		!blk_op_is_passthrough(data->cmd_flags))
> 		e->type->ops.prepare_request()


That might work, and might be a nicer solution indeed.

If a request got plugged, it will be inserted into the scheduler through
blk_flush_plug_list() -> blk_mq_flush_plug_list() ->
blk_mq_sched_insert_requests(), which will insert it unconditionally.
In this case, we know that !op_is_flush() (because if it were a flush,
blk_mq_submit_bio() would have inserted it directly).


If we didn't plug, we do blk_mq_sched_insert_request(), which will
insert the request into the scheduler if blk_mq_sched_bypass_insert()
returns false. blk_mq_sched_bypass_insert() is defined as:

        if ((rq->rq_flags & RQF_FLUSH_SEQ) || blk_rq_is_passthrough(rq))
                return true;

Also in this case, we know that !op_is_flush() (blk_mq_submit_bio()
would have inserted it directly).


So, we could easily add && !blk_op_is_passthrough(data->cmd_flags) to
the ->prepare_request condition in blk_mq_rq_ctx_init() like you
suggested, but since the bypass condition also looks at RQF_FLUSH_SEQ,
wouldn't we need to add an RQF_FLUSH_SEQ check to the condition in
blk_mq_rq_ctx_init() as well?

This flag is set after blk_mq_rq_ctx_init(). Are we sure that the
RQF_FLUSH_SEQ flag will only ever be set for a request for which
op_is_flush() returned true?

(If so, then only adding && !blk_op_is_passthrough(data->cmd_flags)
should be fine.)


Kind regards,
Niklas


* Re: [RFC PATCH 1/2] blk-mq: don't call callbacks for requests that bypassed the scheduler
  2021-08-30  9:48     ` Niklas Cassel
@ 2021-08-30 10:11       ` Ming Lei
  2021-09-07 12:59         ` Niklas Cassel
  0 siblings, 1 reply; 7+ messages in thread
From: Ming Lei @ 2021-08-30 10:11 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Jens Axboe, Bart Van Assche, Damien Le Moal, Paolo Valente,
	linux-block, linux-kernel

On Mon, Aug 30, 2021 at 09:48:06AM +0000, Niklas Cassel wrote:
> On Fri, Aug 27, 2021 at 09:28:07PM +0800, Ming Lei wrote:
> > On Fri, Aug 27, 2021 at 12:41:31PM +0000, Niklas Cassel wrote:
> > > From: Niklas Cassel <niklas.cassel@wdc.com>
> > > 
> > > Currently, __blk_mq_alloc_request() calls ops.prepare_request and sets
> > > RQF_ELVPRIV.
> > > 
> > > Therefore, (if the request is not a flush) the RQF_ELVPRIV flag will be
> > > set for the request in blk_mq_submit_bio(), regardless of whether the
> > > request was submitted to a scheduler or bypassed the scheduler.
> > > 
> > > Later, blk_mq_free_request() checks if the RQF_ELVPRIV flag is set;
> > > if it is, the ops.finish_request callback will be called.
> > > 
> > > The problem with this is that the finish_request scheduler callback
> > > will be called for requests that bypassed the scheduler.
> > > 
> > > Fix this by calling the scheduler ops.prepare_request callback, and
> > > setting the RQF_ELVPRIV flag, only immediately before calling the
> > > insert callback.
> > 
> > A request could be inserted more than once, such as on requeue;
> > however, __blk_mq_alloc_request() is only run once, so is it fine to
> > call ->prepare_request more than once for the same request?
> 
> Calling ->prepare_request multiple times is fine.
> All the different I/O schedulers (BFQ, mq-deadline, kyber)
> simply use .prepare_request to clear/set elv->priv to a fixed value.
> 
> > 
> > Or, I am wondering: why not call ->prepare_request when the following
> > check is true?
> > 
> > 	if (e && e->type->ops.prepare_request && !op_is_flush(data->cmd_flags) &&
> > 		!blk_op_is_passthrough(data->cmd_flags))
> > 		e->type->ops.prepare_request()
> 
> 
> That might work, and might be a nicer solution indeed.
> 
> If a request got plugged, it will be inserted into the scheduler through
> blk_flush_plug_list() -> blk_mq_flush_plug_list() ->
> blk_mq_sched_insert_requests(), which will insert it unconditionally.
> In this case, we know that !op_is_flush() (because if it were a flush,
> blk_mq_submit_bio() would have inserted it directly).
> 
> 
> If we didn't plug, we do blk_mq_sched_insert_request(), which will
> insert the request into the scheduler if blk_mq_sched_bypass_insert()
> returns false. blk_mq_sched_bypass_insert() is defined as:
> 
>         if ((rq->rq_flags & RQF_FLUSH_SEQ) || blk_rq_is_passthrough(rq))
>                 return true;
> 
> Also in this case, we know that !op_is_flush() (blk_mq_submit_bio()
> would have inserted it directly).
> 
> 
> So, we could easily add && !blk_op_is_passthrough(data->cmd_flags) to
> the ->prepare_request condition in blk_mq_rq_ctx_init() like you
> suggested, but since the bypass condition also looks at RQF_FLUSH_SEQ,
> wouldn't we need to add an RQF_FLUSH_SEQ check to the condition in
> blk_mq_rq_ctx_init() as well?
> 
> This flag is set after blk_mq_rq_ctx_init(). Are we sure that the
> RQF_FLUSH_SEQ flag will only ever be set for a request for which
> op_is_flush() returned true?
> 
> (If so, then only adding && !blk_op_is_passthrough(data->cmd_flags)
> should be fine.)

BTW, what I meant is the following change, is it fine?

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 0a33d16a7298..f98f8cc05644 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -327,20 +327,6 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 
 	data->ctx->rq_dispatched[op_is_sync(data->cmd_flags)]++;
 	refcount_set(&rq->ref, 1);
-
-	if (!op_is_flush(data->cmd_flags)) {
-		struct elevator_queue *e = data->q->elevator;
-
-		rq->elv.icq = NULL;
-		if (e && e->type->ops.prepare_request) {
-			if (e->type->icq_cache)
-				blk_mq_sched_assign_ioc(rq);
-
-			e->type->ops.prepare_request(rq);
-			rq->rq_flags |= RQF_ELVPRIV;
-		}
-	}
-
 	data->hctx->queued++;
 	return rq;
 }
@@ -359,17 +345,25 @@ static struct request *__blk_mq_alloc_request(struct blk_mq_alloc_data *data)
 	if (data->cmd_flags & REQ_NOWAIT)
 		data->flags |= BLK_MQ_REQ_NOWAIT;
 
-	if (e) {
+	if (e && !op_is_flush(data->cmd_flags) &&
+			!blk_op_is_passthrough(data->cmd_flags)) {
 		/*
 		 * Flush/passthrough requests are special and go directly to the
 		 * dispatch list. Don't include reserved tags in the
 		 * limiting, as it isn't useful.
 		 */
-		if (!op_is_flush(data->cmd_flags) &&
-		    !blk_op_is_passthrough(data->cmd_flags) &&
-		    e->type->ops.limit_depth &&
-		    !(data->flags & BLK_MQ_REQ_RESERVED))
+		if (e->type->ops.limit_depth &&
+			    !(data->flags & BLK_MQ_REQ_RESERVED))
 			e->type->ops.limit_depth(data->cmd_flags, data);
+
+		rq->elv.icq = NULL;
+		if (e->type->ops.prepare_request) {
+			if (e->type->icq_cache)
+				blk_mq_sched_assign_ioc(rq);
+
+			e->type->ops.prepare_request(rq);
+			rq->rq_flags |= RQF_ELVPRIV;
+		}
 	}
 
 retry:

Thanks, 
Ming



* Re: [RFC PATCH 1/2] blk-mq: don't call callbacks for requests that bypassed the scheduler
  2021-08-30 10:11       ` Ming Lei
@ 2021-09-07 12:59         ` Niklas Cassel
  0 siblings, 0 replies; 7+ messages in thread
From: Niklas Cassel @ 2021-09-07 12:59 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Bart Van Assche, Damien Le Moal, Paolo Valente,
	linux-block, linux-kernel

On Mon, Aug 30, 2021 at 06:11:12PM +0800, Ming Lei wrote:
> On Mon, Aug 30, 2021 at 09:48:06AM +0000, Niklas Cassel wrote:
> > On Fri, Aug 27, 2021 at 09:28:07PM +0800, Ming Lei wrote:
> > > On Fri, Aug 27, 2021 at 12:41:31PM +0000, Niklas Cassel wrote:
> > > > From: Niklas Cassel <niklas.cassel@wdc.com>
> > > > 
> > > > Currently, __blk_mq_alloc_request() calls ops.prepare_request and sets
> > > > RQF_ELVPRIV.
> > > > 
> > > > Therefore, (if the request is not a flush) the RQF_ELVPRIV flag will be
> > > > set for the request in blk_mq_submit_bio(), regardless of whether the
> > > > request was submitted to a scheduler or bypassed the scheduler.
> > > > 
> > > > Later, blk_mq_free_request() checks if the RQF_ELVPRIV flag is set;
> > > > if it is, the ops.finish_request callback will be called.
> > > > 
> > > > The problem with this is that the finish_request scheduler callback
> > > > will be called for requests that bypassed the scheduler.
> > > > 
> > > > Fix this by calling the scheduler ops.prepare_request callback, and
> > > > setting the RQF_ELVPRIV flag, only immediately before calling the
> > > > insert callback.
> > > 
> > > A request could be inserted more than once, such as on requeue;
> > > however, __blk_mq_alloc_request() is only run once, so is it fine to
> > > call ->prepare_request more than once for the same request?
> > 
> > Calling ->prepare_request multiple times is fine.
> > All the different I/O schedulers (BFQ, mq-deadline, kyber)
> > simply use .prepare_request to clear/set elv->priv to a fixed value.
> > 
> > > 
> > > Or, I am wondering: why not call ->prepare_request when the following
> > > check is true?
> > > 
> > > 	if (e && e->type->ops.prepare_request && !op_is_flush(data->cmd_flags) &&
> > > 		!blk_op_is_passthrough(data->cmd_flags))
> > > 		e->type->ops.prepare_request()
> > 
> > 
> > That might work, and might be a nicer solution indeed.
> > 
> > If a request got plugged, it will be inserted into the scheduler through
> > blk_flush_plug_list() -> blk_mq_flush_plug_list() ->
> > blk_mq_sched_insert_requests(), which will insert it unconditionally.
> > In this case, we know that !op_is_flush() (because if it were a flush,
> > blk_mq_submit_bio() would have inserted it directly).
> > 
> > 
> > If we didn't plug, we do blk_mq_sched_insert_request(), which will
> > insert the request into the scheduler if blk_mq_sched_bypass_insert()
> > returns false. blk_mq_sched_bypass_insert() is defined as:
> > 
> >         if ((rq->rq_flags & RQF_FLUSH_SEQ) || blk_rq_is_passthrough(rq))
> >                 return true;
> > 
> > Also in this case, we know that !op_is_flush() (blk_mq_submit_bio()
> > would have inserted it directly).
> > 
> > 
> > So, we could easily add && !blk_op_is_passthrough(data->cmd_flags) to
> > the ->prepare_request condition in blk_mq_rq_ctx_init() like you
> > suggested, but since the bypass condition also looks at RQF_FLUSH_SEQ,
> > wouldn't we need to add an RQF_FLUSH_SEQ check to the condition in
> > blk_mq_rq_ctx_init() as well?
> > 
> > This flag is set after blk_mq_rq_ctx_init(). Are we sure that the
> > RQF_FLUSH_SEQ flag will only ever be set for a request for which
> > op_is_flush() returned true?
> > 
> > (If so, then only adding && !blk_op_is_passthrough(data->cmd_flags)
> > should be fine.)
> 
> BTW, what I meant is the following change, is it fine?
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 0a33d16a7298..f98f8cc05644 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -327,20 +327,6 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
>  
>  	data->ctx->rq_dispatched[op_is_sync(data->cmd_flags)]++;
>  	refcount_set(&rq->ref, 1);
> -
> -	if (!op_is_flush(data->cmd_flags)) {
> -		struct elevator_queue *e = data->q->elevator;
> -
> -		rq->elv.icq = NULL;
> -		if (e && e->type->ops.prepare_request) {
> -			if (e->type->icq_cache)
> -				blk_mq_sched_assign_ioc(rq);
> -
> -			e->type->ops.prepare_request(rq);
> -			rq->rq_flags |= RQF_ELVPRIV;
> -		}
> -	}
> -
>  	data->hctx->queued++;
>  	return rq;
>  }
> @@ -359,17 +345,25 @@ static struct request *__blk_mq_alloc_request(struct blk_mq_alloc_data *data)
>  	if (data->cmd_flags & REQ_NOWAIT)
>  		data->flags |= BLK_MQ_REQ_NOWAIT;
>  
> -	if (e) {
> +	if (e && !op_is_flush(data->cmd_flags) &&
> +			!blk_op_is_passthrough(data->cmd_flags)) {
>  		/*
>  		 * Flush/passthrough requests are special and go directly to the
>  		 * dispatch list. Don't include reserved tags in the
>  		 * limiting, as it isn't useful.
>  		 */
> -		if (!op_is_flush(data->cmd_flags) &&
> -		    !blk_op_is_passthrough(data->cmd_flags) &&
> -		    e->type->ops.limit_depth &&
> -		    !(data->flags & BLK_MQ_REQ_RESERVED))
> +		if (e->type->ops.limit_depth &&
> +			    !(data->flags & BLK_MQ_REQ_RESERVED))
>  			e->type->ops.limit_depth(data->cmd_flags, data);
> +
> +		rq->elv.icq = NULL;
> +		if (e->type->ops.prepare_request) {
> +			if (e->type->icq_cache)
> +				blk_mq_sched_assign_ioc(rq);
> +
> +			e->type->ops.prepare_request(rq);
> +			rq->rq_flags |= RQF_ELVPRIV;
> +		}
>  	}
>  
>  retry:
> 

Hello Ming,


Sorry for the delayed reply.

Your patch does not compile, because rq is not defined in this function.

Another problem seems to be that __blk_mq_alloc_request(), at the end
of the function, calls blk_mq_rq_ctx_init(), which will unconditionally
set rq->rq_flags = 0;
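
I.e., roughly (a sketch of the end of __blk_mq_alloc_request()):

	tag = blk_mq_get_tag(data);
	...
	/* blk_mq_rq_ctx_init() does rq->rq_flags = 0; so any flag set
	 * on the request before this point would be wiped: */
	return blk_mq_rq_ctx_init(data, tag, alloc_time_ns);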


The simple patch:

--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -328,7 +328,8 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
        data->ctx->rq_dispatched[op_is_sync(data->cmd_flags)]++;
        refcount_set(&rq->ref, 1);
 
-       if (!op_is_flush(data->cmd_flags)) {
+       if (!op_is_flush(data->cmd_flags) &&
+           !blk_op_is_passthrough(data->cmd_flags)) {
                struct elevator_queue *e = data->q->elevator;
 
                rq->elv.icq = NULL;



This does appear to solve the problem.

My only worry was the RQF_FLUSH_SEQ flag, but as far as I can tell, it
is only ever set for a request for which op_is_flush() returned true.
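
As far as I can see, RQF_FLUSH_SEQ is only ever set from the flush
machinery in block/blk-flush.c, roughly (a sketch, not the exact code):

	/* blk_insert_flush(), reached from blk_mq_submit_bio() only
	 * when op_is_flush() is true: */
	rq->rq_flags |= RQF_FLUSH_SEQ;

	/* blk_kick_flush(), on the internal flush request, which has
	 * REQ_PREFLUSH set, so op_is_flush() is true for it as well: */
	flush_rq->rq_flags |= RQF_FLUSH_SEQ;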


Kind regards,
Niklas

