linux-block.vger.kernel.org archive mirror
From: Niklas Cassel <Niklas.Cassel@wdc.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	Bart Van Assche <bvanassche@acm.org>,
	Damien Le Moal <Damien.LeMoal@wdc.com>,
	Paolo Valente <paolo.valente@linaro.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 1/2] blk-mq: don't call callbacks for requests that bypassed the scheduler
Date: Tue, 7 Sep 2021 12:59:32 +0000
Message-ID: <YTdiMoPNa/10VWSC@x1-carbon>
In-Reply-To: <YSyuwCKi2sS/RaXS@T590>

On Mon, Aug 30, 2021 at 06:11:12PM +0800, Ming Lei wrote:
> On Mon, Aug 30, 2021 at 09:48:06AM +0000, Niklas Cassel wrote:
> > On Fri, Aug 27, 2021 at 09:28:07PM +0800, Ming Lei wrote:
> > > On Fri, Aug 27, 2021 at 12:41:31PM +0000, Niklas Cassel wrote:
> > > > From: Niklas Cassel <niklas.cassel@wdc.com>
> > > > 
> > > > Currently, __blk_mq_alloc_request() calls ops.prepare_request and sets
> > > > RQF_ELVPRIV.
> > > > 
> > > > Therefore, (if the request is not a flush) the RQF_ELVPRIV flag will be
> > > > set for the request in blk_mq_submit_bio(), regardless of whether the
> > > > request was submitted to a scheduler or bypassed the scheduler.
> > > > 
> > > > Later, blk_mq_free_request() checks whether the RQF_ELVPRIV flag is set;
> > > > if it is, the ops.finish_request callback will be called.
> > > > 
> > > > The problem with this is that the finish_request scheduler callback
> > > > will be called for requests that bypassed the scheduler.
> > > > 
> > > > Fix this by calling the scheduler ops.prepare_request callback, and
> > > > set the RQF_ELVPRIV flag only immediately before calling the insert
> > > > callback.
> > > 
> > > One request could be inserted more than once, e.g. on requeue,
> > > whereas __blk_mq_alloc_request() is run just once, so is it fine to
> > > call ->prepare_request more than once for the same request?
> > 
> > Calling ->prepare_request multiple times is fine.
> > All the different I/O schedulers (BFQ, mq-deadline, kyber)
> > simply use .prepare_request to clear/set rq->elv.priv to a fixed value.
> > 
> > > 
> > > Or I am wondering why not call ->prepare_request when the following
> > > check is true?
> > > 
> > > 	if (e && e->type->ops.prepare_request && !op_is_flush(data->cmd_flags) &&
> > > 		!blk_op_is_passthrough(data->cmd_flags))
> > > 		e->type->ops.prepare_request()
> > 
> > 
> > That might work, and might be a nicer solution indeed.
> > 
> > If a request got plugged, it will be inserted into the scheduler through
> > blk_flush_plug_list() -> blk_mq_flush_plug_list() -> blk_mq_sched_insert_requests(),
> > which inserts plugged requests unconditionally.
> > In this case, we know that !op_is_flush() (because if it was a flush,
> > blk_mq_submit_bio() would have inserted it directly).
> > 
> > 
> > If we didn't plug, we do blk_mq_sched_insert_request(), which will add it if
> > blk_mq_sched_bypass_insert() returns false:
> > 
> > blk_mq_sched_bypass_insert() is defined as:
> > 
> >         if ((rq->rq_flags & RQF_FLUSH_SEQ) || blk_rq_is_passthrough(rq))
> >                 return true;
> > 
> > Also in this case, we know that !op_is_flush() (blk_mq_submit_bio() would have
> > inserted it directly).
> > 
> > 
> > So, we could easily add && !blk_op_is_passthrough(data->cmd_flags) to the
> > ->prepare_request condition in blk_mq_rq_ctx_init() like you suggested,
> > but since the bypass condition also seems to look at RQF_FLUSH_SEQ, wouldn't
> > we need to add RQF_FLUSH_SEQ to the condition in blk_mq_rq_ctx_init() as well?
> > 
> > This flag is set after blk_mq_rq_ctx_init(). Are we sure that the RQF_FLUSH_SEQ
> > flag will only be set for a request for which op_is_flush() returned true?
> > 
> > (If so, then only adding && !blk_op_is_passthrough(data->cmd_flags) should
> > be fine.)
> 
> BTW, what I meant is the following change, is it fine?
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 0a33d16a7298..f98f8cc05644 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -327,20 +327,6 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
>  
>  	data->ctx->rq_dispatched[op_is_sync(data->cmd_flags)]++;
>  	refcount_set(&rq->ref, 1);
> -
> -	if (!op_is_flush(data->cmd_flags)) {
> -		struct elevator_queue *e = data->q->elevator;
> -
> -		rq->elv.icq = NULL;
> -		if (e && e->type->ops.prepare_request) {
> -			if (e->type->icq_cache)
> -				blk_mq_sched_assign_ioc(rq);
> -
> -			e->type->ops.prepare_request(rq);
> -			rq->rq_flags |= RQF_ELVPRIV;
> -		}
> -	}
> -
>  	data->hctx->queued++;
>  	return rq;
>  }
> @@ -359,17 +345,25 @@ static struct request *__blk_mq_alloc_request(struct blk_mq_alloc_data *data)
>  	if (data->cmd_flags & REQ_NOWAIT)
>  		data->flags |= BLK_MQ_REQ_NOWAIT;
>  
> -	if (e) {
> +	if (e && !op_is_flush(data->cmd_flags) &&
> +			!blk_op_is_passthrough(data->cmd_flags)) {
>  		/*
>  		 * Flush/passthrough requests are special and go directly to the
>  		 * dispatch list. Don't include reserved tags in the
>  		 * limiting, as it isn't useful.
>  		 */
> -		if (!op_is_flush(data->cmd_flags) &&
> -		    !blk_op_is_passthrough(data->cmd_flags) &&
> -		    e->type->ops.limit_depth &&
> -		    !(data->flags & BLK_MQ_REQ_RESERVED))
> +		if (e->type->ops.limit_depth &&
> +			    !(data->flags & BLK_MQ_REQ_RESERVED))
>  			e->type->ops.limit_depth(data->cmd_flags, data);
> +
> +		rq->elv.icq = NULL;
> +		if (e->type->ops.prepare_request) {
> +			if (e->type->icq_cache)
> +				blk_mq_sched_assign_ioc(rq);
> +
> +			e->type->ops.prepare_request(rq);
> +			rq->rq_flags |= RQF_ELVPRIV;
> +		}
>  	}
>  
>  retry:
> 

Hello Ming,


Sorry for the delayed reply.

Your patch does not compile, because rq is not defined in this function.

Another problem seems to be that __blk_mq_alloc_request() calls
blk_mq_rq_ctx_init() at the end of the function, which unconditionally
sets rq->rq_flags = 0, so an RQF_ELVPRIV flag set before that call would
be lost.
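
To illustrate the ordering problem, here is a minimal userspace sketch.
It is not kernel code: struct model_request, MODEL_RQF_ELVPRIV (whose
value does not match the kernel's) and all model_*() functions are
hypothetical stand-ins that only mimic the "flags are zeroed during
init" behaviour described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for RQF_ELVPRIV; not the kernel's value. */
#define MODEL_RQF_ELVPRIV (1u << 0)

/* Hypothetical, stripped-down stand-in for struct request. */
struct model_request {
	unsigned int rq_flags;
};

/* Mimics the unconditional "rq->rq_flags = 0" done during request init. */
void model_rq_ctx_init(struct model_request *rq)
{
	assert(rq != NULL);
	rq->rq_flags = 0;
}

/* Flag set before init: the init step wipes it out again. */
bool model_flag_survives_set_before_init(void)
{
	struct model_request rq = { 0 };

	rq.rq_flags |= MODEL_RQF_ELVPRIV;
	model_rq_ctx_init(&rq);
	return (rq.rq_flags & MODEL_RQF_ELVPRIV) != 0;
}

/* Flag set after init: the flag is still there afterwards. */
bool model_flag_survives_set_after_init(void)
{
	struct model_request rq = { 0 };

	model_rq_ctx_init(&rq);
	rq.rq_flags |= MODEL_RQF_ELVPRIV;
	return (rq.rq_flags & MODEL_RQF_ELVPRIV) != 0;
}
```

In this model, only the "set after init" variant keeps the flag, which
is why keeping the prepare_request/RQF_ELVPRIV logic inside
blk_mq_rq_ctx_init() avoids the issue.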


The simple patch:

--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -328,7 +328,8 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
        data->ctx->rq_dispatched[op_is_sync(data->cmd_flags)]++;
        refcount_set(&rq->ref, 1);
 
-       if (!op_is_flush(data->cmd_flags)) {
+       if (!op_is_flush(data->cmd_flags) &&
+           !blk_op_is_passthrough(data->cmd_flags)) {
                struct elevator_queue *e = data->q->elevator;
 
                rq->elv.icq = NULL;



does appear to solve the problem.

My only worry was the RQF_FLUSH_SEQ flag, but as far as I can tell, it is
only ever set for a request for which op_is_flush() returned true.
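
As a sanity check of that reasoning, here is a small userspace sketch
that enumerates all flag combinations. It is only a model: struct
model_rq and the model_*() functions are hypothetical, and the two
predicates are simplified copies of the conditions discussed in this
thread (the prepare condition from the patch above, and the
blk_mq_sched_bypass_insert() condition quoted earlier).

```c
#include <stdbool.h>

/* Hypothetical model of the three properties discussed in this thread. */
struct model_rq {
	bool is_flush;       /* models op_is_flush(cmd_flags) */
	bool is_passthrough; /* models blk_op_is_passthrough(cmd_flags) */
	bool flush_seq;      /* models rq_flags & RQF_FLUSH_SEQ */
};

/* Models the proposed condition for calling ->prepare_request. */
bool model_should_prepare(const struct model_rq *rq)
{
	return !rq->is_flush && !rq->is_passthrough;
}

/* Models the quoted blk_mq_sched_bypass_insert() condition. */
bool model_bypasses_scheduler(const struct model_rq *rq)
{
	return rq->flush_seq || rq->is_passthrough;
}

/*
 * Count the flag combinations that would be prepared and still bypass
 * the scheduler. If assume_seq_implies_flush is true, combinations with
 * flush_seq set but is_flush clear are treated as impossible and skipped.
 */
int model_count_prepared_but_bypassed(bool assume_seq_implies_flush)
{
	int bad = 0;

	for (int bits = 0; bits < 8; bits++) {
		struct model_rq rq = {
			.is_flush = (bits & 1) != 0,
			.is_passthrough = (bits & 2) != 0,
			.flush_seq = (bits & 4) != 0,
		};

		if (assume_seq_implies_flush && rq.flush_seq && !rq.is_flush)
			continue;
		if (model_should_prepare(&rq) && model_bypasses_scheduler(&rq))
			bad++;
	}
	return bad;
}
```

Under the assumption that RQF_FLUSH_SEQ implies op_is_flush(), the model
finds no prepared request that bypasses the scheduler; without that
assumption, exactly one combination (flush_seq set on a non-flush,
non-passthrough request) would slip through.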


Kind regards,
Niklas

Thread overview: 7+ messages
2021-08-27 12:41 [RFC PATCH 0/2] improve io scheduler callback triggering Niklas Cassel
2021-08-27 12:41 ` [RFC PATCH 1/2] blk-mq: don't call callbacks for requests that bypassed the scheduler Niklas Cassel
2021-08-27 13:28   ` Ming Lei
2021-08-30  9:48     ` Niklas Cassel
2021-08-30 10:11       ` Ming Lei
2021-09-07 12:59         ` Niklas Cassel [this message]
2021-08-27 12:41 ` [RFC PATCH 2/2] Revert "mq-deadline: Fix request accounting" Niklas Cassel
