[PATCH V2 0/2] blk-mq: handle passthrough request as really passthrough

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH V2 0/2] blk-mq: handle passthrough request as really passthrough
@ 2023-05-15 14:45 Ming Lei
  2023-05-15 14:46 ` [PATCH V2 1/2] blk-mq: don't queue plugged passthrough requests into scheduler Ming Lei
  2023-05-15 14:46 ` [PATCH V2 2/2] blk-mq: make sure elevator callbacks aren't called for passthrough request Ming Lei
  0 siblings, 2 replies; 17+ messages in thread
From: Ming Lei @ 2023-05-15 14:45 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Ming Lei

Hello,

The 1st patch avoids to queue pt requests from plug list into scheduler.

The 2nd patch avoids to call elevator callbacks for passthrough.

V2:
	- improve patch style as suggested by Jens (1/2)
	- add comments (1/2)
	- add patch 2


Ming Lei (2):
  blk-mq: don't queue plugged passthrough requests into scheduler
  blk-mq: make sure elevator callbacks aren't called for passthrough
    request

 block/blk-mq-sched.h |  9 +++++++--
 block/blk-mq.c       | 23 ++++++++++++++++++-----
 2 files changed, 25 insertions(+), 7 deletions(-)

-- 
2.40.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH V2 1/2] blk-mq: don't queue plugged passthrough requests into scheduler
  2023-05-15 14:45 [PATCH V2 0/2] blk-mq: handle passthrough request as really passthrough Ming Lei
@ 2023-05-15 14:46 ` Ming Lei
  2023-05-16  6:22   ` Christoph Hellwig
  2023-05-15 14:46 ` [PATCH V2 2/2] blk-mq: make sure elevator callbacks aren't called for passthrough request Ming Lei
  1 sibling, 1 reply; 17+ messages in thread
From: Ming Lei @ 2023-05-15 14:46 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Ming Lei, Guangwu Zhang, Yu Kuai

Passthrough(pt) request shouldn't be queued to scheduler, especially some
schedulers(such as bfq) supposes that req->bio is always available, then
blk-cgroup can be retrieved via bio.

We never let passthrough request cross scheduler before commit 1c2d2fff6dc0
("block: wire-up support for passthrough plugging").

Fix this issue by queuing pt request from plug list to hctx->dispatch
directly.

Reported-by: Guangwu Zhang <guazhang@redhat.com>
Closes: https://lore.kernel.org/linux-block/CAGS2=YosaYaUTEMU3uaf+y=8MqSrhL7sYsJn8EwbaM=76p_4Qg@mail.gmail.com/
Investigated-by: Yu Kuai <yukuai1@huaweicloud.com>
Fixes: 1c2d2fff6dc0 ("block: wire-up support for passthrough plugging")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f6dad0886a2f..b4aaf42f1125 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2711,6 +2711,7 @@ static void blk_mq_dispatch_plug_list(struct blk_plug *plug, bool from_sched)
 	struct request *requeue_list = NULL;
 	struct request **requeue_lastp = &requeue_list;
 	unsigned int depth = 0;
+	bool pt = false;
 	LIST_HEAD(list);
 
 	do {
@@ -2719,7 +2720,14 @@ static void blk_mq_dispatch_plug_list(struct blk_plug *plug, bool from_sched)
 		if (!this_hctx) {
 			this_hctx = rq->mq_hctx;
 			this_ctx = rq->mq_ctx;
-		} else if (this_hctx != rq->mq_hctx || this_ctx != rq->mq_ctx) {
+			pt = blk_rq_is_passthrough(rq);
+		} else if (this_hctx != rq->mq_hctx || this_ctx != rq->mq_ctx ||
+				pt != blk_rq_is_passthrough(rq)) {
+			/*
+			 * Both passthrough and flush request don't belong to
+			 * scheduler, but flush request won't be added to plug
+			 * list, so needn't handle here.
+			 */
 			rq_list_add_tail(&requeue_lastp, rq);
 			continue;
 		}
@@ -2731,7 +2739,13 @@ static void blk_mq_dispatch_plug_list(struct blk_plug *plug, bool from_sched)
 	trace_block_unplug(this_hctx->queue, depth, !from_sched);
 
 	percpu_ref_get(&this_hctx->queue->q_usage_counter);
-	if (this_hctx->queue->elevator) {
+	/* passthrough requests don't belong to scheduler */
+	if (pt) {
+		spin_lock(&this_hctx->lock);
+		list_splice_tail_init(&list, &this_hctx->dispatch);
+		spin_unlock(&this_hctx->lock);
+		blk_mq_run_hw_queue(this_hctx, from_sched);
+	} else if (this_hctx->queue->elevator) {
 		this_hctx->queue->elevator->type->ops.insert_requests(this_hctx,
 				&list, 0);
 		blk_mq_run_hw_queue(this_hctx, from_sched);
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH V2 2/2] blk-mq: make sure elevator callbacks aren't called for passthrough request
  2023-05-15 14:45 [PATCH V2 0/2] blk-mq: handle passthrough request as really passthrough Ming Lei
  2023-05-15 14:46 ` [PATCH V2 1/2] blk-mq: don't queue plugged passthrough requests into scheduler Ming Lei
@ 2023-05-15 14:46 ` Ming Lei
  2023-05-15 15:52   ` Bart Van Assche
  1 sibling, 1 reply; 17+ messages in thread
From: Ming Lei @ 2023-05-15 14:46 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Ming Lei

In case of q->elevator, passthrought request can still be marked as RQF_ELV,
so some elevator callbacks will be called for passthrough request.

Add helper of blk_mq_bypass_sched(), so that we can bypass elevator
callbacks for both flush and passthrough request.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.h | 9 +++++++--
 block/blk-mq.c       | 5 ++---
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 7c3cbad17f30..4aa879749843 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -22,6 +22,11 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e);
 void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e);
 void blk_mq_sched_free_rqs(struct request_queue *q);
 
+static inline bool blk_mq_bypass_sched(blk_opf_t cmd_flags)
+{
+	return op_is_flush(cmd_flags) || blk_op_is_passthrough(cmd_flags);
+}
+
 static inline void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
 {
 	if (test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
@@ -48,7 +53,7 @@ blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
 
 static inline void blk_mq_sched_completed_request(struct request *rq, u64 now)
 {
-	if (rq->rq_flags & RQF_ELV) {
+	if ((rq->rq_flags & RQF_ELV) && !blk_mq_bypass_sched(rq->cmd_flags)) {
 		struct elevator_queue *e = rq->q->elevator;
 
 		if (e->type->ops.completed_request)
@@ -58,7 +63,7 @@ static inline void blk_mq_sched_completed_request(struct request *rq, u64 now)
 
 static inline void blk_mq_sched_requeue_request(struct request *rq)
 {
-	if (rq->rq_flags & RQF_ELV) {
+	if ((rq->rq_flags & RQF_ELV) && !blk_mq_bypass_sched(rq->cmd_flags)) {
 		struct request_queue *q = rq->q;
 		struct elevator_queue *e = q->elevator;
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index b4aaf42f1125..bd80fe3aa0c3 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -392,7 +392,7 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 		INIT_HLIST_NODE(&rq->hash);
 		RB_CLEAR_NODE(&rq->rb_node);
 
-		if (!op_is_flush(data->cmd_flags) &&
+		if (!blk_mq_bypass_sched(data->cmd_flags) &&
 		    e->type->ops.prepare_request) {
 			e->type->ops.prepare_request(rq);
 			rq->rq_flags |= RQF_ELVPRIV;
@@ -458,8 +458,7 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
 		 * dispatch list. Don't include reserved tags in the
 		 * limiting, as it isn't useful.
 		 */
-		if (!op_is_flush(data->cmd_flags) &&
-		    !blk_op_is_passthrough(data->cmd_flags) &&
+		if (!blk_mq_bypass_sched(data->cmd_flags) &&
 		    e->type->ops.limit_depth &&
 		    !(data->flags & BLK_MQ_REQ_RESERVED))
 			e->type->ops.limit_depth(data->cmd_flags, data);
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH V2 2/2] blk-mq: make sure elevator callbacks aren't called for passthrough request
  2023-05-15 14:46 ` [PATCH V2 2/2] blk-mq: make sure elevator callbacks aren't called for passthrough request Ming Lei
@ 2023-05-15 15:52   ` Bart Van Assche
  2023-05-15 20:22     ` Keith Busch
  0 siblings, 1 reply; 17+ messages in thread
From: Bart Van Assche @ 2023-05-15 15:52 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe; +Cc: linux-block, Christoph Hellwig

On 5/15/23 07:46, Ming Lei wrote:
> @@ -48,7 +53,7 @@ blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
>   
>   static inline void blk_mq_sched_completed_request(struct request *rq, u64 now)
>   {
> -	if (rq->rq_flags & RQF_ELV) {
> +	if ((rq->rq_flags & RQF_ELV) && !blk_mq_bypass_sched(rq->cmd_flags)) {
>   		struct elevator_queue *e = rq->q->elevator;
>   
>   		if (e->type->ops.completed_request)
> @@ -58,7 +63,7 @@ static inline void blk_mq_sched_completed_request(struct request *rq, u64 now)
>   
>   static inline void blk_mq_sched_requeue_request(struct request *rq)
>   {
> -	if (rq->rq_flags & RQF_ELV) {
> +	if ((rq->rq_flags & RQF_ELV) && !blk_mq_bypass_sched(rq->cmd_flags)) {
>   		struct request_queue *q = rq->q;
>   		struct elevator_queue *e = q->elevator;

Has it been considered not to set RQF_ELV for passthrough requests 
instead of making the above changes?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V2 2/2] blk-mq: make sure elevator callbacks aren't called for passthrough request
  2023-05-15 15:52   ` Bart Van Assche
@ 2023-05-15 20:22     ` Keith Busch
  2023-05-16  1:20       ` Ming Lei
  0 siblings, 1 reply; 17+ messages in thread
From: Keith Busch @ 2023-05-15 20:22 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Ming Lei, Jens Axboe, linux-block, Christoph Hellwig

On Mon, May 15, 2023 at 08:52:38AM -0700, Bart Van Assche wrote:
> On 5/15/23 07:46, Ming Lei wrote:
> > @@ -48,7 +53,7 @@ blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
> >   static inline void blk_mq_sched_completed_request(struct request *rq, u64 now)
> >   {
> > -	if (rq->rq_flags & RQF_ELV) {
> > +	if ((rq->rq_flags & RQF_ELV) && !blk_mq_bypass_sched(rq->cmd_flags)) {
> >   		struct elevator_queue *e = rq->q->elevator;
> >   		if (e->type->ops.completed_request)
> > @@ -58,7 +63,7 @@ static inline void blk_mq_sched_completed_request(struct request *rq, u64 now)
> >   static inline void blk_mq_sched_requeue_request(struct request *rq)
> >   {
> > -	if (rq->rq_flags & RQF_ELV) {
> > +	if ((rq->rq_flags & RQF_ELV) && !blk_mq_bypass_sched(rq->cmd_flags)) {
> >   		struct request_queue *q = rq->q;
> >   		struct elevator_queue *e = q->elevator;
> 
> Has it been considered not to set RQF_ELV for passthrough requests instead
> of making the above changes?

That sounds like a good idea. It changes more behavior than what Ming is
targeting here, but after looking through each use for RQF_ELV, I think
not having that set really is the right thing to do in all cases for
passthrough requests.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V2 2/2] blk-mq: make sure elevator callbacks aren't called for passthrough request
  2023-05-15 20:22     ` Keith Busch
@ 2023-05-16  1:20       ` Ming Lei
  2023-05-16  6:24         ` Christoph Hellwig
  2023-05-16 14:47         ` Keith Busch
  0 siblings, 2 replies; 17+ messages in thread
From: Ming Lei @ 2023-05-16  1:20 UTC (permalink / raw)
  To: Keith Busch
  Cc: Bart Van Assche, Jens Axboe, linux-block, Christoph Hellwig, ming.lei

On Mon, May 15, 2023 at 02:22:18PM -0600, Keith Busch wrote:
> On Mon, May 15, 2023 at 08:52:38AM -0700, Bart Van Assche wrote:
> > On 5/15/23 07:46, Ming Lei wrote:
> > > @@ -48,7 +53,7 @@ blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
> > >   static inline void blk_mq_sched_completed_request(struct request *rq, u64 now)
> > >   {
> > > -	if (rq->rq_flags & RQF_ELV) {
> > > +	if ((rq->rq_flags & RQF_ELV) && !blk_mq_bypass_sched(rq->cmd_flags)) {
> > >   		struct elevator_queue *e = rq->q->elevator;
> > >   		if (e->type->ops.completed_request)
> > > @@ -58,7 +63,7 @@ static inline void blk_mq_sched_completed_request(struct request *rq, u64 now)
> > >   static inline void blk_mq_sched_requeue_request(struct request *rq)
> > >   {
> > > -	if (rq->rq_flags & RQF_ELV) {
> > > +	if ((rq->rq_flags & RQF_ELV) && !blk_mq_bypass_sched(rq->cmd_flags)) {
> > >   		struct request_queue *q = rq->q;
> > >   		struct elevator_queue *e = q->elevator;
> > 
> > Has it been considered not to set RQF_ELV for passthrough requests instead
> > of making the above changes?
> 
> That sounds like a good idea. It changes more behavior than what Ming is
> targeting here, but after looking through each use for RQF_ELV, I think
> not having that set really is the right thing to do in all cases for
> passthrough requests.

I did consider that approach. But:

- RQF_ELV actually means that the request & its tag is allocated from sched tags.

- if RQF_ELV is cleared for passthrough request, request may be
  allocated from sched tags(normal IO) and driver tags(passthrough) at the same time.
  This way may cause other problem, such as, breaking blk_mq_hctx_has_requests().
  Meantime it becomes not likely to optimize tags resource utilization in future,
  at least for single LUN/NS, no need to keep sched tags & driver tags
  in memory at the same time.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V2 1/2] blk-mq: don't queue plugged passthrough requests into scheduler
  2023-05-15 14:46 ` [PATCH V2 1/2] blk-mq: don't queue plugged passthrough requests into scheduler Ming Lei
@ 2023-05-16  6:22   ` Christoph Hellwig
  2023-05-16  8:10     ` Ming Lei
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2023-05-16  6:22 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Guangwu Zhang, Yu Kuai

On Mon, May 15, 2023 at 10:46:00PM +0800, Ming Lei wrote:
> +		} else if (this_hctx != rq->mq_hctx || this_ctx != rq->mq_ctx ||
> +				pt != blk_rq_is_passthrough(rq)) {

Can your format this as:

		} else if (this_hctx != rq->mq_hctx || this_ctx != rq->mq_ctx ||
			   pt != blk_rq_is_passthrough(rq)) {

for readability?

> +			/*
> +			 * Both passthrough and flush request don't belong to
> +			 * scheduler, but flush request won't be added to plug
> +			 * list, so needn't handle here.
> +			 */
>  			rq_list_add_tail(&requeue_lastp, rq);

This comment confuses the heck out of me.  The check if for passthrough
vs non-passthrough and doesn't involved flush requests at all.

I'd prefer to drop it, and instead comment on passthrough requests
not going to the scheduled below where we actually issue other requests
to the scheduler.

> +	if (pt) {
> +		spin_lock(&this_hctx->lock);
> +		list_splice_tail_init(&list, &this_hctx->dispatch);
> +		spin_unlock(&this_hctx->lock);
> +		blk_mq_run_hw_queue(this_hctx, from_sched);

.. aka here.  But why can't we just use the blk_mq_insert_requests
for this case anyway?

As in just doing a:


-	if (this_hctx->queue->elevator) {
+	if (this_hctx->queue->elevator && !pt) {

?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V2 2/2] blk-mq: make sure elevator callbacks aren't called for passthrough request
  2023-05-16  1:20       ` Ming Lei
@ 2023-05-16  6:24         ` Christoph Hellwig
  2023-05-16  8:39           ` Ming Lei
  2023-05-16 14:47         ` Keith Busch
  1 sibling, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2023-05-16  6:24 UTC (permalink / raw)
  To: Ming Lei
  Cc: Keith Busch, Bart Van Assche, Jens Axboe, linux-block, Christoph Hellwig

On Tue, May 16, 2023 at 09:20:55AM +0800, Ming Lei wrote:
> > That sounds like a good idea. It changes more behavior than what Ming is
> > targeting here, but after looking through each use for RQF_ELV, I think
> > not having that set really is the right thing to do in all cases for
> > passthrough requests.
> 
> I did consider that approach. But:
> 
> - RQF_ELV actually means that the request & its tag is allocated from sched tags.
> 
> - if RQF_ELV is cleared for passthrough request, request may be
>   allocated from sched tags(normal IO) and driver tags(passthrough) at the same time.
>   This way may cause other problem, such as, breaking blk_mq_hctx_has_requests().
>   Meantime it becomes not likely to optimize tags resource utilization in future,
>   at least for single LUN/NS, no need to keep sched tags & driver tags
>   in memory at the same time.

Then make that obvious.  That is:

 - rename RQF_ELV to RQV_SCHED_TAGS
 - add the RQV_SCHED_TAGS check to your blk_mq_bypass_sched helper.
   I'd also invert the return value and rename it to someting like
   blk_rq_use_sched.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V2 1/2] blk-mq: don't queue plugged passthrough requests into scheduler
  2023-05-16  6:22   ` Christoph Hellwig
@ 2023-05-16  8:10     ` Ming Lei
  2023-05-17  7:20       ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Ming Lei @ 2023-05-16  8:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Guangwu Zhang, Yu Kuai, ming.lei

On Tue, May 16, 2023 at 08:22:21AM +0200, Christoph Hellwig wrote:
> On Mon, May 15, 2023 at 10:46:00PM +0800, Ming Lei wrote:
> > +		} else if (this_hctx != rq->mq_hctx || this_ctx != rq->mq_ctx ||
> > +				pt != blk_rq_is_passthrough(rq)) {
> 
> Can your format this as:
> 
> 		} else if (this_hctx != rq->mq_hctx || this_ctx != rq->mq_ctx ||
> 			   pt != blk_rq_is_passthrough(rq)) {
> 
> for readability?

Do you mean indent for 'pt = blk_rq_is_passthrough(rq)' and keep 'pt' aligned
with 'if' in last line? Otherwise, can't see any difference between the two, :-(

> 
> > +			/*
> > +			 * Both passthrough and flush request don't belong to
> > +			 * scheduler, but flush request won't be added to plug
> > +			 * list, so needn't handle here.
> > +			 */
> >  			rq_list_add_tail(&requeue_lastp, rq);
> 
> This comment confuses the heck out of me.  The check if for passthrough
> vs non-passthrough and doesn't involved flush requests at all.
> 
> I'd prefer to drop it, and instead comment on passthrough requests
> not going to the scheduled below where we actually issue other requests
> to the scheduler.

Any request can be in plug list in theory, we just don't add flush request
to plug, that is why the above comment is added. If you don't like the
words for flush request, I can drop it.

> 
> > +	if (pt) {
> > +		spin_lock(&this_hctx->lock);
> > +		list_splice_tail_init(&list, &this_hctx->dispatch);
> > +		spin_unlock(&this_hctx->lock);
> > +		blk_mq_run_hw_queue(this_hctx, from_sched);
> 
> .. aka here.  But why can't we just use the blk_mq_insert_requests
> for this case anyway?

If the pt request is part of error recovery, it should be issued to
->dispatch list directly, so just for the sake of safety, meantime keep
same behavior with blk_mq_insert_request().

Thanks,
Ming


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V2 2/2] blk-mq: make sure elevator callbacks aren't called for passthrough request
  2023-05-16  6:24         ` Christoph Hellwig
@ 2023-05-16  8:39           ` Ming Lei
  2023-05-17  7:22             ` Christoph Hellwig
  0 siblings, 1 reply; 17+ messages in thread
From: Ming Lei @ 2023-05-16  8:39 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, Bart Van Assche, Jens Axboe, linux-block, ming.lei

On Tue, May 16, 2023 at 08:24:09AM +0200, Christoph Hellwig wrote:
> On Tue, May 16, 2023 at 09:20:55AM +0800, Ming Lei wrote:
> > > That sounds like a good idea. It changes more behavior than what Ming is
> > > targeting here, but after looking through each use for RQF_ELV, I think
> > > not having that set really is the right thing to do in all cases for
> > > passthrough requests.
> > 
> > I did consider that approach. But:
> > 
> > - RQF_ELV actually means that the request & its tag is allocated from sched tags.
> > 
> > - if RQF_ELV is cleared for passthrough request, request may be
> >   allocated from sched tags(normal IO) and driver tags(passthrough) at the same time.
> >   This way may cause other problem, such as, breaking blk_mq_hctx_has_requests().
> >   Meantime it becomes not likely to optimize tags resource utilization in future,
> >   at least for single LUN/NS, no need to keep sched tags & driver tags
> >   in memory at the same time.
> 
> Then make that obvious.  That is:
> 
>  - rename RQF_ELV to RQV_SCHED_TAGS
>  - add the RQV_SCHED_TAGS check to your blk_mq_bypass_sched helper.
>    I'd also invert the return value and rename it to someting like
>    blk_rq_use_sched.

I can understand the point, but it may not be done by single flag, so
how about the following change?

diff --git a/block/blk-mq.c b/block/blk-mq.c
index d1b4aae43cf9..eddc2b5f3319 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -354,7 +354,7 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 		data->rq_flags |= RQF_IO_STAT;
 	rq->rq_flags = data->rq_flags;
 
-	if (!(data->rq_flags & RQF_ELV)) {
+	if (!(data->rq_flags & RQF_SCHED_TAGS)) {
 		rq->tag = tag;
 		rq->internal_tag = BLK_MQ_NO_TAG;
 	} else {
@@ -392,8 +392,7 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 		INIT_HLIST_NODE(&rq->hash);
 		RB_CLEAR_NODE(&rq->rb_node);
 
-		if (!op_is_flush(data->cmd_flags) &&
-		    e->type->ops.prepare_request) {
+		if (e->type->ops.prepare_request) {
 			e->type->ops.prepare_request(rq);
 			rq->rq_flags |= RQF_ELVPRIV;
 		}
@@ -451,15 +450,19 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
 	if (q->elevator) {
 		struct elevator_queue *e = q->elevator;
 
-		data->rq_flags |= RQF_ELV;
+		data->rq_flags |= RQF_SCHED_TAGS;
+
+		/* both flush and passthrough request can't go into scheduler */
+		if (!op_is_flush(data->cmd_flags) &&
+		    !blk_op_is_passthrough(data->cmd_flags))
+			data->rq_flags |= RQF_ELV;
 
 		/*
 		 * Flush/passthrough requests are special and go directly to the
 		 * dispatch list. Don't include reserved tags in the
 		 * limiting, as it isn't useful.
 		 */
-		if (!op_is_flush(data->cmd_flags) &&
-		    !blk_op_is_passthrough(data->cmd_flags) &&
+		if ((data->rq_flags & RQF_ELV) &&
 		    e->type->ops.limit_depth &&
 		    !(data->flags & BLK_MQ_REQ_RESERVED))
 			e->type->ops.limit_depth(data->cmd_flags, data);
@@ -468,7 +471,7 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
 retry:
 	data->ctx = blk_mq_get_ctx(q);
 	data->hctx = blk_mq_map_queue(q, data->cmd_flags, data->ctx);
-	if (!(data->rq_flags & RQF_ELV))
+	if (!(data->rq_flags & RQF_SCHED_TAGS))
 		blk_mq_tag_busy(data->hctx);
 
 	if (data->flags & BLK_MQ_REQ_RESERVED)
@@ -651,7 +654,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
 	if (!q->elevator)
 		blk_mq_tag_busy(data.hctx);
 	else
-		data.rq_flags |= RQF_ELV;
+		data.rq_flags |= RQF_SCHED_TAGS;
 
 	if (flags & BLK_MQ_REQ_RESERVED)
 		data.rq_flags |= RQF_RESV;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 06caacd77ed6..b4910a6471b7 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -61,7 +61,8 @@ typedef __u32 __bitwise req_flags_t;
 #define RQF_TIMED_OUT		((__force req_flags_t)(1 << 21))
 /* queue has elevator attached */
 #define RQF_ELV			((__force req_flags_t)(1 << 22))
-#define RQF_RESV			((__force req_flags_t)(1 << 23))
+#define RQF_RESV		((__force req_flags_t)(1 << 23))
+#define RQF_SCHED_TAGS		((__force req_flags_t)(1 << 24))
 
 /* flags that prevent us from merging requests: */
 #define RQF_NOMERGE_FLAGS \



Thanks,
Ming


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH V2 2/2] blk-mq: make sure elevator callbacks aren't called for passthrough request
  2023-05-16  1:20       ` Ming Lei
  2023-05-16  6:24         ` Christoph Hellwig
@ 2023-05-16 14:47         ` Keith Busch
  2023-05-17  3:26           ` Ming Lei
  1 sibling, 1 reply; 17+ messages in thread
From: Keith Busch @ 2023-05-16 14:47 UTC (permalink / raw)
  To: Ming Lei; +Cc: Bart Van Assche, Jens Axboe, linux-block, Christoph Hellwig

On Tue, May 16, 2023 at 09:20:55AM +0800, Ming Lei wrote:
> On Mon, May 15, 2023 at 02:22:18PM -0600, Keith Busch wrote:
> > On Mon, May 15, 2023 at 08:52:38AM -0700, Bart Van Assche wrote:
> > > On 5/15/23 07:46, Ming Lei wrote:
> > > > @@ -48,7 +53,7 @@ blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
> > > >   static inline void blk_mq_sched_completed_request(struct request *rq, u64 now)
> > > >   {
> > > > -	if (rq->rq_flags & RQF_ELV) {
> > > > +	if ((rq->rq_flags & RQF_ELV) && !blk_mq_bypass_sched(rq->cmd_flags)) {
> > > >   		struct elevator_queue *e = rq->q->elevator;
> > > >   		if (e->type->ops.completed_request)
> > > > @@ -58,7 +63,7 @@ static inline void blk_mq_sched_completed_request(struct request *rq, u64 now)
> > > >   static inline void blk_mq_sched_requeue_request(struct request *rq)
> > > >   {
> > > > -	if (rq->rq_flags & RQF_ELV) {
> > > > +	if ((rq->rq_flags & RQF_ELV) && !blk_mq_bypass_sched(rq->cmd_flags)) {
> > > >   		struct request_queue *q = rq->q;
> > > >   		struct elevator_queue *e = q->elevator;
> > > 
> > > Has it been considered not to set RQF_ELV for passthrough requests instead
> > > of making the above changes?
> > 
> > That sounds like a good idea. It changes more behavior than what Ming is
> > targeting here, but after looking through each use for RQF_ELV, I think
> > not having that set really is the right thing to do in all cases for
> > passthrough requests.
> 
> I did consider that approach. But:
> 
> - RQF_ELV actually means that the request & its tag is allocated from sched tags.
> 
> - if RQF_ELV is cleared for passthrough request, request may be
>   allocated from sched tags(normal IO) and driver tags(passthrough) at the same time.
>   This way may cause other problem, such as, breaking blk_mq_hctx_has_requests().
>   Meantime it becomes not likely to optimize tags resource utilization in future,
>   at least for single LUN/NS, no need to keep sched tags & driver tags
>   in memory at the same time.

Isn't that similar to multiple namespaces where some use elevator and
others use 'none'? They're all contenting for the same shared driver
tags with racing 'has_requests()'.

And the passthrough case is special with users of that interface taking
on a greater responsibility and generally want the kernel out of the
way. I don't think anyone would purposefully run a tag intense workload
through that engine at the same time as using a generic one with the
scheduler. It definitely should still work, but it doesn't need to be
fair, right?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V2 2/2] blk-mq: make sure elevator callbacks aren't called for passthrough request
  2023-05-16 14:47         ` Keith Busch
@ 2023-05-17  3:26           ` Ming Lei
  2023-05-17 18:13             ` Keith Busch
  0 siblings, 1 reply; 17+ messages in thread
From: Ming Lei @ 2023-05-17  3:26 UTC (permalink / raw)
  To: Keith Busch
  Cc: Bart Van Assche, Jens Axboe, linux-block, Christoph Hellwig, ming.lei

On Tue, May 16, 2023 at 08:47:46AM -0600, Keith Busch wrote:
> On Tue, May 16, 2023 at 09:20:55AM +0800, Ming Lei wrote:
> > On Mon, May 15, 2023 at 02:22:18PM -0600, Keith Busch wrote:
> > > On Mon, May 15, 2023 at 08:52:38AM -0700, Bart Van Assche wrote:
> > > > On 5/15/23 07:46, Ming Lei wrote:
> > > > > @@ -48,7 +53,7 @@ blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
> > > > >   static inline void blk_mq_sched_completed_request(struct request *rq, u64 now)
> > > > >   {
> > > > > -	if (rq->rq_flags & RQF_ELV) {
> > > > > +	if ((rq->rq_flags & RQF_ELV) && !blk_mq_bypass_sched(rq->cmd_flags)) {
> > > > >   		struct elevator_queue *e = rq->q->elevator;
> > > > >   		if (e->type->ops.completed_request)
> > > > > @@ -58,7 +63,7 @@ static inline void blk_mq_sched_completed_request(struct request *rq, u64 now)
> > > > >   static inline void blk_mq_sched_requeue_request(struct request *rq)
> > > > >   {
> > > > > -	if (rq->rq_flags & RQF_ELV) {
> > > > > +	if ((rq->rq_flags & RQF_ELV) && !blk_mq_bypass_sched(rq->cmd_flags)) {
> > > > >   		struct request_queue *q = rq->q;
> > > > >   		struct elevator_queue *e = q->elevator;
> > > > 
> > > > Has it been considered not to set RQF_ELV for passthrough requests instead
> > > > of making the above changes?
> > > 
> > > That sounds like a good idea. It changes more behavior than what Ming is
> > > targeting here, but after looking through each use for RQF_ELV, I think
> > > not having that set really is the right thing to do in all cases for
> > > passthrough requests.
> > 
> > I did consider that approach. But:
> > 
> > - RQF_ELV actually means that the request & its tag is allocated from sched tags.
> > 
> > - if RQF_ELV is cleared for passthrough request, request may be
> >   allocated from sched tags(normal IO) and driver tags(passthrough) at the same time.
> >   This way may cause other problem, such as, breaking blk_mq_hctx_has_requests().
> >   Meantime it becomes not likely to optimize tags resource utilization in future,
> >   at least for single LUN/NS, no need to keep sched tags & driver tags
> >   in memory at the same time.
> 
> Isn't that similar to multiple namespaces where some use elevator and
> others use 'none'? They're all contenting for the same shared driver
> tags with racing 'has_requests()'.

It is similar, but not same.

So far, for each single request queue, we never support to allocate
request/tag from both sched tags and driver tags at the same.

If we clear RQF_ELV simply for pt request, at least:

1) blk_mq_hctx_has_requests() is broken since this function only checks
sched tags in case of q->elevator

2) q->nr_requests may not be respected any more in case of q->elevator

> And the passthrough case is special with users of that interface taking
> on a greater responsibility and generally want the kernel out of the
> way. I don't think anyone would purposefully run a tag intense workload
> through that engine at the same time as using a generic one with the
> scheduler. It definitely should still work, but it doesn't need to be
> fair, right?

I guess it may work, but question is that what we can get from this kind
of big change? And I think this approach may be one following work if it is
proved as useful.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V2 1/2] blk-mq: don't queue plugged passthrough requests into scheduler
  2023-05-16  8:10     ` Ming Lei
@ 2023-05-17  7:20       ` Christoph Hellwig
  0 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2023-05-17  7:20 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Guangwu Zhang, Yu Kuai

On Tue, May 16, 2023 at 04:10:01PM +0800, Ming Lei wrote:
> On Tue, May 16, 2023 at 08:22:21AM +0200, Christoph Hellwig wrote:
> > On Mon, May 15, 2023 at 10:46:00PM +0800, Ming Lei wrote:
> > > +		} else if (this_hctx != rq->mq_hctx || this_ctx != rq->mq_ctx ||
> > > +				pt != blk_rq_is_passthrough(rq)) {
> > 
> > Can your format this as:
> > 
> > 		} else if (this_hctx != rq->mq_hctx || this_ctx != rq->mq_ctx ||
> > 			   pt != blk_rq_is_passthrough(rq)) {
> > 
> > for readability?
> 
> Do you mean indent for 'pt = blk_rq_is_passthrough(rq)' and keep 'pt' aligned
> with 'if' in last line?

Yes.

> > This comment confuses the heck out of me.  The check if for passthrough
> > vs non-passthrough and doesn't involved flush requests at all.
> > 
> > I'd prefer to drop it, and instead comment on passthrough requests
> > not going to the scheduled below where we actually issue other requests
> > to the scheduler.
> 
> Any request can be in plug list in theory, we just don't add flush request
> to plug, that is why the above comment is added. If you don't like the
> words for flush request, I can drop it.

I just don't think it maks any sense in this context.  If we want to
enforce the invariant that there's no flush request I'd rather add a
WARN_ON to not only talk about enforce it.  I'm not sure it's really
required, though.

> > > +	if (pt) {
> > > +		spin_lock(&this_hctx->lock);
> > > +		list_splice_tail_init(&list, &this_hctx->dispatch);
> > > +		spin_unlock(&this_hctx->lock);
> > > +		blk_mq_run_hw_queue(this_hctx, from_sched);
> > 
> > .. aka here.  But why can't we just use the blk_mq_insert_requests
> > for this case anyway?
> 
> If the pt request is part of error recovery, it should be issued to
> ->dispatch list directly, so just for the sake of safety, meantime keep
> same behavior with blk_mq_insert_request().

But if it is part of error recovery it won't be plugged.  Please don't
do weird cargo cult things here and just use the common helpers.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V2 2/2] blk-mq: make sure elevator callbacks aren't called for passthrough request
  2023-05-16  8:39           ` Ming Lei
@ 2023-05-17  7:22             ` Christoph Hellwig
  2023-05-17  8:58               ` Ming Lei
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2023-05-17  7:22 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Keith Busch, Bart Van Assche, Jens Axboe, linux-block

On Tue, May 16, 2023 at 04:39:05PM +0800, Ming Lei wrote:
> I can understand the point, but it may not be done by single flag,

Can you explain why?  Note that we also already have RQF_ELVPRIV for
any request that has elevator private data.  I don't really think we
need a third flag.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V2 2/2] blk-mq: make sure elevator callbacks aren't called for passthrough request
  2023-05-17  7:22             ` Christoph Hellwig
@ 2023-05-17  8:58               ` Ming Lei
  0 siblings, 0 replies; 17+ messages in thread
From: Ming Lei @ 2023-05-17  8:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, Bart Van Assche, Jens Axboe, linux-block, ming.lei

On Wed, May 17, 2023 at 09:22:18AM +0200, Christoph Hellwig wrote:
> On Tue, May 16, 2023 at 04:39:05PM +0800, Ming Lei wrote:
> > I can understand the point, but it may not be done by single flag,
> 
> Can you explain why?  Note that we also already have RQF_ELVPRIV for
> any request that has elevator private data.  I don't really think we
> need a third flag.

RQF_ELVPRIV isn't same with RQF_ELV, and follows the two's relationship:

	RQF_ELVPRIV == (RQF_ELV && non_flush_pt_req && !e->type->ops.prepare_request)

RQF_ELVPRIV can be replaced with the above expression to save one flag.

RQF_ELV isn't same with RQF_SCHED_TAGS too, RQF_ELV should be used for
checking if elevator callback is needed, and RQF_SCHED_TAGS is for allocating
req/tag and dealing with tags busy things.

In case of q->elevator, RQF_SCHED_TAGS is always set, but

- for pt/flush request, RQF_ELV is cleared.
- for other request, RQF_ELV are set

Then we can avoid any elevator callback for pt/flush request.


thanks, 
Ming


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V2 2/2] blk-mq: make sure elevator callbacks aren't called for passthrough request
  2023-05-17  3:26           ` Ming Lei
@ 2023-05-17 18:13             ` Keith Busch
  2023-05-18  1:22               ` Ming Lei
  0 siblings, 1 reply; 17+ messages in thread
From: Keith Busch @ 2023-05-17 18:13 UTC (permalink / raw)
  To: Ming Lei; +Cc: Bart Van Assche, Jens Axboe, linux-block, Christoph Hellwig

On Wed, May 17, 2023 at 11:26:32AM +0800, Ming Lei wrote:
> On Tue, May 16, 2023 at 08:47:46AM -0600, Keith Busch wrote:
> 
> > And the passthrough case is special with users of that interface taking
> > on a greater responsibility and generally want the kernel out of the
> > way. I don't think anyone would purposefully run a tag intense workload
> > through that engine at the same time as using a generic one with the
> > scheduler. It definitely should still work, but it doesn't need to be
> > fair, right?
> 
> I guess it may work, but question is that what we can get from this kind
> of big change? And I think this approach may be one following work if it is
> proved as useful.

I'm just trying to remove any need for side channels to bypass block
layer functionality, like this one:

  http://lists.infradead.org/pipermail/linux-nvme/2023-April/039522.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V2 2/2] blk-mq: make sure elevator callbacks aren't called for passthrough request
  2023-05-17 18:13             ` Keith Busch
@ 2023-05-18  1:22               ` Ming Lei
  0 siblings, 0 replies; 17+ messages in thread
From: Ming Lei @ 2023-05-18  1:22 UTC (permalink / raw)
  To: Keith Busch
  Cc: Bart Van Assche, Jens Axboe, linux-block, Christoph Hellwig,
	Kanchan Joshi

On Wed, May 17, 2023 at 12:13:04PM -0600, Keith Busch wrote:
> On Wed, May 17, 2023 at 11:26:32AM +0800, Ming Lei wrote:
> > On Tue, May 16, 2023 at 08:47:46AM -0600, Keith Busch wrote:
> > 
> > > And the passthrough case is special with users of that interface taking
> > > on a greater responsibility and generally want the kernel out of the
> > > way. I don't think anyone would purposefully run a tag intense workload
> > > through that engine at the same time as using a generic one with the
> > > scheduler. It definitely should still work, but it doesn't need to be
> > > fair, right?
> > 
> > I guess it may work, but question is that what we can get from this kind
> > of big change? And I think this approach may be one following work if it is
> > proved as useful.
> 
> I'm just trying to remove any need for side channels to bypass block
> layer functionality, like this one:
> 
>   http://lists.infradead.org/pipermail/linux-nvme/2023-April/039522.html
> 

In "io_uring attached nvme queue" patchset, Kanchan tried to bypass
request/bio completely, and same with blk-mq's pt code path.

You mean you'd suggest to still reuse req/bio & blk-mq pt code path
for "io_uring attached nvme queue"?

Cc Kanchan.

Thanks,
Ming


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-05-18  1:23 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-15 14:45 [PATCH V2 0/2] blk-mq: handle passthrough request as really passthrough Ming Lei
2023-05-15 14:46 ` [PATCH V2 1/2] blk-mq: don't queue plugged passthrough requests into scheduler Ming Lei
2023-05-16  6:22   ` Christoph Hellwig
2023-05-16  8:10     ` Ming Lei
2023-05-17  7:20       ` Christoph Hellwig
2023-05-15 14:46 ` [PATCH V2 2/2] blk-mq: make sure elevator callbacks aren't called for passthrough request Ming Lei
2023-05-15 15:52   ` Bart Van Assche
2023-05-15 20:22     ` Keith Busch
2023-05-16  1:20       ` Ming Lei
2023-05-16  6:24         ` Christoph Hellwig
2023-05-16  8:39           ` Ming Lei
2023-05-17  7:22             ` Christoph Hellwig
2023-05-17  8:58               ` Ming Lei
2023-05-16 14:47         ` Keith Busch
2023-05-17  3:26           ` Ming Lei
2023-05-17 18:13             ` Keith Busch
2023-05-18  1:22               ` Ming Lei

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.