linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] blk-mq: don't insert FUA request with data into scheduler queue
@ 2021-11-18 15:30 Ming Lei
  2021-11-18 18:22 ` Bart Van Assche
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Ming Lei @ 2021-11-18 15:30 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Ming Lei, Yi Zhang, Christoph Hellwig

We never insert flush request into scheduler queue before.

Recently commit d92ca9d8348f ("blk-mq: don't handle non-flush requests in
blk_insert_flush") tries to handle FUA data request as normal request.
This way has caused warning[1] in mq-deadline dd_exit_sched() or io hang in
case of kyber since RQF_ELVPRIV isn't set for flush request, then
->finish_request won't be called.

Fix the issue by inserting FUA data request with blk_mq_request_bypass_insert()
when the device supports FUA, just like what we did before.

[1] https://lore.kernel.org/linux-block/CAHj4cs-_vkTW=dAzbZYGxpEWSpzpcmaNeY1R=vH311+9vMUSdg@mail.gmail.com/

Reported-by: Yi Zhang <yi.zhang@redhat.com>
Fixes: d92ca9d8348f ("blk-mq: don't handle non-flush requests in blk_insert_flush")
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-flush.c | 12 ++++++------
 block/blk-mq.c    |  4 +++-
 block/blk.h       |  2 +-
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 8e364bda5166..1fce6d16e6d3 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -379,7 +379,7 @@ static void mq_flush_data_end_io(struct request *rq, blk_status_t error)
  * @rq is being submitted.  Analyze what needs to be done and put it on the
  * right queue.
  */
-bool blk_insert_flush(struct request *rq)
+void blk_insert_flush(struct request *rq)
 {
 	struct request_queue *q = rq->q;
 	unsigned long fflags = q->queue_flags;	/* may change, cache */
@@ -409,7 +409,7 @@ bool blk_insert_flush(struct request *rq)
 	 */
 	if (!policy) {
 		blk_mq_end_request(rq, 0);
-		return true;
+		return;
 	}
 
 	BUG_ON(rq->bio != rq->biotail); /*assumes zero or single bio rq */
@@ -420,8 +420,10 @@ bool blk_insert_flush(struct request *rq)
 	 * for normal execution.
 	 */
 	if ((policy & REQ_FSEQ_DATA) &&
-	    !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH)))
-		return false;
+	    !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
+		blk_mq_request_bypass_insert(rq, false, true);
+		return;
+	}
 
 	/*
 	 * @rq should go through flush machinery.  Mark it part of flush
@@ -437,8 +439,6 @@ bool blk_insert_flush(struct request *rq)
 	spin_lock_irq(&fq->mq_flush_lock);
 	blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
 	spin_unlock_irq(&fq->mq_flush_lock);
-
-	return true;
 }
 
 /**
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c219271f9d6a..fa1a00d71b61 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2647,8 +2647,10 @@ void blk_mq_submit_bio(struct bio *bio)
 		return;
 	}
 
-	if (op_is_flush(bio->bi_opf) && blk_insert_flush(rq))
+	if (op_is_flush(bio->bi_opf)) {
+		blk_insert_flush(rq);
 		return;
+	}
 
 	if (plug && (q->nr_hw_queues == 1 ||
 	    blk_mq_is_shared_tags(rq->mq_hctx->flags) ||
diff --git a/block/blk.h b/block/blk.h
index 4a910742cce9..7c6f7635bff0 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -272,7 +272,7 @@ void __blk_account_io_done(struct request *req, u64 now);
  */
 #define ELV_ON_HASH(rq) ((rq)->rq_flags & RQF_HASHED)
 
-bool blk_insert_flush(struct request *rq);
+void blk_insert_flush(struct request *rq);
 
 int elevator_switch_mq(struct request_queue *q,
 			      struct elevator_type *new_e);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] blk-mq: don't insert FUA request with data into scheduler queue
  2021-11-18 15:30 [PATCH] blk-mq: don't insert FUA request with data into scheduler queue Ming Lei
@ 2021-11-18 18:22 ` Bart Van Assche
  2021-11-19  6:08 ` Christoph Hellwig
  2021-11-19 13:28 ` Jens Axboe
  2 siblings, 0 replies; 10+ messages in thread
From: Bart Van Assche @ 2021-11-18 18:22 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe; +Cc: linux-block, Yi Zhang, Christoph Hellwig

On 11/18/21 7:30 AM, Ming Lei wrote:
> We never insert flush request into scheduler queue before.
> 
> Recently commit d92ca9d8348f ("blk-mq: don't handle non-flush requests in
> blk_insert_flush") tries to handle FUA data request as normal request.
> This way has caused warning[1] in mq-deadline dd_exit_sched() or io hang in
> case of kyber since RQF_ELVPRIV isn't set for flush request, then
> ->finish_request won't be called.
> 
> Fix the issue by inserting FUA data request with blk_mq_request_bypass_insert()
> when the device supports FUA, just like what we did before.
> 
> [1] https://lore.kernel.org/linux-block/CAHj4cs-_vkTW=dAzbZYGxpEWSpzpcmaNeY1R=vH311+9vMUSdg@mail.gmail.com/

I had not yet noticed that report. Anyway, thank you for having fixed 
this issue.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] blk-mq: don't insert FUA request with data into scheduler queue
  2021-11-18 15:30 [PATCH] blk-mq: don't insert FUA request with data into scheduler queue Ming Lei
  2021-11-18 18:22 ` Bart Van Assche
@ 2021-11-19  6:08 ` Christoph Hellwig
  2021-11-19  8:07   ` Ming Lei
  2021-11-19 13:28 ` Jens Axboe
  2 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2021-11-19  6:08 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block, Yi Zhang, Christoph Hellwig

On Thu, Nov 18, 2021 at 11:30:41PM +0800, Ming Lei wrote:
> We never insert flush request into scheduler queue before.
> 
> Recently commit d92ca9d8348f ("blk-mq: don't handle non-flush requests in
> blk_insert_flush") tries to handle FUA data request as normal request.
> This way has caused warning[1] in mq-deadline dd_exit_sched() or io hang in
> case of kyber since RQF_ELVPRIV isn't set for flush request, then
> ->finish_request won't be called.
> 
> Fix the issue by inserting FUA data request with blk_mq_request_bypass_insert()
> when the device supports FUA, just like what we did before.

How we did end up with REQ_ELV set for this request?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] blk-mq: don't insert FUA request with data into scheduler queue
  2021-11-19  6:08 ` Christoph Hellwig
@ 2021-11-19  8:07   ` Ming Lei
  0 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2021-11-19  8:07 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-block, Yi Zhang

On Fri, Nov 19, 2021 at 07:08:58AM +0100, Christoph Hellwig wrote:
> On Thu, Nov 18, 2021 at 11:30:41PM +0800, Ming Lei wrote:
> > We never insert flush request into scheduler queue before.
> > 
> > Recently commit d92ca9d8348f ("blk-mq: don't handle non-flush requests in
> > blk_insert_flush") tries to handle FUA data request as normal request.
> > This way has caused warning[1] in mq-deadline dd_exit_sched() or io hang in
> > case of kyber since RQF_ELVPRIV isn't set for flush request, then
> > ->finish_request won't be called.
> > 
> > Fix the issue by inserting FUA data request with blk_mq_request_bypass_insert()
> > when the device supports FUA, just like what we did before.
> 
> How we did end up with REQ_ELV set for this request?

We set REQ_ELV for any request if q->elevator isn't NULL, see
__blk_mq_alloc_requests(), and REQ_ELV is just for replacing the check on
q->elevator. If we clear REQ_ELV for flush rq, other problem may be caused,
such as blk_mq_rq_ctx_init() may be confused.

Also flush request is always inserted to hctx->dispatch directly, either
before commit d92ca9d8348f or being queued via requeue in current code.



Thanks,
Ming


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] blk-mq: don't insert FUA request with data into scheduler queue
  2021-11-18 15:30 [PATCH] blk-mq: don't insert FUA request with data into scheduler queue Ming Lei
  2021-11-18 18:22 ` Bart Van Assche
  2021-11-19  6:08 ` Christoph Hellwig
@ 2021-11-19 13:28 ` Jens Axboe
       [not found]   ` <BFC93946-13B3-43EC-9E30-8A980CD5234F@seagate.com>
  2 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2021-11-19 13:28 UTC (permalink / raw)
  To: Ming Lei; +Cc: Yi Zhang, linux-block, Christoph Hellwig

On Thu, 18 Nov 2021 23:30:41 +0800, Ming Lei wrote:
> We never insert flush request into scheduler queue before.
> 
> Recently commit d92ca9d8348f ("blk-mq: don't handle non-flush requests in
> blk_insert_flush") tries to handle FUA data request as normal request.
> This way has caused warning[1] in mq-deadline dd_exit_sched() or io hang in
> case of kyber since RQF_ELVPRIV isn't set for flush request, then
> ->finish_request won't be called.
> 
> [...]

Applied, thanks!

[1/1] blk-mq: don't insert FUA request with data into scheduler queue
      commit: 2b504bd4841bccbf3eb83c1fec229b65956ad8ad

Best regards,
-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] blk-mq: don't insert FUA request with data into scheduler queue
       [not found]   ` <BFC93946-13B3-43EC-9E30-8A980CD5234F@seagate.com>
@ 2021-11-19 16:49     ` Tim Walker
  2021-11-19 16:56       ` Christoph Hellwig
  2021-11-20  3:58       ` Ming Lei
  0 siblings, 2 replies; 10+ messages in thread
From: Tim Walker @ 2021-11-19 16:49 UTC (permalink / raw)
  To: Jens Axboe, Ming Lei; +Cc: Yi Zhang, linux-block, Christoph Hellwig



On  19 Nov 2021 Tim Walker wrote:

>
>
>On Thu, 18 Nov 2021 23:30:41 +0800, Ming Lei wrote:
>> We never insert flush request into scheduler queue before.
>>
>> Recently commit d92ca9d8348f ("blk-mq: don't handle non-flush requests in
>> blk_insert_flush") tries to handle FUA data request as normal request.
>> This way has caused warning[1] in mq-deadline dd_exit_sched() or io hang in
>> case of kyber since RQF_ELVPRIV isn't set for flush request, then
>> ->finish_request won't be called.
>>
>> [...]
>
>Applied, thanks!
>
>[1/1] blk-mq: don't insert FUA request with data into scheduler queue
>      commit: 2b504bd4841bccbf3eb83c1fec229b65956ad8ad
>
>Best regards,
>--
>Jens Axboe
>
>
>

I know the discussion is over, but I can't figure out why we treat FUA as a flush. A FUA write only applies to the command at hand, and is not required to flush any previous command's data from the device's volatile write cache. Similarly for a read request - servicing a read from media is really more the rule than the exception (lots of workload dependencies here...), so why would a FUA read bypass the scheduler? The device is always free to service any request from media or cache (as long as it follows the applicable volatile write and read cache settings), so normally we don't know how it is treating the request, so it doesn't seem to matter. 

Consider a FUA write: Why does the fact that we intend that write to bypass the device volatile write cache mean it should bypass the scheduler? All the other traffic-shaping algorithms that help effectively schedule writes are still applicable. E.g. we know we can delay/coalesce them a bit to allow reads to be prioritized, but I can't figure out why we would fast-track a FUA write. Isn't the value to the system for scheduling still valid, even though we are forcing the data to go to media?

I'm sure there is some detail I am missing - any help would be appreciated. Thanks!

Best regards,
-Tim



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] blk-mq: don't insert FUA request with data into scheduler queue
  2021-11-19 16:49     ` Tim Walker
@ 2021-11-19 16:56       ` Christoph Hellwig
  2021-11-19 17:49         ` Tim Walker
  2021-11-20  3:58       ` Ming Lei
  1 sibling, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2021-11-19 16:56 UTC (permalink / raw)
  To: Tim Walker; +Cc: Jens Axboe, Ming Lei, Yi Zhang, linux-block, Christoph Hellwig

Direct reply without a quote a the formatting of your mail is completely
messed up.  We don't treat a FUA a a flush.  If the device supports
FUA is is just passed on.  But if the device does not support FUA it
is sequenced into doing the write first and then a flush before returning
completion to the caller to guarantee the FUA semantics.  Because of that
the command needs special treatment and be handed over to the flush
state machine, but it won't do anything interesting if FUA is actually
natively supported.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] blk-mq: don't insert FUA request with data into scheduler queue
  2021-11-19 16:56       ` Christoph Hellwig
@ 2021-11-19 17:49         ` Tim Walker
  0 siblings, 0 replies; 10+ messages in thread
From: Tim Walker @ 2021-11-19 17:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Ming Lei, Yi Zhang, linux-block


On  Friday, November 19, 2021 at 11:56:55 AM Christoph Hellwig wrote:

>
>Direct reply without a quote a the formatting of your mail is completely
>messed up.  

Sorry about that - hopefully this is better.

>We don't treat a FUA a a flush.  If the device supports
>FUA is is just passed on.  

Does this mean a FUA request for a device that natively supports FUA is eventually "passed on" to the scheduler? 
Or does it go straight to dispatch?

>But if the device does not support FUA it
>is sequenced into doing the write first and then a flush before returning
>completion to the caller to guarantee the FUA semantics.  Because of that
>the command needs special treatment and be handed over to the flush
>state machine, but it won't do anything interesting if FUA is actually
>natively supported.
>

Yes, I can see why this flush logic is needed for devices that don't support FUA. Thanks.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] blk-mq: don't insert FUA request with data into scheduler queue
  2021-11-19 16:49     ` Tim Walker
  2021-11-19 16:56       ` Christoph Hellwig
@ 2021-11-20  3:58       ` Ming Lei
  2021-11-23 16:11         ` Tim Walker
  1 sibling, 1 reply; 10+ messages in thread
From: Ming Lei @ 2021-11-20  3:58 UTC (permalink / raw)
  To: Tim Walker; +Cc: Jens Axboe, Yi Zhang, linux-block, Christoph Hellwig

On Fri, Nov 19, 2021 at 04:49:34PM +0000, Tim Walker wrote:
> 
> 
> On  19 Nov 2021 Tim Walker wrote:
> 
> >
> >
> >On Thu, 18 Nov 2021 23:30:41 +0800, Ming Lei wrote:
> >> We never insert flush request into scheduler queue before.
> >>
> >> Recently commit d92ca9d8348f ("blk-mq: don't handle non-flush requests in
> >> blk_insert_flush") tries to handle FUA data request as normal request.
> >> This way has caused warning[1] in mq-deadline dd_exit_sched() or io hang in
> >> case of kyber since RQF_ELVPRIV isn't set for flush request, then
> >> ->finish_request won't be called.
> >>
> >> [...]
> >
> >Applied, thanks!
> >
> >[1/1] blk-mq: don't insert FUA request with data into scheduler queue
> >      commit: 2b504bd4841bccbf3eb83c1fec229b65956ad8ad
> >
> >Best regards,
> >--
> >Jens Axboe
> >
> >
> >
> 
> I know the discussion is over, >

This thread is just for fixing one recent regression caused by queuing
FUA data into scheduler queue, and actually direct dispach has
been done for very long time, but I don't mean it is reasonable.

> but I can't figure out why we treat FUA as a flush. A FUA write only
> applies to the command at hand, and is not required to flush any previous
> command's data from the device's volatile write cache. Similarly for a
> read request - servicing a read from media is really more the rule than
> the exception (lots of workload dependencies here...), so why would a
> FUA read bypass the scheduler?

Is there linux kernel FUA read users? Just run a quick grep, seems FUA
is just used for sync write.

> The device is always free to service any request from media or cache (
> as long as it follows the applicable volatile write and read cache settings),
> so normally we don't know how it is treating the request, so it doesn't seem
> to matter. 
> 
> Consider a FUA write: Why does the fact that we intend that write to bypass
> the device volatile write cache mean it should bypass the scheduler? All the
> other traffic-shaping algorithms that help effectively schedule writes are
> still applicable. E.g. we know we can delay/coalesce them a bit to allow
> reads to be prioritized, but I can't figure out why we would fast-track a
> FUA write. Isn't the value to the system for scheduling still valid, even
> though we are forcing the data to go to media?

It shouldn't be hard to to queue FUA into scheduler, but details need to
consider, such as, if FUA can be merged with normal IO, maybe others.

Also do you have any test or benchmark result to support the change of
queuing FUA to scheduler?


Thanks,
Ming


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] blk-mq: don't insert FUA request with data into scheduler queue
  2021-11-20  3:58       ` Ming Lei
@ 2021-11-23 16:11         ` Tim Walker
  0 siblings, 0 replies; 10+ messages in thread
From: Tim Walker @ 2021-11-23 16:11 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, Yi Zhang, linux-block, Christoph Hellwig



On  Friday, November 19, 2021 at 10:59:24 PM Ming Lei wrote:

>
>On Fri, Nov 19, 2021 at 04:49:34PM +0000, Tim Walker wrote:
>>
>>
>> On  19 Nov 2021 Tim Walker wrote:
>>
>> >
>> >
>> >On Thu, 18 Nov 2021 23:30:41 +0800, Ming Lei wrote:
>> >> We never insert flush request into scheduler queue before.
>> >>
>> >> Recently commit d92ca9d8348f ("blk-mq: don't handle non-flush requests in
>> >> blk_insert_flush") tries to handle FUA data request as normal request.
>> >> This way has caused warning[1] in mq-deadline dd_exit_sched() or io hang in
>> >> case of kyber since RQF_ELVPRIV isn't set for flush request, then
>> >> ->finish_request won't be called.
>> >>
>> >> [...]
>> >
>> >Applied, thanks!
>> >
>> >[1/1] blk-mq: don't insert FUA request with data into scheduler queue
>> >      commit: 2b504bd4841bccbf3eb83c1fec229b65956ad8ad
>> >
>> >Best regards,
>> >--
>> >Jens Axboe
>> >
>> >
>> >
>>
>> I know the discussion is over, >
>
>This thread is just for fixing one recent regression caused by queuing
>FUA data into scheduler queue, and actually direct dispach has
>been done for very long time, but I don't mean it is reasonable.
>
>> but I can't figure out why we treat FUA as a flush. A FUA write only
>> applies to the command at hand, and is not required to flush any previous
>> command's data from the device's volatile write cache. Similarly for a
>> read request - servicing a read from media is really more the rule than
>> the exception (lots of workload dependencies here...), so why would a
>> FUA read bypass the scheduler?
>
>Is there linux kernel FUA read users? Just run a quick grep, seems FUA
>is just used for sync write.

I don't know if Linux ever issues FUA read. I was thinking from the device's perspective.

>
>> The device is always free to service any request from media or cache (
>> as long as it follows the applicable volatile write and read cache settings),
>> so normally we don't know how it is treating the request, so it doesn't seem
>> to matter.
>>
>> Consider a FUA write: Why does the fact that we intend that write to bypass
>> the device volatile write cache mean it should bypass the scheduler? All the
>> other traffic-shaping algorithms that help effectively schedule writes are
>> still applicable. E.g. we know we can delay/coalesce them a bit to allow
>> reads to be prioritized, but I can't figure out why we would fast-track a
>> FUA write. Isn't the value to the system for scheduling still valid, even
>> though we are forcing the data to go to media?
>
>It shouldn't be hard to to queue FUA into scheduler, but details need to
>consider, such as, if FUA can be merged with normal IO, maybe others.
>
>Also do you have any test or benchmark result to support the change of
>queuing FUA to scheduler?
>
>
>Thanks,
>Ming
>

No, Unfortunately I don't have any benchmarks. It seems that FUA creates
a shortcut by-passing the scheduler. It does not seem too important for flash, 
but of course HDDs are very sensitive to workload. The scheduler attempts 
to ensure the best workload for the best performance, so perhaps all requests 
should be scheduled, unless there is an architectural reason to the contrary. 
From a HDD's perspective we should schedule a FUA write to maintain an 
optimal workload, and I was looking for the block-layer-architecture reason not to.

Perhaps I can generate some problematic workloads and post them later
for discussion.

Thanks,
-Tim  




^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-11-23 16:11 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-18 15:30 [PATCH] blk-mq: don't insert FUA request with data into scheduler queue Ming Lei
2021-11-18 18:22 ` Bart Van Assche
2021-11-19  6:08 ` Christoph Hellwig
2021-11-19  8:07   ` Ming Lei
2021-11-19 13:28 ` Jens Axboe
     [not found]   ` <BFC93946-13B3-43EC-9E30-8A980CD5234F@seagate.com>
2021-11-19 16:49     ` Tim Walker
2021-11-19 16:56       ` Christoph Hellwig
2021-11-19 17:49         ` Tim Walker
2021-11-20  3:58       ` Ming Lei
2021-11-23 16:11         ` Tim Walker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).