* [PATCH] block: Submit flush requests to the I/O scheduler
@ 2022-08-12 21:03 Bart Van Assche
2022-08-13 6:41 ` Christoph Hellwig
0 siblings, 1 reply; 6+ messages in thread
From: Bart Van Assche @ 2022-08-12 21:03 UTC
To: Jens Axboe
Cc: linux-block, Christoph Hellwig, Jaegeuk Kim, Bart Van Assche,
Ming Lei, Damien Le Moal
REQ_OP_WRITE | REQ_FUA requests submitted to a zoned storage device
must be passed to the (mq-deadline) I/O scheduler to ensure that they
are executed at the write pointer. It has been verified that this patch
prevents the sporadic write pointer violations that otherwise occur
when f2fs uses a zoned storage device.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
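To make the write pointer constraint concrete, here is a minimal user-space sketch. It is not kernel code: `struct zone`, `zone_write()` and `submit_all()` are hypothetical names. It assumes a sequential-write-required zone that accepts a write only if it starts exactly at the write pointer, so any reordering between submission and the device shows up as a rejected write. That is the kind of reordering mq-deadline's per-zone write serialization is meant to prevent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one sequential-write-required zone. */
struct zone {
	unsigned long long wp;   /* write pointer, in sectors */
	unsigned long long end;  /* first sector past the end of the zone */
};

/* Returns true if a write of nr_sectors starting at 'sector' is accepted.
 * Writes that do not start at the write pointer are rejected, which is
 * what a "write pointer violation" means in this model. */
static bool zone_write(struct zone *z, unsigned long long sector,
		       unsigned int nr_sectors)
{
	assert(z->wp <= z->end);
	if (sector != z->wp || z->wp + nr_sectors > z->end)
		return false;
	z->wp += nr_sectors;
	return true;
}

/* Issues the writes in the given order and returns how many were rejected. */
static size_t submit_all(struct zone *z, const unsigned long long *sectors,
			 size_t n, unsigned int nr_sectors)
{
	size_t rejected = 0;

	for (size_t i = 0; i < n; i++)
		if (!zone_write(z, sectors[i], nr_sectors))
			rejected++;
	return rejected;
}
```

Issuing writes for sectors 0, 8, 16 in order succeeds; issuing 0, 16, 8 rejects the write to sector 16 because the write pointer is still at 8.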
block/blk-mq.c | 31 ++++---------------------------
1 file changed, 4 insertions(+), 27 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5ee62b95f3e5..530aad95cc33 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2546,16 +2546,14 @@ static blk_status_t blk_mq_request_issue_directly(struct request *rq, bool last)
return __blk_mq_try_issue_directly(rq->mq_hctx, rq, true, last);
}
-static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)
+static void blk_mq_plug_issue(struct blk_plug *plug, bool from_schedule)
{
struct blk_mq_hw_ctx *hctx = NULL;
struct request *rq;
int queued = 0;
- int errors = 0;
while ((rq = rq_list_pop(&plug->mq_list))) {
bool last = rq_list_empty(plug->mq_list);
- blk_status_t ret;
if (hctx != rq->mq_hctx) {
if (hctx)
@@ -2563,29 +2561,9 @@ static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)
hctx = rq->mq_hctx;
}
- ret = blk_mq_request_issue_directly(rq, last);
- switch (ret) {
- case BLK_STS_OK:
- queued++;
- break;
- case BLK_STS_RESOURCE:
- case BLK_STS_DEV_RESOURCE:
- blk_mq_request_bypass_insert(rq, false, last);
- blk_mq_commit_rqs(hctx, &queued, from_schedule);
- return;
- default:
- blk_mq_end_request(rq, ret);
- errors++;
- break;
- }
+ blk_mq_sched_insert_request(rq, /*at_head=*/false,
+ /*run_queue=*/last, /*async=*/false);
}
-
- /*
- * If we didn't flush the entire list, we could have told the driver
- * there was more coming, but that turned out to be a lie.
- */
- if (errors)
- blk_mq_commit_rqs(hctx, &queued, from_schedule);
}
static void __blk_mq_flush_plug_list(struct request_queue *q,
@@ -2655,8 +2633,7 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
return;
}
- blk_mq_run_dispatch_ops(q,
- blk_mq_plug_issue_direct(plug, false));
+ blk_mq_run_dispatch_ops(q, blk_mq_plug_issue(plug, false));
if (rq_list_empty(plug->mq_list))
return;
}
* Re: [PATCH] block: Submit flush requests to the I/O scheduler
2022-08-12 21:03 [PATCH] block: Submit flush requests to the I/O scheduler Bart Van Assche
@ 2022-08-13 6:41 ` Christoph Hellwig
2022-08-14 17:13 ` Damien Le Moal
0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2022-08-13 6:41 UTC
To: Bart Van Assche
Cc: Jens Axboe, linux-block, Christoph Hellwig, Jaegeuk Kim,
Ming Lei, Damien Le Moal
On Fri, Aug 12, 2022 at 02:03:55PM -0700, Bart Van Assche wrote:
> When submitting a REQ_OP_WRITE | REQ_FUA request to a zoned storage
> device, these requests must be passed to the (mq-deadline) I/O scheduler
> to ensure that these happen at the write pointer.
Yes.
Maybe I'm being stupid, but how is the patch related to fixing that?
blk_mq_plug_issue_direct is called from blk_mq_flush_plug_list for
only the !has_elevator case. How does that change a thing?
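A simplified model of the path selection Christoph refers to may help. This is a sketch based on a reading of blk_mq_flush_plug_list() from around that time, not the kernel code itself: `struct plug_state`, `enum flush_path` and `choose_flush_path()` are hypothetical, and the ->queue_rqs() shortcut is omitted. It assumes the direct-issue path is taken only for a single queue, no elevator and a flush that does not come from the scheduler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the struct blk_plug fields that the flush
 * logic consults before choosing how to submit the plugged requests. */
struct plug_state {
	bool multiple_queues;
	bool has_elevator;
};

enum flush_path {
	FLUSH_PATH_ISSUE_DIRECT,   /* models blk_mq_plug_issue_direct() */
	FLUSH_PATH_DISPATCH_LIST,  /* models blk_mq_dispatch_plug_list() */
};

/* Direct issue is only reachable when no I/O scheduler is attached,
 * so changing that path cannot affect requests handled by mq-deadline. */
static enum flush_path choose_flush_path(const struct plug_state *p,
					 bool from_schedule)
{
	if (!p->multiple_queues && !p->has_elevator && !from_schedule)
		return FLUSH_PATH_ISSUE_DIRECT;
	return FLUSH_PATH_DISPATCH_LIST;
}
```

With has_elevator set, the model never selects the direct-issue path, which is the substance of the question above.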
Also please include a description of why these changes are otherwise
good and won't regress other cases.
> + blk_mq_sched_insert_request(rq, /*at_head=*/false,
> + /*run_queue=*/last, /*async=*/false);
I find this comment style very hard to read. Yes, maybe the three
bools here should become flags, but this is even worse than just
passing the arguments.
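One way to read the "flags" suggestion is sketched below. The flag names, `insert_flags_t`, `struct insert_args` and `decode_insert_flags()` are hypothetical and not an existing kernel API; the sketch only shows that a single flag word makes each call site self-describing without inline comments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags replacing the at_head/run_queue/async bools. */
typedef unsigned int insert_flags_t;
#define INSERT_AT_HEAD   (1U << 0)
#define INSERT_RUN_QUEUE (1U << 1)
#define INSERT_ASYNC     (1U << 2)

struct insert_args {
	bool at_head;
	bool run_queue;
	bool async;
};

/* Decodes a flag word into the three booleans that the current
 * blk_mq_sched_insert_request(rq, at_head, run_queue, async) expects. */
static struct insert_args decode_insert_flags(insert_flags_t flags)
{
	struct insert_args a = {
		.at_head   = flags & INSERT_AT_HEAD,
		.run_queue = flags & INSERT_RUN_QUEUE,
		.async     = flags & INSERT_ASYNC,
	};
	return a;
}
```

A call site would then read `insert(rq, last ? INSERT_RUN_QUEUE : 0)` (with `insert()` hypothetical), where unset flags mean "false".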
* Re: [PATCH] block: Submit flush requests to the I/O scheduler
2022-08-13 6:41 ` Christoph Hellwig
@ 2022-08-14 17:13 ` Damien Le Moal
2022-08-14 23:44 ` Bart Van Assche
0 siblings, 1 reply; 6+ messages in thread
From: Damien Le Moal @ 2022-08-14 17:13 UTC
To: Christoph Hellwig, Bart Van Assche
Cc: Jens Axboe, linux-block, Jaegeuk Kim, Ming Lei
On 2022/08/12 23:41, Christoph Hellwig wrote:
> On Fri, Aug 12, 2022 at 02:03:55PM -0700, Bart Van Assche wrote:
>> When submitting a REQ_OP_WRITE | REQ_FUA request to a zoned storage
>> device, these requests must be passed to the (mq-deadline) I/O scheduler
>> to ensure that these happen at the write pointer.
>
> Yes.
>
> Maybe I'm being stupid, but how is the patch related to fixing that?
> blk_mq_plug_issue_direct is called from blk_mq_flush_plug_list for
> only the !has_elevator case. How does that change a thing?
And writes to zoned drives never get plugged in the first place, scheduler
present or not.
>
> Also please include a description of why these changes are otherwise
> good and won't regress other cases.
>
>> + blk_mq_sched_insert_request(rq, /*at_head=*/false,
>> + /*run_queue=*/last, /*async=*/false);
>
> I find this comment style very hard to read. Yes, maybe the three
> bools here should become flags, but this is even worse than just
> passing the arguments.
--
Damien Le Moal
Western Digital Research
* Re: [PATCH] block: Submit flush requests to the I/O scheduler
2022-08-14 17:13 ` Damien Le Moal
@ 2022-08-14 23:44 ` Bart Van Assche
2022-08-15 9:06 ` Pankaj Raghav
2022-08-15 16:31 ` Damien Le Moal
0 siblings, 2 replies; 6+ messages in thread
From: Bart Van Assche @ 2022-08-14 23:44 UTC
To: Damien Le Moal, Christoph Hellwig
Cc: Jens Axboe, linux-block, Jaegeuk Kim, Ming Lei
On 8/14/22 10:13, Damien Le Moal wrote:
> And writes to zoned drives never get plugged in the first place, scheduler
> present or not.
Hi Damien,
I agree that blk_mq_submit_bio() does not plug writes to zoned drives
because of the following code in blk_mq_plug():
/* Zoned block device write operation case: do not plug the BIO */
if (bdev_is_zoned(bio->bi_bdev) && op_is_write(bio_op(bio)))
return NULL;
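The effect of that check can be modeled in user space. In this sketch `struct plug`, `struct bio_model` and `plug_for_bio()` are hypothetical stand-ins; the two booleans model bdev_is_zoned(bio->bi_bdev) and op_is_write(bio_op(bio)), and the function mirrors only the quoted condition, not the rest of blk_mq_plug().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-task struct blk_plug. */
struct plug { int nr_requests; };

/* Hypothetical stand-in for the bio properties the check looks at. */
struct bio_model {
	bool bdev_is_zoned;   /* models bdev_is_zoned(bio->bi_bdev) */
	bool is_write;        /* models op_is_write(bio_op(bio)) */
};

/* Mirrors the quoted condition: zoned writes never get the task plug;
 * everything else uses the current plug, which may itself be NULL. */
static struct plug *plug_for_bio(struct plug *current_plug,
				 const struct bio_model *bio)
{
	if (bio->bdev_is_zoned && bio->is_write)
		return NULL;
	return current_plug;
}
```

Reads to a zoned device and writes to a conventional device still use whatever plug the task has active.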
However, I have not found any code in blk_execute_rq_nowait() that
causes the plugging mechanism to be skipped for zoned writes. Did I
perhaps overlook something? The current blk_execute_rq_nowait()
implementation is as follows:
void blk_execute_rq_nowait(struct request *rq, bool at_head)
{
WARN_ON(irqs_disabled());
WARN_ON(!blk_rq_is_passthrough(rq));
blk_account_io_start(rq);
if (current->plug)
blk_add_rq_to_plug(current->plug, rq);
else
blk_mq_sched_insert_request(rq, at_head, true, false);
}
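For contrast, a model of the quoted blk_execute_rq_nowait() decision is sketched below. `enum rq_dest`, `struct rq_model` and `execute_rq_nowait_dest()` are hypothetical names. The sketch assumes, as in the quoted code, that the only input to the decision is whether the task has an active plug, so a zoned write is plugged whenever a plug exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: where the modeled blk_execute_rq_nowait() sends a request. */
enum rq_dest { RQ_TO_PLUG, RQ_TO_SCHED };

struct rq_model {
	bool zoned;     /* request targets a zoned device */
	bool is_write;
};

/* Mirrors the quoted function: the request type plays no role; only the
 * presence of current->plug (modeled by current_plug) matters. */
static enum rq_dest execute_rq_nowait_dest(const void *current_plug,
					   const struct rq_model *rq)
{
	(void)rq;
	if (current_plug)
		return RQ_TO_PLUG;   /* blk_add_rq_to_plug() */
	return RQ_TO_SCHED;          /* blk_mq_sched_insert_request() */
}
```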
Thanks,
Bart.
* Re: [PATCH] block: Submit flush requests to the I/O scheduler
2022-08-14 23:44 ` Bart Van Assche
@ 2022-08-15 9:06 ` Pankaj Raghav
2022-08-15 16:31 ` Damien Le Moal
1 sibling, 0 replies; 6+ messages in thread
From: Pankaj Raghav @ 2022-08-15 9:06 UTC
To: Bart Van Assche
Cc: Damien Le Moal, Christoph Hellwig, Jens Axboe, linux-block,
Jaegeuk Kim, Ming Lei
On Sun, Aug 14, 2022 at 04:44:31PM -0700, Bart Van Assche wrote:
> I agree that blk_mq_submit_bio() does not plug writes to zoned drives
> because of the following code in blk_mq_plug():
>
> /* Zoned block device write operation case: do not plug the BIO */
> if (bdev_is_zoned(bio->bi_bdev) && op_is_write(bio_op(bio)))
> return NULL;
>
> However, I have not found any code in blk_execute_rq_nowait() that causes
> the plugging mechanism to be skipped for zoned writes. Did I perhaps
> overlook something? The current blk_execute_rq_nowait() implementation is as
> follows:
>
IIUC, blk_execute_rq_nowait() is used mainly by lower level drivers to send
commands but current->plug is not initialized with blk_start_plug() in those
drivers. So, the rqs are not added to the plug list.
I did a quick test with fio with the new uring_cmd IO path that uses
blk_execute_rq_nowait() and it never plugged the rqs.
fio --filename=/dev/ng0n3 --size=128M --rw=write --bs=4k --zonemode=zbd --ioengine=io_uring_cmd --name=zoned
Did you observe otherwise?
But I think it is better if we change current->plug to blk_mq_plug() to
be on the safer side.
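Since blk_mq_plug() takes a bio while blk_execute_rq_nowait() has a request, the suggestion would need a request-based equivalent. The sketch below assumes such a helper; `plug_for_rq()` and `execute_rq_nowait_sketch()` are hypothetical and do not claim to be the fix that was eventually merged. It applies the same "zoned write gets no plug" rule before deciding between the plug and the scheduler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct blk_plug. */
struct plug { int count; };

struct rq_model {
	bool zoned;
	bool is_write;
};

/* Hypothetical request-based analogue of blk_mq_plug(): never hand out
 * the task plug for writes to a zoned device. */
static struct plug *plug_for_rq(struct plug *current_plug,
				const struct rq_model *rq)
{
	if (rq->zoned && rq->is_write)
		return NULL;
	return current_plug;
}

/* Returns true if the request was added to the plug, false if it would be
 * handed directly to the I/O scheduler instead. */
static bool execute_rq_nowait_sketch(struct plug *current_plug,
				     const struct rq_model *rq)
{
	struct plug *plug = plug_for_rq(current_plug, rq);

	if (plug) {
		plug->count++;   /* stands in for blk_add_rq_to_plug() */
		return true;
	}
	return false;            /* stands in for blk_mq_sched_insert_request() */
}
```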
> void blk_execute_rq_nowait(struct request *rq, bool at_head)
> {
> WARN_ON(irqs_disabled());
> WARN_ON(!blk_rq_is_passthrough(rq));
>
> blk_account_io_start(rq);
> if (current->plug)
> blk_add_rq_to_plug(current->plug, rq);
> else
> blk_mq_sched_insert_request(rq, at_head, true, false);
> }
>
> Thanks,
>
> Bart.
--
Pankaj Raghav
* Re: [PATCH] block: Submit flush requests to the I/O scheduler
2022-08-14 23:44 ` Bart Van Assche
2022-08-15 9:06 ` Pankaj Raghav
@ 2022-08-15 16:31 ` Damien Le Moal
1 sibling, 0 replies; 6+ messages in thread
From: Damien Le Moal @ 2022-08-15 16:31 UTC
To: Bart Van Assche, Christoph Hellwig
Cc: Jens Axboe, linux-block, Jaegeuk Kim, Ming Lei
On 2022/08/14 16:44, Bart Van Assche wrote:
> On 8/14/22 10:13, Damien Le Moal wrote:
>> And writes to zoned drives never get plugged in the first place, scheduler
>> present or not.
>
> Hi Damien,
>
> I agree that blk_mq_submit_bio() does not plug writes to zoned drives
> because of the following code in blk_mq_plug():
>
> /* Zoned block device write operation case: do not plug the BIO */
> if (bdev_is_zoned(bio->bi_bdev) && op_is_write(bio_op(bio)))
> return NULL;
>
> However, I have not found any code in blk_execute_rq_nowait() that
> causes the plugging mechanism to be skipped for zoned writes. Did I
> perhaps overlook something? The current blk_execute_rq_nowait()
> implementation is as follows:
>
> void blk_execute_rq_nowait(struct request *rq, bool at_head)
> {
> WARN_ON(irqs_disabled());
> WARN_ON(!blk_rq_is_passthrough(rq));
>
> blk_account_io_start(rq);
> if (current->plug)
> blk_add_rq_to_plug(current->plug, rq);
> else
> blk_mq_sched_insert_request(rq, at_head, true, false);
> }
As far as I understand it, and checking the call sites, this is for LLD internal
commands only. And I think Pankaj has a good point for a fix to this one. That
said, I would hate to see an LLD issue a write request.
For f2fs, it seems to me that the problem is more with the code in
block/blk-flush.c where functions bypassing the scheduler are used for writes,
e.g. blk_insert_flush() / blk_mq_request_bypass_insert(). I am not 100% sure,
but that definitely looks very suspicious.
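The concern can be illustrated with a small model. It assumes, as the dispatch code of that era does, that requests on the hardware queue's dispatch list (where blk_mq_request_bypass_insert() puts them) are dispatched before anything still held by the I/O scheduler. `struct queue_model`, `bypass_insert()`, `sched_insert()` and `dispatch_order()` are hypothetical names for this sketch only.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RQS 8

/* Hypothetical model: a dispatch list that bypasses the scheduler, plus
 * the scheduler's own queue. Each entry is the start sector of a write. */
struct queue_model {
	unsigned long long dispatch[MAX_RQS];
	size_t nr_dispatch;
	unsigned long long sched[MAX_RQS];
	size_t nr_sched;
};

static void bypass_insert(struct queue_model *q, unsigned long long sector)
{
	assert(q->nr_dispatch < MAX_RQS);
	q->dispatch[q->nr_dispatch++] = sector;
}

static void sched_insert(struct queue_model *q, unsigned long long sector)
{
	assert(q->nr_sched < MAX_RQS);
	q->sched[q->nr_sched++] = sector;
}

/* Writes the device-visible order into out[] (which must have room for
 * 2 * MAX_RQS entries): the dispatch list is drained first, then the
 * scheduler queue. Returns the number of entries written. */
static size_t dispatch_order(const struct queue_model *q,
			     unsigned long long *out)
{
	size_t n = 0;

	for (size_t i = 0; i < q->nr_dispatch; i++)
		out[n++] = q->dispatch[i];
	for (size_t i = 0; i < q->nr_sched; i++)
		out[n++] = q->sched[i];
	return n;
}
```

If a regular write to sector 8 is still queued in the scheduler when a FUA write to sector 16 is bypass-inserted, the device sees 16 before 8, which a zone with its write pointer at 8 would reject.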
>
> Thanks,
>
> Bart.
--
Damien Le Moal
Western Digital Research
Thread overview: 6+ messages
2022-08-12 21:03 [PATCH] block: Submit flush requests to the I/O scheduler Bart Van Assche
2022-08-13 6:41 ` Christoph Hellwig
2022-08-14 17:13 ` Damien Le Moal
2022-08-14 23:44 ` Bart Van Assche
2022-08-15 9:06 ` Pankaj Raghav
2022-08-15 16:31 ` Damien Le Moal