* [PATCH] block: Submit flush requests to the I/O scheduler
@ 2022-08-12 21:03 Bart Van Assche
  2022-08-13  6:41 ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Bart Van Assche @ 2022-08-12 21:03 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Jaegeuk Kim, Bart Van Assche,
	Ming Lei, Damien Le Moal

REQ_OP_WRITE | REQ_FUA requests submitted to a zoned storage device must
be passed to the (mq-deadline) I/O scheduler to ensure that they are
issued at the write pointer. It has been verified that this patch
prevents the write pointer violations that otherwise happen sporadically
when f2fs uses a zoned storage device.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq.c | 31 ++++---------------------------
 1 file changed, 4 insertions(+), 27 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5ee62b95f3e5..530aad95cc33 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2546,16 +2546,14 @@ static blk_status_t blk_mq_request_issue_directly(struct request *rq, bool last)
 	return __blk_mq_try_issue_directly(rq->mq_hctx, rq, true, last);
 }
 
-static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)
+static void blk_mq_plug_issue(struct blk_plug *plug, bool from_schedule)
 {
 	struct blk_mq_hw_ctx *hctx = NULL;
 	struct request *rq;
 	int queued = 0;
-	int errors = 0;
 
 	while ((rq = rq_list_pop(&plug->mq_list))) {
 		bool last = rq_list_empty(plug->mq_list);
-		blk_status_t ret;
 
 		if (hctx != rq->mq_hctx) {
 			if (hctx)
@@ -2563,29 +2561,9 @@ static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)
 			hctx = rq->mq_hctx;
 		}
 
-		ret = blk_mq_request_issue_directly(rq, last);
-		switch (ret) {
-		case BLK_STS_OK:
-			queued++;
-			break;
-		case BLK_STS_RESOURCE:
-		case BLK_STS_DEV_RESOURCE:
-			blk_mq_request_bypass_insert(rq, false, last);
-			blk_mq_commit_rqs(hctx, &queued, from_schedule);
-			return;
-		default:
-			blk_mq_end_request(rq, ret);
-			errors++;
-			break;
-		}
+		blk_mq_sched_insert_request(rq, /*at_head=*/false,
+			/*run_queue=*/last, /*async=*/false);
 	}
-
-	/*
-	 * If we didn't flush the entire list, we could have told the driver
-	 * there was more coming, but that turned out to be a lie.
-	 */
-	if (errors)
-		blk_mq_commit_rqs(hctx, &queued, from_schedule);
 }
 
 static void __blk_mq_flush_plug_list(struct request_queue *q,
@@ -2655,8 +2633,7 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
 				return;
 		}
 
-		blk_mq_run_dispatch_ops(q,
-				blk_mq_plug_issue_direct(plug, false));
+		blk_mq_run_dispatch_ops(q, blk_mq_plug_issue(plug, false));
 		if (rq_list_empty(plug->mq_list))
 			return;
 	}


* Re: [PATCH] block: Submit flush requests to the I/O scheduler
  2022-08-12 21:03 [PATCH] block: Submit flush requests to the I/O scheduler Bart Van Assche
@ 2022-08-13  6:41 ` Christoph Hellwig
  2022-08-14 17:13   ` Damien Le Moal
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2022-08-13  6:41 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Jaegeuk Kim,
	Ming Lei, Damien Le Moal

On Fri, Aug 12, 2022 at 02:03:55PM -0700, Bart Van Assche wrote:
> When submitting a REQ_OP_WRITE | REQ_FUA request to a zoned storage
> device, these requests must be passed to the (mq-deadline) I/O scheduler
> to ensure that these happen at the write pointer.

Yes.

Maybe I'm stupid, but how is this patch related to fixing that?
blk_mq_plug_issue_direct() is called from blk_mq_flush_plug_list()
only for the !has_elevator case.  How does that change anything?

Also please include a description of why these changes are otherwise
good and won't regress other cases.

> +		blk_mq_sched_insert_request(rq, /*at_head=*/false,
> +			/*run_queue=*/last, /*async=*/false);

I find this comment style very hard to read.  Yes, maybe the three
bools here should become flags, but this is even worse than just
passing the arguments.


* Re: [PATCH] block: Submit flush requests to the I/O scheduler
  2022-08-13  6:41 ` Christoph Hellwig
@ 2022-08-14 17:13   ` Damien Le Moal
  2022-08-14 23:44     ` Bart Van Assche
  0 siblings, 1 reply; 6+ messages in thread
From: Damien Le Moal @ 2022-08-14 17:13 UTC (permalink / raw)
  To: Christoph Hellwig, Bart Van Assche
  Cc: Jens Axboe, linux-block, Jaegeuk Kim, Ming Lei

On 2022/08/12 23:41, Christoph Hellwig wrote:
> On Fri, Aug 12, 2022 at 02:03:55PM -0700, Bart Van Assche wrote:
>> When submitting a REQ_OP_WRITE | REQ_FUA request to a zoned storage
>> device, these requests must be passed to the (mq-deadline) I/O scheduler
>> to ensure that these happen at the write pointer.
> 
> Yes.
> 
> Maybe I'm stupid, but how is this patch related to fixing that?
> blk_mq_plug_issue_direct() is called from blk_mq_flush_plug_list()
> only for the !has_elevator case.  How does that change anything?

And writes to zoned drives never get plugged in the first place, scheduler
present or not.

> 
> Also please include a description of why these changes are otherwise
> good and won't regress other cases.
> 
>> +		blk_mq_sched_insert_request(rq, /*at_head=*/false,
>> +			/*run_queue=*/last, /*async=*/false);
> 
> I find this comment style very hard to read.  Yes, maybe the three
> bools here should become flags, but this is even worse than just
> passing the arguments.


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH] block: Submit flush requests to the I/O scheduler
  2022-08-14 17:13   ` Damien Le Moal
@ 2022-08-14 23:44     ` Bart Van Assche
  2022-08-15  9:06       ` Pankaj Raghav
  2022-08-15 16:31       ` Damien Le Moal
  0 siblings, 2 replies; 6+ messages in thread
From: Bart Van Assche @ 2022-08-14 23:44 UTC (permalink / raw)
  To: Damien Le Moal, Christoph Hellwig
  Cc: Jens Axboe, linux-block, Jaegeuk Kim, Ming Lei

On 8/14/22 10:13, Damien Le Moal wrote:
> And writes to zoned drives never get plugged in the first place, scheduler
> present or not.

Hi Damien,

I agree that blk_mq_submit_bio() does not plug writes to zoned drives 
because of the following code in blk_mq_plug():

/* Zoned block device write operation case: do not plug the BIO */
if (bdev_is_zoned(bio->bi_bdev) && op_is_write(bio_op(bio)))
	return NULL;

However, I have not found any code in blk_execute_rq_nowait() that 
causes the plugging mechanism to be skipped for zoned writes. Did I 
perhaps overlook something? The current blk_execute_rq_nowait() 
implementation is as follows:

void blk_execute_rq_nowait(struct request *rq, bool at_head)
{
	WARN_ON(irqs_disabled());
	WARN_ON(!blk_rq_is_passthrough(rq));

	blk_account_io_start(rq);
	if (current->plug)
		blk_add_rq_to_plug(current->plug, rq);
	else
		blk_mq_sched_insert_request(rq, at_head, true, false);
}

Thanks,

Bart.


* Re: [PATCH] block: Submit flush requests to the I/O scheduler
  2022-08-14 23:44     ` Bart Van Assche
@ 2022-08-15  9:06       ` Pankaj Raghav
  2022-08-15 16:31       ` Damien Le Moal
  1 sibling, 0 replies; 6+ messages in thread
From: Pankaj Raghav @ 2022-08-15  9:06 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Damien Le Moal, Christoph Hellwig, Jens Axboe, linux-block,
	Jaegeuk Kim, Ming Lei

On Sun, Aug 14, 2022 at 04:44:31PM -0700, Bart Van Assche wrote:
> I agree that blk_mq_submit_bio() does not plug writes to zoned drives
> because of the following code in blk_mq_plug():
> 
> /* Zoned block device write operation case: do not plug the BIO */
> if (bdev_is_zoned(bio->bi_bdev) && op_is_write(bio_op(bio)))
> 	return NULL;
> 
> However, I have not found any code in blk_execute_rq_nowait() that causes
> the plugging mechanism to be skipped for zoned writes. Did I perhaps
> overlook something? The current blk_execute_rq_nowait() implementation is as
> follows:
> 
IIUC, blk_execute_rq_nowait() is used mainly by lower-level drivers to
send commands, but current->plug is not initialized with blk_start_plug()
in those drivers, so the rqs are not added to the plug list.

I did a quick test with fio with the new uring_cmd IO path that uses
blk_execute_rq_nowait() and it never plugged the rqs.

fio --filename=/dev/ng0n3 --size=128M --rw=write --bs=4k --zonemode=zbd --ioengine=io_uring_cmd --name=zoned

Did you notice it otherwise?

But I think it is better if we change current->plug to blk_mq_plug() to
be on the safer side.
> void blk_execute_rq_nowait(struct request *rq, bool at_head)
> {
> 	WARN_ON(irqs_disabled());
> 	WARN_ON(!blk_rq_is_passthrough(rq));
> 
> 	blk_account_io_start(rq);
> 	if (current->plug)
> 		blk_add_rq_to_plug(current->plug, rq);
> 	else
> 		blk_mq_sched_insert_request(rq, at_head, true, false);
> }
> 
> Thanks,
> 
> Bart.

-- 
Pankaj Raghav


* Re: [PATCH] block: Submit flush requests to the I/O scheduler
  2022-08-14 23:44     ` Bart Van Assche
  2022-08-15  9:06       ` Pankaj Raghav
@ 2022-08-15 16:31       ` Damien Le Moal
  1 sibling, 0 replies; 6+ messages in thread
From: Damien Le Moal @ 2022-08-15 16:31 UTC (permalink / raw)
  To: Bart Van Assche, Christoph Hellwig
  Cc: Jens Axboe, linux-block, Jaegeuk Kim, Ming Lei

On 2022/08/14 16:44, Bart Van Assche wrote:
> On 8/14/22 10:13, Damien Le Moal wrote:
>> And writes to zoned drives never get plugged in the first place, scheduler
>> present or not.
> 
> Hi Damien,
> 
> I agree that blk_mq_submit_bio() does not plug writes to zoned drives 
> because of the following code in blk_mq_plug():
> 
> /* Zoned block device write operation case: do not plug the BIO */
> if (bdev_is_zoned(bio->bi_bdev) && op_is_write(bio_op(bio)))
> 	return NULL;
> 
> However, I have not found any code in blk_execute_rq_nowait() that 
> causes the plugging mechanism to be skipped for zoned writes. Did I 
> perhaps overlook something? The current blk_execute_rq_nowait() 
> implementation is as follows:
> 
> void blk_execute_rq_nowait(struct request *rq, bool at_head)
> {
> 	WARN_ON(irqs_disabled());
> 	WARN_ON(!blk_rq_is_passthrough(rq));
> 
> 	blk_account_io_start(rq);
> 	if (current->plug)
> 		blk_add_rq_to_plug(current->plug, rq);
> 	else
> 		blk_mq_sched_insert_request(rq, at_head, true, false);
> }

As far as I understand it, and checking the call sites, this is for LLD internal
commands only. And I think Pankaj has a good point for a fix to this one. I
would hate to see an LLD issue a write request, though.

For f2fs, it seems to me that the problem is more with the code in
block/blk-flush.c where functions bypassing the scheduler are used for writes,
e.g. blk_insert_flush() / blk_mq_request_bypass_insert(). I am not 100% sure,
but that definitely looks very suspicious.

> 
> Thanks,
> 
> Bart.


-- 
Damien Le Moal
Western Digital Research


