io-uring.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org, hch@lst.de,
	andres@anarazel.de
Subject: Re: [PATCH 4/8] iomap: completed polled IO inline
Date: Sat, 22 Jul 2023 07:43:43 +1000	[thread overview]
Message-ID: <ZLr8D60gYqDrHl21@dread.disaster.area> (raw)
In-Reply-To: <20230720181310.71589-5-axboe@kernel.dk>

On Thu, Jul 20, 2023 at 12:13:06PM -0600, Jens Axboe wrote:
> Polled IO is only allowed for conditions where task completion is safe
> anyway, so we can always complete it inline. This cannot easily be
> checked with a submission side flag, as the block layer may clear the
> polled flag and turn it into a regular IO instead. Hence we need to
> check this at completion time. If REQ_POLLED is still set, then we know
> that this IO was successfully polled, and is completing in task context.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  fs/iomap/direct-io.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 9f97d0d03724..c3ea1839628f 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -173,9 +173,19 @@ void iomap_dio_bio_end_io(struct bio *bio)
>  	}
>  
>  	/*
> -	 * Flagged with IOMAP_DIO_INLINE_COMP, we can complete it inline
> +	 * Flagged with IOMAP_DIO_INLINE_COMP, we can complete it inline.
> +	 * Ditto for polled requests - if the flag is still at completion
> +	 * time, then we know the request was actually polled and completion
> +	 * is called from the task itself. This is why we need to check it
> +	 * here rather than flag it at issue time.
>  	 */
> -	if (dio->flags & IOMAP_DIO_INLINE_COMP) {
> +	if ((dio->flags & IOMAP_DIO_INLINE_COMP) || (bio->bi_opf & REQ_POLLED)) {

This still smells wrong to me. Let me see if I can work out why...

<spelunk!>

When we set up the IO in iomap_dio_bio_iter(), we do this:

        /*
         * We can only poll for single bio I/Os.
         */
        if (need_zeroout ||
            ((dio->flags & IOMAP_DIO_WRITE) && pos >= i_size_read(inode)))
                dio->iocb->ki_flags &= ~IOCB_HIPRI;

The "need_zeroout" covers writes into unwritten regions that require
conversion at IO completion, and the latter check is for writes
extending EOF. i.e. this covers the cases where we have dirty
metadata for this specific write and so may need transactions or
journal/metadata IO during IO completion.

The only case it doesn't cover is clearing IOCB_HIPRI for O_DSYNC IO
that may require a call to generic_write_sync() in completion. That
is, if we aren't using FUA, will not have IOMAP_DIO_INLINE_COMP set,
but still do polled IO.

I think this is a bug. We don't want to be issuing more IO in
REQ_POLLED task context during IO completion, and O_DSYNC IO
completion for non-FUA IO requires a journal flush and that can
issue lots of journal IO and wait on it in completion process.

Hence I think we should only be setting REQ_POLLED in the cases
where IOCB_HIPRI and IOMAP_DIO_INLINE_COMP are both set.  If
IOMAP_DIO_INLINE_COMP is set on the dio, then it doesn't matter what
context we are in at completion time or whether REQ_POLLED was set
or cleared during the IO....

That means the above check should be:

        /*
         * We can only poll for single bio I/Os that can run inline
	 * completion.
         */
        if (need_zeroout ||
	    (iocb_is_dsync(dio->iocb) && !use_fua) ||
            ((dio->flags & IOMAP_DIO_WRITE) && pos >= i_size_read(inode)))
                dio->iocb->ki_flags &= ~IOCB_HIPRI;

or if we change the logic such that calculate IOMAP_DIO_INLINE_COMP
first:

	if (!(dio->flags & IOMAP_DIO_INLINE_COMP))
		dio->iocb->ki_flags &= ~IOCB_HIPRI;

Then we don't need to care about polled IO on the completion side at
all at the iomap layer because it doesn't change the completion
requirements at all...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  parent reply	other threads:[~2023-07-21 21:43 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-20 18:13 [PATCHSET v4 0/8] Improve async iomap DIO performance Jens Axboe
2023-07-20 18:13 ` [PATCH 1/8] iomap: cleanup up iomap_dio_bio_end_io() Jens Axboe
2023-07-21  6:14   ` Christoph Hellwig
2023-07-21 15:13   ` Darrick J. Wong
2023-07-20 18:13 ` [PATCH 2/8] iomap: add IOMAP_DIO_INLINE_COMP Jens Axboe
2023-07-21  6:14   ` Christoph Hellwig
2023-07-21 15:16   ` Darrick J. Wong
2023-07-20 18:13 ` [PATCH 3/8] iomap: treat a write through cache the same as FUA Jens Axboe
2023-07-21  6:15   ` Christoph Hellwig
2023-07-21 14:04     ` Jens Axboe
2023-07-21 15:55       ` Darrick J. Wong
2023-07-21 16:03         ` Jens Axboe
2023-07-20 18:13 ` [PATCH 4/8] iomap: completed polled IO inline Jens Axboe
2023-07-21  6:16   ` Christoph Hellwig
2023-07-21 15:19   ` Darrick J. Wong
2023-07-21 21:43   ` Dave Chinner [this message]
2023-07-22  3:10     ` Jens Axboe
2023-07-22 23:05       ` Dave Chinner
2023-07-24 22:35         ` Jens Axboe
2023-07-22 16:54     ` Jens Axboe
2023-07-20 18:13 ` [PATCH 5/8] iomap: only set iocb->private for polled bio Jens Axboe
2023-07-21  6:18   ` Christoph Hellwig
2023-07-21 15:35   ` Darrick J. Wong
2023-07-21 15:37     ` Jens Axboe
2023-07-20 18:13 ` [PATCH 6/8] fs: add IOCB flags related to passing back dio completions Jens Axboe
2023-07-21  6:18   ` Christoph Hellwig
2023-07-21 15:48   ` Darrick J. Wong
2023-07-21 15:53     ` Jens Axboe
2023-07-20 18:13 ` [PATCH 7/8] io_uring/rw: add write support for IOCB_DIO_DEFER Jens Axboe
2023-07-21  6:19   ` Christoph Hellwig
2023-07-21 15:50   ` Darrick J. Wong
2023-07-21 15:53     ` Jens Axboe
2023-07-20 18:13 ` [PATCH 8/8] iomap: support IOCB_DIO_DEFER Jens Axboe
2023-07-21  6:19   ` Christoph Hellwig
2023-07-21 16:01   ` Darrick J. Wong
2023-07-21 16:30     ` Jens Axboe
2023-07-21 22:05   ` Dave Chinner
2023-07-22  3:12     ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZLr8D60gYqDrHl21@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=andres@anarazel.de \
    --cc=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=io-uring@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).