From: Dave Chinner <david@fromorbit.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org, hch@lst.de,
	andres@anarazel.de
Subject: Re: [PATCH 1/5] iomap: complete polled writes inline
Date: Wed, 12 Jul 2023 11:02:07 +1000
Message-ID: <ZK37j/BqFYXLjV/B@dread.disaster.area>
In-Reply-To: <20230711203325.208957-2-axboe@kernel.dk>

On Tue, Jul 11, 2023 at 02:33:21PM -0600, Jens Axboe wrote:
> Polled IO is always reaped in the context of the process itself, so it
> does not need to be punted to a workqueue for the completion. This is
> different than IRQ driven IO, where iomap_dio_bio_end_io() will be
> invoked from hard/soft IRQ context. For those cases we currently need
> to punt to a workqueue for further processing. For the polled case,
> since it's the task itself reaping completions, we're already in task
> context. That makes it identical to the sync completion case.
> 
> Testing a basic QD 1..8 dio random write with polled IO with the
> following fio job:
> 
> fio --name=polled-dio-write --filename=/data1/file --time_based=1 \
> --runtime=10 --bs=4096 --rw=randwrite --norandommap --buffered=0 \
> --cpus_allowed=4 --ioengine=io_uring --iodepth=$depth --hipri=1

Ok, so this is testing pure overwrite DIOs as fio pre-writes the
file prior to starting the random write part of the test.

> yields:
> 
> 	Stock	Patched		Diff
> =======================================
> QD1	180K	201K		+11%
> QD2	356K	394K		+10%
> QD4	608K	650K		+7%
> QD8	827K	831K		+0.5%
> 
> which shows a nice win, particularly for lower queue depth writes.
> This is expected, as higher queue depths will be busy polling
> completions while the offloaded workqueue completions can happen in
> parallel.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  fs/iomap/direct-io.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index ea3b868c8355..343bde5d50d3 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -161,15 +161,16 @@ void iomap_dio_bio_end_io(struct bio *bio)
>  			struct task_struct *waiter = dio->submit.waiter;
>  			WRITE_ONCE(dio->submit.waiter, NULL);
>  			blk_wake_io_task(waiter);
> -		} else if (dio->flags & IOMAP_DIO_WRITE) {
> +		} else if ((bio->bi_opf & REQ_POLLED) ||
> +			   !(dio->flags & IOMAP_DIO_WRITE)) {
> +			WRITE_ONCE(dio->iocb->private, NULL);
> +			iomap_dio_complete_work(&dio->aio.work);

I'm not sure this is safe for all polled writes. What if the DIO
write was into a hole, so that completion has to run unwritten
extent conversion via:

iomap_dio_complete_work(work)
  iomap_dio_complete(dio)
    dio->end_io(iocb)
      xfs_dio_write_end_io()
        xfs_iomap_write_unwritten()
          <runs transactions, takes rwsems, does IO>
  .....
  ki->ki_complete()
    io_complete_rw_iopoll()
  .....

I don't see anything in the iomap DIO path that prevents us from
doing HIPRI/REQ_POLLED IO on IOMAP_UNWRITTEN extents, hence I think
this change will result in blocking completion work (transactions,
rwsem acquisition, further IO) being run directly from the polled
completion path.

> +		} else {
>  			struct inode *inode = file_inode(dio->iocb->ki_filp);
>  
>  			WRITE_ONCE(dio->iocb->private, NULL);
>  			INIT_WORK(&dio->aio.work, iomap_dio_complete_work);
>  			queue_work(inode->i_sb->s_dio_done_wq, &dio->aio.work);
> -		} else {
> -			WRITE_ONCE(dio->iocb->private, NULL);
> -			iomap_dio_complete_work(&dio->aio.work);
>  		}
>  	}

Regardless of the correctness of the code, I don't think adding this
special case is the right thing to do here.  We should be able to
complete all writes that don't require blocking completions directly
here, not just polled writes.

We recently had this discussion about hacking a special case "don't
queue for writes" for ext4 into this code - I had to point out the
broken O_DSYNC completion cases it resulted in there, too. I also
pointed out that we already have generic mechanisms in iomap that
let us make a submission-time decision about whether completion
needs to be queued or not. Thread here:

https://lore.kernel.org/linux-xfs/20230621174114.1320834-1-bongiojp@gmail.com/

Essentially, we shouldn't be using IOMAP_DIO_WRITE as the
determining factor for queuing completions. Instead, we should be
using the information the iocb and the iomap provide us at
submission time to determine whether iomap IO completion queuing is
required, similar to how we determine if we can use REQ_FUA for
O_DSYNC writes.

This will do the correct *and* optimal thing for all types of
writes, polled or not...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


Thread overview: 10+ messages
2023-07-11 20:33 [PATCHSET 0/5] Improve async iomap DIO performance Jens Axboe
2023-07-11 20:33 ` [PATCH 1/5] iomap: complete polled writes inline Jens Axboe
2023-07-12  1:02   ` Dave Chinner [this message]
2023-07-12  1:17     ` Jens Axboe
2023-07-12  2:51       ` Dave Chinner
2023-07-12 15:22     ` Christoph Hellwig
2023-07-11 20:33 ` [PATCH 2/5] fs: add IOCB flags related to passing back dio completions Jens Axboe
2023-07-11 20:33 ` [PATCH 3/5] io_uring/rw: add write support for IOCB_DIO_DEFER Jens Axboe
2023-07-11 20:33 ` [PATCH 4/5] iomap: add local 'iocb' variable in iomap_dio_bio_end_io() Jens Axboe
2023-07-11 20:33 ` [PATCH 5/5] iomap: support IOCB_DIO_DEFER Jens Axboe
