linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Al Viro <viro@zeniv.linux.org.uk>, Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <willy@infradead.org>, linux-fsdevel@vger.kernel.org
Subject: Re: [RFC] what to do with IOCB_DSYNC?
Date: Sat, 21 May 2022 16:14:07 -0600	[thread overview]
Message-ID: <f2547f65-1a37-793d-07ba-f54d018e16d4@kernel.dk> (raw)
In-Reply-To: <70b5e4a8-1daa-dc75-af58-9d82a732a6be@kernel.dk>

On 5/21/22 1:03 PM, Jens Axboe wrote:
> On 5/21/22 11:48 AM, Al Viro wrote:
>> [resurrecting an old thread]
>>
>> On Mon, Jun 21, 2021 at 04:35:01PM +0200, Christoph Hellwig wrote:
>>> On Mon, Jun 21, 2021 at 02:32:46PM +0000, Al Viro wrote:
>>>> 	I'd rather have a single helper for those checks, rather than
>>>> open-coding IS_SYNC() + IOCB_DSYNC in each, for obvious reasons...
>>>
>>> Yes, I think something like:
>>>
>>> static inline bool iocb_is_sync(struct kiocb *iocb)
>>> {
>>> 	return (iocb->ki_flags & IOCB_DSYNC) ||
>>> 		S_SYNC(iocb->ki_filp->f_mapping->host);
>>> }
>>>
>>> should do the job.
>>
>> There's a problem with that variant.  Take a look at btrfs_direct_write():
>>         const bool is_sync_write = (iocb->ki_flags & IOCB_DSYNC);
>> 	...
>>         /*
>> 	 * We remove IOCB_DSYNC so that we don't deadlock when iomap_dio_rw()
>> 	 * calls generic_write_sync() (through iomap_dio_complete()), because
>> 	 * that results in calling fsync (btrfs_sync_file()) which will try to
>> 	 * lock the inode in exclusive/write mode.
>> 	 */
>> 	if (is_sync_write)
>> 		iocb->ki_flags &= ~IOCB_DSYNC;
>> 	...
>>
>> 	/*
>> 	 * Add back IOCB_DSYNC. Our caller, btrfs_file_write_iter(), will do  
>> 	 * the fsync (call generic_write_sync()).
>> 	 */
>> 	if (is_sync_write)
>> 		iocb->ki_flags |= IOCB_DSYNC;
>>
>> will run into trouble.  How about this (completely untested):
>>
>> Precalculate iocb_flags()
>>
>> Store the value in file->f_i_flags; calculate at open time (do_dentry_open()
>> for opens, alloc_file() for pipe(2)/socket(2)/etc.), update at FCNTL_SETFL
>> time.
>>
>> IOCB_DSYNC is... special in that respect; replace checks for it with
>> an inlined helper (iocb_is_dsync()), leave the checks of underlying fs
>> mounted with -o sync and/or inode being marked sync until then.
>> To avoid breaking btrfs deadlock avoidance, add an explicit "no dsync" flag
>> that would suppress IOCB_DSYNC; that way btrfs_direct_write() can set it
>> for the duration of work where it wants to avoid generic_write_sync()
>> triggering.
>>
>> That ought to reduce the overhead of new_sync_{read,write}() quite a bit.
>>
>> NEEDS TESTING; NOT FOR MERGE IN THAT FORM.
> 
> Definitely generates better code here, unsurprisingly. Unfortunately it
> doesn't seem to help a whole lot for a direct comparison, though it does
> nudge us in the right direction.
> 
> The main issue here, using urandom and 4k reads (iter reads done,
> non-iter writes), the main difference seems to be that with the non-iter
> reads, here's our copy overhead:
> 
> +    1.80%  dd  [kernel.kallsyms]  [k] __arch_copy_to_user
> +    0.74%  dd  [kernel.kallsyms]  [k] _copy_to_user
> 
> and with the iter variant, for the same workload, it looks like this:
> 
> +    4.13%  dd  [kernel.kallsyms]  [k] _copy_to_iter
> +    0.88%  dd  [kernel.kallsyms]  [k] __arch_copy_to_user
> +    0.72%  dd  [kernel.kallsyms]  [k] copyout
> 
> and about 1% just doing the iov_iter advance. Since this test case is a
> single iovec read, ran a quick test where we simply use this helper:
> 
> static size_t random_copy(void *src, int size, struct iov_iter *to)
> {
> 	const struct iovec *iov = to->iov;
> 	size_t to_copy = min_t(size_t, size, iov->iov_len);
> 
> 	if (copy_to_user(iov->iov_base, src, to_copy))
> 		return 0;
> 
> 	to->count -= to_copy;
> 	return to_copy;
> }
> 
> rather than copy_to_iter(). That brings us within 0.8% of the non-iter
> read performance:
> 
> non-iter:	408MB/sec
> iter:		395MB/sec	-3.2%
> iter+al+hack	406MB/sec	-0.8%
> 
> for 32-byte reads:
> 
> non-iter	73.1MB/sec
> iter:		72.1MB/sec	-1.4%
> iter+al+hack	72.5MB/sec	-0.8%
> 
> which as mentioned in my initial email is closer than the 4k, but still
> sees a nice reduction in overhead.

Experimented a bit more - if we special case nr_segs == 1 && is_iovec in
_copy_to_iter():

_copy_to_iter()
{
...
	if (iter_is_iovec(i)) {
		might_fault();
		if (i->nr_segs == 1) {
			/* copy */
			iov_iter_advance(i, copied);
			return copied;
		}
	}
	iterate_and_advance(i, bytes, base, len, off, ...
	return bytes;
}

Then we're almost on par, and it looks like we just need to special case
iov_iter_advance() for the nr_segs == 1 as well to be on par. This is on
top of your patch as well, fwiw.

It might make sense to special case the single segment cases, for both
setup, iteration, and advancing. With that, I think we'll be where we
want to be, and there will be no discernable difference between the iter
paths and the old style paths.

-- 
Jens Axboe


  reply	other threads:[~2022-05-21 22:14 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-21  0:46 [RFC] what to do with IOCB_DSYNC? Al Viro
2021-06-21 13:59 ` Christoph Hellwig
2021-06-21 14:03   ` Matthew Wilcox
2021-06-21 14:09     ` Christoph Hellwig
2021-06-21 14:16       ` Matthew Wilcox
2021-06-21 14:22         ` Christoph Hellwig
2021-06-21 14:32           ` Al Viro
2021-06-21 14:35             ` Christoph Hellwig
2021-06-21 15:22               ` Jens Axboe
2022-05-21 17:48               ` Al Viro
2022-05-21 19:03                 ` Jens Axboe
2022-05-21 22:14                   ` Jens Axboe [this message]
2022-05-22  7:45                     ` Christoph Hellwig
2022-05-22 10:23                       ` Matthew Wilcox
2022-05-22 10:36                         ` Al Viro
2022-05-22 11:15                           ` Matthew Wilcox
2022-05-22 11:45                             ` Christoph Hellwig
2022-05-22 12:39                               ` Jens Axboe
2022-05-22 12:48                                 ` Al Viro
2022-05-22 13:02                                   ` Jens Axboe
2022-05-22 13:07                                     ` Al Viro
2022-05-22 13:09                                       ` Jens Axboe
2022-05-22 18:06                                         ` Jens Axboe
2022-05-22 18:25                                           ` Al Viro
2022-05-22 18:29                                             ` Jens Axboe
2022-05-22 18:39                                               ` Al Viro
2022-05-22 18:48                                                 ` Jens Axboe
2022-05-22 19:04                                                   ` Al Viro
2022-05-22 20:03                                                     ` Jens Axboe
2022-05-23  0:42                                                       ` Al Viro
2022-05-23  1:22                                                         ` Jens Axboe
2022-05-23  1:28                                                           ` Jens Axboe
2022-05-23  1:50                                                             ` Jens Axboe
2022-05-23  2:43                                                               ` Jens Axboe
2022-05-23 14:22                                                                 ` Al Viro
2022-05-23 14:34                                                                   ` Jens Axboe
2022-05-23 14:47                                                                     ` Al Viro
2022-05-23 15:12                                                                       ` Jens Axboe
2022-05-23 15:44                                                                         ` Jens Axboe
2022-05-23 15:49                                                                           ` Matthew Wilcox
2022-05-23 15:55                                                                             ` Jens Axboe
2022-05-23 16:03                                                                               ` Jens Axboe
2022-05-26 14:46                                                                                 ` Jason A. Donenfeld
2022-05-27 10:09                                                                                   ` Ingo Molnar
2022-05-27 10:15                                                                                     ` Jason A. Donenfeld
2022-05-27 14:45                                                                                       ` Samuel Neves
2022-05-27 10:25                                                                                     ` Ingo Molnar
2022-05-27 10:36                                                                                       ` Borislav Petkov
2022-05-28 20:54                                                                                         ` Sedat Dilek
2022-05-28 20:38                                                                                       ` Sedat Dilek
2022-05-28 20:39                                                                                         ` Sedat Dilek
2022-05-23 16:15                                                                         ` Al Viro
2022-05-25 14:34                                                         ` Matthew Wilcox
2022-05-26 23:19                                                     ` Al Viro
2022-05-27 14:51                                                       ` Jens Axboe
2022-05-22 12:21                             ` Al Viro
2022-05-22  7:43                 ` Christoph Hellwig
2022-05-22 12:41                   ` Al Viro
2022-05-22 12:51                     ` Christoph Hellwig
2021-06-21 14:22       ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f2547f65-1a37-793d-07ba-f54d018e16d4@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).