From: Jens Axboe <axboe@kernel.dk>
To: Al Viro <viro@zeniv.linux.org.uk>, Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <willy@infradead.org>, linux-fsdevel@vger.kernel.org
Subject: Re: [RFC] what to do with IOCB_DSYNC?
Date: Sat, 21 May 2022 13:03:45 -0600 [thread overview]
Message-ID: <70b5e4a8-1daa-dc75-af58-9d82a732a6be@kernel.dk> (raw)
In-Reply-To: <Yokl+uHTVWFxoQGn@zeniv-ca.linux.org.uk>
On 5/21/22 11:48 AM, Al Viro wrote:
> [resurrecting an old thread]
>
> On Mon, Jun 21, 2021 at 04:35:01PM +0200, Christoph Hellwig wrote:
>> On Mon, Jun 21, 2021 at 02:32:46PM +0000, Al Viro wrote:
>>> I'd rather have a single helper for those checks, rather than
>>> open-coding IS_SYNC() + IOCB_DSYNC in each, for obvious reasons...
>>
>> Yes, I think something like:
>>
>> static inline bool iocb_is_sync(struct kiocb *iocb)
>> {
>> return (iocb->ki_flags & IOCB_DSYNC) ||
>> S_SYNC(iocb->ki_filp->f_mapping->host);
>> }
>>
>> should do the job.
>
> There's a problem with that variant. Take a look at btrfs_direct_write():
> const bool is_sync_write = (iocb->ki_flags & IOCB_DSYNC);
> ...
> /*
> * We remove IOCB_DSYNC so that we don't deadlock when iomap_dio_rw()
> * calls generic_write_sync() (through iomap_dio_complete()), because
> * that results in calling fsync (btrfs_sync_file()) which will try to
> * lock the inode in exclusive/write mode.
> */
> if (is_sync_write)
> iocb->ki_flags &= ~IOCB_DSYNC;
> ...
>
> /*
> * Add back IOCB_DSYNC. Our caller, btrfs_file_write_iter(), will do
> * the fsync (call generic_write_sync()).
> */
> if (is_sync_write)
> iocb->ki_flags |= IOCB_DSYNC;
>
> will run into trouble. How about this (completely untested):
>
> Precalculate iocb_flags()
>
> Store the value in file->f_i_flags; calculate at open time (do_dentry_open()
> for opens, alloc_file() for pipe(2)/socket(2)/etc.), update at FCNTL_SETFL
> time.
>
> IOCB_DSYNC is... special in that respect; replace checks for it with
> an inlined helper (iocb_is_dsync()), leave the checks of underlying fs
> mounted with -o sync and/or inode being marked sync until then.
> To avoid breaking btrfs deadlock avoidance, add an explicit "no dsync" flag
> that would suppress IOCB_DSYNC; that way btrfs_direct_write() can set it
> for the duration of work where it wants to avoid generic_write_sync()
> triggering.
>
> That ought to reduce the overhead of new_sync_{read,write}() quite a bit.
>
> NEEDS TESTING; NOT FOR MERGE IN THAT FORM.
Definitely generates better code here, unsurprisingly. Unfortunately it
doesn't seem to help a whole lot for a direct comparison, though it does
nudge us in the right direction.
The main issue here, using urandom and 4k reads (iter reads done,
non-iter writes), the main difference seems to be that with the non-iter
reads, here's our copy overhead:
+ 1.80% dd [kernel.kallsyms] [k] __arch_copy_to_user
+ 0.74% dd [kernel.kallsyms] [k] _copy_to_user
and with the iter variant, for the same workload, it looks like this:
+ 4.13% dd [kernel.kallsyms] [k] _copy_to_iter
+ 0.88% dd [kernel.kallsyms] [k] __arch_copy_to_user
+ 0.72% dd [kernel.kallsyms] [k] copyout
and about 1% just doing the iov_iter advance. Since this test case is a
single iovec read, ran a quick test where we simply use this helper:
static size_t random_copy(void *src, int size, struct iov_iter *to)
{
const struct iovec *iov = to->iov;
size_t to_copy = min_t(size_t, size, iov->iov_len);
if (copy_to_user(iov->iov_base, src, to_copy))
return 0;
to->count -= to_copy;
return to_copy;
}
rather than copy_to_iter(). That brings us within 0.8% of the non-iter
read performance:
non-iter: 408MB/sec
iter: 395MB/sec -3.2%
iter+al+hack 406MB/sec -0.8%
for 32-byte reads:
non-iter 73.1MB/sec
iter: 72.1MB/sec -1.4%
iter+al+hack 72.5MB/sec -0.8%
which as mentioned in my initial email is closer than the 4k, but still
sees a nice reduction in overhead.
--
Jens Axboe
next prev parent reply other threads:[~2022-05-21 19:03 UTC|newest]
Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-21 0:46 [RFC] what to do with IOCB_DSYNC? Al Viro
2021-06-21 13:59 ` Christoph Hellwig
2021-06-21 14:03 ` Matthew Wilcox
2021-06-21 14:09 ` Christoph Hellwig
2021-06-21 14:16 ` Matthew Wilcox
2021-06-21 14:22 ` Christoph Hellwig
2021-06-21 14:32 ` Al Viro
2021-06-21 14:35 ` Christoph Hellwig
2021-06-21 15:22 ` Jens Axboe
2022-05-21 17:48 ` Al Viro
2022-05-21 19:03 ` Jens Axboe [this message]
2022-05-21 22:14 ` Jens Axboe
2022-05-22 7:45 ` Christoph Hellwig
2022-05-22 10:23 ` Matthew Wilcox
2022-05-22 10:36 ` Al Viro
2022-05-22 11:15 ` Matthew Wilcox
2022-05-22 11:45 ` Christoph Hellwig
2022-05-22 12:39 ` Jens Axboe
2022-05-22 12:48 ` Al Viro
2022-05-22 13:02 ` Jens Axboe
2022-05-22 13:07 ` Al Viro
2022-05-22 13:09 ` Jens Axboe
2022-05-22 18:06 ` Jens Axboe
2022-05-22 18:25 ` Al Viro
2022-05-22 18:29 ` Jens Axboe
2022-05-22 18:39 ` Al Viro
2022-05-22 18:48 ` Jens Axboe
2022-05-22 19:04 ` Al Viro
2022-05-22 20:03 ` Jens Axboe
2022-05-23 0:42 ` Al Viro
2022-05-23 1:22 ` Jens Axboe
2022-05-23 1:28 ` Jens Axboe
2022-05-23 1:50 ` Jens Axboe
2022-05-23 2:43 ` Jens Axboe
2022-05-23 14:22 ` Al Viro
2022-05-23 14:34 ` Jens Axboe
2022-05-23 14:47 ` Al Viro
2022-05-23 15:12 ` Jens Axboe
2022-05-23 15:44 ` Jens Axboe
2022-05-23 15:49 ` Matthew Wilcox
2022-05-23 15:55 ` Jens Axboe
2022-05-23 16:03 ` Jens Axboe
2022-05-26 14:46 ` Jason A. Donenfeld
2022-05-27 10:09 ` Ingo Molnar
2022-05-27 10:15 ` Jason A. Donenfeld
2022-05-27 14:45 ` Samuel Neves
2022-05-27 10:25 ` Ingo Molnar
2022-05-27 10:36 ` Borislav Petkov
2022-05-28 20:54 ` Sedat Dilek
2022-05-28 20:38 ` Sedat Dilek
2022-05-28 20:39 ` Sedat Dilek
2022-05-23 16:15 ` Al Viro
2022-05-25 14:34 ` Matthew Wilcox
2022-05-26 23:19 ` Al Viro
2022-05-27 14:51 ` Jens Axboe
2022-05-22 12:21 ` Al Viro
2022-05-22 7:43 ` Christoph Hellwig
2022-05-22 12:41 ` Al Viro
2022-05-22 12:51 ` Christoph Hellwig
2021-06-21 14:22 ` Al Viro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=70b5e4a8-1daa-dc75-af58-9d82a732a6be@kernel.dk \
--to=axboe@kernel.dk \
--cc=hch@lst.de \
--cc=linux-fsdevel@vger.kernel.org \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).