From: Mikulas Patocka <mpatocka@redhat.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>, Mingkai Dong <mingkaidong@gmail.com>,
    Andrew Morton <akpm@linux-foundation.org>, Jan Kara <jack@suse.cz>,
    Steven Whitehouse <swhiteho@redhat.com>, Eric Sandeen <esandeen@redhat.com>,
    Dave Chinner <dchinner@redhat.com>, Theodore Ts'o <tytso@mit.edu>,
    Wang Jianchao <jianchao.wan9@gmail.com>,
    "Tadakamadla, Rajesh" <rajesh.tadakamadla@hpe.com>,
    linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-nvdimm@lists.01.org
Subject: Re: Expense of read_iter
Date: Mon, 11 Jan 2021 16:10:11 -0500 (EST)
Message-ID: <alpine.LRH.2.02.2101111126150.31017@file01.intranet.prod.int.rdu2.redhat.com>
In-Reply-To: <20210111001805.GD35215@casper.infradead.org>

On Mon, 11 Jan 2021, Matthew Wilcox wrote:

> On Sun, Jan 10, 2021 at 04:19:15PM -0500, Mikulas Patocka wrote:
> > I put counters into vfs_read and vfs_readv.
> >
> > After a fresh boot of the virtual machine, the counters show "13385 4".
> > After a kernel compilation they show "4475220 8".
> >
> > So, the readv path is almost unused.
> >
> > My reasoning was that we should optimize for the "read" path and glue the
> > "readv" path on the top of that. Currently, the kernel is doing the
> > opposite - optimizing for "readv" and glueing "read" on the top of it.
>
> But it's not about optimising for read vs readv. read_iter handles
> a host of other cases, such as pread(), preadv(), AIO reads, splice,
> and reads to in-kernel buffers.

These things are used rarely compared to "read" and "pread". (BTW, "pread"
could be handled by the read method too.)

Why do you think that the "read" syscall should use the "read_iter" code
path? Is it because duplicating the logic is discouraged? Or because of
code size? Or something else?

> Some device drivers abused read() vs readv() to actually return different
> information, depending which you called. That's why there's now a
> prohibition against both.
>
> So let's figure out how to make iter_read() perform well for sys_read().

I've got another idea - in nvfs_read_iter, test whether the iovec contains
just one entry and call nvfs_read_locked if it does.

diff --git a/file.c b/file.c
index f4b8a1a..e4d87b2 100644
--- a/file.c
+++ b/file.c
@@ -460,6 +460,10 @@ static ssize_t nvfs_read_iter(struct kiocb *iocb, struct iov_iter *iov)
 	if (!IS_DAX(&nmi->vfs_inode)) {
 		r = generic_file_read_iter(iocb, iov);
 	} else {
+		if (likely(iter_is_iovec(iov)) && likely(!iov->iov_offset) && likely(iov->nr_segs == 1)) {
+			r = nvfs_read_locked(nmi, iocb->ki_filp, iov->iov->iov_base, iov->count, true, &iocb->ki_pos);
+			goto unlock_ret;
+		}
 #if 1
 		r = nvfs_rw_iter_locked(iocb, iov, false);
 #else
@@ -467,6 +471,7 @@ static ssize_t nvfs_read_iter(struct kiocb *iocb, struct iov_iter *iov)
 #endif
 	}

+unlock_ret:
 	inode_unlock_shared(&nmi->vfs_inode);

 	return r;

The result is:

nvfs_read_iter			- 7.307s
Al Viro's read_iter_locked	- 7.147s
test for just one entry		- 7.010s
the read method			- 6.782s

So far, this is the best way to do it, but it's still 3.3% worse than the
read method. Nothing more can be optimized at the filesystem level - the
remaining optimizations must be done in the VFS.

Mikulas
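
For readers who want to try the idea outside the kernel, here is a minimal user-space sketch of the "test for just one entry" dispatch from the patch above. It is not nvfs code: struct demo_iter, demo_read_flat, demo_read_segments, demo_read_iter and demo_fast_path_hits are all hypothetical names invented for this illustration, and the data source is a plain memory buffer rather than a DAX inode. It only assumes the POSIX struct iovec from <sys/uio.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>  /* ssize_t */
#include <sys/uio.h>    /* struct iovec */

/* Hypothetical user-space stand-in for the kernel's struct iov_iter. */
struct demo_iter {
	const struct iovec *iov; /* segment array */
	size_t nr_segs;          /* number of segments */
	size_t iov_offset;       /* bytes already consumed in the first segment */
	size_t count;            /* total bytes still requested */
};

/* Counts how often the single-segment shortcut is taken (illustration only). */
static int demo_fast_path_hits;

/* Simple path: copy up to len bytes from src at *pos into one flat buffer. */
static ssize_t demo_read_flat(const char *src, size_t src_len,
			      void *buf, size_t len, size_t *pos)
{
	if (*pos >= src_len)
		return 0;
	size_t n = src_len - *pos;
	if (n > len)
		n = len;
	memcpy(buf, src + *pos, n);
	*pos += n;
	return (ssize_t)n;
}

/* General path: walk the segments one by one (only count is updated). */
static ssize_t demo_read_segments(const char *src, size_t src_len,
				  struct demo_iter *it, size_t *pos)
{
	ssize_t total = 0;
	size_t off = it->iov_offset;

	for (size_t i = 0; i < it->nr_segs && it->count > 0; i++) {
		assert(off <= it->iov[i].iov_len);
		size_t seg_len = it->iov[i].iov_len - off;
		if (seg_len > it->count)
			seg_len = it->count;
		ssize_t n = demo_read_flat(src, src_len,
					   (char *)it->iov[i].iov_base + off,
					   seg_len, pos);
		total += n;
		it->count -= (size_t)n;
		off = 0;
		if ((size_t)n < seg_len)
			break; /* short read: the source is exhausted */
	}
	return total;
}

/* Dispatcher: one segment with no offset goes straight to the flat read. */
static ssize_t demo_read_iter(const char *src, size_t src_len,
			      struct demo_iter *it, size_t *pos)
{
	if (it->nr_segs == 1 && it->iov_offset == 0) {
		assert(it->count <= it->iov[0].iov_len);
		demo_fast_path_hits++;
		ssize_t n = demo_read_flat(src, src_len, it->iov[0].iov_base,
					   it->count, pos);
		it->count -= (size_t)n;
		return n;
	}
	return demo_read_segments(src, src_len, it, pos);
}
```

A request built from a single struct iovec with iov_offset == 0 takes the shortcut and increments demo_fast_path_hits; a request with two or more segments falls through to the segment walk. The sketch shows only the control flow; it says nothing about where the remaining overhead measured above comes from.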