From: Mikulas Patocka <mpatocka@redhat.com>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Dan Williams <dan.j.williams@intel.com>,
Vishal Verma <vishal.l.verma@intel.com>,
Dave Jiang <dave.jiang@intel.com>,
Ira Weiny <ira.weiny@intel.com>,
Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
Steven Whitehouse <swhiteho@redhat.com>,
Eric Sandeen <esandeen@redhat.com>,
Dave Chinner <dchinner@redhat.com>,
"Theodore Ts'o" <tytso@mit.edu>,
Wang Jianchao <jianchao.wan9@gmail.com>,
"Kani, Toshi" <toshi.kani@hpe.com>,
"Norton, Scott J" <scott.norton@hpe.com>,
"Tadakamadla, Rajesh" <rajesh.tadakamadla@hpe.com>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-nvdimm@lists.01.org
Subject: Re: [RFC v2] nvfs: a filesystem for persistent memory
Date: Sun, 10 Jan 2021 16:14:55 -0500 (EST)
Message-ID: <alpine.LRH.2.02.2101101410230.7245@file01.intranet.prod.int.rdu2.redhat.com>
In-Reply-To: <20210110162008.GV3579531@ZenIV.linux.org.uk>

On Sun, 10 Jan 2021, Al Viro wrote:

> On Thu, Jan 07, 2021 at 08:15:41AM -0500, Mikulas Patocka wrote:
> > Hi
> >
> > I announce a new version of NVFS - a filesystem for persistent memory.
> > http://people.redhat.com/~mpatocka/nvfs/
> Utilities, AFAICS
>
> > git://leontynka.twibright.com/nvfs.git
> Seems to hang on git pull at the moment... Do you have it anywhere else?

I saw some errors 'git-daemon: fatal: the remote end hung up unexpectedly'
in syslog. I don't know what's causing them.

> > I found out that on NVFS, reading a file with the read method has 10%
> > better performance than the read_iter method. The benchmark just reads the
> > same 4k page over and over again - and the cost of creating and parsing
> > the kiocb and iov_iter structures is just that high.
>
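
For readers who want to try a measurement of this kind, here is a minimal
userspace sketch of a "read the same 4k page over and over" loop. It is not
the benchmark used for the numbers in this thread; the function name
bench_read_4k, the use of pread(2) and the choice of CLOCK_MONOTONIC are all
assumptions made for illustration. Whether the kernel services the call
through ->read or ->read_iter is decided by the filesystem's file_operations,
not by this program.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: read the first 4096 bytes of `path` `iterations`
 * times and return the elapsed wall-clock time in seconds.
 * Returns -1.0 if the file cannot be opened or a read fails.
 */
static double bench_read_4k(const char *path, long iterations)
{
	char buf[4096];
	struct timespec start, end;
	long i;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1.0;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < iterations; i++) {
		/* Always offset 0, so the same page is read every time. */
		if (pread(fd, buf, sizeof(buf), 0) < 0) {
			close(fd);
			return -1.0;
		}
	}
	clock_gettime(CLOCK_MONOTONIC, &end);
	close(fd);

	return (double)(end.tv_sec - start.tv_sec) +
	       (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling bench_read_4k() on a file that lives on the filesystem under test,
with an iteration count in the millions, gives a total time that can be
compared between kernel builds. The path and the iteration count are left to
the caller because neither is stated in the message above.
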
> Apples and oranges... What happens if you take
>
> ssize_t read_iter_locked(struct file *file, struct iov_iter *to, loff_t *ppos)
> {
> 	struct inode *inode = file_inode(file);
> 	struct nvfs_memory_inode *nmi = i_to_nmi(inode);
> 	struct nvfs_superblock *nvs = inode->i_sb->s_fs_info;
> 	ssize_t total = 0;
> 	loff_t pos = *ppos;
> 	int r;
> 	int shift = nvs->log2_page_size;
> 	size_t i_size;
>
> 	/* Never read past EOF. */
> 	i_size = inode->i_size;
> 	if (pos >= i_size)
> 		return 0;
> 	iov_iter_truncate(to, i_size - pos);
>
> 	while (iov_iter_count(to)) {
> 		void *blk, *ptr;
> 		size_t page_mask = (1UL << shift) - 1;
> 		unsigned page_offset = pos & page_mask;
> 		unsigned prealloc = (iov_iter_count(to) + page_mask) >> shift;
> 		unsigned size;
>
> 		blk = nvfs_bmap(nmi, pos >> shift, &prealloc, NULL, NULL, NULL);
> 		if (unlikely(IS_ERR(blk))) {
> 			r = PTR_ERR(blk);
> 			goto ret_r;
> 		}
> 		size = ((size_t)prealloc << shift) - page_offset;
> 		ptr = blk + page_offset;
> 		if (unlikely(!blk)) {
> 			/* Hole: feed zeroes, at most one page at a time. */
> 			size = min(size, (unsigned)PAGE_SIZE);
> 			ptr = empty_zero_page;
> 		}
> 		/* copy_to_iter() takes (addr, bytes, iter). */
> 		size = copy_to_iter(ptr, size, to);
> 		if (unlikely(!size)) {
> 			r = -EFAULT;
> 			goto ret_r;
> 		}
>
> 		pos += size;
> 		total += size;
> 	}
>
> 	r = 0;
>
> ret_r:
> 	*ppos = pos;
>
> 	if (file)
> 		file_accessed(file);
>
> 	return total ? total : r;
> }
>
> and use that instead of your nvfs_rw_iter_locked() in your
> ->read_iter() for DAX read case? Then the same with
> s/copy_to_iter/_copy_to_iter/, to see how much of that is
> "hardening" overhead.
>
> Incidentally, what's the point of sharing nvfs_rw_iter() for
> read and write cases? They have practically no overlap -
> count the lines common for wr and !wr cases. And if you
> do the same in nvfs_rw_iter_locked(), you'll see that the
> shared parts _there_ are bloody pointless on the read side.

That's a good point. I split nvfs_rw_iter into two separate functions,
nvfs_read_iter and nvfs_write_iter, and inlined nvfs_rw_iter_locked into
both of them. It improved performance by 1.3%.

> Not that it had been more useful on the write side, really,
> but that's another story (nvfs_write_pages() handling of
> copyin is... interesting). Let's figure out what's going
> on with the read overhead first...
>
> lib/iov_iter.c primitives certainly could use massage for
> better code generation, but let's find out how much of the
> PITA is due to those and how much comes from you fighting
> the damn thing instead of using it sanely...

The results are:

read:                                     6.744s
read_iter:                                7.417s
read_iter - separate read and write path: 7.321s
Al's read_iter:                           7.182s
Al's read_iter with _copy_to_iter:        7.181s

Mikulas
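
The percentages mentioned in this message follow directly from the timings
listed above. The sketch below spells out that arithmetic; the helper names
overhead_percent, improvement_percent and print_overheads are made up for
illustration, and the only inputs are the five times from the results list.

```c
#include <stdio.h>

/* Slowdown of time `t` relative to baseline `base`, in percent of `base`. */
static double overhead_percent(double t, double base)
{
	return (t - base) / base * 100.0;
}

/* Time saved going from `before` to `after`, in percent of `before`. */
static double improvement_percent(double before, double after)
{
	return (before - after) / before * 100.0;
}

/* Print every read_iter variant relative to the plain ->read() run. */
static void print_overheads(void)
{
	static const struct {
		const char *name;
		double secs;
	} runs[] = {
		{ "read_iter",                                7.417 },
		{ "read_iter - separate read and write path", 7.321 },
		{ "Al's read_iter",                           7.182 },
		{ "Al's read_iter with _copy_to_iter",        7.181 },
	};
	const double read_secs = 6.744;	/* baseline: the read method */
	size_t i;

	for (i = 0; i < sizeof(runs) / sizeof(runs[0]); i++)
		printf("%-42s %+.2f%% vs read\n", runs[i].name,
		       overhead_percent(runs[i].secs, read_secs));
}
```

With these inputs, overhead_percent(7.417, 6.744) comes out just under 10%,
improvement_percent(7.417, 7.321) is about 1.3%, and Al's version sits
roughly 6.5% above the read method. The two variants of Al's function differ
by 0.001s, so the copy_to_iter checks do not account for the remaining gap
in this test.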