linux-fsdevel.vger.kernel.org archive mirror
From: Al Viro <viro@zeniv.linux.org.uk>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Dan Williams <dan.j.williams@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
	Steven Whitehouse <swhiteho@redhat.com>,
	Eric Sandeen <esandeen@redhat.com>,
	Dave Chinner <dchinner@redhat.com>, Theodore Ts'o <tytso@mit.edu>,
	Wang Jianchao <jianchao.wan9@gmail.com>,
	"Kani, Toshi" <toshi.kani@hpe.com>,
	"Norton, Scott J" <scott.norton@hpe.com>,
	"Tadakamadla, Rajesh" <rajesh.tadakamadla@hpe.com>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-nvdimm@lists.01.org
Subject: Re: [RFC v2] nvfs: a filesystem for persistent memory
Date: Sun, 10 Jan 2021 16:20:08 +0000
Message-ID: <20210110162008.GV3579531@ZenIV.linux.org.uk>
In-Reply-To: <alpine.LRH.2.02.2101061245100.30542@file01.intranet.prod.int.rdu2.redhat.com>

On Thu, Jan 07, 2021 at 08:15:41AM -0500, Mikulas Patocka wrote:
> Hi
> 
> I announce a new version of NVFS - a filesystem for persistent memory.
> 	http://people.redhat.com/~mpatocka/nvfs/
Utilities, AFAICS

> 	git://leontynka.twibright.com/nvfs.git
Seems to hang on git pull at the moment...  Do you have it anywhere else?

> I found out that on NVFS, reading a file with the read method has 10% 
> better performance than the read_iter method. The benchmark just reads the 
> same 4k page over and over again - and the cost of creating and parsing 
> the kiocb and iov_iter structures is just that high.

Apples and oranges...  What happens if you take

ssize_t read_iter_locked(struct file *file, struct iov_iter *to, loff_t *ppos)
{
	struct inode *inode = file_inode(file);
	struct nvfs_memory_inode *nmi = i_to_nmi(inode);
	struct nvfs_superblock *nvs = inode->i_sb->s_fs_info;
	ssize_t total = 0;
	loff_t pos = *ppos;
	int r;
	int shift = nvs->log2_page_size;
	size_t i_size;

	i_size = inode->i_size;
	if (pos >= i_size)
		return 0;
	iov_iter_truncate(to, i_size - pos);

	while (iov_iter_count(to)) {
		void *blk, *ptr;
		size_t page_mask = (1UL << shift) - 1;
		unsigned page_offset = pos & page_mask;
		unsigned prealloc = (iov_iter_count(to) + page_mask) >> shift;
		unsigned size;

		blk = nvfs_bmap(nmi, pos >> shift, &prealloc, NULL, NULL, NULL);
		if (unlikely(IS_ERR(blk))) {
			r = PTR_ERR(blk);
			goto ret_r;
		}
		size = ((size_t)prealloc << shift) - page_offset;
		ptr = blk + page_offset;
		if (unlikely(!blk)) {
			size = min(size, (unsigned)PAGE_SIZE);
			ptr = empty_zero_page;
		}
		size = copy_to_iter(ptr, size, to);
		if (unlikely(!size)) {
			r = -EFAULT;
			goto ret_r;
		}

		pos += size;
		total += size;
	}

	r = 0;

ret_r:
	*ppos = pos;

	if (file)
		file_accessed(file);

	return total ? total : r;
}

and use that instead of your nvfs_rw_iter_locked() in your
->read_iter() for DAX read case?  Then the same with
s/copy_to_iter/_copy_to_iter/, to see how much of that is
"hardening" overhead.
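
For anyone following the loop in read_iter_locked() above, the
per-iteration arithmetic can be checked in userspace.  This is only a
sketch: struct chunk and chunk_for() are made-up names, and it assumes
nvfs_bmap() leaves prealloc unchanged, i.e. that every requested page
turns out to be contiguous.  Note that size can exceed what is left in
the iterator; copy_to_iter() never copies more than the iterator's
remaining count, so the loop still terminates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type, not part of nvfs: one loop iteration. */
struct chunk {
	unsigned page_offset;	/* offset of pos within its page */
	unsigned npages;	/* pages requested from the block mapper */
	size_t size;		/* bytes available if all npages are contiguous */
};

/* Mirror the arithmetic at the top of the while loop for a given pos/count. */
static struct chunk chunk_for(uint64_t pos, size_t count, int shift)
{
	size_t page_mask;
	struct chunk c;

	assert(shift > 0 && shift < 32);
	page_mask = ((size_t)1 << shift) - 1;

	c.page_offset = (unsigned)(pos & page_mask);
	c.npages = (unsigned)((count + page_mask) >> shift);
	c.size = ((size_t)c.npages << shift) - c.page_offset;
	return c;
}
```

With 4k pages (shift == 12), a 4096-byte read starting 100 bytes into a
page asks for one page and gets 3996 bytes on the first pass; the
remaining 100 bytes are picked up by a second pass.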

Incidentally, what's the point of sharing nvfs_rw_iter() for
read and write cases?  They have practically no overlap -
count the lines common for wr and !wr cases.  And if you
do the same in nvfs_rw_iter_locked(), you'll see that the
shared parts _there_ are bloody pointless on the read side.
Not that it had been more useful on the write side, really,
but that's another story (nvfs_write_pages() handling of
copyin is... interesting).  Let's figure out what's going
on with the read overhead first...

lib/iov_iter.c primitives certainly could use massage for
better code generation, but let's find out how much of the
PITA is due to those and how much comes from you fighting
the damn thing instead of using it sanely...
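
To compare the variants, something along these lines would do as a
userspace harness.  It is a rough sketch, not the benchmark used in the
original posting: bench_pread_path() is a made-up name, and it simply
re-reads the start of a file with pread() and reports the mean cost per
call.  Run it against the same file on kernels built with each version
of the read path and compare the numbers.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for pread() and clock_gettime() */
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: re-read the first len bytes of path iters times
 * and return the mean cost of one pread() call in nanoseconds, or a
 * negative value on any error or short read.
 */
static double bench_pread_path(const char *path, size_t len, long iters)
{
	struct timespec t0, t1;
	double ns = -1.0;
	char *buf;
	long i;
	int fd;

	if (iters <= 0)
		return -1.0;
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1.0;
	buf = malloc(len ? len : 1);
	if (!buf)
		goto out_close;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < iters; i++) {
		/* always offset 0: the same page over and over again */
		if (pread(fd, buf, len, 0) != (ssize_t)len)
			goto out_free;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	ns = ((t1.tv_sec - t0.tv_sec) * 1e9 +
	      (t1.tv_nsec - t0.tv_nsec)) / iters;
out_free:
	free(buf);
out_close:
	close(fd);
	return ns;
}
```

Something like bench_pread_path("/mnt/nvfs/testfile", 4096, 10000000)
(the path is a placeholder) gives one number per kernel build; the
syscall entry/exit cost is included in all of them, so only the
differences between builds are meaningful.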

Thread overview: 30+ messages
2021-01-07 13:15 [RFC v2] nvfs: a filesystem for persistent memory Mikulas Patocka
2021-01-07 15:11 ` Expense of read_iter Matthew Wilcox
2021-01-07 16:43   ` Mingkai Dong
2021-01-12 13:45     ` Zhongwei Cai
2021-01-12 14:06       ` David Laight
2021-01-13 16:44       ` Mikulas Patocka
2021-01-15  9:40         ` Zhongwei Cai
2021-01-20  4:47           ` Dave Chinner
2021-01-20 14:18             ` Jan Kara
2021-01-20 15:12               ` Mikulas Patocka
2021-01-20 15:44                 ` David Laight
2021-01-21 15:47                 ` Matthew Wilcox
2021-01-21 16:06                   ` Mikulas Patocka
2021-01-21 16:30               ` Zhongwei Cai
2021-01-07 18:59   ` Mikulas Patocka
2021-01-10  6:13     ` Matthew Wilcox
2021-01-10 21:19       ` Mikulas Patocka
2021-01-11  0:18         ` Matthew Wilcox
2021-01-11 21:10           ` Mikulas Patocka
2021-01-11 10:11       ` David Laight
2021-01-10 16:20 ` Al Viro [this message]
2021-01-10 16:51   ` [RFC v2] nvfs: a filesystem for persistent memory Al Viro
2021-01-10 21:14   ` Mikulas Patocka
2021-01-10 23:40     ` Al Viro
2021-01-11 11:41       ` Mikulas Patocka
2021-01-11 10:29   ` David Laight
2021-01-11 11:44     ` Mikulas Patocka
2021-01-11 11:57       ` David Laight
2021-01-11 14:43         ` Al Viro
2021-01-11 14:54           ` David Laight