From: Mikulas Patocka <mpatocka@redhat.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
	Mingkai Dong <mingkaidong@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jan Kara <jack@suse.cz>, Steven Whitehouse <swhiteho@redhat.com>,
	Eric Sandeen <esandeen@redhat.com>,
	Dave Chinner <dchinner@redhat.com>, Theodore Ts'o <tytso@mit.edu>,
	Wang Jianchao <jianchao.wan9@gmail.com>,
	"Tadakamadla, Rajesh" <rajesh.tadakamadla@hpe.com>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-nvdimm@lists.01.org
Subject: Re: Expense of read_iter
Date: Mon, 11 Jan 2021 16:10:11 -0500 (EST)
Message-ID: <alpine.LRH.2.02.2101111126150.31017@file01.intranet.prod.int.rdu2.redhat.com>
In-Reply-To: <20210111001805.GD35215@casper.infradead.org>



On Mon, 11 Jan 2021, Matthew Wilcox wrote:

> On Sun, Jan 10, 2021 at 04:19:15PM -0500, Mikulas Patocka wrote:
> > I put counters into vfs_read and vfs_readv.
> > 
> > After a fresh boot of the virtual machine, the counters show "13385 4". 
> > After a kernel compilation they show "4475220 8".
> > 
> > So, the readv path is almost unused.
> > 
> > My reasoning was that we should optimize for the "read" path and glue the 
> > "readv" path on the top of that. Currently, the kernel is doing the 
> > opposite - optimizing for "readv" and glueing "read" on the top of it.
> 
> But it's not about optimising for read vs readv.  read_iter handles
> a host of other cases, such as pread(), preadv(), AIO reads, splice,
> and reads to in-kernel buffers.

These things are used rarely compared to "read" and "pread". (BTW, "pread" 
could be handled by the read method too.)

Why do you think that the "read" syscall should use the "read_iter" code 
path? Is it because duplicating the logic is discouraged? Or because of 
code size? Or something else?

> Some device drivers abused read() vs readv() to actually return different 
> information, depending which you called.  That's why there's now a
> prohibition against both.
> 
> So let's figure out how to make iter_read() perform well for sys_read().

I've got another idea - in nvfs_read_iter, test if the iovec contains just 
one entry and call nvfs_read_locked if it does.

diff --git a/file.c b/file.c
index f4b8a1a..e4d87b2 100644
--- a/file.c
+++ b/file.c
@@ -460,6 +460,10 @@ static ssize_t nvfs_read_iter(struct kiocb *iocb, struct iov_iter *iov)
 	if (!IS_DAX(&nmi->vfs_inode)) {
 		r = generic_file_read_iter(iocb, iov);
 	} else {
+		if (likely(iter_is_iovec(iov)) && likely(!iov->iov_offset) && likely(iov->nr_segs == 1)) {
+			r = nvfs_read_locked(nmi, iocb->ki_filp, iov->iov->iov_base, iov->count, true, &iocb->ki_pos);
+			goto unlock_ret;
+		}
 #if 1
 		r = nvfs_rw_iter_locked(iocb, iov, false);
 #else
@@ -467,6 +471,7 @@ static ssize_t nvfs_read_iter(struct kiocb *iocb, struct iov_iter *iov)
 #endif
 	}
 
+unlock_ret:
 	inode_unlock_shared(&nmi->vfs_inode);
 
 	return r;
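
As a rough userspace analogy of the single-entry test in the patch above
(this is not kernel code; read_vec_fast is a hypothetical name used only
for illustration), the same dispatch idea can be sketched with the plain
read() and readv() syscalls: take the simple path when there is exactly
one segment, and fall back to the vectored path otherwise.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: read into the given segments, using plain read()
 * when there is exactly one segment and readv() otherwise. This mirrors
 * the "nr_segs == 1" shortcut only in spirit; the kernel patch operates
 * on struct iov_iter, not on a userspace iovec array.
 */
static ssize_t read_vec_fast(int fd, const struct iovec *iov, int nr_segs)
{
	assert(nr_segs >= 1);

	/* Fast path: one contiguous buffer, no vector handling needed. */
	if (nr_segs == 1)
		return read(fd, iov[0].iov_base, iov[0].iov_len);

	/* General path: let readv() scatter the data across all segments. */
	return readv(fd, iov, nr_segs);
}
```

Both branches return the same data for a single segment; the sketch only
shows where such a check would sit, not what it costs or saves.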



The result is:

nvfs_read_iter			- 7.307s
Al Viro's read_iter_locked	- 7.147s
test for just one entry		- 7.010s
the read method			- 6.782s

So far, this is the best approach, but it's still 3.3% slower than the 
read method. Nothing more can be optimized at the filesystem level; the 
remaining optimizations must be done in the VFS.
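
For reference, the percentage depends on which time is taken as the
denominator. A minimal sketch of both conventions, using the timings from
the table above (the function names are hypothetical): measured against
the read method, the one-entry test is about 3.4% slower; measured against
the one-entry time, the read method is about 3.3% faster.

```c
#include <assert.h>

/* Extra time of `t` over `baseline`, as a percentage of `baseline`. */
static double overhead_pct(double t, double baseline)
{
	assert(baseline > 0.0);
	return (t - baseline) / baseline * 100.0;
}

/* Time saved by `baseline` relative to `t`, as a percentage of `t`. */
static double saving_pct(double t, double baseline)
{
	assert(t > 0.0);
	return (t - baseline) / t * 100.0;
}
```

With t = 7.010 and baseline = 6.782, overhead_pct gives roughly 3.36 and
saving_pct roughly 3.25.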

Mikulas
