* [PATCH] fs: prefer read_iter over read and write_iter over write
@ 2022-05-20 13:51 Jason A. Donenfeld
  2022-05-20 14:37 ` Jens Axboe
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Jason A. Donenfeld @ 2022-05-20 13:51 UTC (permalink / raw)
  To: viro, linux-kernel; +Cc: Jason A. Donenfeld, Jens Axboe

Most kernel code prefers read_iter over read and write_iter over write,
yet the read function pointer is tested first. Reverse these so that the
iter function is always used first.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 fs/read_write.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index e643aec2b0ef..78a81aa5fa76 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -476,10 +476,10 @@ ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
 	if (count > MAX_RW_COUNT)
 		count =  MAX_RW_COUNT;
 
-	if (file->f_op->read)
-		ret = file->f_op->read(file, buf, count, pos);
-	else if (file->f_op->read_iter)
+	if (file->f_op->read_iter)
 		ret = new_sync_read(file, buf, count, pos);
+	else if (file->f_op->read)
+		ret = file->f_op->read(file, buf, count, pos);
 	else
 		ret = -EINVAL;
 	if (ret > 0) {
@@ -585,10 +585,10 @@ ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_
 	if (count > MAX_RW_COUNT)
 		count =  MAX_RW_COUNT;
 	file_start_write(file);
-	if (file->f_op->write)
-		ret = file->f_op->write(file, buf, count, pos);
-	else if (file->f_op->write_iter)
+	if (file->f_op->write_iter)
 		ret = new_sync_write(file, buf, count, pos);
+	else if (file->f_op->write)
+		ret = file->f_op->write(file, buf, count, pos);
 	else
 		ret = -EINVAL;
 	if (ret > 0) {
-- 
2.35.1



* Re: [PATCH] fs: prefer read_iter over read and write_iter over write
  2022-05-20 13:51 [PATCH] fs: prefer read_iter over read and write_iter over write Jason A. Donenfeld
@ 2022-05-20 14:37 ` Jens Axboe
  2022-05-20 15:04 ` Al Viro
  2022-05-20 21:24 ` David Laight
  2 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2022-05-20 14:37 UTC (permalink / raw)
  To: Jason A. Donenfeld, viro, linux-kernel

On 5/20/22 7:51 AM, Jason A. Donenfeld wrote:
> Most kernel code prefers read_iter over read and write_iter over write,
> yet the read function pointer is tested first. Reverse these so that the
> iter function is always used first.

Acked-by: Jens Axboe <axboe@kernel.dk>

-- 
Jens Axboe



* Re: [PATCH] fs: prefer read_iter over read and write_iter over write
  2022-05-20 13:51 [PATCH] fs: prefer read_iter over read and write_iter over write Jason A. Donenfeld
  2022-05-20 14:37 ` Jens Axboe
@ 2022-05-20 15:04 ` Al Viro
  2022-05-20 21:24 ` David Laight
  2 siblings, 0 replies; 8+ messages in thread
From: Al Viro @ 2022-05-20 15:04 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: linux-kernel, Jens Axboe

On Fri, May 20, 2022 at 03:51:03PM +0200, Jason A. Donenfeld wrote:
> Most kernel code prefers read_iter over read and write_iter over write,
> yet the read function pointer is tested first. Reverse these so that the
> iter function is always used first.

NAK.  There are some weird devices (at the very least, one in sound)
where data gets interpreted differently for write() and writev().
There are several degrees of messiness:
	1) packet-like semantics, where boundaries of iovecs are
significant; writev() is equivalent to loop of write() calls, but
*NOT* to write() on a single concatenated copy.  _Any_ short write
on any segment (due to ->write() instance ignoring the rest of data
as well as due to unmapped page halfway through) terminates writev().
	2) similar, but more extreme - write() reports consuming all
the data it's been given (assuming the damn thing parses) and
ignores the excess.  writev() is equivalent to iterated write() on
all segments, as long as each is valid.  Not uncommon, sadly...
	3) completely unrelated interpretations of input for write()
and for writev().  writev() is *NOT* equivalent to a loop of write()
there.  Yes, such beasts exist.  And it's a user-visible ABI.
Example: snd_pcm_write() vs. snd_pcm_writev().  Not a chance to
retire that one any time soon, and the difference in semantics is
that writev() is "feed several channels at once; the chunks for
individual channels are covered by elements of iovec array".
Worse one: qib_write() and qib_write_iter().  There we flat-out
have different command sets for write() and for writev().  That,
at least, might be possible to retire someday.
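
A rough illustration of case 1 above, as a userspace toy model rather than kernel code: a writev() that is emulated as a loop of write() calls hands the device one record per iovec segment, whereas a write() of the concatenated data hands it a single record, and a short write on any segment ends the whole call. Every name here (toy_dev, toy_write, toy_loop_writev, the 4-byte per-call limit) is invented for the sketch and does not correspond to any real driver.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

#define TOY_MAX_RECORDS 8	/* hypothetical capacity of the toy device */
#define TOY_MAX_RECORD_LEN 4	/* hypothetical per-call limit, forces short writes */

/* Hypothetical packet-style device: every write call becomes one record. */
struct toy_dev {
	size_t nr_records;
	size_t record_len[TOY_MAX_RECORDS];
};

/* Toy ->write(): consumes at most TOY_MAX_RECORD_LEN bytes per call. */
static ssize_t toy_write(struct toy_dev *dev, const void *buf, size_t len)
{
	(void)buf;	/* payload is irrelevant here, only boundaries matter */
	if (dev->nr_records == TOY_MAX_RECORDS)
		return -1;
	if (len > TOY_MAX_RECORD_LEN)
		len = TOY_MAX_RECORD_LEN;
	dev->record_len[dev->nr_records++] = len;
	return (ssize_t)len;
}

/* writev() emulated as a loop of write() calls, one per iovec segment. */
static ssize_t toy_loop_writev(struct toy_dev *dev, const struct iovec *iov,
			       int nr_segs)
{
	ssize_t total = 0;

	for (int i = 0; i < nr_segs; i++) {
		ssize_t n = toy_write(dev, iov[i].iov_base, iov[i].iov_len);

		if (n < 0)
			return total ? total : n;
		total += n;
		if ((size_t)n != iov[i].iov_len)
			break;	/* short write on a segment ends the whole call */
	}
	return total;
}

static void toy_packet_demo(void)
{
	char x[] = "ab", y[] = "cd", big[] = "abcdef";
	struct iovec two[2] = {
		{ .iov_base = x, .iov_len = 2 },
		{ .iov_base = y, .iov_len = 2 },
	};
	struct iovec shorty[2] = {
		{ .iov_base = big, .iov_len = 6 },
		{ .iov_base = y, .iov_len = 2 },
	};
	struct toy_dev a = { 0 }, b = { 0 }, c = { 0 };
	ssize_t ret;

	ret = toy_loop_writev(&a, two, 2);
	assert(ret == 4 && a.nr_records == 2);	/* two 2-byte records */

	ret = toy_write(&b, "abcd", 4);
	assert(ret == 4 && b.nr_records == 1);	/* one 4-byte record */

	ret = toy_loop_writev(&c, shorty, 2);
	assert(ret == 4 && c.nr_records == 1);	/* second segment never sent */
	(void)ret;
}
```

Calling toy_packet_demo() from a main() exercises all three paths. The same four bytes reach the device either way; only the record boundaries differ, which is exactly what makes the two calls non-interchangeable for such a device.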

IIRC, for pcm the readv() vs. read() differences are same as for
writev() vs. write() - parallel reads from different channels,
each to its own iovec.
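
To make the objection concrete, here is a small userspace sketch of the dispatch order the patch changes. It assumes a hypothetical driver that provides both handlers with different interpretations of the data, in the spirit of the pcm case described above. The toy_fops structure is a heavily cut-down stand-in for struct file_operations, and the toy write_iter takes a plain buffer instead of the kiocb and iov_iter the real method receives; all toy_* names are invented.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

struct toy_file;

/* Hypothetical, cut-down stand-in for struct file_operations. */
struct toy_fops {
	ssize_t (*write)(struct toy_file *f, const char *buf, size_t len);
	ssize_t (*write_iter)(struct toy_file *f, const char *buf, size_t len);
};

struct toy_file {
	const struct toy_fops *f_op;
	char handled_as[16];	/* records which interpretation was applied */
};

/* Toy ->write(): treats the data as one plain buffer. */
static ssize_t toy_pcm_write(struct toy_file *f, const char *buf, size_t len)
{
	(void)buf;
	snprintf(f->handled_as, sizeof(f->handled_as), "one buffer");
	return (ssize_t)len;
}

/* Toy ->write_iter(): treats each segment as a separate channel. */
static ssize_t toy_pcm_write_iter(struct toy_file *f, const char *buf,
				  size_t len)
{
	(void)buf;
	snprintf(f->handled_as, sizeof(f->handled_as), "per-channel");
	return (ssize_t)len;
}

/* Order before the patch: ->write is tested first. */
static ssize_t toy_dispatch_write_first(struct toy_file *f, const char *buf,
					size_t len)
{
	if (f->f_op->write)
		return f->f_op->write(f, buf, len);
	if (f->f_op->write_iter)
		return f->f_op->write_iter(f, buf, len);
	return -EINVAL;
}

/* Order proposed by the patch: ->write_iter is tested first. */
static ssize_t toy_dispatch_iter_first(struct toy_file *f, const char *buf,
				       size_t len)
{
	if (f->f_op->write_iter)
		return f->f_op->write_iter(f, buf, len);
	if (f->f_op->write)
		return f->f_op->write(f, buf, len);
	return -EINVAL;
}

static void toy_dispatch_demo(void)
{
	static const struct toy_fops both = {
		.write = toy_pcm_write,
		.write_iter = toy_pcm_write_iter,
	};
	struct toy_file f = { .f_op = &both };
	ssize_t ret;

	ret = toy_dispatch_write_first(&f, "abcd", 4);
	assert(ret == 4 && strcmp(f.handled_as, "one buffer") == 0);

	/* Same call from userspace, different handler after the reorder. */
	ret = toy_dispatch_iter_first(&f, "abcd", 4);
	assert(ret == 4 && strcmp(f.handled_as, "per-channel") == 0);
	(void)ret;
}
```

For a driver that sets only one of the two pointers, both orders pick the same handler; the reorder is only visible for drivers that set both and give them different meanings.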

It's a bad userland ABI design, but we are stuck with it - it's a couple
of decades too late to change.


* RE: [PATCH] fs: prefer read_iter over read and write_iter over write
  2022-05-20 13:51 [PATCH] fs: prefer read_iter over read and write_iter over write Jason A. Donenfeld
  2022-05-20 14:37 ` Jens Axboe
  2022-05-20 15:04 ` Al Viro
@ 2022-05-20 21:24 ` David Laight
  2022-05-20 21:30   ` Jason A. Donenfeld
  2 siblings, 1 reply; 8+ messages in thread
From: David Laight @ 2022-05-20 21:24 UTC (permalink / raw)
  To: 'Jason A. Donenfeld', viro, linux-kernel; +Cc: Jens Axboe

From: Jason A. Donenfeld
> Sent: 20 May 2022 14:51
> 
> Most kernel code prefers read_iter over read and write_iter over write,
> yet the read function pointer is tested first. Reverse these so that the
> iter function is always used first.

There will be a measurable performance hit for the xxx_iter versions.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)



* Re: [PATCH] fs: prefer read_iter over read and write_iter over write
  2022-05-20 21:24 ` David Laight
@ 2022-05-20 21:30   ` Jason A. Donenfeld
  2022-05-20 22:08     ` David Laight
  0 siblings, 1 reply; 8+ messages in thread
From: Jason A. Donenfeld @ 2022-05-20 21:30 UTC (permalink / raw)
  To: David Laight; +Cc: viro, linux-kernel, Jens Axboe

Hi David,

On Fri, May 20, 2022 at 09:24:50PM +0000, David Laight wrote:
> From: Jason A. Donenfeld
> > Sent: 20 May 2022 14:51
> > 
> > Most kernel code prefers read_iter over read and write_iter over write,
> > yet the read function pointer is tested first. Reverse these so that the
> > iter function is always used first.
> 
> There will be a measurable performance hit for the xxx_iter versions.

Indeed. We now have the misfortune of a 3% hit on random.c, per this
sub-thread:

   https://lore.kernel.org/lkml/Yoey+FOYO69lS5qP@zx2c4.com/

The hope is that it eventually becomes faster... :-\

Jason


* RE: [PATCH] fs: prefer read_iter over read and write_iter over write
  2022-05-20 21:30   ` Jason A. Donenfeld
@ 2022-05-20 22:08     ` David Laight
  2022-05-20 22:18       ` Jens Axboe
  0 siblings, 1 reply; 8+ messages in thread
From: David Laight @ 2022-05-20 22:08 UTC (permalink / raw)
  To: 'Jason A. Donenfeld'; +Cc: viro, linux-kernel, Jens Axboe

From: Jason A. Donenfeld
> Sent: 20 May 2022 22:31
> 
> Hi David,
> 
> On Fri, May 20, 2022 at 09:24:50PM +0000, David Laight wrote:
> > From: Jason A. Donenfeld
> > > Sent: 20 May 2022 14:51
> > >
> > > Most kernel code prefers read_iter over read and write_iter over write,
> > > yet the read function pointer is tested first. Reverse these so that the
> > > iter function is always used first.
> >
> > There will be a measurable performance hit for the xxx_iter versions.
> 
> Indeed. We now have the misfortune of a 3% hit on random.c, per this
> sub-thread:

I wrote that a few hours ago and forgot to send it :-(

>    https://lore.kernel.org/lkml/Yoey+FOYO69lS5qP@zx2c4.com/
> 
> The hope is that it eventually becomes faster... :-\

I suspect all the xxx_iter functions need optimising for
the common case of a single buffer in userspace.

That also includes the code to read the iov[] from userspace.
At the moment I think the 32bit compat code is actually
faster than the native amd64 version!
I've written some patches to speed that up.
But the bigger improvements all hit massive changes
to the io_uring code.

	David


* Re: [PATCH] fs: prefer read_iter over read and write_iter over write
  2022-05-20 22:08     ` David Laight
@ 2022-05-20 22:18       ` Jens Axboe
  2022-05-23  8:18         ` David Laight
  0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2022-05-20 22:18 UTC (permalink / raw)
  To: David Laight, 'Jason A. Donenfeld'; +Cc: viro, linux-kernel

On 5/20/22 4:08 PM, David Laight wrote:
>>    https://lore.kernel.org/lkml/Yoey+FOYO69lS5qP@zx2c4.com/
>>
>> The hope is that it eventually becomes faster... :-\
> 
> I suspect all the xxx_iter functions need optimising for
> the common case of a single buffer in userspace.
> 
> That also includes the code to read the iov[] from userspace.
> At the moment I think the 32bit compat code is actually
> faster than the native amd64 version!
> I've written some patches to speed that up.
> But the bigger improvements all hit massive changes
> to the io_uring code.

Do you have a link to those patches? I can certainly help with the
io_uring side of things, and I have a genuine interest in improving the
core and getting the iter side up to snuff.

-- 
Jens Axboe



* RE: [PATCH] fs: prefer read_iter over read and write_iter over write
  2022-05-20 22:18       ` Jens Axboe
@ 2022-05-23  8:18         ` David Laight
  0 siblings, 0 replies; 8+ messages in thread
From: David Laight @ 2022-05-23  8:18 UTC (permalink / raw)
  To: 'Jens Axboe', 'Jason A. Donenfeld'; +Cc: viro, linux-kernel

From: Jens Axboe
> Sent: 20 May 2022 23:18
> 
> On 5/20/22 4:08 PM, David Laight wrote:
> >>    https://lore.kernel.org/lkml/Yoey+FOYO69lS5qP@zx2c4.com/
> >>
> >> The hope is that it eventually becomes faster... :-\
> >
> > I suspect all the xxx_iter functions need optimising for
> > the common case of a single buffer in userspace.
> >
> > That also includes the code to read the iov[] from userspace.
> > At the moment I think the 32bit compat code is actually
> > faster than the native amd64 version!
> > I've written some patches to speed that up.
> > But the bigger improvements all hit massive changes
> > to the io_uring code.
> 
> Do you have a link to those patches? I can certainly help with the
> io_uring side of things, and I have a genuine interest in improving the
> core and getting the iter side up to snuff.

I'll see if I can find them.
Some bits of the last patch set did get applied.

One aim was to change all the callers of import_iovec()
to use a structure that contained both the 'iov_iter' and
the 'iovstack[]'.
The lifetimes of the two structures are effectively identical.
Usually they are both allocated on the stack together.

Merging them would significantly simplify the callers
and reduce the number of parameters passed through
multiple layers of functions - especially pointers
passed by value.
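
A minimal userspace sketch of what such a merged structure might look like, offered only as an illustration of the idea and not as the contents of the patches mentioned above. The names toy_iov_state, toy_import, toy_release, TOY_FAST_SEGS and TOY_MAX_SEGS are all hypothetical, malloc() stands in for kmalloc(), and memcpy() stands in for copying the array from userspace.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

#define TOY_FAST_SEGS 8		/* hypothetical size of the inline array */
#define TOY_MAX_SEGS 1024	/* hypothetical upper bound on segments */

/* Hypothetical merged state: iterator fields plus the inline iovec cache. */
struct toy_iov_state {
	const struct iovec *iov;	/* points at fast[] or at heap */
	size_t nr_segs;
	size_t count;			/* total bytes described */
	struct iovec *heap;		/* non-NULL only if fast[] was too small */
	struct iovec fast[TOY_FAST_SEGS];
};

/* Copy the caller's array in, using the inline cache when it fits. */
static int toy_import(struct toy_iov_state *st, const struct iovec *uvec,
		      size_t nr_segs)
{
	struct iovec *dst = st->fast;

	st->heap = NULL;
	if (nr_segs > TOY_MAX_SEGS)
		return -EINVAL;
	if (nr_segs > TOY_FAST_SEGS) {
		dst = malloc(nr_segs * sizeof(*dst));
		if (!dst)
			return -ENOMEM;
		st->heap = dst;
	}
	if (nr_segs)
		memcpy(dst, uvec, nr_segs * sizeof(*dst));
	st->iov = dst;
	st->nr_segs = nr_segs;
	st->count = 0;
	for (size_t i = 0; i < nr_segs; i++)
		st->count += dst[i].iov_len;
	return 0;
}

/* One call releases everything; callers no longer track a separate pointer. */
static void toy_release(struct toy_iov_state *st)
{
	free(st->heap);
	st->heap = NULL;
	st->iov = NULL;
}

static void toy_import_demo(void)
{
	char a[3], b[5];
	struct iovec two[2] = {
		{ .iov_base = a, .iov_len = sizeof(a) },
		{ .iov_base = b, .iov_len = sizeof(b) },
	};
	struct iovec many[TOY_FAST_SEGS + 1];
	size_t n = TOY_FAST_SEGS + 1;
	struct toy_iov_state st;
	int ret;

	ret = toy_import(&st, two, 2);
	assert(ret == 0 && st.iov == st.fast && st.heap == NULL);
	assert(st.count == sizeof(a) + sizeof(b));
	toy_release(&st);

	for (size_t i = 0; i < n; i++)
		many[i] = two[0];
	ret = toy_import(&st, many, n);
	assert(ret == 0 && st.iov == st.heap && st.count == n * sizeof(a));
	toy_release(&st);
	(void)ret;
}
```

With both pieces in one object, a caller passes a single pointer down through the layers and makes a single release call at the end, instead of carrying the iterator, the stack array, and a possibly reallocated array pointer as separate parameters.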

That change needs work done to the io_uring code to sanitise
the way it uses the iovstack[] cache and any extended kmalloc()ed
copy.

I need to look elsewhere for the optimisation to import_iovec()
itself.

	David

