All of lore.kernel.org
 help / color / mirror / Atom feed
* Splicing to/from a tty
@ 2021-01-16  7:35 Oliver Giles
  2021-01-16 16:46 ` Johannes Berg
  2021-01-18  8:16 ` Christoph Hellwig
  0 siblings, 2 replies; 65+ messages in thread
From: Oliver Giles @ 2021-01-16  7:35 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, Christoph Hellwig

Commit 36e2c7421f02 (fs: don't allow splice read/write without explicit ops) broke my userspace application which talks to an SSL VPN by splice()ing between "openssl s_client" and "pppd". The latter operates over a pty, and since that commit there is no fallback for splice()ing between a pipe and a pty, or any tty for that matter.

The above commit mentions switching them to the iter ops and using generic_file_splice_read. IIUC, this would require implementing iter ops also on the line disciplines, which sounds pretty disruptive.

For my case, I attempted to instead implement splice_write and splice_read in tty_fops; I managed to get splice_write working calling ld->ops->write, but splice_read is not so simple because the tty_ldisc_ops read method expects a userspace buffer. So I cannot see how to implement this without either (a) using set_fs, or (b) implementing iter ops on all line disciplines.

Is splice()ing between a tty and a pipe worth supporting at all? Not a big deal for my use case at least, but it used to work.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-16  7:35 Splicing to/from a tty Oliver Giles
@ 2021-01-16 16:46 ` Johannes Berg
  2021-01-17  6:12   ` Oliver Giles
  2021-01-18  8:53   ` Christoph Hellwig
  2021-01-18  8:16 ` Christoph Hellwig
  1 sibling, 2 replies; 65+ messages in thread
From: Johannes Berg @ 2021-01-16 16:46 UTC (permalink / raw)
  To: Oliver Giles, linux-kernel; +Cc: Greg Kroah-Hartman, Christoph Hellwig

On Sat, 2021-01-16 at 20:35 +1300, Oliver Giles wrote:
> Commit 36e2c7421f02 (fs: don't allow splice read/write without
> explicit ops) broke my userspace application which talks to an SSL VPN
> by splice()ing between "openssl s_client" and "pppd". The latter
> operates over a pty, and since that commit there is no fallback for
> splice()ing between a pipe and a pty, or any tty for that matter.
> 
> The above commit mentions switching them to the iter ops and using
> generic_file_splice_read. IIUC, this would require implementing iter
> ops also on the line disciplines, which sounds pretty disruptive.
> 
> For my case, I attempted to instead implement splice_write and
> splice_read in tty_fops; I managed to get splice_write working calling
> ld->ops->write, but splice_read is not so simple because the
> tty_ldisc_ops read method expects a userspace buffer. So I cannot see
> how to implement this without either (a) using set_fs, or (b)
> implementing iter ops on all line disciplines.
> 
> Is splice()ing between a tty and a pipe worth supporting at all? Not a
> big deal for my use case at least, but it used to work.

Is it even strictly related to the tty?

I was just now looking into why my cgit/fcgi/nginx setup no longer
works, and the reason is getting -EINVAL from sendfile() when the input
is a file and the output is a pipe().

So I wrote a simple test program (below) and that errors out on kernel
5.10.4, while it works fine on the 5.9.16 I currently have. Haven't
tried reverting anything yet, but now that I haev a test program it
should be simple to even bisect.

johannes


#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <stdio.h>
#include <assert.h>

int main(int argc, char **argv)
{
	int in = open(argv[0], O_RDONLY);
	int p[2], out;
	off_t off = 0;
	int err;

	assert(in >= 0);
	assert(pipe(p) >= 0);
	out = p[1];
	err = sendfile(out, in, &off, 1024);
	if (err < 0)
		perror("sendfile");
	assert(err == 1024);

	return 0;
}



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-16 16:46 ` Johannes Berg
@ 2021-01-17  6:12   ` Oliver Giles
  2021-01-18  8:53   ` Christoph Hellwig
  1 sibling, 0 replies; 65+ messages in thread
From: Oliver Giles @ 2021-01-17  6:12 UTC (permalink / raw)
  To: Johannes Berg, linux-kernel; +Cc: Greg Kroah-Hartman, Christoph Hellwig

On Sun Jan 17, 2021 at 5:46 AM NZDT, Johannes Berg wrote:
> On Sat, 2021-01-16 at 20:35 +1300, Oliver Giles wrote:
> > Commit 36e2c7421f02 (fs: don't allow splice read/write without
> > explicit ops) broke my userspace application which talks to an SSL VPN
> > by splice()ing between "openssl s_client" and "pppd". The latter
> > operates over a pty, and since that commit there is no fallback for
> > splice()ing between a pipe and a pty, or any tty for that matter.
> > 
> > The above commit mentions switching them to the iter ops and using
> > generic_file_splice_read. IIUC, this would require implementing iter
> > ops also on the line disciplines, which sounds pretty disruptive.
> > 
> > For my case, I attempted to instead implement splice_write and
> > splice_read in tty_fops; I managed to get splice_write working calling
> > ld->ops->write, but splice_read is not so simple because the
> > tty_ldisc_ops read method expects a userspace buffer. So I cannot see
> > how to implement this without either (a) using set_fs, or (b)
> > implementing iter ops on all line disciplines.
> > 
> > Is splice()ing between a tty and a pipe worth supporting at all? Not a
> > big deal for my use case at least, but it used to work.
>
> Is it even strictly related to the tty?
>
> I was just now looking into why my cgit/fcgi/nginx setup no longer
> works, and the reason is getting -EINVAL from sendfile() when the input
> is a file and the output is a pipe().
>
> So I wrote a simple test program (below) and that errors out on kernel
> 5.10.4, while it works fine on the 5.9.16 I currently have. Haven't
> tried reverting anything yet, but now that I haev a test program it
> should be simple to even bisect.
>
> johannes
>
>
> #include <unistd.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <sys/sendfile.h>
> #include <stdio.h>
> #include <assert.h>
>
> int main(int argc, char **argv)
> {
> int in = open(argv[0], O_RDONLY);
> int p[2], out;
> off_t off = 0;
> int err;
>
> assert(in >= 0);
> assert(pipe(p) >= 0);
> out = p[1];
> err = sendfile(out, in, &off, 1024);
> if (err < 0)
> perror("sendfile");
> assert(err == 1024);
>
> return 0;
> }

I can confirm the behaviour you see, and that it starts occurring from the same
commit 36e2c7421f02a22 (fs: don't allow splice read/write without explicit ops).

In my tty case, it is clear that removing the default fallback would cause this
to fail, but assuming the sendfile() source is on a regular filesystem, I am
unsure why splice cannot find the appropriate splice_write method. Could be
connected to the fact that the message from warn_unsupported in fs/splice.c
outputs "splice write not supported for file  (pid: 983819 comm: test-sendfile)",
i.e. the file path is blank. In my case of directly calling splice on a pty, I
do see a path such as /ptyp0 in that error before implementing splice_(read/write)
callbacks. Maybe splice is getting a bogus file pointer from sendfile?

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-16  7:35 Splicing to/from a tty Oliver Giles
  2021-01-16 16:46 ` Johannes Berg
@ 2021-01-18  8:16 ` Christoph Hellwig
  2021-01-18 19:36   ` Linus Torvalds
  1 sibling, 1 reply; 65+ messages in thread
From: Christoph Hellwig @ 2021-01-18  8:16 UTC (permalink / raw)
  To: Oliver Giles
  Cc: linux-kernel, Greg Kroah-Hartman, Christoph Hellwig,
	Linus Torvalds, Al Viro

On Sat, Jan 16, 2021 at 08:35:41PM +1300, Oliver Giles wrote:
> Commit 36e2c7421f02 (fs: don't allow splice read/write without explicit ops) broke my userspace application which talks to an SSL VPN by splice()ing between "openssl s_client" and "pppd". The latter operates over a pty, and since that commit there is no fallback for splice()ing between a pipe and a pty, or any tty for that matter.
> 
> The above commit mentions switching them to the iter ops and using generic_file_splice_read. IIUC, this would require implementing iter ops also on the line disciplines, which sounds pretty disruptive.
> 
> For my case, I attempted to instead implement splice_write and splice_read in tty_fops; I managed to get splice_write working calling ld->ops->write, but splice_read is not so simple because the tty_ldisc_ops read method expects a userspace buffer. So I cannot see how to implement this without either (a) using set_fs, or (b) implementing iter ops on all line disciplines.

set_fs is gone for all the important platforms.  So yes, you basically
need to convert to iov_iter or have a huge splice_write parallel
infrastucture.

> 
> Is splice()ing between a tty and a pipe worth supporting at all? Not a big deal for my use case at least, but it used to work.

Our normal policy is no regressions for exiting userspace.  By that we'd
have to fix it.  Let me see if I can help you with this in any way.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-16 16:46 ` Johannes Berg
  2021-01-17  6:12   ` Oliver Giles
@ 2021-01-18  8:53   ` Christoph Hellwig
  2021-01-18  8:58     ` Johannes Berg
  2021-01-18 19:34     ` Al Viro
  1 sibling, 2 replies; 65+ messages in thread
From: Christoph Hellwig @ 2021-01-18  8:53 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Oliver Giles, linux-kernel, Greg Kroah-Hartman,
	Christoph Hellwig, Linus Torvalds, Al Viro

On Sat, Jan 16, 2021 at 05:46:33PM +0100, Johannes Berg wrote:
> > For my case, I attempted to instead implement splice_write and
> > splice_read in tty_fops; I managed to get splice_write working calling
> > ld->ops->write, but splice_read is not so simple because the
> > tty_ldisc_ops read method expects a userspace buffer. So I cannot see
> > how to implement this without either (a) using set_fs, or (b)
> > implementing iter ops on all line disciplines.
> > 
> > Is splice()ing between a tty and a pipe worth supporting at all? Not a
> > big deal for my use case at least, but it used to work.
> 
> Is it even strictly related to the tty?
> 
> I was just now looking into why my cgit/fcgi/nginx setup no longer
> works, and the reason is getting -EINVAL from sendfile() when the input
> is a file and the output is a pipe().

Yes, pipes do not support ->splice_write currenly.   I think just wiring
up iter_file_splice_write would work.  Al?

> So I wrote a simple test program (below) and that errors out on kernel
> 5.10.4, while it works fine on the 5.9.16 I currently have. Haven't
> tried reverting anything yet, but now that I haev a test program it
> should be simple to even bisect.
> 
> johannes
> 
> 
> #include <unistd.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <sys/sendfile.h>
> #include <stdio.h>
> #include <assert.h>
> 
> int main(int argc, char **argv)
> {
> 	int in = open(argv[0], O_RDONLY);
> 	int p[2], out;
> 	off_t off = 0;
> 	int err;
> 
> 	assert(in >= 0);
> 	assert(pipe(p) >= 0);
> 	out = p[1];
> 	err = sendfile(out, in, &off, 1024);
> 	if (err < 0)
> 		perror("sendfile");
> 	assert(err == 1024);
> 
> 	return 0;
> }
> 
---end quoted text---

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-18  8:53   ` Christoph Hellwig
@ 2021-01-18  8:58     ` Johannes Berg
  2021-01-18 19:26       ` Linus Torvalds
  2021-01-18 19:34     ` Al Viro
  1 sibling, 1 reply; 65+ messages in thread
From: Johannes Berg @ 2021-01-18  8:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Oliver Giles, linux-kernel, Greg Kroah-Hartman, Linus Torvalds, Al Viro

On Mon, 2021-01-18 at 09:53 +0100, Christoph Hellwig wrote:
> On Sat, Jan 16, 2021 at 05:46:33PM +0100, Johannes Berg wrote:
> > > For my case, I attempted to instead implement splice_write and
> > > splice_read in tty_fops; I managed to get splice_write working calling
> > > ld->ops->write, but splice_read is not so simple because the
> > > tty_ldisc_ops read method expects a userspace buffer. So I cannot see
> > > how to implement this without either (a) using set_fs, or (b)
> > > implementing iter ops on all line disciplines.
> > > 
> > > Is splice()ing between a tty and a pipe worth supporting at all? Not a
> > > big deal for my use case at least, but it used to work.
> > 
> > Is it even strictly related to the tty?
> > 
> > I was just now looking into why my cgit/fcgi/nginx setup no longer
> > works, and the reason is getting -EINVAL from sendfile() when the input
> > is a file and the output is a pipe().
> 
> Yes, pipes do not support ->splice_write currenly.

Well, it clearly used to work :-)

>    I think just wiring
> up iter_file_splice_write would work.  Al?

Seems to work for the simple test case that I had, at least:

diff --git a/fs/pipe.c b/fs/pipe.c
index c5989cfd564d..39c96845a72f 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -1206,6 +1206,7 @@ const struct file_operations pipefifo_fops = {
 	.unlocked_ioctl	= pipe_ioctl,
 	.release	= pipe_release,
 	.fasync		= pipe_fasync,
+	.splice_write	= iter_file_splice_write,
 };
 
 /*

johannes


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-18  8:58     ` Johannes Berg
@ 2021-01-18 19:26       ` Linus Torvalds
  2021-01-18 19:45         ` Al Viro
  0 siblings, 1 reply; 65+ messages in thread
From: Linus Torvalds @ 2021-01-18 19:26 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List,
	Greg Kroah-Hartman, Al Viro

On Mon, Jan 18, 2021 at 12:58 AM Johannes Berg
<johannes@sipsolutions.net> wrote:
>
> > I think just wiring up iter_file_splice_write would work.  Al?
>
> Seems to work for the simple test case that I had, at least:

Mind sending me a signed-off patch for this?

Yeah, I know it's a trivial one-liner, but I much prefer having an
author with a patch and a sign-off to just doing it personally and
reaping all that glory for it..

           Linus

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-18  8:53   ` Christoph Hellwig
  2021-01-18  8:58     ` Johannes Berg
@ 2021-01-18 19:34     ` Al Viro
  2021-01-18 19:46       ` Linus Torvalds
  1 sibling, 1 reply; 65+ messages in thread
From: Al Viro @ 2021-01-18 19:34 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Johannes Berg, Oliver Giles, linux-kernel, Greg Kroah-Hartman,
	Linus Torvalds, Al Viro

On Mon, Jan 18, 2021 at 09:53:11AM +0100, Christoph Hellwig wrote:
> On Sat, Jan 16, 2021 at 05:46:33PM +0100, Johannes Berg wrote:
> > > For my case, I attempted to instead implement splice_write and
> > > splice_read in tty_fops; I managed to get splice_write working calling
> > > ld->ops->write, but splice_read is not so simple because the
> > > tty_ldisc_ops read method expects a userspace buffer. So I cannot see
> > > how to implement this without either (a) using set_fs, or (b)
> > > implementing iter ops on all line disciplines.
> > > 
> > > Is splice()ing between a tty and a pipe worth supporting at all? Not a
> > > big deal for my use case at least, but it used to work.
> > 
> > Is it even strictly related to the tty?
> > 
> > I was just now looking into why my cgit/fcgi/nginx setup no longer
> > works, and the reason is getting -EINVAL from sendfile() when the input
> > is a file and the output is a pipe().
> 
> Yes, pipes do not support ->splice_write currenly.   I think just wiring
> up iter_file_splice_write would work.  Al?

I'd rather have sendfile(2) do what splice(2) does and handle pipes
directly.  Let me take a look,,,

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-18  8:16 ` Christoph Hellwig
@ 2021-01-18 19:36   ` Linus Torvalds
  2021-01-18 20:24     ` Linus Torvalds
  0 siblings, 1 reply; 65+ messages in thread
From: Linus Torvalds @ 2021-01-18 19:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman, Al Viro

On Mon, Jan 18, 2021 at 12:16 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Sat, Jan 16, 2021 at 08:35:41PM +1300, Oliver Giles wrote:
> > For my case, I attempted to instead implement splice_write and splice_read in tty_fops; I managed to get splice_write working calling ld->ops->write, but splice_read is not so simple because the tty_ldisc_ops read method expects a userspace buffer. So I cannot see how to implement this without either (a) using set_fs, or (b) implementing iter ops on all line disciplines.
>
> set_fs is gone for all the important platforms.  So yes, you basically
> need to convert to iov_iter or have a huge splice_write parallel
> infrastucture.

It might ok to try to limit it to just the pty cases and ldisc ops
that need it, apparently in this case pty (and presumably just one or
two line disciplines)

Of course, it probably would be really nice to try to convert
tty_read() to use the same model that we have for tty_write(), and
then make the ld->ops->read() function actually take a kernel buffer
instead.

I wonder how painful that would be.

             Linus

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-18 19:26       ` Linus Torvalds
@ 2021-01-18 19:45         ` Al Viro
  2021-01-18 19:49           ` Linus Torvalds
  2021-01-24 19:11           ` Linus Torvalds
  0 siblings, 2 replies; 65+ messages in thread
From: Al Viro @ 2021-01-18 19:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Johannes Berg, Christoph Hellwig, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman, Al Viro

On Mon, Jan 18, 2021 at 11:26:00AM -0800, Linus Torvalds wrote:
> On Mon, Jan 18, 2021 at 12:58 AM Johannes Berg
> <johannes@sipsolutions.net> wrote:
> >
> > > I think just wiring up iter_file_splice_write would work.  Al?
> >
> > Seems to work for the simple test case that I had, at least:
> 
> Mind sending me a signed-off patch for this?
> 
> Yeah, I know it's a trivial one-liner, but I much prefer having an
> author with a patch and a sign-off to just doing it personally and
> reaping all that glory for it..

IMO it's a wrong way to handle that.   Look: do_sendfile() calls
do_splice_direct(), which calls splice_direct_to_actor().  There
we allocate an internal pipe and go through "feed from source into
that pipe, then shove what's there into destination".  Which is
insane for the case when destination (or source, for that matter)
is a pipe itself.

do_splice_direct() does something that do_splice() won't - it
handles non-pipe to non-pipe case.  Which is how sendfile(2) is
normally used, of course.

I'll look into that in more details, but IMO bothering with
internal pipe is just plain wrong for those cases.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-18 19:34     ` Al Viro
@ 2021-01-18 19:46       ` Linus Torvalds
  2021-01-18 19:54         ` Al Viro
  0 siblings, 1 reply; 65+ messages in thread
From: Linus Torvalds @ 2021-01-18 19:46 UTC (permalink / raw)
  To: Al Viro
  Cc: Christoph Hellwig, Johannes Berg, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman

On Mon, Jan 18, 2021 at 11:35 AM Al Viro <viro@zeniv-ca> wrote:
>
> I'd rather have sendfile(2) do what splice(2) does and handle pipes
> directly.  Let me take a look,,,

It's not technically just the sendfile() thing. Some things do
sendfile "internally" (ie they use do_splice_direct()).

Although maybe they always happen just on non-pipe destinations (ie
file-to-file copy). I didn't check.

But particularly if it can be handled in do_splice_direct(), that
would be a good thing.

             Linus

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-18 19:45         ` Al Viro
@ 2021-01-18 19:49           ` Linus Torvalds
  2021-01-18 19:56             ` Al Viro
  2021-01-24 19:11           ` Linus Torvalds
  1 sibling, 1 reply; 65+ messages in thread
From: Linus Torvalds @ 2021-01-18 19:49 UTC (permalink / raw)
  To: Al Viro
  Cc: Johannes Berg, Christoph Hellwig, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman

On Mon, Jan 18, 2021 at 11:45 AM Al Viro <viro@zeniv-ca> wrote:
>
> do_splice_direct() does something that do_splice() won't - it
> handles non-pipe to non-pipe case.  Which is how sendfile(2) is
> normally used, of course.

Yeah, I agree, it's better if we do the pipe case specially, but if
it's too painful for a stable backport I'll certainly take the
one-liner..

Btw, your email setup is broken. Your emails now have a "From:" line like this:

    From: Al Viro <viro@zeniv-ca>

and that is not a valid email address.

              Linus

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-18 19:46       ` Linus Torvalds
@ 2021-01-18 19:54         ` Al Viro
  2021-01-20 16:26           ` Al Viro
  0 siblings, 1 reply; 65+ messages in thread
From: Al Viro @ 2021-01-18 19:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Christoph Hellwig, Johannes Berg, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman

On Mon, Jan 18, 2021 at 11:46:42AM -0800, Linus Torvalds wrote:
> On Mon, Jan 18, 2021 at 11:35 AM Al Viro <viro@zeniv-ca> wrote:
> >
> > I'd rather have sendfile(2) do what splice(2) does and handle pipes
> > directly.  Let me take a look,,,
> 
> It's not technically just the sendfile() thing. Some things do
> sendfile "internally" (ie they use do_splice_direct()).
> 
> Although maybe they always happen just on non-pipe destinations (ie
> file-to-file copy). I didn't check.
> 
> But particularly if it can be handled in do_splice_direct(), that
> would be a good thing.

FWIW, it might make sense to merge do_splice_direct() and do_splice();
interfaces are very similar and do_splice() is basically
	if both are pipes
		....
	else if input is pipe
		....
	else if output is pipe
		....
	else
		return -EINVAL
with do_splice_direct() being fairly close to the missing case.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-18 19:49           ` Linus Torvalds
@ 2021-01-18 19:56             ` Al Viro
  0 siblings, 0 replies; 65+ messages in thread
From: Al Viro @ 2021-01-18 19:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Johannes Berg, Christoph Hellwig, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman

On Mon, Jan 18, 2021 at 11:49:53AM -0800, Linus Torvalds wrote:
> On Mon, Jan 18, 2021 at 11:45 AM Al Viro <viro@zeniv-ca> wrote:
> >
> > do_splice_direct() does something that do_splice() won't - it
> > handles non-pipe to non-pipe case.  Which is how sendfile(2) is
> > normally used, of course.
> 
> Yeah, I agree, it's better if we do the pipe case specially, but if
> it's too painful for a stable backport I'll certainly take the
> one-liner..
> 
> Btw, your email setup is broken. Your emails now have a "From:" line like this:
> 
>     From: Al Viro <viro@zeniv-ca>
> 
> and that is not a valid email address.

Just noticed, should be fixed now...

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-18 19:36   ` Linus Torvalds
@ 2021-01-18 20:24     ` Linus Torvalds
  2021-01-18 21:35       ` Linus Torvalds
  0 siblings, 1 reply; 65+ messages in thread
From: Linus Torvalds @ 2021-01-18 20:24 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman, Al Viro

[-- Attachment #1: Type: text/plain, Size: 2108 bytes --]

On Mon, Jan 18, 2021 at 11:36 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Of course, it probably would be really nice to try to convert
> tty_read() to use the same model that we have for tty_write(), and
> then make the ld->ops->read() function actually take a kernel buffer
> instead.
>
> I wonder how painful that would be.

Oh, that seems _much_ less painful than I thought it would be.

This is a COMPLETELY BROKEN patch that builds cleanly for me. I say
that it's completely broken for three reasons:

 (a) that extra "int dummy" argument is there purely to force build
errors if some user hasn't been converted.

 (b) I intentionally made the conversion truly stupid. It's not
"broken", but it needs cleanup for nasty variable names (ie things
like me changing that already badly named "b" variable to "kbp" to
again catch all users at build time)

 (c) the buffer handling in tty_read() is actively broken. Things like
HDLC rely on getting a proper buffer that can hold the whole packet,
and the max packet size is actually potentially quite big.

But the only real problem does seem to be HDLC, which has a default
maximum frame size of 4k, but it can be set all the way up to 64kB as
a module parameter.

So that tty_read() conversion in this patch is complete garbage, but I
really just wrote this to see how many painful places there are. And
there are not many.

The HDLC case could probably be done fairly well by

 - have a small on-stack buffer for the "small read" case

 - have a kmalloc() buffer that defaults to one page for bigger reads

 - grow the kmalloc() buffer if the ->read function returns -EOVERFLOW

Comments?

NOTE! This does *not* do the actual splice_read() implementation. No,
this literally is just the preparatory stage for that by starting the
whole "make the tty ldisc read path use a kernel buffer instead".

Anybody want to play with this? I'd suggest keeping that "dummy"
parameter around for a while - I did an allmodconfig build with this,
but if there are any architecture-specific non-x86-64 cases I wouldn't
have seen them.

             Linus

[-- Attachment #2: patch --]
[-- Type: application/octet-stream, Size: 10483 bytes --]

 drivers/bluetooth/hci_ldisc.c |  2 +-
 drivers/input/serio/serport.c |  3 ++-
 drivers/net/ppp/ppp_async.c   |  2 +-
 drivers/net/ppp/ppp_synctty.c |  2 +-
 drivers/tty/n_gsm.c           |  2 +-
 drivers/tty/n_hdlc.c          |  8 +++-----
 drivers/tty/n_null.c          |  2 +-
 drivers/tty/n_tracerouter.c   |  2 +-
 drivers/tty/n_tracesink.c     |  2 +-
 drivers/tty/n_tty.c           | 39 +++++++++++++++------------------------
 drivers/tty/tty_io.c          | 13 +++++++++----
 include/linux/tty_ldisc.h     |  2 +-
 net/nfc/nci/uart.c            |  2 +-
 13 files changed, 38 insertions(+), 43 deletions(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index f83d67eafc9f..371c21715202 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -802,7 +802,7 @@ static int hci_uart_tty_ioctl(struct tty_struct *tty, struct file *file,
  * We don't provide read/write/poll interface for user space.
  */
 static ssize_t hci_uart_tty_read(struct tty_struct *tty, struct file *file,
-				 unsigned char __user *buf, size_t nr)
+				 int dummy, unsigned char *buf, size_t nr)
 {
 	return 0;
 }
diff --git a/drivers/input/serio/serport.c b/drivers/input/serio/serport.c
index 8ac970a423de..1a9afd350a73 100644
--- a/drivers/input/serio/serport.c
+++ b/drivers/input/serio/serport.c
@@ -156,7 +156,8 @@ static void serport_ldisc_receive(struct tty_struct *tty, const unsigned char *c
  * returning 0 characters.
  */
 
-static ssize_t serport_ldisc_read(struct tty_struct * tty, struct file * file, unsigned char __user * buf, size_t nr)
+static ssize_t serport_ldisc_read(struct tty_struct * tty, struct file * file,
+				  int dummy, unsigned char *kbuf, size_t nr)
 {
 	struct serport *serport = (struct serport*) tty->disc_data;
 	struct serio *serio;
diff --git a/drivers/net/ppp/ppp_async.c b/drivers/net/ppp/ppp_async.c
index 29a0917a81e6..58789a70282d 100644
--- a/drivers/net/ppp/ppp_async.c
+++ b/drivers/net/ppp/ppp_async.c
@@ -259,7 +259,7 @@ static int ppp_asynctty_hangup(struct tty_struct *tty)
  */
 static ssize_t
 ppp_asynctty_read(struct tty_struct *tty, struct file *file,
-		  unsigned char __user *buf, size_t count)
+		  int dummy, unsigned char *buf, size_t count)
 {
 	return -EAGAIN;
 }
diff --git a/drivers/net/ppp/ppp_synctty.c b/drivers/net/ppp/ppp_synctty.c
index 0f338752c38b..7ad4180b1360 100644
--- a/drivers/net/ppp/ppp_synctty.c
+++ b/drivers/net/ppp/ppp_synctty.c
@@ -257,7 +257,7 @@ static int ppp_sync_hangup(struct tty_struct *tty)
  */
 static ssize_t
 ppp_sync_read(struct tty_struct *tty, struct file *file,
-	       unsigned char __user *buf, size_t count)
+	      int dummy, unsigned char *buf, size_t count)
 {
 	return -EAGAIN;
 }
diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
index c676fa89ee0b..e37ba8903c82 100644
--- a/drivers/tty/n_gsm.c
+++ b/drivers/tty/n_gsm.c
@@ -2559,7 +2559,7 @@ static void gsmld_write_wakeup(struct tty_struct *tty)
  */
 
 static ssize_t gsmld_read(struct tty_struct *tty, struct file *file,
-			 unsigned char __user *buf, size_t nr)
+			  int dummy, unsigned char *buf, size_t nr)
 {
 	return -EOPNOTSUPP;
 }
diff --git a/drivers/tty/n_hdlc.c b/drivers/tty/n_hdlc.c
index 12557ee1edb6..c62d719adaaa 100644
--- a/drivers/tty/n_hdlc.c
+++ b/drivers/tty/n_hdlc.c
@@ -416,7 +416,7 @@ static void n_hdlc_tty_receive(struct tty_struct *tty, const __u8 *data,
  * Returns the number of bytes returned or error code.
  */
 static ssize_t n_hdlc_tty_read(struct tty_struct *tty, struct file *file,
-			   __u8 __user *buf, size_t nr)
+			   int dummy, __u8 *kbuf, size_t nr)
 {
 	struct n_hdlc *n_hdlc = tty->disc_data;
 	int ret = 0;
@@ -442,10 +442,8 @@ static ssize_t n_hdlc_tty_read(struct tty_struct *tty, struct file *file,
 				ret = -EOVERFLOW;
 			} else {
 				__set_current_state(TASK_RUNNING);
-				if (copy_to_user(buf, rbuf->buf, rbuf->count))
-					ret = -EFAULT;
-				else
-					ret = rbuf->count;
+				memcpy(kbuf, rbuf->buf, rbuf->count);
+				ret = rbuf->count;
 			}
 
 			if (n_hdlc->rx_free_buf_list.count >
diff --git a/drivers/tty/n_null.c b/drivers/tty/n_null.c
index 96feabae4740..d42363c62333 100644
--- a/drivers/tty/n_null.c
+++ b/drivers/tty/n_null.c
@@ -20,7 +20,7 @@ static void n_null_close(struct tty_struct *tty)
 }
 
 static ssize_t n_null_read(struct tty_struct *tty, struct file *file,
-			   unsigned char __user * buf, size_t nr)
+			   int dummy, unsigned char *buf, size_t nr)
 {
 	return -EOPNOTSUPP;
 }
diff --git a/drivers/tty/n_tracerouter.c b/drivers/tty/n_tracerouter.c
index 4479af4d2fa5..9eadaab29d7e 100644
--- a/drivers/tty/n_tracerouter.c
+++ b/drivers/tty/n_tracerouter.c
@@ -118,7 +118,7 @@ static void n_tracerouter_close(struct tty_struct *tty)
  *	 -EINVAL
  */
 static ssize_t n_tracerouter_read(struct tty_struct *tty, struct file *file,
-				  unsigned char __user *buf, size_t nr) {
+				  int dummy, unsigned char *buf, size_t nr) {
 	return -EINVAL;
 }
 
diff --git a/drivers/tty/n_tracesink.c b/drivers/tty/n_tracesink.c
index d96ba82cc356..a8508a07bc0d 100644
--- a/drivers/tty/n_tracesink.c
+++ b/drivers/tty/n_tracesink.c
@@ -115,7 +115,7 @@ static void n_tracesink_close(struct tty_struct *tty)
  *	 -EINVAL
  */
 static ssize_t n_tracesink_read(struct tty_struct *tty, struct file *file,
-				unsigned char __user *buf, size_t nr) {
+				int dummy, unsigned char *buf, size_t nr) {
 	return -EINVAL;
 }
 
diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index 319d68c8a5df..e46fa08f3354 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -1962,7 +1962,7 @@ static inline int input_available_p(struct tty_struct *tty, int poll)
  */
 
 static int copy_from_read_buf(struct tty_struct *tty,
-				      unsigned char __user **b,
+				      unsigned char **kbp,
 				      size_t *nr)
 
 {
@@ -1978,8 +1978,7 @@ static int copy_from_read_buf(struct tty_struct *tty,
 	n = min(*nr, n);
 	if (n) {
 		unsigned char *from = read_buf_addr(ldata, tail);
-		retval = copy_to_user(*b, from, n);
-		n -= retval;
+		memcpy(*kbp, from, n);
 		is_eof = n == 1 && *from == EOF_CHAR(tty);
 		tty_audit_add_data(tty, from, n);
 		zero_buffer(tty, from, n);
@@ -1988,7 +1987,7 @@ static int copy_from_read_buf(struct tty_struct *tty,
 		if (L_EXTPROC(tty) && ldata->icanon && is_eof &&
 		    (head == ldata->read_tail))
 			n = 0;
-		*b += n;
+		*kbp += n;
 		*nr -= n;
 	}
 	return retval;
@@ -2132,10 +2131,10 @@ static int job_control(struct tty_struct *tty, struct file *file)
  */
 
 static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
-			 unsigned char __user *buf, size_t nr)
+			  int dummy, unsigned char *kbuf, size_t nr)
 {
 	struct n_tty_data *ldata = tty->disc_data;
-	unsigned char __user *b = buf;
+	unsigned char *kb = kbuf;
 	DEFINE_WAIT_FUNC(wait, woken_wake_function);
 	int c;
 	int minimum, time;
@@ -2181,17 +2180,13 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 		/* First test for status change. */
 		if (packet && tty->link->ctrl_status) {
 			unsigned char cs;
-			if (b != buf)
+			if (kb != kbuf)
 				break;
 			spin_lock_irq(&tty->link->ctrl_lock);
 			cs = tty->link->ctrl_status;
 			tty->link->ctrl_status = 0;
 			spin_unlock_irq(&tty->link->ctrl_lock);
-			if (put_user(cs, b)) {
-				retval = -EFAULT;
-				break;
-			}
-			b++;
+			*kb++ = cs;
 			nr--;
 			break;
 		}
@@ -2234,24 +2229,20 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 		}
 
 		if (ldata->icanon && !L_EXTPROC(tty)) {
-			retval = canon_copy_from_read_buf(tty, &b, &nr);
+			retval = canon_copy_from_read_buf(tty, &kb, &nr);
 			if (retval)
 				break;
 		} else {
 			int uncopied;
 
 			/* Deal with packet mode. */
-			if (packet && b == buf) {
-				if (put_user(TIOCPKT_DATA, b)) {
-					retval = -EFAULT;
-					break;
-				}
-				b++;
+			if (packet && kb == kbuf) {
+				*kb++ = TIOCPKT_DATA;
 				nr--;
 			}
 
-			uncopied = copy_from_read_buf(tty, &b, &nr);
-			uncopied += copy_from_read_buf(tty, &b, &nr);
+			uncopied = copy_from_read_buf(tty, &kb, &nr);
+			uncopied += copy_from_read_buf(tty, &kb, &nr);
 			if (uncopied) {
 				retval = -EFAULT;
 				break;
@@ -2260,7 +2251,7 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 
 		n_tty_check_unthrottle(tty);
 
-		if (b - buf >= minimum)
+		if (kb - kbuf >= minimum)
 			break;
 		if (time)
 			timeout = time;
@@ -2272,8 +2263,8 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 	remove_wait_queue(&tty->read_wait, &wait);
 	mutex_unlock(&ldata->atomic_read_lock);
 
-	if (b - buf)
-		retval = b - buf;
+	if (kb - kbuf)
+		retval = kb - kbuf;
 
 	return retval;
 }
diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 8034489337d7..d2d558046420 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -864,10 +864,15 @@ static ssize_t tty_read(struct file *file, char __user *buf, size_t count,
 	ld = tty_ldisc_ref_wait(tty);
 	if (!ld)
 		return hung_up_tty_read(file, buf, count, ppos);
-	if (ld->ops->read)
-		i = ld->ops->read(tty, file, buf, count);
-	else
-		i = -EIO;
+	i = -EIO;
+	if (ld->ops->read) {
+		char kernel_buf[32];
+		if (count > sizeof(kernel_buf))
+			count = sizeof(kernel_buf);
+		i = ld->ops->read(tty, file, 0, kernel_buf, count);
+		if (i > 0 && copy_to_user(buf, kernel_buf, i))
+			i = -EFAULT;
+	}
 	tty_ldisc_deref(ld);
 
 	if (i > 0)
diff --git a/include/linux/tty_ldisc.h b/include/linux/tty_ldisc.h
index b1e6043e9917..516d946e57b6 100644
--- a/include/linux/tty_ldisc.h
+++ b/include/linux/tty_ldisc.h
@@ -185,7 +185,7 @@ struct tty_ldisc_ops {
 	void	(*close)(struct tty_struct *);
 	void	(*flush_buffer)(struct tty_struct *tty);
 	ssize_t	(*read)(struct tty_struct *tty, struct file *file,
-			unsigned char __user *buf, size_t nr);
+			int dummy, unsigned char *buf, size_t nr);
 	ssize_t	(*write)(struct tty_struct *tty, struct file *file,
 			 const unsigned char *buf, size_t nr);
 	int	(*ioctl)(struct tty_struct *tty, struct file *file,
diff --git a/net/nfc/nci/uart.c b/net/nfc/nci/uart.c
index 11b554ce07ff..c4d457ba1668 100644
--- a/net/nfc/nci/uart.c
+++ b/net/nfc/nci/uart.c
@@ -292,7 +292,7 @@ static int nci_uart_tty_ioctl(struct tty_struct *tty, struct file *file,
 
 /* We don't provide read/write/poll interface for user space. */
 static ssize_t nci_uart_tty_read(struct tty_struct *tty, struct file *file,
-				 unsigned char __user *buf, size_t nr)
+				 int dummy, unsigned char *buf, size_t nr)
 {
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-18 20:24     ` Linus Torvalds
@ 2021-01-18 21:35       ` Linus Torvalds
  2021-01-18 21:54         ` Linus Torvalds
  2021-01-19 11:52         ` Greg Kroah-Hartman
  0 siblings, 2 replies; 65+ messages in thread
From: Linus Torvalds @ 2021-01-18 21:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman, Al Viro

[-- Attachment #1: Type: text/plain, Size: 1108 bytes --]

On Mon, Jan 18, 2021 at 12:24 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Anybody want to play with this? I'd suggest keeping that "dummy"
> parameter around for a while - I did an allmodconfig build with this,
> but if there are any architecture-specific non-x86-64 cases I wouldn't
> have seen them.

Not architecture-specific, but I did find by some grepping that I
missed one line discipline driver: the Siemens R3964.

The reason I missed that is because it's been marked BROKEN in the
Kconfig for almost two years, so it didn't show up in my allmodconfig
coverage.

But the fact that I found it makes me a bit happier about my patch
actually covering all the cases.

My grep exercise failed on the bluetooth hci_ldisc.c pattern, which I
feel is because the bluetooth code used the wrong pattern to
initialize the 'struct tty_ldisc_ops', so I fixed that up too.

So here's a slightly updated version of that patch, but apart from
slightly better coverage - even if it's a driver that is disabled
anyway - I'd like to point out that all my previous caveats still
apply.

          Linus

[-- Attachment #2: patch --]
[-- Type: application/octet-stream, Size: 13324 bytes --]

 drivers/bluetooth/hci_ldisc.c | 33 ++++++++++++++++-----------------
 drivers/input/serio/serport.c |  3 ++-
 drivers/net/ppp/ppp_async.c   |  2 +-
 drivers/net/ppp/ppp_synctty.c |  2 +-
 drivers/tty/n_gsm.c           |  2 +-
 drivers/tty/n_hdlc.c          |  8 +++-----
 drivers/tty/n_null.c          |  2 +-
 drivers/tty/n_r3964.c         |  9 +++------
 drivers/tty/n_tracerouter.c   |  2 +-
 drivers/tty/n_tracesink.c     |  2 +-
 drivers/tty/n_tty.c           | 39 +++++++++++++++------------------------
 drivers/tty/tty_io.c          | 13 +++++++++----
 include/linux/tty_ldisc.h     |  2 +-
 net/nfc/nci/uart.c            |  2 +-
 14 files changed, 56 insertions(+), 65 deletions(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index f83d67eafc9f..1adc43e471bc 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -802,7 +802,7 @@ static int hci_uart_tty_ioctl(struct tty_struct *tty, struct file *file,
  * We don't provide read/write/poll interface for user space.
  */
 static ssize_t hci_uart_tty_read(struct tty_struct *tty, struct file *file,
-				 unsigned char __user *buf, size_t nr)
+				 int dummy, unsigned char *buf, size_t nr)
 {
 	return 0;
 }
@@ -819,29 +819,28 @@ static __poll_t hci_uart_tty_poll(struct tty_struct *tty,
 	return 0;
 }
 
+static struct tty_ldisc_ops hci_uart_ldisc = {
+	.owner		= THIS_MODULE,
+	.magic		= TTY_LDISC_MAGIC,
+	.name		= "n_hci",
+	.open		= hci_uart_tty_open,
+	.close		= hci_uart_tty_close,
+	.read		= hci_uart_tty_read,
+	.write		= hci_uart_tty_write,
+	.ioctl		= hci_uart_tty_ioctl,
+	.compat_ioctl	= hci_uart_tty_ioctl,
+	.poll		= hci_uart_tty_poll,
+	.receive_buf	= hci_uart_tty_receive,
+	.write_wakeup	= hci_uart_tty_wakeup,
+};
+
 static int __init hci_uart_init(void)
 {
-	static struct tty_ldisc_ops hci_uart_ldisc;
 	int err;
 
 	BT_INFO("HCI UART driver ver %s", VERSION);
 
 	/* Register the tty discipline */
-
-	memset(&hci_uart_ldisc, 0, sizeof(hci_uart_ldisc));
-	hci_uart_ldisc.magic		= TTY_LDISC_MAGIC;
-	hci_uart_ldisc.name		= "n_hci";
-	hci_uart_ldisc.open		= hci_uart_tty_open;
-	hci_uart_ldisc.close		= hci_uart_tty_close;
-	hci_uart_ldisc.read		= hci_uart_tty_read;
-	hci_uart_ldisc.write		= hci_uart_tty_write;
-	hci_uart_ldisc.ioctl		= hci_uart_tty_ioctl;
-	hci_uart_ldisc.compat_ioctl	= hci_uart_tty_ioctl;
-	hci_uart_ldisc.poll		= hci_uart_tty_poll;
-	hci_uart_ldisc.receive_buf	= hci_uart_tty_receive;
-	hci_uart_ldisc.write_wakeup	= hci_uart_tty_wakeup;
-	hci_uart_ldisc.owner		= THIS_MODULE;
-
 	err = tty_register_ldisc(N_HCI, &hci_uart_ldisc);
 	if (err) {
 		BT_ERR("HCI line discipline registration failed. (%d)", err);
diff --git a/drivers/input/serio/serport.c b/drivers/input/serio/serport.c
index 8ac970a423de..1a9afd350a73 100644
--- a/drivers/input/serio/serport.c
+++ b/drivers/input/serio/serport.c
@@ -156,7 +156,8 @@ static void serport_ldisc_receive(struct tty_struct *tty, const unsigned char *c
  * returning 0 characters.
  */
 
-static ssize_t serport_ldisc_read(struct tty_struct * tty, struct file * file, unsigned char __user * buf, size_t nr)
+static ssize_t serport_ldisc_read(struct tty_struct * tty, struct file * file,
+				  int dummy, unsigned char *kbuf, size_t nr)
 {
 	struct serport *serport = (struct serport*) tty->disc_data;
 	struct serio *serio;
diff --git a/drivers/net/ppp/ppp_async.c b/drivers/net/ppp/ppp_async.c
index 29a0917a81e6..58789a70282d 100644
--- a/drivers/net/ppp/ppp_async.c
+++ b/drivers/net/ppp/ppp_async.c
@@ -259,7 +259,7 @@ static int ppp_asynctty_hangup(struct tty_struct *tty)
  */
 static ssize_t
 ppp_asynctty_read(struct tty_struct *tty, struct file *file,
-		  unsigned char __user *buf, size_t count)
+		  int dummy, unsigned char *buf, size_t count)
 {
 	return -EAGAIN;
 }
diff --git a/drivers/net/ppp/ppp_synctty.c b/drivers/net/ppp/ppp_synctty.c
index 0f338752c38b..7ad4180b1360 100644
--- a/drivers/net/ppp/ppp_synctty.c
+++ b/drivers/net/ppp/ppp_synctty.c
@@ -257,7 +257,7 @@ static int ppp_sync_hangup(struct tty_struct *tty)
  */
 static ssize_t
 ppp_sync_read(struct tty_struct *tty, struct file *file,
-	       unsigned char __user *buf, size_t count)
+	      int dummy, unsigned char *buf, size_t count)
 {
 	return -EAGAIN;
 }
diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
index c676fa89ee0b..e37ba8903c82 100644
--- a/drivers/tty/n_gsm.c
+++ b/drivers/tty/n_gsm.c
@@ -2559,7 +2559,7 @@ static void gsmld_write_wakeup(struct tty_struct *tty)
  */
 
 static ssize_t gsmld_read(struct tty_struct *tty, struct file *file,
-			 unsigned char __user *buf, size_t nr)
+			  int dummy, unsigned char *buf, size_t nr)
 {
 	return -EOPNOTSUPP;
 }
diff --git a/drivers/tty/n_hdlc.c b/drivers/tty/n_hdlc.c
index 12557ee1edb6..c62d719adaaa 100644
--- a/drivers/tty/n_hdlc.c
+++ b/drivers/tty/n_hdlc.c
@@ -416,7 +416,7 @@ static void n_hdlc_tty_receive(struct tty_struct *tty, const __u8 *data,
  * Returns the number of bytes returned or error code.
  */
 static ssize_t n_hdlc_tty_read(struct tty_struct *tty, struct file *file,
-			   __u8 __user *buf, size_t nr)
+			   int dummy, __u8 *kbuf, size_t nr)
 {
 	struct n_hdlc *n_hdlc = tty->disc_data;
 	int ret = 0;
@@ -442,10 +442,8 @@ static ssize_t n_hdlc_tty_read(struct tty_struct *tty, struct file *file,
 				ret = -EOVERFLOW;
 			} else {
 				__set_current_state(TASK_RUNNING);
-				if (copy_to_user(buf, rbuf->buf, rbuf->count))
-					ret = -EFAULT;
-				else
-					ret = rbuf->count;
+				memcpy(kbuf, rbuf->buf, rbuf->count);
+				ret = rbuf->count;
 			}
 
 			if (n_hdlc->rx_free_buf_list.count >
diff --git a/drivers/tty/n_null.c b/drivers/tty/n_null.c
index 96feabae4740..d42363c62333 100644
--- a/drivers/tty/n_null.c
+++ b/drivers/tty/n_null.c
@@ -20,7 +20,7 @@ static void n_null_close(struct tty_struct *tty)
 }
 
 static ssize_t n_null_read(struct tty_struct *tty, struct file *file,
-			   unsigned char __user * buf, size_t nr)
+			   int dummy, unsigned char *buf, size_t nr)
 {
 	return -EOPNOTSUPP;
 }
diff --git a/drivers/tty/n_r3964.c b/drivers/tty/n_r3964.c
index 934dd2fb2ec8..1c8143276b3e 100644
--- a/drivers/tty/n_r3964.c
+++ b/drivers/tty/n_r3964.c
@@ -129,7 +129,7 @@ static void remove_client_block(struct r3964_info *pInfo,
 static int r3964_open(struct tty_struct *tty);
 static void r3964_close(struct tty_struct *tty);
 static ssize_t r3964_read(struct tty_struct *tty, struct file *file,
-		unsigned char __user * buf, size_t nr);
+		int dummy, unsigned char *buf, size_t nr);
 static ssize_t r3964_write(struct tty_struct *tty, struct file *file,
 		const unsigned char *buf, size_t nr);
 static int r3964_ioctl(struct tty_struct *tty, struct file *file,
@@ -1058,7 +1058,7 @@ static void r3964_close(struct tty_struct *tty)
 }
 
 static ssize_t r3964_read(struct tty_struct *tty, struct file *file,
-			  unsigned char __user * buf, size_t nr)
+			  int dummy, unsigned char *kbuf, size_t nr)
 {
 	struct r3964_info *pInfo = tty->disc_data;
 	struct r3964_client_info *pClient;
@@ -1109,10 +1109,7 @@ static ssize_t r3964_read(struct tty_struct *tty, struct file *file,
 		kfree(pMsg);
 		TRACE_M("r3964_read - msg kfree %p", pMsg);
 
-		if (copy_to_user(buf, &theMsg, ret)) {
-			ret = -EFAULT;
-			goto unlock;
-		}
+		memcpy(kbuf, &theMsg, ret);
 
 		TRACE_PS("read - return %d", ret);
 		goto unlock;
diff --git a/drivers/tty/n_tracerouter.c b/drivers/tty/n_tracerouter.c
index 4479af4d2fa5..9eadaab29d7e 100644
--- a/drivers/tty/n_tracerouter.c
+++ b/drivers/tty/n_tracerouter.c
@@ -118,7 +118,7 @@ static void n_tracerouter_close(struct tty_struct *tty)
  *	 -EINVAL
  */
 static ssize_t n_tracerouter_read(struct tty_struct *tty, struct file *file,
-				  unsigned char __user *buf, size_t nr) {
+				  int dummy, unsigned char *buf, size_t nr) {
 	return -EINVAL;
 }
 
diff --git a/drivers/tty/n_tracesink.c b/drivers/tty/n_tracesink.c
index d96ba82cc356..a8508a07bc0d 100644
--- a/drivers/tty/n_tracesink.c
+++ b/drivers/tty/n_tracesink.c
@@ -115,7 +115,7 @@ static void n_tracesink_close(struct tty_struct *tty)
  *	 -EINVAL
  */
 static ssize_t n_tracesink_read(struct tty_struct *tty, struct file *file,
-				unsigned char __user *buf, size_t nr) {
+				int dummy, unsigned char *buf, size_t nr) {
 	return -EINVAL;
 }
 
diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index 319d68c8a5df..e46fa08f3354 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -1962,7 +1962,7 @@ static inline int input_available_p(struct tty_struct *tty, int poll)
  */
 
 static int copy_from_read_buf(struct tty_struct *tty,
-				      unsigned char __user **b,
+				      unsigned char **kbp,
 				      size_t *nr)
 
 {
@@ -1978,8 +1978,7 @@ static int copy_from_read_buf(struct tty_struct *tty,
 	n = min(*nr, n);
 	if (n) {
 		unsigned char *from = read_buf_addr(ldata, tail);
-		retval = copy_to_user(*b, from, n);
-		n -= retval;
+		memcpy(*kbp, from, n);
 		is_eof = n == 1 && *from == EOF_CHAR(tty);
 		tty_audit_add_data(tty, from, n);
 		zero_buffer(tty, from, n);
@@ -1988,7 +1987,7 @@ static int copy_from_read_buf(struct tty_struct *tty,
 		if (L_EXTPROC(tty) && ldata->icanon && is_eof &&
 		    (head == ldata->read_tail))
 			n = 0;
-		*b += n;
+		*kbp += n;
 		*nr -= n;
 	}
 	return retval;
@@ -2132,10 +2131,10 @@ static int job_control(struct tty_struct *tty, struct file *file)
  */
 
 static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
-			 unsigned char __user *buf, size_t nr)
+			  int dummy, unsigned char *kbuf, size_t nr)
 {
 	struct n_tty_data *ldata = tty->disc_data;
-	unsigned char __user *b = buf;
+	unsigned char *kb = kbuf;
 	DEFINE_WAIT_FUNC(wait, woken_wake_function);
 	int c;
 	int minimum, time;
@@ -2181,17 +2180,13 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 		/* First test for status change. */
 		if (packet && tty->link->ctrl_status) {
 			unsigned char cs;
-			if (b != buf)
+			if (kb != kbuf)
 				break;
 			spin_lock_irq(&tty->link->ctrl_lock);
 			cs = tty->link->ctrl_status;
 			tty->link->ctrl_status = 0;
 			spin_unlock_irq(&tty->link->ctrl_lock);
-			if (put_user(cs, b)) {
-				retval = -EFAULT;
-				break;
-			}
-			b++;
+			*kb++ = cs;
 			nr--;
 			break;
 		}
@@ -2234,24 +2229,20 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 		}
 
 		if (ldata->icanon && !L_EXTPROC(tty)) {
-			retval = canon_copy_from_read_buf(tty, &b, &nr);
+			retval = canon_copy_from_read_buf(tty, &kb, &nr);
 			if (retval)
 				break;
 		} else {
 			int uncopied;
 
 			/* Deal with packet mode. */
-			if (packet && b == buf) {
-				if (put_user(TIOCPKT_DATA, b)) {
-					retval = -EFAULT;
-					break;
-				}
-				b++;
+			if (packet && kb == kbuf) {
+				*kb++ = TIOCPKT_DATA;
 				nr--;
 			}
 
-			uncopied = copy_from_read_buf(tty, &b, &nr);
-			uncopied += copy_from_read_buf(tty, &b, &nr);
+			uncopied = copy_from_read_buf(tty, &kb, &nr);
+			uncopied += copy_from_read_buf(tty, &kb, &nr);
 			if (uncopied) {
 				retval = -EFAULT;
 				break;
@@ -2260,7 +2251,7 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 
 		n_tty_check_unthrottle(tty);
 
-		if (b - buf >= minimum)
+		if (kb - kbuf >= minimum)
 			break;
 		if (time)
 			timeout = time;
@@ -2272,8 +2263,8 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 	remove_wait_queue(&tty->read_wait, &wait);
 	mutex_unlock(&ldata->atomic_read_lock);
 
-	if (b - buf)
-		retval = b - buf;
+	if (kb - kbuf)
+		retval = kb - kbuf;
 
 	return retval;
 }
diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 8034489337d7..d2d558046420 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -864,10 +864,15 @@ static ssize_t tty_read(struct file *file, char __user *buf, size_t count,
 	ld = tty_ldisc_ref_wait(tty);
 	if (!ld)
 		return hung_up_tty_read(file, buf, count, ppos);
-	if (ld->ops->read)
-		i = ld->ops->read(tty, file, buf, count);
-	else
-		i = -EIO;
+	i = -EIO;
+	if (ld->ops->read) {
+		char kernel_buf[32];
+		if (count > sizeof(kernel_buf))
+			count = sizeof(kernel_buf);
+		i = ld->ops->read(tty, file, 0, kernel_buf, count);
+		if (i > 0 && copy_to_user(buf, kernel_buf, i))
+			i = -EFAULT;
+	}
 	tty_ldisc_deref(ld);
 
 	if (i > 0)
diff --git a/include/linux/tty_ldisc.h b/include/linux/tty_ldisc.h
index b1e6043e9917..516d946e57b6 100644
--- a/include/linux/tty_ldisc.h
+++ b/include/linux/tty_ldisc.h
@@ -185,7 +185,7 @@ struct tty_ldisc_ops {
 	void	(*close)(struct tty_struct *);
 	void	(*flush_buffer)(struct tty_struct *tty);
 	ssize_t	(*read)(struct tty_struct *tty, struct file *file,
-			unsigned char __user *buf, size_t nr);
+			int dummy, unsigned char *buf, size_t nr);
 	ssize_t	(*write)(struct tty_struct *tty, struct file *file,
 			 const unsigned char *buf, size_t nr);
 	int	(*ioctl)(struct tty_struct *tty, struct file *file,
diff --git a/net/nfc/nci/uart.c b/net/nfc/nci/uart.c
index 11b554ce07ff..c4d457ba1668 100644
--- a/net/nfc/nci/uart.c
+++ b/net/nfc/nci/uart.c
@@ -292,7 +292,7 @@ static int nci_uart_tty_ioctl(struct tty_struct *tty, struct file *file,
 
 /* We don't provide read/write/poll interface for user space. */
 static ssize_t nci_uart_tty_read(struct tty_struct *tty, struct file *file,
-				 unsigned char __user *buf, size_t nr)
+				 int dummy, unsigned char *buf, size_t nr)
 {
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-18 21:35       ` Linus Torvalds
@ 2021-01-18 21:54         ` Linus Torvalds
  2021-01-18 22:03           ` Linus Torvalds
  2021-01-19 11:52         ` Greg Kroah-Hartman
  1 sibling, 1 reply; 65+ messages in thread
From: Linus Torvalds @ 2021-01-18 21:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman, Al Viro

On Mon, Jan 18, 2021 at 1:35 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So here's a slightly updated version of that patch, but apart from
> slightly better coverage - even if it's a driver that is disabled
> anyway - I'd like to point out that all my previous caveats still
> apply.

I have now booted that version to see that I didn't make any horribly
obvious mistakes.

And I must have screwed something up. I can actually use the machine
normally (I'm writing this running that kernel), but when I decided to
test the actual virtual console (as opposed to the GUI terminal
windows that use pty's), I can't even log in.

That *may* just be due to the inexcusably lazy and stupid "chunk
things up into 32 byte pieces" I did. Most programs shouldn't care,
tty's can return partial results anyway, but it's obviously a fairly
fundamental and silly change.

But it might well be some other conversion bug of mine even if I tried
to keep it fairly minimal and straight-forward.

So I suggest taking that patch with a lot of salt, and really treating
it as a very rough starting point (which was the point of it - trying
to actually boot it as-is was more of a "let's see if it survives at
all" thing).

                Linus

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-18 21:54         ` Linus Torvalds
@ 2021-01-18 22:03           ` Linus Torvalds
  2021-01-18 22:20             ` Linus Torvalds
  0 siblings, 1 reply; 65+ messages in thread
From: Linus Torvalds @ 2021-01-18 22:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman, Al Viro

On Mon, Jan 18, 2021 at 1:54 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> But it might well be some other conversion bug of mine even if I tried
> to keep it fairly minimal and straight-forward.

Duh. I completely forgot to handle the canon_copy_from_read_buf()
case. So ICANON mode was entirely scrogged and would just return
-EFAULT.

That would do it. I'm surprised how well everything I did actually
worked - because all my normal terminal apps (shell, editor etc)
obviously end up not using icanon at all.

I'll have a third patch in a moment, but while it's ready I want to
actually reboot and confirm it first.

             Linus

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-18 22:03           ` Linus Torvalds
@ 2021-01-18 22:20             ` Linus Torvalds
  2021-01-19  1:38               ` Linus Torvalds
  0 siblings, 1 reply; 65+ messages in thread
From: Linus Torvalds @ 2021-01-18 22:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman, Al Viro

[-- Attachment #1: Type: text/plain, Size: 817 bytes --]

On Mon, Jan 18, 2021 at 2:03 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I'll have a third patch in a moment, but while it's ready I want to
> actually reboot and confirm it first.

Here. This is an actually usable and tested starting point for this all.

Again - it doesn't handle HDLC correctly, because it will chop all
reads up into 32-byte chunks.

And some other apps that think they'll get whole lines might end up
being confused too.

So it's not a "real" patch, but with improved buffer handling in
tty_read(), I think this is actually quite close.

NOTE: due to security reasons (ie password data), we do want to clear
that buffer after we've copied the data to user space. Again, not
something I did in my patch, so it would be part of that "improved
buffer handling"

              Linus

[-- Attachment #2: patch --]
[-- Type: application/octet-stream, Size: 16622 bytes --]

 drivers/bluetooth/hci_ldisc.c | 33 +++++++++---------
 drivers/input/serio/serport.c |  3 +-
 drivers/net/ppp/ppp_async.c   |  2 +-
 drivers/net/ppp/ppp_synctty.c |  2 +-
 drivers/tty/n_gsm.c           |  2 +-
 drivers/tty/n_hdlc.c          |  8 ++---
 drivers/tty/n_null.c          |  2 +-
 drivers/tty/n_r3964.c         |  9 ++---
 drivers/tty/n_tracerouter.c   |  2 +-
 drivers/tty/n_tracesink.c     |  2 +-
 drivers/tty/n_tty.c           | 81 ++++++++++++++++++-------------------------
 drivers/tty/tty_io.c          | 13 ++++---
 include/linux/tty_ldisc.h     |  2 +-
 net/nfc/nci/uart.c            |  2 +-
 14 files changed, 74 insertions(+), 89 deletions(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index f83d67eafc9f..1adc43e471bc 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -802,7 +802,7 @@ static int hci_uart_tty_ioctl(struct tty_struct *tty, struct file *file,
  * We don't provide read/write/poll interface for user space.
  */
 static ssize_t hci_uart_tty_read(struct tty_struct *tty, struct file *file,
-				 unsigned char __user *buf, size_t nr)
+				 int dummy, unsigned char *buf, size_t nr)
 {
 	return 0;
 }
@@ -819,29 +819,28 @@ static __poll_t hci_uart_tty_poll(struct tty_struct *tty,
 	return 0;
 }
 
+static struct tty_ldisc_ops hci_uart_ldisc = {
+	.owner		= THIS_MODULE,
+	.magic		= TTY_LDISC_MAGIC,
+	.name		= "n_hci",
+	.open		= hci_uart_tty_open,
+	.close		= hci_uart_tty_close,
+	.read		= hci_uart_tty_read,
+	.write		= hci_uart_tty_write,
+	.ioctl		= hci_uart_tty_ioctl,
+	.compat_ioctl	= hci_uart_tty_ioctl,
+	.poll		= hci_uart_tty_poll,
+	.receive_buf	= hci_uart_tty_receive,
+	.write_wakeup	= hci_uart_tty_wakeup,
+};
+
 static int __init hci_uart_init(void)
 {
-	static struct tty_ldisc_ops hci_uart_ldisc;
 	int err;
 
 	BT_INFO("HCI UART driver ver %s", VERSION);
 
 	/* Register the tty discipline */
-
-	memset(&hci_uart_ldisc, 0, sizeof(hci_uart_ldisc));
-	hci_uart_ldisc.magic		= TTY_LDISC_MAGIC;
-	hci_uart_ldisc.name		= "n_hci";
-	hci_uart_ldisc.open		= hci_uart_tty_open;
-	hci_uart_ldisc.close		= hci_uart_tty_close;
-	hci_uart_ldisc.read		= hci_uart_tty_read;
-	hci_uart_ldisc.write		= hci_uart_tty_write;
-	hci_uart_ldisc.ioctl		= hci_uart_tty_ioctl;
-	hci_uart_ldisc.compat_ioctl	= hci_uart_tty_ioctl;
-	hci_uart_ldisc.poll		= hci_uart_tty_poll;
-	hci_uart_ldisc.receive_buf	= hci_uart_tty_receive;
-	hci_uart_ldisc.write_wakeup	= hci_uart_tty_wakeup;
-	hci_uart_ldisc.owner		= THIS_MODULE;
-
 	err = tty_register_ldisc(N_HCI, &hci_uart_ldisc);
 	if (err) {
 		BT_ERR("HCI line discipline registration failed. (%d)", err);
diff --git a/drivers/input/serio/serport.c b/drivers/input/serio/serport.c
index 8ac970a423de..1a9afd350a73 100644
--- a/drivers/input/serio/serport.c
+++ b/drivers/input/serio/serport.c
@@ -156,7 +156,8 @@ static void serport_ldisc_receive(struct tty_struct *tty, const unsigned char *c
  * returning 0 characters.
  */
 
-static ssize_t serport_ldisc_read(struct tty_struct * tty, struct file * file, unsigned char __user * buf, size_t nr)
+static ssize_t serport_ldisc_read(struct tty_struct * tty, struct file * file,
+				  int dummy, unsigned char *kbuf, size_t nr)
 {
 	struct serport *serport = (struct serport*) tty->disc_data;
 	struct serio *serio;
diff --git a/drivers/net/ppp/ppp_async.c b/drivers/net/ppp/ppp_async.c
index 29a0917a81e6..58789a70282d 100644
--- a/drivers/net/ppp/ppp_async.c
+++ b/drivers/net/ppp/ppp_async.c
@@ -259,7 +259,7 @@ static int ppp_asynctty_hangup(struct tty_struct *tty)
  */
 static ssize_t
 ppp_asynctty_read(struct tty_struct *tty, struct file *file,
-		  unsigned char __user *buf, size_t count)
+		  int dummy, unsigned char *buf, size_t count)
 {
 	return -EAGAIN;
 }
diff --git a/drivers/net/ppp/ppp_synctty.c b/drivers/net/ppp/ppp_synctty.c
index 0f338752c38b..7ad4180b1360 100644
--- a/drivers/net/ppp/ppp_synctty.c
+++ b/drivers/net/ppp/ppp_synctty.c
@@ -257,7 +257,7 @@ static int ppp_sync_hangup(struct tty_struct *tty)
  */
 static ssize_t
 ppp_sync_read(struct tty_struct *tty, struct file *file,
-	       unsigned char __user *buf, size_t count)
+	      int dummy, unsigned char *buf, size_t count)
 {
 	return -EAGAIN;
 }
diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
index c676fa89ee0b..e37ba8903c82 100644
--- a/drivers/tty/n_gsm.c
+++ b/drivers/tty/n_gsm.c
@@ -2559,7 +2559,7 @@ static void gsmld_write_wakeup(struct tty_struct *tty)
  */
 
 static ssize_t gsmld_read(struct tty_struct *tty, struct file *file,
-			 unsigned char __user *buf, size_t nr)
+			  int dummy, unsigned char *buf, size_t nr)
 {
 	return -EOPNOTSUPP;
 }
diff --git a/drivers/tty/n_hdlc.c b/drivers/tty/n_hdlc.c
index 12557ee1edb6..c62d719adaaa 100644
--- a/drivers/tty/n_hdlc.c
+++ b/drivers/tty/n_hdlc.c
@@ -416,7 +416,7 @@ static void n_hdlc_tty_receive(struct tty_struct *tty, const __u8 *data,
  * Returns the number of bytes returned or error code.
  */
 static ssize_t n_hdlc_tty_read(struct tty_struct *tty, struct file *file,
-			   __u8 __user *buf, size_t nr)
+			   int dummy, __u8 *kbuf, size_t nr)
 {
 	struct n_hdlc *n_hdlc = tty->disc_data;
 	int ret = 0;
@@ -442,10 +442,8 @@ static ssize_t n_hdlc_tty_read(struct tty_struct *tty, struct file *file,
 				ret = -EOVERFLOW;
 			} else {
 				__set_current_state(TASK_RUNNING);
-				if (copy_to_user(buf, rbuf->buf, rbuf->count))
-					ret = -EFAULT;
-				else
-					ret = rbuf->count;
+				memcpy(kbuf, rbuf->buf, rbuf->count);
+				ret = rbuf->count;
 			}
 
 			if (n_hdlc->rx_free_buf_list.count >
diff --git a/drivers/tty/n_null.c b/drivers/tty/n_null.c
index 96feabae4740..d42363c62333 100644
--- a/drivers/tty/n_null.c
+++ b/drivers/tty/n_null.c
@@ -20,7 +20,7 @@ static void n_null_close(struct tty_struct *tty)
 }
 
 static ssize_t n_null_read(struct tty_struct *tty, struct file *file,
-			   unsigned char __user * buf, size_t nr)
+			   int dummy, unsigned char *buf, size_t nr)
 {
 	return -EOPNOTSUPP;
 }
diff --git a/drivers/tty/n_r3964.c b/drivers/tty/n_r3964.c
index 934dd2fb2ec8..1c8143276b3e 100644
--- a/drivers/tty/n_r3964.c
+++ b/drivers/tty/n_r3964.c
@@ -129,7 +129,7 @@ static void remove_client_block(struct r3964_info *pInfo,
 static int r3964_open(struct tty_struct *tty);
 static void r3964_close(struct tty_struct *tty);
 static ssize_t r3964_read(struct tty_struct *tty, struct file *file,
-		unsigned char __user * buf, size_t nr);
+		int dummy, unsigned char *buf, size_t nr);
 static ssize_t r3964_write(struct tty_struct *tty, struct file *file,
 		const unsigned char *buf, size_t nr);
 static int r3964_ioctl(struct tty_struct *tty, struct file *file,
@@ -1058,7 +1058,7 @@ static void r3964_close(struct tty_struct *tty)
 }
 
 static ssize_t r3964_read(struct tty_struct *tty, struct file *file,
-			  unsigned char __user * buf, size_t nr)
+			  int dummy, unsigned char *kbuf, size_t nr)
 {
 	struct r3964_info *pInfo = tty->disc_data;
 	struct r3964_client_info *pClient;
@@ -1109,10 +1109,7 @@ static ssize_t r3964_read(struct tty_struct *tty, struct file *file,
 		kfree(pMsg);
 		TRACE_M("r3964_read - msg kfree %p", pMsg);
 
-		if (copy_to_user(buf, &theMsg, ret)) {
-			ret = -EFAULT;
-			goto unlock;
-		}
+		memcpy(kbuf, &theMsg, ret);
 
 		TRACE_PS("read - return %d", ret);
 		goto unlock;
diff --git a/drivers/tty/n_tracerouter.c b/drivers/tty/n_tracerouter.c
index 4479af4d2fa5..9eadaab29d7e 100644
--- a/drivers/tty/n_tracerouter.c
+++ b/drivers/tty/n_tracerouter.c
@@ -118,7 +118,7 @@ static void n_tracerouter_close(struct tty_struct *tty)
  *	 -EINVAL
  */
 static ssize_t n_tracerouter_read(struct tty_struct *tty, struct file *file,
-				  unsigned char __user *buf, size_t nr) {
+				  int dummy, unsigned char *buf, size_t nr) {
 	return -EINVAL;
 }
 
diff --git a/drivers/tty/n_tracesink.c b/drivers/tty/n_tracesink.c
index d96ba82cc356..a8508a07bc0d 100644
--- a/drivers/tty/n_tracesink.c
+++ b/drivers/tty/n_tracesink.c
@@ -115,7 +115,7 @@ static void n_tracesink_close(struct tty_struct *tty)
  *	 -EINVAL
  */
 static ssize_t n_tracesink_read(struct tty_struct *tty, struct file *file,
-				unsigned char __user *buf, size_t nr) {
+				int dummy, unsigned char *buf, size_t nr) {
 	return -EINVAL;
 }
 
diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index 319d68c8a5df..2b75b595a45c 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -164,29 +164,24 @@ static void zero_buffer(struct tty_struct *tty, u8 *buffer, int size)
 		memset(buffer, 0x00, size);
 }
 
-static int tty_copy_to_user(struct tty_struct *tty, void __user *to,
-			    size_t tail, size_t n)
+static void tty_copy(struct tty_struct *tty, void *to, size_t tail, size_t n)
 {
 	struct n_tty_data *ldata = tty->disc_data;
 	size_t size = N_TTY_BUF_SIZE - tail;
 	void *from = read_buf_addr(ldata, tail);
-	int uncopied;
 
 	if (n > size) {
 		tty_audit_add_data(tty, from, size);
-		uncopied = copy_to_user(to, from, size);
-		zero_buffer(tty, from, size - uncopied);
-		if (uncopied)
-			return uncopied;
+		memcpy(to, from, size);
+		zero_buffer(tty, from, size);
 		to += size;
 		n -= size;
 		from = ldata->read_buf;
 	}
 
 	tty_audit_add_data(tty, from, n);
-	uncopied = copy_to_user(to, from, n);
-	zero_buffer(tty, from, n - uncopied);
-	return uncopied;
+	memcpy(to, from, n);
+	zero_buffer(tty, from, n);
 }
 
 /**
@@ -1944,15 +1939,16 @@ static inline int input_available_p(struct tty_struct *tty, int poll)
 /**
  *	copy_from_read_buf	-	copy read data directly
  *	@tty: terminal device
- *	@b: user data
+ *	@kbp: data
  *	@nr: size of data
  *
  *	Helper function to speed up n_tty_read.  It is only called when
- *	ICANON is off; it copies characters straight from the tty queue to
- *	user space directly.  It can be profitably called twice; once to
- *	drain the space from the tail pointer to the (physical) end of the
- *	buffer, and once to drain the space from the (physical) beginning of
- *	the buffer to head pointer.
+ *	ICANON is off; it copies characters straight from the tty queue.
+ *
+ *	It can be profitably called twice; once to drain the space from
+ *	the tail pointer to the (physical) end of the buffer, and once
+ *	to drain the space from the (physical) beginning of the buffer
+ *	to head pointer.
  *
  *	Called under the ldata->atomic_read_lock sem
  *
@@ -1962,7 +1958,7 @@ static inline int input_available_p(struct tty_struct *tty, int poll)
  */
 
 static int copy_from_read_buf(struct tty_struct *tty,
-				      unsigned char __user **b,
+				      unsigned char **kbp,
 				      size_t *nr)
 
 {
@@ -1978,8 +1974,7 @@ static int copy_from_read_buf(struct tty_struct *tty,
 	n = min(*nr, n);
 	if (n) {
 		unsigned char *from = read_buf_addr(ldata, tail);
-		retval = copy_to_user(*b, from, n);
-		n -= retval;
+		memcpy(*kbp, from, n);
 		is_eof = n == 1 && *from == EOF_CHAR(tty);
 		tty_audit_add_data(tty, from, n);
 		zero_buffer(tty, from, n);
@@ -1988,7 +1983,7 @@ static int copy_from_read_buf(struct tty_struct *tty,
 		if (L_EXTPROC(tty) && ldata->icanon && is_eof &&
 		    (head == ldata->read_tail))
 			n = 0;
-		*b += n;
+		*kbp += n;
 		*nr -= n;
 	}
 	return retval;
@@ -1997,12 +1992,12 @@ static int copy_from_read_buf(struct tty_struct *tty,
 /**
  *	canon_copy_from_read_buf	-	copy read data in canonical mode
  *	@tty: terminal device
- *	@b: user data
+ *	@kbp: data
  *	@nr: size of data
  *
  *	Helper function for n_tty_read.  It is only called when ICANON is on;
  *	it copies one line of input up to and including the line-delimiting
- *	character into the user-space buffer.
+ *	character into the result buffer.
  *
  *	NB: When termios is changed from non-canonical to canonical mode and
  *	the read buffer contains data, n_tty_set_termios() simulates an EOF
@@ -2018,14 +2013,14 @@ static int copy_from_read_buf(struct tty_struct *tty,
  */
 
 static int canon_copy_from_read_buf(struct tty_struct *tty,
-				    unsigned char __user **b,
+				    unsigned char **kbp,
 				    size_t *nr)
 {
 	struct n_tty_data *ldata = tty->disc_data;
 	size_t n, size, more, c;
 	size_t eol;
 	size_t tail;
-	int ret, found = 0;
+	int found = 0;
 
 	/* N.B. avoid overrun if nr == 0 */
 	if (!*nr)
@@ -2061,10 +2056,8 @@ static int canon_copy_from_read_buf(struct tty_struct *tty,
 	n_tty_trace("%s: eol:%zu found:%d n:%zu c:%zu tail:%zu more:%zu\n",
 		    __func__, eol, found, n, c, tail, more);
 
-	ret = tty_copy_to_user(tty, *b, tail, n);
-	if (ret)
-		return -EFAULT;
-	*b += n;
+	tty_copy(tty, *kbp, tail, n);
+	*kbp += n;
 	*nr -= n;
 
 	if (found)
@@ -2132,10 +2125,10 @@ static int job_control(struct tty_struct *tty, struct file *file)
  */
 
 static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
-			 unsigned char __user *buf, size_t nr)
+			  int dummy, unsigned char *kbuf, size_t nr)
 {
 	struct n_tty_data *ldata = tty->disc_data;
-	unsigned char __user *b = buf;
+	unsigned char *kb = kbuf;
 	DEFINE_WAIT_FUNC(wait, woken_wake_function);
 	int c;
 	int minimum, time;
@@ -2181,17 +2174,13 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 		/* First test for status change. */
 		if (packet && tty->link->ctrl_status) {
 			unsigned char cs;
-			if (b != buf)
+			if (kb != kbuf)
 				break;
 			spin_lock_irq(&tty->link->ctrl_lock);
 			cs = tty->link->ctrl_status;
 			tty->link->ctrl_status = 0;
 			spin_unlock_irq(&tty->link->ctrl_lock);
-			if (put_user(cs, b)) {
-				retval = -EFAULT;
-				break;
-			}
-			b++;
+			*kb++ = cs;
 			nr--;
 			break;
 		}
@@ -2234,24 +2223,20 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 		}
 
 		if (ldata->icanon && !L_EXTPROC(tty)) {
-			retval = canon_copy_from_read_buf(tty, &b, &nr);
+			retval = canon_copy_from_read_buf(tty, &kb, &nr);
 			if (retval)
 				break;
 		} else {
 			int uncopied;
 
 			/* Deal with packet mode. */
-			if (packet && b == buf) {
-				if (put_user(TIOCPKT_DATA, b)) {
-					retval = -EFAULT;
-					break;
-				}
-				b++;
+			if (packet && kb == kbuf) {
+				*kb++ = TIOCPKT_DATA;
 				nr--;
 			}
 
-			uncopied = copy_from_read_buf(tty, &b, &nr);
-			uncopied += copy_from_read_buf(tty, &b, &nr);
+			uncopied = copy_from_read_buf(tty, &kb, &nr);
+			uncopied += copy_from_read_buf(tty, &kb, &nr);
 			if (uncopied) {
 				retval = -EFAULT;
 				break;
@@ -2260,7 +2245,7 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 
 		n_tty_check_unthrottle(tty);
 
-		if (b - buf >= minimum)
+		if (kb - kbuf >= minimum)
 			break;
 		if (time)
 			timeout = time;
@@ -2272,8 +2257,8 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 	remove_wait_queue(&tty->read_wait, &wait);
 	mutex_unlock(&ldata->atomic_read_lock);
 
-	if (b - buf)
-		retval = b - buf;
+	if (kb - kbuf)
+		retval = kb - kbuf;
 
 	return retval;
 }
diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 8034489337d7..d2d558046420 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -864,10 +864,15 @@ static ssize_t tty_read(struct file *file, char __user *buf, size_t count,
 	ld = tty_ldisc_ref_wait(tty);
 	if (!ld)
 		return hung_up_tty_read(file, buf, count, ppos);
-	if (ld->ops->read)
-		i = ld->ops->read(tty, file, buf, count);
-	else
-		i = -EIO;
+	i = -EIO;
+	if (ld->ops->read) {
+		char kernel_buf[32];
+		if (count > sizeof(kernel_buf))
+			count = sizeof(kernel_buf);
+		i = ld->ops->read(tty, file, 0, kernel_buf, count);
+		if (i > 0 && copy_to_user(buf, kernel_buf, i))
+			i = -EFAULT;
+	}
 	tty_ldisc_deref(ld);
 
 	if (i > 0)
diff --git a/include/linux/tty_ldisc.h b/include/linux/tty_ldisc.h
index b1e6043e9917..516d946e57b6 100644
--- a/include/linux/tty_ldisc.h
+++ b/include/linux/tty_ldisc.h
@@ -185,7 +185,7 @@ struct tty_ldisc_ops {
 	void	(*close)(struct tty_struct *);
 	void	(*flush_buffer)(struct tty_struct *tty);
 	ssize_t	(*read)(struct tty_struct *tty, struct file *file,
-			unsigned char __user *buf, size_t nr);
+			int dummy, unsigned char *buf, size_t nr);
 	ssize_t	(*write)(struct tty_struct *tty, struct file *file,
 			 const unsigned char *buf, size_t nr);
 	int	(*ioctl)(struct tty_struct *tty, struct file *file,
diff --git a/net/nfc/nci/uart.c b/net/nfc/nci/uart.c
index 11b554ce07ff..c4d457ba1668 100644
--- a/net/nfc/nci/uart.c
+++ b/net/nfc/nci/uart.c
@@ -292,7 +292,7 @@ static int nci_uart_tty_ioctl(struct tty_struct *tty, struct file *file,
 
 /* We don't provide read/write/poll interface for user space. */
 static ssize_t nci_uart_tty_read(struct tty_struct *tty, struct file *file,
-				 unsigned char __user *buf, size_t nr)
+				 int dummy, unsigned char *buf, size_t nr)
 {
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-18 22:20             ` Linus Torvalds
@ 2021-01-19  1:38               ` Linus Torvalds
  2021-01-19 11:53                 ` Greg Kroah-Hartman
  0 siblings, 1 reply; 65+ messages in thread
From: Linus Torvalds @ 2021-01-19  1:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman,
	Al Viro, Jiri Slaby

[-- Attachment #1: Type: text/plain, Size: 2844 bytes --]

On Mon, Jan 18, 2021 at 2:20 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So it's not a "real" patch, but with improved buffer handling in
> tty_read(), I think this is actually quite close.

Hmm.

I somehow ended up working on this all because it's a Monday, and I
don't see a lot of pull requests early in the week.

And I think I have a solution for the HDLC "we may need to copy a
packet that might be up to 64kB" issue, that isn't really all that
ugly.

We can just iterate over a random "cookie" that the line discipline
can use any way it wants to. In the case of n_hdlc, it can just put
the 'rbuf' thing it has into that cookie, and then it can copy it all
piece-meal until it is all used up. And if it runs out of space in the
middle, it will return -EOVERFLOW, and we're all good.

The only other thing such a line discipline needs is the offset into
the cookie, but the iterator has to maintain that anyway, so that's
simple enough.

So here's a fourth patch for this thing today, this time with what I
think is actually a working model for the buffer handling.

Other line disciplines *could* use the cookie if they want to. I
didn't do any of that, though.

The normal n_tty line discipline, for example, could easily just loop
over the data. It doesn't need an offset or that 'rbuf' pointer, but
it still needs to know whether the call is the first one or not,
because the first time the n_tty line discipline is called it may need
to wait for a minimum number of characters or whatever the termios
settings say - but obviously once it has waited for it once, it
shouldn't wait for it again the next time around (only on the next
actual full read()). IOW, it would be wrong if the termios said "wait
for 5 characters", and then it saw 68 characters, copied the first 64,
in the first iteration, and than saw "oh, now there are only 4
characters left so now I have to wait for a fifth".

So n_tty could use the cookie purely to see whether it's the first
iteration or not, and it could just set the cookie to a random value
(it always starts out as NULL) to just show what state it is in.

I did *NOT* do that, because it's not technically necessary - unlike
the hdlc packet case, n_tty returning a partial result is not wrong
per se even if we might have reasons to improve on it later.

What do people think about this?

Also, does anybody have any test-code for the HDLC case? I did find an
interesting comment when doing a Debian code search:

  /* Bloody hell... readv doesn't work with N_HDLC line discipline... GRR! */

and yes, this model would allow us to handle readv() properly for hdlc
(and no, the old one did not, because it really wanted to see the
whole packet in *one* user buffer).

But I have no idea if hdlc is even relevant any more, and if anybody
really cares.

Anybody?

              Linus

[-- Attachment #2: 0001-tty-convert-tty_ldisc_ops-read-function-to-take-a-ke.patch --]
[-- Type: text/x-patch, Size: 21264 bytes --]

From 526258e18d13ebb113638d26eba18ca05a51dc2a Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon, 18 Jan 2021 13:31:30 -0800
Subject: [PATCH] tty: convert tty_ldisc_ops 'read()' function to take a kernel
 pointer

The tty line discipline .read() function was passed the final user
pointer destination as an argument, which doesn't match the 'write()'
function, and makes it very inconvenient to do a splice method for
tty's.

This is a conversion to use a kernel buffer instead.

NOTE! It does this by passing the tty line discipline ->read() function
an additional "cookie" to fill in, and an offset into the cookie data.

The line discipline can fill in the cookie data with its own private
information, and then the reader will repeat the read until either the
cookie is cleared or it runs out of data.

The only real user of this is N_HDLC, which can use this to handle big
packets, even if the kernel buffer is smaller than the whole packet.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 drivers/bluetooth/hci_ldisc.c | 34 +++++++--------
 drivers/input/serio/serport.c |  4 +-
 drivers/net/ppp/ppp_async.c   |  3 +-
 drivers/net/ppp/ppp_synctty.c |  3 +-
 drivers/tty/n_gsm.c           |  3 +-
 drivers/tty/n_hdlc.c          | 60 +++++++++++++++++--------
 drivers/tty/n_null.c          |  3 +-
 drivers/tty/n_r3964.c         | 10 ++---
 drivers/tty/n_tracerouter.c   |  4 +-
 drivers/tty/n_tracesink.c     |  4 +-
 drivers/tty/n_tty.c           | 82 +++++++++++++++--------------------
 drivers/tty/tty_io.c          | 64 +++++++++++++++++++++++++--
 include/linux/tty_ldisc.h     |  3 +-
 net/nfc/nci/uart.c            |  3 +-
 14 files changed, 178 insertions(+), 102 deletions(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index f83d67eafc9f..dd92aea15b8b 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -802,7 +802,8 @@ static int hci_uart_tty_ioctl(struct tty_struct *tty, struct file *file,
  * We don't provide read/write/poll interface for user space.
  */
 static ssize_t hci_uart_tty_read(struct tty_struct *tty, struct file *file,
-				 unsigned char __user *buf, size_t nr)
+				 unsigned char *buf, size_t nr,
+				 void **cookie, unsigned long offset)
 {
 	return 0;
 }
@@ -819,29 +820,28 @@ static __poll_t hci_uart_tty_poll(struct tty_struct *tty,
 	return 0;
 }
 
+static struct tty_ldisc_ops hci_uart_ldisc = {
+	.owner		= THIS_MODULE,
+	.magic		= TTY_LDISC_MAGIC,
+	.name		= "n_hci",
+	.open		= hci_uart_tty_open,
+	.close		= hci_uart_tty_close,
+	.read		= hci_uart_tty_read,
+	.write		= hci_uart_tty_write,
+	.ioctl		= hci_uart_tty_ioctl,
+	.compat_ioctl	= hci_uart_tty_ioctl,
+	.poll		= hci_uart_tty_poll,
+	.receive_buf	= hci_uart_tty_receive,
+	.write_wakeup	= hci_uart_tty_wakeup,
+};
+
 static int __init hci_uart_init(void)
 {
-	static struct tty_ldisc_ops hci_uart_ldisc;
 	int err;
 
 	BT_INFO("HCI UART driver ver %s", VERSION);
 
 	/* Register the tty discipline */
-
-	memset(&hci_uart_ldisc, 0, sizeof(hci_uart_ldisc));
-	hci_uart_ldisc.magic		= TTY_LDISC_MAGIC;
-	hci_uart_ldisc.name		= "n_hci";
-	hci_uart_ldisc.open		= hci_uart_tty_open;
-	hci_uart_ldisc.close		= hci_uart_tty_close;
-	hci_uart_ldisc.read		= hci_uart_tty_read;
-	hci_uart_ldisc.write		= hci_uart_tty_write;
-	hci_uart_ldisc.ioctl		= hci_uart_tty_ioctl;
-	hci_uart_ldisc.compat_ioctl	= hci_uart_tty_ioctl;
-	hci_uart_ldisc.poll		= hci_uart_tty_poll;
-	hci_uart_ldisc.receive_buf	= hci_uart_tty_receive;
-	hci_uart_ldisc.write_wakeup	= hci_uart_tty_wakeup;
-	hci_uart_ldisc.owner		= THIS_MODULE;
-
 	err = tty_register_ldisc(N_HCI, &hci_uart_ldisc);
 	if (err) {
 		BT_ERR("HCI line discipline registration failed. (%d)", err);
diff --git a/drivers/input/serio/serport.c b/drivers/input/serio/serport.c
index 8ac970a423de..33e9d9bfd036 100644
--- a/drivers/input/serio/serport.c
+++ b/drivers/input/serio/serport.c
@@ -156,7 +156,9 @@ static void serport_ldisc_receive(struct tty_struct *tty, const unsigned char *c
  * returning 0 characters.
  */
 
-static ssize_t serport_ldisc_read(struct tty_struct * tty, struct file * file, unsigned char __user * buf, size_t nr)
+static ssize_t serport_ldisc_read(struct tty_struct * tty, struct file * file,
+				  unsigned char *kbuf, size_t nr,
+				  void **cookie, unsigned long offset)
 {
 	struct serport *serport = (struct serport*) tty->disc_data;
 	struct serio *serio;
diff --git a/drivers/net/ppp/ppp_async.c b/drivers/net/ppp/ppp_async.c
index 29a0917a81e6..f14a9d190de9 100644
--- a/drivers/net/ppp/ppp_async.c
+++ b/drivers/net/ppp/ppp_async.c
@@ -259,7 +259,8 @@ static int ppp_asynctty_hangup(struct tty_struct *tty)
  */
 static ssize_t
 ppp_asynctty_read(struct tty_struct *tty, struct file *file,
-		  unsigned char __user *buf, size_t count)
+		  unsigned char *buf, size_t count,
+		  void **cookie, unsigned long offset)
 {
 	return -EAGAIN;
 }
diff --git a/drivers/net/ppp/ppp_synctty.c b/drivers/net/ppp/ppp_synctty.c
index 0f338752c38b..f774b7e52da4 100644
--- a/drivers/net/ppp/ppp_synctty.c
+++ b/drivers/net/ppp/ppp_synctty.c
@@ -257,7 +257,8 @@ static int ppp_sync_hangup(struct tty_struct *tty)
  */
 static ssize_t
 ppp_sync_read(struct tty_struct *tty, struct file *file,
-	       unsigned char __user *buf, size_t count)
+	      unsigned char *buf, size_t count,
+	      void **cookie, unsigned long offset)
 {
 	return -EAGAIN;
 }
diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
index c676fa89ee0b..51dafc06f541 100644
--- a/drivers/tty/n_gsm.c
+++ b/drivers/tty/n_gsm.c
@@ -2559,7 +2559,8 @@ static void gsmld_write_wakeup(struct tty_struct *tty)
  */
 
 static ssize_t gsmld_read(struct tty_struct *tty, struct file *file,
-			 unsigned char __user *buf, size_t nr)
+			  unsigned char *buf, size_t nr,
+			  void **cookie, unsigned long offset)
 {
 	return -EOPNOTSUPP;
 }
diff --git a/drivers/tty/n_hdlc.c b/drivers/tty/n_hdlc.c
index 12557ee1edb6..1363e659dc1d 100644
--- a/drivers/tty/n_hdlc.c
+++ b/drivers/tty/n_hdlc.c
@@ -416,13 +416,19 @@ static void n_hdlc_tty_receive(struct tty_struct *tty, const __u8 *data,
  * Returns the number of bytes returned or error code.
  */
 static ssize_t n_hdlc_tty_read(struct tty_struct *tty, struct file *file,
-			   __u8 __user *buf, size_t nr)
+			   __u8 *kbuf, size_t nr,
+			   void **cookie, unsigned long offset)
 {
 	struct n_hdlc *n_hdlc = tty->disc_data;
 	int ret = 0;
 	struct n_hdlc_buf *rbuf;
 	DECLARE_WAITQUEUE(wait, current);
 
+	/* Is this a repeated call for an rbuf we already found earlier? */
+	rbuf = *cookie;
+	if (rbuf)
+		goto have_rbuf;
+
 	add_wait_queue(&tty->read_wait, &wait);
 
 	for (;;) {
@@ -436,25 +442,8 @@ static ssize_t n_hdlc_tty_read(struct tty_struct *tty, struct file *file,
 		set_current_state(TASK_INTERRUPTIBLE);
 
 		rbuf = n_hdlc_buf_get(&n_hdlc->rx_buf_list);
-		if (rbuf) {
-			if (rbuf->count > nr) {
-				/* too large for caller's buffer */
-				ret = -EOVERFLOW;
-			} else {
-				__set_current_state(TASK_RUNNING);
-				if (copy_to_user(buf, rbuf->buf, rbuf->count))
-					ret = -EFAULT;
-				else
-					ret = rbuf->count;
-			}
-
-			if (n_hdlc->rx_free_buf_list.count >
-			    DEFAULT_RX_BUF_COUNT)
-				kfree(rbuf);
-			else
-				n_hdlc_buf_put(&n_hdlc->rx_free_buf_list, rbuf);
+		if (rbuf)
 			break;
-		}
 
 		/* no data */
 		if (tty_io_nonblock(tty, file)) {
@@ -473,6 +462,39 @@ static ssize_t n_hdlc_tty_read(struct tty_struct *tty, struct file *file,
 	remove_wait_queue(&tty->read_wait, &wait);
 	__set_current_state(TASK_RUNNING);
 
+	if (!rbuf)
+		return ret;
+	*cookie = rbuf;
+
+have_rbuf:
+	/* Have we used it up entirely? */
+	if (offset >= rbuf->count)
+		goto done_with_rbuf;
+
+	/* More data to go, but can't copy any more? EOVERFLOW */
+	ret = -EOVERFLOW;
+	if (!nr)
+		goto done_with_rbuf;
+
+	/* Copy as much data as possible */
+	ret = rbuf->count - offset;
+	if (ret > nr)
+		ret = nr;
+	memcpy(kbuf, rbuf->buf+offset, ret);
+	offset += ret;
+
+	/* If we still have data left, we leave the rbuf in the cookie */
+	if (offset < rbuf->count)
+		return ret;
+
+done_with_rbuf:
+	*cookie = NULL;
+
+	if (n_hdlc->rx_free_buf_list.count > DEFAULT_RX_BUF_COUNT)
+		kfree(rbuf);
+	else
+		n_hdlc_buf_put(&n_hdlc->rx_free_buf_list, rbuf);
+
 	return ret;
 
 }	/* end of n_hdlc_tty_read() */
diff --git a/drivers/tty/n_null.c b/drivers/tty/n_null.c
index 96feabae4740..ce03ae78f5c6 100644
--- a/drivers/tty/n_null.c
+++ b/drivers/tty/n_null.c
@@ -20,7 +20,8 @@ static void n_null_close(struct tty_struct *tty)
 }
 
 static ssize_t n_null_read(struct tty_struct *tty, struct file *file,
-			   unsigned char __user * buf, size_t nr)
+			   unsigned char *buf, size_t nr,
+			   void **cookie, unsigned long offset)
 {
 	return -EOPNOTSUPP;
 }
diff --git a/drivers/tty/n_r3964.c b/drivers/tty/n_r3964.c
index 934dd2fb2ec8..3161f0a535e3 100644
--- a/drivers/tty/n_r3964.c
+++ b/drivers/tty/n_r3964.c
@@ -129,7 +129,7 @@ static void remove_client_block(struct r3964_info *pInfo,
 static int r3964_open(struct tty_struct *tty);
 static void r3964_close(struct tty_struct *tty);
 static ssize_t r3964_read(struct tty_struct *tty, struct file *file,
-		unsigned char __user * buf, size_t nr);
+		void *cookie, unsigned char *buf, size_t nr);
 static ssize_t r3964_write(struct tty_struct *tty, struct file *file,
 		const unsigned char *buf, size_t nr);
 static int r3964_ioctl(struct tty_struct *tty, struct file *file,
@@ -1058,7 +1058,8 @@ static void r3964_close(struct tty_struct *tty)
 }
 
 static ssize_t r3964_read(struct tty_struct *tty, struct file *file,
-			  unsigned char __user * buf, size_t nr)
+			  unsigned char *kbuf, size_t nr,
+			  void **cookie, unsigned long offset)
 {
 	struct r3964_info *pInfo = tty->disc_data;
 	struct r3964_client_info *pClient;
@@ -1109,10 +1110,7 @@ static ssize_t r3964_read(struct tty_struct *tty, struct file *file,
 		kfree(pMsg);
 		TRACE_M("r3964_read - msg kfree %p", pMsg);
 
-		if (copy_to_user(buf, &theMsg, ret)) {
-			ret = -EFAULT;
-			goto unlock;
-		}
+		memcpy(kbuf, &theMsg, ret);
 
 		TRACE_PS("read - return %d", ret);
 		goto unlock;
diff --git a/drivers/tty/n_tracerouter.c b/drivers/tty/n_tracerouter.c
index 4479af4d2fa5..3490ed51b1a3 100644
--- a/drivers/tty/n_tracerouter.c
+++ b/drivers/tty/n_tracerouter.c
@@ -118,7 +118,9 @@ static void n_tracerouter_close(struct tty_struct *tty)
  *	 -EINVAL
  */
 static ssize_t n_tracerouter_read(struct tty_struct *tty, struct file *file,
-				  unsigned char __user *buf, size_t nr) {
+				  unsigned char *buf, size_t nr,
+				  void **cookie, unsigned long offset)
+{
 	return -EINVAL;
 }
 
diff --git a/drivers/tty/n_tracesink.c b/drivers/tty/n_tracesink.c
index d96ba82cc356..1d9931041fd8 100644
--- a/drivers/tty/n_tracesink.c
+++ b/drivers/tty/n_tracesink.c
@@ -115,7 +115,9 @@ static void n_tracesink_close(struct tty_struct *tty)
  *	 -EINVAL
  */
 static ssize_t n_tracesink_read(struct tty_struct *tty, struct file *file,
-				unsigned char __user *buf, size_t nr) {
+				unsigned char *buf, size_t nr,
+				void **cookie, unsigned long offset)
+{
 	return -EINVAL;
 }
 
diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index 319d68c8a5df..2f2f57a53968 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -164,29 +164,24 @@ static void zero_buffer(struct tty_struct *tty, u8 *buffer, int size)
 		memset(buffer, 0x00, size);
 }
 
-static int tty_copy_to_user(struct tty_struct *tty, void __user *to,
-			    size_t tail, size_t n)
+static void tty_copy(struct tty_struct *tty, void *to, size_t tail, size_t n)
 {
 	struct n_tty_data *ldata = tty->disc_data;
 	size_t size = N_TTY_BUF_SIZE - tail;
 	void *from = read_buf_addr(ldata, tail);
-	int uncopied;
 
 	if (n > size) {
 		tty_audit_add_data(tty, from, size);
-		uncopied = copy_to_user(to, from, size);
-		zero_buffer(tty, from, size - uncopied);
-		if (uncopied)
-			return uncopied;
+		memcpy(to, from, size);
+		zero_buffer(tty, from, size);
 		to += size;
 		n -= size;
 		from = ldata->read_buf;
 	}
 
 	tty_audit_add_data(tty, from, n);
-	uncopied = copy_to_user(to, from, n);
-	zero_buffer(tty, from, n - uncopied);
-	return uncopied;
+	memcpy(to, from, n);
+	zero_buffer(tty, from, n);
 }
 
 /**
@@ -1944,15 +1939,16 @@ static inline int input_available_p(struct tty_struct *tty, int poll)
 /**
  *	copy_from_read_buf	-	copy read data directly
  *	@tty: terminal device
- *	@b: user data
+ *	@kbp: data
  *	@nr: size of data
  *
  *	Helper function to speed up n_tty_read.  It is only called when
- *	ICANON is off; it copies characters straight from the tty queue to
- *	user space directly.  It can be profitably called twice; once to
- *	drain the space from the tail pointer to the (physical) end of the
- *	buffer, and once to drain the space from the (physical) beginning of
- *	the buffer to head pointer.
+ *	ICANON is off; it copies characters straight from the tty queue.
+ *
+ *	It can be profitably called twice; once to drain the space from
+ *	the tail pointer to the (physical) end of the buffer, and once
+ *	to drain the space from the (physical) beginning of the buffer
+ *	to head pointer.
  *
  *	Called under the ldata->atomic_read_lock sem
  *
@@ -1962,7 +1958,7 @@ static inline int input_available_p(struct tty_struct *tty, int poll)
  */
 
 static int copy_from_read_buf(struct tty_struct *tty,
-				      unsigned char __user **b,
+				      unsigned char **kbp,
 				      size_t *nr)
 
 {
@@ -1978,8 +1974,7 @@ static int copy_from_read_buf(struct tty_struct *tty,
 	n = min(*nr, n);
 	if (n) {
 		unsigned char *from = read_buf_addr(ldata, tail);
-		retval = copy_to_user(*b, from, n);
-		n -= retval;
+		memcpy(*kbp, from, n);
 		is_eof = n == 1 && *from == EOF_CHAR(tty);
 		tty_audit_add_data(tty, from, n);
 		zero_buffer(tty, from, n);
@@ -1988,7 +1983,7 @@ static int copy_from_read_buf(struct tty_struct *tty,
 		if (L_EXTPROC(tty) && ldata->icanon && is_eof &&
 		    (head == ldata->read_tail))
 			n = 0;
-		*b += n;
+		*kbp += n;
 		*nr -= n;
 	}
 	return retval;
@@ -1997,12 +1992,12 @@ static int copy_from_read_buf(struct tty_struct *tty,
 /**
  *	canon_copy_from_read_buf	-	copy read data in canonical mode
  *	@tty: terminal device
- *	@b: user data
+ *	@kbp: data
  *	@nr: size of data
  *
  *	Helper function for n_tty_read.  It is only called when ICANON is on;
  *	it copies one line of input up to and including the line-delimiting
- *	character into the user-space buffer.
+ *	character into the result buffer.
  *
  *	NB: When termios is changed from non-canonical to canonical mode and
  *	the read buffer contains data, n_tty_set_termios() simulates an EOF
@@ -2018,14 +2013,14 @@ static int copy_from_read_buf(struct tty_struct *tty,
  */
 
 static int canon_copy_from_read_buf(struct tty_struct *tty,
-				    unsigned char __user **b,
+				    unsigned char **kbp,
 				    size_t *nr)
 {
 	struct n_tty_data *ldata = tty->disc_data;
 	size_t n, size, more, c;
 	size_t eol;
 	size_t tail;
-	int ret, found = 0;
+	int found = 0;
 
 	/* N.B. avoid overrun if nr == 0 */
 	if (!*nr)
@@ -2061,10 +2056,8 @@ static int canon_copy_from_read_buf(struct tty_struct *tty,
 	n_tty_trace("%s: eol:%zu found:%d n:%zu c:%zu tail:%zu more:%zu\n",
 		    __func__, eol, found, n, c, tail, more);
 
-	ret = tty_copy_to_user(tty, *b, tail, n);
-	if (ret)
-		return -EFAULT;
-	*b += n;
+	tty_copy(tty, *kbp, tail, n);
+	*kbp += n;
 	*nr -= n;
 
 	if (found)
@@ -2132,10 +2125,11 @@ static int job_control(struct tty_struct *tty, struct file *file)
  */
 
 static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
-			 unsigned char __user *buf, size_t nr)
+			  unsigned char *kbuf, size_t nr,
+			  void **cookie, unsigned long offset)
 {
 	struct n_tty_data *ldata = tty->disc_data;
-	unsigned char __user *b = buf;
+	unsigned char *kb = kbuf;
 	DEFINE_WAIT_FUNC(wait, woken_wake_function);
 	int c;
 	int minimum, time;
@@ -2181,17 +2175,13 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 		/* First test for status change. */
 		if (packet && tty->link->ctrl_status) {
 			unsigned char cs;
-			if (b != buf)
+			if (kb != kbuf)
 				break;
 			spin_lock_irq(&tty->link->ctrl_lock);
 			cs = tty->link->ctrl_status;
 			tty->link->ctrl_status = 0;
 			spin_unlock_irq(&tty->link->ctrl_lock);
-			if (put_user(cs, b)) {
-				retval = -EFAULT;
-				break;
-			}
-			b++;
+			*kb++ = cs;
 			nr--;
 			break;
 		}
@@ -2234,24 +2224,20 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 		}
 
 		if (ldata->icanon && !L_EXTPROC(tty)) {
-			retval = canon_copy_from_read_buf(tty, &b, &nr);
+			retval = canon_copy_from_read_buf(tty, &kb, &nr);
 			if (retval)
 				break;
 		} else {
 			int uncopied;
 
 			/* Deal with packet mode. */
-			if (packet && b == buf) {
-				if (put_user(TIOCPKT_DATA, b)) {
-					retval = -EFAULT;
-					break;
-				}
-				b++;
+			if (packet && kb == kbuf) {
+				*kb++ = TIOCPKT_DATA;
 				nr--;
 			}
 
-			uncopied = copy_from_read_buf(tty, &b, &nr);
-			uncopied += copy_from_read_buf(tty, &b, &nr);
+			uncopied = copy_from_read_buf(tty, &kb, &nr);
+			uncopied += copy_from_read_buf(tty, &kb, &nr);
 			if (uncopied) {
 				retval = -EFAULT;
 				break;
@@ -2260,7 +2246,7 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 
 		n_tty_check_unthrottle(tty);
 
-		if (b - buf >= minimum)
+		if (kb - kbuf >= minimum)
 			break;
 		if (time)
 			timeout = time;
@@ -2272,8 +2258,8 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 	remove_wait_queue(&tty->read_wait, &wait);
 	mutex_unlock(&ldata->atomic_read_lock);
 
-	if (b - buf)
-		retval = b - buf;
+	if (kb - kbuf)
+		retval = kb - kbuf;
 
 	return retval;
 }
diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 8034489337d7..821657e8a88f 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -831,6 +831,65 @@ static void tty_update_time(struct timespec64 *time)
 		time->tv_sec = sec;
 }
 
+/*
+ * Iterate on the ldisc ->read() function until we've gotten all
+ * the data the ldisc has for us.
+ *
+ * The "cookie" is something that the ldisc read function can fill
+ * in to let us know that there is more data to be had.
+ *
+ * We promise to continue to call the ldisc until it stops returning
+ * data or clears the cookie. The cookie may be something that the
+ * ldisc maintains state for and needs to free.
+ */
+static int iterate_tty_read(struct tty_ldisc *ld, struct tty_struct *tty, struct file *file,
+		char __user *buf, size_t count)
+{
+	int retval = 0;
+	void *cookie = NULL;
+	unsigned long offset = 0;
+	char kernel_buf[64];
+
+	do {
+		int size, uncopied;
+
+		size = count > sizeof(kernel_buf) ? sizeof(kernel_buf) : count;
+		size = ld->ops->read(tty, file, kernel_buf, size, &cookie, offset);
+		if (!size)
+			break;
+
+		/*
+		 * A ldisc read error return will override any previously copied
+		 * data (eg -EOVERFLOW from HDLC)
+		 */
+		if (size < 0) {
+			memzero_explicit(kernel_buf, sizeof(kernel_buf));
+			return size;
+		}
+
+		uncopied = copy_to_user(buf+offset, kernel_buf, size);
+		size -= uncopied;
+		offset += size;
+		count -= size;
+
+		/*
+		 * If the user copy failed, we still need to do another ->read()
+		 * call if we had a cookie to let the ldisc clear up.
+		 *
+		 * But make sure size is zeroed.
+		 */
+		if (unlikely(uncopied)) {
+			count = 0;
+			retval = -EFAULT;
+		}
+	} while (cookie);
+
+	/* We always clear tty buffer in case they contained passwords */
+	memzero_explicit(kernel_buf, sizeof(kernel_buf));
+	return offset ? offset : retval;
+}
+
+
 /**
  *	tty_read	-	read method for tty device files
  *	@file: pointer to tty file
@@ -864,10 +923,9 @@ static ssize_t tty_read(struct file *file, char __user *buf, size_t count,
 	ld = tty_ldisc_ref_wait(tty);
 	if (!ld)
 		return hung_up_tty_read(file, buf, count, ppos);
+	i = -EIO;
 	if (ld->ops->read)
-		i = ld->ops->read(tty, file, buf, count);
-	else
-		i = -EIO;
+		i = iterate_tty_read(ld, tty, file, buf, count);
 	tty_ldisc_deref(ld);
 
 	if (i > 0)
diff --git a/include/linux/tty_ldisc.h b/include/linux/tty_ldisc.h
index b1e6043e9917..572a07976116 100644
--- a/include/linux/tty_ldisc.h
+++ b/include/linux/tty_ldisc.h
@@ -185,7 +185,8 @@ struct tty_ldisc_ops {
 	void	(*close)(struct tty_struct *);
 	void	(*flush_buffer)(struct tty_struct *tty);
 	ssize_t	(*read)(struct tty_struct *tty, struct file *file,
-			unsigned char __user *buf, size_t nr);
+			unsigned char *buf, size_t nr,
+			void **cookie, unsigned long offset);
 	ssize_t	(*write)(struct tty_struct *tty, struct file *file,
 			 const unsigned char *buf, size_t nr);
 	int	(*ioctl)(struct tty_struct *tty, struct file *file,
diff --git a/net/nfc/nci/uart.c b/net/nfc/nci/uart.c
index 11b554ce07ff..1204c438e87d 100644
--- a/net/nfc/nci/uart.c
+++ b/net/nfc/nci/uart.c
@@ -292,7 +292,8 @@ static int nci_uart_tty_ioctl(struct tty_struct *tty, struct file *file,
 
 /* We don't provide read/write/poll interface for user space. */
 static ssize_t nci_uart_tty_read(struct tty_struct *tty, struct file *file,
-				 unsigned char __user *buf, size_t nr)
+				 unsigned char *buf, size_t nr,
+				 void **cookie, unsigned long offset)
 {
 	return 0;
 }
-- 
2.29.2.157.g1d47791a39


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-18 21:35       ` Linus Torvalds
  2021-01-18 21:54         ` Linus Torvalds
@ 2021-01-19 11:52         ` Greg Kroah-Hartman
  1 sibling, 0 replies; 65+ messages in thread
From: Greg Kroah-Hartman @ 2021-01-19 11:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List, Al Viro

On Mon, Jan 18, 2021 at 01:35:15PM -0800, Linus Torvalds wrote:
> On Mon, Jan 18, 2021 at 12:24 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Anybody want to play with this? I'd suggest keeping that "dummy"
> > parameter around for a while - I did an allmodconfig build with this,
> > but if there are any architecture-specific non-x86-64 cases I wouldn't
> > have seen them.
> 
> Not architecture-specific, but I did find by some grepping that I
> missed one line discipline driver: the Siemens R3964.
> 
> The reason I missed that is because it's been marked BROKEN in the
> Kconfig for almost two years, so it didn't show up in my allmodconfig
> coverage.

I need to just delete that thing now, obviously no one uses it anymore,
sorry for it getting in the way...

greg k-h

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-19  1:38               ` Linus Torvalds
@ 2021-01-19 11:53                 ` Greg Kroah-Hartman
  2021-01-19 16:56                   ` Robert Karszniewicz
                                     ` (2 more replies)
  0 siblings, 3 replies; 65+ messages in thread
From: Greg Kroah-Hartman @ 2021-01-19 11:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List,
	Al Viro, Jiri Slaby

On Mon, Jan 18, 2021 at 05:38:55PM -0800, Linus Torvalds wrote:
> On Mon, Jan 18, 2021 at 2:20 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > So it's not a "real" patch, but with improved buffer handling in
> > tty_read(), I think this is actually quite close.
> 
> Hmm.
> 
> I somehow ended up working on this all because it's a Monday, and I
> don't see a lot of pull requests early in the week.
> 
> And I think I have a solution for the HDLC "we may need to copy a
> packet that might be up to 64kB" issue, that isn't really all that
> ugly.
> 
> We can just iterate over a random "cookie" that the line discipline
> can use any way it wants to. In the case of n_hdlc, it can just put
> the 'rbuf' thing it has into that cookie, and then it can copy it all
> piece-meal until it is all used up. And if it runs out of space in the
> middle, it will return -EOVERFLOW, and we're all good.
> 
> The only other thing such a line discipline needs is the offset into
> the cookie, but the iterator has to maintain that anyway, so that's
> simple enough.
> 
> So here's a fourth patch for this thing today, this time with what I
> think is actually a working model for the buffer handling.
> 
> Other line disciplines *could* use the cookie if they want to. I
> didn't do any of that, though.
> 
> The normal n_tty line discipline, for example, could easily just loop
> over the data. It doesn't need an offset or that 'rbuf' pointer, but
> it still needs to know whether the call is the first one or not,
> because the first time the n_tty line discipline is called it may need
> to wait for a minimum number of characters or whatever the termios
> settings say - but obviously once it has waited for it once, it
> shouldn't wait for it again the next time around (only on the next
> actual full read()). IOW, it would be wrong if the termios said "wait
> for 5 characters", and then it saw 68 characters, copied the first 64,
> in the first iteration, and than saw "oh, now there are only 4
> characters left so now I have to wait for a fifth".
> 
> So n_tty could use the cookie purely to see whether it's the first
> iteration or not, and it could just set the cookie to a random value
> (it always starts out as NULL) to just show what state it is in.
> 
> I did *NOT* do that, because it's not technically necessary - unlike
> the hdlc packet case, n_tty returning a partial result is not wrong
> per se even if we might have reasons to improve on it later.
> 
> What do people think about this?
> 
> Also, does anybody have any test-code for the HDLC case? I did find an
> interesting comment when doing a Debian code search:
> 
>   /* Bloody hell... readv doesn't work with N_HDLC line discipline... GRR! */
> 
> and yes, this model would allow us to handle readv() properly for hdlc
> (and no, the old one did not, because it really wanted to see the
> whole packet in *one* user buffer).
> 
> But I have no idea if hdlc is even relevant any more, and if anybody
> really cares.
> 
> Anybody?


This looks sane, but I'm still missing what the goal of this is here.
It's nice from a "don't make the ldisc do the userspace copy", point of
view, but what is the next step in order to tie that into splice?

I ask as I also have reports that sysfs binary files are now failing for
this same reason, so I need to make the same change for them and it's
not excatly obvious what to do:
	https://lore.kernel.org/r/1adf9aa4-ed7e-8f05-a354-57419d61ec18@codeaurora.org

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-19 11:53                 ` Greg Kroah-Hartman
@ 2021-01-19 16:56                   ` Robert Karszniewicz
  2021-01-19 17:10                     ` Robert Karszniewicz
  2021-01-19 22:09                     ` Oliver Giles
  2021-01-19 17:25                   ` Linus Torvalds
  2021-01-19 20:24                   ` Linus Torvalds
  2 siblings, 2 replies; 65+ messages in thread
From: Robert Karszniewicz @ 2021-01-19 16:56 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Linus Torvalds
  Cc: Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List,
	Al Viro, Jiri Slaby

On 1/19/21 12:53 PM, Greg Kroah-Hartman wrote:
> This looks sane, but I'm still missing what the goal of this is here.
> It's nice from a "don't make the ldisc do the userspace copy", point of
> view, but what is the next step in order to tie that into splice?
> 
> I ask as I also have reports that sysfs binary files are now failing for
> this same reason, so I need to make the same change for them and it's
> not excatly obvious what to do:
> 	https://lore.kernel.org/r/1adf9aa4-ed7e-8f05-a354-57419d61ec18@codeaurora.org

I would like to confirm this. We are using firmwared and it returns EINVAL on
sendfile(), too. I have tried setting the .splice_write callback as in the
linked thread, but it didn't help, it just EINVAL'd in a different place.

I have bisected this issue down to this commit:
4d03e3cc5982 ("fs: don't allow kernel reads and writes without iter ops")

Another case I've also noticed is writing to a serial connection:
kernel write not supported for file /ttymxc0 (pid: 252 comm: cat)

(Which still prints, though, because cat falls back to write(2), I suppose.)

Thank you,
Robert

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-19 16:56                   ` Robert Karszniewicz
@ 2021-01-19 17:10                     ` Robert Karszniewicz
  2021-01-19 22:09                     ` Oliver Giles
  1 sibling, 0 replies; 65+ messages in thread
From: Robert Karszniewicz @ 2021-01-19 17:10 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Linus Torvalds
  Cc: Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List,
	Al Viro, Jiri Slaby

On 1/19/21 5:56 PM, Robert Karszniewicz wrote:
> Another case I've also noticed is writing to a serial connection:
> kernel write not supported for file /ttymxc0 (pid: 252 comm: cat)

Which is actually a TTY and I just failed to see the forest for all the trees...

Cheers.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-19 11:53                 ` Greg Kroah-Hartman
  2021-01-19 16:56                   ` Robert Karszniewicz
@ 2021-01-19 17:25                   ` Linus Torvalds
  2021-01-19 20:24                   ` Linus Torvalds
  2 siblings, 0 replies; 65+ messages in thread
From: Linus Torvalds @ 2021-01-19 17:25 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List,
	Al Viro, Jiri Slaby

On Tue, Jan 19, 2021 at 3:54 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> This looks sane, but I'm still missing what the goal of this is here.
> It's nice from a "don't make the ldisc do the userspace copy", point of
> view, but what is the next step in order to tie that into splice?

I'll cook something up. With this, it should be fairly easy to add
both the splice and iov_iter versions, because now it only needs the
wrappers in tty_io.c, not for each line discipline.

I hope. Let's see..

           Linus

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-19 11:53                 ` Greg Kroah-Hartman
  2021-01-19 16:56                   ` Robert Karszniewicz
  2021-01-19 17:25                   ` Linus Torvalds
@ 2021-01-19 20:24                   ` Linus Torvalds
  2021-01-19 20:38                     ` Christoph Hellwig
  2021-01-20  1:25                     ` Oliver Giles
  2 siblings, 2 replies; 65+ messages in thread
From: Linus Torvalds @ 2021-01-19 20:24 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List,
	Al Viro, Jiri Slaby

[-- Attachment #1: Type: text/plain, Size: 3343 bytes --]

On Tue, Jan 19, 2021 at 3:54 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> This looks sane, but I'm still missing what the goal of this is here.
> It's nice from a "don't make the ldisc do the userspace copy", point of
> view, but what is the next step in order to tie that into splice?

Ok, so here's a series of four patches that make ttys possible sources
and destinations for splice() again.

Well, the first patch is just the pipe one for sendfile() - and it's
the hacky one-liner, not the proper one that Al will hopefully add.

NOTE! I've signed off on these, because I think they are fine for
testing - but they are really meant for testing ONLY.

I'm running a kernel with these in place, so they kind of work. And
yes, I verified that sendfile() now works with a pipe or tty target. I
didn't actually check splicing _from_ a tty, nor did I check that
readv/writev now works properly, but it all LoosGood(tm) to me.

HOWEVER.

The reason these are for testing only is that

 (a) my tests are pretty limited, and I'd like the actual people who
reported this to really test them out

 (b) the new read iterator model is going to be quite horribly slow
for big pty transfers because the n_tty ldisc isn't doing the cookie
batching

 (c) I really really want Al to take a look at that iov_iter_revert()
thing in do_tty_write() (in "[PATCH 2/4] tty: implement write_iter")

Note that I'm more than happy to do (b) as a patch on top of this, but
I'd like (a) and (c) to be clarified before I do that.

> I ask as I also have reports that sysfs binary files are now failing for
> this same reason, so I need to make the same change for them and it's
> not excatly obvious what to do:

Ok, so that would require those kernfs_fop_{read,write}() functions to
also be converted to read_iter/write_iter.

That doesn't look horrendous: it's not all that dissimilar from the
two patches to do that for tty's ("tty: implement {read,write}_iter").
The seq_file part already has a iter version for reading, and the main
change to kernfs_file_direct_read() and kernfs_fop_write() is to do
that

 (a) change the arguments from file/buf/count/ppos to kiocb/iov_iter

 (b) change the copy_to/from_user() calls to copy_to/from_iter()

Note that (b) involves changing the semantics of the return value:
"copy_to/from_user()" returns the number of bytes that were *NOT*
copied, while "copy_to/from_iter()" returns the number of bytes that
WERE copied.

So the error case check does from

        if (copy_to/from_user()) **ERROR**

to

        if (copy_to/from_iter(n) != n) **ERROR**

but that is fairly straightforward.

The two "tty: implement write/read_iter" patches (patch 2 and 4) can
be used as examples. That said, I want to again stress that they
haven't seen all that much testing, and I do want Al to spray his holy
penguin pee on that iov_iter_revert() thing in patch 2.

I'm honestly not that motivated on those sysfs files: the tty layer
was an interesting test-case that I wanted to look at just because the
conversion to kernel pointers was nontrivial for the read side.

But that sysfs binary file case really isn't interesting, and just
more of a "Christoph broke it, I think he should just fix it".
Christoph?

Anyway, anybody willing to test these tty/pipe patches on the loads
that failed? Oliver?

               Linus

[-- Attachment #2: 0001-pipe-allow-sendfile-destination-with-splice_write.patch --]
[-- Type: text/x-patch, Size: 1311 bytes --]

From 95713b6e8b2247c55dd0a04174a55ea9a7fde7f6 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Tue, 19 Jan 2021 09:26:15 -0800
Subject: [PATCH 1/4] pipe: allow sendfile() destination with splice_write

Note that Al Viro is 100% right that this isn't needed for regular
splicing (that treats pipes specially, since pipes _are_ the splice
buffers).

So the correct thing to do is to teach do_splice_direct() the same "hey,
it's already a pipe", and fix sendfile() with a pipe destination that way.

But this is the one-liner "make it work" thing, rather than the "do it
properly" thing that Al will hopefully do.

Fixes: 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops")
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/pipe.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/pipe.c b/fs/pipe.c
index c5989cfd564d..39c96845a72f 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -1206,6 +1206,7 @@ const struct file_operations pipefifo_fops = {
 	.unlocked_ioctl	= pipe_ioctl,
 	.release	= pipe_release,
 	.fasync		= pipe_fasync,
+	.splice_write	= iter_file_splice_write,
 };
 
 /*
-- 
2.29.2.157.g1d47791a39


[-- Attachment #3: 0002-tty-implement-write_iter.patch --]
[-- Type: text/x-patch, Size: 5754 bytes --]

From 0dce8c5ef15f0aa7b4525721b86a20b7c4df8ca0 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Tue, 19 Jan 2021 11:41:16 -0800
Subject: [PATCH 2/4] tty: implement write_iter

This makes the tty layer use the .write_iter() function instead of the
traditional .write() functionality.

That allows writev(), but more importantly also makes it possible to
enable .splice_write() for ttys, reinstating the "splice to tty"
functionality that was lost in commit 36e2c7421f02 ("fs: don't allow
splice read/write without explicit ops").

Fixes: 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops")
Reported-by: Oliver Giles <ohw.giles@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 drivers/tty/tty_io.c | 48 ++++++++++++++++++++++++--------------------
 1 file changed, 26 insertions(+), 22 deletions(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 8034489337d7..502862626b2b 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -143,9 +143,8 @@ LIST_HEAD(tty_drivers);			/* linked list of tty drivers */
 DEFINE_MUTEX(tty_mutex);
 
 static ssize_t tty_read(struct file *, char __user *, size_t, loff_t *);
-static ssize_t tty_write(struct file *, const char __user *, size_t, loff_t *);
-ssize_t redirected_tty_write(struct file *, const char __user *,
-							size_t, loff_t *);
+static ssize_t tty_write(struct kiocb *, struct iov_iter *);
+ssize_t redirected_tty_write(struct kiocb *, struct iov_iter *);
 static __poll_t tty_poll(struct file *, poll_table *);
 static int tty_open(struct inode *, struct file *);
 long tty_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
@@ -478,7 +477,8 @@ static void tty_show_fdinfo(struct seq_file *m, struct file *file)
 static const struct file_operations tty_fops = {
 	.llseek		= no_llseek,
 	.read		= tty_read,
-	.write		= tty_write,
+	.write_iter	= tty_write,
+	.splice_write	= iter_file_splice_write,
 	.poll		= tty_poll,
 	.unlocked_ioctl	= tty_ioctl,
 	.compat_ioctl	= tty_compat_ioctl,
@@ -491,7 +491,8 @@ static const struct file_operations tty_fops = {
 static const struct file_operations console_fops = {
 	.llseek		= no_llseek,
 	.read		= tty_read,
-	.write		= redirected_tty_write,
+	.write_iter	= redirected_tty_write,
+	.splice_write	= iter_file_splice_write,
 	.poll		= tty_poll,
 	.unlocked_ioctl	= tty_ioctl,
 	.compat_ioctl	= tty_compat_ioctl,
@@ -606,9 +607,9 @@ static void __tty_hangup(struct tty_struct *tty, int exit_session)
 	/* This breaks for file handles being sent over AF_UNIX sockets ? */
 	list_for_each_entry(priv, &tty->tty_files, list) {
 		filp = priv->file;
-		if (filp->f_op->write == redirected_tty_write)
+		if (filp->f_op->write_iter == redirected_tty_write)
 			cons_filp = filp;
-		if (filp->f_op->write != tty_write)
+		if (filp->f_op->write_iter != tty_write)
 			continue;
 		closecount++;
 		__tty_fasync(-1, filp, 0);	/* can't block */
@@ -901,9 +902,9 @@ static inline ssize_t do_tty_write(
 	ssize_t (*write)(struct tty_struct *, struct file *, const unsigned char *, size_t),
 	struct tty_struct *tty,
 	struct file *file,
-	const char __user *buf,
-	size_t count)
+	struct iov_iter *from)
 {
+	size_t count = iov_iter_count(from);
 	ssize_t ret, written = 0;
 	unsigned int chunk;
 
@@ -955,14 +956,20 @@ static inline ssize_t do_tty_write(
 		size_t size = count;
 		if (size > chunk)
 			size = chunk;
+
 		ret = -EFAULT;
-		if (copy_from_user(tty->write_buf, buf, size))
+		if (copy_from_iter(tty->write_buf, size, from) != size)
 			break;
+
 		ret = write(tty, file, tty->write_buf, size);
 		if (ret <= 0)
 			break;
+
+		/* FIXME! Have Al check this! */
+		if (ret != size)
+			iov_iter_revert(from, size-ret);
+
 		written += ret;
-		buf += ret;
 		count -= ret;
 		if (!count)
 			break;
@@ -1022,9 +1029,9 @@ void tty_write_message(struct tty_struct *tty, char *msg)
  *	write method will not be invoked in parallel for each device.
  */
 
-static ssize_t tty_write(struct file *file, const char __user *buf,
-						size_t count, loff_t *ppos)
+static ssize_t tty_write(struct kiocb *iocb, struct iov_iter *from)
 {
+	struct file *file = iocb->ki_filp;
 	struct tty_struct *tty = file_tty(file);
  	struct tty_ldisc *ld;
 	ssize_t ret;
@@ -1037,18 +1044,15 @@ static ssize_t tty_write(struct file *file, const char __user *buf,
 	if (tty->ops->write_room == NULL)
 		tty_err(tty, "missing write_room method\n");
 	ld = tty_ldisc_ref_wait(tty);
-	if (!ld)
-		return hung_up_tty_write(file, buf, count, ppos);
-	if (!ld->ops->write)
+	if (!ld || !ld->ops->write)
 		ret = -EIO;
 	else
-		ret = do_tty_write(ld->ops->write, tty, file, buf, count);
+		ret = do_tty_write(ld->ops->write, tty, file, from);
 	tty_ldisc_deref(ld);
 	return ret;
 }
 
-ssize_t redirected_tty_write(struct file *file, const char __user *buf,
-						size_t count, loff_t *ppos)
+ssize_t redirected_tty_write(struct kiocb *iocb, struct iov_iter *iter)
 {
 	struct file *p = NULL;
 
@@ -1059,11 +1063,11 @@ ssize_t redirected_tty_write(struct file *file, const char __user *buf,
 
 	if (p) {
 		ssize_t res;
-		res = vfs_write(p, buf, count, &p->f_pos);
+		res = vfs_iocb_iter_write(p, iocb, iter);
 		fput(p);
 		return res;
 	}
-	return tty_write(file, buf, count, ppos);
+	return tty_write(iocb, iter);
 }
 
 /*
@@ -2295,7 +2299,7 @@ static int tioccons(struct file *file)
 {
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
-	if (file->f_op->write == redirected_tty_write) {
+	if (file->f_op->write_iter == redirected_tty_write) {
 		struct file *f;
 		spin_lock(&redirect_lock);
 		f = redirect;
-- 
2.29.2.157.g1d47791a39


[-- Attachment #4: 0003-tty-convert-tty_ldisc_ops-read-function-to-take-a-ke.patch --]
[-- Type: text/x-patch, Size: 21393 bytes --]

From 8b7bacf932d1090ea87fd9ad218715055d3eb66e Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon, 18 Jan 2021 13:31:30 -0800
Subject: [PATCH 3/4] tty: convert tty_ldisc_ops 'read()' function to take a
 kernel pointer

The tty line discipline .read() function was passed the final user
pointer destination as an argument, which doesn't match the 'write()'
function, and makes it very inconvenient to do a splice method for
tty's.

This is a conversion to use a kernel buffer instead.

NOTE! It does this by passing the tty line discipline ->read() function
an additional "cookie" to fill in, and an offset into the cookie data.

The line discipline can fill in the cookie data with its own private
information, and then the reader will repeat the read until either the
cookie is cleared or it runs out of data.

The only real user of this is N_HDLC, which can use this to handle big
packets, even if the kernel buffer is smaller than the whole packet.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 drivers/bluetooth/hci_ldisc.c | 34 +++++++--------
 drivers/input/serio/serport.c |  4 +-
 drivers/net/ppp/ppp_async.c   |  3 +-
 drivers/net/ppp/ppp_synctty.c |  3 +-
 drivers/tty/n_gsm.c           |  3 +-
 drivers/tty/n_hdlc.c          | 60 +++++++++++++++++--------
 drivers/tty/n_null.c          |  3 +-
 drivers/tty/n_r3964.c         | 10 ++---
 drivers/tty/n_tracerouter.c   |  4 +-
 drivers/tty/n_tracesink.c     |  4 +-
 drivers/tty/n_tty.c           | 82 +++++++++++++++--------------------
 drivers/tty/tty_io.c          | 64 +++++++++++++++++++++++++--
 include/linux/tty_ldisc.h     |  3 +-
 net/nfc/nci/uart.c            |  3 +-
 14 files changed, 178 insertions(+), 102 deletions(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index f83d67eafc9f..dd92aea15b8b 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -802,7 +802,8 @@ static int hci_uart_tty_ioctl(struct tty_struct *tty, struct file *file,
  * We don't provide read/write/poll interface for user space.
  */
 static ssize_t hci_uart_tty_read(struct tty_struct *tty, struct file *file,
-				 unsigned char __user *buf, size_t nr)
+				 unsigned char *buf, size_t nr,
+				 void **cookie, unsigned long offset)
 {
 	return 0;
 }
@@ -819,29 +820,28 @@ static __poll_t hci_uart_tty_poll(struct tty_struct *tty,
 	return 0;
 }
 
+static struct tty_ldisc_ops hci_uart_ldisc = {
+	.owner		= THIS_MODULE,
+	.magic		= TTY_LDISC_MAGIC,
+	.name		= "n_hci",
+	.open		= hci_uart_tty_open,
+	.close		= hci_uart_tty_close,
+	.read		= hci_uart_tty_read,
+	.write		= hci_uart_tty_write,
+	.ioctl		= hci_uart_tty_ioctl,
+	.compat_ioctl	= hci_uart_tty_ioctl,
+	.poll		= hci_uart_tty_poll,
+	.receive_buf	= hci_uart_tty_receive,
+	.write_wakeup	= hci_uart_tty_wakeup,
+};
+
 static int __init hci_uart_init(void)
 {
-	static struct tty_ldisc_ops hci_uart_ldisc;
 	int err;
 
 	BT_INFO("HCI UART driver ver %s", VERSION);
 
 	/* Register the tty discipline */
-
-	memset(&hci_uart_ldisc, 0, sizeof(hci_uart_ldisc));
-	hci_uart_ldisc.magic		= TTY_LDISC_MAGIC;
-	hci_uart_ldisc.name		= "n_hci";
-	hci_uart_ldisc.open		= hci_uart_tty_open;
-	hci_uart_ldisc.close		= hci_uart_tty_close;
-	hci_uart_ldisc.read		= hci_uart_tty_read;
-	hci_uart_ldisc.write		= hci_uart_tty_write;
-	hci_uart_ldisc.ioctl		= hci_uart_tty_ioctl;
-	hci_uart_ldisc.compat_ioctl	= hci_uart_tty_ioctl;
-	hci_uart_ldisc.poll		= hci_uart_tty_poll;
-	hci_uart_ldisc.receive_buf	= hci_uart_tty_receive;
-	hci_uart_ldisc.write_wakeup	= hci_uart_tty_wakeup;
-	hci_uart_ldisc.owner		= THIS_MODULE;
-
 	err = tty_register_ldisc(N_HCI, &hci_uart_ldisc);
 	if (err) {
 		BT_ERR("HCI line discipline registration failed. (%d)", err);
diff --git a/drivers/input/serio/serport.c b/drivers/input/serio/serport.c
index 8ac970a423de..33e9d9bfd036 100644
--- a/drivers/input/serio/serport.c
+++ b/drivers/input/serio/serport.c
@@ -156,7 +156,9 @@ static void serport_ldisc_receive(struct tty_struct *tty, const unsigned char *c
  * returning 0 characters.
  */
 
-static ssize_t serport_ldisc_read(struct tty_struct * tty, struct file * file, unsigned char __user * buf, size_t nr)
+static ssize_t serport_ldisc_read(struct tty_struct * tty, struct file * file,
+				  unsigned char *kbuf, size_t nr,
+				  void **cookie, unsigned long offset)
 {
 	struct serport *serport = (struct serport*) tty->disc_data;
 	struct serio *serio;
diff --git a/drivers/net/ppp/ppp_async.c b/drivers/net/ppp/ppp_async.c
index 29a0917a81e6..f14a9d190de9 100644
--- a/drivers/net/ppp/ppp_async.c
+++ b/drivers/net/ppp/ppp_async.c
@@ -259,7 +259,8 @@ static int ppp_asynctty_hangup(struct tty_struct *tty)
  */
 static ssize_t
 ppp_asynctty_read(struct tty_struct *tty, struct file *file,
-		  unsigned char __user *buf, size_t count)
+		  unsigned char *buf, size_t count,
+		  void **cookie, unsigned long offset)
 {
 	return -EAGAIN;
 }
diff --git a/drivers/net/ppp/ppp_synctty.c b/drivers/net/ppp/ppp_synctty.c
index 0f338752c38b..f774b7e52da4 100644
--- a/drivers/net/ppp/ppp_synctty.c
+++ b/drivers/net/ppp/ppp_synctty.c
@@ -257,7 +257,8 @@ static int ppp_sync_hangup(struct tty_struct *tty)
  */
 static ssize_t
 ppp_sync_read(struct tty_struct *tty, struct file *file,
-	       unsigned char __user *buf, size_t count)
+	      unsigned char *buf, size_t count,
+	      void **cookie, unsigned long offset)
 {
 	return -EAGAIN;
 }
diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
index c676fa89ee0b..51dafc06f541 100644
--- a/drivers/tty/n_gsm.c
+++ b/drivers/tty/n_gsm.c
@@ -2559,7 +2559,8 @@ static void gsmld_write_wakeup(struct tty_struct *tty)
  */
 
 static ssize_t gsmld_read(struct tty_struct *tty, struct file *file,
-			 unsigned char __user *buf, size_t nr)
+			  unsigned char *buf, size_t nr,
+			  void **cookie, unsigned long offset)
 {
 	return -EOPNOTSUPP;
 }
diff --git a/drivers/tty/n_hdlc.c b/drivers/tty/n_hdlc.c
index 12557ee1edb6..1363e659dc1d 100644
--- a/drivers/tty/n_hdlc.c
+++ b/drivers/tty/n_hdlc.c
@@ -416,13 +416,19 @@ static void n_hdlc_tty_receive(struct tty_struct *tty, const __u8 *data,
  * Returns the number of bytes returned or error code.
  */
 static ssize_t n_hdlc_tty_read(struct tty_struct *tty, struct file *file,
-			   __u8 __user *buf, size_t nr)
+			   __u8 *kbuf, size_t nr,
+			   void **cookie, unsigned long offset)
 {
 	struct n_hdlc *n_hdlc = tty->disc_data;
 	int ret = 0;
 	struct n_hdlc_buf *rbuf;
 	DECLARE_WAITQUEUE(wait, current);
 
+	/* Is this a repeated call for an rbuf we already found earlier? */
+	rbuf = *cookie;
+	if (rbuf)
+		goto have_rbuf;
+
 	add_wait_queue(&tty->read_wait, &wait);
 
 	for (;;) {
@@ -436,25 +442,8 @@ static ssize_t n_hdlc_tty_read(struct tty_struct *tty, struct file *file,
 		set_current_state(TASK_INTERRUPTIBLE);
 
 		rbuf = n_hdlc_buf_get(&n_hdlc->rx_buf_list);
-		if (rbuf) {
-			if (rbuf->count > nr) {
-				/* too large for caller's buffer */
-				ret = -EOVERFLOW;
-			} else {
-				__set_current_state(TASK_RUNNING);
-				if (copy_to_user(buf, rbuf->buf, rbuf->count))
-					ret = -EFAULT;
-				else
-					ret = rbuf->count;
-			}
-
-			if (n_hdlc->rx_free_buf_list.count >
-			    DEFAULT_RX_BUF_COUNT)
-				kfree(rbuf);
-			else
-				n_hdlc_buf_put(&n_hdlc->rx_free_buf_list, rbuf);
+		if (rbuf)
 			break;
-		}
 
 		/* no data */
 		if (tty_io_nonblock(tty, file)) {
@@ -473,6 +462,39 @@ static ssize_t n_hdlc_tty_read(struct tty_struct *tty, struct file *file,
 	remove_wait_queue(&tty->read_wait, &wait);
 	__set_current_state(TASK_RUNNING);
 
+	if (!rbuf)
+		return ret;
+	*cookie = rbuf;
+
+have_rbuf:
+	/* Have we used it up entirely? */
+	if (offset >= rbuf->count)
+		goto done_with_rbuf;
+
+	/* More data to go, but can't copy any more? EOVERFLOW */
+	ret = -EOVERFLOW;
+	if (!nr)
+		goto done_with_rbuf;
+
+	/* Copy as much data as possible */
+	ret = rbuf->count - offset;
+	if (ret > nr)
+		ret = nr;
+	memcpy(kbuf, rbuf->buf+offset, ret);
+	offset += ret;
+
+	/* If we still have data left, we leave the rbuf in the cookie */
+	if (offset < rbuf->count)
+		return ret;
+
+done_with_rbuf:
+	*cookie = NULL;
+
+	if (n_hdlc->rx_free_buf_list.count > DEFAULT_RX_BUF_COUNT)
+		kfree(rbuf);
+	else
+		n_hdlc_buf_put(&n_hdlc->rx_free_buf_list, rbuf);
+
 	return ret;
 
 }	/* end of n_hdlc_tty_read() */
diff --git a/drivers/tty/n_null.c b/drivers/tty/n_null.c
index 96feabae4740..ce03ae78f5c6 100644
--- a/drivers/tty/n_null.c
+++ b/drivers/tty/n_null.c
@@ -20,7 +20,8 @@ static void n_null_close(struct tty_struct *tty)
 }
 
 static ssize_t n_null_read(struct tty_struct *tty, struct file *file,
-			   unsigned char __user * buf, size_t nr)
+			   unsigned char *buf, size_t nr,
+			   void **cookie, unsigned long offset)
 {
 	return -EOPNOTSUPP;
 }
diff --git a/drivers/tty/n_r3964.c b/drivers/tty/n_r3964.c
index 934dd2fb2ec8..3161f0a535e3 100644
--- a/drivers/tty/n_r3964.c
+++ b/drivers/tty/n_r3964.c
@@ -129,7 +129,7 @@ static void remove_client_block(struct r3964_info *pInfo,
 static int r3964_open(struct tty_struct *tty);
 static void r3964_close(struct tty_struct *tty);
 static ssize_t r3964_read(struct tty_struct *tty, struct file *file,
-		unsigned char __user * buf, size_t nr);
+		void *cookie, unsigned char *buf, size_t nr);
 static ssize_t r3964_write(struct tty_struct *tty, struct file *file,
 		const unsigned char *buf, size_t nr);
 static int r3964_ioctl(struct tty_struct *tty, struct file *file,
@@ -1058,7 +1058,8 @@ static void r3964_close(struct tty_struct *tty)
 }
 
 static ssize_t r3964_read(struct tty_struct *tty, struct file *file,
-			  unsigned char __user * buf, size_t nr)
+			  unsigned char *kbuf, size_t nr,
+			  void **cookie, unsigned long offset)
 {
 	struct r3964_info *pInfo = tty->disc_data;
 	struct r3964_client_info *pClient;
@@ -1109,10 +1110,7 @@ static ssize_t r3964_read(struct tty_struct *tty, struct file *file,
 		kfree(pMsg);
 		TRACE_M("r3964_read - msg kfree %p", pMsg);
 
-		if (copy_to_user(buf, &theMsg, ret)) {
-			ret = -EFAULT;
-			goto unlock;
-		}
+		memcpy(kbuf, &theMsg, ret);
 
 		TRACE_PS("read - return %d", ret);
 		goto unlock;
diff --git a/drivers/tty/n_tracerouter.c b/drivers/tty/n_tracerouter.c
index 4479af4d2fa5..3490ed51b1a3 100644
--- a/drivers/tty/n_tracerouter.c
+++ b/drivers/tty/n_tracerouter.c
@@ -118,7 +118,9 @@ static void n_tracerouter_close(struct tty_struct *tty)
  *	 -EINVAL
  */
 static ssize_t n_tracerouter_read(struct tty_struct *tty, struct file *file,
-				  unsigned char __user *buf, size_t nr) {
+				  unsigned char *buf, size_t nr,
+				  void **cookie, unsigned long offset)
+{
 	return -EINVAL;
 }
 
diff --git a/drivers/tty/n_tracesink.c b/drivers/tty/n_tracesink.c
index d96ba82cc356..1d9931041fd8 100644
--- a/drivers/tty/n_tracesink.c
+++ b/drivers/tty/n_tracesink.c
@@ -115,7 +115,9 @@ static void n_tracesink_close(struct tty_struct *tty)
  *	 -EINVAL
  */
 static ssize_t n_tracesink_read(struct tty_struct *tty, struct file *file,
-				unsigned char __user *buf, size_t nr) {
+				unsigned char *buf, size_t nr,
+				void **cookie, unsigned long offset)
+{
 	return -EINVAL;
 }
 
diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index 319d68c8a5df..2f2f57a53968 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -164,29 +164,24 @@ static void zero_buffer(struct tty_struct *tty, u8 *buffer, int size)
 		memset(buffer, 0x00, size);
 }
 
-static int tty_copy_to_user(struct tty_struct *tty, void __user *to,
-			    size_t tail, size_t n)
+static void tty_copy(struct tty_struct *tty, void *to, size_t tail, size_t n)
 {
 	struct n_tty_data *ldata = tty->disc_data;
 	size_t size = N_TTY_BUF_SIZE - tail;
 	void *from = read_buf_addr(ldata, tail);
-	int uncopied;
 
 	if (n > size) {
 		tty_audit_add_data(tty, from, size);
-		uncopied = copy_to_user(to, from, size);
-		zero_buffer(tty, from, size - uncopied);
-		if (uncopied)
-			return uncopied;
+		memcpy(to, from, size);
+		zero_buffer(tty, from, size);
 		to += size;
 		n -= size;
 		from = ldata->read_buf;
 	}
 
 	tty_audit_add_data(tty, from, n);
-	uncopied = copy_to_user(to, from, n);
-	zero_buffer(tty, from, n - uncopied);
-	return uncopied;
+	memcpy(to, from, n);
+	zero_buffer(tty, from, n);
 }
 
 /**
@@ -1944,15 +1939,16 @@ static inline int input_available_p(struct tty_struct *tty, int poll)
 /**
  *	copy_from_read_buf	-	copy read data directly
  *	@tty: terminal device
- *	@b: user data
+ *	@kbp: data
  *	@nr: size of data
  *
  *	Helper function to speed up n_tty_read.  It is only called when
- *	ICANON is off; it copies characters straight from the tty queue to
- *	user space directly.  It can be profitably called twice; once to
- *	drain the space from the tail pointer to the (physical) end of the
- *	buffer, and once to drain the space from the (physical) beginning of
- *	the buffer to head pointer.
+ *	ICANON is off; it copies characters straight from the tty queue.
+ *
+ *	It can be profitably called twice; once to drain the space from
+ *	the tail pointer to the (physical) end of the buffer, and once
+ *	to drain the space from the (physical) beginning of the buffer
+ *	to head pointer.
  *
  *	Called under the ldata->atomic_read_lock sem
  *
@@ -1962,7 +1958,7 @@ static inline int input_available_p(struct tty_struct *tty, int poll)
  */
 
 static int copy_from_read_buf(struct tty_struct *tty,
-				      unsigned char __user **b,
+				      unsigned char **kbp,
 				      size_t *nr)
 
 {
@@ -1978,8 +1974,7 @@ static int copy_from_read_buf(struct tty_struct *tty,
 	n = min(*nr, n);
 	if (n) {
 		unsigned char *from = read_buf_addr(ldata, tail);
-		retval = copy_to_user(*b, from, n);
-		n -= retval;
+		memcpy(*kbp, from, n);
 		is_eof = n == 1 && *from == EOF_CHAR(tty);
 		tty_audit_add_data(tty, from, n);
 		zero_buffer(tty, from, n);
@@ -1988,7 +1983,7 @@ static int copy_from_read_buf(struct tty_struct *tty,
 		if (L_EXTPROC(tty) && ldata->icanon && is_eof &&
 		    (head == ldata->read_tail))
 			n = 0;
-		*b += n;
+		*kbp += n;
 		*nr -= n;
 	}
 	return retval;
@@ -1997,12 +1992,12 @@ static int copy_from_read_buf(struct tty_struct *tty,
 /**
  *	canon_copy_from_read_buf	-	copy read data in canonical mode
  *	@tty: terminal device
- *	@b: user data
+ *	@kbp: data
  *	@nr: size of data
  *
  *	Helper function for n_tty_read.  It is only called when ICANON is on;
  *	it copies one line of input up to and including the line-delimiting
- *	character into the user-space buffer.
+ *	character into the result buffer.
  *
  *	NB: When termios is changed from non-canonical to canonical mode and
  *	the read buffer contains data, n_tty_set_termios() simulates an EOF
@@ -2018,14 +2013,14 @@ static int copy_from_read_buf(struct tty_struct *tty,
  */
 
 static int canon_copy_from_read_buf(struct tty_struct *tty,
-				    unsigned char __user **b,
+				    unsigned char **kbp,
 				    size_t *nr)
 {
 	struct n_tty_data *ldata = tty->disc_data;
 	size_t n, size, more, c;
 	size_t eol;
 	size_t tail;
-	int ret, found = 0;
+	int found = 0;
 
 	/* N.B. avoid overrun if nr == 0 */
 	if (!*nr)
@@ -2061,10 +2056,8 @@ static int canon_copy_from_read_buf(struct tty_struct *tty,
 	n_tty_trace("%s: eol:%zu found:%d n:%zu c:%zu tail:%zu more:%zu\n",
 		    __func__, eol, found, n, c, tail, more);
 
-	ret = tty_copy_to_user(tty, *b, tail, n);
-	if (ret)
-		return -EFAULT;
-	*b += n;
+	tty_copy(tty, *kbp, tail, n);
+	*kbp += n;
 	*nr -= n;
 
 	if (found)
@@ -2132,10 +2125,11 @@ static int job_control(struct tty_struct *tty, struct file *file)
  */
 
 static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
-			 unsigned char __user *buf, size_t nr)
+			  unsigned char *kbuf, size_t nr,
+			  void **cookie, unsigned long offset)
 {
 	struct n_tty_data *ldata = tty->disc_data;
-	unsigned char __user *b = buf;
+	unsigned char *kb = kbuf;
 	DEFINE_WAIT_FUNC(wait, woken_wake_function);
 	int c;
 	int minimum, time;
@@ -2181,17 +2175,13 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 		/* First test for status change. */
 		if (packet && tty->link->ctrl_status) {
 			unsigned char cs;
-			if (b != buf)
+			if (kb != kbuf)
 				break;
 			spin_lock_irq(&tty->link->ctrl_lock);
 			cs = tty->link->ctrl_status;
 			tty->link->ctrl_status = 0;
 			spin_unlock_irq(&tty->link->ctrl_lock);
-			if (put_user(cs, b)) {
-				retval = -EFAULT;
-				break;
-			}
-			b++;
+			*kb++ = cs;
 			nr--;
 			break;
 		}
@@ -2234,24 +2224,20 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 		}
 
 		if (ldata->icanon && !L_EXTPROC(tty)) {
-			retval = canon_copy_from_read_buf(tty, &b, &nr);
+			retval = canon_copy_from_read_buf(tty, &kb, &nr);
 			if (retval)
 				break;
 		} else {
 			int uncopied;
 
 			/* Deal with packet mode. */
-			if (packet && b == buf) {
-				if (put_user(TIOCPKT_DATA, b)) {
-					retval = -EFAULT;
-					break;
-				}
-				b++;
+			if (packet && kb == kbuf) {
+				*kb++ = TIOCPKT_DATA;
 				nr--;
 			}
 
-			uncopied = copy_from_read_buf(tty, &b, &nr);
-			uncopied += copy_from_read_buf(tty, &b, &nr);
+			uncopied = copy_from_read_buf(tty, &kb, &nr);
+			uncopied += copy_from_read_buf(tty, &kb, &nr);
 			if (uncopied) {
 				retval = -EFAULT;
 				break;
@@ -2260,7 +2246,7 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 
 		n_tty_check_unthrottle(tty);
 
-		if (b - buf >= minimum)
+		if (kb - kbuf >= minimum)
 			break;
 		if (time)
 			timeout = time;
@@ -2272,8 +2258,8 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 	remove_wait_queue(&tty->read_wait, &wait);
 	mutex_unlock(&ldata->atomic_read_lock);
 
-	if (b - buf)
-		retval = b - buf;
+	if (kb - kbuf)
+		retval = kb - kbuf;
 
 	return retval;
 }
diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 502862626b2b..d33e120046a6 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -832,6 +832,65 @@ static void tty_update_time(struct timespec64 *time)
 		time->tv_sec = sec;
 }
 
+/*
+ * Iterate on the ldisc ->read() function until we've gotten all
+ * the data the ldisc has for us.
+ *
+ * The "cookie" is something that the ldisc read function can fill
+ * in to let us know that there is more data to be had.
+ *
+ * We promise to continue to call the ldisc until it stops returning
+ * data or clears the cookie. The cookie may be something that the
+ * ldisc maintains state for and needs to free.
+ */
+static int iterate_tty_read(struct tty_ldisc *ld, struct tty_struct *tty, struct file *file,
+		char __user *buf, size_t count)
+{
+	int retval = 0;
+	void *cookie = NULL;
+	unsigned long offset = 0;
+	char kernel_buf[64];
+
+	do {
+		int size, uncopied;
+
+		size = count > sizeof(kernel_buf) ? sizeof(kernel_buf) : count;
+		size = ld->ops->read(tty, file, kernel_buf, size, &cookie, offset);
+		if (!size)
+			break;
+
+		/*
+		 * A ldisc read error return will override any previously copied
+		 * data (eg -EOVERFLOW from HDLC)
+		 */
+		if (size < 0) {
+			memzero_explicit(kernel_buf, sizeof(kernel_buf));
+			return size;
+		}
+
+		uncopied = copy_to_user(buf+offset, kernel_buf, size);
+		size -= uncopied;
+		offset += size;
+		count -= size;
+
+		/*
+		 * If the user copy failed, we still need to do another ->read()
+		 * call if we had a cookie to let the ldisc clear up.
+		 *
+		 * But make sure size is zeroed.
+		 */
+		if (unlikely(uncopied)) {
+			count = 0;
+			retval = -EFAULT;
+		}
+	} while (cookie);
+
+	/* We always clear tty buffer in case they contained passwords */
+	memzero_explicit(kernel_buf, sizeof(kernel_buf));
+	return offset ? offset : retval;
+}
+
+
 /**
  *	tty_read	-	read method for tty device files
  *	@file: pointer to tty file
@@ -865,10 +924,9 @@ static ssize_t tty_read(struct file *file, char __user *buf, size_t count,
 	ld = tty_ldisc_ref_wait(tty);
 	if (!ld)
 		return hung_up_tty_read(file, buf, count, ppos);
+	i = -EIO;
 	if (ld->ops->read)
-		i = ld->ops->read(tty, file, buf, count);
-	else
-		i = -EIO;
+		i = iterate_tty_read(ld, tty, file, buf, count);
 	tty_ldisc_deref(ld);
 
 	if (i > 0)
diff --git a/include/linux/tty_ldisc.h b/include/linux/tty_ldisc.h
index b1e6043e9917..572a07976116 100644
--- a/include/linux/tty_ldisc.h
+++ b/include/linux/tty_ldisc.h
@@ -185,7 +185,8 @@ struct tty_ldisc_ops {
 	void	(*close)(struct tty_struct *);
 	void	(*flush_buffer)(struct tty_struct *tty);
 	ssize_t	(*read)(struct tty_struct *tty, struct file *file,
-			unsigned char __user *buf, size_t nr);
+			unsigned char *buf, size_t nr,
+			void **cookie, unsigned long offset);
 	ssize_t	(*write)(struct tty_struct *tty, struct file *file,
 			 const unsigned char *buf, size_t nr);
 	int	(*ioctl)(struct tty_struct *tty, struct file *file,
diff --git a/net/nfc/nci/uart.c b/net/nfc/nci/uart.c
index 11b554ce07ff..1204c438e87d 100644
--- a/net/nfc/nci/uart.c
+++ b/net/nfc/nci/uart.c
@@ -292,7 +292,8 @@ static int nci_uart_tty_ioctl(struct tty_struct *tty, struct file *file,
 
 /* We don't provide read/write/poll interface for user space. */
 static ssize_t nci_uart_tty_read(struct tty_struct *tty, struct file *file,
-				 unsigned char __user *buf, size_t nr)
+				 unsigned char *buf, size_t nr,
+				 void **cookie, unsigned long offset)
 {
 	return 0;
 }
-- 
2.29.2.157.g1d47791a39


[-- Attachment #5: 0004-tty-implement-read_iter.patch --]
[-- Type: text/x-patch, Size: 5002 bytes --]

From 08cb81c888e88b152f49ad2c90146b8f0c9ce6b3 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Tue, 19 Jan 2021 10:49:19 -0800
Subject: [PATCH 4/4] tty: implement read_iter

Now that the ldisc read() function takes kernel pointers, it's fairly
straightforward to make the tty file operations use .read_iter() instead
of .read().

That automatically gives us vread() and friends, and also makes it
possible to do .splice_read() on tty's again.

Fixes: 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops")
Reported-by: Oliver Giles <ohw.giles@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 drivers/tty/tty_io.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index d33e120046a6..b8c0b40f3298 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -142,7 +142,7 @@ LIST_HEAD(tty_drivers);			/* linked list of tty drivers */
 /* Mutex to protect creating and releasing a tty */
 DEFINE_MUTEX(tty_mutex);
 
-static ssize_t tty_read(struct file *, char __user *, size_t, loff_t *);
+static ssize_t tty_read(struct kiocb *, struct iov_iter *);
 static ssize_t tty_write(struct kiocb *, struct iov_iter *);
 ssize_t redirected_tty_write(struct kiocb *, struct iov_iter *);
 static __poll_t tty_poll(struct file *, poll_table *);
@@ -476,8 +476,9 @@ static void tty_show_fdinfo(struct seq_file *m, struct file *file)
 
 static const struct file_operations tty_fops = {
 	.llseek		= no_llseek,
-	.read		= tty_read,
+	.read_iter	= tty_read,
 	.write_iter	= tty_write,
+	.splice_read	= generic_file_splice_read,
 	.splice_write	= iter_file_splice_write,
 	.poll		= tty_poll,
 	.unlocked_ioctl	= tty_ioctl,
@@ -490,8 +491,9 @@ static const struct file_operations tty_fops = {
 
 static const struct file_operations console_fops = {
 	.llseek		= no_llseek,
-	.read		= tty_read,
+	.read_iter	= tty_read,
 	.write_iter	= redirected_tty_write,
+	.splice_read	= generic_file_splice_read,
 	.splice_write	= iter_file_splice_write,
 	.poll		= tty_poll,
 	.unlocked_ioctl	= tty_ioctl,
@@ -843,16 +845,17 @@ static void tty_update_time(struct timespec64 *time)
  * data or clears the cookie. The cookie may be something that the
  * ldisc maintains state for and needs to free.
  */
-static int iterate_tty_read(struct tty_ldisc *ld, struct tty_struct *tty, struct file *file,
-		char __user *buf, size_t count)
+static int iterate_tty_read(struct tty_ldisc *ld, struct tty_struct *tty,
+		struct file *file, struct iov_iter *to)
 {
 	int retval = 0;
 	void *cookie = NULL;
 	unsigned long offset = 0;
 	char kernel_buf[64];
+	size_t count = iov_iter_count(to);
 
 	do {
-		int size, uncopied;
+		int size, copied;
 
 		size = count > sizeof(kernel_buf) ? sizeof(kernel_buf) : count;
 		size = ld->ops->read(tty, file, kernel_buf, size, &cookie, offset);
@@ -868,10 +871,9 @@ static int iterate_tty_read(struct tty_ldisc *ld, struct tty_struct *tty, struct
 			return size;
 		}
 
-		uncopied = copy_to_user(buf+offset, kernel_buf, size);
-		size -= uncopied;
-		offset += size;
-		count -= size;
+		copied = copy_to_iter(kernel_buf, size, to);
+		offset += copied;
+		count -= copied;
 
 		/*
 		 * If the user copy failed, we still need to do another ->read()
@@ -879,7 +881,7 @@ static int iterate_tty_read(struct tty_ldisc *ld, struct tty_struct *tty, struct
 		 *
 		 * But make sure size is zeroed.
 		 */
-		if (unlikely(uncopied)) {
+		if (unlikely(copied != size)) {
 			count = 0;
 			retval = -EFAULT;
 		}
@@ -906,10 +908,10 @@ static int iterate_tty_read(struct tty_ldisc *ld, struct tty_struct *tty, struct
  *	read calls may be outstanding in parallel.
  */
 
-static ssize_t tty_read(struct file *file, char __user *buf, size_t count,
-			loff_t *ppos)
+static ssize_t tty_read(struct kiocb *iocb, struct iov_iter *to)
 {
 	int i;
+	struct file *file = iocb->ki_filp;
 	struct inode *inode = file_inode(file);
 	struct tty_struct *tty = file_tty(file);
 	struct tty_ldisc *ld;
@@ -922,11 +924,9 @@ static ssize_t tty_read(struct file *file, char __user *buf, size_t count,
 	/* We want to wait for the line discipline to sort out in this
 	   situation */
 	ld = tty_ldisc_ref_wait(tty);
-	if (!ld)
-		return hung_up_tty_read(file, buf, count, ppos);
 	i = -EIO;
-	if (ld->ops->read)
-		i = iterate_tty_read(ld, tty, file, buf, count);
+	if (ld && ld->ops->read)
+		i = iterate_tty_read(ld, tty, file, to);
 	tty_ldisc_deref(ld);
 
 	if (i > 0)
@@ -2929,7 +2929,7 @@ static long tty_compat_ioctl(struct file *file, unsigned int cmd,
 
 static int this_tty(const void *t, struct file *file, unsigned fd)
 {
-	if (likely(file->f_op->read != tty_read))
+	if (likely(file->f_op->read_iter != tty_read))
 		return 0;
 	return file_tty(file) != t ? 0 : fd + 1;
 }
-- 
2.29.2.157.g1d47791a39


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-19 20:24                   ` Linus Torvalds
@ 2021-01-19 20:38                     ` Christoph Hellwig
  2021-01-20  1:25                     ` Oliver Giles
  1 sibling, 0 replies; 65+ messages in thread
From: Christoph Hellwig @ 2021-01-19 20:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Greg Kroah-Hartman, Christoph Hellwig, Oliver Giles,
	Linux Kernel Mailing List, Al Viro, Jiri Slaby

On Tue, Jan 19, 2021 at 12:24:22PM -0800, Linus Torvalds wrote:
> But that sysfs binary file case really isn't interesting, and just
> more of a "Christoph broke it, I think he should just fix it".
> Christoph?

I'll give it a spin tomorrow.  Should be simple enough.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-19 16:56                   ` Robert Karszniewicz
  2021-01-19 17:10                     ` Robert Karszniewicz
@ 2021-01-19 22:09                     ` Oliver Giles
  1 sibling, 0 replies; 65+ messages in thread
From: Oliver Giles @ 2021-01-19 22:09 UTC (permalink / raw)
  To: Robert Karszniewicz, Greg Kroah-Hartman, Linus Torvalds
  Cc: Christoph Hellwig, Linux Kernel Mailing List, Al Viro, Jiri Slaby

On Wed Jan 20, 2021 at 5:56 AM NZDT, Robert Karszniewicz wrote:
>
> I have bisected this issue down to this commit:
> 4d03e3cc5982 ("fs: don't allow kernel reads and writes without iter
> ops")
>
> Another case I've also noticed is writing to a serial connection:
> kernel write not supported for file /ttymxc0 (pid: 252 comm: cat)
>

Tangentially, I hit the same thing when hacking on this. Here's a diff
making the implementation match the comment:

--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -541,7 +541,7 @@ ssize_t __kernel_write(struct file *file, const void *buf, size_t count, loff_t
         * Also fail if ->write_iter and ->write are both wired up as that
         * implies very convoluted semantics.
         */
-       if (unlikely(!file->f_op->write_iter || file->f_op->write))
+       if (unlikely(file->f_op->write_iter && file->f_op->write))
                return warn_unsupported(file, "write");
 
        init_sync_kiocb(&kiocb, file);


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-19 20:24                   ` Linus Torvalds
  2021-01-19 20:38                     ` Christoph Hellwig
@ 2021-01-20  1:25                     ` Oliver Giles
  2021-01-20  4:44                       ` Linus Torvalds
  1 sibling, 1 reply; 65+ messages in thread
From: Oliver Giles @ 2021-01-20  1:25 UTC (permalink / raw)
  To: Linus Torvalds, Greg Kroah-Hartman
  Cc: Christoph Hellwig, Linux Kernel Mailing List, Al Viro, Jiri Slaby

On Wed Jan 20, 2021 at 9:24 AM NZDT, Linus Torvalds wrote:
> Anyway, anybody willing to test these tty/pipe patches on the loads
> that failed? Oliver?

Writing this from a kernel with those patches in; happily splice()ing
to and from a pty.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-20  1:25                     ` Oliver Giles
@ 2021-01-20  4:44                       ` Linus Torvalds
  2021-01-20  8:15                         ` Oliver Giles
                                           ` (2 more replies)
  0 siblings, 3 replies; 65+ messages in thread
From: Linus Torvalds @ 2021-01-20  4:44 UTC (permalink / raw)
  To: Oliver Giles, Robert Karszniewicz
  Cc: Greg Kroah-Hartman, Christoph Hellwig, Linux Kernel Mailing List,
	Al Viro, Jiri Slaby

[-- Attachment #1: Type: text/plain, Size: 1687 bytes --]

On Tue, Jan 19, 2021 at 5:29 PM Oliver Giles <ohw.giles@gmail.com> wrote:
>
> Writing this from a kernel with those patches in; happily splice()ing
> to and from a pty.

Ok, good.

I have a couple of improvement patches on top of those, that I'm attaching here.

The first four patches worked fine for me (and apparently for you
too), but due to the small buffer, the regular N_TTY case for tty
read() calls are now limited to 64 bytes each.

That is unfortunate for performance, but it might also cause some
actual breakage: anybody doing "read()" calls on a tty file descriptor
_should_ be perfectly fine with short reads since they happen for
signals etc, but I could well imagine that not everybody is.

And that new kernel buffer interface was _designed_ to allow stitching
small buffers together efficiently (since the hdlc case needed it), so
this implements that for the non-icanon case for n_tty too.

I wasted an embarrasing amount of time today on that final patch - I
spent something like 6 hours chasing a truly stupid one-liner bug I
had initially, and couldn't see what was wrong.

Which is why this only does the non-icanon case, because after I
finally had my "Duh!" moment, I was so annoyed with it that I had to
just walk away.

I'll come back to this tomorrow and do the line-buffered icanon case
too (unless pull requests pile up), and then I'll be happy with the
tty changes, and I think I can submit this series for real to Greg.

But in the meantime, here's two more patches to try on top of the
previous four. They shouldn't matter, other than making the non-icanon
throughput a lot better. But the more coverage they get, the happier
I'll be.

           Linus

[-- Attachment #2: 0005-tty-clean-up-legacy-leftovers-from-n_tty-line-discip.patch --]
[-- Type: text/x-patch, Size: 2202 bytes --]

From b12a1652c91648e96ae11946f7489515dd063789 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Tue, 19 Jan 2021 13:46:28 -0800
Subject: [PATCH 5/6] tty: clean up legacy leftovers from n_tty line discipline

Back when the line disciplines did their own direct user accesses, they
had to deal with the data copy possibly failing in the middle.

Now that the user copy is done by the tty_io.c code, that failure case
no longer exists.

Remove the left-over error handling code that cannot trigger.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 drivers/tty/n_tty.c | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index 2f2f57a53968..a02fe661f617 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -1957,19 +1957,17 @@ static inline int input_available_p(struct tty_struct *tty, int poll)
  *		read_tail published
  */
 
-static int copy_from_read_buf(struct tty_struct *tty,
+static void copy_from_read_buf(struct tty_struct *tty,
 				      unsigned char **kbp,
 				      size_t *nr)
 
 {
 	struct n_tty_data *ldata = tty->disc_data;
-	int retval;
 	size_t n;
 	bool is_eof;
 	size_t head = smp_load_acquire(&ldata->commit_head);
 	size_t tail = ldata->read_tail & (N_TTY_BUF_SIZE - 1);
 
-	retval = 0;
 	n = min(head - ldata->read_tail, N_TTY_BUF_SIZE - tail);
 	n = min(*nr, n);
 	if (n) {
@@ -1986,7 +1984,6 @@ static int copy_from_read_buf(struct tty_struct *tty,
 		*kbp += n;
 		*nr -= n;
 	}
-	return retval;
 }
 
 /**
@@ -2228,20 +2225,15 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 			if (retval)
 				break;
 		} else {
-			int uncopied;
-
 			/* Deal with packet mode. */
 			if (packet && kb == kbuf) {
 				*kb++ = TIOCPKT_DATA;
 				nr--;
 			}
 
-			uncopied = copy_from_read_buf(tty, &kb, &nr);
-			uncopied += copy_from_read_buf(tty, &kb, &nr);
-			if (uncopied) {
-				retval = -EFAULT;
-				break;
-			}
+			/* See comment above copy_from_read_buf() why twice */
+			copy_from_read_buf(tty, &kb, &nr);
+			copy_from_read_buf(tty, &kb, &nr);
 		}
 
 		n_tty_check_unthrottle(tty);
-- 
2.29.2.157.g1d47791a39


[-- Attachment #3: 0006-tty-teach-n_tty-line-discipline-about-the-new-cookie.patch --]
[-- Type: text/x-patch, Size: 4889 bytes --]

From e724cc9c4b101a4de1a56bcca6b5ec1d3493b173 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Tue, 19 Jan 2021 18:14:20 -0800
Subject: [PATCH 6/6] tty: teach n_tty line discipline about the new "cookie
 continuations"

With the conversion to do the tty ldisc read operations in small chunks,
the n_tty line discipline became noticeably slower for throughput
oriented loads, because rather than read things in up to 2kB chunks, it
would return at most 64 bytes per read() system call.

The cost is mainly all in the "do system calls over and over", not
really in the new "copy to an extra kernel buffer".

This can be fixed by teaching the n_tty line discipline about the
"cookie continuation" model, which the chunking code supports because
things like hdlc need to be able to handle packets up to 64kB in size.

Doing that doesn't just get us back to the old performace, but to much
better performance: my stupid "copy 10MB of data over a pty" test
program is now almost twice as fast as it used to be (going down from
0.1s to 0.054s).

This is entirely because it now creates maximal chunks (which happens to
be "one byte less than one page" due to how we do the circular tty
buffers).

NOTE! This case only handles the simpler non-icanon case, which is the
one where people may care about throughput.  I'm going to do the icanon
case later too, because while performance isn't a major issue for that,
there may be programs that think they'll always get a full line and
don't like the 64-byte chunking for that reason.

Such programs are arguably buggy (signals etc can cause random partial
results from tty reads anyway), and good programs will handle such
partial reads, but expecting everybody to write "good programs" has
never been a winning policy for the kernel..

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 drivers/tty/n_tty.c | 52 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 42 insertions(+), 10 deletions(-)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index a02fe661f617..37bfd695011d 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -1945,19 +1945,17 @@ static inline int input_available_p(struct tty_struct *tty, int poll)
  *	Helper function to speed up n_tty_read.  It is only called when
  *	ICANON is off; it copies characters straight from the tty queue.
  *
- *	It can be profitably called twice; once to drain the space from
- *	the tail pointer to the (physical) end of the buffer, and once
- *	to drain the space from the (physical) beginning of the buffer
- *	to head pointer.
- *
  *	Called under the ldata->atomic_read_lock sem
  *
+ *	Returns true if it successfully copied data, but there is still
+ *	more data to be had.
+ *
  *	n_tty_read()/consumer path:
  *		caller holds non-exclusive termios_rwsem
  *		read_tail published
  */
 
-static void copy_from_read_buf(struct tty_struct *tty,
+static bool copy_from_read_buf(struct tty_struct *tty,
 				      unsigned char **kbp,
 				      size_t *nr)
 
@@ -1980,10 +1978,14 @@ static void copy_from_read_buf(struct tty_struct *tty,
 		/* Turn single EOF into zero-length read */
 		if (L_EXTPROC(tty) && ldata->icanon && is_eof &&
 		    (head == ldata->read_tail))
-			n = 0;
+			return false;
 		*kbp += n;
 		*nr -= n;
+
+		/* If we have more to copy, let the caller know */
+		return head != ldata->read_tail;
 	}
+	return false;
 }
 
 /**
@@ -2135,6 +2137,25 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 	int packet;
 	size_t tail;
 
+	/*
+	 * Is this a continuation of a read started earler?
+	 *
+	 * If so, we still hold the atomic_read_lock and the
+	 * termios_rwsem, and can just continue to copy data.
+	 */
+	if (*cookie) {
+		if (copy_from_read_buf(tty, &kb, &nr))
+			return kb - kbuf;
+
+		/* No more data - release locks and stop retries */
+		n_tty_kick_worker(tty);
+		n_tty_check_unthrottle(tty);
+		up_read(&tty->termios_rwsem);
+		mutex_unlock(&ldata->atomic_read_lock);
+		*cookie = NULL;
+		return kb - kbuf;
+	}
+
 	c = job_control(tty, file);
 	if (c < 0)
 		return c;
@@ -2231,9 +2252,20 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 				nr--;
 			}
 
-			/* See comment above copy_from_read_buf() why twice */
-			copy_from_read_buf(tty, &kb, &nr);
-			copy_from_read_buf(tty, &kb, &nr);
+			/*
+			 * Copy data, and if there is more to be had
+			 * and we have nothing more to wait for, then
+			 * let's mark us for retries.
+			 *
+			 * NOTE! We return here with both the termios_sem
+			 * and atomic_read_lock still held, the retries
+			 * will release them when done.
+			 */
+			if (copy_from_read_buf(tty, &kb, &nr) && kb - kbuf >= minimum) {
+				remove_wait_queue(&tty->read_wait, &wait);
+				*cookie = cookie;
+				return kb - kbuf;
+			}
 		}
 
 		n_tty_check_unthrottle(tty);
-- 
2.29.2.157.g1d47791a39


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-20  4:44                       ` Linus Torvalds
@ 2021-01-20  8:15                         ` Oliver Giles
  2021-01-21  1:18                         ` tty splice branch (was "Re: Splicing to/from a tty") Linus Torvalds
  2021-01-21 17:03                         ` Splicing to/from a tty Robert Karszniewicz
  2 siblings, 0 replies; 65+ messages in thread
From: Oliver Giles @ 2021-01-20  8:15 UTC (permalink / raw)
  To: Linus Torvalds, Robert Karszniewicz
  Cc: Greg Kroah-Hartman, Christoph Hellwig, Linux Kernel Mailing List,
	Al Viro, Jiri Slaby

On Wed Jan 20, 2021 at 5:44 PM NZDT, Linus Torvalds wrote:
>
> But in the meantime, here's two more patches to try on top of the
> previous four. They shouldn't matter, other than making the non-icanon
> throughput a lot better. But the more coverage they get, the happier
> I'll be.
>

I tried these out too, my use case is still working well.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-18 19:54         ` Al Viro
@ 2021-01-20 16:26           ` Al Viro
  2021-01-20 19:11             ` Al Viro
  0 siblings, 1 reply; 65+ messages in thread
From: Al Viro @ 2021-01-20 16:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, Johannes Berg, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman

On Mon, Jan 18, 2021 at 07:54:00PM +0000, Al Viro wrote:
> On Mon, Jan 18, 2021 at 11:46:42AM -0800, Linus Torvalds wrote:
> > On Mon, Jan 18, 2021 at 11:35 AM Al Viro <viro@zeniv-ca> wrote:
> > >
> > > I'd rather have sendfile(2) do what splice(2) does and handle pipes
> > > directly.  Let me take a look,,,
> > 
> > It's not technically just the sendfile() thing. Some things do
> > sendfile "internally" (ie they use do_splice_direct()).
> > 
> > Although maybe they always happen just on non-pipe destinations (ie
> > file-to-file copy). I didn't check.
> > 
> > But particularly if it can be handled in do_splice_direct(), that
> > would be a good thing.
> 
> FWIW, it might make sense to merge do_splice_direct() and do_splice();
> interfaces are very similar and do_splice() is basically
> 	if both are pipes
> 		....
> 	else if input is pipe
> 		....
> 	else if output is pipe
> 		....
> 	else
> 		return -EINVAL
> with do_splice_direct() being fairly close to the missing case.

Yecchhh...  The situation with checks is interesting.
	do_splice():
		in needs FMODE_READ, out - FMODE_WRITE [EBADF]
		if in is a pipe, off_in must be NULL.  [ESPIPE]
		if out is a pipe, off_out must be NULL.  [ESPIPE]
		if in is not a pipe
			non-NULL off_in is allowed only with FMODE_PREAD [EINVAL]
			rw_verify_area done on in
		if out is not a pipe
			non-NULL off_out is allowed only with FMODE_PWRITE [EINVAL]
			no O_APPEND on out [EINVAL]
			rw_verify_area done on out
		either in or out must be a pipe [EINVAL]
	do_splice_direct():
		out needs FMODE_WRITE [EBADF]
		rw_verify_area done on out
		no O_APPEND on out [EINVAL]
Callers:
	__do_splice() -> do_splice():
		if in is a pipe, off_in must be NULL.  [ESPIPE, duplicate]
		if out is a pipe, off_out must be NULL.  [ESPIPE, duplicate]

	io_uring io_splice() -> do_splice():
		no extra checks

        do_sendfile() -> do_splice_direct():
		in needs FMODE_READ [EBADF]
                out needs FMODE_WRITE [EBADF, duplicate]
		non-NULL off_in is allowed only with FMODE_PREAD [EINVAL]
		for out we act as splice with NULL off_out (== use ->f_pos)
		rw_verify_area done on in
		rw_verify_area done on out [duplicate]

	vfs_copy_file_range() -> do_copy_file_range() ->
		-> generic_copy_file_range() or __ceph_copy_file_range() ->
		-> do_splice_direct():
		both in and out must be regular [EINVAL or EISDIR]
		in needs FMODE_READ [EBADF]
		out needs FMODE_WRITE [EBADF]
		no O_APPEND on out [EBADF]
		on immutable for out [EPERM]
		rw_verify_area done on in
		rw_verify_area done on out [duplicate]
		FMODE_P{READ,WRITE} is ignored.  I wonder what happens if
			somebody starts playing with copy_file_range(2) on
			e.g. procfs or sysfs...

	ovl_copy_up_date() -> do_splice_direct():
		both in and out are hopefully regular - I'm not sure that copyup
			logics is sufficiently protected, TBH.
		FMODE_READ on in and FMODE_WRITE on out are guaranteed
		lack of O_APPEND on out is guaranteed
		FMODE_P{READ,WRITE} is completely ignored.  Should be OK, modulo
			the same issues with protection against manipulation of
			layers.

Checks in vfs_copy_file_range() need to remain there - do_splice_direct() is
not guaranteed downstream of that.

AFAICS, we have the following:
	in must have FMODE_READ
	out must have FMODE_WRITE
	no O_APPEND on out, unless a pipe[1]
	for non-pipe out we do rw_verify_area
	for non-pipe in we do rw_verify_area
	no offsets for pipes

The lack of offsets for pipes should've been "not without FMODE_P{READ,WRITE}",
but that's not entirely consistent - we have splice(2) fail with EINVAL in
case of non-NULL off_in on a file that doesn't allow pread(), except that
when this file happens to be a pipe (which obviously doesn't allow pread())
we fail with ESPIPE instead.  Note that for pread(2)/pwrite(2)/sendfile(2)
we fail with consistent ESPIPE in all such cases.

We also do not check FMODE_PREAD/FMODE_PWRITE for copy_file_range()/copyup.
Which is probably wrong.

Another inconsistency is that rw_verify_area() in copyup is done for
output (once for each chunk), but not for input.  The tricky part here
is that there's an LSM hook in the end of rw_verify_area()...

Can we turn "not a pipe, but lacks FMODE_PREAD" error value for splice(2)
from EINVAL to ESPIPE?  Would make the logics easier to consolidate...
Right now the checks are spread over many layers of several call chains
and rather hard to follow.

[1] yes, it is possible to have O_APPEND opened pipes - open a FIFO with
O_APPEND and you've got it.  We are not quite consistent in handling
those - sendfile() to such is rejected, splice() is not.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-20 16:26           ` Al Viro
@ 2021-01-20 19:11             ` Al Viro
  2021-01-20 19:27               ` Linus Torvalds
  0 siblings, 1 reply; 65+ messages in thread
From: Al Viro @ 2021-01-20 19:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, Johannes Berg, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman

On Wed, Jan 20, 2021 at 04:26:08PM +0000, Al Viro wrote:

> [1] yes, it is possible to have O_APPEND opened pipes - open a FIFO with
> O_APPEND and you've got it.  We are not quite consistent in handling
> those - sendfile() to such is rejected, splice() is not.

BTW, according to manpages of splice(2) and sendfile(2), we have
       EINVAL [snip] target file is opened in append mode [snip]
and
       EINVAL out_fd has the O_APPEND flag set.  This is not currently supported by sendfile().

However, splice(2) to FIFO opened with O_APPEND works just fine.  So
it doesn't match the manpage either.

Why do we care about O_APPEND on anything without FMODE_PWRITE (including
pipes), anyway?  All writes there ignore position, after all...

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-20 19:11             ` Al Viro
@ 2021-01-20 19:27               ` Linus Torvalds
  2021-01-20 22:25                 ` David Laight
  2021-01-20 23:14                 ` Al Viro
  0 siblings, 2 replies; 65+ messages in thread
From: Linus Torvalds @ 2021-01-20 19:27 UTC (permalink / raw)
  To: Al Viro
  Cc: Christoph Hellwig, Johannes Berg, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman

On Wed, Jan 20, 2021 at 11:11 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Why do we care about O_APPEND on anything without FMODE_PWRITE (including
> pipes), anyway?  All writes there ignore position, after all...

We shouldn't care.

Also, I think we should try to move away from FMODE_PWRITE/PREAD
entirely, and use FMODE_STREAM as the primary "this thing doesn't have
a position at all".

That's what gets rid of all the f_pos locking etc after all. The
FMODE_PWRITE/PREAD flags are I think legacy (although we do seem to
have the seq_file case that normally allows position on reads, but not
on writes, so we may need to keep all three bits).

Anyway, I think that with FMODE_STREAM, O_APPEND definitely should be a no-op.

            Linus

^ permalink raw reply	[flat|nested] 65+ messages in thread

* RE: Splicing to/from a tty
  2021-01-20 19:27               ` Linus Torvalds
@ 2021-01-20 22:25                 ` David Laight
  2021-01-20 23:02                   ` Al Viro
  2021-01-20 23:14                 ` Al Viro
  1 sibling, 1 reply; 65+ messages in thread
From: David Laight @ 2021-01-20 22:25 UTC (permalink / raw)
  To: 'Linus Torvalds', Al Viro
  Cc: Christoph Hellwig, Johannes Berg, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman

From: Linus Torvalds
> Sent: 20 January 2021 19:27
> On Wed, Jan 20, 2021 at 11:11 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Why do we care about O_APPEND on anything without FMODE_PWRITE (including
> > pipes), anyway?  All writes there ignore position, after all...
> 
> We shouldn't care.
> 
> Also, I think we should try to move away from FMODE_PWRITE/PREAD
> entirely, and use FMODE_STREAM as the primary "this thing doesn't have
> a position at all".
> 
> That's what gets rid of all the f_pos locking etc after all. The
> FMODE_PWRITE/PREAD flags are I think legacy (although we do seem to
> have the seq_file case that normally allows position on reads, but not
> on writes, so we may need to keep all three bits).
> 
> Anyway, I think that with FMODE_STREAM, O_APPEND definitely should be a no-op.

I also wonder if pread/pwrite with offset == 0 should be valid
on things where the offset makes no sense.

I'm rather surprised the offset isn't just silently ignored
for devices where seeking is non-sensical.
You might want to error it for mag tapes, but not pipes,
ttys, sockets etc.

I really can't remember what SYSV, Solaris or NetBSD do.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-20 22:25                 ` David Laight
@ 2021-01-20 23:02                   ` Al Viro
  0 siblings, 0 replies; 65+ messages in thread
From: Al Viro @ 2021-01-20 23:02 UTC (permalink / raw)
  To: David Laight
  Cc: 'Linus Torvalds',
	Christoph Hellwig, Johannes Berg, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman

On Wed, Jan 20, 2021 at 10:25:56PM +0000, David Laight wrote:

> I also wonder if pread/pwrite with offset == 0 should be valid
> on things where the offset makes no sense.
> 
> I'm rather surprised the offset isn't just silently ignored
> for devices where seeking is non-sensical.
> You might want to error it for mag tapes, but not pipes,
> ttys, sockets etc.
> 
> I really can't remember what SYSV, Solaris or NetBSD do.

... nor can you be arsed to RTFPOSIX.   Why am I not surprised?

In https://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html
(located by arcane action known as googling for pwrite POSIX):
==============================
The pwrite() function shall fail if:

[EINVAL]
	The file is a regular file or block special file, and the offset argument is negative.
The file offset shall remain unchanged.
[ESPIPE]
	The file is incapable of seeking.
==============================

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-20 19:27               ` Linus Torvalds
  2021-01-20 22:25                 ` David Laight
@ 2021-01-20 23:14                 ` Al Viro
  2021-01-20 23:40                   ` Linus Torvalds
  1 sibling, 1 reply; 65+ messages in thread
From: Al Viro @ 2021-01-20 23:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, Johannes Berg, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman

On Wed, Jan 20, 2021 at 11:27:26AM -0800, Linus Torvalds wrote:
> On Wed, Jan 20, 2021 at 11:11 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Why do we care about O_APPEND on anything without FMODE_PWRITE (including
> > pipes), anyway?  All writes there ignore position, after all...
> 
> We shouldn't care.
> 
> Also, I think we should try to move away from FMODE_PWRITE/PREAD
> entirely, and use FMODE_STREAM as the primary "this thing doesn't have
> a position at all".
> 
> That's what gets rid of all the f_pos locking etc after all. The
> FMODE_PWRITE/PREAD flags are I think legacy (although we do seem to
> have the seq_file case that normally allows position on reads, but not
> on writes, so we may need to keep all three bits).

Umm...  Why do we clear FMODE_PWRITE there, anyway?  It came in
commit 915a29ec1c5e34283a6231af1036114e4d612cb0
Author: Linus Torvalds <torvalds@ppc970.osdl.org>
Date:   Sat Aug 7 02:08:23 2004 -0700

    Add pread/pwrite support bits to match the lseek bit.
    
    This also removes the ESPIPE logic from pipes and seq_files,
    since the VFS layer now supports it.

with seq_read() losing the special-cased pread prevention and
seq_open() getting a ban on both pread and pwrite.  With
pread() support added in 2009, and (pointless) pwrite prohibition
left in place.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-20 23:14                 ` Al Viro
@ 2021-01-20 23:40                   ` Linus Torvalds
  2021-01-21  0:38                     ` Al Viro
  0 siblings, 1 reply; 65+ messages in thread
From: Linus Torvalds @ 2021-01-20 23:40 UTC (permalink / raw)
  To: Al Viro
  Cc: Christoph Hellwig, Johannes Berg, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman

On Wed, Jan 20, 2021 at 3:14 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Umm...  Why do we clear FMODE_PWRITE there [seq_open - ed], anyway?

I think it's pointless and historical, and comes from "several /proc
files supported the simple single-write model, nothing ever supported
moving around and writing".

The seq_file stuff was always about reading, and then the writing part
was generally random special-case hacks on the side.

So I think that "clear PWRITE" thing is to make sure we get sane error
cases if somebody tries something funny, knowing that none of the
hacky stuff support it.

And then the very special kernfs thing adds it back in, because it
does in fact allow seeking writes.

             Linus

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-20 23:40                   ` Linus Torvalds
@ 2021-01-21  0:38                     ` Al Viro
  2021-01-21  1:04                       ` Linus Torvalds
  0 siblings, 1 reply; 65+ messages in thread
From: Al Viro @ 2021-01-21  0:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, Johannes Berg, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman

On Wed, Jan 20, 2021 at 03:40:29PM -0800, Linus Torvalds wrote:
> On Wed, Jan 20, 2021 at 3:14 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Umm...  Why do we clear FMODE_PWRITE there [seq_open - ed], anyway?
> 
> I think it's pointless and historical, and comes from "several /proc
> files supported the simple single-write model, nothing ever supported
> moving around and writing".
> 
> The seq_file stuff was always about reading, and then the writing part
> was generally random special-case hacks on the side.
> 
> So I think that "clear PWRITE" thing is to make sure we get sane error
> cases if somebody tries something funny, knowing that none of the
> hacky stuff support it.
> 
> And then the very special kernfs thing adds it back in, because it
> does in fact allow seeking writes.

OK...  I wonder how many debugfs writable files allow pwrite() with
BS results...

Anyway, possibly more interesting question is why do we care about
O_APPEND at all - why not treat it the same way we do in write()?
Hell, even our pwrite() just goes ahead and writes to the end of
file, whatever position it had been given.  Yes, for pwrite(2) that's
contrary to POSIX, but it's probably cast in stone by that point
anyway...

Looking through the instances of ->splice_write(), iter_file_splice_write()
will end up appending the data to EOF and so will gfs2_file_splice_write().
For sockets (generic_splice_sendpage()) we definitely don't give a toss
about O_APPEND (F_SETFL can set it, so that case is possible to hit),
ditto for splice_write_null() and port_fops_splice_write().  Which leaves
only one instance: fuse_dev_splice_write(), which also should ignore
O_APPEND (IMO fuse_dev_open() ought to call nonseekable_open() anyway).

So... why do we ban O_APPEND on destination for splice() or for sendfile()?
AFAICS, if we simply remove that test, we'll end up with write going to
the end of O_APPEND file. same as for write()/pwrite().

Comments?

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-21  0:38                     ` Al Viro
@ 2021-01-21  1:04                       ` Linus Torvalds
  2021-01-21  1:45                         ` Al Viro
  2021-01-21 10:08                         ` David Laight
  0 siblings, 2 replies; 65+ messages in thread
From: Linus Torvalds @ 2021-01-21  1:04 UTC (permalink / raw)
  To: Al Viro
  Cc: Christoph Hellwig, Johannes Berg, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman

On Wed, Jan 20, 2021 at 4:38 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> OK...  I wonder how many debugfs writable files allow pwrite() with
> BS results...

I hope some of them check for "pos == 0" when they start parsing integers.

But honestly, I don't think it's a big deal. We've had these things
that just basically assume that whenever you write, the offset just
doesn't matter at all, and as long as some number comes in one single
write call, we accept it.

Because even if you end up doing something like just

   echo $SOMETHING > /sys/xyz/abc

and that "$SOMETHING" could be done multiple writes, in practice it
all works out just fine and it never really is. You almost have to try
to screw up with something like

  (echo -n 3; echo -n 4) > /sys/xyz/abc

to actually see two writes of "3" and "4" instead of one write with
"34". And honestly, if somebody does something like that, do we really
care? They might get 3, they might get 4, and they might get 34. They
get what they deserve.

> Anyway, possibly more interesting question is why do we care about
> O_APPEND at all - why not treat it the same way we do in write()?

The whole point of O_APPEND is that the position shouldn't matter.

And the whole point of "pwrite()" is that you specify a position.

So the two just do not go together - although we may have legacy
issues, of course.

In contrast, the whole point of just a plain "write()" is that the
position is the "current file position", with O_APPEND is just a
special rule for what the current position for a write is.

Now, splice() is able to do *both* write() and pwrite(), because
unlike pwrite() it doesn't take a "pos" argument, it takes a _pointer_
to pos. So with a NULL pointer, it's like read/write, and with a
non-NULL pointer it is like pread/pwrite.

So I do think that "splice with non-NULL off_out and O_APPEND" should
cause an error in general.

That said, we probabyl have legacy behavior with splice and pipes in
particular, and that legacy behavior would override any "this is
conceptually the sane model".

> So... why do we ban O_APPEND on destination for splice() or for sendfile()?

sendfile() shouldn't be an issue. The offset pointer for sendfile is
for the _source_, not the destination.

For splice(), I do think that O_APPEND and a position pointer don't
make sense as a combination, although if we do allow it for regular
file pwrite() (I didn't check), maybe we could allow it for splice()
too just to be erqually inconsistent.

Honestly, I don't think it's a huge deal. O_APPEND isn't that
interesting, but I do hope that if we allow O_APPEND and a file
position, then O_APPEND always overrides it.

           Linus

^ permalink raw reply	[flat|nested] 65+ messages in thread

* tty splice branch (was "Re: Splicing to/from a tty")
  2021-01-20  4:44                       ` Linus Torvalds
  2021-01-20  8:15                         ` Oliver Giles
@ 2021-01-21  1:18                         ` Linus Torvalds
  2021-01-21  8:44                           ` Greg Kroah-Hartman
  2021-01-21  8:50                           ` Jiri Slaby
  2021-01-21 17:03                         ` Splicing to/from a tty Robert Karszniewicz
  2 siblings, 2 replies; 65+ messages in thread
From: Linus Torvalds @ 2021-01-21  1:18 UTC (permalink / raw)
  To: Oliver Giles, Robert Karszniewicz, Greg Kroah-Hartman
  Cc: Christoph Hellwig, Linux Kernel Mailing List, Al Viro, Jiri Slaby

On Tue, Jan 19, 2021 at 8:44 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I'll come back to this tomorrow and do the line-buffered icanon case
> too (unless pull requests pile up), and then I'll be happy with the
> tty changes, and I think I can submit this series for real to Greg.

Greg, I don't know how you want to handle this.

I have a branch with my tty splice patches at

    git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git tty-splice

and that now includes doing that "cookie continuation" thing even for
the N_TTY icanon modes.

It passes my local tests, and I did try a few rather odd things. And
Oliver tested an ealier version without that final commit on his load.
But...

That tty splice thing is clearly a regression, but it's not like we
have seen a lot of reports of it, so it's clearly a very special
thing.

End result: I'm leaving it to you to decide how you want to handle it.
You can tell me to just merge it myself as a regression fix, despite
it being fairly late in the 5.11 series.  Or you can pull it into your
tty tree for linux-next and 5.12. And we can just plan to backport it
(for 5.10 and 5.11) later when it has had more wide testing.

Another alternative is to do just that first patch immediately (the
"tty: implement write_iter" one), because that one should be the
simple case that gets sendfile() and splice() working when the
_destination_ is a tty. The "source is a tty" is the much more complex
case that the other patches deal with.

         Linus

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-21  1:04                       ` Linus Torvalds
@ 2021-01-21  1:45                         ` Al Viro
  2021-01-21  3:38                           ` Linus Torvalds
  2021-01-21 10:08                         ` David Laight
  1 sibling, 1 reply; 65+ messages in thread
From: Al Viro @ 2021-01-21  1:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, Johannes Berg, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman

On Wed, Jan 20, 2021 at 05:04:17PM -0800, Linus Torvalds wrote:

> The whole point of O_APPEND is that the position shouldn't matter.
> 
> And the whole point of "pwrite()" is that you specify a position.
> 
> So the two just do not go together - although we may have legacy
> issues, of course.

Our pwrite(2):
BUGS
       POSIX  requires  that  opening  a  file  with the O_APPEND flag
       should have no effect on the location at which pwrite() writes
       data.  However, on Linux, if a file is opened with O_APPEND,
       pwrite() appends data to the  end of the file, regardless of
       the value of offset.
POSIX pwrite(2):
	The pwrite() function shall be equivalent to write(), except that
	it writes into a given position and does not change the file offset
	(regardless of whether O_APPEND is set).  The first three arguments
	to pwrite() are the same as write() with the addition of a fourth
	argument offset for the desired position inside the file.  An attempt
	to perform a pwrite() on a file that is incapable of seeking shall
	result in an error.

I don't believe that we could change behaviour of our pwrite(2) without
breaking userland, even if we wanted to.  It's been that way since
2.1.60 when pwrite() had been first introduced - 23 years ago is more
than enough to have it cast in stone.  We do allow pwrite(2) with O_APPEND
and on such descriptors it acts like write(2) on the same.

> Now, splice() is able to do *both* write() and pwrite(), because
> unlike pwrite() it doesn't take a "pos" argument, it takes a _pointer_
> to pos. So with a NULL pointer, it's like read/write, and with a
> non-NULL pointer it is like pread/pwrite.
> 
> So I do think that "splice with non-NULL off_out and O_APPEND" should
> cause an error in general.

splice() triggers an error for seekable destination with O_APPEND and
with NULL off_out.  Same for splice() to socket with
	fcntl(sock_fd, F_SETFL, O_APPEND);
done first.
 
> Honestly, I don't think it's a huge deal. O_APPEND isn't that
> interesting, but I do hope that if we allow O_APPEND and a file
> position, then O_APPEND always overrides it.

It does, when it is allowed.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-21  1:45                         ` Al Viro
@ 2021-01-21  3:38                           ` Linus Torvalds
  2021-01-21  6:05                             ` Willy Tarreau
  0 siblings, 1 reply; 65+ messages in thread
From: Linus Torvalds @ 2021-01-21  3:38 UTC (permalink / raw)
  To: Al Viro
  Cc: Christoph Hellwig, Johannes Berg, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman

On Wed, Jan 20, 2021 at 5:45 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> splice() triggers an error for seekable destination with O_APPEND and
> with NULL off_out.

Ok, that's just broken.

> Same for splice() to socket with
>         fcntl(sock_fd, F_SETFL, O_APPEND);
> done first.

Same.

As long as you don't pass a position pointer, I think both should just work.

Not that I imagine it matters for a lot of people..

              Linus

           Linus

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-21  3:38                           ` Linus Torvalds
@ 2021-01-21  6:05                             ` Willy Tarreau
  2021-01-21  8:04                               ` Johannes Berg
  0 siblings, 1 reply; 65+ messages in thread
From: Willy Tarreau @ 2021-01-21  6:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Christoph Hellwig, Johannes Berg, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman

On Wed, Jan 20, 2021 at 07:38:38PM -0800, Linus Torvalds wrote:
> On Wed, Jan 20, 2021 at 5:45 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > splice() triggers an error for seekable destination with O_APPEND and
> > with NULL off_out.
> 
> Ok, that's just broken.
> 
> > Same for splice() to socket with
> >         fcntl(sock_fd, F_SETFL, O_APPEND);
> > done first.
> 
> Same.
> 
> As long as you don't pass a position pointer, I think both should just work.
> 
> Not that I imagine it matters for a lot of people..

I think that most users of splice() on sockets got used to falling back
to recv/send on splice failure due to various cases not being supported
historically (UNIX family sockets immediately come to my mind but I seem
to remember other combinations). Thus I guess that most users of splice()
detect that it doesn't work either due to lower than expected performance
or while running strace.

Willy

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-21  6:05                             ` Willy Tarreau
@ 2021-01-21  8:04                               ` Johannes Berg
  0 siblings, 0 replies; 65+ messages in thread
From: Johannes Berg @ 2021-01-21  8:04 UTC (permalink / raw)
  To: Willy Tarreau, Linus Torvalds
  Cc: Al Viro, Christoph Hellwig, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman

On Thu, 2021-01-21 at 07:05 +0100, Willy Tarreau wrote:

> I think that most users of splice() on sockets got used to falling back
> to recv/send on splice failure due to various cases not being supported
> historically (UNIX family sockets immediately come to my mind but I seem
> to remember other combinations).

Note, however, that I got here because cgit, if using sendfile(), does
*not* fall back if it fails (and thus my git tree view is currently down
because I haven't downgraded the kernel so far). That may not be common
for splice() though.

johannes


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: tty splice branch (was "Re: Splicing to/from a tty")
  2021-01-21  1:18                         ` tty splice branch (was "Re: Splicing to/from a tty") Linus Torvalds
@ 2021-01-21  8:44                           ` Greg Kroah-Hartman
  2021-01-21  8:50                           ` Jiri Slaby
  1 sibling, 0 replies; 65+ messages in thread
From: Greg Kroah-Hartman @ 2021-01-21  8:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Oliver Giles, Robert Karszniewicz, Christoph Hellwig,
	Linux Kernel Mailing List, Al Viro, Jiri Slaby

On Wed, Jan 20, 2021 at 05:18:36PM -0800, Linus Torvalds wrote:
> On Tue, Jan 19, 2021 at 8:44 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > I'll come back to this tomorrow and do the line-buffered icanon case
> > too (unless pull requests pile up), and then I'll be happy with the
> > tty changes, and I think I can submit this series for real to Greg.
> 
> Greg, I don't know how you want to handle this.
> 
> I have a branch with my tty splice patches at
> 
>     git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git tty-splice
> 
> and that now includes doing that "cookie continuation" thing even for
> the N_TTY icanon modes.
> 
> It passes my local tests, and I did try a few rather odd things. And
> Oliver tested an ealier version without that final commit on his load.
> But...
> 
> That tty splice thing is clearly a regression, but it's not like we
> have seen a lot of reports of it, so it's clearly a very special
> thing.
> 
> End result: I'm leaving it to you to decide how you want to handle it.
> You can tell me to just merge it myself as a regression fix, despite
> it being fairly late in the 5.11 series.  Or you can pull it into your
> tty tree for linux-next and 5.12. And we can just plan to backport it
> (for 5.10 and 5.11) later when it has had more wide testing.
> 
> Another alternative is to do just that first patch immediately (the
> "tty: implement write_iter" one), because that one should be the
> simple case that gets sendfile() and splice() working when the
> _destination_ is a tty. The "source is a tty" is the much more complex
> case that the other patches deal with.

Let me do this last thing.  I've taken your one patch into my
"tty-linus" branch and will go beat on it for a day and then ask you to
pull it in for the next 5.11-rc release, and I've taken your full series
into my "tty-next" branch so it will get much wider testing in
linux-next for a few weeks.  If it turns out that we get reports of the
"splice/sendfile from a tty", we can always merge them into 5.11 and
5.10 as needed.

Thanks for doing this work,

greg k-h

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: tty splice branch (was "Re: Splicing to/from a tty")
  2021-01-21  1:18                         ` tty splice branch (was "Re: Splicing to/from a tty") Linus Torvalds
  2021-01-21  8:44                           ` Greg Kroah-Hartman
@ 2021-01-21  8:50                           ` Jiri Slaby
  2021-01-21  8:58                             ` Jiri Slaby
  2021-01-21  8:58                             ` Greg Kroah-Hartman
  1 sibling, 2 replies; 65+ messages in thread
From: Jiri Slaby @ 2021-01-21  8:50 UTC (permalink / raw)
  To: Linus Torvalds, Oliver Giles, Robert Karszniewicz, Greg Kroah-Hartman
  Cc: Christoph Hellwig, Linux Kernel Mailing List, Al Viro, Jiri Slaby

On 21. 01. 21, 2:18, Linus Torvalds wrote:
> On Tue, Jan 19, 2021 at 8:44 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> I'll come back to this tomorrow and do the line-buffered icanon case
>> too (unless pull requests pile up), and then I'll be happy with the
>> tty changes, and I think I can submit this series for real to Greg.
> 
> Greg, I don't know how you want to handle this.
> 
> I have a branch with my tty splice patches at
> 
>      git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git tty-splice
> 
> and that now includes doing that "cookie continuation" thing even for
> the N_TTY icanon modes.
> 
> It passes my local tests, and I did try a few rather odd things. And
> Oliver tested an ealier version without that final commit on his load.
> But...

Hm, I would like to review this first. I noticed the changes only 
because a new branch appeared when I grabbed your tree and the branch 
has "tty" in its name.

So for example:

> @@ -1038,18 +1045,15 @@ static ssize_t tty_write(struct file *file, const char __user *buf,
>         if (tty->ops->write_room == NULL)
>                 tty_err(tty, "missing write_room method\n");
>         ld = tty_ldisc_ref_wait(tty);
> -       if (!ld)
> -               return hung_up_tty_write(file, buf, count, ppos);
> -       if (!ld->ops->write)
> +       if (!ld || !ld->ops->write)
>                 ret = -EIO;
>         else
> -               ret = do_tty_write(ld->ops->write, tty, file, buf, count);
> +               ret = do_tty_write(ld->ops->write, tty, file, from);
>         tty_ldisc_deref(ld);

if ld == NULL => crash here.

So can you send the patches to the list and Cc me too?

>         return ret;
>  }
thanks,
-- 
js

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: tty splice branch (was "Re: Splicing to/from a tty")
  2021-01-21  8:50                           ` Jiri Slaby
@ 2021-01-21  8:58                             ` Jiri Slaby
  2021-01-21 17:52                               ` Linus Torvalds
  2021-01-21  8:58                             ` Greg Kroah-Hartman
  1 sibling, 1 reply; 65+ messages in thread
From: Jiri Slaby @ 2021-01-21  8:58 UTC (permalink / raw)
  To: Linus Torvalds, Oliver Giles, Robert Karszniewicz, Greg Kroah-Hartman
  Cc: Christoph Hellwig, Linux Kernel Mailing List, Al Viro

On 21. 01. 21, 9:50, Jiri Slaby wrote:
> Hm, I would like to review this first. I noticed the changes only 
> because a new branch appeared when I grabbed your tree and the branch 
> has "tty" in its name.

Which is weird as you Cced me. Let me check what is wrong with my e-mail 
setup.

thanks,
-- 
js

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: tty splice branch (was "Re: Splicing to/from a tty")
  2021-01-21  8:50                           ` Jiri Slaby
  2021-01-21  8:58                             ` Jiri Slaby
@ 2021-01-21  8:58                             ` Greg Kroah-Hartman
  1 sibling, 0 replies; 65+ messages in thread
From: Greg Kroah-Hartman @ 2021-01-21  8:58 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Linus Torvalds, Oliver Giles, Robert Karszniewicz,
	Christoph Hellwig, Linux Kernel Mailing List, Al Viro

On Thu, Jan 21, 2021 at 09:50:39AM +0100, Jiri Slaby wrote:
> On 21. 01. 21, 2:18, Linus Torvalds wrote:
> > On Tue, Jan 19, 2021 at 8:44 PM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > > 
> > > I'll come back to this tomorrow and do the line-buffered icanon case
> > > too (unless pull requests pile up), and then I'll be happy with the
> > > tty changes, and I think I can submit this series for real to Greg.
> > 
> > Greg, I don't know how you want to handle this.
> > 
> > I have a branch with my tty splice patches at
> > 
> >      git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git tty-splice
> > 
> > and that now includes doing that "cookie continuation" thing even for
> > the N_TTY icanon modes.
> > 
> > It passes my local tests, and I did try a few rather odd things. And
> > Oliver tested an ealier version without that final commit on his load.
> > But...
> 
> Hm, I would like to review this first. I noticed the changes only because a
> new branch appeared when I grabbed your tree and the branch has "tty" in its
> name.
> 
> So for example:
> 
> > @@ -1038,18 +1045,15 @@ static ssize_t tty_write(struct file *file, const char __user *buf,
> >         if (tty->ops->write_room == NULL)
> >                 tty_err(tty, "missing write_room method\n");
> >         ld = tty_ldisc_ref_wait(tty);
> > -       if (!ld)
> > -               return hung_up_tty_write(file, buf, count, ppos);
> > -       if (!ld->ops->write)
> > +       if (!ld || !ld->ops->write)
> >                 ret = -EIO;
> >         else
> > -               ret = do_tty_write(ld->ops->write, tty, file, buf, count);
> > +               ret = do_tty_write(ld->ops->write, tty, file, from);
> >         tty_ldisc_deref(ld);
> 
> if ld == NULL => crash here.
> 
> So can you send the patches to the list and Cc me too?

I'll send them out right now, I've already merged them to my branches.

greg k-h

^ permalink raw reply	[flat|nested] 65+ messages in thread

* RE: Splicing to/from a tty
  2021-01-21  1:04                       ` Linus Torvalds
  2021-01-21  1:45                         ` Al Viro
@ 2021-01-21 10:08                         ` David Laight
  1 sibling, 0 replies; 65+ messages in thread
From: David Laight @ 2021-01-21 10:08 UTC (permalink / raw)
  To: 'Linus Torvalds', Al Viro
  Cc: Christoph Hellwig, Johannes Berg, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman

From: Linus Torvalds
> Sent: 21 January 2021 01:04
> On Wed, Jan 20, 2021 at 4:38 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > OK...  I wonder how many debugfs writable files allow pwrite() with
> > BS results...
> 
> I hope some of them check for "pos == 0" when they start parsing integers.
> 
> But honestly, I don't think it's a big deal. We've had these things
> that just basically assume that whenever you write, the offset just
> doesn't matter at all, and as long as some number comes in one single
> write call, we accept it.
> 
> Because even if you end up doing something like just
> 
>    echo $SOMETHING > /sys/xyz/abc
> 
> and that "$SOMETHING" could be done multiple writes, in practice it
> all works out just fine and it never really is. You almost have to try
> to screw up with something like
> 
>   (echo -n 3; echo -n 4) > /sys/xyz/abc
> 
> to actually see two writes of "3" and "4" instead of one write with
> "34". And honestly, if somebody does something like that, do we really
> care? They might get 3, they might get 4, and they might get 34. They
> get what they deserve.

Or worse:
	echo abc >/sys/xyz/abc
	echo x | dd bs=1 count=2 oseek=1 conv=notrunc of=/sys/xyz/abc
which (if I got the dd command right) would generate "axc" on a real file.

OTOH multiple short reads are quite likely.
Best not done on a counter...

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-20  4:44                       ` Linus Torvalds
  2021-01-20  8:15                         ` Oliver Giles
  2021-01-21  1:18                         ` tty splice branch (was "Re: Splicing to/from a tty") Linus Torvalds
@ 2021-01-21 17:03                         ` Robert Karszniewicz
  2021-01-21 18:43                           ` Linus Torvalds
  2 siblings, 1 reply; 65+ messages in thread
From: Robert Karszniewicz @ 2021-01-21 17:03 UTC (permalink / raw)
  To: Linus Torvalds, Oliver Giles
  Cc: Greg Kroah-Hartman, Christoph Hellwig, Linux Kernel Mailing List,
	Al Viro, Jiri Slaby

On 1/20/21 5:44 AM, Linus Torvalds wrote:
> On Tue, Jan 19, 2021 at 5:29 PM Oliver Giles <ohw.giles@gmail.com> wrote:
>>
>> Writing this from a kernel with those patches in; happily splice()ing
>> to and from a pty.
> 
> Ok, good.
> 
> I have a couple of improvement patches on top of those, that I'm attaching here.
> 
> [...]
> 
> But in the meantime, here's two more patches to try on top of the
> previous four. They shouldn't matter, other than making the non-icanon
> throughput a lot better. But the more coverage they get, the happier
> I'll be.

I confirm that the 4 patches, as well as the 4+2 patches, fix the regression I
noticed with cat failing on sendfile() to ttymxc0.

Thanks,
Robert

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: tty splice branch (was "Re: Splicing to/from a tty")
  2021-01-21  8:58                             ` Jiri Slaby
@ 2021-01-21 17:52                               ` Linus Torvalds
  0 siblings, 0 replies; 65+ messages in thread
From: Linus Torvalds @ 2021-01-21 17:52 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Oliver Giles, Robert Karszniewicz, Greg Kroah-Hartman,
	Christoph Hellwig, Linux Kernel Mailing List, Al Viro

On Thu, Jan 21, 2021 at 12:58 AM Jiri Slaby <jirislaby@kernel.org> wrote:
>
> Which is weird as you Cced me. Let me check what is wrong with my e-mail
> setup.

Yeah, I think you were cc'd on not just the patches, but on all the
discussion that preceded them.

But better late than never. I was actually much more nervous about the
much more subtle "cookie continuation" stuff, not the trivial
conversion patches.

Which just goes to show that most bugs are when you don't really think
about it because it's "trivial".

            Linus

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-21 17:03                         ` Splicing to/from a tty Robert Karszniewicz
@ 2021-01-21 18:43                           ` Linus Torvalds
  0 siblings, 0 replies; 65+ messages in thread
From: Linus Torvalds @ 2021-01-21 18:43 UTC (permalink / raw)
  To: Robert Karszniewicz
  Cc: Oliver Giles, Greg Kroah-Hartman, Christoph Hellwig,
	Linux Kernel Mailing List, Al Viro, Jiri Slaby

On Thu, Jan 21, 2021 at 9:03 AM Robert Karszniewicz
<r.karszniewicz@phytec.de> wrote:
>
> I confirm that the 4 patches, as well as the 4+2 patches, fix the regression I
> noticed with cat failing on sendfile() to ttymxc0.

Thanks. Now we just need to find somebody with HDLC use cases and
we'll actually have this series properly tested...

          Linus

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-18 19:45         ` Al Viro
  2021-01-18 19:49           ` Linus Torvalds
@ 2021-01-24 19:11           ` Linus Torvalds
  2021-01-25  9:16             ` [PATCH] fs/pipe: allow sendfile() to pipe again Johannes Berg
  2021-01-26  6:07             ` Splicing to/from a tty Al Viro
  1 sibling, 2 replies; 65+ messages in thread
From: Linus Torvalds @ 2021-01-24 19:11 UTC (permalink / raw)
  To: Al Viro
  Cc: Johannes Berg, Christoph Hellwig, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman

Al,
 coming back to this because rc5 is imminent..

On Mon, Jan 18, 2021 at 11:45 AM Al Viro <viro@zeniv-ca> wrote:
>
> do_splice_direct() does something that do_splice() won't - it
> handles non-pipe to non-pipe case.  Which is how sendfile(2) is
> normally used, of course.
>
> I'll look into that in more details, but IMO bothering with
> internal pipe is just plain wrong for those cases.

You clearly thought about this, with the emails about odd error cases,
but I get the feeling that for fixing the current "you can't
sendfile() to a pipe" regression (including stable) we should do the
one-liner. No?

I agree that it would be better fixed by just having sendfile()
basically turn into splice() for the pipe target case, but I haven't
seen any patches from you, so I assume it wasn't 100% trivial.

Hmm?

              Linus

^ permalink raw reply	[flat|nested] 65+ messages in thread

* [PATCH] fs/pipe: allow sendfile() to pipe again
  2021-01-24 19:11           ` Linus Torvalds
@ 2021-01-25  9:16             ` Johannes Berg
  2021-01-25 10:16               ` Christoph Hellwig
  2021-01-25 20:34               ` Linus Torvalds
  2021-01-26  6:07             ` Splicing to/from a tty Al Viro
  1 sibling, 2 replies; 65+ messages in thread
From: Johannes Berg @ 2021-01-25  9:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Christoph Hellwig, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman

After commit 36e2c7421f02 ("fs: don't allow splice read/write
without explicit ops") sendfile() could no longer send data
from a real file to a pipe, breaking for example certain cgit
setups (e.g. when running behind fcgiwrap), because in this
case cgit will try to do exactly this: sendfile() to a pipe.

Fix this by using iter_file_splice_write for the splice_write
method of pipes, as suggested by Christoph.

Cc: stable@vger.kernel.org
Fixes: 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops")
Suggested-by: Christoph Hellwig <hch@lst.de>
Tested-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
---
Mostly for the record, since it looks like we'll have a proper
fix without another intermediate pipe. However, this fixes the
regression for now.
---
 fs/pipe.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/pipe.c b/fs/pipe.c
index c5989cfd564d..39c96845a72f 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -1206,6 +1206,7 @@ const struct file_operations pipefifo_fops = {
 	.unlocked_ioctl	= pipe_ioctl,
 	.release	= pipe_release,
 	.fasync		= pipe_fasync,
+	.splice_write	= iter_file_splice_write,
 };
 
 /*
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: [PATCH] fs/pipe: allow sendfile() to pipe again
  2021-01-25  9:16             ` [PATCH] fs/pipe: allow sendfile() to pipe again Johannes Berg
@ 2021-01-25 10:16               ` Christoph Hellwig
  2021-01-25 20:34               ` Linus Torvalds
  1 sibling, 0 replies; 65+ messages in thread
From: Christoph Hellwig @ 2021-01-25 10:16 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Linus Torvalds, Al Viro, Christoph Hellwig, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH] fs/pipe: allow sendfile() to pipe again
  2021-01-25  9:16             ` [PATCH] fs/pipe: allow sendfile() to pipe again Johannes Berg
  2021-01-25 10:16               ` Christoph Hellwig
@ 2021-01-25 20:34               ` Linus Torvalds
  1 sibling, 0 replies; 65+ messages in thread
From: Linus Torvalds @ 2021-01-25 20:34 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Al Viro, Christoph Hellwig, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman

On Mon, Jan 25, 2021 at 1:16 AM Johannes Berg <johannes@sipsolutions.net> wrote:
>
> Mostly for the record, since it looks like we'll have a proper
> fix without another intermediate pipe. However, this fixes the
> regression for now.

I've applied this anyway.

We'll see how big Al's proper fix ends up being, and depending on that
this one-liner might just be the right thing for stable anyway.

And if Al's patch ends up being small, simple and clean, this
one-liner won't matter, but it's not like there's any real downside to
it.

           Linus

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-24 19:11           ` Linus Torvalds
  2021-01-25  9:16             ` [PATCH] fs/pipe: allow sendfile() to pipe again Johannes Berg
@ 2021-01-26  6:07             ` Al Viro
  2021-01-26  6:08               ` [PATCH 1/3] do_splice_to(): move the logics for limiting the read length in Al Viro
                                 ` (3 more replies)
  1 sibling, 4 replies; 65+ messages in thread
From: Al Viro @ 2021-01-26  6:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Johannes Berg, Christoph Hellwig, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman

On Sun, Jan 24, 2021 at 11:11:42AM -0800, Linus Torvalds wrote:
> Al,
>  coming back to this because rc5 is imminent..
> 
> On Mon, Jan 18, 2021 at 11:45 AM Al Viro <viro@zeniv-ca> wrote:
> >
> > do_splice_direct() does something that do_splice() won't - it
> > handles non-pipe to non-pipe case.  Which is how sendfile(2) is
> > normally used, of course.
> >
> > I'll look into that in more details, but IMO bothering with
> > internal pipe is just plain wrong for those cases.
> 
> You clearly thought about this, with the emails about odd error cases,
> but I get the feeling that for fixing the current "you can't
> sendfile() to a pipe" regression (including stable) we should do the
> one-liner. No?
> 
> I agree that it would be better fixed by just having sendfile()
> basically turn into splice() for the pipe target case, but I haven't
> seen any patches from you, so I assume it wasn't 100% trivial.
> 
> Hmm?

Just to make clear - sendfile() regular-to-pipe is *not* the same
issue as splice to/from tty.  The latter needs ->splice_read()
and ->splice_write() in tty_fops; the former can be dealt with
by teaching do_sendifile() to use the guts of do_splice() in
case when in or out happens to be a pipe (with some rearrangement
of checks) instead of bothering with do_splice_direct().

The only commonality between these two is that both got broken
by the same patch series.  Johannes' patch is an attempt to
deal with regular-to-pipe sendfile(2), and it's not a good
way to handle that.  Neither it, nor the variant I proposed
would do a damn thing for tty (and sendfile(2) never worked
for source other than regular or block anyway).  FWIW, the real
check in do_splice_direct() should be "has FMODE_PREAD", regardless
of the position we are asking to read from - do_splice_direct()
is basically
	while left to copy
		splice_read from in to internal pipe
		splice_write from internal pipe to out
and if splice_write ends up with short copy, we advance the position
by the amount written and discard everything left in the internal pipe.
Using it for something non-seekable would end up with data silently
lost on short copy.

Note that decision against splice(2) between non-pipes appears
to have been deliberate and unless Jens (and mingo?) decide
they are OK with that change, I'm *not* adding that functionality
to do_splice().

Anyway, the series is in vfs.git #work.sendfile, 5.11-rc1-based

Shortlog:
Al Viro (3):
      do_splice_to(): move the logics for limiting the read length in
      take the guts of file-to-pipe splice into a helper function
      teach sendfile(2) to handle send-to-pipe directly

Diffstat:
 fs/internal.h   |  9 +++++++++
 fs/read_write.c | 19 +++++++++++++------
 fs/splice.c     | 42 +++++++++++++++++++++++-------------------
 3 files changed, 45 insertions(+), 25 deletions(-)

Individual patches in followups; first two are equivalent transformations
(fairly obvious cleanup and taking a part of do_splice() into a helper),
while the third one makes do_sendfile() use that new helper for file-to-pipe
case.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* [PATCH 1/3] do_splice_to(): move the logics for limiting the read length in
  2021-01-26  6:07             ` Splicing to/from a tty Al Viro
@ 2021-01-26  6:08               ` Al Viro
  2021-01-26  6:09               ` [PATCH 2/3] take the guts of file-to-pipe splice into a helper function Al Viro
                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 65+ messages in thread
From: Al Viro @ 2021-01-26  6:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Johannes Berg, Christoph Hellwig, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman

Both callers have the identical logics limiting the amount of
data we try to read into pipe - no more than would fit into
that pipe.  Move that into do_splice_to() itself.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/splice.c | 21 +++++++--------------
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 866d5c2367b2..c1ca2cc63b43 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -771,11 +771,16 @@ static long do_splice_to(struct file *in, loff_t *ppos,
 			 struct pipe_inode_info *pipe, size_t len,
 			 unsigned int flags)
 {
+	unsigned int p_space;
 	int ret;
 
 	if (unlikely(!(in->f_mode & FMODE_READ)))
 		return -EBADF;
 
+	/* Don't try to read more the pipe has space for. */
+	p_space = pipe->max_usage - pipe_occupancy(pipe->head, pipe->tail);
+	len = min_t(size_t, len, p_space << PAGE_SHIFT);
+
 	ret = rw_verify_area(READ, in, ppos, len);
 	if (unlikely(ret < 0))
 		return ret;
@@ -856,15 +861,10 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
 	WARN_ON_ONCE(!pipe_empty(pipe->head, pipe->tail));
 
 	while (len) {
-		unsigned int p_space;
 		size_t read_len;
 		loff_t pos = sd->pos, prev_pos = pos;
 
-		/* Don't try to read more the pipe has space for. */
-		p_space = pipe->max_usage -
-			pipe_occupancy(pipe->head, pipe->tail);
-		read_len = min_t(size_t, len, p_space << PAGE_SHIFT);
-		ret = do_splice_to(in, &pos, pipe, read_len, flags);
+		ret = do_splice_to(in, &pos, pipe, len, flags);
 		if (unlikely(ret <= 0))
 			goto out_release;
 
@@ -1083,15 +1083,8 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out,
 
 		pipe_lock(opipe);
 		ret = wait_for_space(opipe, flags);
-		if (!ret) {
-			unsigned int p_space;
-
-			/* Don't try to read more the pipe has space for. */
-			p_space = opipe->max_usage - pipe_occupancy(opipe->head, opipe->tail);
-			len = min_t(size_t, len, p_space << PAGE_SHIFT);
-
+		if (!ret)
 			ret = do_splice_to(in, &offset, opipe, len, flags);
-		}
 		pipe_unlock(opipe);
 		if (ret > 0)
 			wakeup_pipe_readers(opipe);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 2/3] take the guts of file-to-pipe splice into a helper function
  2021-01-26  6:07             ` Splicing to/from a tty Al Viro
  2021-01-26  6:08               ` [PATCH 1/3] do_splice_to(): move the logics for limiting the read length in Al Viro
@ 2021-01-26  6:09               ` Al Viro
  2021-01-26  6:09               ` [PATCH 3/3] teach sendfile(2) to handle send-to-pipe directly Al Viro
  2021-01-26 18:49               ` Splicing to/from a tty Linus Torvalds
  3 siblings, 0 replies; 65+ messages in thread
From: Al Viro @ 2021-01-26  6:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Johannes Berg, Christoph Hellwig, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman


Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/splice.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index c1ca2cc63b43..74f968c65a93 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1002,6 +1002,23 @@ static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe,
 			       struct pipe_inode_info *opipe,
 			       size_t len, unsigned int flags);
 
+static long splice_file_to_pipe(struct file *in,
+			 struct pipe_inode_info *opipe,
+			 loff_t *offset,
+			 size_t len, unsigned int flags)
+{
+	long ret;
+
+	pipe_lock(opipe);
+	ret = wait_for_space(opipe, flags);
+	if (!ret)
+		ret = do_splice_to(in, offset, opipe, len, flags);
+	pipe_unlock(opipe);
+	if (ret > 0)
+		wakeup_pipe_readers(opipe);
+	return ret;
+}
+
 /*
  * Determine where to splice to/from.
  */
@@ -1081,13 +1098,7 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out,
 		if (out->f_flags & O_NONBLOCK)
 			flags |= SPLICE_F_NONBLOCK;
 
-		pipe_lock(opipe);
-		ret = wait_for_space(opipe, flags);
-		if (!ret)
-			ret = do_splice_to(in, &offset, opipe, len, flags);
-		pipe_unlock(opipe);
-		if (ret > 0)
-			wakeup_pipe_readers(opipe);
+		ret = splice_file_to_pipe(in, opipe, &offset, len, flags);
 		if (!off_in)
 			in->f_pos = offset;
 		else
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 3/3] teach sendfile(2) to handle send-to-pipe directly
  2021-01-26  6:07             ` Splicing to/from a tty Al Viro
  2021-01-26  6:08               ` [PATCH 1/3] do_splice_to(): move the logics for limiting the read length in Al Viro
  2021-01-26  6:09               ` [PATCH 2/3] take the guts of file-to-pipe splice into a helper function Al Viro
@ 2021-01-26  6:09               ` Al Viro
  2021-01-26 18:57                 ` Linus Torvalds
  2021-01-26 18:49               ` Splicing to/from a tty Linus Torvalds
  3 siblings, 1 reply; 65+ messages in thread
From: Al Viro @ 2021-01-26  6:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Johannes Berg, Christoph Hellwig, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman

no point going through the intermediate pipe

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/internal.h   |  9 +++++++++
 fs/read_write.c | 19 +++++++++++++------
 fs/splice.c     |  2 +-
 3 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index 77c50befbfbe..cff1f30cfefb 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -15,6 +15,7 @@ struct mount;
 struct shrink_control;
 struct fs_context;
 struct user_namespace;
+struct pipe_inode_info;
 
 /*
  * block_dev.c
@@ -193,3 +194,11 @@ int sb_init_dio_done_wq(struct super_block *sb);
  */
 int do_statx(int dfd, const char __user *filename, unsigned flags,
 	     unsigned int mask, struct statx __user *buffer);
+
+/*
+ * fs/splice.c:
+ */
+long splice_file_to_pipe(struct file *in,
+			 struct pipe_inode_info *opipe,
+			 loff_t *offset,
+			 size_t len, unsigned int flags);
diff --git a/fs/read_write.c b/fs/read_write.c
index 75f764b43418..9db7adf160d2 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1188,6 +1188,7 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 {
 	struct fd in, out;
 	struct inode *in_inode, *out_inode;
+	struct pipe_inode_info *opipe;
 	loff_t pos;
 	loff_t out_pos;
 	ssize_t retval;
@@ -1228,9 +1229,6 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 	in_inode = file_inode(in.file);
 	out_inode = file_inode(out.file);
 	out_pos = out.file->f_pos;
-	retval = rw_verify_area(WRITE, out.file, &out_pos, count);
-	if (retval < 0)
-		goto fput_out;
 
 	if (!max)
 		max = min(in_inode->i_sb->s_maxbytes, out_inode->i_sb->s_maxbytes);
@@ -1253,9 +1251,18 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 	if (in.file->f_flags & O_NONBLOCK)
 		fl = SPLICE_F_NONBLOCK;
 #endif
-	file_start_write(out.file);
-	retval = do_splice_direct(in.file, &pos, out.file, &out_pos, count, fl);
-	file_end_write(out.file);
+	opipe = get_pipe_info(out.file, true);
+	if (!opipe) {
+		retval = rw_verify_area(WRITE, out.file, &out_pos, count);
+		if (retval < 0)
+			goto fput_out;
+		file_start_write(out.file);
+		retval = do_splice_direct(in.file, &pos, out.file, &out_pos,
+					  count, fl);
+		file_end_write(out.file);
+	} else {
+		retval = splice_file_to_pipe(in.file, opipe, &pos, count, fl);
+	}
 
 	if (retval > 0) {
 		add_rchar(current, retval);
diff --git a/fs/splice.c b/fs/splice.c
index 74f968c65a93..b06846f1e6ee 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1002,7 +1002,7 @@ static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe,
 			       struct pipe_inode_info *opipe,
 			       size_t len, unsigned int flags);
 
-static long splice_file_to_pipe(struct file *in,
+long splice_file_to_pipe(struct file *in,
 			 struct pipe_inode_info *opipe,
 			 loff_t *offset,
 			 size_t len, unsigned int flags)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-26  6:07             ` Splicing to/from a tty Al Viro
                                 ` (2 preceding siblings ...)
  2021-01-26  6:09               ` [PATCH 3/3] teach sendfile(2) to handle send-to-pipe directly Al Viro
@ 2021-01-26 18:49               ` Linus Torvalds
  2021-01-26 19:42                 ` Al Viro
  3 siblings, 1 reply; 65+ messages in thread
From: Linus Torvalds @ 2021-01-26 18:49 UTC (permalink / raw)
  To: Al Viro
  Cc: Johannes Berg, Christoph Hellwig, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman

On Mon, Jan 25, 2021 at 10:07 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sun, Jan 24, 2021 at 11:11:42AM -0800, Linus Torvalds wrote:
> >
> > I agree that it would be better fixed by just having sendfile()
> > basically turn into splice() for the pipe target case, but I haven't
> > seen any patches from you, so I assume it wasn't 100% trivial.
>
> Just to make clear - sendfile() regular-to-pipe is *not* the same
> issue as splice to/from tty.

That's not what I meant.

sendfile() to a pipe is basically the same thing as splice() to a pipe.

Except I think it might have different looping behavior. And as you
noted earlier, the error returns may be randomly different.

             Linus

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 3/3] teach sendfile(2) to handle send-to-pipe directly
  2021-01-26  6:09               ` [PATCH 3/3] teach sendfile(2) to handle send-to-pipe directly Al Viro
@ 2021-01-26 18:57                 ` Linus Torvalds
  2021-01-26 19:33                   ` Al Viro
  0 siblings, 1 reply; 65+ messages in thread
From: Linus Torvalds @ 2021-01-26 18:57 UTC (permalink / raw)
  To: Al Viro
  Cc: Johannes Berg, Christoph Hellwig, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman

On Mon, Jan 25, 2021 at 10:09 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> no point going through the intermediate pipe

Ok, this series looks fine to me.

Al - do you want me to apply this as patches for 5.11 (and I'll revert
the then unnecessary one-liner), or should I expect a pull request or
what?

              Linus

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 3/3] teach sendfile(2) to handle send-to-pipe directly
  2021-01-26 18:57                 ` Linus Torvalds
@ 2021-01-26 19:33                   ` Al Viro
  0 siblings, 0 replies; 65+ messages in thread
From: Al Viro @ 2021-01-26 19:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Johannes Berg, Christoph Hellwig, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman

On Tue, Jan 26, 2021 at 10:57:38AM -0800, Linus Torvalds wrote:
> On Mon, Jan 25, 2021 at 10:09 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > no point going through the intermediate pipe
> 
> Ok, this series looks fine to me.
> 
> Al - do you want me to apply this as patches for 5.11 (and I'll revert
> the then unnecessary one-liner), or should I expect a pull request or
> what?

I'm going to put it into #for-next for a few days, then send a pull request
your way (assuming no problems show up, obviously).

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: Splicing to/from a tty
  2021-01-26 18:49               ` Splicing to/from a tty Linus Torvalds
@ 2021-01-26 19:42                 ` Al Viro
  0 siblings, 0 replies; 65+ messages in thread
From: Al Viro @ 2021-01-26 19:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Johannes Berg, Christoph Hellwig, Oliver Giles,
	Linux Kernel Mailing List, Greg Kroah-Hartman

On Tue, Jan 26, 2021 at 10:49:11AM -0800, Linus Torvalds wrote:
> On Mon, Jan 25, 2021 at 10:07 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > On Sun, Jan 24, 2021 at 11:11:42AM -0800, Linus Torvalds wrote:
> > >
> > > I agree that it would be better fixed by just having sendfile()
> > > basically turn into splice() for the pipe target case, but I haven't
> > > seen any patches from you, so I assume it wasn't 100% trivial.
> >
> > Just to make clear - sendfile() regular-to-pipe is *not* the same
> > issue as splice to/from tty.
> 
> That's not what I meant.
> 
> sendfile() to a pipe is basically the same thing as splice() to a pipe.
> 
> Except I think it might have different looping behavior. And as you
> noted earlier, the error returns may be randomly different.

Sure.  What I'm saying is that there'd been two different regressions
discussed in that thread, and the one dealt with by this patch series
is not the one brought up in the original posting.  IOW, it's an
alternative not to "let's add ->splice_{read,write}() for tty" stuff
but to "let's add ->splice_write() for pipes" one-liner.

I do not doubt that you know that.  But the thread had been confusing
enough (hell, the sendfile-related subthread has kept the original
subject explicitly refering to tty), so I wanted to avoid any confusion
for somebody looking through it a year or two down the road.

^ permalink raw reply	[flat|nested] 65+ messages in thread

end of thread, other threads:[~2021-01-27  5:56 UTC | newest]

Thread overview: 65+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-16  7:35 Splicing to/from a tty Oliver Giles
2021-01-16 16:46 ` Johannes Berg
2021-01-17  6:12   ` Oliver Giles
2021-01-18  8:53   ` Christoph Hellwig
2021-01-18  8:58     ` Johannes Berg
2021-01-18 19:26       ` Linus Torvalds
2021-01-18 19:45         ` Al Viro
2021-01-18 19:49           ` Linus Torvalds
2021-01-18 19:56             ` Al Viro
2021-01-24 19:11           ` Linus Torvalds
2021-01-25  9:16             ` [PATCH] fs/pipe: allow sendfile() to pipe again Johannes Berg
2021-01-25 10:16               ` Christoph Hellwig
2021-01-25 20:34               ` Linus Torvalds
2021-01-26  6:07             ` Splicing to/from a tty Al Viro
2021-01-26  6:08               ` [PATCH 1/3] do_splice_to(): move the logics for limiting the read length in Al Viro
2021-01-26  6:09               ` [PATCH 2/3] take the guts of file-to-pipe splice into a helper function Al Viro
2021-01-26  6:09               ` [PATCH 3/3] teach sendfile(2) to handle send-to-pipe directly Al Viro
2021-01-26 18:57                 ` Linus Torvalds
2021-01-26 19:33                   ` Al Viro
2021-01-26 18:49               ` Splicing to/from a tty Linus Torvalds
2021-01-26 19:42                 ` Al Viro
2021-01-18 19:34     ` Al Viro
2021-01-18 19:46       ` Linus Torvalds
2021-01-18 19:54         ` Al Viro
2021-01-20 16:26           ` Al Viro
2021-01-20 19:11             ` Al Viro
2021-01-20 19:27               ` Linus Torvalds
2021-01-20 22:25                 ` David Laight
2021-01-20 23:02                   ` Al Viro
2021-01-20 23:14                 ` Al Viro
2021-01-20 23:40                   ` Linus Torvalds
2021-01-21  0:38                     ` Al Viro
2021-01-21  1:04                       ` Linus Torvalds
2021-01-21  1:45                         ` Al Viro
2021-01-21  3:38                           ` Linus Torvalds
2021-01-21  6:05                             ` Willy Tarreau
2021-01-21  8:04                               ` Johannes Berg
2021-01-21 10:08                         ` David Laight
2021-01-18  8:16 ` Christoph Hellwig
2021-01-18 19:36   ` Linus Torvalds
2021-01-18 20:24     ` Linus Torvalds
2021-01-18 21:35       ` Linus Torvalds
2021-01-18 21:54         ` Linus Torvalds
2021-01-18 22:03           ` Linus Torvalds
2021-01-18 22:20             ` Linus Torvalds
2021-01-19  1:38               ` Linus Torvalds
2021-01-19 11:53                 ` Greg Kroah-Hartman
2021-01-19 16:56                   ` Robert Karszniewicz
2021-01-19 17:10                     ` Robert Karszniewicz
2021-01-19 22:09                     ` Oliver Giles
2021-01-19 17:25                   ` Linus Torvalds
2021-01-19 20:24                   ` Linus Torvalds
2021-01-19 20:38                     ` Christoph Hellwig
2021-01-20  1:25                     ` Oliver Giles
2021-01-20  4:44                       ` Linus Torvalds
2021-01-20  8:15                         ` Oliver Giles
2021-01-21  1:18                         ` tty splice branch (was "Re: Splicing to/from a tty") Linus Torvalds
2021-01-21  8:44                           ` Greg Kroah-Hartman
2021-01-21  8:50                           ` Jiri Slaby
2021-01-21  8:58                             ` Jiri Slaby
2021-01-21 17:52                               ` Linus Torvalds
2021-01-21  8:58                             ` Greg Kroah-Hartman
2021-01-21 17:03                         ` Splicing to/from a tty Robert Karszniewicz
2021-01-21 18:43                           ` Linus Torvalds
2021-01-19 11:52         ` Greg Kroah-Hartman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.