* Splicing to/from a tty @ 2021-01-16 7:35 Oliver Giles 2021-01-16 16:46 ` Johannes Berg 2021-01-18 8:16 ` Christoph Hellwig 0 siblings, 2 replies; 65+ messages in thread From: Oliver Giles @ 2021-01-16 7:35 UTC (permalink / raw) To: linux-kernel; +Cc: Greg Kroah-Hartman, Christoph Hellwig Commit 36e2c7421f02 (fs: don't allow splice read/write without explicit ops) broke my userspace application which talks to an SSL VPN by splice()ing between "openssl s_client" and "pppd". The latter operates over a pty, and since that commit there is no fallback for splice()ing between a pipe and a pty, or any tty for that matter. The above commit mentions switching them to the iter ops and using generic_file_splice_read. IIUC, this would require implementing iter ops also on the line disciplines, which sounds pretty disruptive. For my case, I attempted to instead implement splice_write and splice_read in tty_fops; I managed to get splice_write working calling ld->ops->write, but splice_read is not so simple because the tty_ldisc_ops read method expects a userspace buffer. So I cannot see how to implement this without either (a) using set_fs, or (b) implementing iter ops on all line disciplines. Is splice()ing between a tty and a pipe worth supporting at all? Not a big deal for my use case at least, but it used to work. ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-16 7:35 Splicing to/from a tty Oliver Giles @ 2021-01-16 16:46 ` Johannes Berg 2021-01-17 6:12 ` Oliver Giles 2021-01-18 8:53 ` Christoph Hellwig 2021-01-18 8:16 ` Christoph Hellwig 1 sibling, 2 replies; 65+ messages in thread From: Johannes Berg @ 2021-01-16 16:46 UTC (permalink / raw) To: Oliver Giles, linux-kernel; +Cc: Greg Kroah-Hartman, Christoph Hellwig On Sat, 2021-01-16 at 20:35 +1300, Oliver Giles wrote: > Commit 36e2c7421f02 (fs: don't allow splice read/write without > explicit ops) broke my userspace application which talks to an SSL VPN > by splice()ing between "openssl s_client" and "pppd". The latter > operates over a pty, and since that commit there is no fallback for > splice()ing between a pipe and a pty, or any tty for that matter. > > The above commit mentions switching them to the iter ops and using > generic_file_splice_read. IIUC, this would require implementing iter > ops also on the line disciplines, which sounds pretty disruptive. > > For my case, I attempted to instead implement splice_write and > splice_read in tty_fops; I managed to get splice_write working calling > ld->ops->write, but splice_read is not so simple because the > tty_ldisc_ops read method expects a userspace buffer. So I cannot see > how to implement this without either (a) using set_fs, or (b) > implementing iter ops on all line disciplines. > > Is splice()ing between a tty and a pipe worth supporting at all? Not a > big deal for my use case at least, but it used to work. Is it even strictly related to the tty? I was just now looking into why my cgit/fcgi/nginx setup no longer works, and the reason is getting -EINVAL from sendfile() when the input is a file and the output is a pipe(). So I wrote a simple test program (below) and that errors out on kernel 5.10.4, while it works fine on the 5.9.16 I currently have. Haven't tried reverting anything yet, but now that I haev a test program it should be simple to even bisect. johannes #include <unistd.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <sys/sendfile.h> #include <stdio.h> #include <assert.h> int main(int argc, char **argv) { int in = open(argv[0], O_RDONLY); int p[2], out; off_t off = 0; int err; assert(in >= 0); assert(pipe(p) >= 0); out = p[1]; err = sendfile(out, in, &off, 1024); if (err < 0) perror("sendfile"); assert(err == 1024); return 0; } ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-16 16:46 ` Johannes Berg @ 2021-01-17 6:12 ` Oliver Giles 2021-01-18 8:53 ` Christoph Hellwig 1 sibling, 0 replies; 65+ messages in thread From: Oliver Giles @ 2021-01-17 6:12 UTC (permalink / raw) To: Johannes Berg, linux-kernel; +Cc: Greg Kroah-Hartman, Christoph Hellwig On Sun Jan 17, 2021 at 5:46 AM NZDT, Johannes Berg wrote: > On Sat, 2021-01-16 at 20:35 +1300, Oliver Giles wrote: > > Commit 36e2c7421f02 (fs: don't allow splice read/write without > > explicit ops) broke my userspace application which talks to an SSL VPN > > by splice()ing between "openssl s_client" and "pppd". The latter > > operates over a pty, and since that commit there is no fallback for > > splice()ing between a pipe and a pty, or any tty for that matter. > > > > The above commit mentions switching them to the iter ops and using > > generic_file_splice_read. IIUC, this would require implementing iter > > ops also on the line disciplines, which sounds pretty disruptive. > > > > For my case, I attempted to instead implement splice_write and > > splice_read in tty_fops; I managed to get splice_write working calling > > ld->ops->write, but splice_read is not so simple because the > > tty_ldisc_ops read method expects a userspace buffer. So I cannot see > > how to implement this without either (a) using set_fs, or (b) > > implementing iter ops on all line disciplines. > > > > Is splice()ing between a tty and a pipe worth supporting at all? Not a > > big deal for my use case at least, but it used to work. > > Is it even strictly related to the tty? > > I was just now looking into why my cgit/fcgi/nginx setup no longer > works, and the reason is getting -EINVAL from sendfile() when the input > is a file and the output is a pipe(). > > So I wrote a simple test program (below) and that errors out on kernel > 5.10.4, while it works fine on the 5.9.16 I currently have. Haven't > tried reverting anything yet, but now that I haev a test program it > should be simple to even bisect. > > johannes > > > #include <unistd.h> > #include <sys/types.h> > #include <sys/stat.h> > #include <fcntl.h> > #include <sys/sendfile.h> > #include <stdio.h> > #include <assert.h> > > int main(int argc, char **argv) > { > int in = open(argv[0], O_RDONLY); > int p[2], out; > off_t off = 0; > int err; > > assert(in >= 0); > assert(pipe(p) >= 0); > out = p[1]; > err = sendfile(out, in, &off, 1024); > if (err < 0) > perror("sendfile"); > assert(err == 1024); > > return 0; > } I can confirm the behaviour you see, and that it starts occurring from the same commit 36e2c7421f02a22 (fs: don't allow splice read/write without explicit ops). In my tty case, it is clear that removing the default fallback would cause this to fail, but assuming the sendfile() source is on a regular filesystem, I am unsure why splice cannot find the appropriate splice_write method. Could be connected to the fact that the message from warn_unsupported in fs/splice.c outputs "splice write not supported for file (pid: 983819 comm: test-sendfile)", i.e. the file path is blank. In my case of directly calling splice on a pty, I do see a path such as /ptyp0 in that error before implementing splice_(read/write) callbacks. Maybe splice is getting a bogus file pointer from sendfile? ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-16 16:46 ` Johannes Berg 2021-01-17 6:12 ` Oliver Giles @ 2021-01-18 8:53 ` Christoph Hellwig 2021-01-18 8:58 ` Johannes Berg 2021-01-18 19:34 ` Al Viro 1 sibling, 2 replies; 65+ messages in thread From: Christoph Hellwig @ 2021-01-18 8:53 UTC (permalink / raw) To: Johannes Berg Cc: Oliver Giles, linux-kernel, Greg Kroah-Hartman, Christoph Hellwig, Linus Torvalds, Al Viro On Sat, Jan 16, 2021 at 05:46:33PM +0100, Johannes Berg wrote: > > For my case, I attempted to instead implement splice_write and > > splice_read in tty_fops; I managed to get splice_write working calling > > ld->ops->write, but splice_read is not so simple because the > > tty_ldisc_ops read method expects a userspace buffer. So I cannot see > > how to implement this without either (a) using set_fs, or (b) > > implementing iter ops on all line disciplines. > > > > Is splice()ing between a tty and a pipe worth supporting at all? Not a > > big deal for my use case at least, but it used to work. > > Is it even strictly related to the tty? > > I was just now looking into why my cgit/fcgi/nginx setup no longer > works, and the reason is getting -EINVAL from sendfile() when the input > is a file and the output is a pipe(). Yes, pipes do not support ->splice_write currenly. I think just wiring up iter_file_splice_write would work. Al? > So I wrote a simple test program (below) and that errors out on kernel > 5.10.4, while it works fine on the 5.9.16 I currently have. Haven't > tried reverting anything yet, but now that I haev a test program it > should be simple to even bisect. > > johannes > > > #include <unistd.h> > #include <sys/types.h> > #include <sys/stat.h> > #include <fcntl.h> > #include <sys/sendfile.h> > #include <stdio.h> > #include <assert.h> > > int main(int argc, char **argv) > { > int in = open(argv[0], O_RDONLY); > int p[2], out; > off_t off = 0; > int err; > > assert(in >= 0); > assert(pipe(p) >= 0); > out = p[1]; > err = sendfile(out, in, &off, 1024); > if (err < 0) > perror("sendfile"); > assert(err == 1024); > > return 0; > } > ---end quoted text--- ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-18 8:53 ` Christoph Hellwig @ 2021-01-18 8:58 ` Johannes Berg 2021-01-18 19:26 ` Linus Torvalds 2021-01-18 19:34 ` Al Viro 1 sibling, 1 reply; 65+ messages in thread From: Johannes Berg @ 2021-01-18 8:58 UTC (permalink / raw) To: Christoph Hellwig Cc: Oliver Giles, linux-kernel, Greg Kroah-Hartman, Linus Torvalds, Al Viro On Mon, 2021-01-18 at 09:53 +0100, Christoph Hellwig wrote: > On Sat, Jan 16, 2021 at 05:46:33PM +0100, Johannes Berg wrote: > > > For my case, I attempted to instead implement splice_write and > > > splice_read in tty_fops; I managed to get splice_write working calling > > > ld->ops->write, but splice_read is not so simple because the > > > tty_ldisc_ops read method expects a userspace buffer. So I cannot see > > > how to implement this without either (a) using set_fs, or (b) > > > implementing iter ops on all line disciplines. > > > > > > Is splice()ing between a tty and a pipe worth supporting at all? Not a > > > big deal for my use case at least, but it used to work. > > > > Is it even strictly related to the tty? > > > > I was just now looking into why my cgit/fcgi/nginx setup no longer > > works, and the reason is getting -EINVAL from sendfile() when the input > > is a file and the output is a pipe(). > > Yes, pipes do not support ->splice_write currenly. Well, it clearly used to work :-) > I think just wiring > up iter_file_splice_write would work. Al? Seems to work for the simple test case that I had, at least: diff --git a/fs/pipe.c b/fs/pipe.c index c5989cfd564d..39c96845a72f 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -1206,6 +1206,7 @@ const struct file_operations pipefifo_fops = { .unlocked_ioctl = pipe_ioctl, .release = pipe_release, .fasync = pipe_fasync, + .splice_write = iter_file_splice_write, }; /* johannes ^ permalink raw reply related [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-18 8:58 ` Johannes Berg @ 2021-01-18 19:26 ` Linus Torvalds 2021-01-18 19:45 ` Al Viro 0 siblings, 1 reply; 65+ messages in thread From: Linus Torvalds @ 2021-01-18 19:26 UTC (permalink / raw) To: Johannes Berg Cc: Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman, Al Viro On Mon, Jan 18, 2021 at 12:58 AM Johannes Berg <johannes@sipsolutions.net> wrote: > > > I think just wiring up iter_file_splice_write would work. Al? > > Seems to work for the simple test case that I had, at least: Mind sending me a signed-off patch for this? Yeah, I know it's a trivial one-liner, but I much prefer having an author with a patch and a sign-off to just doing it personally and reaping all that glory for it.. Linus ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-18 19:26 ` Linus Torvalds @ 2021-01-18 19:45 ` Al Viro 2021-01-18 19:49 ` Linus Torvalds 2021-01-24 19:11 ` Linus Torvalds 0 siblings, 2 replies; 65+ messages in thread From: Al Viro @ 2021-01-18 19:45 UTC (permalink / raw) To: Linus Torvalds Cc: Johannes Berg, Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman, Al Viro On Mon, Jan 18, 2021 at 11:26:00AM -0800, Linus Torvalds wrote: > On Mon, Jan 18, 2021 at 12:58 AM Johannes Berg > <johannes@sipsolutions.net> wrote: > > > > > I think just wiring up iter_file_splice_write would work. Al? > > > > Seems to work for the simple test case that I had, at least: > > Mind sending me a signed-off patch for this? > > Yeah, I know it's a trivial one-liner, but I much prefer having an > author with a patch and a sign-off to just doing it personally and > reaping all that glory for it.. IMO it's a wrong way to handle that. Look: do_sendfile() calls do_splice_direct(), which calls splice_direct_to_actor(). There we allocate an internal pipe and go through "feed from source into that pipe, then shove what's there into destination". Which is insane for the case when destination (or source, for that matter) is a pipe itself. do_splice_direct() does something that do_splice() won't - it handles non-pipe to non-pipe case. Which is how sendfile(2) is normally used, of course. I'll look into that in more details, but IMO bothering with internal pipe is just plain wrong for those cases. ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-18 19:45 ` Al Viro @ 2021-01-18 19:49 ` Linus Torvalds 2021-01-18 19:56 ` Al Viro 2021-01-24 19:11 ` Linus Torvalds 1 sibling, 1 reply; 65+ messages in thread From: Linus Torvalds @ 2021-01-18 19:49 UTC (permalink / raw) To: Al Viro Cc: Johannes Berg, Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman On Mon, Jan 18, 2021 at 11:45 AM Al Viro <viro@zeniv-ca> wrote: > > do_splice_direct() does something that do_splice() won't - it > handles non-pipe to non-pipe case. Which is how sendfile(2) is > normally used, of course. Yeah, I agree, it's better if we do the pipe case specially, but if it's too painful for a stable backport I'll certainly take the one-liner.. Btw, your email setup is broken. Your emails now have a "From:" line like this: From: Al Viro <viro@zeniv-ca> and that is not a valid email address. Linus ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-18 19:49 ` Linus Torvalds @ 2021-01-18 19:56 ` Al Viro 0 siblings, 0 replies; 65+ messages in thread From: Al Viro @ 2021-01-18 19:56 UTC (permalink / raw) To: Linus Torvalds Cc: Johannes Berg, Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman On Mon, Jan 18, 2021 at 11:49:53AM -0800, Linus Torvalds wrote: > On Mon, Jan 18, 2021 at 11:45 AM Al Viro <viro@zeniv-ca> wrote: > > > > do_splice_direct() does something that do_splice() won't - it > > handles non-pipe to non-pipe case. Which is how sendfile(2) is > > normally used, of course. > > Yeah, I agree, it's better if we do the pipe case specially, but if > it's too painful for a stable backport I'll certainly take the > one-liner.. > > Btw, your email setup is broken. Your emails now have a "From:" line like this: > > From: Al Viro <viro@zeniv-ca> > > and that is not a valid email address. Just noticed, should be fixed now... ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-18 19:45 ` Al Viro 2021-01-18 19:49 ` Linus Torvalds @ 2021-01-24 19:11 ` Linus Torvalds 2021-01-25 9:16 ` [PATCH] fs/pipe: allow sendfile() to pipe again Johannes Berg 2021-01-26 6:07 ` Splicing to/from a tty Al Viro 1 sibling, 2 replies; 65+ messages in thread From: Linus Torvalds @ 2021-01-24 19:11 UTC (permalink / raw) To: Al Viro Cc: Johannes Berg, Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman Al, coming back to this because rc5 is imminent.. On Mon, Jan 18, 2021 at 11:45 AM Al Viro <viro@zeniv-ca> wrote: > > do_splice_direct() does something that do_splice() won't - it > handles non-pipe to non-pipe case. Which is how sendfile(2) is > normally used, of course. > > I'll look into that in more details, but IMO bothering with > internal pipe is just plain wrong for those cases. You clearly thought about this, with the emails about odd error cases, but I get the feeling that for fixing the current "you can't sendfile() to a pipe" regression (including stable) we should do the one-liner. No? I agree that it would be better fixed by just having sendfile() basically turn into splice() for the pipe target case, but I haven't seen any patches from you, so I assume it wasn't 100% trivial. Hmm? Linus ^ permalink raw reply [flat|nested] 65+ messages in thread
* [PATCH] fs/pipe: allow sendfile() to pipe again 2021-01-24 19:11 ` Linus Torvalds @ 2021-01-25 9:16 ` Johannes Berg 2021-01-25 10:16 ` Christoph Hellwig 2021-01-25 20:34 ` Linus Torvalds 2021-01-26 6:07 ` Splicing to/from a tty Al Viro 1 sibling, 2 replies; 65+ messages in thread From: Johannes Berg @ 2021-01-25 9:16 UTC (permalink / raw) To: Linus Torvalds Cc: Al Viro, Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman After commit 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops") sendfile() could no longer send data from a real file to a pipe, breaking for example certain cgit setups (e.g. when running behind fcgiwrap), because in this case cgit will try to do exactly this: sendfile() to a pipe. Fix this by using iter_file_splice_write for the splice_write method of pipes, as suggested by Christoph. Cc: stable@vger.kernel.org Fixes: 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops") Suggested-by: Christoph Hellwig <hch@lst.de> Tested-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Johannes Berg <johannes@sipsolutions.net> --- Mostly for the record, since it looks like we'll have a proper fix without another intermediate pipe. However, this fixes the regression for now. --- fs/pipe.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/pipe.c b/fs/pipe.c index c5989cfd564d..39c96845a72f 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -1206,6 +1206,7 @@ const struct file_operations pipefifo_fops = { .unlocked_ioctl = pipe_ioctl, .release = pipe_release, .fasync = pipe_fasync, + .splice_write = iter_file_splice_write, }; /* -- 2.26.2 ^ permalink raw reply related [flat|nested] 65+ messages in thread
* Re: [PATCH] fs/pipe: allow sendfile() to pipe again 2021-01-25 9:16 ` [PATCH] fs/pipe: allow sendfile() to pipe again Johannes Berg @ 2021-01-25 10:16 ` Christoph Hellwig 2021-01-25 20:34 ` Linus Torvalds 1 sibling, 0 replies; 65+ messages in thread From: Christoph Hellwig @ 2021-01-25 10:16 UTC (permalink / raw) To: Johannes Berg Cc: Linus Torvalds, Al Viro, Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman Looks good, Reviewed-by: Christoph Hellwig <hch@lst.de> ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH] fs/pipe: allow sendfile() to pipe again 2021-01-25 9:16 ` [PATCH] fs/pipe: allow sendfile() to pipe again Johannes Berg 2021-01-25 10:16 ` Christoph Hellwig @ 2021-01-25 20:34 ` Linus Torvalds 1 sibling, 0 replies; 65+ messages in thread From: Linus Torvalds @ 2021-01-25 20:34 UTC (permalink / raw) To: Johannes Berg Cc: Al Viro, Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman On Mon, Jan 25, 2021 at 1:16 AM Johannes Berg <johannes@sipsolutions.net> wrote: > > Mostly for the record, since it looks like we'll have a proper > fix without another intermediate pipe. However, this fixes the > regression for now. I've applied this anyway. We'll see how big Al's proper fix ends up being, and depending on that this one-liner might just be the right thing for stable anyway. And if Al's patch ends up being small, simple and clean, this one-liner won't matter, but it's not like there's any real downside to it. Linus ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-24 19:11 ` Linus Torvalds 2021-01-25 9:16 ` [PATCH] fs/pipe: allow sendfile() to pipe again Johannes Berg @ 2021-01-26 6:07 ` Al Viro 2021-01-26 6:08 ` [PATCH 1/3] do_splice_to(): move the logics for limiting the read length in Al Viro ` (3 more replies) 1 sibling, 4 replies; 65+ messages in thread From: Al Viro @ 2021-01-26 6:07 UTC (permalink / raw) To: Linus Torvalds Cc: Johannes Berg, Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman On Sun, Jan 24, 2021 at 11:11:42AM -0800, Linus Torvalds wrote: > Al, > coming back to this because rc5 is imminent.. > > On Mon, Jan 18, 2021 at 11:45 AM Al Viro <viro@zeniv-ca> wrote: > > > > do_splice_direct() does something that do_splice() won't - it > > handles non-pipe to non-pipe case. Which is how sendfile(2) is > > normally used, of course. > > > > I'll look into that in more details, but IMO bothering with > > internal pipe is just plain wrong for those cases. > > You clearly thought about this, with the emails about odd error cases, > but I get the feeling that for fixing the current "you can't > sendfile() to a pipe" regression (including stable) we should do the > one-liner. No? > > I agree that it would be better fixed by just having sendfile() > basically turn into splice() for the pipe target case, but I haven't > seen any patches from you, so I assume it wasn't 100% trivial. > > Hmm? Just to make clear - sendfile() regular-to-pipe is *not* the same issue as splice to/from tty. The latter needs ->splice_read() and ->splice_write() in tty_fops; the former can be dealt with by teaching do_sendifile() to use the guts of do_splice() in case when in or out happens to be a pipe (with some rearrangement of checks) instead of bothering with do_splice_direct(). The only commonality between these two is that both got broken by the same patch series. Johannes' patch is an attempt to deal with regular-to-pipe sendfile(2), and it's not a good way to handle that. Neither it, nor the variant I proposed would do a damn thing for tty (and sendfile(2) never worked for source other than regular or block anyway). FWIW, the real check in do_splice_direct() should be "has FMODE_PREAD", regardless of the position we are asking to read from - do_splice_direct() is basically while left to copy splice_read from in to internal pipe splice_write from internal pipe to out and if splice_write ends up with short copy, we advance the position by the amount written and discard everything left in the internal pipe. Using it for something non-seekable would end up with data silently lost on short copy. Note that decision against splice(2) between non-pipes appears to have been deliberate and unless Jens (and mingo?) decide they are OK with that change, I'm *not* adding that functionality to do_splice(). Anyway, the series is in vfs.git #work.sendfile, 5.11-rc1-based Shortlog: Al Viro (3): do_splice_to(): move the logics for limiting the read length in take the guts of file-to-pipe splice into a helper function teach sendfile(2) to handle send-to-pipe directly Diffstat: fs/internal.h | 9 +++++++++ fs/read_write.c | 19 +++++++++++++------ fs/splice.c | 42 +++++++++++++++++++++++------------------- 3 files changed, 45 insertions(+), 25 deletions(-) Individual patches in followups; first two are equivalent transformations (fairly obvious cleanup and taking a part of do_splice() into a helper), while the third one makes do_sendfile() use that new helper for file-to-pipe case. ^ permalink raw reply [flat|nested] 65+ messages in thread
* [PATCH 1/3] do_splice_to(): move the logics for limiting the read length in 2021-01-26 6:07 ` Splicing to/from a tty Al Viro @ 2021-01-26 6:08 ` Al Viro 2021-01-26 6:09 ` [PATCH 2/3] take the guts of file-to-pipe splice into a helper function Al Viro ` (2 subsequent siblings) 3 siblings, 0 replies; 65+ messages in thread From: Al Viro @ 2021-01-26 6:08 UTC (permalink / raw) To: Linus Torvalds Cc: Johannes Berg, Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman Both callers have the identical logics limiting the amount of data we try to read into pipe - no more than would fit into that pipe. Move that into do_splice_to() itself. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> --- fs/splice.c | 21 +++++++-------------- 1 file changed, 7 insertions(+), 14 deletions(-) diff --git a/fs/splice.c b/fs/splice.c index 866d5c2367b2..c1ca2cc63b43 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -771,11 +771,16 @@ static long do_splice_to(struct file *in, loff_t *ppos, struct pipe_inode_info *pipe, size_t len, unsigned int flags) { + unsigned int p_space; int ret; if (unlikely(!(in->f_mode & FMODE_READ))) return -EBADF; + /* Don't try to read more the pipe has space for. */ + p_space = pipe->max_usage - pipe_occupancy(pipe->head, pipe->tail); + len = min_t(size_t, len, p_space << PAGE_SHIFT); + ret = rw_verify_area(READ, in, ppos, len); if (unlikely(ret < 0)) return ret; @@ -856,15 +861,10 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd, WARN_ON_ONCE(!pipe_empty(pipe->head, pipe->tail)); while (len) { - unsigned int p_space; size_t read_len; loff_t pos = sd->pos, prev_pos = pos; - /* Don't try to read more the pipe has space for. */ - p_space = pipe->max_usage - - pipe_occupancy(pipe->head, pipe->tail); - read_len = min_t(size_t, len, p_space << PAGE_SHIFT); - ret = do_splice_to(in, &pos, pipe, read_len, flags); + ret = do_splice_to(in, &pos, pipe, len, flags); if (unlikely(ret <= 0)) goto out_release; @@ -1083,15 +1083,8 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, pipe_lock(opipe); ret = wait_for_space(opipe, flags); - if (!ret) { - unsigned int p_space; - - /* Don't try to read more the pipe has space for. */ - p_space = opipe->max_usage - pipe_occupancy(opipe->head, opipe->tail); - len = min_t(size_t, len, p_space << PAGE_SHIFT); - + if (!ret) ret = do_splice_to(in, &offset, opipe, len, flags); - } pipe_unlock(opipe); if (ret > 0) wakeup_pipe_readers(opipe); -- 2.11.0 ^ permalink raw reply related [flat|nested] 65+ messages in thread
* [PATCH 2/3] take the guts of file-to-pipe splice into a helper function 2021-01-26 6:07 ` Splicing to/from a tty Al Viro 2021-01-26 6:08 ` [PATCH 1/3] do_splice_to(): move the logics for limiting the read length in Al Viro @ 2021-01-26 6:09 ` Al Viro 2021-01-26 6:09 ` [PATCH 3/3] teach sendfile(2) to handle send-to-pipe directly Al Viro 2021-01-26 18:49 ` Splicing to/from a tty Linus Torvalds 3 siblings, 0 replies; 65+ messages in thread From: Al Viro @ 2021-01-26 6:09 UTC (permalink / raw) To: Linus Torvalds Cc: Johannes Berg, Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> --- fs/splice.c | 25 ++++++++++++++++++------- 1 file changed, 18 insertions(+), 7 deletions(-) diff --git a/fs/splice.c b/fs/splice.c index c1ca2cc63b43..74f968c65a93 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -1002,6 +1002,23 @@ static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe, struct pipe_inode_info *opipe, size_t len, unsigned int flags); +static long splice_file_to_pipe(struct file *in, + struct pipe_inode_info *opipe, + loff_t *offset, + size_t len, unsigned int flags) +{ + long ret; + + pipe_lock(opipe); + ret = wait_for_space(opipe, flags); + if (!ret) + ret = do_splice_to(in, offset, opipe, len, flags); + pipe_unlock(opipe); + if (ret > 0) + wakeup_pipe_readers(opipe); + return ret; +} + /* * Determine where to splice to/from. */ @@ -1081,13 +1098,7 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, if (out->f_flags & O_NONBLOCK) flags |= SPLICE_F_NONBLOCK; - pipe_lock(opipe); - ret = wait_for_space(opipe, flags); - if (!ret) - ret = do_splice_to(in, &offset, opipe, len, flags); - pipe_unlock(opipe); - if (ret > 0) - wakeup_pipe_readers(opipe); + ret = splice_file_to_pipe(in, opipe, &offset, len, flags); if (!off_in) in->f_pos = offset; else -- 2.11.0 ^ permalink raw reply related [flat|nested] 65+ messages in thread
* [PATCH 3/3] teach sendfile(2) to handle send-to-pipe directly 2021-01-26 6:07 ` Splicing to/from a tty Al Viro 2021-01-26 6:08 ` [PATCH 1/3] do_splice_to(): move the logics for limiting the read length in Al Viro 2021-01-26 6:09 ` [PATCH 2/3] take the guts of file-to-pipe splice into a helper function Al Viro @ 2021-01-26 6:09 ` Al Viro 2021-01-26 18:57 ` Linus Torvalds 2021-01-26 18:49 ` Splicing to/from a tty Linus Torvalds 3 siblings, 1 reply; 65+ messages in thread From: Al Viro @ 2021-01-26 6:09 UTC (permalink / raw) To: Linus Torvalds Cc: Johannes Berg, Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman no point going through the intermediate pipe Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> --- fs/internal.h | 9 +++++++++ fs/read_write.c | 19 +++++++++++++------ fs/splice.c | 2 +- 3 files changed, 23 insertions(+), 7 deletions(-) diff --git a/fs/internal.h b/fs/internal.h index 77c50befbfbe..cff1f30cfefb 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -15,6 +15,7 @@ struct mount; struct shrink_control; struct fs_context; struct user_namespace; +struct pipe_inode_info; /* * block_dev.c @@ -193,3 +194,11 @@ int sb_init_dio_done_wq(struct super_block *sb); */ int do_statx(int dfd, const char __user *filename, unsigned flags, unsigned int mask, struct statx __user *buffer); + +/* + * fs/splice.c: + */ +long splice_file_to_pipe(struct file *in, + struct pipe_inode_info *opipe, + loff_t *offset, + size_t len, unsigned int flags); diff --git a/fs/read_write.c b/fs/read_write.c index 75f764b43418..9db7adf160d2 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -1188,6 +1188,7 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos, { struct fd in, out; struct inode *in_inode, *out_inode; + struct pipe_inode_info *opipe; loff_t pos; loff_t out_pos; ssize_t retval; @@ -1228,9 +1229,6 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos, in_inode = file_inode(in.file); out_inode = file_inode(out.file); out_pos = out.file->f_pos; - retval = rw_verify_area(WRITE, out.file, &out_pos, count); - if (retval < 0) - goto fput_out; if (!max) max = min(in_inode->i_sb->s_maxbytes, out_inode->i_sb->s_maxbytes); @@ -1253,9 +1251,18 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos, if (in.file->f_flags & O_NONBLOCK) fl = SPLICE_F_NONBLOCK; #endif - file_start_write(out.file); - retval = do_splice_direct(in.file, &pos, out.file, &out_pos, count, fl); - file_end_write(out.file); + opipe = get_pipe_info(out.file, true); + if (!opipe) { + retval = rw_verify_area(WRITE, out.file, &out_pos, count); + if (retval < 0) + goto fput_out; + file_start_write(out.file); + retval = do_splice_direct(in.file, &pos, out.file, &out_pos, + count, fl); + file_end_write(out.file); + } else { + retval = splice_file_to_pipe(in.file, opipe, &pos, count, fl); + } if (retval > 0) { add_rchar(current, retval); diff --git a/fs/splice.c b/fs/splice.c index 74f968c65a93..b06846f1e6ee 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -1002,7 +1002,7 @@ static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe, struct pipe_inode_info *opipe, size_t len, unsigned int flags); -static long splice_file_to_pipe(struct file *in, +long splice_file_to_pipe(struct file *in, struct pipe_inode_info *opipe, loff_t *offset, size_t len, unsigned int flags) -- 2.11.0 ^ permalink raw reply related [flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] teach sendfile(2) to handle send-to-pipe directly 2021-01-26 6:09 ` [PATCH 3/3] teach sendfile(2) to handle send-to-pipe directly Al Viro @ 2021-01-26 18:57 ` Linus Torvalds 2021-01-26 19:33 ` Al Viro 0 siblings, 1 reply; 65+ messages in thread From: Linus Torvalds @ 2021-01-26 18:57 UTC (permalink / raw) To: Al Viro Cc: Johannes Berg, Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman On Mon, Jan 25, 2021 at 10:09 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > no point going through the intermediate pipe Ok, this series looks fine to me. Al - do you want me to apply this as patches for 5.11 (and I'll revert the then unnecessary one-liner), or should I expect a pull request or what? Linus ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [PATCH 3/3] teach sendfile(2) to handle send-to-pipe directly 2021-01-26 18:57 ` Linus Torvalds @ 2021-01-26 19:33 ` Al Viro 0 siblings, 0 replies; 65+ messages in thread From: Al Viro @ 2021-01-26 19:33 UTC (permalink / raw) To: Linus Torvalds Cc: Johannes Berg, Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman On Tue, Jan 26, 2021 at 10:57:38AM -0800, Linus Torvalds wrote: > On Mon, Jan 25, 2021 at 10:09 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > no point going through the intermediate pipe > > Ok, this series looks fine to me. > > Al - do you want me to apply this as patches for 5.11 (and I'll revert > the then unnecessary one-liner), or should I expect a pull request or > what? I'm going to put it into #for-next for a few days, then send a pull request your way (assuming no problems show up, obviously). ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-26 6:07 ` Splicing to/from a tty Al Viro ` (2 preceding siblings ...) 2021-01-26 6:09 ` [PATCH 3/3] teach sendfile(2) to handle send-to-pipe directly Al Viro @ 2021-01-26 18:49 ` Linus Torvalds 2021-01-26 19:42 ` Al Viro 3 siblings, 1 reply; 65+ messages in thread From: Linus Torvalds @ 2021-01-26 18:49 UTC (permalink / raw) To: Al Viro Cc: Johannes Berg, Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman On Mon, Jan 25, 2021 at 10:07 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Sun, Jan 24, 2021 at 11:11:42AM -0800, Linus Torvalds wrote: > > > > I agree that it would be better fixed by just having sendfile() > > basically turn into splice() for the pipe target case, but I haven't > > seen any patches from you, so I assume it wasn't 100% trivial. > > Just to make clear - sendfile() regular-to-pipe is *not* the same > issue as splice to/from tty. That's not what I meant. sendfile() to a pipe is basically the same thing as splice() to a pipe. Except I think it might have different looping behavior. And as you noted earlier, the error returns may be randomly different. Linus ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-26 18:49 ` Splicing to/from a tty Linus Torvalds @ 2021-01-26 19:42 ` Al Viro 0 siblings, 0 replies; 65+ messages in thread From: Al Viro @ 2021-01-26 19:42 UTC (permalink / raw) To: Linus Torvalds Cc: Johannes Berg, Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman On Tue, Jan 26, 2021 at 10:49:11AM -0800, Linus Torvalds wrote: > On Mon, Jan 25, 2021 at 10:07 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > On Sun, Jan 24, 2021 at 11:11:42AM -0800, Linus Torvalds wrote: > > > > > > I agree that it would be better fixed by just having sendfile() > > > basically turn into splice() for the pipe target case, but I haven't > > > seen any patches from you, so I assume it wasn't 100% trivial. > > > > Just to make clear - sendfile() regular-to-pipe is *not* the same > > issue as splice to/from tty. > > That's not what I meant. > > sendfile() to a pipe is basically the same thing as splice() to a pipe. > > Except I think it might have different looping behavior. And as you > noted earlier, the error returns may be randomly different. Sure. What I'm saying is that there'd been two different regressions discussed in that thread, and the one dealt with by this patch series is not the one brought up in the original posting. IOW, it's an alternative not to "let's add ->splice_{read,write}() for tty" stuff but to "let's add ->splice_write() for pipes" one-liner. I do not doubt that you know that. But the thread had been confusing enough (hell, the sendfile-related subthread has kept the original subject explicitly refering to tty), so I wanted to avoid any confusion for somebody looking through it a year or two down the road. ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-18 8:53 ` Christoph Hellwig 2021-01-18 8:58 ` Johannes Berg @ 2021-01-18 19:34 ` Al Viro 2021-01-18 19:46 ` Linus Torvalds 1 sibling, 1 reply; 65+ messages in thread From: Al Viro @ 2021-01-18 19:34 UTC (permalink / raw) To: Christoph Hellwig Cc: Johannes Berg, Oliver Giles, linux-kernel, Greg Kroah-Hartman, Linus Torvalds, Al Viro On Mon, Jan 18, 2021 at 09:53:11AM +0100, Christoph Hellwig wrote: > On Sat, Jan 16, 2021 at 05:46:33PM +0100, Johannes Berg wrote: > > > For my case, I attempted to instead implement splice_write and > > > splice_read in tty_fops; I managed to get splice_write working calling > > > ld->ops->write, but splice_read is not so simple because the > > > tty_ldisc_ops read method expects a userspace buffer. So I cannot see > > > how to implement this without either (a) using set_fs, or (b) > > > implementing iter ops on all line disciplines. > > > > > > Is splice()ing between a tty and a pipe worth supporting at all? Not a > > > big deal for my use case at least, but it used to work. > > > > Is it even strictly related to the tty? > > > > I was just now looking into why my cgit/fcgi/nginx setup no longer > > works, and the reason is getting -EINVAL from sendfile() when the input > > is a file and the output is a pipe(). > > Yes, pipes do not support ->splice_write currenly. I think just wiring > up iter_file_splice_write would work. Al? I'd rather have sendfile(2) do what splice(2) does and handle pipes directly. Let me take a look,,, ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-18 19:34 ` Al Viro @ 2021-01-18 19:46 ` Linus Torvalds 2021-01-18 19:54 ` Al Viro 0 siblings, 1 reply; 65+ messages in thread From: Linus Torvalds @ 2021-01-18 19:46 UTC (permalink / raw) To: Al Viro Cc: Christoph Hellwig, Johannes Berg, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman On Mon, Jan 18, 2021 at 11:35 AM Al Viro <viro@zeniv-ca> wrote: > > I'd rather have sendfile(2) do what splice(2) does and handle pipes > directly. Let me take a look,,, It's not technically just the sendfile() thing. Some things do sendfile "internally" (ie they use do_splice_direct()). Although maybe they always happen just on non-pipe destinations (ie file-to-file copy). I didn't check. But particularly if it can be handled in do_splice_direct(), that would be a good thing. Linus ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-18 19:46 ` Linus Torvalds @ 2021-01-18 19:54 ` Al Viro 2021-01-20 16:26 ` Al Viro 0 siblings, 1 reply; 65+ messages in thread From: Al Viro @ 2021-01-18 19:54 UTC (permalink / raw) To: Linus Torvalds Cc: Al Viro, Christoph Hellwig, Johannes Berg, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman On Mon, Jan 18, 2021 at 11:46:42AM -0800, Linus Torvalds wrote: > On Mon, Jan 18, 2021 at 11:35 AM Al Viro <viro@zeniv-ca> wrote: > > > > I'd rather have sendfile(2) do what splice(2) does and handle pipes > > directly. Let me take a look,,, > > It's not technically just the sendfile() thing. Some things do > sendfile "internally" (ie they use do_splice_direct()). > > Although maybe they always happen just on non-pipe destinations (ie > file-to-file copy). I didn't check. > > But particularly if it can be handled in do_splice_direct(), that > would be a good thing. FWIW, it might make sense to merge do_splice_direct() and do_splice(); interfaces are very similar and do_splice() is basically if both are pipes .... else if input is pipe .... else if output is pipe .... else return -EINVAL with do_splice_direct() being fairly close to the missing case. ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-18 19:54 ` Al Viro @ 2021-01-20 16:26 ` Al Viro 2021-01-20 19:11 ` Al Viro 0 siblings, 1 reply; 65+ messages in thread From: Al Viro @ 2021-01-20 16:26 UTC (permalink / raw) To: Linus Torvalds Cc: Christoph Hellwig, Johannes Berg, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman On Mon, Jan 18, 2021 at 07:54:00PM +0000, Al Viro wrote: > On Mon, Jan 18, 2021 at 11:46:42AM -0800, Linus Torvalds wrote: > > On Mon, Jan 18, 2021 at 11:35 AM Al Viro <viro@zeniv-ca> wrote: > > > > > > I'd rather have sendfile(2) do what splice(2) does and handle pipes > > > directly. Let me take a look,,, > > > > It's not technically just the sendfile() thing. Some things do > > sendfile "internally" (ie they use do_splice_direct()). > > > > Although maybe they always happen just on non-pipe destinations (ie > > file-to-file copy). I didn't check. > > > > But particularly if it can be handled in do_splice_direct(), that > > would be a good thing. > > FWIW, it might make sense to merge do_splice_direct() and do_splice(); > interfaces are very similar and do_splice() is basically > if both are pipes > .... > else if input is pipe > .... > else if output is pipe > .... > else > return -EINVAL > with do_splice_direct() being fairly close to the missing case. Yecchhh... The situation with checks is interesting. do_splice(): in needs FMODE_READ, out - FMODE_WRITE [EBADF] if in is a pipe, off_in must be NULL. [ESPIPE] if out is a pipe, off_out must be NULL. [ESPIPE] if in is not a pipe non-NULL off_in is allowed only with FMODE_PREAD [EINVAL] rw_verify_area done on in if out is not a pipe non-NULL off_out is allowed only with FMODE_PWRITE [EINVAL] no O_APPEND on out [EINVAL] rw_verify_area done on out either in or out must be a pipe [EINVAL] do_splice_direct(): out needs FMODE_WRITE [EBADF] rw_verify_area done on out no O_APPEND on out [EINVAL] Callers: __do_splice() -> do_splice(): if in is a pipe, off_in must be NULL. [ESPIPE, duplicate] if out is a pipe, off_out must be NULL. [ESPIPE, duplicate] io_uring io_splice() -> do_splice(): no extra checks do_sendfile() -> do_splice_direct(): in needs FMODE_READ [EBADF] out needs FMODE_WRITE [EBADF, duplicate] non-NULL off_in is allowed only with FMODE_PREAD [EINVAL] for out we act as splice with NULL off_out (== use ->f_pos) rw_verify_area done on in rw_verify_area done on out [duplicate] vfs_copy_file_range() -> do_copy_file_range() -> -> generic_copy_file_range() or __ceph_copy_file_range() -> -> do_splice_direct(): both in and out must be regular [EINVAL or EISDIR] in needs FMODE_READ [EBADF] out needs FMODE_WRITE [EBADF] no O_APPEND on out [EBADF] on immutable for out [EPERM] rw_verify_area done on in rw_verify_area done on out [duplicate] FMODE_P{READ,WRITE} is ignored. I wonder what happens if somebody starts playing with copy_file_range(2) on e.g. procfs or sysfs... ovl_copy_up_date() -> do_splice_direct(): both in and out are hopefully regular - I'm not sure that copyup logics is sufficiently protected, TBH. FMODE_READ on in and FMODE_WRITE on out are guaranteed lack of O_APPEND on out is guaranteed FMODE_P{READ,WRITE} is completely ignored. Should be OK, modulo the same issues with protection against manipulation of layers. Checks in vfs_copy_file_range() need to remain there - do_splice_direct() is not guaranteed downstream of that. AFAICS, we have the following: in must have FMODE_READ out must have FMODE_WRITE no O_APPEND on out, unless a pipe[1] for non-pipe out we do rw_verify_area for non-pipe in we do rw_verify_area no offsets for pipes The lack of offsets for pipes should've been "not without FMODE_P{READ,WRITE}", but that's not entirely consistent - we have splice(2) fail with EINVAL in case of non-NULL off_in on a file that doesn't allow pread(), except that when this file happens to be a pipe (which obviously doesn't allow pread()) we fail with ESPIPE instead. Note that for pread(2)/pwrite(2)/sendfile(2) we fail with consistent ESPIPE in all such cases. We also do not check FMODE_PREAD/FMODE_PWRITE for copy_file_range()/copyup. Which is probably wrong. Another inconsistency is that rw_verify_area() in copyup is done for output (once for each chunk), but not for input. The tricky part here is that there's an LSM hook in the end of rw_verify_area()... Can we turn "not a pipe, but lacks FMODE_PREAD" error value for splice(2) from EINVAL to ESPIPE? Would make the logics easier to consolidate... Right now the checks are spread over many layers of several call chains and rather hard to follow. [1] yes, it is possible to have O_APPEND opened pipes - open a FIFO with O_APPEND and you've got it. We are not quite consistent in handling those - sendfile() to such is rejected, splice() is not. ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-20 16:26 ` Al Viro @ 2021-01-20 19:11 ` Al Viro 2021-01-20 19:27 ` Linus Torvalds 0 siblings, 1 reply; 65+ messages in thread From: Al Viro @ 2021-01-20 19:11 UTC (permalink / raw) To: Linus Torvalds Cc: Christoph Hellwig, Johannes Berg, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman On Wed, Jan 20, 2021 at 04:26:08PM +0000, Al Viro wrote: > [1] yes, it is possible to have O_APPEND opened pipes - open a FIFO with > O_APPEND and you've got it. We are not quite consistent in handling > those - sendfile() to such is rejected, splice() is not. BTW, according to manpages of splice(2) and sendfile(2), we have EINVAL [snip] target file is opened in append mode [snip] and EINVAL out_fd has the O_APPEND flag set. This is not currently supported by sendfile(). However, splice(2) to FIFO opened with O_APPEND works just fine. So it doesn't match the manpage either. Why do we care about O_APPEND on anything without FMODE_PWRITE (including pipes), anyway? All writes there ignore position, after all... ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-20 19:11 ` Al Viro @ 2021-01-20 19:27 ` Linus Torvalds 2021-01-20 22:25 ` David Laight 2021-01-20 23:14 ` Al Viro 0 siblings, 2 replies; 65+ messages in thread From: Linus Torvalds @ 2021-01-20 19:27 UTC (permalink / raw) To: Al Viro Cc: Christoph Hellwig, Johannes Berg, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman On Wed, Jan 20, 2021 at 11:11 AM Al Viro <viro@zeniv.linux.org.uk> wrote: > > Why do we care about O_APPEND on anything without FMODE_PWRITE (including > pipes), anyway? All writes there ignore position, after all... We shouldn't care. Also, I think we should try to move away from FMODE_PWRITE/PREAD entirely, and use FMODE_STREAM as the primary "this thing doesn't have a position at all". That's what gets rid of all the f_pos locking etc after all. The FMODE_PWRITE/PREAD flags are I think legacy (although we do seem to have the seq_file case that normally allows position on reads, but not on writes, so we may need to keep all three bits). Anyway, I think that with FMODE_STREAM, O_APPEND definitely should be a no-op. Linus ^ permalink raw reply [flat|nested] 65+ messages in thread
* RE: Splicing to/from a tty 2021-01-20 19:27 ` Linus Torvalds @ 2021-01-20 22:25 ` David Laight 2021-01-20 23:02 ` Al Viro 2021-01-20 23:14 ` Al Viro 1 sibling, 1 reply; 65+ messages in thread From: David Laight @ 2021-01-20 22:25 UTC (permalink / raw) To: 'Linus Torvalds', Al Viro Cc: Christoph Hellwig, Johannes Berg, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman From: Linus Torvalds > Sent: 20 January 2021 19:27 > On Wed, Jan 20, 2021 at 11:11 AM Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > Why do we care about O_APPEND on anything without FMODE_PWRITE (including > > pipes), anyway? All writes there ignore position, after all... > > We shouldn't care. > > Also, I think we should try to move away from FMODE_PWRITE/PREAD > entirely, and use FMODE_STREAM as the primary "this thing doesn't have > a position at all". > > That's what gets rid of all the f_pos locking etc after all. The > FMODE_PWRITE/PREAD flags are I think legacy (although we do seem to > have the seq_file case that normally allows position on reads, but not > on writes, so we may need to keep all three bits). > > Anyway, I think that with FMODE_STREAM, O_APPEND definitely should be a no-op. I also wonder if pread/pwrite with offset == 0 should be valid on things where the offset makes no sense. I'm rather surprised the offset isn't just silently ignored for devices where seeking is non-sensical. You might want to error it for mag tapes, but not pipes, ttys, sockets etc. I really can't remember what SYSV, Solaris or NetBSD do. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales) ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-20 22:25 ` David Laight @ 2021-01-20 23:02 ` Al Viro 0 siblings, 0 replies; 65+ messages in thread From: Al Viro @ 2021-01-20 23:02 UTC (permalink / raw) To: David Laight Cc: 'Linus Torvalds', Christoph Hellwig, Johannes Berg, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman On Wed, Jan 20, 2021 at 10:25:56PM +0000, David Laight wrote: > I also wonder if pread/pwrite with offset == 0 should be valid > on things where the offset makes no sense. > > I'm rather surprised the offset isn't just silently ignored > for devices where seeking is non-sensical. > You might want to error it for mag tapes, but not pipes, > ttys, sockets etc. > > I really can't remember what SYSV, Solaris or NetBSD do. ... nor can you be arsed to RTFPOSIX. Why am I not surprised? In https://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html (located by arcane action known as googling for pwrite POSIX): ============================== The pwrite() function shall fail if: [EINVAL] The file is a regular file or block special file, and the offset argument is negative. The file offset shall remain unchanged. [ESPIPE] The file is incapable of seeking. ============================== ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-20 19:27 ` Linus Torvalds 2021-01-20 22:25 ` David Laight @ 2021-01-20 23:14 ` Al Viro 2021-01-20 23:40 ` Linus Torvalds 1 sibling, 1 reply; 65+ messages in thread From: Al Viro @ 2021-01-20 23:14 UTC (permalink / raw) To: Linus Torvalds Cc: Christoph Hellwig, Johannes Berg, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman On Wed, Jan 20, 2021 at 11:27:26AM -0800, Linus Torvalds wrote: > On Wed, Jan 20, 2021 at 11:11 AM Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > Why do we care about O_APPEND on anything without FMODE_PWRITE (including > > pipes), anyway? All writes there ignore position, after all... > > We shouldn't care. > > Also, I think we should try to move away from FMODE_PWRITE/PREAD > entirely, and use FMODE_STREAM as the primary "this thing doesn't have > a position at all". > > That's what gets rid of all the f_pos locking etc after all. The > FMODE_PWRITE/PREAD flags are I think legacy (although we do seem to > have the seq_file case that normally allows position on reads, but not > on writes, so we may need to keep all three bits). Umm... Why do we clear FMODE_PWRITE there, anyway? It came in commit 915a29ec1c5e34283a6231af1036114e4d612cb0 Author: Linus Torvalds <torvalds@ppc970.osdl.org> Date: Sat Aug 7 02:08:23 2004 -0700 Add pread/pwrite support bits to match the lseek bit. This also removes the ESPIPE logic from pipes and seq_files, since the VFS layer now supports it. with seq_read() losing the special-cased pread prevention and seq_open() getting a ban on both pread and pwrite. With pread() support added in 2009, and (pointless) pwrite prohibition left in place. ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-20 23:14 ` Al Viro @ 2021-01-20 23:40 ` Linus Torvalds 2021-01-21 0:38 ` Al Viro 0 siblings, 1 reply; 65+ messages in thread From: Linus Torvalds @ 2021-01-20 23:40 UTC (permalink / raw) To: Al Viro Cc: Christoph Hellwig, Johannes Berg, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman On Wed, Jan 20, 2021 at 3:14 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > Umm... Why do we clear FMODE_PWRITE there [seq_open - ed], anyway? I think it's pointless and historical, and comes from "several /proc files supported the simple single-write model, nothing ever supported moving around and writing". The seq_file stuff was always about reading, and then the writing part was generally random special-case hacks on the side. So I think that "clear PWRITE" thing is to make sure we get sane error cases if somebody tries something funny, knowing that none of the hacky stuff support it. And then the very special kernfs thing adds it back in, because it does in fact allow seeking writes. Linus ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-20 23:40 ` Linus Torvalds @ 2021-01-21 0:38 ` Al Viro 2021-01-21 1:04 ` Linus Torvalds 0 siblings, 1 reply; 65+ messages in thread From: Al Viro @ 2021-01-21 0:38 UTC (permalink / raw) To: Linus Torvalds Cc: Christoph Hellwig, Johannes Berg, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman On Wed, Jan 20, 2021 at 03:40:29PM -0800, Linus Torvalds wrote: > On Wed, Jan 20, 2021 at 3:14 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > Umm... Why do we clear FMODE_PWRITE there [seq_open - ed], anyway? > > I think it's pointless and historical, and comes from "several /proc > files supported the simple single-write model, nothing ever supported > moving around and writing". > > The seq_file stuff was always about reading, and then the writing part > was generally random special-case hacks on the side. > > So I think that "clear PWRITE" thing is to make sure we get sane error > cases if somebody tries something funny, knowing that none of the > hacky stuff support it. > > And then the very special kernfs thing adds it back in, because it > does in fact allow seeking writes. OK... I wonder how many debugfs writable files allow pwrite() with BS results... Anyway, possibly more interesting question is why do we care about O_APPEND at all - why not treat it the same way we do in write()? Hell, even our pwrite() just goes ahead and writes to the end of file, whatever position it had been given. Yes, for pwrite(2) that's contrary to POSIX, but it's probably cast in stone by that point anyway... Looking through the instances of ->splice_write(), iter_file_splice_write() will end up appending the data to EOF and so will gfs2_file_splice_write(). For sockets (generic_splice_sendpage()) we definitely don't give a toss about O_APPEND (F_SETFL can set it, so that case is possible to hit), ditto for splice_write_null() and port_fops_splice_write(). Which leaves only one instance: fuse_dev_splice_write(), which also should ignore O_APPEND (IMO fuse_dev_open() ought to call nonseekable_open() anyway). So... why do we ban O_APPEND on destination for splice() or for sendfile()? AFAICS, if we simply remove that test, we'll end up with write going to the end of O_APPEND file. same as for write()/pwrite(). Comments? ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-21 0:38 ` Al Viro @ 2021-01-21 1:04 ` Linus Torvalds 2021-01-21 1:45 ` Al Viro 2021-01-21 10:08 ` David Laight 0 siblings, 2 replies; 65+ messages in thread From: Linus Torvalds @ 2021-01-21 1:04 UTC (permalink / raw) To: Al Viro Cc: Christoph Hellwig, Johannes Berg, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman On Wed, Jan 20, 2021 at 4:38 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > OK... I wonder how many debugfs writable files allow pwrite() with > BS results... I hope some of them check for "pos == 0" when they start parsing integers. But honestly, I don't think it's a big deal. We've had these things that just basically assume that whenever you write, the offset just doesn't matter at all, and as long as some number comes in one single write call, we accept it. Because even if you end up doing something like just echo $SOMETHING > /sys/xyz/abc and that "$SOMETHING" could be done multiple writes, in practice it all works out just fine and it never really is. You almost have to try to screw up with something like (echo -n 3; echo -n 4) > /sys/xyz/abc to actually see two writes of "3" and "4" instead of one write with "34". And honestly, if somebody does something like that, do we really care? They might get 3, they might get 4, and they might get 34. They get what they deserve. > Anyway, possibly more interesting question is why do we care about > O_APPEND at all - why not treat it the same way we do in write()? The whole point of O_APPEND is that the position shouldn't matter. And the whole point of "pwrite()" is that you specify a position. So the two just do not go together - although we may have legacy issues, of course. In contrast, the whole point of just a plain "write()" is that the position is the "current file position", with O_APPEND is just a special rule for what the current position for a write is. Now, splice() is able to do *both* write() and pwrite(), because unlike pwrite() it doesn't take a "pos" argument, it takes a _pointer_ to pos. So with a NULL pointer, it's like read/write, and with a non-NULL pointer it is like pread/pwrite. So I do think that "splice with non-NULL off_out and O_APPEND" should cause an error in general. That said, we probabyl have legacy behavior with splice and pipes in particular, and that legacy behavior would override any "this is conceptually the sane model". > So... why do we ban O_APPEND on destination for splice() or for sendfile()? sendfile() shouldn't be an issue. The offset pointer for sendfile is for the _source_, not the destination. For splice(), I do think that O_APPEND and a position pointer don't make sense as a combination, although if we do allow it for regular file pwrite() (I didn't check), maybe we could allow it for splice() too just to be erqually inconsistent. Honestly, I don't think it's a huge deal. O_APPEND isn't that interesting, but I do hope that if we allow O_APPEND and a file position, then O_APPEND always overrides it. Linus ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-21 1:04 ` Linus Torvalds @ 2021-01-21 1:45 ` Al Viro 2021-01-21 3:38 ` Linus Torvalds 2021-01-21 10:08 ` David Laight 1 sibling, 1 reply; 65+ messages in thread From: Al Viro @ 2021-01-21 1:45 UTC (permalink / raw) To: Linus Torvalds Cc: Christoph Hellwig, Johannes Berg, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman On Wed, Jan 20, 2021 at 05:04:17PM -0800, Linus Torvalds wrote: > The whole point of O_APPEND is that the position shouldn't matter. > > And the whole point of "pwrite()" is that you specify a position. > > So the two just do not go together - although we may have legacy > issues, of course. Our pwrite(2): BUGS POSIX requires that opening a file with the O_APPEND flag should have no effect on the location at which pwrite() writes data. However, on Linux, if a file is opened with O_APPEND, pwrite() appends data to the end of the file, regardless of the value of offset. POSIX pwrite(2): The pwrite() function shall be equivalent to write(), except that it writes into a given position and does not change the file offset (regardless of whether O_APPEND is set). The first three arguments to pwrite() are the same as write() with the addition of a fourth argument offset for the desired position inside the file. An attempt to perform a pwrite() on a file that is incapable of seeking shall result in an error. I don't believe that we could change behaviour of our pwrite(2) without breaking userland, even if we wanted to. It's been that way since 2.1.60 when pwrite() had been first introduced - 23 years ago is more than enough to have it cast in stone. We do allow pwrite(2) with O_APPEND and on such descriptors it acts like write(2) on the same. > Now, splice() is able to do *both* write() and pwrite(), because > unlike pwrite() it doesn't take a "pos" argument, it takes a _pointer_ > to pos. So with a NULL pointer, it's like read/write, and with a > non-NULL pointer it is like pread/pwrite. > > So I do think that "splice with non-NULL off_out and O_APPEND" should > cause an error in general. splice() triggers an error for seekable destination with O_APPEND and with NULL off_out. Same for splice() to socket with fcntl(sock_fd, F_SETFL, O_APPEND); done first. > Honestly, I don't think it's a huge deal. O_APPEND isn't that > interesting, but I do hope that if we allow O_APPEND and a file > position, then O_APPEND always overrides it. It does, when it is allowed. ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-21 1:45 ` Al Viro @ 2021-01-21 3:38 ` Linus Torvalds 2021-01-21 6:05 ` Willy Tarreau 0 siblings, 1 reply; 65+ messages in thread From: Linus Torvalds @ 2021-01-21 3:38 UTC (permalink / raw) To: Al Viro Cc: Christoph Hellwig, Johannes Berg, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman On Wed, Jan 20, 2021 at 5:45 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > splice() triggers an error for seekable destination with O_APPEND and > with NULL off_out. Ok, that's just broken. > Same for splice() to socket with > fcntl(sock_fd, F_SETFL, O_APPEND); > done first. Same. As long as you don't pass a position pointer, I think both should just work. Not that I imagine it matters for a lot of people.. Linus Linus ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-21 3:38 ` Linus Torvalds @ 2021-01-21 6:05 ` Willy Tarreau 2021-01-21 8:04 ` Johannes Berg 0 siblings, 1 reply; 65+ messages in thread From: Willy Tarreau @ 2021-01-21 6:05 UTC (permalink / raw) To: Linus Torvalds Cc: Al Viro, Christoph Hellwig, Johannes Berg, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman On Wed, Jan 20, 2021 at 07:38:38PM -0800, Linus Torvalds wrote: > On Wed, Jan 20, 2021 at 5:45 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > splice() triggers an error for seekable destination with O_APPEND and > > with NULL off_out. > > Ok, that's just broken. > > > Same for splice() to socket with > > fcntl(sock_fd, F_SETFL, O_APPEND); > > done first. > > Same. > > As long as you don't pass a position pointer, I think both should just work. > > Not that I imagine it matters for a lot of people.. I think that most users of splice() on sockets got used to falling back to recv/send on splice failure due to various cases not being supported historically (UNIX family sockets immediately come to my mind but I seem to remember other combinations). Thus I guess that most users of splice() detect that it doesn't work either due to lower than expected performance or while running strace. Willy ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-21 6:05 ` Willy Tarreau @ 2021-01-21 8:04 ` Johannes Berg 0 siblings, 0 replies; 65+ messages in thread From: Johannes Berg @ 2021-01-21 8:04 UTC (permalink / raw) To: Willy Tarreau, Linus Torvalds Cc: Al Viro, Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman On Thu, 2021-01-21 at 07:05 +0100, Willy Tarreau wrote: > I think that most users of splice() on sockets got used to falling back > to recv/send on splice failure due to various cases not being supported > historically (UNIX family sockets immediately come to my mind but I seem > to remember other combinations). Note, however, that I got here because cgit, if using sendfile(), does *not* fall back if it fails (and thus my git tree view is currently down because I haven't downgraded the kernel so far). That may not be common for splice() though. johannes ^ permalink raw reply [flat|nested] 65+ messages in thread
* RE: Splicing to/from a tty 2021-01-21 1:04 ` Linus Torvalds 2021-01-21 1:45 ` Al Viro @ 2021-01-21 10:08 ` David Laight 1 sibling, 0 replies; 65+ messages in thread From: David Laight @ 2021-01-21 10:08 UTC (permalink / raw) To: 'Linus Torvalds', Al Viro Cc: Christoph Hellwig, Johannes Berg, Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman From: Linus Torvalds > Sent: 21 January 2021 01:04 > On Wed, Jan 20, 2021 at 4:38 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > OK... I wonder how many debugfs writable files allow pwrite() with > > BS results... > > I hope some of them check for "pos == 0" when they start parsing integers. > > But honestly, I don't think it's a big deal. We've had these things > that just basically assume that whenever you write, the offset just > doesn't matter at all, and as long as some number comes in one single > write call, we accept it. > > Because even if you end up doing something like just > > echo $SOMETHING > /sys/xyz/abc > > and that "$SOMETHING" could be done multiple writes, in practice it > all works out just fine and it never really is. You almost have to try > to screw up with something like > > (echo -n 3; echo -n 4) > /sys/xyz/abc > > to actually see two writes of "3" and "4" instead of one write with > "34". And honestly, if somebody does something like that, do we really > care? They might get 3, they might get 4, and they might get 34. They > get what they deserve. Or worse: echo abc >/sys/xyz/abc echo x | dd bs=1 count=2 oseek=1 conv=notrunc of=/sys/xyz/abc which (if I got the dd command right) would generate "axc" on a real file. OTOH multiple short reads are quite likely. Best not done on a counter... David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales) ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-16 7:35 Splicing to/from a tty Oliver Giles 2021-01-16 16:46 ` Johannes Berg @ 2021-01-18 8:16 ` Christoph Hellwig 2021-01-18 19:36 ` Linus Torvalds 1 sibling, 1 reply; 65+ messages in thread From: Christoph Hellwig @ 2021-01-18 8:16 UTC (permalink / raw) To: Oliver Giles Cc: linux-kernel, Greg Kroah-Hartman, Christoph Hellwig, Linus Torvalds, Al Viro On Sat, Jan 16, 2021 at 08:35:41PM +1300, Oliver Giles wrote: > Commit 36e2c7421f02 (fs: don't allow splice read/write without explicit ops) broke my userspace application which talks to an SSL VPN by splice()ing between "openssl s_client" and "pppd". The latter operates over a pty, and since that commit there is no fallback for splice()ing between a pipe and a pty, or any tty for that matter. > > The above commit mentions switching them to the iter ops and using generic_file_splice_read. IIUC, this would require implementing iter ops also on the line disciplines, which sounds pretty disruptive. > > For my case, I attempted to instead implement splice_write and splice_read in tty_fops; I managed to get splice_write working calling ld->ops->write, but splice_read is not so simple because the tty_ldisc_ops read method expects a userspace buffer. So I cannot see how to implement this without either (a) using set_fs, or (b) implementing iter ops on all line disciplines. set_fs is gone for all the important platforms. So yes, you basically need to convert to iov_iter or have a huge splice_write parallel infrastucture. > > Is splice()ing between a tty and a pipe worth supporting at all? Not a big deal for my use case at least, but it used to work. Our normal policy is no regressions for exiting userspace. By that we'd have to fix it. Let me see if I can help you with this in any way. ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-18 8:16 ` Christoph Hellwig @ 2021-01-18 19:36 ` Linus Torvalds 2021-01-18 20:24 ` Linus Torvalds 0 siblings, 1 reply; 65+ messages in thread From: Linus Torvalds @ 2021-01-18 19:36 UTC (permalink / raw) To: Christoph Hellwig Cc: Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman, Al Viro On Mon, Jan 18, 2021 at 12:16 AM Christoph Hellwig <hch@lst.de> wrote: > > On Sat, Jan 16, 2021 at 08:35:41PM +1300, Oliver Giles wrote: > > For my case, I attempted to instead implement splice_write and splice_read in tty_fops; I managed to get splice_write working calling ld->ops->write, but splice_read is not so simple because the tty_ldisc_ops read method expects a userspace buffer. So I cannot see how to implement this without either (a) using set_fs, or (b) implementing iter ops on all line disciplines. > > set_fs is gone for all the important platforms. So yes, you basically > need to convert to iov_iter or have a huge splice_write parallel > infrastucture. It might ok to try to limit it to just the pty cases and ldisc ops that need it, apparently in this case pty (and presumably just one or two line disciplines) Of course, it probably would be really nice to try to convert tty_read() to use the same model that we have for tty_write(), and then make the ld->ops->read() function actually take a kernel buffer instead. I wonder how painful that would be. Linus ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-18 19:36 ` Linus Torvalds @ 2021-01-18 20:24 ` Linus Torvalds 2021-01-18 21:35 ` Linus Torvalds 0 siblings, 1 reply; 65+ messages in thread From: Linus Torvalds @ 2021-01-18 20:24 UTC (permalink / raw) To: Christoph Hellwig Cc: Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman, Al Viro [-- Attachment #1: Type: text/plain, Size: 2108 bytes --] On Mon, Jan 18, 2021 at 11:36 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Of course, it probably would be really nice to try to convert > tty_read() to use the same model that we have for tty_write(), and > then make the ld->ops->read() function actually take a kernel buffer > instead. > > I wonder how painful that would be. Oh, that seems _much_ less painful than I thought it would be. This is a COMPLETELY BROKEN patch that builds cleanly for me. I say that it's completely broken for three reasons: (a) that extra "int dummy" argument is there purely to force build errors if some user hasn't been converted. (b) I intentionally made the conversion truly stupid. It's not "broken", but it needs cleanup for nasty variable names (ie things like me changing that already badly named "b" variable to "kbp" to again catch all users at build time) (c) the buffer handling in tty_read() is actively broken. Things like HDLC rely on getting a proper buffer that can hold the whole packet, and the max packet size is actually potentially quite big. But the only real problem does seem to be HDLC, which has a default maximum frame size of 4k, but it can be set all the way up to 64kB as a module parameter. So that tty_read() conversion in this patch is complete garbage, but I really just wrote this to see how many painful places there are. And there are not many. The HDLC case could probably be done fairly well by - have a small on-stack buffer for the "small read" case - have a kmalloc() buffer that defaults to one page for bigger reads - grow the kmalloc() buffer if the ->read function returns -EOVERFLOW Comments? NOTE! This does *not* do the actual splice_read() implementation. No, this literally is just the preparatory stage for that by starting the whole "make the tty ldisc read path use a kernel buffer instead". Anybody want to play with this? I'd suggest keeping that "dummy" parameter around for a while - I did an allmodconfig build with this, but if there are any architecture-specific non-x86-64 cases I wouldn't have seen them. Linus [-- Attachment #2: patch --] [-- Type: application/octet-stream, Size: 10483 bytes --] drivers/bluetooth/hci_ldisc.c | 2 +- drivers/input/serio/serport.c | 3 ++- drivers/net/ppp/ppp_async.c | 2 +- drivers/net/ppp/ppp_synctty.c | 2 +- drivers/tty/n_gsm.c | 2 +- drivers/tty/n_hdlc.c | 8 +++----- drivers/tty/n_null.c | 2 +- drivers/tty/n_tracerouter.c | 2 +- drivers/tty/n_tracesink.c | 2 +- drivers/tty/n_tty.c | 39 +++++++++++++++------------------------ drivers/tty/tty_io.c | 13 +++++++++---- include/linux/tty_ldisc.h | 2 +- net/nfc/nci/uart.c | 2 +- 13 files changed, 38 insertions(+), 43 deletions(-) diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c index f83d67eafc9f..371c21715202 100644 --- a/drivers/bluetooth/hci_ldisc.c +++ b/drivers/bluetooth/hci_ldisc.c @@ -802,7 +802,7 @@ static int hci_uart_tty_ioctl(struct tty_struct *tty, struct file *file, * We don't provide read/write/poll interface for user space. */ static ssize_t hci_uart_tty_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr) + int dummy, unsigned char *buf, size_t nr) { return 0; } diff --git a/drivers/input/serio/serport.c b/drivers/input/serio/serport.c index 8ac970a423de..1a9afd350a73 100644 --- a/drivers/input/serio/serport.c +++ b/drivers/input/serio/serport.c @@ -156,7 +156,8 @@ static void serport_ldisc_receive(struct tty_struct *tty, const unsigned char *c * returning 0 characters. */ -static ssize_t serport_ldisc_read(struct tty_struct * tty, struct file * file, unsigned char __user * buf, size_t nr) +static ssize_t serport_ldisc_read(struct tty_struct * tty, struct file * file, + int dummy, unsigned char *kbuf, size_t nr) { struct serport *serport = (struct serport*) tty->disc_data; struct serio *serio; diff --git a/drivers/net/ppp/ppp_async.c b/drivers/net/ppp/ppp_async.c index 29a0917a81e6..58789a70282d 100644 --- a/drivers/net/ppp/ppp_async.c +++ b/drivers/net/ppp/ppp_async.c @@ -259,7 +259,7 @@ static int ppp_asynctty_hangup(struct tty_struct *tty) */ static ssize_t ppp_asynctty_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t count) + int dummy, unsigned char *buf, size_t count) { return -EAGAIN; } diff --git a/drivers/net/ppp/ppp_synctty.c b/drivers/net/ppp/ppp_synctty.c index 0f338752c38b..7ad4180b1360 100644 --- a/drivers/net/ppp/ppp_synctty.c +++ b/drivers/net/ppp/ppp_synctty.c @@ -257,7 +257,7 @@ static int ppp_sync_hangup(struct tty_struct *tty) */ static ssize_t ppp_sync_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t count) + int dummy, unsigned char *buf, size_t count) { return -EAGAIN; } diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c index c676fa89ee0b..e37ba8903c82 100644 --- a/drivers/tty/n_gsm.c +++ b/drivers/tty/n_gsm.c @@ -2559,7 +2559,7 @@ static void gsmld_write_wakeup(struct tty_struct *tty) */ static ssize_t gsmld_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr) + int dummy, unsigned char *buf, size_t nr) { return -EOPNOTSUPP; } diff --git a/drivers/tty/n_hdlc.c b/drivers/tty/n_hdlc.c index 12557ee1edb6..c62d719adaaa 100644 --- a/drivers/tty/n_hdlc.c +++ b/drivers/tty/n_hdlc.c @@ -416,7 +416,7 @@ static void n_hdlc_tty_receive(struct tty_struct *tty, const __u8 *data, * Returns the number of bytes returned or error code. */ static ssize_t n_hdlc_tty_read(struct tty_struct *tty, struct file *file, - __u8 __user *buf, size_t nr) + int dummy, __u8 *kbuf, size_t nr) { struct n_hdlc *n_hdlc = tty->disc_data; int ret = 0; @@ -442,10 +442,8 @@ static ssize_t n_hdlc_tty_read(struct tty_struct *tty, struct file *file, ret = -EOVERFLOW; } else { __set_current_state(TASK_RUNNING); - if (copy_to_user(buf, rbuf->buf, rbuf->count)) - ret = -EFAULT; - else - ret = rbuf->count; + memcpy(kbuf, rbuf->buf, rbuf->count); + ret = rbuf->count; } if (n_hdlc->rx_free_buf_list.count > diff --git a/drivers/tty/n_null.c b/drivers/tty/n_null.c index 96feabae4740..d42363c62333 100644 --- a/drivers/tty/n_null.c +++ b/drivers/tty/n_null.c @@ -20,7 +20,7 @@ static void n_null_close(struct tty_struct *tty) } static ssize_t n_null_read(struct tty_struct *tty, struct file *file, - unsigned char __user * buf, size_t nr) + int dummy, unsigned char *buf, size_t nr) { return -EOPNOTSUPP; } diff --git a/drivers/tty/n_tracerouter.c b/drivers/tty/n_tracerouter.c index 4479af4d2fa5..9eadaab29d7e 100644 --- a/drivers/tty/n_tracerouter.c +++ b/drivers/tty/n_tracerouter.c @@ -118,7 +118,7 @@ static void n_tracerouter_close(struct tty_struct *tty) * -EINVAL */ static ssize_t n_tracerouter_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr) { + int dummy, unsigned char *buf, size_t nr) { return -EINVAL; } diff --git a/drivers/tty/n_tracesink.c b/drivers/tty/n_tracesink.c index d96ba82cc356..a8508a07bc0d 100644 --- a/drivers/tty/n_tracesink.c +++ b/drivers/tty/n_tracesink.c @@ -115,7 +115,7 @@ static void n_tracesink_close(struct tty_struct *tty) * -EINVAL */ static ssize_t n_tracesink_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr) { + int dummy, unsigned char *buf, size_t nr) { return -EINVAL; } diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c index 319d68c8a5df..e46fa08f3354 100644 --- a/drivers/tty/n_tty.c +++ b/drivers/tty/n_tty.c @@ -1962,7 +1962,7 @@ static inline int input_available_p(struct tty_struct *tty, int poll) */ static int copy_from_read_buf(struct tty_struct *tty, - unsigned char __user **b, + unsigned char **kbp, size_t *nr) { @@ -1978,8 +1978,7 @@ static int copy_from_read_buf(struct tty_struct *tty, n = min(*nr, n); if (n) { unsigned char *from = read_buf_addr(ldata, tail); - retval = copy_to_user(*b, from, n); - n -= retval; + memcpy(*kbp, from, n); is_eof = n == 1 && *from == EOF_CHAR(tty); tty_audit_add_data(tty, from, n); zero_buffer(tty, from, n); @@ -1988,7 +1987,7 @@ static int copy_from_read_buf(struct tty_struct *tty, if (L_EXTPROC(tty) && ldata->icanon && is_eof && (head == ldata->read_tail)) n = 0; - *b += n; + *kbp += n; *nr -= n; } return retval; @@ -2132,10 +2131,10 @@ static int job_control(struct tty_struct *tty, struct file *file) */ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr) + int dummy, unsigned char *kbuf, size_t nr) { struct n_tty_data *ldata = tty->disc_data; - unsigned char __user *b = buf; + unsigned char *kb = kbuf; DEFINE_WAIT_FUNC(wait, woken_wake_function); int c; int minimum, time; @@ -2181,17 +2180,13 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, /* First test for status change. */ if (packet && tty->link->ctrl_status) { unsigned char cs; - if (b != buf) + if (kb != kbuf) break; spin_lock_irq(&tty->link->ctrl_lock); cs = tty->link->ctrl_status; tty->link->ctrl_status = 0; spin_unlock_irq(&tty->link->ctrl_lock); - if (put_user(cs, b)) { - retval = -EFAULT; - break; - } - b++; + *kb++ = cs; nr--; break; } @@ -2234,24 +2229,20 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, } if (ldata->icanon && !L_EXTPROC(tty)) { - retval = canon_copy_from_read_buf(tty, &b, &nr); + retval = canon_copy_from_read_buf(tty, &kb, &nr); if (retval) break; } else { int uncopied; /* Deal with packet mode. */ - if (packet && b == buf) { - if (put_user(TIOCPKT_DATA, b)) { - retval = -EFAULT; - break; - } - b++; + if (packet && kb == kbuf) { + *kb++ = TIOCPKT_DATA; nr--; } - uncopied = copy_from_read_buf(tty, &b, &nr); - uncopied += copy_from_read_buf(tty, &b, &nr); + uncopied = copy_from_read_buf(tty, &kb, &nr); + uncopied += copy_from_read_buf(tty, &kb, &nr); if (uncopied) { retval = -EFAULT; break; @@ -2260,7 +2251,7 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, n_tty_check_unthrottle(tty); - if (b - buf >= minimum) + if (kb - kbuf >= minimum) break; if (time) timeout = time; @@ -2272,8 +2263,8 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, remove_wait_queue(&tty->read_wait, &wait); mutex_unlock(&ldata->atomic_read_lock); - if (b - buf) - retval = b - buf; + if (kb - kbuf) + retval = kb - kbuf; return retval; } diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c index 8034489337d7..d2d558046420 100644 --- a/drivers/tty/tty_io.c +++ b/drivers/tty/tty_io.c @@ -864,10 +864,15 @@ static ssize_t tty_read(struct file *file, char __user *buf, size_t count, ld = tty_ldisc_ref_wait(tty); if (!ld) return hung_up_tty_read(file, buf, count, ppos); - if (ld->ops->read) - i = ld->ops->read(tty, file, buf, count); - else - i = -EIO; + i = -EIO; + if (ld->ops->read) { + char kernel_buf[32]; + if (count > sizeof(kernel_buf)) + count = sizeof(kernel_buf); + i = ld->ops->read(tty, file, 0, kernel_buf, count); + if (i > 0 && copy_to_user(buf, kernel_buf, i)) + i = -EFAULT; + } tty_ldisc_deref(ld); if (i > 0) diff --git a/include/linux/tty_ldisc.h b/include/linux/tty_ldisc.h index b1e6043e9917..516d946e57b6 100644 --- a/include/linux/tty_ldisc.h +++ b/include/linux/tty_ldisc.h @@ -185,7 +185,7 @@ struct tty_ldisc_ops { void (*close)(struct tty_struct *); void (*flush_buffer)(struct tty_struct *tty); ssize_t (*read)(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr); + int dummy, unsigned char *buf, size_t nr); ssize_t (*write)(struct tty_struct *tty, struct file *file, const unsigned char *buf, size_t nr); int (*ioctl)(struct tty_struct *tty, struct file *file, diff --git a/net/nfc/nci/uart.c b/net/nfc/nci/uart.c index 11b554ce07ff..c4d457ba1668 100644 --- a/net/nfc/nci/uart.c +++ b/net/nfc/nci/uart.c @@ -292,7 +292,7 @@ static int nci_uart_tty_ioctl(struct tty_struct *tty, struct file *file, /* We don't provide read/write/poll interface for user space. */ static ssize_t nci_uart_tty_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr) + int dummy, unsigned char *buf, size_t nr) { return 0; } ^ permalink raw reply related [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-18 20:24 ` Linus Torvalds @ 2021-01-18 21:35 ` Linus Torvalds 2021-01-18 21:54 ` Linus Torvalds 2021-01-19 11:52 ` Greg Kroah-Hartman 0 siblings, 2 replies; 65+ messages in thread From: Linus Torvalds @ 2021-01-18 21:35 UTC (permalink / raw) To: Christoph Hellwig Cc: Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman, Al Viro [-- Attachment #1: Type: text/plain, Size: 1108 bytes --] On Mon, Jan 18, 2021 at 12:24 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Anybody want to play with this? I'd suggest keeping that "dummy" > parameter around for a while - I did an allmodconfig build with this, > but if there are any architecture-specific non-x86-64 cases I wouldn't > have seen them. Not architecture-specific, but I did find by some grepping that I missed one line discipline driver: the Siemens R3964. The reason I missed that is because it's been marked BROKEN in the Kconfig for almost two years, so it didn't show up in my allmodconfig coverage. But the fact that I found it makes me a bit happier about my patch actually covering all the cases. My grep exercise failed on the bluetooth hci_ldisc.c pattern, which I feel is because the bluetooth code used the wrong pattern to initialize the 'struct tty_ldisc_ops', so I fixed that up too. So here's a slightly updated version of that patch, but apart from slightly better coverage - even if it's a driver that is disabled anyway - I'd like to point out that all my previous caveats still apply. Linus [-- Attachment #2: patch --] [-- Type: application/octet-stream, Size: 13324 bytes --] drivers/bluetooth/hci_ldisc.c | 33 ++++++++++++++++----------------- drivers/input/serio/serport.c | 3 ++- drivers/net/ppp/ppp_async.c | 2 +- drivers/net/ppp/ppp_synctty.c | 2 +- drivers/tty/n_gsm.c | 2 +- drivers/tty/n_hdlc.c | 8 +++----- drivers/tty/n_null.c | 2 +- drivers/tty/n_r3964.c | 9 +++------ drivers/tty/n_tracerouter.c | 2 +- drivers/tty/n_tracesink.c | 2 +- drivers/tty/n_tty.c | 39 +++++++++++++++------------------------ drivers/tty/tty_io.c | 13 +++++++++---- include/linux/tty_ldisc.h | 2 +- net/nfc/nci/uart.c | 2 +- 14 files changed, 56 insertions(+), 65 deletions(-) diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c index f83d67eafc9f..1adc43e471bc 100644 --- a/drivers/bluetooth/hci_ldisc.c +++ b/drivers/bluetooth/hci_ldisc.c @@ -802,7 +802,7 @@ static int hci_uart_tty_ioctl(struct tty_struct *tty, struct file *file, * We don't provide read/write/poll interface for user space. */ static ssize_t hci_uart_tty_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr) + int dummy, unsigned char *buf, size_t nr) { return 0; } @@ -819,29 +819,28 @@ static __poll_t hci_uart_tty_poll(struct tty_struct *tty, return 0; } +static struct tty_ldisc_ops hci_uart_ldisc = { + .owner = THIS_MODULE, + .magic = TTY_LDISC_MAGIC, + .name = "n_hci", + .open = hci_uart_tty_open, + .close = hci_uart_tty_close, + .read = hci_uart_tty_read, + .write = hci_uart_tty_write, + .ioctl = hci_uart_tty_ioctl, + .compat_ioctl = hci_uart_tty_ioctl, + .poll = hci_uart_tty_poll, + .receive_buf = hci_uart_tty_receive, + .write_wakeup = hci_uart_tty_wakeup, +}; + static int __init hci_uart_init(void) { - static struct tty_ldisc_ops hci_uart_ldisc; int err; BT_INFO("HCI UART driver ver %s", VERSION); /* Register the tty discipline */ - - memset(&hci_uart_ldisc, 0, sizeof(hci_uart_ldisc)); - hci_uart_ldisc.magic = TTY_LDISC_MAGIC; - hci_uart_ldisc.name = "n_hci"; - hci_uart_ldisc.open = hci_uart_tty_open; - hci_uart_ldisc.close = hci_uart_tty_close; - hci_uart_ldisc.read = hci_uart_tty_read; - hci_uart_ldisc.write = hci_uart_tty_write; - hci_uart_ldisc.ioctl = hci_uart_tty_ioctl; - hci_uart_ldisc.compat_ioctl = hci_uart_tty_ioctl; - hci_uart_ldisc.poll = hci_uart_tty_poll; - hci_uart_ldisc.receive_buf = hci_uart_tty_receive; - hci_uart_ldisc.write_wakeup = hci_uart_tty_wakeup; - hci_uart_ldisc.owner = THIS_MODULE; - err = tty_register_ldisc(N_HCI, &hci_uart_ldisc); if (err) { BT_ERR("HCI line discipline registration failed. (%d)", err); diff --git a/drivers/input/serio/serport.c b/drivers/input/serio/serport.c index 8ac970a423de..1a9afd350a73 100644 --- a/drivers/input/serio/serport.c +++ b/drivers/input/serio/serport.c @@ -156,7 +156,8 @@ static void serport_ldisc_receive(struct tty_struct *tty, const unsigned char *c * returning 0 characters. */ -static ssize_t serport_ldisc_read(struct tty_struct * tty, struct file * file, unsigned char __user * buf, size_t nr) +static ssize_t serport_ldisc_read(struct tty_struct * tty, struct file * file, + int dummy, unsigned char *kbuf, size_t nr) { struct serport *serport = (struct serport*) tty->disc_data; struct serio *serio; diff --git a/drivers/net/ppp/ppp_async.c b/drivers/net/ppp/ppp_async.c index 29a0917a81e6..58789a70282d 100644 --- a/drivers/net/ppp/ppp_async.c +++ b/drivers/net/ppp/ppp_async.c @@ -259,7 +259,7 @@ static int ppp_asynctty_hangup(struct tty_struct *tty) */ static ssize_t ppp_asynctty_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t count) + int dummy, unsigned char *buf, size_t count) { return -EAGAIN; } diff --git a/drivers/net/ppp/ppp_synctty.c b/drivers/net/ppp/ppp_synctty.c index 0f338752c38b..7ad4180b1360 100644 --- a/drivers/net/ppp/ppp_synctty.c +++ b/drivers/net/ppp/ppp_synctty.c @@ -257,7 +257,7 @@ static int ppp_sync_hangup(struct tty_struct *tty) */ static ssize_t ppp_sync_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t count) + int dummy, unsigned char *buf, size_t count) { return -EAGAIN; } diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c index c676fa89ee0b..e37ba8903c82 100644 --- a/drivers/tty/n_gsm.c +++ b/drivers/tty/n_gsm.c @@ -2559,7 +2559,7 @@ static void gsmld_write_wakeup(struct tty_struct *tty) */ static ssize_t gsmld_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr) + int dummy, unsigned char *buf, size_t nr) { return -EOPNOTSUPP; } diff --git a/drivers/tty/n_hdlc.c b/drivers/tty/n_hdlc.c index 12557ee1edb6..c62d719adaaa 100644 --- a/drivers/tty/n_hdlc.c +++ b/drivers/tty/n_hdlc.c @@ -416,7 +416,7 @@ static void n_hdlc_tty_receive(struct tty_struct *tty, const __u8 *data, * Returns the number of bytes returned or error code. */ static ssize_t n_hdlc_tty_read(struct tty_struct *tty, struct file *file, - __u8 __user *buf, size_t nr) + int dummy, __u8 *kbuf, size_t nr) { struct n_hdlc *n_hdlc = tty->disc_data; int ret = 0; @@ -442,10 +442,8 @@ static ssize_t n_hdlc_tty_read(struct tty_struct *tty, struct file *file, ret = -EOVERFLOW; } else { __set_current_state(TASK_RUNNING); - if (copy_to_user(buf, rbuf->buf, rbuf->count)) - ret = -EFAULT; - else - ret = rbuf->count; + memcpy(kbuf, rbuf->buf, rbuf->count); + ret = rbuf->count; } if (n_hdlc->rx_free_buf_list.count > diff --git a/drivers/tty/n_null.c b/drivers/tty/n_null.c index 96feabae4740..d42363c62333 100644 --- a/drivers/tty/n_null.c +++ b/drivers/tty/n_null.c @@ -20,7 +20,7 @@ static void n_null_close(struct tty_struct *tty) } static ssize_t n_null_read(struct tty_struct *tty, struct file *file, - unsigned char __user * buf, size_t nr) + int dummy, unsigned char *buf, size_t nr) { return -EOPNOTSUPP; } diff --git a/drivers/tty/n_r3964.c b/drivers/tty/n_r3964.c index 934dd2fb2ec8..1c8143276b3e 100644 --- a/drivers/tty/n_r3964.c +++ b/drivers/tty/n_r3964.c @@ -129,7 +129,7 @@ static void remove_client_block(struct r3964_info *pInfo, static int r3964_open(struct tty_struct *tty); static void r3964_close(struct tty_struct *tty); static ssize_t r3964_read(struct tty_struct *tty, struct file *file, - unsigned char __user * buf, size_t nr); + int dummy, unsigned char *buf, size_t nr); static ssize_t r3964_write(struct tty_struct *tty, struct file *file, const unsigned char *buf, size_t nr); static int r3964_ioctl(struct tty_struct *tty, struct file *file, @@ -1058,7 +1058,7 @@ static void r3964_close(struct tty_struct *tty) } static ssize_t r3964_read(struct tty_struct *tty, struct file *file, - unsigned char __user * buf, size_t nr) + int dummy, unsigned char *kbuf, size_t nr) { struct r3964_info *pInfo = tty->disc_data; struct r3964_client_info *pClient; @@ -1109,10 +1109,7 @@ static ssize_t r3964_read(struct tty_struct *tty, struct file *file, kfree(pMsg); TRACE_M("r3964_read - msg kfree %p", pMsg); - if (copy_to_user(buf, &theMsg, ret)) { - ret = -EFAULT; - goto unlock; - } + memcpy(kbuf, &theMsg, ret); TRACE_PS("read - return %d", ret); goto unlock; diff --git a/drivers/tty/n_tracerouter.c b/drivers/tty/n_tracerouter.c index 4479af4d2fa5..9eadaab29d7e 100644 --- a/drivers/tty/n_tracerouter.c +++ b/drivers/tty/n_tracerouter.c @@ -118,7 +118,7 @@ static void n_tracerouter_close(struct tty_struct *tty) * -EINVAL */ static ssize_t n_tracerouter_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr) { + int dummy, unsigned char *buf, size_t nr) { return -EINVAL; } diff --git a/drivers/tty/n_tracesink.c b/drivers/tty/n_tracesink.c index d96ba82cc356..a8508a07bc0d 100644 --- a/drivers/tty/n_tracesink.c +++ b/drivers/tty/n_tracesink.c @@ -115,7 +115,7 @@ static void n_tracesink_close(struct tty_struct *tty) * -EINVAL */ static ssize_t n_tracesink_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr) { + int dummy, unsigned char *buf, size_t nr) { return -EINVAL; } diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c index 319d68c8a5df..e46fa08f3354 100644 --- a/drivers/tty/n_tty.c +++ b/drivers/tty/n_tty.c @@ -1962,7 +1962,7 @@ static inline int input_available_p(struct tty_struct *tty, int poll) */ static int copy_from_read_buf(struct tty_struct *tty, - unsigned char __user **b, + unsigned char **kbp, size_t *nr) { @@ -1978,8 +1978,7 @@ static int copy_from_read_buf(struct tty_struct *tty, n = min(*nr, n); if (n) { unsigned char *from = read_buf_addr(ldata, tail); - retval = copy_to_user(*b, from, n); - n -= retval; + memcpy(*kbp, from, n); is_eof = n == 1 && *from == EOF_CHAR(tty); tty_audit_add_data(tty, from, n); zero_buffer(tty, from, n); @@ -1988,7 +1987,7 @@ static int copy_from_read_buf(struct tty_struct *tty, if (L_EXTPROC(tty) && ldata->icanon && is_eof && (head == ldata->read_tail)) n = 0; - *b += n; + *kbp += n; *nr -= n; } return retval; @@ -2132,10 +2131,10 @@ static int job_control(struct tty_struct *tty, struct file *file) */ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr) + int dummy, unsigned char *kbuf, size_t nr) { struct n_tty_data *ldata = tty->disc_data; - unsigned char __user *b = buf; + unsigned char *kb = kbuf; DEFINE_WAIT_FUNC(wait, woken_wake_function); int c; int minimum, time; @@ -2181,17 +2180,13 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, /* First test for status change. */ if (packet && tty->link->ctrl_status) { unsigned char cs; - if (b != buf) + if (kb != kbuf) break; spin_lock_irq(&tty->link->ctrl_lock); cs = tty->link->ctrl_status; tty->link->ctrl_status = 0; spin_unlock_irq(&tty->link->ctrl_lock); - if (put_user(cs, b)) { - retval = -EFAULT; - break; - } - b++; + *kb++ = cs; nr--; break; } @@ -2234,24 +2229,20 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, } if (ldata->icanon && !L_EXTPROC(tty)) { - retval = canon_copy_from_read_buf(tty, &b, &nr); + retval = canon_copy_from_read_buf(tty, &kb, &nr); if (retval) break; } else { int uncopied; /* Deal with packet mode. */ - if (packet && b == buf) { - if (put_user(TIOCPKT_DATA, b)) { - retval = -EFAULT; - break; - } - b++; + if (packet && kb == kbuf) { + *kb++ = TIOCPKT_DATA; nr--; } - uncopied = copy_from_read_buf(tty, &b, &nr); - uncopied += copy_from_read_buf(tty, &b, &nr); + uncopied = copy_from_read_buf(tty, &kb, &nr); + uncopied += copy_from_read_buf(tty, &kb, &nr); if (uncopied) { retval = -EFAULT; break; @@ -2260,7 +2251,7 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, n_tty_check_unthrottle(tty); - if (b - buf >= minimum) + if (kb - kbuf >= minimum) break; if (time) timeout = time; @@ -2272,8 +2263,8 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, remove_wait_queue(&tty->read_wait, &wait); mutex_unlock(&ldata->atomic_read_lock); - if (b - buf) - retval = b - buf; + if (kb - kbuf) + retval = kb - kbuf; return retval; } diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c index 8034489337d7..d2d558046420 100644 --- a/drivers/tty/tty_io.c +++ b/drivers/tty/tty_io.c @@ -864,10 +864,15 @@ static ssize_t tty_read(struct file *file, char __user *buf, size_t count, ld = tty_ldisc_ref_wait(tty); if (!ld) return hung_up_tty_read(file, buf, count, ppos); - if (ld->ops->read) - i = ld->ops->read(tty, file, buf, count); - else - i = -EIO; + i = -EIO; + if (ld->ops->read) { + char kernel_buf[32]; + if (count > sizeof(kernel_buf)) + count = sizeof(kernel_buf); + i = ld->ops->read(tty, file, 0, kernel_buf, count); + if (i > 0 && copy_to_user(buf, kernel_buf, i)) + i = -EFAULT; + } tty_ldisc_deref(ld); if (i > 0) diff --git a/include/linux/tty_ldisc.h b/include/linux/tty_ldisc.h index b1e6043e9917..516d946e57b6 100644 --- a/include/linux/tty_ldisc.h +++ b/include/linux/tty_ldisc.h @@ -185,7 +185,7 @@ struct tty_ldisc_ops { void (*close)(struct tty_struct *); void (*flush_buffer)(struct tty_struct *tty); ssize_t (*read)(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr); + int dummy, unsigned char *buf, size_t nr); ssize_t (*write)(struct tty_struct *tty, struct file *file, const unsigned char *buf, size_t nr); int (*ioctl)(struct tty_struct *tty, struct file *file, diff --git a/net/nfc/nci/uart.c b/net/nfc/nci/uart.c index 11b554ce07ff..c4d457ba1668 100644 --- a/net/nfc/nci/uart.c +++ b/net/nfc/nci/uart.c @@ -292,7 +292,7 @@ static int nci_uart_tty_ioctl(struct tty_struct *tty, struct file *file, /* We don't provide read/write/poll interface for user space. */ static ssize_t nci_uart_tty_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr) + int dummy, unsigned char *buf, size_t nr) { return 0; } ^ permalink raw reply related [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-18 21:35 ` Linus Torvalds @ 2021-01-18 21:54 ` Linus Torvalds 2021-01-18 22:03 ` Linus Torvalds 2021-01-19 11:52 ` Greg Kroah-Hartman 1 sibling, 1 reply; 65+ messages in thread From: Linus Torvalds @ 2021-01-18 21:54 UTC (permalink / raw) To: Christoph Hellwig Cc: Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman, Al Viro On Mon, Jan 18, 2021 at 1:35 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > So here's a slightly updated version of that patch, but apart from > slightly better coverage - even if it's a driver that is disabled > anyway - I'd like to point out that all my previous caveats still > apply. I have now booted that version to see that I didn't make any horribly obvious mistakes. And I must have screwed something up. I can actually use the machine normally (I'm writing this running that kernel), but when I decided to test the actual virtual console (as opposed to the GUI terminal windows that use pty's), I can't even log in. That *may* just be due to the inexcusably lazy and stupid "chunk things up into 32 byte pieces" I did. Most programs shouldn't care, tty's can return partial results anyway, but it's obviously a fairly fundamental and silly change. But it might well be some other conversion bug of mine even if I tried to keep it fairly minimal and straight-forward. So I suggest taking that patch with a lot of salt, and really treating it as a very rough starting point (which was the point of it - trying to actually boot it as-is was more of a "let's see if it survives at all" thing). Linus ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-18 21:54 ` Linus Torvalds @ 2021-01-18 22:03 ` Linus Torvalds 2021-01-18 22:20 ` Linus Torvalds 0 siblings, 1 reply; 65+ messages in thread From: Linus Torvalds @ 2021-01-18 22:03 UTC (permalink / raw) To: Christoph Hellwig Cc: Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman, Al Viro On Mon, Jan 18, 2021 at 1:54 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > But it might well be some other conversion bug of mine even if I tried > to keep it fairly minimal and straight-forward. Duh. I completely forgot to handle the canon_copy_from_read_buf() case. So ICANON mode was entirely scrogged and would just return -EFAULT. That would do it. I'm surprised how well everything I did actually worked - because all my normal terminal apps (shell, editor etc) obviously end up not using icanon at all. I'll have a third patch in a moment, but while it's ready I want to actually reboot and confirm it first. Linus ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-18 22:03 ` Linus Torvalds @ 2021-01-18 22:20 ` Linus Torvalds 2021-01-19 1:38 ` Linus Torvalds 0 siblings, 1 reply; 65+ messages in thread From: Linus Torvalds @ 2021-01-18 22:20 UTC (permalink / raw) To: Christoph Hellwig Cc: Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman, Al Viro [-- Attachment #1: Type: text/plain, Size: 817 bytes --] On Mon, Jan 18, 2021 at 2:03 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > I'll have a third patch in a moment, but while it's ready I want to > actually reboot and confirm it first. Here. This is an actually usable and tested starting point for this all. Again - it doesn't handle HDLC correctly, because it will chop all reads up into 32-byte chunks. And some other apps that think they'll get whole lines might end up being confused too. So it's not a "real" patch, but with improved buffer handling in tty_read(), I think this is actually quite close. NOTE: due to security reasons (ie password data), we do want to clear that buffer after we've copied the data to user space. Again, not something I did in my patch, so it would be part of that "improved buffer handling" Linus [-- Attachment #2: patch --] [-- Type: application/octet-stream, Size: 16622 bytes --] drivers/bluetooth/hci_ldisc.c | 33 +++++++++--------- drivers/input/serio/serport.c | 3 +- drivers/net/ppp/ppp_async.c | 2 +- drivers/net/ppp/ppp_synctty.c | 2 +- drivers/tty/n_gsm.c | 2 +- drivers/tty/n_hdlc.c | 8 ++--- drivers/tty/n_null.c | 2 +- drivers/tty/n_r3964.c | 9 ++--- drivers/tty/n_tracerouter.c | 2 +- drivers/tty/n_tracesink.c | 2 +- drivers/tty/n_tty.c | 81 ++++++++++++++++++------------------------- drivers/tty/tty_io.c | 13 ++++--- include/linux/tty_ldisc.h | 2 +- net/nfc/nci/uart.c | 2 +- 14 files changed, 74 insertions(+), 89 deletions(-) diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c index f83d67eafc9f..1adc43e471bc 100644 --- a/drivers/bluetooth/hci_ldisc.c +++ b/drivers/bluetooth/hci_ldisc.c @@ -802,7 +802,7 @@ static int hci_uart_tty_ioctl(struct tty_struct *tty, struct file *file, * We don't provide read/write/poll interface for user space. */ static ssize_t hci_uart_tty_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr) + int dummy, unsigned char *buf, size_t nr) { return 0; } @@ -819,29 +819,28 @@ static __poll_t hci_uart_tty_poll(struct tty_struct *tty, return 0; } +static struct tty_ldisc_ops hci_uart_ldisc = { + .owner = THIS_MODULE, + .magic = TTY_LDISC_MAGIC, + .name = "n_hci", + .open = hci_uart_tty_open, + .close = hci_uart_tty_close, + .read = hci_uart_tty_read, + .write = hci_uart_tty_write, + .ioctl = hci_uart_tty_ioctl, + .compat_ioctl = hci_uart_tty_ioctl, + .poll = hci_uart_tty_poll, + .receive_buf = hci_uart_tty_receive, + .write_wakeup = hci_uart_tty_wakeup, +}; + static int __init hci_uart_init(void) { - static struct tty_ldisc_ops hci_uart_ldisc; int err; BT_INFO("HCI UART driver ver %s", VERSION); /* Register the tty discipline */ - - memset(&hci_uart_ldisc, 0, sizeof(hci_uart_ldisc)); - hci_uart_ldisc.magic = TTY_LDISC_MAGIC; - hci_uart_ldisc.name = "n_hci"; - hci_uart_ldisc.open = hci_uart_tty_open; - hci_uart_ldisc.close = hci_uart_tty_close; - hci_uart_ldisc.read = hci_uart_tty_read; - hci_uart_ldisc.write = hci_uart_tty_write; - hci_uart_ldisc.ioctl = hci_uart_tty_ioctl; - hci_uart_ldisc.compat_ioctl = hci_uart_tty_ioctl; - hci_uart_ldisc.poll = hci_uart_tty_poll; - hci_uart_ldisc.receive_buf = hci_uart_tty_receive; - hci_uart_ldisc.write_wakeup = hci_uart_tty_wakeup; - hci_uart_ldisc.owner = THIS_MODULE; - err = tty_register_ldisc(N_HCI, &hci_uart_ldisc); if (err) { BT_ERR("HCI line discipline registration failed. (%d)", err); diff --git a/drivers/input/serio/serport.c b/drivers/input/serio/serport.c index 8ac970a423de..1a9afd350a73 100644 --- a/drivers/input/serio/serport.c +++ b/drivers/input/serio/serport.c @@ -156,7 +156,8 @@ static void serport_ldisc_receive(struct tty_struct *tty, const unsigned char *c * returning 0 characters. */ -static ssize_t serport_ldisc_read(struct tty_struct * tty, struct file * file, unsigned char __user * buf, size_t nr) +static ssize_t serport_ldisc_read(struct tty_struct * tty, struct file * file, + int dummy, unsigned char *kbuf, size_t nr) { struct serport *serport = (struct serport*) tty->disc_data; struct serio *serio; diff --git a/drivers/net/ppp/ppp_async.c b/drivers/net/ppp/ppp_async.c index 29a0917a81e6..58789a70282d 100644 --- a/drivers/net/ppp/ppp_async.c +++ b/drivers/net/ppp/ppp_async.c @@ -259,7 +259,7 @@ static int ppp_asynctty_hangup(struct tty_struct *tty) */ static ssize_t ppp_asynctty_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t count) + int dummy, unsigned char *buf, size_t count) { return -EAGAIN; } diff --git a/drivers/net/ppp/ppp_synctty.c b/drivers/net/ppp/ppp_synctty.c index 0f338752c38b..7ad4180b1360 100644 --- a/drivers/net/ppp/ppp_synctty.c +++ b/drivers/net/ppp/ppp_synctty.c @@ -257,7 +257,7 @@ static int ppp_sync_hangup(struct tty_struct *tty) */ static ssize_t ppp_sync_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t count) + int dummy, unsigned char *buf, size_t count) { return -EAGAIN; } diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c index c676fa89ee0b..e37ba8903c82 100644 --- a/drivers/tty/n_gsm.c +++ b/drivers/tty/n_gsm.c @@ -2559,7 +2559,7 @@ static void gsmld_write_wakeup(struct tty_struct *tty) */ static ssize_t gsmld_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr) + int dummy, unsigned char *buf, size_t nr) { return -EOPNOTSUPP; } diff --git a/drivers/tty/n_hdlc.c b/drivers/tty/n_hdlc.c index 12557ee1edb6..c62d719adaaa 100644 --- a/drivers/tty/n_hdlc.c +++ b/drivers/tty/n_hdlc.c @@ -416,7 +416,7 @@ static void n_hdlc_tty_receive(struct tty_struct *tty, const __u8 *data, * Returns the number of bytes returned or error code. */ static ssize_t n_hdlc_tty_read(struct tty_struct *tty, struct file *file, - __u8 __user *buf, size_t nr) + int dummy, __u8 *kbuf, size_t nr) { struct n_hdlc *n_hdlc = tty->disc_data; int ret = 0; @@ -442,10 +442,8 @@ static ssize_t n_hdlc_tty_read(struct tty_struct *tty, struct file *file, ret = -EOVERFLOW; } else { __set_current_state(TASK_RUNNING); - if (copy_to_user(buf, rbuf->buf, rbuf->count)) - ret = -EFAULT; - else - ret = rbuf->count; + memcpy(kbuf, rbuf->buf, rbuf->count); + ret = rbuf->count; } if (n_hdlc->rx_free_buf_list.count > diff --git a/drivers/tty/n_null.c b/drivers/tty/n_null.c index 96feabae4740..d42363c62333 100644 --- a/drivers/tty/n_null.c +++ b/drivers/tty/n_null.c @@ -20,7 +20,7 @@ static void n_null_close(struct tty_struct *tty) } static ssize_t n_null_read(struct tty_struct *tty, struct file *file, - unsigned char __user * buf, size_t nr) + int dummy, unsigned char *buf, size_t nr) { return -EOPNOTSUPP; } diff --git a/drivers/tty/n_r3964.c b/drivers/tty/n_r3964.c index 934dd2fb2ec8..1c8143276b3e 100644 --- a/drivers/tty/n_r3964.c +++ b/drivers/tty/n_r3964.c @@ -129,7 +129,7 @@ static void remove_client_block(struct r3964_info *pInfo, static int r3964_open(struct tty_struct *tty); static void r3964_close(struct tty_struct *tty); static ssize_t r3964_read(struct tty_struct *tty, struct file *file, - unsigned char __user * buf, size_t nr); + int dummy, unsigned char *buf, size_t nr); static ssize_t r3964_write(struct tty_struct *tty, struct file *file, const unsigned char *buf, size_t nr); static int r3964_ioctl(struct tty_struct *tty, struct file *file, @@ -1058,7 +1058,7 @@ static void r3964_close(struct tty_struct *tty) } static ssize_t r3964_read(struct tty_struct *tty, struct file *file, - unsigned char __user * buf, size_t nr) + int dummy, unsigned char *kbuf, size_t nr) { struct r3964_info *pInfo = tty->disc_data; struct r3964_client_info *pClient; @@ -1109,10 +1109,7 @@ static ssize_t r3964_read(struct tty_struct *tty, struct file *file, kfree(pMsg); TRACE_M("r3964_read - msg kfree %p", pMsg); - if (copy_to_user(buf, &theMsg, ret)) { - ret = -EFAULT; - goto unlock; - } + memcpy(kbuf, &theMsg, ret); TRACE_PS("read - return %d", ret); goto unlock; diff --git a/drivers/tty/n_tracerouter.c b/drivers/tty/n_tracerouter.c index 4479af4d2fa5..9eadaab29d7e 100644 --- a/drivers/tty/n_tracerouter.c +++ b/drivers/tty/n_tracerouter.c @@ -118,7 +118,7 @@ static void n_tracerouter_close(struct tty_struct *tty) * -EINVAL */ static ssize_t n_tracerouter_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr) { + int dummy, unsigned char *buf, size_t nr) { return -EINVAL; } diff --git a/drivers/tty/n_tracesink.c b/drivers/tty/n_tracesink.c index d96ba82cc356..a8508a07bc0d 100644 --- a/drivers/tty/n_tracesink.c +++ b/drivers/tty/n_tracesink.c @@ -115,7 +115,7 @@ static void n_tracesink_close(struct tty_struct *tty) * -EINVAL */ static ssize_t n_tracesink_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr) { + int dummy, unsigned char *buf, size_t nr) { return -EINVAL; } diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c index 319d68c8a5df..2b75b595a45c 100644 --- a/drivers/tty/n_tty.c +++ b/drivers/tty/n_tty.c @@ -164,29 +164,24 @@ static void zero_buffer(struct tty_struct *tty, u8 *buffer, int size) memset(buffer, 0x00, size); } -static int tty_copy_to_user(struct tty_struct *tty, void __user *to, - size_t tail, size_t n) +static void tty_copy(struct tty_struct *tty, void *to, size_t tail, size_t n) { struct n_tty_data *ldata = tty->disc_data; size_t size = N_TTY_BUF_SIZE - tail; void *from = read_buf_addr(ldata, tail); - int uncopied; if (n > size) { tty_audit_add_data(tty, from, size); - uncopied = copy_to_user(to, from, size); - zero_buffer(tty, from, size - uncopied); - if (uncopied) - return uncopied; + memcpy(to, from, size); + zero_buffer(tty, from, size); to += size; n -= size; from = ldata->read_buf; } tty_audit_add_data(tty, from, n); - uncopied = copy_to_user(to, from, n); - zero_buffer(tty, from, n - uncopied); - return uncopied; + memcpy(to, from, n); + zero_buffer(tty, from, n); } /** @@ -1944,15 +1939,16 @@ static inline int input_available_p(struct tty_struct *tty, int poll) /** * copy_from_read_buf - copy read data directly * @tty: terminal device - * @b: user data + * @kbp: data * @nr: size of data * * Helper function to speed up n_tty_read. It is only called when - * ICANON is off; it copies characters straight from the tty queue to - * user space directly. It can be profitably called twice; once to - * drain the space from the tail pointer to the (physical) end of the - * buffer, and once to drain the space from the (physical) beginning of - * the buffer to head pointer. + * ICANON is off; it copies characters straight from the tty queue. + * + * It can be profitably called twice; once to drain the space from + * the tail pointer to the (physical) end of the buffer, and once + * to drain the space from the (physical) beginning of the buffer + * to head pointer. * * Called under the ldata->atomic_read_lock sem * @@ -1962,7 +1958,7 @@ static inline int input_available_p(struct tty_struct *tty, int poll) */ static int copy_from_read_buf(struct tty_struct *tty, - unsigned char __user **b, + unsigned char **kbp, size_t *nr) { @@ -1978,8 +1974,7 @@ static int copy_from_read_buf(struct tty_struct *tty, n = min(*nr, n); if (n) { unsigned char *from = read_buf_addr(ldata, tail); - retval = copy_to_user(*b, from, n); - n -= retval; + memcpy(*kbp, from, n); is_eof = n == 1 && *from == EOF_CHAR(tty); tty_audit_add_data(tty, from, n); zero_buffer(tty, from, n); @@ -1988,7 +1983,7 @@ static int copy_from_read_buf(struct tty_struct *tty, if (L_EXTPROC(tty) && ldata->icanon && is_eof && (head == ldata->read_tail)) n = 0; - *b += n; + *kbp += n; *nr -= n; } return retval; @@ -1997,12 +1992,12 @@ static int copy_from_read_buf(struct tty_struct *tty, /** * canon_copy_from_read_buf - copy read data in canonical mode * @tty: terminal device - * @b: user data + * @kbp: data * @nr: size of data * * Helper function for n_tty_read. It is only called when ICANON is on; * it copies one line of input up to and including the line-delimiting - * character into the user-space buffer. + * character into the result buffer. * * NB: When termios is changed from non-canonical to canonical mode and * the read buffer contains data, n_tty_set_termios() simulates an EOF @@ -2018,14 +2013,14 @@ static int copy_from_read_buf(struct tty_struct *tty, */ static int canon_copy_from_read_buf(struct tty_struct *tty, - unsigned char __user **b, + unsigned char **kbp, size_t *nr) { struct n_tty_data *ldata = tty->disc_data; size_t n, size, more, c; size_t eol; size_t tail; - int ret, found = 0; + int found = 0; /* N.B. avoid overrun if nr == 0 */ if (!*nr) @@ -2061,10 +2056,8 @@ static int canon_copy_from_read_buf(struct tty_struct *tty, n_tty_trace("%s: eol:%zu found:%d n:%zu c:%zu tail:%zu more:%zu\n", __func__, eol, found, n, c, tail, more); - ret = tty_copy_to_user(tty, *b, tail, n); - if (ret) - return -EFAULT; - *b += n; + tty_copy(tty, *kbp, tail, n); + *kbp += n; *nr -= n; if (found) @@ -2132,10 +2125,10 @@ static int job_control(struct tty_struct *tty, struct file *file) */ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr) + int dummy, unsigned char *kbuf, size_t nr) { struct n_tty_data *ldata = tty->disc_data; - unsigned char __user *b = buf; + unsigned char *kb = kbuf; DEFINE_WAIT_FUNC(wait, woken_wake_function); int c; int minimum, time; @@ -2181,17 +2174,13 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, /* First test for status change. */ if (packet && tty->link->ctrl_status) { unsigned char cs; - if (b != buf) + if (kb != kbuf) break; spin_lock_irq(&tty->link->ctrl_lock); cs = tty->link->ctrl_status; tty->link->ctrl_status = 0; spin_unlock_irq(&tty->link->ctrl_lock); - if (put_user(cs, b)) { - retval = -EFAULT; - break; - } - b++; + *kb++ = cs; nr--; break; } @@ -2234,24 +2223,20 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, } if (ldata->icanon && !L_EXTPROC(tty)) { - retval = canon_copy_from_read_buf(tty, &b, &nr); + retval = canon_copy_from_read_buf(tty, &kb, &nr); if (retval) break; } else { int uncopied; /* Deal with packet mode. */ - if (packet && b == buf) { - if (put_user(TIOCPKT_DATA, b)) { - retval = -EFAULT; - break; - } - b++; + if (packet && kb == kbuf) { + *kb++ = TIOCPKT_DATA; nr--; } - uncopied = copy_from_read_buf(tty, &b, &nr); - uncopied += copy_from_read_buf(tty, &b, &nr); + uncopied = copy_from_read_buf(tty, &kb, &nr); + uncopied += copy_from_read_buf(tty, &kb, &nr); if (uncopied) { retval = -EFAULT; break; @@ -2260,7 +2245,7 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, n_tty_check_unthrottle(tty); - if (b - buf >= minimum) + if (kb - kbuf >= minimum) break; if (time) timeout = time; @@ -2272,8 +2257,8 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, remove_wait_queue(&tty->read_wait, &wait); mutex_unlock(&ldata->atomic_read_lock); - if (b - buf) - retval = b - buf; + if (kb - kbuf) + retval = kb - kbuf; return retval; } diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c index 8034489337d7..d2d558046420 100644 --- a/drivers/tty/tty_io.c +++ b/drivers/tty/tty_io.c @@ -864,10 +864,15 @@ static ssize_t tty_read(struct file *file, char __user *buf, size_t count, ld = tty_ldisc_ref_wait(tty); if (!ld) return hung_up_tty_read(file, buf, count, ppos); - if (ld->ops->read) - i = ld->ops->read(tty, file, buf, count); - else - i = -EIO; + i = -EIO; + if (ld->ops->read) { + char kernel_buf[32]; + if (count > sizeof(kernel_buf)) + count = sizeof(kernel_buf); + i = ld->ops->read(tty, file, 0, kernel_buf, count); + if (i > 0 && copy_to_user(buf, kernel_buf, i)) + i = -EFAULT; + } tty_ldisc_deref(ld); if (i > 0) diff --git a/include/linux/tty_ldisc.h b/include/linux/tty_ldisc.h index b1e6043e9917..516d946e57b6 100644 --- a/include/linux/tty_ldisc.h +++ b/include/linux/tty_ldisc.h @@ -185,7 +185,7 @@ struct tty_ldisc_ops { void (*close)(struct tty_struct *); void (*flush_buffer)(struct tty_struct *tty); ssize_t (*read)(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr); + int dummy, unsigned char *buf, size_t nr); ssize_t (*write)(struct tty_struct *tty, struct file *file, const unsigned char *buf, size_t nr); int (*ioctl)(struct tty_struct *tty, struct file *file, diff --git a/net/nfc/nci/uart.c b/net/nfc/nci/uart.c index 11b554ce07ff..c4d457ba1668 100644 --- a/net/nfc/nci/uart.c +++ b/net/nfc/nci/uart.c @@ -292,7 +292,7 @@ static int nci_uart_tty_ioctl(struct tty_struct *tty, struct file *file, /* We don't provide read/write/poll interface for user space. */ static ssize_t nci_uart_tty_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr) + int dummy, unsigned char *buf, size_t nr) { return 0; } ^ permalink raw reply related [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-18 22:20 ` Linus Torvalds @ 2021-01-19 1:38 ` Linus Torvalds 2021-01-19 11:53 ` Greg Kroah-Hartman 0 siblings, 1 reply; 65+ messages in thread From: Linus Torvalds @ 2021-01-19 1:38 UTC (permalink / raw) To: Christoph Hellwig Cc: Oliver Giles, Linux Kernel Mailing List, Greg Kroah-Hartman, Al Viro, Jiri Slaby [-- Attachment #1: Type: text/plain, Size: 2844 bytes --] On Mon, Jan 18, 2021 at 2:20 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > So it's not a "real" patch, but with improved buffer handling in > tty_read(), I think this is actually quite close. Hmm. I somehow ended up working on this all because it's a Monday, and I don't see a lot of pull requests early in the week. And I think I have a solution for the HDLC "we may need to copy a packet that might be up to 64kB" issue, that isn't really all that ugly. We can just iterate over a random "cookie" that the line discipline can use any way it wants to. In the case of n_hdlc, it can just put the 'rbuf' thing it has into that cookie, and then it can copy it all piece-meal until it is all used up. And if it runs out of space in the middle, it will return -EOVERFLOW, and we're all good. The only other thing such a line discipline needs is the offset into the cookie, but the iterator has to maintain that anyway, so that's simple enough. So here's a fourth patch for this thing today, this time with what I think is actually a working model for the buffer handling. Other line disciplines *could* use the cookie if they want to. I didn't do any of that, though. The normal n_tty line discipline, for example, could easily just loop over the data. It doesn't need an offset or that 'rbuf' pointer, but it still needs to know whether the call is the first one or not, because the first time the n_tty line discipline is called it may need to wait for a minimum number of characters or whatever the termios settings say - but obviously once it has waited for it once, it shouldn't wait for it again the next time around (only on the next actual full read()). IOW, it would be wrong if the termios said "wait for 5 characters", and then it saw 68 characters, copied the first 64, in the first iteration, and than saw "oh, now there are only 4 characters left so now I have to wait for a fifth". So n_tty could use the cookie purely to see whether it's the first iteration or not, and it could just set the cookie to a random value (it always starts out as NULL) to just show what state it is in. I did *NOT* do that, because it's not technically necessary - unlike the hdlc packet case, n_tty returning a partial result is not wrong per se even if we might have reasons to improve on it later. What do people think about this? Also, does anybody have any test-code for the HDLC case? I did find an interesting comment when doing a Debian code search: /* Bloody hell... readv doesn't work with N_HDLC line discipline... GRR! */ and yes, this model would allow us to handle readv() properly for hdlc (and no, the old one did not, because it really wanted to see the whole packet in *one* user buffer). But I have no idea if hdlc is even relevant any more, and if anybody really cares. Anybody? Linus [-- Attachment #2: 0001-tty-convert-tty_ldisc_ops-read-function-to-take-a-ke.patch --] [-- Type: text/x-patch, Size: 21264 bytes --] From 526258e18d13ebb113638d26eba18ca05a51dc2a Mon Sep 17 00:00:00 2001 From: Linus Torvalds <torvalds@linux-foundation.org> Date: Mon, 18 Jan 2021 13:31:30 -0800 Subject: [PATCH] tty: convert tty_ldisc_ops 'read()' function to take a kernel pointer The tty line discipline .read() function was passed the final user pointer destination as an argument, which doesn't match the 'write()' function, and makes it very inconvenient to do a splice method for tty's. This is a conversion to use a kernel buffer instead. NOTE! It does this by passing the tty line discipline ->read() function an additional "cookie" to fill in, and an offset into the cookie data. The line discipline can fill in the cookie data with its own private information, and then the reader will repeat the read until either the cookie is cleared or it runs out of data. The only real user of this is N_HDLC, which can use this to handle big packets, even if the kernel buffer is smaller than the whole packet. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> --- drivers/bluetooth/hci_ldisc.c | 34 +++++++-------- drivers/input/serio/serport.c | 4 +- drivers/net/ppp/ppp_async.c | 3 +- drivers/net/ppp/ppp_synctty.c | 3 +- drivers/tty/n_gsm.c | 3 +- drivers/tty/n_hdlc.c | 60 +++++++++++++++++-------- drivers/tty/n_null.c | 3 +- drivers/tty/n_r3964.c | 10 ++--- drivers/tty/n_tracerouter.c | 4 +- drivers/tty/n_tracesink.c | 4 +- drivers/tty/n_tty.c | 82 +++++++++++++++-------------------- drivers/tty/tty_io.c | 64 +++++++++++++++++++++++++-- include/linux/tty_ldisc.h | 3 +- net/nfc/nci/uart.c | 3 +- 14 files changed, 178 insertions(+), 102 deletions(-) diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c index f83d67eafc9f..dd92aea15b8b 100644 --- a/drivers/bluetooth/hci_ldisc.c +++ b/drivers/bluetooth/hci_ldisc.c @@ -802,7 +802,8 @@ static int hci_uart_tty_ioctl(struct tty_struct *tty, struct file *file, * We don't provide read/write/poll interface for user space. */ static ssize_t hci_uart_tty_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr) + unsigned char *buf, size_t nr, + void **cookie, unsigned long offset) { return 0; } @@ -819,29 +820,28 @@ static __poll_t hci_uart_tty_poll(struct tty_struct *tty, return 0; } +static struct tty_ldisc_ops hci_uart_ldisc = { + .owner = THIS_MODULE, + .magic = TTY_LDISC_MAGIC, + .name = "n_hci", + .open = hci_uart_tty_open, + .close = hci_uart_tty_close, + .read = hci_uart_tty_read, + .write = hci_uart_tty_write, + .ioctl = hci_uart_tty_ioctl, + .compat_ioctl = hci_uart_tty_ioctl, + .poll = hci_uart_tty_poll, + .receive_buf = hci_uart_tty_receive, + .write_wakeup = hci_uart_tty_wakeup, +}; + static int __init hci_uart_init(void) { - static struct tty_ldisc_ops hci_uart_ldisc; int err; BT_INFO("HCI UART driver ver %s", VERSION); /* Register the tty discipline */ - - memset(&hci_uart_ldisc, 0, sizeof(hci_uart_ldisc)); - hci_uart_ldisc.magic = TTY_LDISC_MAGIC; - hci_uart_ldisc.name = "n_hci"; - hci_uart_ldisc.open = hci_uart_tty_open; - hci_uart_ldisc.close = hci_uart_tty_close; - hci_uart_ldisc.read = hci_uart_tty_read; - hci_uart_ldisc.write = hci_uart_tty_write; - hci_uart_ldisc.ioctl = hci_uart_tty_ioctl; - hci_uart_ldisc.compat_ioctl = hci_uart_tty_ioctl; - hci_uart_ldisc.poll = hci_uart_tty_poll; - hci_uart_ldisc.receive_buf = hci_uart_tty_receive; - hci_uart_ldisc.write_wakeup = hci_uart_tty_wakeup; - hci_uart_ldisc.owner = THIS_MODULE; - err = tty_register_ldisc(N_HCI, &hci_uart_ldisc); if (err) { BT_ERR("HCI line discipline registration failed. (%d)", err); diff --git a/drivers/input/serio/serport.c b/drivers/input/serio/serport.c index 8ac970a423de..33e9d9bfd036 100644 --- a/drivers/input/serio/serport.c +++ b/drivers/input/serio/serport.c @@ -156,7 +156,9 @@ static void serport_ldisc_receive(struct tty_struct *tty, const unsigned char *c * returning 0 characters. */ -static ssize_t serport_ldisc_read(struct tty_struct * tty, struct file * file, unsigned char __user * buf, size_t nr) +static ssize_t serport_ldisc_read(struct tty_struct * tty, struct file * file, + unsigned char *kbuf, size_t nr, + void **cookie, unsigned long offset) { struct serport *serport = (struct serport*) tty->disc_data; struct serio *serio; diff --git a/drivers/net/ppp/ppp_async.c b/drivers/net/ppp/ppp_async.c index 29a0917a81e6..f14a9d190de9 100644 --- a/drivers/net/ppp/ppp_async.c +++ b/drivers/net/ppp/ppp_async.c @@ -259,7 +259,8 @@ static int ppp_asynctty_hangup(struct tty_struct *tty) */ static ssize_t ppp_asynctty_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t count) + unsigned char *buf, size_t count, + void **cookie, unsigned long offset) { return -EAGAIN; } diff --git a/drivers/net/ppp/ppp_synctty.c b/drivers/net/ppp/ppp_synctty.c index 0f338752c38b..f774b7e52da4 100644 --- a/drivers/net/ppp/ppp_synctty.c +++ b/drivers/net/ppp/ppp_synctty.c @@ -257,7 +257,8 @@ static int ppp_sync_hangup(struct tty_struct *tty) */ static ssize_t ppp_sync_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t count) + unsigned char *buf, size_t count, + void **cookie, unsigned long offset) { return -EAGAIN; } diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c index c676fa89ee0b..51dafc06f541 100644 --- a/drivers/tty/n_gsm.c +++ b/drivers/tty/n_gsm.c @@ -2559,7 +2559,8 @@ static void gsmld_write_wakeup(struct tty_struct *tty) */ static ssize_t gsmld_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr) + unsigned char *buf, size_t nr, + void **cookie, unsigned long offset) { return -EOPNOTSUPP; } diff --git a/drivers/tty/n_hdlc.c b/drivers/tty/n_hdlc.c index 12557ee1edb6..1363e659dc1d 100644 --- a/drivers/tty/n_hdlc.c +++ b/drivers/tty/n_hdlc.c @@ -416,13 +416,19 @@ static void n_hdlc_tty_receive(struct tty_struct *tty, const __u8 *data, * Returns the number of bytes returned or error code. */ static ssize_t n_hdlc_tty_read(struct tty_struct *tty, struct file *file, - __u8 __user *buf, size_t nr) + __u8 *kbuf, size_t nr, + void **cookie, unsigned long offset) { struct n_hdlc *n_hdlc = tty->disc_data; int ret = 0; struct n_hdlc_buf *rbuf; DECLARE_WAITQUEUE(wait, current); + /* Is this a repeated call for an rbuf we already found earlier? */ + rbuf = *cookie; + if (rbuf) + goto have_rbuf; + add_wait_queue(&tty->read_wait, &wait); for (;;) { @@ -436,25 +442,8 @@ static ssize_t n_hdlc_tty_read(struct tty_struct *tty, struct file *file, set_current_state(TASK_INTERRUPTIBLE); rbuf = n_hdlc_buf_get(&n_hdlc->rx_buf_list); - if (rbuf) { - if (rbuf->count > nr) { - /* too large for caller's buffer */ - ret = -EOVERFLOW; - } else { - __set_current_state(TASK_RUNNING); - if (copy_to_user(buf, rbuf->buf, rbuf->count)) - ret = -EFAULT; - else - ret = rbuf->count; - } - - if (n_hdlc->rx_free_buf_list.count > - DEFAULT_RX_BUF_COUNT) - kfree(rbuf); - else - n_hdlc_buf_put(&n_hdlc->rx_free_buf_list, rbuf); + if (rbuf) break; - } /* no data */ if (tty_io_nonblock(tty, file)) { @@ -473,6 +462,39 @@ static ssize_t n_hdlc_tty_read(struct tty_struct *tty, struct file *file, remove_wait_queue(&tty->read_wait, &wait); __set_current_state(TASK_RUNNING); + if (!rbuf) + return ret; + *cookie = rbuf; + +have_rbuf: + /* Have we used it up entirely? */ + if (offset >= rbuf->count) + goto done_with_rbuf; + + /* More data to go, but can't copy any more? EOVERFLOW */ + ret = -EOVERFLOW; + if (!nr) + goto done_with_rbuf; + + /* Copy as much data as possible */ + ret = rbuf->count - offset; + if (ret > nr) + ret = nr; + memcpy(kbuf, rbuf->buf+offset, ret); + offset += ret; + + /* If we still have data left, we leave the rbuf in the cookie */ + if (offset < rbuf->count) + return ret; + +done_with_rbuf: + *cookie = NULL; + + if (n_hdlc->rx_free_buf_list.count > DEFAULT_RX_BUF_COUNT) + kfree(rbuf); + else + n_hdlc_buf_put(&n_hdlc->rx_free_buf_list, rbuf); + return ret; } /* end of n_hdlc_tty_read() */ diff --git a/drivers/tty/n_null.c b/drivers/tty/n_null.c index 96feabae4740..ce03ae78f5c6 100644 --- a/drivers/tty/n_null.c +++ b/drivers/tty/n_null.c @@ -20,7 +20,8 @@ static void n_null_close(struct tty_struct *tty) } static ssize_t n_null_read(struct tty_struct *tty, struct file *file, - unsigned char __user * buf, size_t nr) + unsigned char *buf, size_t nr, + void **cookie, unsigned long offset) { return -EOPNOTSUPP; } diff --git a/drivers/tty/n_r3964.c b/drivers/tty/n_r3964.c index 934dd2fb2ec8..3161f0a535e3 100644 --- a/drivers/tty/n_r3964.c +++ b/drivers/tty/n_r3964.c @@ -129,7 +129,7 @@ static void remove_client_block(struct r3964_info *pInfo, static int r3964_open(struct tty_struct *tty); static void r3964_close(struct tty_struct *tty); static ssize_t r3964_read(struct tty_struct *tty, struct file *file, - unsigned char __user * buf, size_t nr); + void *cookie, unsigned char *buf, size_t nr); static ssize_t r3964_write(struct tty_struct *tty, struct file *file, const unsigned char *buf, size_t nr); static int r3964_ioctl(struct tty_struct *tty, struct file *file, @@ -1058,7 +1058,8 @@ static void r3964_close(struct tty_struct *tty) } static ssize_t r3964_read(struct tty_struct *tty, struct file *file, - unsigned char __user * buf, size_t nr) + unsigned char *kbuf, size_t nr, + void **cookie, unsigned long offset) { struct r3964_info *pInfo = tty->disc_data; struct r3964_client_info *pClient; @@ -1109,10 +1110,7 @@ static ssize_t r3964_read(struct tty_struct *tty, struct file *file, kfree(pMsg); TRACE_M("r3964_read - msg kfree %p", pMsg); - if (copy_to_user(buf, &theMsg, ret)) { - ret = -EFAULT; - goto unlock; - } + memcpy(kbuf, &theMsg, ret); TRACE_PS("read - return %d", ret); goto unlock; diff --git a/drivers/tty/n_tracerouter.c b/drivers/tty/n_tracerouter.c index 4479af4d2fa5..3490ed51b1a3 100644 --- a/drivers/tty/n_tracerouter.c +++ b/drivers/tty/n_tracerouter.c @@ -118,7 +118,9 @@ static void n_tracerouter_close(struct tty_struct *tty) * -EINVAL */ static ssize_t n_tracerouter_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr) { + unsigned char *buf, size_t nr, + void **cookie, unsigned long offset) +{ return -EINVAL; } diff --git a/drivers/tty/n_tracesink.c b/drivers/tty/n_tracesink.c index d96ba82cc356..1d9931041fd8 100644 --- a/drivers/tty/n_tracesink.c +++ b/drivers/tty/n_tracesink.c @@ -115,7 +115,9 @@ static void n_tracesink_close(struct tty_struct *tty) * -EINVAL */ static ssize_t n_tracesink_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr) { + unsigned char *buf, size_t nr, + void **cookie, unsigned long offset) +{ return -EINVAL; } diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c index 319d68c8a5df..2f2f57a53968 100644 --- a/drivers/tty/n_tty.c +++ b/drivers/tty/n_tty.c @@ -164,29 +164,24 @@ static void zero_buffer(struct tty_struct *tty, u8 *buffer, int size) memset(buffer, 0x00, size); } -static int tty_copy_to_user(struct tty_struct *tty, void __user *to, - size_t tail, size_t n) +static void tty_copy(struct tty_struct *tty, void *to, size_t tail, size_t n) { struct n_tty_data *ldata = tty->disc_data; size_t size = N_TTY_BUF_SIZE - tail; void *from = read_buf_addr(ldata, tail); - int uncopied; if (n > size) { tty_audit_add_data(tty, from, size); - uncopied = copy_to_user(to, from, size); - zero_buffer(tty, from, size - uncopied); - if (uncopied) - return uncopied; + memcpy(to, from, size); + zero_buffer(tty, from, size); to += size; n -= size; from = ldata->read_buf; } tty_audit_add_data(tty, from, n); - uncopied = copy_to_user(to, from, n); - zero_buffer(tty, from, n - uncopied); - return uncopied; + memcpy(to, from, n); + zero_buffer(tty, from, n); } /** @@ -1944,15 +1939,16 @@ static inline int input_available_p(struct tty_struct *tty, int poll) /** * copy_from_read_buf - copy read data directly * @tty: terminal device - * @b: user data + * @kbp: data * @nr: size of data * * Helper function to speed up n_tty_read. It is only called when - * ICANON is off; it copies characters straight from the tty queue to - * user space directly. It can be profitably called twice; once to - * drain the space from the tail pointer to the (physical) end of the - * buffer, and once to drain the space from the (physical) beginning of - * the buffer to head pointer. + * ICANON is off; it copies characters straight from the tty queue. + * + * It can be profitably called twice; once to drain the space from + * the tail pointer to the (physical) end of the buffer, and once + * to drain the space from the (physical) beginning of the buffer + * to head pointer. * * Called under the ldata->atomic_read_lock sem * @@ -1962,7 +1958,7 @@ static inline int input_available_p(struct tty_struct *tty, int poll) */ static int copy_from_read_buf(struct tty_struct *tty, - unsigned char __user **b, + unsigned char **kbp, size_t *nr) { @@ -1978,8 +1974,7 @@ static int copy_from_read_buf(struct tty_struct *tty, n = min(*nr, n); if (n) { unsigned char *from = read_buf_addr(ldata, tail); - retval = copy_to_user(*b, from, n); - n -= retval; + memcpy(*kbp, from, n); is_eof = n == 1 && *from == EOF_CHAR(tty); tty_audit_add_data(tty, from, n); zero_buffer(tty, from, n); @@ -1988,7 +1983,7 @@ static int copy_from_read_buf(struct tty_struct *tty, if (L_EXTPROC(tty) && ldata->icanon && is_eof && (head == ldata->read_tail)) n = 0; - *b += n; + *kbp += n; *nr -= n; } return retval; @@ -1997,12 +1992,12 @@ static int copy_from_read_buf(struct tty_struct *tty, /** * canon_copy_from_read_buf - copy read data in canonical mode * @tty: terminal device - * @b: user data + * @kbp: data * @nr: size of data * * Helper function for n_tty_read. It is only called when ICANON is on; * it copies one line of input up to and including the line-delimiting - * character into the user-space buffer. + * character into the result buffer. * * NB: When termios is changed from non-canonical to canonical mode and * the read buffer contains data, n_tty_set_termios() simulates an EOF @@ -2018,14 +2013,14 @@ static int copy_from_read_buf(struct tty_struct *tty, */ static int canon_copy_from_read_buf(struct tty_struct *tty, - unsigned char __user **b, + unsigned char **kbp, size_t *nr) { struct n_tty_data *ldata = tty->disc_data; size_t n, size, more, c; size_t eol; size_t tail; - int ret, found = 0; + int found = 0; /* N.B. avoid overrun if nr == 0 */ if (!*nr) @@ -2061,10 +2056,8 @@ static int canon_copy_from_read_buf(struct tty_struct *tty, n_tty_trace("%s: eol:%zu found:%d n:%zu c:%zu tail:%zu more:%zu\n", __func__, eol, found, n, c, tail, more); - ret = tty_copy_to_user(tty, *b, tail, n); - if (ret) - return -EFAULT; - *b += n; + tty_copy(tty, *kbp, tail, n); + *kbp += n; *nr -= n; if (found) @@ -2132,10 +2125,11 @@ static int job_control(struct tty_struct *tty, struct file *file) */ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr) + unsigned char *kbuf, size_t nr, + void **cookie, unsigned long offset) { struct n_tty_data *ldata = tty->disc_data; - unsigned char __user *b = buf; + unsigned char *kb = kbuf; DEFINE_WAIT_FUNC(wait, woken_wake_function); int c; int minimum, time; @@ -2181,17 +2175,13 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, /* First test for status change. */ if (packet && tty->link->ctrl_status) { unsigned char cs; - if (b != buf) + if (kb != kbuf) break; spin_lock_irq(&tty->link->ctrl_lock); cs = tty->link->ctrl_status; tty->link->ctrl_status = 0; spin_unlock_irq(&tty->link->ctrl_lock); - if (put_user(cs, b)) { - retval = -EFAULT; - break; - } - b++; + *kb++ = cs; nr--; break; } @@ -2234,24 +2224,20 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, } if (ldata->icanon && !L_EXTPROC(tty)) { - retval = canon_copy_from_read_buf(tty, &b, &nr); + retval = canon_copy_from_read_buf(tty, &kb, &nr); if (retval) break; } else { int uncopied; /* Deal with packet mode. */ - if (packet && b == buf) { - if (put_user(TIOCPKT_DATA, b)) { - retval = -EFAULT; - break; - } - b++; + if (packet && kb == kbuf) { + *kb++ = TIOCPKT_DATA; nr--; } - uncopied = copy_from_read_buf(tty, &b, &nr); - uncopied += copy_from_read_buf(tty, &b, &nr); + uncopied = copy_from_read_buf(tty, &kb, &nr); + uncopied += copy_from_read_buf(tty, &kb, &nr); if (uncopied) { retval = -EFAULT; break; @@ -2260,7 +2246,7 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, n_tty_check_unthrottle(tty); - if (b - buf >= minimum) + if (kb - kbuf >= minimum) break; if (time) timeout = time; @@ -2272,8 +2258,8 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, remove_wait_queue(&tty->read_wait, &wait); mutex_unlock(&ldata->atomic_read_lock); - if (b - buf) - retval = b - buf; + if (kb - kbuf) + retval = kb - kbuf; return retval; } diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c index 8034489337d7..821657e8a88f 100644 --- a/drivers/tty/tty_io.c +++ b/drivers/tty/tty_io.c @@ -831,6 +831,65 @@ static void tty_update_time(struct timespec64 *time) time->tv_sec = sec; } +/* + * Iterate on the ldisc ->read() function until we've gotten all + * the data the ldisc has for us. + * + * The "cookie" is something that the ldisc read function can fill + * in to let us know that there is more data to be had. + * + * We promise to continue to call the ldisc until it stops returning + * data or clears the cookie. The cookie may be something that the + * ldisc maintains state for and needs to free. + */ +static int iterate_tty_read(struct tty_ldisc *ld, struct tty_struct *tty, struct file *file, + char __user *buf, size_t count) +{ + int retval = 0; + void *cookie = NULL; + unsigned long offset = 0; + char kernel_buf[64]; + + do { + int size, uncopied; + + size = count > sizeof(kernel_buf) ? sizeof(kernel_buf) : count; + size = ld->ops->read(tty, file, kernel_buf, size, &cookie, offset); + if (!size) + break; + + /* + * A ldisc read error return will override any previously copied + * data (eg -EOVERFLOW from HDLC) + */ + if (size < 0) { + memzero_explicit(kernel_buf, sizeof(kernel_buf)); + return size; + } + + uncopied = copy_to_user(buf+offset, kernel_buf, size); + size -= uncopied; + offset += size; + count -= size; + + /* + * If the user copy failed, we still need to do another ->read() + * call if we had a cookie to let the ldisc clear up. + * + * But make sure size is zeroed. + */ + if (unlikely(uncopied)) { + count = 0; + retval = -EFAULT; + } + } while (cookie); + + /* We always clear tty buffer in case they contained passwords */ + memzero_explicit(kernel_buf, sizeof(kernel_buf)); + return offset ? offset : retval; +} + + /** * tty_read - read method for tty device files * @file: pointer to tty file @@ -864,10 +923,9 @@ static ssize_t tty_read(struct file *file, char __user *buf, size_t count, ld = tty_ldisc_ref_wait(tty); if (!ld) return hung_up_tty_read(file, buf, count, ppos); + i = -EIO; if (ld->ops->read) - i = ld->ops->read(tty, file, buf, count); - else - i = -EIO; + i = iterate_tty_read(ld, tty, file, buf, count); tty_ldisc_deref(ld); if (i > 0) diff --git a/include/linux/tty_ldisc.h b/include/linux/tty_ldisc.h index b1e6043e9917..572a07976116 100644 --- a/include/linux/tty_ldisc.h +++ b/include/linux/tty_ldisc.h @@ -185,7 +185,8 @@ struct tty_ldisc_ops { void (*close)(struct tty_struct *); void (*flush_buffer)(struct tty_struct *tty); ssize_t (*read)(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr); + unsigned char *buf, size_t nr, + void **cookie, unsigned long offset); ssize_t (*write)(struct tty_struct *tty, struct file *file, const unsigned char *buf, size_t nr); int (*ioctl)(struct tty_struct *tty, struct file *file, diff --git a/net/nfc/nci/uart.c b/net/nfc/nci/uart.c index 11b554ce07ff..1204c438e87d 100644 --- a/net/nfc/nci/uart.c +++ b/net/nfc/nci/uart.c @@ -292,7 +292,8 @@ static int nci_uart_tty_ioctl(struct tty_struct *tty, struct file *file, /* We don't provide read/write/poll interface for user space. */ static ssize_t nci_uart_tty_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr) + unsigned char *buf, size_t nr, + void **cookie, unsigned long offset) { return 0; } -- 2.29.2.157.g1d47791a39 ^ permalink raw reply related [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-19 1:38 ` Linus Torvalds @ 2021-01-19 11:53 ` Greg Kroah-Hartman 2021-01-19 16:56 ` Robert Karszniewicz ` (2 more replies) 0 siblings, 3 replies; 65+ messages in thread From: Greg Kroah-Hartman @ 2021-01-19 11:53 UTC (permalink / raw) To: Linus Torvalds Cc: Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List, Al Viro, Jiri Slaby On Mon, Jan 18, 2021 at 05:38:55PM -0800, Linus Torvalds wrote: > On Mon, Jan 18, 2021 at 2:20 PM Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > So it's not a "real" patch, but with improved buffer handling in > > tty_read(), I think this is actually quite close. > > Hmm. > > I somehow ended up working on this all because it's a Monday, and I > don't see a lot of pull requests early in the week. > > And I think I have a solution for the HDLC "we may need to copy a > packet that might be up to 64kB" issue, that isn't really all that > ugly. > > We can just iterate over a random "cookie" that the line discipline > can use any way it wants to. In the case of n_hdlc, it can just put > the 'rbuf' thing it has into that cookie, and then it can copy it all > piece-meal until it is all used up. And if it runs out of space in the > middle, it will return -EOVERFLOW, and we're all good. > > The only other thing such a line discipline needs is the offset into > the cookie, but the iterator has to maintain that anyway, so that's > simple enough. > > So here's a fourth patch for this thing today, this time with what I > think is actually a working model for the buffer handling. > > Other line disciplines *could* use the cookie if they want to. I > didn't do any of that, though. > > The normal n_tty line discipline, for example, could easily just loop > over the data. It doesn't need an offset or that 'rbuf' pointer, but > it still needs to know whether the call is the first one or not, > because the first time the n_tty line discipline is called it may need > to wait for a minimum number of characters or whatever the termios > settings say - but obviously once it has waited for it once, it > shouldn't wait for it again the next time around (only on the next > actual full read()). IOW, it would be wrong if the termios said "wait > for 5 characters", and then it saw 68 characters, copied the first 64, > in the first iteration, and than saw "oh, now there are only 4 > characters left so now I have to wait for a fifth". > > So n_tty could use the cookie purely to see whether it's the first > iteration or not, and it could just set the cookie to a random value > (it always starts out as NULL) to just show what state it is in. > > I did *NOT* do that, because it's not technically necessary - unlike > the hdlc packet case, n_tty returning a partial result is not wrong > per se even if we might have reasons to improve on it later. > > What do people think about this? > > Also, does anybody have any test-code for the HDLC case? I did find an > interesting comment when doing a Debian code search: > > /* Bloody hell... readv doesn't work with N_HDLC line discipline... GRR! */ > > and yes, this model would allow us to handle readv() properly for hdlc > (and no, the old one did not, because it really wanted to see the > whole packet in *one* user buffer). > > But I have no idea if hdlc is even relevant any more, and if anybody > really cares. > > Anybody? This looks sane, but I'm still missing what the goal of this is here. It's nice from a "don't make the ldisc do the userspace copy", point of view, but what is the next step in order to tie that into splice? I ask as I also have reports that sysfs binary files are now failing for this same reason, so I need to make the same change for them and it's not excatly obvious what to do: https://lore.kernel.org/r/1adf9aa4-ed7e-8f05-a354-57419d61ec18@codeaurora.org thanks, greg k-h ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-19 11:53 ` Greg Kroah-Hartman @ 2021-01-19 16:56 ` Robert Karszniewicz 2021-01-19 17:10 ` Robert Karszniewicz 2021-01-19 22:09 ` Oliver Giles 2021-01-19 17:25 ` Linus Torvalds 2021-01-19 20:24 ` Linus Torvalds 2 siblings, 2 replies; 65+ messages in thread From: Robert Karszniewicz @ 2021-01-19 16:56 UTC (permalink / raw) To: Greg Kroah-Hartman, Linus Torvalds Cc: Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List, Al Viro, Jiri Slaby On 1/19/21 12:53 PM, Greg Kroah-Hartman wrote: > This looks sane, but I'm still missing what the goal of this is here. > It's nice from a "don't make the ldisc do the userspace copy", point of > view, but what is the next step in order to tie that into splice? > > I ask as I also have reports that sysfs binary files are now failing for > this same reason, so I need to make the same change for them and it's > not excatly obvious what to do: > https://lore.kernel.org/r/1adf9aa4-ed7e-8f05-a354-57419d61ec18@codeaurora.org I would like to confirm this. We are using firmwared and it returns EINVAL on sendfile(), too. I have tried setting the .splice_write callback as in the linked thread, but it didn't help, it just EINVAL'd in a different place. I have bisected this issue down to this commit: 4d03e3cc5982 ("fs: don't allow kernel reads and writes without iter ops") Another case I've also noticed is writing to a serial connection: kernel write not supported for file /ttymxc0 (pid: 252 comm: cat) (Which still prints, though, because cat falls back to write(2), I suppose.) Thank you, Robert ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-19 16:56 ` Robert Karszniewicz @ 2021-01-19 17:10 ` Robert Karszniewicz 2021-01-19 22:09 ` Oliver Giles 1 sibling, 0 replies; 65+ messages in thread From: Robert Karszniewicz @ 2021-01-19 17:10 UTC (permalink / raw) To: Greg Kroah-Hartman, Linus Torvalds Cc: Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List, Al Viro, Jiri Slaby On 1/19/21 5:56 PM, Robert Karszniewicz wrote: > Another case I've also noticed is writing to a serial connection: > kernel write not supported for file /ttymxc0 (pid: 252 comm: cat) Which is actually a TTY and I just failed to see the forest for all the trees... Cheers. ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-19 16:56 ` Robert Karszniewicz 2021-01-19 17:10 ` Robert Karszniewicz @ 2021-01-19 22:09 ` Oliver Giles 1 sibling, 0 replies; 65+ messages in thread From: Oliver Giles @ 2021-01-19 22:09 UTC (permalink / raw) To: Robert Karszniewicz, Greg Kroah-Hartman, Linus Torvalds Cc: Christoph Hellwig, Linux Kernel Mailing List, Al Viro, Jiri Slaby On Wed Jan 20, 2021 at 5:56 AM NZDT, Robert Karszniewicz wrote: > > I have bisected this issue down to this commit: > 4d03e3cc5982 ("fs: don't allow kernel reads and writes without iter > ops") > > Another case I've also noticed is writing to a serial connection: > kernel write not supported for file /ttymxc0 (pid: 252 comm: cat) > Tangentially, I hit the same thing when hacking on this. Here's a diff making the implementation match the comment: --- a/fs/read_write.c +++ b/fs/read_write.c @@ -541,7 +541,7 @@ ssize_t __kernel_write(struct file *file, const void *buf, size_t count, loff_t * Also fail if ->write_iter and ->write are both wired up as that * implies very convoluted semantics. */ - if (unlikely(!file->f_op->write_iter || file->f_op->write)) + if (unlikely(file->f_op->write_iter && file->f_op->write)) return warn_unsupported(file, "write"); init_sync_kiocb(&kiocb, file); ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-19 11:53 ` Greg Kroah-Hartman 2021-01-19 16:56 ` Robert Karszniewicz @ 2021-01-19 17:25 ` Linus Torvalds 2021-01-19 20:24 ` Linus Torvalds 2 siblings, 0 replies; 65+ messages in thread From: Linus Torvalds @ 2021-01-19 17:25 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List, Al Viro, Jiri Slaby On Tue, Jan 19, 2021 at 3:54 AM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > This looks sane, but I'm still missing what the goal of this is here. > It's nice from a "don't make the ldisc do the userspace copy", point of > view, but what is the next step in order to tie that into splice? I'll cook something up. With this, it should be fairly easy to add both the splice and iov_iter versions, because now it only needs the wrappers in tty_io.c, not for each line discipline. I hope. Let's see.. Linus ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-19 11:53 ` Greg Kroah-Hartman 2021-01-19 16:56 ` Robert Karszniewicz 2021-01-19 17:25 ` Linus Torvalds @ 2021-01-19 20:24 ` Linus Torvalds 2021-01-19 20:38 ` Christoph Hellwig 2021-01-20 1:25 ` Oliver Giles 2 siblings, 2 replies; 65+ messages in thread From: Linus Torvalds @ 2021-01-19 20:24 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List, Al Viro, Jiri Slaby [-- Attachment #1: Type: text/plain, Size: 3343 bytes --] On Tue, Jan 19, 2021 at 3:54 AM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > This looks sane, but I'm still missing what the goal of this is here. > It's nice from a "don't make the ldisc do the userspace copy", point of > view, but what is the next step in order to tie that into splice? Ok, so here's a series of four patches that make ttys possible sources and destinations for splice() again. Well, the first patch is just the pipe one for sendfile() - and it's the hacky one-liner, not the proper one that Al will hopefully add. NOTE! I've signed off on these, because I think they are fine for testing - but they are really meant for testing ONLY. I'm running a kernel with these in place, so they kind of work. And yes, I verified that sendfile() now works with a pipe or tty target. I didn't actually check splicing _from_ a tty, nor did I check that readv/writev now works properly, but it all LoosGood(tm) to me. HOWEVER. The reason these are for testing only is that (a) my tests are pretty limited, and I'd like the actual people who reported this to really test them out (b) the new read iterator model is going to be quite horribly slow for big pty transfers because the n_tty ldisc isn't doing the cookie batching (c) I really really want Al to take a look at that iov_iter_revert() thing in do_tty_write() (in "[PATCH 2/4] tty: implement write_iter") Note that I'm more than happy to do (b) as a patch on top of this, but I'd like (a) and (c) to be clarified before I do that. > I ask as I also have reports that sysfs binary files are now failing for > this same reason, so I need to make the same change for them and it's > not excatly obvious what to do: Ok, so that would require those kernfs_fop_{read,write}() functions to also be converted to read_iter/write_iter. That doesn't look horrendous: it's not all that dissimilar from the two patches to do that for tty's ("tty: implement {read,write}_iter"). The seq_file part already has a iter version for reading, and the main change to kernfs_file_direct_read() and kernfs_fop_write() is to do that (a) change the arguments from file/buf/count/ppos to kiocb/iov_iter (b) change the copy_to/from_user() calls to copy_to/from_iter() Note that (b) involves changing the semantics of the return value: "copy_to/from_user()" returns the number of bytes that were *NOT* copied, while "copy_to/from_iter()" returns the number of bytes that WERE copied. So the error case check does from if (copy_to/from_user()) **ERROR** to if (copy_to/from_iter(n) != n) **ERROR** but that is fairly straightforward. The two "tty: implement write/read_iter" patches (patch 2 and 4) can be used as examples. That said, I want to again stress that they haven't seen all that much testing, and I do want Al to spray his holy penguin pee on that iov_iter_revert() thing in patch 2. I'm honestly not that motivated on those sysfs files: the tty layer was an interesting test-case that I wanted to look at just because the conversion to kernel pointers was nontrivial for the read side. But that sysfs binary file case really isn't interesting, and just more of a "Christoph broke it, I think he should just fix it". Christoph? Anyway, anybody willing to test these tty/pipe patches on the loads that failed? Oliver? Linus [-- Attachment #2: 0001-pipe-allow-sendfile-destination-with-splice_write.patch --] [-- Type: text/x-patch, Size: 1311 bytes --] From 95713b6e8b2247c55dd0a04174a55ea9a7fde7f6 Mon Sep 17 00:00:00 2001 From: Linus Torvalds <torvalds@linux-foundation.org> Date: Tue, 19 Jan 2021 09:26:15 -0800 Subject: [PATCH 1/4] pipe: allow sendfile() destination with splice_write Note that Al Viro is 100% right that this isn't needed for regular splicing (that treats pipes specially, since pipes _are_ the splice buffers). So the correct thing to do is to teach do_splice_direct() the same "hey, it's already a pipe", and fix sendfile() with a pipe destination that way. But this is the one-liner "make it work" thing, rather than the "do it properly" thing that Al will hopefully do. Fixes: 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops") Reported-by: Johannes Berg <johannes@sipsolutions.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> --- fs/pipe.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/pipe.c b/fs/pipe.c index c5989cfd564d..39c96845a72f 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -1206,6 +1206,7 @@ const struct file_operations pipefifo_fops = { .unlocked_ioctl = pipe_ioctl, .release = pipe_release, .fasync = pipe_fasync, + .splice_write = iter_file_splice_write, }; /* -- 2.29.2.157.g1d47791a39 [-- Attachment #3: 0002-tty-implement-write_iter.patch --] [-- Type: text/x-patch, Size: 5754 bytes --] From 0dce8c5ef15f0aa7b4525721b86a20b7c4df8ca0 Mon Sep 17 00:00:00 2001 From: Linus Torvalds <torvalds@linux-foundation.org> Date: Tue, 19 Jan 2021 11:41:16 -0800 Subject: [PATCH 2/4] tty: implement write_iter This makes the tty layer use the .write_iter() function instead of the traditional .write() functionality. That allows writev(), but more importantly also makes it possible to enable .splice_write() for ttys, reinstating the "splice to tty" functionality that was lost in commit 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops"). Fixes: 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops") Reported-by: Oliver Giles <ohw.giles@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> --- drivers/tty/tty_io.c | 48 ++++++++++++++++++++++++-------------------- 1 file changed, 26 insertions(+), 22 deletions(-) diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c index 8034489337d7..502862626b2b 100644 --- a/drivers/tty/tty_io.c +++ b/drivers/tty/tty_io.c @@ -143,9 +143,8 @@ LIST_HEAD(tty_drivers); /* linked list of tty drivers */ DEFINE_MUTEX(tty_mutex); static ssize_t tty_read(struct file *, char __user *, size_t, loff_t *); -static ssize_t tty_write(struct file *, const char __user *, size_t, loff_t *); -ssize_t redirected_tty_write(struct file *, const char __user *, - size_t, loff_t *); +static ssize_t tty_write(struct kiocb *, struct iov_iter *); +ssize_t redirected_tty_write(struct kiocb *, struct iov_iter *); static __poll_t tty_poll(struct file *, poll_table *); static int tty_open(struct inode *, struct file *); long tty_ioctl(struct file *file, unsigned int cmd, unsigned long arg); @@ -478,7 +477,8 @@ static void tty_show_fdinfo(struct seq_file *m, struct file *file) static const struct file_operations tty_fops = { .llseek = no_llseek, .read = tty_read, - .write = tty_write, + .write_iter = tty_write, + .splice_write = iter_file_splice_write, .poll = tty_poll, .unlocked_ioctl = tty_ioctl, .compat_ioctl = tty_compat_ioctl, @@ -491,7 +491,8 @@ static const struct file_operations tty_fops = { static const struct file_operations console_fops = { .llseek = no_llseek, .read = tty_read, - .write = redirected_tty_write, + .write_iter = redirected_tty_write, + .splice_write = iter_file_splice_write, .poll = tty_poll, .unlocked_ioctl = tty_ioctl, .compat_ioctl = tty_compat_ioctl, @@ -606,9 +607,9 @@ static void __tty_hangup(struct tty_struct *tty, int exit_session) /* This breaks for file handles being sent over AF_UNIX sockets ? */ list_for_each_entry(priv, &tty->tty_files, list) { filp = priv->file; - if (filp->f_op->write == redirected_tty_write) + if (filp->f_op->write_iter == redirected_tty_write) cons_filp = filp; - if (filp->f_op->write != tty_write) + if (filp->f_op->write_iter != tty_write) continue; closecount++; __tty_fasync(-1, filp, 0); /* can't block */ @@ -901,9 +902,9 @@ static inline ssize_t do_tty_write( ssize_t (*write)(struct tty_struct *, struct file *, const unsigned char *, size_t), struct tty_struct *tty, struct file *file, - const char __user *buf, - size_t count) + struct iov_iter *from) { + size_t count = iov_iter_count(from); ssize_t ret, written = 0; unsigned int chunk; @@ -955,14 +956,20 @@ static inline ssize_t do_tty_write( size_t size = count; if (size > chunk) size = chunk; + ret = -EFAULT; - if (copy_from_user(tty->write_buf, buf, size)) + if (copy_from_iter(tty->write_buf, size, from) != size) break; + ret = write(tty, file, tty->write_buf, size); if (ret <= 0) break; + + /* FIXME! Have Al check this! */ + if (ret != size) + iov_iter_revert(from, size-ret); + written += ret; - buf += ret; count -= ret; if (!count) break; @@ -1022,9 +1029,9 @@ void tty_write_message(struct tty_struct *tty, char *msg) * write method will not be invoked in parallel for each device. */ -static ssize_t tty_write(struct file *file, const char __user *buf, - size_t count, loff_t *ppos) +static ssize_t tty_write(struct kiocb *iocb, struct iov_iter *from) { + struct file *file = iocb->ki_filp; struct tty_struct *tty = file_tty(file); struct tty_ldisc *ld; ssize_t ret; @@ -1037,18 +1044,15 @@ static ssize_t tty_write(struct file *file, const char __user *buf, if (tty->ops->write_room == NULL) tty_err(tty, "missing write_room method\n"); ld = tty_ldisc_ref_wait(tty); - if (!ld) - return hung_up_tty_write(file, buf, count, ppos); - if (!ld->ops->write) + if (!ld || !ld->ops->write) ret = -EIO; else - ret = do_tty_write(ld->ops->write, tty, file, buf, count); + ret = do_tty_write(ld->ops->write, tty, file, from); tty_ldisc_deref(ld); return ret; } -ssize_t redirected_tty_write(struct file *file, const char __user *buf, - size_t count, loff_t *ppos) +ssize_t redirected_tty_write(struct kiocb *iocb, struct iov_iter *iter) { struct file *p = NULL; @@ -1059,11 +1063,11 @@ ssize_t redirected_tty_write(struct file *file, const char __user *buf, if (p) { ssize_t res; - res = vfs_write(p, buf, count, &p->f_pos); + res = vfs_iocb_iter_write(p, iocb, iter); fput(p); return res; } - return tty_write(file, buf, count, ppos); + return tty_write(iocb, iter); } /* @@ -2295,7 +2299,7 @@ static int tioccons(struct file *file) { if (!capable(CAP_SYS_ADMIN)) return -EPERM; - if (file->f_op->write == redirected_tty_write) { + if (file->f_op->write_iter == redirected_tty_write) { struct file *f; spin_lock(&redirect_lock); f = redirect; -- 2.29.2.157.g1d47791a39 [-- Attachment #4: 0003-tty-convert-tty_ldisc_ops-read-function-to-take-a-ke.patch --] [-- Type: text/x-patch, Size: 21393 bytes --] From 8b7bacf932d1090ea87fd9ad218715055d3eb66e Mon Sep 17 00:00:00 2001 From: Linus Torvalds <torvalds@linux-foundation.org> Date: Mon, 18 Jan 2021 13:31:30 -0800 Subject: [PATCH 3/4] tty: convert tty_ldisc_ops 'read()' function to take a kernel pointer The tty line discipline .read() function was passed the final user pointer destination as an argument, which doesn't match the 'write()' function, and makes it very inconvenient to do a splice method for tty's. This is a conversion to use a kernel buffer instead. NOTE! It does this by passing the tty line discipline ->read() function an additional "cookie" to fill in, and an offset into the cookie data. The line discipline can fill in the cookie data with its own private information, and then the reader will repeat the read until either the cookie is cleared or it runs out of data. The only real user of this is N_HDLC, which can use this to handle big packets, even if the kernel buffer is smaller than the whole packet. Cc: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> --- drivers/bluetooth/hci_ldisc.c | 34 +++++++-------- drivers/input/serio/serport.c | 4 +- drivers/net/ppp/ppp_async.c | 3 +- drivers/net/ppp/ppp_synctty.c | 3 +- drivers/tty/n_gsm.c | 3 +- drivers/tty/n_hdlc.c | 60 +++++++++++++++++-------- drivers/tty/n_null.c | 3 +- drivers/tty/n_r3964.c | 10 ++--- drivers/tty/n_tracerouter.c | 4 +- drivers/tty/n_tracesink.c | 4 +- drivers/tty/n_tty.c | 82 +++++++++++++++-------------------- drivers/tty/tty_io.c | 64 +++++++++++++++++++++++++-- include/linux/tty_ldisc.h | 3 +- net/nfc/nci/uart.c | 3 +- 14 files changed, 178 insertions(+), 102 deletions(-) diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c index f83d67eafc9f..dd92aea15b8b 100644 --- a/drivers/bluetooth/hci_ldisc.c +++ b/drivers/bluetooth/hci_ldisc.c @@ -802,7 +802,8 @@ static int hci_uart_tty_ioctl(struct tty_struct *tty, struct file *file, * We don't provide read/write/poll interface for user space. */ static ssize_t hci_uart_tty_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr) + unsigned char *buf, size_t nr, + void **cookie, unsigned long offset) { return 0; } @@ -819,29 +820,28 @@ static __poll_t hci_uart_tty_poll(struct tty_struct *tty, return 0; } +static struct tty_ldisc_ops hci_uart_ldisc = { + .owner = THIS_MODULE, + .magic = TTY_LDISC_MAGIC, + .name = "n_hci", + .open = hci_uart_tty_open, + .close = hci_uart_tty_close, + .read = hci_uart_tty_read, + .write = hci_uart_tty_write, + .ioctl = hci_uart_tty_ioctl, + .compat_ioctl = hci_uart_tty_ioctl, + .poll = hci_uart_tty_poll, + .receive_buf = hci_uart_tty_receive, + .write_wakeup = hci_uart_tty_wakeup, +}; + static int __init hci_uart_init(void) { - static struct tty_ldisc_ops hci_uart_ldisc; int err; BT_INFO("HCI UART driver ver %s", VERSION); /* Register the tty discipline */ - - memset(&hci_uart_ldisc, 0, sizeof(hci_uart_ldisc)); - hci_uart_ldisc.magic = TTY_LDISC_MAGIC; - hci_uart_ldisc.name = "n_hci"; - hci_uart_ldisc.open = hci_uart_tty_open; - hci_uart_ldisc.close = hci_uart_tty_close; - hci_uart_ldisc.read = hci_uart_tty_read; - hci_uart_ldisc.write = hci_uart_tty_write; - hci_uart_ldisc.ioctl = hci_uart_tty_ioctl; - hci_uart_ldisc.compat_ioctl = hci_uart_tty_ioctl; - hci_uart_ldisc.poll = hci_uart_tty_poll; - hci_uart_ldisc.receive_buf = hci_uart_tty_receive; - hci_uart_ldisc.write_wakeup = hci_uart_tty_wakeup; - hci_uart_ldisc.owner = THIS_MODULE; - err = tty_register_ldisc(N_HCI, &hci_uart_ldisc); if (err) { BT_ERR("HCI line discipline registration failed. (%d)", err); diff --git a/drivers/input/serio/serport.c b/drivers/input/serio/serport.c index 8ac970a423de..33e9d9bfd036 100644 --- a/drivers/input/serio/serport.c +++ b/drivers/input/serio/serport.c @@ -156,7 +156,9 @@ static void serport_ldisc_receive(struct tty_struct *tty, const unsigned char *c * returning 0 characters. */ -static ssize_t serport_ldisc_read(struct tty_struct * tty, struct file * file, unsigned char __user * buf, size_t nr) +static ssize_t serport_ldisc_read(struct tty_struct * tty, struct file * file, + unsigned char *kbuf, size_t nr, + void **cookie, unsigned long offset) { struct serport *serport = (struct serport*) tty->disc_data; struct serio *serio; diff --git a/drivers/net/ppp/ppp_async.c b/drivers/net/ppp/ppp_async.c index 29a0917a81e6..f14a9d190de9 100644 --- a/drivers/net/ppp/ppp_async.c +++ b/drivers/net/ppp/ppp_async.c @@ -259,7 +259,8 @@ static int ppp_asynctty_hangup(struct tty_struct *tty) */ static ssize_t ppp_asynctty_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t count) + unsigned char *buf, size_t count, + void **cookie, unsigned long offset) { return -EAGAIN; } diff --git a/drivers/net/ppp/ppp_synctty.c b/drivers/net/ppp/ppp_synctty.c index 0f338752c38b..f774b7e52da4 100644 --- a/drivers/net/ppp/ppp_synctty.c +++ b/drivers/net/ppp/ppp_synctty.c @@ -257,7 +257,8 @@ static int ppp_sync_hangup(struct tty_struct *tty) */ static ssize_t ppp_sync_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t count) + unsigned char *buf, size_t count, + void **cookie, unsigned long offset) { return -EAGAIN; } diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c index c676fa89ee0b..51dafc06f541 100644 --- a/drivers/tty/n_gsm.c +++ b/drivers/tty/n_gsm.c @@ -2559,7 +2559,8 @@ static void gsmld_write_wakeup(struct tty_struct *tty) */ static ssize_t gsmld_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr) + unsigned char *buf, size_t nr, + void **cookie, unsigned long offset) { return -EOPNOTSUPP; } diff --git a/drivers/tty/n_hdlc.c b/drivers/tty/n_hdlc.c index 12557ee1edb6..1363e659dc1d 100644 --- a/drivers/tty/n_hdlc.c +++ b/drivers/tty/n_hdlc.c @@ -416,13 +416,19 @@ static void n_hdlc_tty_receive(struct tty_struct *tty, const __u8 *data, * Returns the number of bytes returned or error code. */ static ssize_t n_hdlc_tty_read(struct tty_struct *tty, struct file *file, - __u8 __user *buf, size_t nr) + __u8 *kbuf, size_t nr, + void **cookie, unsigned long offset) { struct n_hdlc *n_hdlc = tty->disc_data; int ret = 0; struct n_hdlc_buf *rbuf; DECLARE_WAITQUEUE(wait, current); + /* Is this a repeated call for an rbuf we already found earlier? */ + rbuf = *cookie; + if (rbuf) + goto have_rbuf; + add_wait_queue(&tty->read_wait, &wait); for (;;) { @@ -436,25 +442,8 @@ static ssize_t n_hdlc_tty_read(struct tty_struct *tty, struct file *file, set_current_state(TASK_INTERRUPTIBLE); rbuf = n_hdlc_buf_get(&n_hdlc->rx_buf_list); - if (rbuf) { - if (rbuf->count > nr) { - /* too large for caller's buffer */ - ret = -EOVERFLOW; - } else { - __set_current_state(TASK_RUNNING); - if (copy_to_user(buf, rbuf->buf, rbuf->count)) - ret = -EFAULT; - else - ret = rbuf->count; - } - - if (n_hdlc->rx_free_buf_list.count > - DEFAULT_RX_BUF_COUNT) - kfree(rbuf); - else - n_hdlc_buf_put(&n_hdlc->rx_free_buf_list, rbuf); + if (rbuf) break; - } /* no data */ if (tty_io_nonblock(tty, file)) { @@ -473,6 +462,39 @@ static ssize_t n_hdlc_tty_read(struct tty_struct *tty, struct file *file, remove_wait_queue(&tty->read_wait, &wait); __set_current_state(TASK_RUNNING); + if (!rbuf) + return ret; + *cookie = rbuf; + +have_rbuf: + /* Have we used it up entirely? */ + if (offset >= rbuf->count) + goto done_with_rbuf; + + /* More data to go, but can't copy any more? EOVERFLOW */ + ret = -EOVERFLOW; + if (!nr) + goto done_with_rbuf; + + /* Copy as much data as possible */ + ret = rbuf->count - offset; + if (ret > nr) + ret = nr; + memcpy(kbuf, rbuf->buf+offset, ret); + offset += ret; + + /* If we still have data left, we leave the rbuf in the cookie */ + if (offset < rbuf->count) + return ret; + +done_with_rbuf: + *cookie = NULL; + + if (n_hdlc->rx_free_buf_list.count > DEFAULT_RX_BUF_COUNT) + kfree(rbuf); + else + n_hdlc_buf_put(&n_hdlc->rx_free_buf_list, rbuf); + return ret; } /* end of n_hdlc_tty_read() */ diff --git a/drivers/tty/n_null.c b/drivers/tty/n_null.c index 96feabae4740..ce03ae78f5c6 100644 --- a/drivers/tty/n_null.c +++ b/drivers/tty/n_null.c @@ -20,7 +20,8 @@ static void n_null_close(struct tty_struct *tty) } static ssize_t n_null_read(struct tty_struct *tty, struct file *file, - unsigned char __user * buf, size_t nr) + unsigned char *buf, size_t nr, + void **cookie, unsigned long offset) { return -EOPNOTSUPP; } diff --git a/drivers/tty/n_r3964.c b/drivers/tty/n_r3964.c index 934dd2fb2ec8..3161f0a535e3 100644 --- a/drivers/tty/n_r3964.c +++ b/drivers/tty/n_r3964.c @@ -129,7 +129,7 @@ static void remove_client_block(struct r3964_info *pInfo, static int r3964_open(struct tty_struct *tty); static void r3964_close(struct tty_struct *tty); static ssize_t r3964_read(struct tty_struct *tty, struct file *file, - unsigned char __user * buf, size_t nr); + void *cookie, unsigned char *buf, size_t nr); static ssize_t r3964_write(struct tty_struct *tty, struct file *file, const unsigned char *buf, size_t nr); static int r3964_ioctl(struct tty_struct *tty, struct file *file, @@ -1058,7 +1058,8 @@ static void r3964_close(struct tty_struct *tty) } static ssize_t r3964_read(struct tty_struct *tty, struct file *file, - unsigned char __user * buf, size_t nr) + unsigned char *kbuf, size_t nr, + void **cookie, unsigned long offset) { struct r3964_info *pInfo = tty->disc_data; struct r3964_client_info *pClient; @@ -1109,10 +1110,7 @@ static ssize_t r3964_read(struct tty_struct *tty, struct file *file, kfree(pMsg); TRACE_M("r3964_read - msg kfree %p", pMsg); - if (copy_to_user(buf, &theMsg, ret)) { - ret = -EFAULT; - goto unlock; - } + memcpy(kbuf, &theMsg, ret); TRACE_PS("read - return %d", ret); goto unlock; diff --git a/drivers/tty/n_tracerouter.c b/drivers/tty/n_tracerouter.c index 4479af4d2fa5..3490ed51b1a3 100644 --- a/drivers/tty/n_tracerouter.c +++ b/drivers/tty/n_tracerouter.c @@ -118,7 +118,9 @@ static void n_tracerouter_close(struct tty_struct *tty) * -EINVAL */ static ssize_t n_tracerouter_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr) { + unsigned char *buf, size_t nr, + void **cookie, unsigned long offset) +{ return -EINVAL; } diff --git a/drivers/tty/n_tracesink.c b/drivers/tty/n_tracesink.c index d96ba82cc356..1d9931041fd8 100644 --- a/drivers/tty/n_tracesink.c +++ b/drivers/tty/n_tracesink.c @@ -115,7 +115,9 @@ static void n_tracesink_close(struct tty_struct *tty) * -EINVAL */ static ssize_t n_tracesink_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr) { + unsigned char *buf, size_t nr, + void **cookie, unsigned long offset) +{ return -EINVAL; } diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c index 319d68c8a5df..2f2f57a53968 100644 --- a/drivers/tty/n_tty.c +++ b/drivers/tty/n_tty.c @@ -164,29 +164,24 @@ static void zero_buffer(struct tty_struct *tty, u8 *buffer, int size) memset(buffer, 0x00, size); } -static int tty_copy_to_user(struct tty_struct *tty, void __user *to, - size_t tail, size_t n) +static void tty_copy(struct tty_struct *tty, void *to, size_t tail, size_t n) { struct n_tty_data *ldata = tty->disc_data; size_t size = N_TTY_BUF_SIZE - tail; void *from = read_buf_addr(ldata, tail); - int uncopied; if (n > size) { tty_audit_add_data(tty, from, size); - uncopied = copy_to_user(to, from, size); - zero_buffer(tty, from, size - uncopied); - if (uncopied) - return uncopied; + memcpy(to, from, size); + zero_buffer(tty, from, size); to += size; n -= size; from = ldata->read_buf; } tty_audit_add_data(tty, from, n); - uncopied = copy_to_user(to, from, n); - zero_buffer(tty, from, n - uncopied); - return uncopied; + memcpy(to, from, n); + zero_buffer(tty, from, n); } /** @@ -1944,15 +1939,16 @@ static inline int input_available_p(struct tty_struct *tty, int poll) /** * copy_from_read_buf - copy read data directly * @tty: terminal device - * @b: user data + * @kbp: data * @nr: size of data * * Helper function to speed up n_tty_read. It is only called when - * ICANON is off; it copies characters straight from the tty queue to - * user space directly. It can be profitably called twice; once to - * drain the space from the tail pointer to the (physical) end of the - * buffer, and once to drain the space from the (physical) beginning of - * the buffer to head pointer. + * ICANON is off; it copies characters straight from the tty queue. + * + * It can be profitably called twice; once to drain the space from + * the tail pointer to the (physical) end of the buffer, and once + * to drain the space from the (physical) beginning of the buffer + * to head pointer. * * Called under the ldata->atomic_read_lock sem * @@ -1962,7 +1958,7 @@ static inline int input_available_p(struct tty_struct *tty, int poll) */ static int copy_from_read_buf(struct tty_struct *tty, - unsigned char __user **b, + unsigned char **kbp, size_t *nr) { @@ -1978,8 +1974,7 @@ static int copy_from_read_buf(struct tty_struct *tty, n = min(*nr, n); if (n) { unsigned char *from = read_buf_addr(ldata, tail); - retval = copy_to_user(*b, from, n); - n -= retval; + memcpy(*kbp, from, n); is_eof = n == 1 && *from == EOF_CHAR(tty); tty_audit_add_data(tty, from, n); zero_buffer(tty, from, n); @@ -1988,7 +1983,7 @@ static int copy_from_read_buf(struct tty_struct *tty, if (L_EXTPROC(tty) && ldata->icanon && is_eof && (head == ldata->read_tail)) n = 0; - *b += n; + *kbp += n; *nr -= n; } return retval; @@ -1997,12 +1992,12 @@ static int copy_from_read_buf(struct tty_struct *tty, /** * canon_copy_from_read_buf - copy read data in canonical mode * @tty: terminal device - * @b: user data + * @kbp: data * @nr: size of data * * Helper function for n_tty_read. It is only called when ICANON is on; * it copies one line of input up to and including the line-delimiting - * character into the user-space buffer. + * character into the result buffer. * * NB: When termios is changed from non-canonical to canonical mode and * the read buffer contains data, n_tty_set_termios() simulates an EOF @@ -2018,14 +2013,14 @@ static int copy_from_read_buf(struct tty_struct *tty, */ static int canon_copy_from_read_buf(struct tty_struct *tty, - unsigned char __user **b, + unsigned char **kbp, size_t *nr) { struct n_tty_data *ldata = tty->disc_data; size_t n, size, more, c; size_t eol; size_t tail; - int ret, found = 0; + int found = 0; /* N.B. avoid overrun if nr == 0 */ if (!*nr) @@ -2061,10 +2056,8 @@ static int canon_copy_from_read_buf(struct tty_struct *tty, n_tty_trace("%s: eol:%zu found:%d n:%zu c:%zu tail:%zu more:%zu\n", __func__, eol, found, n, c, tail, more); - ret = tty_copy_to_user(tty, *b, tail, n); - if (ret) - return -EFAULT; - *b += n; + tty_copy(tty, *kbp, tail, n); + *kbp += n; *nr -= n; if (found) @@ -2132,10 +2125,11 @@ static int job_control(struct tty_struct *tty, struct file *file) */ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr) + unsigned char *kbuf, size_t nr, + void **cookie, unsigned long offset) { struct n_tty_data *ldata = tty->disc_data; - unsigned char __user *b = buf; + unsigned char *kb = kbuf; DEFINE_WAIT_FUNC(wait, woken_wake_function); int c; int minimum, time; @@ -2181,17 +2175,13 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, /* First test for status change. */ if (packet && tty->link->ctrl_status) { unsigned char cs; - if (b != buf) + if (kb != kbuf) break; spin_lock_irq(&tty->link->ctrl_lock); cs = tty->link->ctrl_status; tty->link->ctrl_status = 0; spin_unlock_irq(&tty->link->ctrl_lock); - if (put_user(cs, b)) { - retval = -EFAULT; - break; - } - b++; + *kb++ = cs; nr--; break; } @@ -2234,24 +2224,20 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, } if (ldata->icanon && !L_EXTPROC(tty)) { - retval = canon_copy_from_read_buf(tty, &b, &nr); + retval = canon_copy_from_read_buf(tty, &kb, &nr); if (retval) break; } else { int uncopied; /* Deal with packet mode. */ - if (packet && b == buf) { - if (put_user(TIOCPKT_DATA, b)) { - retval = -EFAULT; - break; - } - b++; + if (packet && kb == kbuf) { + *kb++ = TIOCPKT_DATA; nr--; } - uncopied = copy_from_read_buf(tty, &b, &nr); - uncopied += copy_from_read_buf(tty, &b, &nr); + uncopied = copy_from_read_buf(tty, &kb, &nr); + uncopied += copy_from_read_buf(tty, &kb, &nr); if (uncopied) { retval = -EFAULT; break; @@ -2260,7 +2246,7 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, n_tty_check_unthrottle(tty); - if (b - buf >= minimum) + if (kb - kbuf >= minimum) break; if (time) timeout = time; @@ -2272,8 +2258,8 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, remove_wait_queue(&tty->read_wait, &wait); mutex_unlock(&ldata->atomic_read_lock); - if (b - buf) - retval = b - buf; + if (kb - kbuf) + retval = kb - kbuf; return retval; } diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c index 502862626b2b..d33e120046a6 100644 --- a/drivers/tty/tty_io.c +++ b/drivers/tty/tty_io.c @@ -832,6 +832,65 @@ static void tty_update_time(struct timespec64 *time) time->tv_sec = sec; } +/* + * Iterate on the ldisc ->read() function until we've gotten all + * the data the ldisc has for us. + * + * The "cookie" is something that the ldisc read function can fill + * in to let us know that there is more data to be had. + * + * We promise to continue to call the ldisc until it stops returning + * data or clears the cookie. The cookie may be something that the + * ldisc maintains state for and needs to free. + */ +static int iterate_tty_read(struct tty_ldisc *ld, struct tty_struct *tty, struct file *file, + char __user *buf, size_t count) +{ + int retval = 0; + void *cookie = NULL; + unsigned long offset = 0; + char kernel_buf[64]; + + do { + int size, uncopied; + + size = count > sizeof(kernel_buf) ? sizeof(kernel_buf) : count; + size = ld->ops->read(tty, file, kernel_buf, size, &cookie, offset); + if (!size) + break; + + /* + * A ldisc read error return will override any previously copied + * data (eg -EOVERFLOW from HDLC) + */ + if (size < 0) { + memzero_explicit(kernel_buf, sizeof(kernel_buf)); + return size; + } + + uncopied = copy_to_user(buf+offset, kernel_buf, size); + size -= uncopied; + offset += size; + count -= size; + + /* + * If the user copy failed, we still need to do another ->read() + * call if we had a cookie to let the ldisc clear up. + * + * But make sure size is zeroed. + */ + if (unlikely(uncopied)) { + count = 0; + retval = -EFAULT; + } + } while (cookie); + + /* We always clear tty buffer in case they contained passwords */ + memzero_explicit(kernel_buf, sizeof(kernel_buf)); + return offset ? offset : retval; +} + + /** * tty_read - read method for tty device files * @file: pointer to tty file @@ -865,10 +924,9 @@ static ssize_t tty_read(struct file *file, char __user *buf, size_t count, ld = tty_ldisc_ref_wait(tty); if (!ld) return hung_up_tty_read(file, buf, count, ppos); + i = -EIO; if (ld->ops->read) - i = ld->ops->read(tty, file, buf, count); - else - i = -EIO; + i = iterate_tty_read(ld, tty, file, buf, count); tty_ldisc_deref(ld); if (i > 0) diff --git a/include/linux/tty_ldisc.h b/include/linux/tty_ldisc.h index b1e6043e9917..572a07976116 100644 --- a/include/linux/tty_ldisc.h +++ b/include/linux/tty_ldisc.h @@ -185,7 +185,8 @@ struct tty_ldisc_ops { void (*close)(struct tty_struct *); void (*flush_buffer)(struct tty_struct *tty); ssize_t (*read)(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr); + unsigned char *buf, size_t nr, + void **cookie, unsigned long offset); ssize_t (*write)(struct tty_struct *tty, struct file *file, const unsigned char *buf, size_t nr); int (*ioctl)(struct tty_struct *tty, struct file *file, diff --git a/net/nfc/nci/uart.c b/net/nfc/nci/uart.c index 11b554ce07ff..1204c438e87d 100644 --- a/net/nfc/nci/uart.c +++ b/net/nfc/nci/uart.c @@ -292,7 +292,8 @@ static int nci_uart_tty_ioctl(struct tty_struct *tty, struct file *file, /* We don't provide read/write/poll interface for user space. */ static ssize_t nci_uart_tty_read(struct tty_struct *tty, struct file *file, - unsigned char __user *buf, size_t nr) + unsigned char *buf, size_t nr, + void **cookie, unsigned long offset) { return 0; } -- 2.29.2.157.g1d47791a39 [-- Attachment #5: 0004-tty-implement-read_iter.patch --] [-- Type: text/x-patch, Size: 5002 bytes --] From 08cb81c888e88b152f49ad2c90146b8f0c9ce6b3 Mon Sep 17 00:00:00 2001 From: Linus Torvalds <torvalds@linux-foundation.org> Date: Tue, 19 Jan 2021 10:49:19 -0800 Subject: [PATCH 4/4] tty: implement read_iter Now that the ldisc read() function takes kernel pointers, it's fairly straightforward to make the tty file operations use .read_iter() instead of .read(). That automatically gives us vread() and friends, and also makes it possible to do .splice_read() on tty's again. Fixes: 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops") Reported-by: Oliver Giles <ohw.giles@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> --- drivers/tty/tty_io.c | 36 ++++++++++++++++++------------------ 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c index d33e120046a6..b8c0b40f3298 100644 --- a/drivers/tty/tty_io.c +++ b/drivers/tty/tty_io.c @@ -142,7 +142,7 @@ LIST_HEAD(tty_drivers); /* linked list of tty drivers */ /* Mutex to protect creating and releasing a tty */ DEFINE_MUTEX(tty_mutex); -static ssize_t tty_read(struct file *, char __user *, size_t, loff_t *); +static ssize_t tty_read(struct kiocb *, struct iov_iter *); static ssize_t tty_write(struct kiocb *, struct iov_iter *); ssize_t redirected_tty_write(struct kiocb *, struct iov_iter *); static __poll_t tty_poll(struct file *, poll_table *); @@ -476,8 +476,9 @@ static void tty_show_fdinfo(struct seq_file *m, struct file *file) static const struct file_operations tty_fops = { .llseek = no_llseek, - .read = tty_read, + .read_iter = tty_read, .write_iter = tty_write, + .splice_read = generic_file_splice_read, .splice_write = iter_file_splice_write, .poll = tty_poll, .unlocked_ioctl = tty_ioctl, @@ -490,8 +491,9 @@ static const struct file_operations tty_fops = { static const struct file_operations console_fops = { .llseek = no_llseek, - .read = tty_read, + .read_iter = tty_read, .write_iter = redirected_tty_write, + .splice_read = generic_file_splice_read, .splice_write = iter_file_splice_write, .poll = tty_poll, .unlocked_ioctl = tty_ioctl, @@ -843,16 +845,17 @@ static void tty_update_time(struct timespec64 *time) * data or clears the cookie. The cookie may be something that the * ldisc maintains state for and needs to free. */ -static int iterate_tty_read(struct tty_ldisc *ld, struct tty_struct *tty, struct file *file, - char __user *buf, size_t count) +static int iterate_tty_read(struct tty_ldisc *ld, struct tty_struct *tty, + struct file *file, struct iov_iter *to) { int retval = 0; void *cookie = NULL; unsigned long offset = 0; char kernel_buf[64]; + size_t count = iov_iter_count(to); do { - int size, uncopied; + int size, copied; size = count > sizeof(kernel_buf) ? sizeof(kernel_buf) : count; size = ld->ops->read(tty, file, kernel_buf, size, &cookie, offset); @@ -868,10 +871,9 @@ static int iterate_tty_read(struct tty_ldisc *ld, struct tty_struct *tty, struct return size; } - uncopied = copy_to_user(buf+offset, kernel_buf, size); - size -= uncopied; - offset += size; - count -= size; + copied = copy_to_iter(kernel_buf, size, to); + offset += copied; + count -= copied; /* * If the user copy failed, we still need to do another ->read() @@ -879,7 +881,7 @@ static int iterate_tty_read(struct tty_ldisc *ld, struct tty_struct *tty, struct * * But make sure size is zeroed. */ - if (unlikely(uncopied)) { + if (unlikely(copied != size)) { count = 0; retval = -EFAULT; } @@ -906,10 +908,10 @@ static int iterate_tty_read(struct tty_ldisc *ld, struct tty_struct *tty, struct * read calls may be outstanding in parallel. */ -static ssize_t tty_read(struct file *file, char __user *buf, size_t count, - loff_t *ppos) +static ssize_t tty_read(struct kiocb *iocb, struct iov_iter *to) { int i; + struct file *file = iocb->ki_filp; struct inode *inode = file_inode(file); struct tty_struct *tty = file_tty(file); struct tty_ldisc *ld; @@ -922,11 +924,9 @@ static ssize_t tty_read(struct file *file, char __user *buf, size_t count, /* We want to wait for the line discipline to sort out in this situation */ ld = tty_ldisc_ref_wait(tty); - if (!ld) - return hung_up_tty_read(file, buf, count, ppos); i = -EIO; - if (ld->ops->read) - i = iterate_tty_read(ld, tty, file, buf, count); + if (ld && ld->ops->read) + i = iterate_tty_read(ld, tty, file, to); tty_ldisc_deref(ld); if (i > 0) @@ -2929,7 +2929,7 @@ static long tty_compat_ioctl(struct file *file, unsigned int cmd, static int this_tty(const void *t, struct file *file, unsigned fd) { - if (likely(file->f_op->read != tty_read)) + if (likely(file->f_op->read_iter != tty_read)) return 0; return file_tty(file) != t ? 0 : fd + 1; } -- 2.29.2.157.g1d47791a39 ^ permalink raw reply related [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-19 20:24 ` Linus Torvalds @ 2021-01-19 20:38 ` Christoph Hellwig 2021-01-20 1:25 ` Oliver Giles 1 sibling, 0 replies; 65+ messages in thread From: Christoph Hellwig @ 2021-01-19 20:38 UTC (permalink / raw) To: Linus Torvalds Cc: Greg Kroah-Hartman, Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List, Al Viro, Jiri Slaby On Tue, Jan 19, 2021 at 12:24:22PM -0800, Linus Torvalds wrote: > But that sysfs binary file case really isn't interesting, and just > more of a "Christoph broke it, I think he should just fix it". > Christoph? I'll give it a spin tomorrow. Should be simple enough. ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-19 20:24 ` Linus Torvalds 2021-01-19 20:38 ` Christoph Hellwig @ 2021-01-20 1:25 ` Oliver Giles 2021-01-20 4:44 ` Linus Torvalds 1 sibling, 1 reply; 65+ messages in thread From: Oliver Giles @ 2021-01-20 1:25 UTC (permalink / raw) To: Linus Torvalds, Greg Kroah-Hartman Cc: Christoph Hellwig, Linux Kernel Mailing List, Al Viro, Jiri Slaby On Wed Jan 20, 2021 at 9:24 AM NZDT, Linus Torvalds wrote: > Anyway, anybody willing to test these tty/pipe patches on the loads > that failed? Oliver? Writing this from a kernel with those patches in; happily splice()ing to and from a pty. ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-20 1:25 ` Oliver Giles @ 2021-01-20 4:44 ` Linus Torvalds 2021-01-20 8:15 ` Oliver Giles ` (2 more replies) 0 siblings, 3 replies; 65+ messages in thread From: Linus Torvalds @ 2021-01-20 4:44 UTC (permalink / raw) To: Oliver Giles, Robert Karszniewicz Cc: Greg Kroah-Hartman, Christoph Hellwig, Linux Kernel Mailing List, Al Viro, Jiri Slaby [-- Attachment #1: Type: text/plain, Size: 1687 bytes --] On Tue, Jan 19, 2021 at 5:29 PM Oliver Giles <ohw.giles@gmail.com> wrote: > > Writing this from a kernel with those patches in; happily splice()ing > to and from a pty. Ok, good. I have a couple of improvement patches on top of those, that I'm attaching here. The first four patches worked fine for me (and apparently for you too), but due to the small buffer, the regular N_TTY case for tty read() calls are now limited to 64 bytes each. That is unfortunate for performance, but it might also cause some actual breakage: anybody doing "read()" calls on a tty file descriptor _should_ be perfectly fine with short reads since they happen for signals etc, but I could well imagine that not everybody is. And that new kernel buffer interface was _designed_ to allow stitching small buffers together efficiently (since the hdlc case needed it), so this implements that for the non-icanon case for n_tty too. I wasted an embarrasing amount of time today on that final patch - I spent something like 6 hours chasing a truly stupid one-liner bug I had initially, and couldn't see what was wrong. Which is why this only does the non-icanon case, because after I finally had my "Duh!" moment, I was so annoyed with it that I had to just walk away. I'll come back to this tomorrow and do the line-buffered icanon case too (unless pull requests pile up), and then I'll be happy with the tty changes, and I think I can submit this series for real to Greg. But in the meantime, here's two more patches to try on top of the previous four. They shouldn't matter, other than making the non-icanon throughput a lot better. But the more coverage they get, the happier I'll be. Linus [-- Attachment #2: 0005-tty-clean-up-legacy-leftovers-from-n_tty-line-discip.patch --] [-- Type: text/x-patch, Size: 2202 bytes --] From b12a1652c91648e96ae11946f7489515dd063789 Mon Sep 17 00:00:00 2001 From: Linus Torvalds <torvalds@linux-foundation.org> Date: Tue, 19 Jan 2021 13:46:28 -0800 Subject: [PATCH 5/6] tty: clean up legacy leftovers from n_tty line discipline Back when the line disciplines did their own direct user accesses, they had to deal with the data copy possibly failing in the middle. Now that the user copy is done by the tty_io.c code, that failure case no longer exists. Remove the left-over error handling code that cannot trigger. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> --- drivers/tty/n_tty.c | 16 ++++------------ 1 file changed, 4 insertions(+), 12 deletions(-) diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c index 2f2f57a53968..a02fe661f617 100644 --- a/drivers/tty/n_tty.c +++ b/drivers/tty/n_tty.c @@ -1957,19 +1957,17 @@ static inline int input_available_p(struct tty_struct *tty, int poll) * read_tail published */ -static int copy_from_read_buf(struct tty_struct *tty, +static void copy_from_read_buf(struct tty_struct *tty, unsigned char **kbp, size_t *nr) { struct n_tty_data *ldata = tty->disc_data; - int retval; size_t n; bool is_eof; size_t head = smp_load_acquire(&ldata->commit_head); size_t tail = ldata->read_tail & (N_TTY_BUF_SIZE - 1); - retval = 0; n = min(head - ldata->read_tail, N_TTY_BUF_SIZE - tail); n = min(*nr, n); if (n) { @@ -1986,7 +1984,6 @@ static int copy_from_read_buf(struct tty_struct *tty, *kbp += n; *nr -= n; } - return retval; } /** @@ -2228,20 +2225,15 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, if (retval) break; } else { - int uncopied; - /* Deal with packet mode. */ if (packet && kb == kbuf) { *kb++ = TIOCPKT_DATA; nr--; } - uncopied = copy_from_read_buf(tty, &kb, &nr); - uncopied += copy_from_read_buf(tty, &kb, &nr); - if (uncopied) { - retval = -EFAULT; - break; - } + /* See comment above copy_from_read_buf() why twice */ + copy_from_read_buf(tty, &kb, &nr); + copy_from_read_buf(tty, &kb, &nr); } n_tty_check_unthrottle(tty); -- 2.29.2.157.g1d47791a39 [-- Attachment #3: 0006-tty-teach-n_tty-line-discipline-about-the-new-cookie.patch --] [-- Type: text/x-patch, Size: 4889 bytes --] From e724cc9c4b101a4de1a56bcca6b5ec1d3493b173 Mon Sep 17 00:00:00 2001 From: Linus Torvalds <torvalds@linux-foundation.org> Date: Tue, 19 Jan 2021 18:14:20 -0800 Subject: [PATCH 6/6] tty: teach n_tty line discipline about the new "cookie continuations" With the conversion to do the tty ldisc read operations in small chunks, the n_tty line discipline became noticeably slower for throughput oriented loads, because rather than read things in up to 2kB chunks, it would return at most 64 bytes per read() system call. The cost is mainly all in the "do system calls over and over", not really in the new "copy to an extra kernel buffer". This can be fixed by teaching the n_tty line discipline about the "cookie continuation" model, which the chunking code supports because things like hdlc need to be able to handle packets up to 64kB in size. Doing that doesn't just get us back to the old performace, but to much better performance: my stupid "copy 10MB of data over a pty" test program is now almost twice as fast as it used to be (going down from 0.1s to 0.054s). This is entirely because it now creates maximal chunks (which happens to be "one byte less than one page" due to how we do the circular tty buffers). NOTE! This case only handles the simpler non-icanon case, which is the one where people may care about throughput. I'm going to do the icanon case later too, because while performance isn't a major issue for that, there may be programs that think they'll always get a full line and don't like the 64-byte chunking for that reason. Such programs are arguably buggy (signals etc can cause random partial results from tty reads anyway), and good programs will handle such partial reads, but expecting everybody to write "good programs" has never been a winning policy for the kernel.. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> --- drivers/tty/n_tty.c | 52 ++++++++++++++++++++++++++++++++++++--------- 1 file changed, 42 insertions(+), 10 deletions(-) diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c index a02fe661f617..37bfd695011d 100644 --- a/drivers/tty/n_tty.c +++ b/drivers/tty/n_tty.c @@ -1945,19 +1945,17 @@ static inline int input_available_p(struct tty_struct *tty, int poll) * Helper function to speed up n_tty_read. It is only called when * ICANON is off; it copies characters straight from the tty queue. * - * It can be profitably called twice; once to drain the space from - * the tail pointer to the (physical) end of the buffer, and once - * to drain the space from the (physical) beginning of the buffer - * to head pointer. - * * Called under the ldata->atomic_read_lock sem * + * Returns true if it successfully copied data, but there is still + * more data to be had. + * * n_tty_read()/consumer path: * caller holds non-exclusive termios_rwsem * read_tail published */ -static void copy_from_read_buf(struct tty_struct *tty, +static bool copy_from_read_buf(struct tty_struct *tty, unsigned char **kbp, size_t *nr) @@ -1980,10 +1978,14 @@ static void copy_from_read_buf(struct tty_struct *tty, /* Turn single EOF into zero-length read */ if (L_EXTPROC(tty) && ldata->icanon && is_eof && (head == ldata->read_tail)) - n = 0; + return false; *kbp += n; *nr -= n; + + /* If we have more to copy, let the caller know */ + return head != ldata->read_tail; } + return false; } /** @@ -2135,6 +2137,25 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, int packet; size_t tail; + /* + * Is this a continuation of a read started earler? + * + * If so, we still hold the atomic_read_lock and the + * termios_rwsem, and can just continue to copy data. + */ + if (*cookie) { + if (copy_from_read_buf(tty, &kb, &nr)) + return kb - kbuf; + + /* No more data - release locks and stop retries */ + n_tty_kick_worker(tty); + n_tty_check_unthrottle(tty); + up_read(&tty->termios_rwsem); + mutex_unlock(&ldata->atomic_read_lock); + *cookie = NULL; + return kb - kbuf; + } + c = job_control(tty, file); if (c < 0) return c; @@ -2231,9 +2252,20 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file, nr--; } - /* See comment above copy_from_read_buf() why twice */ - copy_from_read_buf(tty, &kb, &nr); - copy_from_read_buf(tty, &kb, &nr); + /* + * Copy data, and if there is more to be had + * and we have nothing more to wait for, then + * let's mark us for retries. + * + * NOTE! We return here with both the termios_sem + * and atomic_read_lock still held, the retries + * will release them when done. + */ + if (copy_from_read_buf(tty, &kb, &nr) && kb - kbuf >= minimum) { + remove_wait_queue(&tty->read_wait, &wait); + *cookie = cookie; + return kb - kbuf; + } } n_tty_check_unthrottle(tty); -- 2.29.2.157.g1d47791a39 ^ permalink raw reply related [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-20 4:44 ` Linus Torvalds @ 2021-01-20 8:15 ` Oliver Giles 2021-01-21 1:18 ` tty splice branch (was "Re: Splicing to/from a tty") Linus Torvalds 2021-01-21 17:03 ` Splicing to/from a tty Robert Karszniewicz 2 siblings, 0 replies; 65+ messages in thread From: Oliver Giles @ 2021-01-20 8:15 UTC (permalink / raw) To: Linus Torvalds, Robert Karszniewicz Cc: Greg Kroah-Hartman, Christoph Hellwig, Linux Kernel Mailing List, Al Viro, Jiri Slaby On Wed Jan 20, 2021 at 5:44 PM NZDT, Linus Torvalds wrote: > > But in the meantime, here's two more patches to try on top of the > previous four. They shouldn't matter, other than making the non-icanon > throughput a lot better. But the more coverage they get, the happier > I'll be. > I tried these out too, my use case is still working well. ^ permalink raw reply [flat|nested] 65+ messages in thread
* tty splice branch (was "Re: Splicing to/from a tty") 2021-01-20 4:44 ` Linus Torvalds 2021-01-20 8:15 ` Oliver Giles @ 2021-01-21 1:18 ` Linus Torvalds 2021-01-21 8:44 ` Greg Kroah-Hartman 2021-01-21 8:50 ` Jiri Slaby 2021-01-21 17:03 ` Splicing to/from a tty Robert Karszniewicz 2 siblings, 2 replies; 65+ messages in thread From: Linus Torvalds @ 2021-01-21 1:18 UTC (permalink / raw) To: Oliver Giles, Robert Karszniewicz, Greg Kroah-Hartman Cc: Christoph Hellwig, Linux Kernel Mailing List, Al Viro, Jiri Slaby On Tue, Jan 19, 2021 at 8:44 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > I'll come back to this tomorrow and do the line-buffered icanon case > too (unless pull requests pile up), and then I'll be happy with the > tty changes, and I think I can submit this series for real to Greg. Greg, I don't know how you want to handle this. I have a branch with my tty splice patches at git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git tty-splice and that now includes doing that "cookie continuation" thing even for the N_TTY icanon modes. It passes my local tests, and I did try a few rather odd things. And Oliver tested an ealier version without that final commit on his load. But... That tty splice thing is clearly a regression, but it's not like we have seen a lot of reports of it, so it's clearly a very special thing. End result: I'm leaving it to you to decide how you want to handle it. You can tell me to just merge it myself as a regression fix, despite it being fairly late in the 5.11 series. Or you can pull it into your tty tree for linux-next and 5.12. And we can just plan to backport it (for 5.10 and 5.11) later when it has had more wide testing. Another alternative is to do just that first patch immediately (the "tty: implement write_iter" one), because that one should be the simple case that gets sendfile() and splice() working when the _destination_ is a tty. The "source is a tty" is the much more complex case that the other patches deal with. Linus ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: tty splice branch (was "Re: Splicing to/from a tty") 2021-01-21 1:18 ` tty splice branch (was "Re: Splicing to/from a tty") Linus Torvalds @ 2021-01-21 8:44 ` Greg Kroah-Hartman 2021-01-21 8:50 ` Jiri Slaby 1 sibling, 0 replies; 65+ messages in thread From: Greg Kroah-Hartman @ 2021-01-21 8:44 UTC (permalink / raw) To: Linus Torvalds Cc: Oliver Giles, Robert Karszniewicz, Christoph Hellwig, Linux Kernel Mailing List, Al Viro, Jiri Slaby On Wed, Jan 20, 2021 at 05:18:36PM -0800, Linus Torvalds wrote: > On Tue, Jan 19, 2021 at 8:44 PM Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > I'll come back to this tomorrow and do the line-buffered icanon case > > too (unless pull requests pile up), and then I'll be happy with the > > tty changes, and I think I can submit this series for real to Greg. > > Greg, I don't know how you want to handle this. > > I have a branch with my tty splice patches at > > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git tty-splice > > and that now includes doing that "cookie continuation" thing even for > the N_TTY icanon modes. > > It passes my local tests, and I did try a few rather odd things. And > Oliver tested an ealier version without that final commit on his load. > But... > > That tty splice thing is clearly a regression, but it's not like we > have seen a lot of reports of it, so it's clearly a very special > thing. > > End result: I'm leaving it to you to decide how you want to handle it. > You can tell me to just merge it myself as a regression fix, despite > it being fairly late in the 5.11 series. Or you can pull it into your > tty tree for linux-next and 5.12. And we can just plan to backport it > (for 5.10 and 5.11) later when it has had more wide testing. > > Another alternative is to do just that first patch immediately (the > "tty: implement write_iter" one), because that one should be the > simple case that gets sendfile() and splice() working when the > _destination_ is a tty. The "source is a tty" is the much more complex > case that the other patches deal with. Let me do this last thing. I've taken your one patch into my "tty-linus" branch and will go beat on it for a day and then ask you to pull it in for the next 5.11-rc release, and I've taken your full series into my "tty-next" branch so it will get much wider testing in linux-next for a few weeks. If it turns out that we get reports of the "splice/sendfile from a tty", we can always merge them into 5.11 and 5.10 as needed. Thanks for doing this work, greg k-h ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: tty splice branch (was "Re: Splicing to/from a tty") 2021-01-21 1:18 ` tty splice branch (was "Re: Splicing to/from a tty") Linus Torvalds 2021-01-21 8:44 ` Greg Kroah-Hartman @ 2021-01-21 8:50 ` Jiri Slaby 2021-01-21 8:58 ` Jiri Slaby 2021-01-21 8:58 ` Greg Kroah-Hartman 1 sibling, 2 replies; 65+ messages in thread From: Jiri Slaby @ 2021-01-21 8:50 UTC (permalink / raw) To: Linus Torvalds, Oliver Giles, Robert Karszniewicz, Greg Kroah-Hartman Cc: Christoph Hellwig, Linux Kernel Mailing List, Al Viro, Jiri Slaby On 21. 01. 21, 2:18, Linus Torvalds wrote: > On Tue, Jan 19, 2021 at 8:44 PM Linus Torvalds > <torvalds@linux-foundation.org> wrote: >> >> I'll come back to this tomorrow and do the line-buffered icanon case >> too (unless pull requests pile up), and then I'll be happy with the >> tty changes, and I think I can submit this series for real to Greg. > > Greg, I don't know how you want to handle this. > > I have a branch with my tty splice patches at > > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git tty-splice > > and that now includes doing that "cookie continuation" thing even for > the N_TTY icanon modes. > > It passes my local tests, and I did try a few rather odd things. And > Oliver tested an ealier version without that final commit on his load. > But... Hm, I would like to review this first. I noticed the changes only because a new branch appeared when I grabbed your tree and the branch has "tty" in its name. So for example: > @@ -1038,18 +1045,15 @@ static ssize_t tty_write(struct file *file, const char __user *buf, > if (tty->ops->write_room == NULL) > tty_err(tty, "missing write_room method\n"); > ld = tty_ldisc_ref_wait(tty); > - if (!ld) > - return hung_up_tty_write(file, buf, count, ppos); > - if (!ld->ops->write) > + if (!ld || !ld->ops->write) > ret = -EIO; > else > - ret = do_tty_write(ld->ops->write, tty, file, buf, count); > + ret = do_tty_write(ld->ops->write, tty, file, from); > tty_ldisc_deref(ld); if ld == NULL => crash here. So can you send the patches to the list and Cc me too? > return ret; > } thanks, -- js ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: tty splice branch (was "Re: Splicing to/from a tty") 2021-01-21 8:50 ` Jiri Slaby @ 2021-01-21 8:58 ` Jiri Slaby 2021-01-21 17:52 ` Linus Torvalds 2021-01-21 8:58 ` Greg Kroah-Hartman 1 sibling, 1 reply; 65+ messages in thread From: Jiri Slaby @ 2021-01-21 8:58 UTC (permalink / raw) To: Linus Torvalds, Oliver Giles, Robert Karszniewicz, Greg Kroah-Hartman Cc: Christoph Hellwig, Linux Kernel Mailing List, Al Viro On 21. 01. 21, 9:50, Jiri Slaby wrote: > Hm, I would like to review this first. I noticed the changes only > because a new branch appeared when I grabbed your tree and the branch > has "tty" in its name. Which is weird as you Cced me. Let me check what is wrong with my e-mail setup. thanks, -- js ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: tty splice branch (was "Re: Splicing to/from a tty") 2021-01-21 8:58 ` Jiri Slaby @ 2021-01-21 17:52 ` Linus Torvalds 0 siblings, 0 replies; 65+ messages in thread From: Linus Torvalds @ 2021-01-21 17:52 UTC (permalink / raw) To: Jiri Slaby Cc: Oliver Giles, Robert Karszniewicz, Greg Kroah-Hartman, Christoph Hellwig, Linux Kernel Mailing List, Al Viro On Thu, Jan 21, 2021 at 12:58 AM Jiri Slaby <jirislaby@kernel.org> wrote: > > Which is weird as you Cced me. Let me check what is wrong with my e-mail > setup. Yeah, I think you were cc'd on not just the patches, but on all the discussion that preceded them. But better late than never. I was actually much more nervous about the much more subtle "cookie continuation" stuff, not the trivial conversion patches. Which just goes to show that most bugs are when you don't really think about it because it's "trivial". Linus ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: tty splice branch (was "Re: Splicing to/from a tty") 2021-01-21 8:50 ` Jiri Slaby 2021-01-21 8:58 ` Jiri Slaby @ 2021-01-21 8:58 ` Greg Kroah-Hartman 1 sibling, 0 replies; 65+ messages in thread From: Greg Kroah-Hartman @ 2021-01-21 8:58 UTC (permalink / raw) To: Jiri Slaby Cc: Linus Torvalds, Oliver Giles, Robert Karszniewicz, Christoph Hellwig, Linux Kernel Mailing List, Al Viro On Thu, Jan 21, 2021 at 09:50:39AM +0100, Jiri Slaby wrote: > On 21. 01. 21, 2:18, Linus Torvalds wrote: > > On Tue, Jan 19, 2021 at 8:44 PM Linus Torvalds > > <torvalds@linux-foundation.org> wrote: > > > > > > I'll come back to this tomorrow and do the line-buffered icanon case > > > too (unless pull requests pile up), and then I'll be happy with the > > > tty changes, and I think I can submit this series for real to Greg. > > > > Greg, I don't know how you want to handle this. > > > > I have a branch with my tty splice patches at > > > > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git tty-splice > > > > and that now includes doing that "cookie continuation" thing even for > > the N_TTY icanon modes. > > > > It passes my local tests, and I did try a few rather odd things. And > > Oliver tested an ealier version without that final commit on his load. > > But... > > Hm, I would like to review this first. I noticed the changes only because a > new branch appeared when I grabbed your tree and the branch has "tty" in its > name. > > So for example: > > > @@ -1038,18 +1045,15 @@ static ssize_t tty_write(struct file *file, const char __user *buf, > > if (tty->ops->write_room == NULL) > > tty_err(tty, "missing write_room method\n"); > > ld = tty_ldisc_ref_wait(tty); > > - if (!ld) > > - return hung_up_tty_write(file, buf, count, ppos); > > - if (!ld->ops->write) > > + if (!ld || !ld->ops->write) > > ret = -EIO; > > else > > - ret = do_tty_write(ld->ops->write, tty, file, buf, count); > > + ret = do_tty_write(ld->ops->write, tty, file, from); > > tty_ldisc_deref(ld); > > if ld == NULL => crash here. > > So can you send the patches to the list and Cc me too? I'll send them out right now, I've already merged them to my branches. greg k-h ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-20 4:44 ` Linus Torvalds 2021-01-20 8:15 ` Oliver Giles 2021-01-21 1:18 ` tty splice branch (was "Re: Splicing to/from a tty") Linus Torvalds @ 2021-01-21 17:03 ` Robert Karszniewicz 2021-01-21 18:43 ` Linus Torvalds 2 siblings, 1 reply; 65+ messages in thread From: Robert Karszniewicz @ 2021-01-21 17:03 UTC (permalink / raw) To: Linus Torvalds, Oliver Giles Cc: Greg Kroah-Hartman, Christoph Hellwig, Linux Kernel Mailing List, Al Viro, Jiri Slaby On 1/20/21 5:44 AM, Linus Torvalds wrote: > On Tue, Jan 19, 2021 at 5:29 PM Oliver Giles <ohw.giles@gmail.com> wrote: >> >> Writing this from a kernel with those patches in; happily splice()ing >> to and from a pty. > > Ok, good. > > I have a couple of improvement patches on top of those, that I'm attaching here. > > [...] > > But in the meantime, here's two more patches to try on top of the > previous four. They shouldn't matter, other than making the non-icanon > throughput a lot better. But the more coverage they get, the happier > I'll be. I confirm that the 4 patches, as well as the 4+2 patches, fix the regression I noticed with cat failing on sendfile() to ttymxc0. Thanks, Robert ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-21 17:03 ` Splicing to/from a tty Robert Karszniewicz @ 2021-01-21 18:43 ` Linus Torvalds 0 siblings, 0 replies; 65+ messages in thread From: Linus Torvalds @ 2021-01-21 18:43 UTC (permalink / raw) To: Robert Karszniewicz Cc: Oliver Giles, Greg Kroah-Hartman, Christoph Hellwig, Linux Kernel Mailing List, Al Viro, Jiri Slaby On Thu, Jan 21, 2021 at 9:03 AM Robert Karszniewicz <r.karszniewicz@phytec.de> wrote: > > I confirm that the 4 patches, as well as the 4+2 patches, fix the regression I > noticed with cat failing on sendfile() to ttymxc0. Thanks. Now we just need to find somebody with HDLC use cases and we'll actually have this series properly tested... Linus ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Splicing to/from a tty 2021-01-18 21:35 ` Linus Torvalds 2021-01-18 21:54 ` Linus Torvalds @ 2021-01-19 11:52 ` Greg Kroah-Hartman 1 sibling, 0 replies; 65+ messages in thread From: Greg Kroah-Hartman @ 2021-01-19 11:52 UTC (permalink / raw) To: Linus Torvalds Cc: Christoph Hellwig, Oliver Giles, Linux Kernel Mailing List, Al Viro On Mon, Jan 18, 2021 at 01:35:15PM -0800, Linus Torvalds wrote: > On Mon, Jan 18, 2021 at 12:24 PM Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > Anybody want to play with this? I'd suggest keeping that "dummy" > > parameter around for a while - I did an allmodconfig build with this, > > but if there are any architecture-specific non-x86-64 cases I wouldn't > > have seen them. > > Not architecture-specific, but I did find by some grepping that I > missed one line discipline driver: the Siemens R3964. > > The reason I missed that is because it's been marked BROKEN in the > Kconfig for almost two years, so it didn't show up in my allmodconfig > coverage. I need to just delete that thing now, obviously no one uses it anymore, sorry for it getting in the way... greg k-h ^ permalink raw reply [flat|nested] 65+ messages in thread
end of thread, other threads:[~2021-01-27 5:56 UTC | newest] Thread overview: 65+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-01-16 7:35 Splicing to/from a tty Oliver Giles 2021-01-16 16:46 ` Johannes Berg 2021-01-17 6:12 ` Oliver Giles 2021-01-18 8:53 ` Christoph Hellwig 2021-01-18 8:58 ` Johannes Berg 2021-01-18 19:26 ` Linus Torvalds 2021-01-18 19:45 ` Al Viro 2021-01-18 19:49 ` Linus Torvalds 2021-01-18 19:56 ` Al Viro 2021-01-24 19:11 ` Linus Torvalds 2021-01-25 9:16 ` [PATCH] fs/pipe: allow sendfile() to pipe again Johannes Berg 2021-01-25 10:16 ` Christoph Hellwig 2021-01-25 20:34 ` Linus Torvalds 2021-01-26 6:07 ` Splicing to/from a tty Al Viro 2021-01-26 6:08 ` [PATCH 1/3] do_splice_to(): move the logics for limiting the read length in Al Viro 2021-01-26 6:09 ` [PATCH 2/3] take the guts of file-to-pipe splice into a helper function Al Viro 2021-01-26 6:09 ` [PATCH 3/3] teach sendfile(2) to handle send-to-pipe directly Al Viro 2021-01-26 18:57 ` Linus Torvalds 2021-01-26 19:33 ` Al Viro 2021-01-26 18:49 ` Splicing to/from a tty Linus Torvalds 2021-01-26 19:42 ` Al Viro 2021-01-18 19:34 ` Al Viro 2021-01-18 19:46 ` Linus Torvalds 2021-01-18 19:54 ` Al Viro 2021-01-20 16:26 ` Al Viro 2021-01-20 19:11 ` Al Viro 2021-01-20 19:27 ` Linus Torvalds 2021-01-20 22:25 ` David Laight 2021-01-20 23:02 ` Al Viro 2021-01-20 23:14 ` Al Viro 2021-01-20 23:40 ` Linus Torvalds 2021-01-21 0:38 ` Al Viro 2021-01-21 1:04 ` Linus Torvalds 2021-01-21 1:45 ` Al Viro 2021-01-21 3:38 ` Linus Torvalds 2021-01-21 6:05 ` Willy Tarreau 2021-01-21 8:04 ` Johannes Berg 2021-01-21 10:08 ` David Laight 2021-01-18 8:16 ` Christoph Hellwig 2021-01-18 19:36 ` Linus Torvalds 2021-01-18 20:24 ` Linus Torvalds 2021-01-18 21:35 ` Linus Torvalds 2021-01-18 21:54 ` Linus Torvalds 2021-01-18 22:03 ` Linus Torvalds 2021-01-18 22:20 ` Linus Torvalds 2021-01-19 1:38 ` Linus Torvalds 2021-01-19 11:53 ` Greg Kroah-Hartman 2021-01-19 16:56 ` Robert Karszniewicz 2021-01-19 17:10 ` Robert Karszniewicz 2021-01-19 22:09 ` Oliver Giles 2021-01-19 17:25 ` Linus Torvalds 2021-01-19 20:24 ` Linus Torvalds 2021-01-19 20:38 ` Christoph Hellwig 2021-01-20 1:25 ` Oliver Giles 2021-01-20 4:44 ` Linus Torvalds 2021-01-20 8:15 ` Oliver Giles 2021-01-21 1:18 ` tty splice branch (was "Re: Splicing to/from a tty") Linus Torvalds 2021-01-21 8:44 ` Greg Kroah-Hartman 2021-01-21 8:50 ` Jiri Slaby 2021-01-21 8:58 ` Jiri Slaby 2021-01-21 17:52 ` Linus Torvalds 2021-01-21 8:58 ` Greg Kroah-Hartman 2021-01-21 17:03 ` Splicing to/from a tty Robert Karszniewicz 2021-01-21 18:43 ` Linus Torvalds 2021-01-19 11:52 ` Greg Kroah-Hartman
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.