All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/11] Avoid unprivileged splice(file->)/(->socket) pipe exclusion
@ 2023-12-21  3:08 Ahelenia Ziemiańska
  2023-12-21  3:08 ` [PATCH v2 01/11] splice: copy_splice_read: do the I/O with IOCB_NOWAIT Ahelenia Ziemiańska
                   ` (13 more replies)
  0 siblings, 14 replies; 25+ messages in thread
From: Ahelenia Ziemiańska @ 2023-12-21  3:08 UTC (permalink / raw)
  Cc: Jens Axboe, Christian Brauner, Alexander Viro, linux-fsdevel,
	Greg Kroah-Hartman, Jiri Slaby, Miklos Szeredi, Vivek Goyal,
	Stefan Hajnoczi, Eric Dumazet, David S. Miller, David Ahern,
	Jakub Kicinski, Paolo Abeni, Wenjia Zhang, Jan Karcher, D. Wythe,
	Tony Lu, Wen Gu, Boris Pismenny, John Fastabend, David Howells,
	Shigeru Yoshida, Peilin Ye, Kuniyuki Iwashima,
	Alexander Mikhalitsyn, Daan De Meyer, linux-kernel, linux-serial,
	virtualization, netdev, linux-s390, Alejandro Colomar, linux-man

[-- Attachment #1: Type: text/plain, Size: 4391 bytes --]

Hi!

As it stands, splice(file -> pipe):
1. locks the pipe,
2. does a read from the file,
3. unlocks the pipe.

When the file resides on a normal filesystem, this isn't an issue
because the filesystem has been defined as trusted by root having
mounted it.

But when the file is actually IPC (FUSE) or is just IPC (sockets)
or is a tty, this means that the pipe lock will be held for an
attacker-controlled length of time, and during that time every
process trying to read from, write to, open, or close the pipe
enters an uninterruptible sleep, and will only exit it if the
splicing process is killed.

This trivially denies service to:
* any hypothetical pipe-based log collexion system
* all nullmailer installations
* me, personally, when I'm pasting stuff into qemu -serial chardev:pipe

A symmetric situation happens for splicing(pipe -> socket):
the pipe is locked for as long as the socket is full.

This follows:
1. https://lore.kernel.org/linux-fsdevel/qk6hjuam54khlaikf2ssom6custxf5is2ekkaequf4hvode3ls@zgf7j5j4ubvw/t/#u
2. a security@ thread rooted in
   <irrrblivicfc7o3lfq7yjm2lrxq35iyya4gyozlohw24gdzyg7@azmluufpdfvu>
3. https://nabijaczleweli.xyz/content/blogn_t/011-linux-splice-exclusion.html
4. https://lore.kernel.org/lkml/cover.1697486714.git.nabijaczleweli@nabijaczleweli.xyz/t/#u  (v1)
   https://lore.kernel.org/lkml/1cover.1697486714.git.nabijaczleweli@nabijaczleweli.xyz/t/#u (resend)
   https://lore.kernel.org/lkml/2cover.1697486714.git.nabijaczleweli@nabijaczleweli.xyz/t/#u (reresend)
5. https://lore.kernel.org/lkml/dtexwpw6zcdx7dkx3xj5gyjp5syxmyretdcbcdtvrnukd4vvuh@tarta.nabijaczleweli.xyz/t/#u
   (relay_file_splice_read removal)

1-7/11 request MSG_DONTWAIT (sockets)/IOCB_NOWAIT (generic) on the read

  8/11 removes splice_read from tty completely

  9/11 removes splice_read from FUSE filesystems
       (except virtiofs which has normal mounting security semantics,
        but is handled via FUSE code)

 10/11 allows splice_read from FUSE filesystems mounted by real root
       (this matches the blessing received by non-FUSE network filesystems)

 11/11 requests MSG_DONTWAIT for splice(pipe -> socket).

 12/11 has the man-pages patch with draft wording.

All but 5/11 (AF_SMC) have been tested and embed shell programs to
repro them. AIUI I'd need an s390 machine for it? It's trivial.

6/11 (AF_KCM) also fixes kcm_splice_read() passing SPLICE_F_*-style
flags to skb_recv_datagram(), which takes MSG_*-style flags. I don't
think they did anything anyway? But.

There are two implementations that definitely sleep all the time
and I didn't touch them:
  tracing_splice_read_pipe
  tracing_buffers_splice_read (dropped in v2, v1 4/11)
the semantics are lost on me, but they're in debugfs/tracefs, so
it doesn't matter if they block so long as they work, and presumably
they do right now.

There is also relay_file_splice_read (dropped in v2, v1 5/11),
which isn't an implementation at all because it's dead code, broken,
and removed in -mm.

The diffs in 1-7,11/11 are unchanged, save for a rebase in 7/11.
8/11 replaces the file type test in v1 10/11.
9/11 and 10/11 are new in v2.

Ahelenia Ziemiańska (11):
  splice: copy_splice_read: do the I/O with IOCB_NOWAIT
  af_unix: unix_stream_splice_read: always request MSG_DONTWAIT
  fuse: fuse_dev_splice_read: use nonblocking I/O
  net/smc: smc_splice_read: always request MSG_DONTWAIT
  kcm: kcm_splice_read: always request MSG_DONTWAIT
  tls/sw: tls_sw_splice_read: always request non-blocking I/O
  net/tcp: tcp_splice_read: always do non-blocking reads
  tty: splice_read: disable
  fuse: file: limit splice_read to virtiofs
  fuse: allow splicing from filesystems mounted by real root
  splice: splice_to_socket: always request MSG_DONTWAIT

 drivers/tty/tty_io.c |  2 --
 fs/fuse/dev.c        | 10 ++++++----
 fs/fuse/file.c       | 17 ++++++++++++++++-
 fs/fuse/fuse_i.h     |  4 ++++
 fs/fuse/inode.c      |  2 ++
 fs/fuse/virtio_fs.c  |  1 +
 fs/splice.c          |  5 ++---
 net/ipv4/tcp.c       | 32 +++-----------------------------
 net/kcm/kcmsock.c    |  2 +-
 net/smc/af_smc.c     |  6 +-----
 net/tls/tls_sw.c     |  5 ++---
 net/unix/af_unix.c   |  5 +----
 12 files changed, 39 insertions(+), 52 deletions(-)

base-commit: 2cf4f94d8e8646803f8fb0facf134b0cd7fb691a
--
2.39.2

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 01/11] splice: copy_splice_read: do the I/O with IOCB_NOWAIT
  2023-12-21  3:08 [PATCH v2 00/11] Avoid unprivileged splice(file->)/(->socket) pipe exclusion Ahelenia Ziemiańska
@ 2023-12-21  3:08 ` Ahelenia Ziemiańska
  2023-12-21  8:27   ` Christoph Hellwig
  2023-12-21  3:08 ` [PATCH v2 02/11] af_unix: unix_stream_splice_read: always request MSG_DONTWAIT Ahelenia Ziemiańska
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 25+ messages in thread
From: Ahelenia Ziemiańska @ 2023-12-21  3:08 UTC (permalink / raw)
  Cc: Jens Axboe, Christian Brauner, Alexander Viro, linux-fsdevel,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1330 bytes --]

Otherwise we risk sleeping with the pipe locked for indeterminate
lengths of time ‒ given:
	cat > udp.c <<^D
	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <sys/socket.h>
	#include <netinet/in.h>
	#include <netinet/udp.h>
	int main()
	{
		int s = socket(AF_INET, SOCK_DGRAM, 0);
		bind(s,
		     &(struct sockaddr_in){ .sin_family = AF_INET,
					    .sin_addr.s_addr = htonl(INADDR_ANY) },
		     sizeof(struct sockaddr_in));
		for (;;)
			splice(s, 0, 1, 0, 128 * 1024 * 1024, 0);
	}
	^D
	cc udp.c -o udp
	mkfifo fifo
	./udp > fifo &
	read -r _ < fifo &
	sleep 0.1
	echo zupa > fifo
udp used to sleep in splice and the shell used to enter an
uninterruptible sleep in open("fifo");
now the splice returns -EAGAIN and the whole program completes.

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
 fs/splice.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/splice.c b/fs/splice.c
index d983d375ff11..9d29664f23ee 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -361,6 +361,7 @@ ssize_t copy_splice_read(struct file *in, loff_t *ppos,
 	iov_iter_bvec(&to, ITER_DEST, bv, npages, len);
 	init_sync_kiocb(&kiocb, in);
 	kiocb.ki_pos = *ppos;
+	kiocb.ki_flags |= IOCB_NOWAIT;
 	ret = call_read_iter(in, &kiocb, &to);
 
 	if (ret > 0) {
-- 
2.39.2

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 02/11] af_unix: unix_stream_splice_read: always request MSG_DONTWAIT
  2023-12-21  3:08 [PATCH v2 00/11] Avoid unprivileged splice(file->)/(->socket) pipe exclusion Ahelenia Ziemiańska
  2023-12-21  3:08 ` [PATCH v2 01/11] splice: copy_splice_read: do the I/O with IOCB_NOWAIT Ahelenia Ziemiańska
@ 2023-12-21  3:08 ` Ahelenia Ziemiańska
  2023-12-21  3:08 ` [PATCH v2 03/11] fuse: fuse_dev_splice_read: use nonblocking I/O Ahelenia Ziemiańska
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Ahelenia Ziemiańska @ 2023-12-21  3:08 UTC (permalink / raw)
  Cc: Jens Axboe, Christian Brauner, Alexander Viro, linux-fsdevel,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Kuniyuki Iwashima, David Howells, Alexander Mikhalitsyn,
	John Fastabend, Daan De Meyer, netdev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1360 bytes --]

Otherwise we risk sleeping with the pipe locked for indeterminate
lengths of time ‒ given:
	cat > unix.c <<^D
	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <sys/socket.h>
	#include <sys/un.h>
	int main()
	{
		int sp[2];
		socketpair(AF_UNIX, SOCK_STREAM, 0, sp);
		for (;;)
			splice(sp[0], 0, 1, 0, 128 * 1024 * 1024, 0);
	}
	^D
	cc unix.c -o unix
	mkfifo fifo
	./unix > fifo &
	read -r _ < fifo &
	sleep 0.1
	echo zupa > fifo
unix used to sleep in splice and the shell used to enter an
uninterruptible sleep in open("fifo");
now the splice returns -EAGAIN and the whole program completes.

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
 net/unix/af_unix.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index ac1f2bc18fc9..bae84552bf58 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2921,15 +2921,12 @@ static ssize_t unix_stream_splice_read(struct socket *sock,  loff_t *ppos,
 		.pipe = pipe,
 		.size = size,
 		.splice_flags = flags,
+		.flags = MSG_DONTWAIT,
 	};
 
 	if (unlikely(*ppos))
 		return -ESPIPE;
 
-	if (sock->file->f_flags & O_NONBLOCK ||
-	    flags & SPLICE_F_NONBLOCK)
-		state.flags = MSG_DONTWAIT;
-
 	return unix_stream_read_generic(&state, false);
 }
 
-- 
2.39.2

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 03/11] fuse: fuse_dev_splice_read: use nonblocking I/O
  2023-12-21  3:08 [PATCH v2 00/11] Avoid unprivileged splice(file->)/(->socket) pipe exclusion Ahelenia Ziemiańska
  2023-12-21  3:08 ` [PATCH v2 01/11] splice: copy_splice_read: do the I/O with IOCB_NOWAIT Ahelenia Ziemiańska
  2023-12-21  3:08 ` [PATCH v2 02/11] af_unix: unix_stream_splice_read: always request MSG_DONTWAIT Ahelenia Ziemiańska
@ 2023-12-21  3:08 ` Ahelenia Ziemiańska
  2023-12-21  3:09 ` [PATCH v2 04/11] net/smc: smc_splice_read: always request MSG_DONTWAIT Ahelenia Ziemiańska
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Ahelenia Ziemiańska @ 2023-12-21  3:08 UTC (permalink / raw)
  Cc: Jens Axboe, Christian Brauner, Alexander Viro, linux-fsdevel,
	Miklos Szeredi, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2532 bytes --]

Otherwise we risk sleeping with the pipe locked for indeterminate
lengths of time ‒ since FUSE is usually installed with the fusermount
helper suid, given
	cat > fusedev.c <<^D
	#define _GNU_SOURCE
	#include <fcntl.h>
	#define FUSE_USE_VERSION 30
	#include <fuse.h>
	static void *fop_init(struct fuse_conn_info *conn, struct fuse_config *cfg)
	{
		for (;;)
			splice(3, 0, 4, 0, 128 * 1024 * 1024, 0);
	}
	static const struct fuse_operations fops = { .init = fop_init };
	int main(int argc, char **argv)
	{
		return fuse_main(argc, argv, &fops, NULL);
	}
	^D
	cc nullsleep.c $(pkg-config fuse3 --cflags --libs) -o nullsleep
	mkfifo fifo
	mkdir dir
	./nullsleep dir 4>fifo &
	read -r _ < fifo &
	sleep 0.1
	echo zupa > fifo
nullsleep used to sleep in splice and the shell used to enter an
uninterruptible sleep in open("fifo");
now the splice returns -EAGAIN and the whole program completes.

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
 fs/fuse/dev.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 1a8f82f478cb..4e8caf66c01e 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1202,7 +1202,8 @@ __releases(fiq->lock)
  * the 'sent' flag.
  */
 static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
-				struct fuse_copy_state *cs, size_t nbytes)
+				struct fuse_copy_state *cs, size_t nbytes,
+				bool nonblock)
 {
 	ssize_t err;
 	struct fuse_conn *fc = fud->fc;
@@ -1238,7 +1239,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
 			break;
 		spin_unlock(&fiq->lock);
 
-		if (file->f_flags & O_NONBLOCK)
+		if (nonblock)
 			return -EAGAIN;
 		err = wait_event_interruptible_exclusive(fiq->waitq,
 				!fiq->connected || request_pending(fiq));
@@ -1364,7 +1365,8 @@ static ssize_t fuse_dev_read(struct kiocb *iocb, struct iov_iter *to)
 
 	fuse_copy_init(&cs, 1, to);
 
-	return fuse_dev_do_read(fud, file, &cs, iov_iter_count(to));
+	return fuse_dev_do_read(fud, file, &cs, iov_iter_count(to),
+				file->f_flags & O_NONBLOCK);
 }
 
 static ssize_t fuse_dev_splice_read(struct file *in, loff_t *ppos,
@@ -1388,7 +1390,7 @@ static ssize_t fuse_dev_splice_read(struct file *in, loff_t *ppos,
 	fuse_copy_init(&cs, 1, NULL);
 	cs.pipebufs = bufs;
 	cs.pipe = pipe;
-	ret = fuse_dev_do_read(fud, in, &cs, len);
+	ret = fuse_dev_do_read(fud, in, &cs, len, true);
 	if (ret < 0)
 		goto out;
 
-- 
2.39.2

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 04/11] net/smc: smc_splice_read: always request MSG_DONTWAIT
  2023-12-21  3:08 [PATCH v2 00/11] Avoid unprivileged splice(file->)/(->socket) pipe exclusion Ahelenia Ziemiańska
                   ` (2 preceding siblings ...)
  2023-12-21  3:08 ` [PATCH v2 03/11] fuse: fuse_dev_splice_read: use nonblocking I/O Ahelenia Ziemiańska
@ 2023-12-21  3:09 ` Ahelenia Ziemiańska
  2023-12-21  3:09 ` [PATCH v2 05/11] kcm: kcm_splice_read: " Ahelenia Ziemiańska
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Ahelenia Ziemiańska @ 2023-12-21  3:09 UTC (permalink / raw)
  Cc: Jens Axboe, Christian Brauner, Alexander Viro, linux-fsdevel,
	Wenjia Zhang, Jan Karcher, D. Wythe, Tony Lu, Wen Gu,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-s390, netdev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 955 bytes --]

Otherwise we risk sleeping with the pipe locked for indeterminate
lengths of time ‒ this meant that splice(smc -> pipe) with no data
would hold the pipe lock, and any open/read/write/close on the pipe
would enter uninterruptible sleep.

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
 net/smc/af_smc.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 73eebddbbf41..a11a966d031a 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -3248,12 +3248,8 @@ static ssize_t smc_splice_read(struct socket *sock, loff_t *ppos,
 			rc = -ESPIPE;
 			goto out;
 		}
-		if (flags & SPLICE_F_NONBLOCK)
-			flags = MSG_DONTWAIT;
-		else
-			flags = 0;
 		SMC_STAT_INC(smc, splice_cnt);
-		rc = smc_rx_recvmsg(smc, NULL, pipe, len, flags);
+		rc = smc_rx_recvmsg(smc, NULL, pipe, len, MSG_DONTWAIT);
 	}
 out:
 	release_sock(sk);
-- 
2.39.2

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 05/11] kcm: kcm_splice_read: always request MSG_DONTWAIT
  2023-12-21  3:08 [PATCH v2 00/11] Avoid unprivileged splice(file->)/(->socket) pipe exclusion Ahelenia Ziemiańska
                   ` (3 preceding siblings ...)
  2023-12-21  3:09 ` [PATCH v2 04/11] net/smc: smc_splice_read: always request MSG_DONTWAIT Ahelenia Ziemiańska
@ 2023-12-21  3:09 ` Ahelenia Ziemiańska
  2023-12-21  3:09 ` [PATCH v2 06/11] tls/sw: tls_sw_splice_read: always request non-blocking I/O Ahelenia Ziemiańska
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Ahelenia Ziemiańska @ 2023-12-21  3:09 UTC (permalink / raw)
  Cc: Jens Axboe, Christian Brauner, Alexander Viro, linux-fsdevel,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Howells, Shigeru Yoshida, Kuniyuki Iwashima, Peilin Ye,
	netdev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1400 bytes --]

Otherwise we risk sleeping with the pipe locked for indeterminate
lengths of time ‒ given:
	cat > kcm.c <<^D
	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <sys/socket.h>
	#include <netinet/in.h>
	#include <linux/kcm.h>
	int main()
	{
		int kcm = socket(AF_KCM, SOCK_SEQPACKET, KCMPROTO_CONNECTED);
		for (;;)
			splice(kcm, 0, 1, 0, 128 * 1024 * 1024, 0);
	}
	^D
	cc kcm.c -o kcm
	mkfifo fifo
	./kcm > fifo &
	read -r _ < fifo &
	sleep 0.1
	echo zupa > fifo
kcm used to sleep in splice and the shell used to enter an
uninterruptible sleep in open("fifo");
now the splice returns -EAGAIN and the whole program completes.

Also: don't pass the SPLICE_F_*-style flags argument to
skb_recv_datagram(), which expects MSG_*-style flags.
This fixes SPLICE_F_NONBLOCK not having worked.

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
 net/kcm/kcmsock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
index 65d1f6755f98..ccfc46f31891 100644
--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -1028,7 +1028,7 @@ static ssize_t kcm_splice_read(struct socket *sock, loff_t *ppos,
 
 	/* Only support splice for SOCKSEQPACKET */
 
-	skb = skb_recv_datagram(sk, flags, &err);
+	skb = skb_recv_datagram(sk, MSG_DONTWAIT, &err);
 	if (!skb)
 		goto err_out;
 
-- 
2.39.2

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 06/11] tls/sw: tls_sw_splice_read: always request non-blocking I/O
  2023-12-21  3:08 [PATCH v2 00/11] Avoid unprivileged splice(file->)/(->socket) pipe exclusion Ahelenia Ziemiańska
                   ` (4 preceding siblings ...)
  2023-12-21  3:09 ` [PATCH v2 05/11] kcm: kcm_splice_read: " Ahelenia Ziemiańska
@ 2023-12-21  3:09 ` Ahelenia Ziemiańska
  2023-12-21  3:09 ` [PATCH v2 07/11] net/tcp: tcp_splice_read: always do non-blocking reads Ahelenia Ziemiańska
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Ahelenia Ziemiańska @ 2023-12-21  3:09 UTC (permalink / raw)
  Cc: Jens Axboe, Christian Brauner, Alexander Viro, linux-fsdevel,
	Boris Pismenny, John Fastabend, Jakub Kicinski, David S. Miller,
	Eric Dumazet, Paolo Abeni, netdev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2234 bytes --]

Otherwise we risk sleeping with the pipe locked for indeterminate
lengths of time ‒ given
	cat > tls_sw.c <<^D
	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <unistd.h>
	#include <sys/socket.h>
	#include <netinet/in.h>
	#include <netinet/tcp.h>
	#include <linux/tls.h>
	int main()
	{
		int s = socket(AF_INET, SOCK_STREAM, 0);
		struct sockaddr_in addr = {
			.sin_family = AF_INET,
			.sin_addr = { htonl(INADDR_LOOPBACK) },
			.sin_port = htons(getpid() % (0xFFFF - 1000) + 1000)
		};
		bind(s, &addr, sizeof(addr));
		listen(s, 1);
		if (!fork()) {
			connect(socket(AF_INET, SOCK_STREAM, 0), &addr, sizeof(addr));
			sleep(100);
			return 0;
		}

		s = accept(s, NULL, NULL);
		setsockopt(s, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));
		setsockopt(s, SOL_TLS, TLS_RX,
			   &(struct tls12_crypto_info_aes_gcm_128){
				   .info.version = TLS_1_2_VERSION,
				   .info.cipher_type = TLS_CIPHER_AES_GCM_128 },
			   sizeof(struct tls12_crypto_info_aes_gcm_128));

		for (;;)
			splice(s, 0, 1, 0, 128 * 1024 * 1024, 0);
	}
	^D
	cc tls_sw.c -o tls_sw
	mkfifo fifo
	./tls_sw > fifo &
	read -r _ < fifo &
	sleep 0.1
	echo zupa > fifo
tls_sw used to sleep in splice and the shell used to enter an
uninterruptible sleep in open("fifo");
now the splice returns -EAGAIN and the whole program completes.

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
 net/tls/tls_sw.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index e37b4d2e2acd..3f474deed94d 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -2157,7 +2157,7 @@ ssize_t tls_sw_splice_read(struct socket *sock,  loff_t *ppos,
 	int chunk;
 	int err;
 
-	err = tls_rx_reader_lock(sk, ctx, flags & SPLICE_F_NONBLOCK);
+	err = tls_rx_reader_lock(sk, ctx, true);
 	if (err < 0)
 		return err;
 
@@ -2166,8 +2166,7 @@ ssize_t tls_sw_splice_read(struct socket *sock,  loff_t *ppos,
 	} else {
 		struct tls_decrypt_arg darg;
 
-		err = tls_rx_rec_wait(sk, NULL, flags & SPLICE_F_NONBLOCK,
-				      true);
+		err = tls_rx_rec_wait(sk, NULL, true, true);
 		if (err <= 0)
 			goto splice_read_end;
 
-- 
2.39.2

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 07/11] net/tcp: tcp_splice_read: always do non-blocking reads
  2023-12-21  3:08 [PATCH v2 00/11] Avoid unprivileged splice(file->)/(->socket) pipe exclusion Ahelenia Ziemiańska
                   ` (5 preceding siblings ...)
  2023-12-21  3:09 ` [PATCH v2 06/11] tls/sw: tls_sw_splice_read: always request non-blocking I/O Ahelenia Ziemiańska
@ 2023-12-21  3:09 ` Ahelenia Ziemiańska
  2023-12-21  3:09 ` [PATCH v2 08/11] tty: splice_read: disable Ahelenia Ziemiańska
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Ahelenia Ziemiańska @ 2023-12-21  3:09 UTC (permalink / raw)
  Cc: Jens Axboe, Christian Brauner, Alexander Viro, linux-fsdevel,
	Eric Dumazet, David S. Miller, David Ahern, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2846 bytes --]

Otherwise we risk sleeping with the pipe locked for indeterminate
lengths of time ‒ given:
	cat > tcp.c <<^D
	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <unistd.h>
	#include <sys/socket.h>
	#include <netinet/in.h>
	#include <linux/tls.h>
	int main()
	{
		int s = socket(AF_INET, SOCK_STREAM, 0);
		struct sockaddr_in addr = {
			.sin_family = AF_INET,
			.sin_addr = { htonl(INADDR_LOOPBACK) },
			.sin_port = htons(getpid() % (0xFFFF - 1000) + 1000)
		};
		bind(s, &addr, sizeof(addr));
		listen(s, 1);
		if (!fork()) {
			connect(socket(AF_INET, SOCK_STREAM, 0), &addr, sizeof(addr));
			sleep(100);
			return 0;
		}

		s = accept(s, NULL, NULL);
		for (;;)
			splice(s, 0, 1, 0, 128 * 1024 * 1024, 0);
	}
	^D
	cc tcp.c -o tcp
	mkfifo fifo
	./tcp > fifo &
	read -r _ < fifo &
	sleep 0.1
	echo zupa > fifo
tcp used to sleep in splice and the shell used to enter an
uninterruptible sleep in open("fifo");
now the splice returns -EAGAIN and the whole program completes.

sock_rcvtimeo() returns 0 if the second argument is true, so the
explicit re-try loop for empty read conditions can be removed
entirely.

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
 net/ipv4/tcp.c | 32 +++-----------------------------
 1 file changed, 3 insertions(+), 29 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index ff6838ca2e58..17a0e2a766b7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -782,7 +782,6 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
 		.len = len,
 		.flags = flags,
 	};
-	long timeo;
 	ssize_t spliced;
 	int ret;
 
@@ -797,7 +796,6 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
 
 	lock_sock(sk);
 
-	timeo = sock_rcvtimeo(sk, sock->file->f_flags & O_NONBLOCK);
 	while (tss.len) {
 		ret = __tcp_splice_read(sk, &tss);
 		if (ret < 0)
@@ -821,37 +819,13 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
 				ret = -ENOTCONN;
 				break;
 			}
-			if (!timeo) {
-				ret = -EAGAIN;
-				break;
-			}
-			/* if __tcp_splice_read() got nothing while we have
-			 * an skb in receive queue, we do not want to loop.
-			 * This might happen with URG data.
-			 */
-			if (!skb_queue_empty(&sk->sk_receive_queue))
-				break;
-			ret = sk_wait_data(sk, &timeo, NULL);
-			if (ret < 0)
-				break;
-			if (signal_pending(current)) {
-				ret = sock_intr_errno(timeo);
-				break;
-			}
-			continue;
+			ret = -EAGAIN;
+			break;
 		}
 		tss.len -= ret;
 		spliced += ret;
 
-		if (!tss.len || !timeo)
-			break;
-		release_sock(sk);
-		lock_sock(sk);
-
-		if (sk->sk_err || sk->sk_state == TCP_CLOSE ||
-		    (sk->sk_shutdown & RCV_SHUTDOWN) ||
-		    signal_pending(current))
-			break;
+		break;
 	}
 
 	release_sock(sk);
-- 
2.39.2

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 08/11] tty: splice_read: disable
  2023-12-21  3:08 [PATCH v2 00/11] Avoid unprivileged splice(file->)/(->socket) pipe exclusion Ahelenia Ziemiańska
                   ` (6 preceding siblings ...)
  2023-12-21  3:09 ` [PATCH v2 07/11] net/tcp: tcp_splice_read: always do non-blocking reads Ahelenia Ziemiańska
@ 2023-12-21  3:09 ` Ahelenia Ziemiańska
  2023-12-21  8:10   ` Greg Kroah-Hartman
  2024-01-03 11:36   ` Jiri Slaby
  2023-12-21  3:09 ` [PATCH v2 09/11] fuse: file: limit splice_read to virtiofs Ahelenia Ziemiańska
                   ` (5 subsequent siblings)
  13 siblings, 2 replies; 25+ messages in thread
From: Ahelenia Ziemiańska @ 2023-12-21  3:09 UTC (permalink / raw)
  Cc: Jens Axboe, Christian Brauner, Alexander Viro, linux-fsdevel,
	Greg Kroah-Hartman, Jiri Slaby, linux-kernel, linux-serial

[-- Attachment #1: Type: text/plain, Size: 1463 bytes --]

We request non-blocking I/O in the generic copy_splice_read, but
"the tty layer doesn't actually honor the IOCB_NOWAIT flag for
various historical reasons.". This means that a tty->pipe splice
will happily sleep with the pipe locked forever, and any process
trying to take it (due to an open/read/write/&c.) will enter
uninterruptible sleep.

This also masks inconsistent wake-ups (usually every second line)
when splicing from ttys in icanon mode.

Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wimmqG_wvSRtMiKPeGGDL816n65u=Mq2+H3-=uM2U6FmA@mail.gmail.com/
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
 drivers/tty/tty_io.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 06414e43e0b5..50c2957a9c7f 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -465,7 +465,6 @@ static const struct file_operations tty_fops = {
 	.llseek		= no_llseek,
 	.read_iter	= tty_read,
 	.write_iter	= tty_write,
-	.splice_read	= copy_splice_read,
 	.splice_write	= iter_file_splice_write,
 	.poll		= tty_poll,
 	.unlocked_ioctl	= tty_ioctl,
@@ -480,7 +479,6 @@ static const struct file_operations console_fops = {
 	.llseek		= no_llseek,
 	.read_iter	= tty_read,
 	.write_iter	= redirected_tty_write,
-	.splice_read	= copy_splice_read,
 	.splice_write	= iter_file_splice_write,
 	.poll		= tty_poll,
 	.unlocked_ioctl	= tty_ioctl,
-- 
2.39.2

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 09/11] fuse: file: limit splice_read to virtiofs
  2023-12-21  3:08 [PATCH v2 00/11] Avoid unprivileged splice(file->)/(->socket) pipe exclusion Ahelenia Ziemiańska
                   ` (7 preceding siblings ...)
  2023-12-21  3:09 ` [PATCH v2 08/11] tty: splice_read: disable Ahelenia Ziemiańska
@ 2023-12-21  3:09 ` Ahelenia Ziemiańska
  2024-01-10 13:43   ` Miklos Szeredi
  2023-12-21  3:09 ` [PATCH v2 10/11] fuse: allow splicing from filesystems mounted by real root Ahelenia Ziemiańska
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 25+ messages in thread
From: Ahelenia Ziemiańska @ 2023-12-21  3:09 UTC (permalink / raw)
  Cc: Jens Axboe, Christian Brauner, Alexander Viro, linux-fsdevel,
	Miklos Szeredi, Vivek Goyal, Stefan Hajnoczi, linux-kernel,
	virtualization

[-- Attachment #1: Type: text/plain, Size: 2992 bytes --]

Potentially-blocking splice_reads are allowed for normal filesystems
like NFS because they're blessed by root.

FUSE is commonly used suid-root, and allows anyone to trivially create
a file that, when spliced from, will just sleep forever with the pipe
lock held.

The only way IPC to the fusing process could be avoided is if
!(ff->open_flags & FOPEN_DIRECT_IO) and the range was already cached
and we weren't past the end. Just refuse it.

virtiofs behaves like a normal filesystem and can only be mounted
by root, it's unaffected by use of a new "trusted" connection flag.
This may be extended to include real FUSE mounts by processes which
aren't suid, to match the semantics for normal filesystems.

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
 fs/fuse/file.c      | 17 ++++++++++++++++-
 fs/fuse/fuse_i.h    |  3 +++
 fs/fuse/virtio_fs.c |  1 +
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index a660f1f21540..20bb16ddfcc9 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -3200,6 +3200,21 @@ static ssize_t fuse_copy_file_range(struct file *src_file, loff_t src_off,
 	return ret;
 }
 
+static long fuse_splice_read(struct file *in, loff_t *ppos,
+			     struct pipe_inode_info *pipe, size_t len,
+			     unsigned int flags)
+{
+	struct inode *inode = file_inode(in);
+
+	if (fuse_is_bad(inode))
+		return -EIO;
+
+	if (get_fuse_conn(inode)->trusted)
+		return filemap_splice_read(in, ppos, pipe, len, flags);
+
+	return -EINVAL;
+}
+
 static const struct file_operations fuse_file_operations = {
 	.llseek		= fuse_file_llseek,
 	.read_iter	= fuse_file_read_iter,
@@ -3212,7 +3227,7 @@ static const struct file_operations fuse_file_operations = {
 	.lock		= fuse_file_lock,
 	.get_unmapped_area = thp_get_unmapped_area,
 	.flock		= fuse_file_flock,
-	.splice_read	= filemap_splice_read,
+	.splice_read	= fuse_splice_read,
 	.splice_write	= iter_file_splice_write,
 	.unlocked_ioctl	= fuse_file_ioctl,
 	.compat_ioctl	= fuse_file_compat_ioctl,
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 1df83eebda92..463c5d4ad8b4 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -818,6 +818,9 @@ struct fuse_conn {
 	/* Is statx not implemented by fs? */
 	unsigned int no_statx:1;
 
+	/* Do we trust this connection to always respond? */
+	bool trusted:1;
+
 	/** The number of requests waiting for completion */
 	atomic_t num_waiting;
 
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 5f1be1da92ce..fce0fe24899a 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -1448,6 +1448,7 @@ static int virtio_fs_get_tree(struct fs_context *fsc)
 	fc->delete_stale = true;
 	fc->auto_submounts = true;
 	fc->sync_fs = true;
+	fc->trusted = true;
 
 	/* Tell FUSE to split requests that exceed the virtqueue's size */
 	fc->max_pages_limit = min_t(unsigned int, fc->max_pages_limit,
-- 
2.39.2

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 10/11] fuse: allow splicing from filesystems mounted by real root
  2023-12-21  3:08 [PATCH v2 00/11] Avoid unprivileged splice(file->)/(->socket) pipe exclusion Ahelenia Ziemiańska
                   ` (8 preceding siblings ...)
  2023-12-21  3:09 ` [PATCH v2 09/11] fuse: file: limit splice_read to virtiofs Ahelenia Ziemiańska
@ 2023-12-21  3:09 ` Ahelenia Ziemiańska
  2023-12-21  3:09 ` [PATCH v2 11/11] splice: splice_to_socket: always request MSG_DONTWAIT Ahelenia Ziemiańska
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Ahelenia Ziemiańska @ 2023-12-21  3:09 UTC (permalink / raw)
  Cc: Jens Axboe, Christian Brauner, Alexander Viro, linux-fsdevel,
	Miklos Szeredi, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1605 bytes --]

FUSE tends to be installed suid 0: this allows normal users to mount
anything, including a program whose read implementation consists
of for(;;) sleep(1);, which, if splice were allowed, would sleep
forever with the pipe lock held.

Normal filesystems can only be mounted by root, and are thus deemed
safe. Extend this to when root mounts a FUSE filesystem with an
explicit check.

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
 fs/fuse/fuse_i.h | 1 +
 fs/fuse/inode.c  | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 463c5d4ad8b4..a9ceaf10c1d2 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -532,6 +532,7 @@ struct fuse_fs_context {
 	bool no_control:1;
 	bool no_force_umount:1;
 	bool legacy_opts_show:1;
+	bool trusted:1;
 	enum fuse_dax_mode dax_mode;
 	unsigned int max_read;
 	unsigned int blksize;
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 2a6d44f91729..91108ba9acec 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1779,6 +1779,7 @@ static int fuse_get_tree(struct fs_context *fsc)
 
 	fuse_conn_init(fc, fm, fsc->user_ns, &fuse_dev_fiq_ops, NULL);
 	fc->release = fuse_free_conn;
+	fc->trusted = ctx->trusted;
 
 	fsc->s_fs_info = fm;
 
@@ -1840,6 +1841,7 @@ static int fuse_init_fs_context(struct fs_context *fsc)
 	ctx->max_read = ~0;
 	ctx->blksize = FUSE_DEFAULT_BLKSIZE;
 	ctx->legacy_opts_show = true;
+	ctx->trusted = uid_eq(current_uid(), GLOBAL_ROOT_UID);
 
 #ifdef CONFIG_BLOCK
 	if (fsc->fs_type == &fuseblk_fs_type) {
-- 
2.39.2

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 11/11] splice: splice_to_socket: always request MSG_DONTWAIT
  2023-12-21  3:08 [PATCH v2 00/11] Avoid unprivileged splice(file->)/(->socket) pipe exclusion Ahelenia Ziemiańska
                   ` (9 preceding siblings ...)
  2023-12-21  3:09 ` [PATCH v2 10/11] fuse: allow splicing from filesystems mounted by real root Ahelenia Ziemiańska
@ 2023-12-21  3:09 ` Ahelenia Ziemiańska
  2023-12-21  3:09 ` [PATCH v2 12/11 man-pages] splice.2: document 6.8 blocking behaviour Ahelenia Ziemiańska
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Ahelenia Ziemiańska @ 2023-12-21  3:09 UTC (permalink / raw)
  Cc: Jens Axboe, Christian Brauner, Alexander Viro, linux-fsdevel,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1591 bytes --]

The pipe is locked at the top of the function, so sock_sendmsg
sleeps for space with the pipe lock held ‒ given:
	cat > to_socket.c <<^D
	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <unistd.h>
	#include <sys/socket.h>
	#include <sys/un.h>
	int main()
	{
		int sp[2];
		socketpair(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK, 0, sp);
		while(write(sp[1], sp, 1) == 1)
			;
		fcntl(sp[1], F_SETFL, 0);
		for (;;)
			splice(0, 0, sp[1], 0, 128 * 1024 * 1024, 0);
	}
	^D
	cc to_socket.c -o to_socket
	mkfifo fifo
	sleep 10 > fifo &
	./to_socket < fifo &
	echo zupa > fifo
to_socket used to sleep in splice and the shell used to enter an
uninterruptible sleep in closing the fifo in dup2(10, 1);
now the splice returns -EAGAIN and the whole program completes.

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
 fs/splice.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 9d29664f23ee..2871c6f9366f 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -869,13 +869,11 @@ ssize_t splice_to_socket(struct pipe_inode_info *pipe, struct file *out,
 		if (!bc)
 			break;
 
-		msg.msg_flags = MSG_SPLICE_PAGES;
+		msg.msg_flags = MSG_SPLICE_PAGES | MSG_DONTWAIT;
 		if (flags & SPLICE_F_MORE)
 			msg.msg_flags |= MSG_MORE;
 		if (remain && pipe_occupancy(pipe->head, tail) > 0)
 			msg.msg_flags |= MSG_MORE;
-		if (out->f_flags & O_NONBLOCK)
-			msg.msg_flags |= MSG_DONTWAIT;
 
 		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, bvec, bc,
 			      len - remain);
-- 
2.39.2

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 12/11 man-pages] splice.2: document 6.8 blocking behaviour
  2023-12-21  3:08 [PATCH v2 00/11] Avoid unprivileged splice(file->)/(->socket) pipe exclusion Ahelenia Ziemiańska
                   ` (10 preceding siblings ...)
  2023-12-21  3:09 ` [PATCH v2 11/11] splice: splice_to_socket: always request MSG_DONTWAIT Ahelenia Ziemiańska
@ 2023-12-21  3:09 ` Ahelenia Ziemiańska
  2023-12-24  5:01 ` [PATCH v2 13/11] tty: splice_write: disable Ahelenia Ziemiańska
  2023-12-24  5:01 ` [PATCH v2 14/11] fuse: allow splicing to trusted mounts only Ahelenia Ziemiańska
  13 siblings, 0 replies; 25+ messages in thread
From: Ahelenia Ziemiańska @ 2023-12-21  3:09 UTC (permalink / raw)
  Cc: Jens Axboe, Christian Brauner, Alexander Viro, linux-fsdevel,
	Alejandro Colomar, linux-man, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2402 bytes --]

Hypothetical text that matches v2.

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
 man2/splice.2 | 47 +++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 43 insertions(+), 4 deletions(-)

diff --git a/man2/splice.2 b/man2/splice.2
index e5d05a05c..d2c7ac8d5 100644
--- a/man2/splice.2
+++ b/man2/splice.2
@@ -139,10 +139,11 @@ .SH ERRORS
 .B EAGAIN
 .B SPLICE_F_NONBLOCK
 was specified in
-.I flags
-or one of the file descriptors had been marked as nonblocking
-.RB ( O_NONBLOCK ) ,
-and the operation would block.
+.IR flags ,
+one of the file descriptors had been marked as nonblocking
+.RB ( O_NONBLOCK )
+and the operation would block,
+or splicing from an untrusted IPC mechanism and no data was available (see HISTORY below).
 .TP
 .B EBADF
 One or both file descriptors are not valid,
@@ -192,6 +193,44 @@ .SH HISTORY
 Since Linux 2.6.31,
 .\" commit 7c77f0b3f9208c339a4b40737bb2cb0f0319bb8d
 both arguments may refer to pipes.
+.P
+Between Linux 4.9 and 6.7,
+.\" commit 8924feff66f35fe22ce77aafe3f21eb8e5cff881
+splicing from a non-pipe to a pipe without
+.B SPLICE_F_NONBLOCK
+would hold the pipe lock and wait for data on the non-pipe.
+This isn't an issue for files, but if the non-pipe is a tty,
+or an IPC mechanism like a socket or a
+.BR fuse (4)
+filesystem, this means that a thread attempting any operation (like
+.BR open (2)/ read (2)/ write (2)/ close (2))
+on the pipe would enter uninterruptible sleep until data appeared,
+which may never happen.
+The same applies to splicing from a pipe to a full socket.
+.P
+Since Linux 6.8,
+.\" commit TBD
+splicing from ttys is disabled
+.RB ( EINVAL ),
+reads done when splicing from sockets happen in non-blocking mode
+(as-if
+.BR MSG_DONTWAIT ,
+returning
+.B EAGAIN
+if no data is available),
+and splicing from
+.BR fuse (4)
+filesystems is only allowed if they were mounted by
+root in the initial user namespace
+(this matches security semantics for normal filesystems).
+If a splice implementation is devised that doesn't need to lock the pipe
+while waiting for data, this may be reversed in a future version.
+Writes when splicing to sockets are also done non-blockingly
+(as-if
+.BR MSG_DONTWAIT ,
+returning
+.B EAGAIN
+if the socket is full).
 .SH NOTES
 The three system calls
 .BR splice (),
-- 
2.39.2

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 08/11] tty: splice_read: disable
  2023-12-21  3:09 ` [PATCH v2 08/11] tty: splice_read: disable Ahelenia Ziemiańska
@ 2023-12-21  8:10   ` Greg Kroah-Hartman
  2024-01-03 11:36   ` Jiri Slaby
  1 sibling, 0 replies; 25+ messages in thread
From: Greg Kroah-Hartman @ 2023-12-21  8:10 UTC (permalink / raw)
  To: Ahelenia Ziemiańska
  Cc: Jens Axboe, Christian Brauner, Alexander Viro, linux-fsdevel,
	Jiri Slaby, linux-kernel, linux-serial

On Thu, Dec 21, 2023 at 04:09:10AM +0100, Ahelenia Ziemiańska wrote:
> We request non-blocking I/O in the generic copy_splice_read, but
> "the tty layer doesn't actually honor the IOCB_NOWAIT flag for
> various historical reasons.". This means that a tty->pipe splice
> will happily sleep with the pipe locked forever, and any process
> trying to take it (due to an open/read/write/&c.) will enter
> uninterruptible sleep.
> 
> This also masks inconsistent wake-ups (usually every second line)
> when splicing from ttys in icanon mode.
> 
> Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wimmqG_wvSRtMiKPeGGDL816n65u=Mq2+H3-=uM2U6FmA@mail.gmail.com/
> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
> ---

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 01/11] splice: copy_splice_read: do the I/O with IOCB_NOWAIT
  2023-12-21  3:08 ` [PATCH v2 01/11] splice: copy_splice_read: do the I/O with IOCB_NOWAIT Ahelenia Ziemiańska
@ 2023-12-21  8:27   ` Christoph Hellwig
  2023-12-21 16:30     ` Ahelenia Ziemiańska
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2023-12-21  8:27 UTC (permalink / raw)
  To: Ahelenia Ziemiańska
  Cc: Jens Axboe, Christian Brauner, Alexander Viro, linux-fsdevel,
	linux-kernel

On Thu, Dec 21, 2023 at 04:08:45AM +0100, Ahelenia Ziemiańska wrote:
> Otherwise we risk sleeping with the pipe locked for indeterminate

You can't just assume that any ->read_iter support IOCB_NOWAIT.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 01/11] splice: copy_splice_read: do the I/O with IOCB_NOWAIT
  2023-12-21  8:27   ` Christoph Hellwig
@ 2023-12-21 16:30     ` Ahelenia Ziemiańska
  0 siblings, 0 replies; 25+ messages in thread
From: Ahelenia Ziemiańska @ 2023-12-21 16:30 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Christian Brauner, Alexander Viro, linux-fsdevel,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 12239 bytes --]

On Thu, Dec 21, 2023 at 12:27:12AM -0800, Christoph Hellwig wrote:
> On Thu, Dec 21, 2023 at 04:08:45AM +0100, Ahelenia Ziemiańska wrote:
> > Otherwise we risk sleeping with the pipe locked for indeterminate
> You can't just assume that any ->read_iter support IOCB_NOWAIT.
Let's see.

zero_fops	drivers/char/mem.c:     .splice_read    = copy_splice_read,
full_fops	drivers/char/mem.c:     .splice_read    = copy_splice_read,

random_fops	drivers/char/random.c:  .splice_read = copy_splice_read,
random_read_iter checks
urandom_fops	drivers/char/random.c:  .splice_read = copy_splice_read,
urandom_read_iter returns instantly

if ((in->f_flags & O_DIRECT) || IS_DAX(in->f_mapping->host))
fs/splice.c:            return copy_splice_read(in, ppos, pipe, len, flags);
FMODE_CAN_ODIRECT is set by filesystems and blockdevs, so trusted

fs/9p/vfs_file.c:               return copy_splice_read(in, ppos, pipe, len, flags);
fs/ceph/file.c:         return copy_splice_read(in, ppos, pipe, len, flags);
fs/ceph/file.c:         return copy_splice_read(in, ppos, pipe, len, flags);
fs/gfs2/file.c: .splice_read    = copy_splice_read,
fs/gfs2/file.c: .splice_read    = copy_splice_read,
fs/kernfs/file.c:       .splice_read    = copy_splice_read,
fs/smb/client/cifsfs.c: .splice_read = copy_splice_read,
fs/smb/client/cifsfs.c: .splice_read = copy_splice_read,
fs/proc/inode.c:        .splice_read    = copy_splice_read,
fs/proc/inode.c:        .splice_read    = copy_splice_read,
fs/proc/proc_sysctl.c:  .splice_read    = copy_splice_read,
fs/proc_namespace.c:    .splice_read    = copy_splice_read,
fs/proc_namespace.c:    .splice_read    = copy_splice_read,
fs/proc_namespace.c:    .splice_read    = copy_splice_read,
filesystems => trusted

tracing_fops
kernel/trace/trace.c:   .splice_read    = copy_splice_read,
used in /sys/kernel/debug/tracing/per_cpu/cpu*/trace
    and /sys/kernel/debug/tracing/trace
which are seq_read_iter and even if they did block,
it's in tracefs so same logic as tracing_buffers_splice_read applies.

net/socket.c:           return copy_splice_read(file, ppos, pipe, len, flags);
this is the default implementation for protocols without explicit
splice_reads, and sock_read_iter translates IOCB_NOWAIT into
MSG_DONTAIT.


So I think I can, because the ~three implementations that we want
to constrain do support it. If anything, this hints to me that to
yield a more consistent API that doesn't arbitrarily distinguish
between O_DIRECT files with and without IOCB_NOWAIT support, something
to the effect of the following diff may be used.

diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
index 11cd8d23f6f2..dc42837ee0af 100644
--- a/fs/9p/vfs_file.c
+++ b/fs/9p/vfs_file.c
@@ -392,7 +392,7 @@ static ssize_t v9fs_file_splice_read(struct file *in, loff_t *ppos,
 		 fid->fid, len, *ppos);
 
 	if (fid->mode & P9L_DIRECT)
-		return copy_splice_read(in, ppos, pipe, len, flags);
+		return copy_splice_read_sleepok(in, ppos, pipe, len, flags);
 	return filemap_splice_read(in, ppos, pipe, len, flags);
 }
 
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 3b5aae29e944..9a4679013135 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -2209,7 +2209,7 @@ static ssize_t ceph_splice_read(struct file *in, loff_t *ppos,
 
 	if (ceph_has_inline_data(ci) ||
 	    (fi->flags & CEPH_F_SYNC))
-		return copy_splice_read(in, ppos, pipe, len, flags);
+		return copy_splice_read_sleepok(in, ppos, pipe, len, flags);
 
 	ceph_start_io_read(inode);
 
@@ -2228,7 +2228,7 @@ static ssize_t ceph_splice_read(struct file *in, loff_t *ppos,
 
 		ceph_put_cap_refs(ci, got);
 		ceph_end_io_read(inode);
-		return copy_splice_read(in, ppos, pipe, len, flags);
+		return copy_splice_read_sleepok(in, ppos, pipe, len, flags);
 	}
 
 	dout("splice_read %p %llx.%llx %llu~%zu got cap refs on %s\n",
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 4b66efc1a82a..5b0cbb6b95c4 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1581,7 +1581,7 @@ const struct file_operations gfs2_file_fops = {
 	.fsync		= gfs2_fsync,
 	.lock		= gfs2_lock,
 	.flock		= gfs2_flock,
-	.splice_read	= copy_splice_read,
+	.splice_read	= copy_splice_read_sleepok,
 	.splice_write	= gfs2_file_splice_write,
 	.setlease	= simple_nosetlease,
 	.fallocate	= gfs2_fallocate,
@@ -1612,7 +1612,7 @@ const struct file_operations gfs2_file_fops_nolock = {
 	.open		= gfs2_open,
 	.release	= gfs2_release,
 	.fsync		= gfs2_fsync,
-	.splice_read	= copy_splice_read,
+	.splice_read	= copy_splice_read_sleepok,
 	.splice_write	= gfs2_file_splice_write,
 	.setlease	= generic_setlease,
 	.fallocate	= gfs2_fallocate,
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index f0cb729e9a97..f0b6e85b2c5b 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -989,7 +989,7 @@ const struct file_operations kernfs_file_fops = {
 	.release	= kernfs_fop_release,
 	.poll		= kernfs_fop_poll,
 	.fsync		= noop_fsync,
-	.splice_read	= copy_splice_read,
+	.splice_read	= copy_splice_read_sleepok,
 	.splice_write	= iter_file_splice_write,
 };
 
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index b33e490e3fd9..7ec2f4653299 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -588,7 +588,7 @@ static const struct file_operations proc_iter_file_ops = {
 	.llseek		= proc_reg_llseek,
 	.read_iter	= proc_reg_read_iter,
 	.write		= proc_reg_write,
-	.splice_read	= copy_splice_read,
+	.splice_read	= copy_splice_read_sleepok,
 	.poll		= proc_reg_poll,
 	.unlocked_ioctl	= proc_reg_unlocked_ioctl,
 	.mmap		= proc_reg_mmap,
@@ -614,7 +614,7 @@ static const struct file_operations proc_reg_file_ops_compat = {
 static const struct file_operations proc_iter_file_ops_compat = {
 	.llseek		= proc_reg_llseek,
 	.read_iter	= proc_reg_read_iter,
-	.splice_read	= copy_splice_read,
+	.splice_read	= copy_splice_read_sleepok,
 	.write		= proc_reg_write,
 	.poll		= proc_reg_poll,
 	.unlocked_ioctl	= proc_reg_unlocked_ioctl,
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 8064ea76f80b..11d26fd14e7d 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -864,7 +864,7 @@ static const struct file_operations proc_sys_file_operations = {
 	.poll		= proc_sys_poll,
 	.read_iter	= proc_sys_read,
 	.write_iter	= proc_sys_write,
-	.splice_read	= copy_splice_read,
+	.splice_read	= copy_splice_read_sleepok,
 	.splice_write	= iter_file_splice_write,
 	.llseek		= default_llseek,
 };
diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
index 250eb5bf7b52..e9d19a856dd7 100644
--- a/fs/proc_namespace.c
+++ b/fs/proc_namespace.c
@@ -324,7 +324,7 @@ static int mountstats_open(struct inode *inode, struct file *file)
 const struct file_operations proc_mounts_operations = {
 	.open		= mounts_open,
 	.read_iter	= seq_read_iter,
-	.splice_read	= copy_splice_read,
+	.splice_read	= copy_splice_read_sleepok,
 	.llseek		= seq_lseek,
 	.release	= mounts_release,
 	.poll		= mounts_poll,
@@ -333,7 +333,7 @@ const struct file_operations proc_mounts_operations = {
 const struct file_operations proc_mountinfo_operations = {
 	.open		= mountinfo_open,
 	.read_iter	= seq_read_iter,
-	.splice_read	= copy_splice_read,
+	.splice_read	= copy_splice_read_sleepok,
 	.llseek		= seq_lseek,
 	.release	= mounts_release,
 	.poll		= mounts_poll,
@@ -342,7 +342,7 @@ const struct file_operations proc_mountinfo_operations = {
 const struct file_operations proc_mountstats_operations = {
 	.open		= mountstats_open,
 	.read_iter	= seq_read_iter,
-	.splice_read	= copy_splice_read,
+	.splice_read	= copy_splice_read_sleepok,
 	.llseek		= seq_lseek,
 	.release	= mounts_release,
 };
diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c
index 2131638f26d0..5f9fb3ce3bcb 100644
--- a/fs/smb/client/cifsfs.c
+++ b/fs/smb/client/cifsfs.c
@@ -1561,7 +1561,7 @@ const struct file_operations cifs_file_direct_ops = {
 	.fsync = cifs_fsync,
 	.flush = cifs_flush,
 	.mmap = cifs_file_mmap,
-	.splice_read = copy_splice_read,
+	.splice_read = copy_splice_read_sleepok,
 	.splice_write = iter_file_splice_write,
 	.unlocked_ioctl  = cifs_ioctl,
 	.copy_file_range = cifs_copy_file_range,
@@ -1615,7 +1615,7 @@ const struct file_operations cifs_file_direct_nobrl_ops = {
 	.fsync = cifs_fsync,
 	.flush = cifs_flush,
 	.mmap = cifs_file_mmap,
-	.splice_read = copy_splice_read,
+	.splice_read = copy_splice_read_sleepok,
 	.splice_write = iter_file_splice_write,
 	.unlocked_ioctl  = cifs_ioctl,
 	.copy_file_range = cifs_copy_file_range,
diff --git a/fs/splice.c b/fs/splice.c
index 2871c6f9366f..90ebcf236c05 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -298,12 +298,14 @@ void splice_shrink_spd(struct splice_pipe_desc *spd)
 }
 
 /**
- * copy_splice_read -  Copy data from a file and splice the copy into a pipe
+ * __copy_splice_read -  Copy data from a file and splice the copy into a pipe
  * @in: The file to read from
  * @ppos: Pointer to the file position to read from
  * @pipe: The pipe to splice into
  * @len: The amount to splice
  * @flags: The SPLICE_F_* flags
+ * @sleepok: Set if splicing from a trusted filesystem,
+ *           don't set if splicing from an IPC mechanism
  *
  * This function allocates a bunch of pages sufficient to hold the requested
  * amount of data (but limited by the remaining pipe capacity), passes it to
@@ -317,10 +319,11 @@ void splice_shrink_spd(struct splice_pipe_desc *spd)
  * if the pipe has insufficient space, we reach the end of the data or we hit a
  * hole.
  */
-ssize_t copy_splice_read(struct file *in, loff_t *ppos,
-			 struct pipe_inode_info *pipe,
-			 size_t len, unsigned int flags)
+static ssize_t __copy_splice_read(struct file *in, loff_t *ppos,
+				  struct pipe_inode_info *pipe,
+				  size_t len, unsigned int flags, bool sleepok)
 {
+	printk("__copy_splice_read(%d)\n", sleepok);
 	struct iov_iter to;
 	struct bio_vec *bv;
 	struct kiocb kiocb;
@@ -361,7 +364,8 @@ ssize_t copy_splice_read(struct file *in, loff_t *ppos,
 	iov_iter_bvec(&to, ITER_DEST, bv, npages, len);
 	init_sync_kiocb(&kiocb, in);
 	kiocb.ki_pos = *ppos;
-	kiocb.ki_flags |= IOCB_NOWAIT;
+	if (!sleepok)
+		kiocb.ki_flags |= IOCB_NOWAIT;
 	ret = call_read_iter(in, &kiocb, &to);
 
 	if (ret > 0) {
@@ -399,8 +403,21 @@ ssize_t copy_splice_read(struct file *in, loff_t *ppos,
 	kfree(bv);
 	return ret;
 }
+
+ssize_t copy_splice_read(struct file *in, loff_t *ppos,
+			 struct pipe_inode_info *pipe,
+			 size_t len, unsigned int flags) {
+	return __copy_splice_read(in, ppos, pipe, len, flags, false);
+}
 EXPORT_SYMBOL(copy_splice_read);
 
+ssize_t copy_splice_read_sleepok(struct file *in, loff_t *ppos,
+				 struct pipe_inode_info *pipe,
+				 size_t len, unsigned int flags) {
+	return __copy_splice_read(in, ppos, pipe, len, flags, true);
+}
+EXPORT_SYMBOL(copy_splice_read_sleepok);
+
 const struct pipe_buf_operations default_pipe_buf_ops = {
 	.release	= generic_pipe_buf_release,
 	.try_steal	= generic_pipe_buf_try_steal,
@@ -988,7 +1005,7 @@ long vfs_splice_read(struct file *in, loff_t *ppos,
 	 * buffer, copy into it and splice that into the pipe.
 	 */
 	if ((in->f_flags & O_DIRECT) || IS_DAX(in->f_mapping->host))
-		return copy_splice_read(in, ppos, pipe, len, flags);
+		return copy_splice_read_sleepok(in, ppos, pipe, len, flags);
 	return in->f_op->splice_read(in, ppos, pipe, len, flags);
 }
 EXPORT_SYMBOL_GPL(vfs_splice_read);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 98b7a7a8c42e..0980bf6ba8fd 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2989,6 +2989,9 @@ ssize_t filemap_splice_read(struct file *in, loff_t *ppos,
 ssize_t copy_splice_read(struct file *in, loff_t *ppos,
 			 struct pipe_inode_info *pipe,
 			 size_t len, unsigned int flags);
+ssize_t copy_splice_read_sleepok(struct file *in, loff_t *ppos,
+			 struct pipe_inode_info *pipe,
+			 size_t len, unsigned int flags);
 extern ssize_t iter_file_splice_write(struct pipe_inode_info *,
 		struct file *, loff_t *, size_t, unsigned int);
 extern long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 13/11] tty: splice_write: disable
  2023-12-21  3:08 [PATCH v2 00/11] Avoid unprivileged splice(file->)/(->socket) pipe exclusion Ahelenia Ziemiańska
                   ` (11 preceding siblings ...)
  2023-12-21  3:09 ` [PATCH v2 12/11 man-pages] splice.2: document 6.8 blocking behaviour Ahelenia Ziemiańska
@ 2023-12-24  5:01 ` Ahelenia Ziemiańska
  2023-12-24  5:01 ` [PATCH v2 14/11] fuse: allow splicing to trusted mounts only Ahelenia Ziemiańska
  13 siblings, 0 replies; 25+ messages in thread
From: Ahelenia Ziemiańska @ 2023-12-24  5:01 UTC (permalink / raw)
  Cc: Jens Axboe, Christian Brauner, Alexander Viro, linux-fsdevel,
	linux-kernel, Greg Kroah-Hartman, Jiri Slaby, linux-serial

[-- Attachment #1: Type: text/plain, Size: 2606 bytes --]

Given:
	cat > ttyW.c <<^D
	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdlib.h>
	int main()
	{
		int pt = posix_openpt(O_RDWR);
		grantpt(pt);
		unlockpt(pt);
		int cl = open(ptsname(pt), O_WRONLY);
		for (;;)
			splice(0, 0, cl, 0, 128 * 1024 * 1024, 0);
	}
	^D
	cc ttyW.c -o ttyW
	mkfifo fifo
	truncate 32M 32M
	./ttyW < fifo &
	cp 32M fifo &
	sleep 0.1
	read -r _ < fifo
ttyW used to sleep in splice and the shell used to enter an
uninterruptible sleep in open("fifo");
now the splice returns -EINVAL and the whole program completes.

This is also symmetric with the splice_read removal.

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
It hit me that I should actually probably exhaustively re-evaluate
splice_write as well since re-evaluating splice_read went so well.

fs/fuse/dev.c:  .splice_write   = fuse_dev_splice_write,
drivers/char/virtio_console.c:  .splice_write = port_fops_splice_write,
locks, takes some pages, unlocks, writes, so OK

drivers/char/mem.c:	.splice_write	= splice_write_null,
drivers/char/mem.c:	.splice_write	= splice_write_zero,
no-op

drivers/char/random.c:	.splice_write = iter_file_splice_write,
drivers/char/random.c:	.splice_write = iter_file_splice_write,
AFAICT write_pool_user is okay to invoke like this?

net/socket.c:   .splice_write = splice_to_socket,
already dealt with in 11/11

drivers/tty/tty_io.c:   .splice_write   = iter_file_splice_write,
drivers/tty/tty_io.c:   .splice_write   = iter_file_splice_write,
they do lock the pipe and try the write with the lock held;
we already killed splice_read so just kill splice_write for symmetry
(13/11)

fs/fuse/file.c: .splice_write   = iter_file_splice_write,
same logic as splice_read applies (14/11)

 drivers/tty/tty_io.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 50c2957a9c7f..d931c34ddcbf 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -465,7 +465,6 @@ static const struct file_operations tty_fops = {
 	.llseek		= no_llseek,
 	.read_iter	= tty_read,
 	.write_iter	= tty_write,
-	.splice_write	= iter_file_splice_write,
 	.poll		= tty_poll,
 	.unlocked_ioctl	= tty_ioctl,
 	.compat_ioctl	= tty_compat_ioctl,
@@ -479,7 +478,6 @@ static const struct file_operations console_fops = {
 	.llseek		= no_llseek,
 	.read_iter	= tty_read,
 	.write_iter	= redirected_tty_write,
-	.splice_write	= iter_file_splice_write,
 	.poll		= tty_poll,
 	.unlocked_ioctl	= tty_ioctl,
 	.compat_ioctl	= tty_compat_ioctl,
-- 
2.39.2

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 14/11] fuse: allow splicing to trusted mounts only
  2023-12-21  3:08 [PATCH v2 00/11] Avoid unprivileged splice(file->)/(->socket) pipe exclusion Ahelenia Ziemiańska
                   ` (12 preceding siblings ...)
  2023-12-24  5:01 ` [PATCH v2 13/11] tty: splice_write: disable Ahelenia Ziemiańska
@ 2023-12-24  5:01 ` Ahelenia Ziemiańska
  13 siblings, 0 replies; 25+ messages in thread
From: Ahelenia Ziemiańska @ 2023-12-24  5:01 UTC (permalink / raw)
  Cc: Jens Axboe, Christian Brauner, Alexander Viro, linux-fsdevel,
	linux-kernel, Miklos Szeredi

[-- Attachment #1: Type: text/plain, Size: 1735 bytes --]

FUSE tends to be installed suid 0: this allows normal users to mount
anything, including a program whose write implementation consists
of for(;;) sleep(1);, which, if splice were allowed, would sleep
forever with the pipe lock held.

Normal filesystems can only be mounted by root, and are thus deemed
safe. Extend this to when root mounts a FUSE filesystem and to
virtiofs, mirroring the splice_read "trusted" logic.

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
 fs/fuse/file.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 20bb16ddfcc9..62308af13396 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -3215,6 +3215,21 @@ static long fuse_splice_read(struct file *in, loff_t *ppos,
 	return -EINVAL;
 }
 
+static ssize_t
+fuse_splice_write(struct pipe_inode_info *pipe, struct file *out,
+		  loff_t *ppos, size_t len, unsigned int flags)
+{
+	struct inode *inode = file_inode(out);
+
+	if (fuse_is_bad(inode))
+		return -EIO;
+
+	if (get_fuse_conn(inode)->trusted)
+		return iter_file_splice_write(pipe, out, ppos, len, flags);
+
+	return -EINVAL;
+}
+
 static const struct file_operations fuse_file_operations = {
 	.llseek		= fuse_file_llseek,
 	.read_iter	= fuse_file_read_iter,
@@ -3228,7 +3243,7 @@ static const struct file_operations fuse_file_operations = {
 	.get_unmapped_area = thp_get_unmapped_area,
 	.flock		= fuse_file_flock,
 	.splice_read	= fuse_splice_read,
-	.splice_write	= iter_file_splice_write,
+	.splice_write	= fuse_splice_write,
 	.unlocked_ioctl	= fuse_file_ioctl,
 	.compat_ioctl	= fuse_file_compat_ioctl,
 	.poll		= fuse_file_poll,
-- 
2.39.2

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 08/11] tty: splice_read: disable
  2023-12-21  3:09 ` [PATCH v2 08/11] tty: splice_read: disable Ahelenia Ziemiańska
  2023-12-21  8:10   ` Greg Kroah-Hartman
@ 2024-01-03 11:36   ` Jiri Slaby
  2024-01-03 19:14     ` Linus Torvalds
  1 sibling, 1 reply; 25+ messages in thread
From: Jiri Slaby @ 2024-01-03 11:36 UTC (permalink / raw)
  To: Ahelenia Ziemiańska, torvalds
  Cc: Jens Axboe, Christian Brauner, Alexander Viro, linux-fsdevel,
	Greg Kroah-Hartman, linux-kernel, linux-serial

On 21. 12. 23, 4:09, Ahelenia Ziemiańska wrote:
> We request non-blocking I/O in the generic copy_splice_read, but
> "the tty layer doesn't actually honor the IOCB_NOWAIT flag for
> various historical reasons.". This means that a tty->pipe splice
> will happily sleep with the pipe locked forever, and any process
> trying to take it (due to an open/read/write/&c.) will enter
> uninterruptible sleep.
> 
> This also masks inconsistent wake-ups (usually every second line)
> when splicing from ttys in icanon mode.
> 
> Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wimmqG_wvSRtMiKPeGGDL816n65u=Mq2+H3-=uM2U6FmA@mail.gmail.com/
> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
> ---
>   drivers/tty/tty_io.c | 2 --
>   1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
> index 06414e43e0b5..50c2957a9c7f 100644
> --- a/drivers/tty/tty_io.c
> +++ b/drivers/tty/tty_io.c
> @@ -465,7 +465,6 @@ static const struct file_operations tty_fops = {
>   	.llseek		= no_llseek,
>   	.read_iter	= tty_read,
>   	.write_iter	= tty_write,
> -	.splice_read	= copy_splice_read,
>   	.splice_write	= iter_file_splice_write,

This and the other patch effectively reverts dd78b0c483e33 and 
9bb48c82aced0. I.e. it breaks "things". Especially:

commit 9bb48c82aced07698a2d08ee0f1475a6c4f6b266
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Tue Jan 19 11:41:16 2021 -0800

     tty: implement write_iter

     This makes the tty layer use the .write_iter() function instead of the
     traditional .write() functionality.

     That allows writev(), but more importantly also makes it possible to
     enable .splice_write() for ttys, reinstating the "splice to tty"
     functionality that was lost in commit 36e2c7421f02 ("fs: don't allow
     splice read/write without explicit ops").

     Fixes: 36e2c7421f02 ("fs: don't allow splice read/write without 
explicit ops")


What are those "things" doing that "splice to tty", I don't recall and 
the commit message above ^^^ does not spell that out. Linus?

thanks,
-- 
js


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 08/11] tty: splice_read: disable
  2024-01-03 11:36   ` Jiri Slaby
@ 2024-01-03 19:14     ` Linus Torvalds
  2024-01-03 21:34       ` Oliver Giles
  0 siblings, 1 reply; 25+ messages in thread
From: Linus Torvalds @ 2024-01-03 19:14 UTC (permalink / raw)
  To: Jiri Slaby, Oliver Giles
  Cc: Ahelenia Ziemiańska, Jens Axboe, Christian Brauner,
	Alexander Viro, linux-fsdevel, Greg Kroah-Hartman, linux-kernel,
	linux-serial

On Wed, 3 Jan 2024 at 03:36, Jiri Slaby <jirislaby@kernel.org> wrote:
>
> What are those "things" doing that "splice to tty", I don't recall and
> the commit message above ^^^ does not spell that out. Linus?

It's some annoying SSL VPN thing that splices to pppd:

   https://lore.kernel.org/all/C8KER7U60WXE.25UFD8RE6QZQK@oguc/

and I'd be happy to try to limit splice to tty's to maybe just the one
case that pppd uses.

So I don't think we should remove splice_write for tty's entirely, but
maybe we can limit it to only the case that the VPN thing used.

I never saw the original issue personally, and I think only Oliver
reported it, so cc'ing Oliver.

Maybe that VPN thing already has the pty in non-blocking mode, for
example, and we could make the tty splicing fail for any blocking op?

                Linus

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 08/11] tty: splice_read: disable
  2024-01-03 19:14     ` Linus Torvalds
@ 2024-01-03 21:34       ` Oliver Giles
  2024-01-03 21:57         ` Linus Torvalds
  0 siblings, 1 reply; 25+ messages in thread
From: Oliver Giles @ 2024-01-03 21:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jiri Slaby, Ahelenia Ziemiańska, Jens Axboe,
	Christian Brauner, Alexander Viro, linux-fsdevel,
	Greg Kroah-Hartman, linux-kernel, linux-serial


On Wed, Jan 3 2024 at 11:14:59 -08:00:00, Linus Torvalds 
<torvalds@linux-foundation.org> wrote:
> 
> It's some annoying SSL VPN thing that splices to pppd:
> 
>    https://lore.kernel.org/all/C8KER7U60WXE.25UFD8RE6QZQK@oguc/

I'm happy to report that that particular SSL VPN tool is no longer 
around.
And it had anyway grown a fall-back-to-read/write in case splice() 
fails.
So at least from my perspective, no objections to splice-to-tty going 
away
altogether.

> and I'd be happy to try to limit splice to tty's to maybe just the one
> case that pppd uses.

To be exact, pppd is just providing a pty with which other (now all 
extinct?)
applications can do nefarious things.

> Maybe that VPN thing already has the pty in non-blocking mode, for
> example, and we could make the tty splicing fail for any blocking op?

FWIW, the SSL VPN tool did indeed have the pty in non-blocking mode.

Oliver




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 08/11] tty: splice_read: disable
  2024-01-03 21:34       ` Oliver Giles
@ 2024-01-03 21:57         ` Linus Torvalds
  0 siblings, 0 replies; 25+ messages in thread
From: Linus Torvalds @ 2024-01-03 21:57 UTC (permalink / raw)
  To: Oliver Giles
  Cc: Jiri Slaby, Ahelenia Ziemiańska, Jens Axboe,
	Christian Brauner, Alexander Viro, linux-fsdevel,
	Greg Kroah-Hartman, linux-kernel, linux-serial

On Wed, 3 Jan 2024 at 13:34, Oliver Giles <ohw.giles@gmail.com> wrote:
>
> I'm happy to report that that particular SSL VPN tool is no longer
> around.

Ahh, well that simplifies things and we can then just remove the tty
splice support again.

Of course, maybe then somebody else will report on some other odd
user, but ... fingers crossed.

                Linus

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 09/11] fuse: file: limit splice_read to virtiofs
  2023-12-21  3:09 ` [PATCH v2 09/11] fuse: file: limit splice_read to virtiofs Ahelenia Ziemiańska
@ 2024-01-10 13:43   ` Miklos Szeredi
  2024-01-10 15:19     ` Ahelenia Ziemiańska
  0 siblings, 1 reply; 25+ messages in thread
From: Miklos Szeredi @ 2024-01-10 13:43 UTC (permalink / raw)
  To: Ahelenia Ziemiańska
  Cc: Jens Axboe, Christian Brauner, Alexander Viro, linux-fsdevel,
	Vivek Goyal, Stefan Hajnoczi, linux-kernel, virtualization

On Thu, 21 Dec 2023 at 04:09, Ahelenia Ziemiańska
<nabijaczleweli@nabijaczleweli.xyz> wrote:
>
> Potentially-blocking splice_reads are allowed for normal filesystems
> like NFS because they're blessed by root.
>
> FUSE is commonly used suid-root, and allows anyone to trivially create
> a file that, when spliced from, will just sleep forever with the pipe
> lock held.
>
> The only way IPC to the fusing process could be avoided is if
> !(ff->open_flags & FOPEN_DIRECT_IO) and the range was already cached
> and we weren't past the end. Just refuse it.

How is this not going to cause regressions out there?

We need to find an alternative to refusing splice, since this is not
going to fly, IMO.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 09/11] fuse: file: limit splice_read to virtiofs
  2024-01-10 13:43   ` Miklos Szeredi
@ 2024-01-10 15:19     ` Ahelenia Ziemiańska
  2024-01-10 15:47       ` Miklos Szeredi
  0 siblings, 1 reply; 25+ messages in thread
From: Ahelenia Ziemiańska @ 2024-01-10 15:19 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Christian Brauner, Alexander Viro, linux-fsdevel,
	Vivek Goyal, Stefan Hajnoczi, linux-kernel, virtualization

[-- Attachment #1: Type: text/plain, Size: 1120 bytes --]

On Wed, Jan 10, 2024 at 02:43:04PM +0100, Miklos Szeredi wrote:
> On Thu, 21 Dec 2023 at 04:09, Ahelenia Ziemiańska
> <nabijaczleweli@nabijaczleweli.xyz> wrote:
> > Potentially-blocking splice_reads are allowed for normal filesystems
> > like NFS because they're blessed by root.
> >
> > FUSE is commonly used suid-root, and allows anyone to trivially create
> > a file that, when spliced from, will just sleep forever with the pipe
> > lock held.
> >
> > The only way IPC to the fusing process could be avoided is if
> > !(ff->open_flags & FOPEN_DIRECT_IO) and the range was already cached
> > and we weren't past the end. Just refuse it.
> How is this not going to cause regressions out there?
In "[PATCH v2 14/11] fuse: allow splicing to trusted mounts only"
splicing is re-enabled for mounts made by the real root.

> We need to find an alternative to refusing splice, since this is not
> going to fly, IMO.
The alternative is to not hold the lock. See the references in the
cover letter for why this wasn't done. IMO a potential slight perf
hit flies more than a total exclusion on the pipe.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 09/11] fuse: file: limit splice_read to virtiofs
  2024-01-10 15:19     ` Ahelenia Ziemiańska
@ 2024-01-10 15:47       ` Miklos Szeredi
  0 siblings, 0 replies; 25+ messages in thread
From: Miklos Szeredi @ 2024-01-10 15:47 UTC (permalink / raw)
  To: Ahelenia Ziemiańska
  Cc: Jens Axboe, Christian Brauner, Alexander Viro, linux-fsdevel,
	Vivek Goyal, Stefan Hajnoczi, linux-kernel, virtualization

On Wed, 10 Jan 2024 at 16:19, Ahelenia Ziemiańska
<nabijaczleweli@nabijaczleweli.xyz> wrote:

> > We need to find an alternative to refusing splice, since this is not
> > going to fly, IMO.
> The alternative is to not hold the lock. See the references in the
> cover letter for why this wasn't done. IMO a potential slight perf
> hit flies more than a total exclusion on the pipe.

IDGI.  This will make splice(2) return EINVAL for unprivileged fuse
files, right?

That would be a regression, not a perf hit, if the application is not
falling back to plain read; a reasonable scenario, considering splice
from files (including fuse) has worked on linux for a *long* time.

Thanks,
Mikos

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2024-01-10 15:47 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-21  3:08 [PATCH v2 00/11] Avoid unprivileged splice(file->)/(->socket) pipe exclusion Ahelenia Ziemiańska
2023-12-21  3:08 ` [PATCH v2 01/11] splice: copy_splice_read: do the I/O with IOCB_NOWAIT Ahelenia Ziemiańska
2023-12-21  8:27   ` Christoph Hellwig
2023-12-21 16:30     ` Ahelenia Ziemiańska
2023-12-21  3:08 ` [PATCH v2 02/11] af_unix: unix_stream_splice_read: always request MSG_DONTWAIT Ahelenia Ziemiańska
2023-12-21  3:08 ` [PATCH v2 03/11] fuse: fuse_dev_splice_read: use nonblocking I/O Ahelenia Ziemiańska
2023-12-21  3:09 ` [PATCH v2 04/11] net/smc: smc_splice_read: always request MSG_DONTWAIT Ahelenia Ziemiańska
2023-12-21  3:09 ` [PATCH v2 05/11] kcm: kcm_splice_read: " Ahelenia Ziemiańska
2023-12-21  3:09 ` [PATCH v2 06/11] tls/sw: tls_sw_splice_read: always request non-blocking I/O Ahelenia Ziemiańska
2023-12-21  3:09 ` [PATCH v2 07/11] net/tcp: tcp_splice_read: always do non-blocking reads Ahelenia Ziemiańska
2023-12-21  3:09 ` [PATCH v2 08/11] tty: splice_read: disable Ahelenia Ziemiańska
2023-12-21  8:10   ` Greg Kroah-Hartman
2024-01-03 11:36   ` Jiri Slaby
2024-01-03 19:14     ` Linus Torvalds
2024-01-03 21:34       ` Oliver Giles
2024-01-03 21:57         ` Linus Torvalds
2023-12-21  3:09 ` [PATCH v2 09/11] fuse: file: limit splice_read to virtiofs Ahelenia Ziemiańska
2024-01-10 13:43   ` Miklos Szeredi
2024-01-10 15:19     ` Ahelenia Ziemiańska
2024-01-10 15:47       ` Miklos Szeredi
2023-12-21  3:09 ` [PATCH v2 10/11] fuse: allow splicing from filesystems mounted by real root Ahelenia Ziemiańska
2023-12-21  3:09 ` [PATCH v2 11/11] splice: splice_to_socket: always request MSG_DONTWAIT Ahelenia Ziemiańska
2023-12-21  3:09 ` [PATCH v2 12/11 man-pages] splice.2: document 6.8 blocking behaviour Ahelenia Ziemiańska
2023-12-24  5:01 ` [PATCH v2 13/11] tty: splice_write: disable Ahelenia Ziemiańska
2023-12-24  5:01 ` [PATCH v2 14/11] fuse: allow splicing to trusted mounts only Ahelenia Ziemiańska

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.