* [PATCH net-next v6 01/14] net: Block MSG_SENDPAGE_* from being passed to sendmsg() by userspace
2023-06-07 18:19 [PATCH net-next v6 00/14] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
@ 2023-06-07 18:19 ` David Howells
2023-06-07 18:19 ` [PATCH net-next v6 02/14] tls: Allow MSG_SPLICE_PAGES but treat it as normal sendmsg David Howells
` (13 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: David Howells @ 2023-06-07 18:19 UTC
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel
It is necessary to allow MSG_SENDPAGE_* to be passed into ->sendmsg() so
that sendmsg(MSG_SPLICE_PAGES) can replace ->sendpage(). Unblocking these
flags in the network protocols, however, also allows them to be passed in
by userspace[1].
Fix this by marking MSG_SENDPAGE_NOPOLICY, MSG_SENDPAGE_NOTLAST and
MSG_SENDPAGE_DECRYPTED as internal flags, alongside MSG_SPLICE_PAGES, so
that the sendmsg() and sendmmsg() syscalls clear them if userspace passes
them in. Network protocol ->sendmsg() implementations can then allow them
through.
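A minimal userspace sketch of the effect of widening MSG_INTERNAL_SENDMSG_FLAGS, assuming (as the comment on that macro says) that the sendmsg() and sendmmsg() syscalls clear these bits on entry. The flag values are reproduced for illustration only and should be checked against include/linux/socket.h; strip_internal_sendmsg_flags() is a hypothetical name, not a kernel function.

```c
#include <assert.h>

/*
 * Flag values as in include/linux/socket.h at the time of this series
 * (assumption: verify against your tree).  Kernel-internal flags are
 * redefined here only so that the sketch compiles in userspace.
 */
#define MSG_MORE               0x8000
#define MSG_SENDPAGE_NOPOLICY  0x10000
#define MSG_SENDPAGE_NOTLAST   0x20000
#define MSG_SENDPAGE_DECRYPTED 0x100000
#define MSG_SPLICE_PAGES       0x8000000

#define MSG_INTERNAL_SENDMSG_FLAGS \
	(MSG_SPLICE_PAGES | MSG_SENDPAGE_NOPOLICY | MSG_SENDPAGE_NOTLAST | \
	 MSG_SENDPAGE_DECRYPTED)

/* MSG_MORE must remain usable from userspace. */
static_assert((MSG_INTERNAL_SENDMSG_FLAGS & MSG_MORE) == 0,
	      "MSG_MORE must not be an internal flag");

/*
 * Hypothetical helper modelling syscall entry: internal flags supplied by
 * userspace are cleared before the protocol's ->sendmsg() ever sees them.
 */
static unsigned int strip_internal_sendmsg_flags(unsigned int user_flags)
{
	return user_flags & ~MSG_INTERNAL_SENDMSG_FLAGS;
}
```

Only kernel callers such as splice can therefore hand these flags to a protocol; a userspace sendmsg() carrying them behaves as if they were absent.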
Note that it should be possible to remove MSG_SENDPAGE_NOTLAST once
sendpage is removed, as splice will then pass a whole slew of pages to
sendmsg in one go, setting MSG_MORE if it has more data waiting in the
pipe.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20230526181338.03a99016@kernel.org/ [1]
---
include/linux/socket.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/linux/socket.h b/include/linux/socket.h
index bd1cc3238851..3fd3436bc09f 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -339,7 +339,9 @@ struct ucred {
#endif
/* Flags to be cleared on entry by sendmsg and sendmmsg syscalls */
-#define MSG_INTERNAL_SENDMSG_FLAGS (MSG_SPLICE_PAGES)
+#define MSG_INTERNAL_SENDMSG_FLAGS \
+ (MSG_SPLICE_PAGES | MSG_SENDPAGE_NOPOLICY | MSG_SENDPAGE_NOTLAST | \
+ MSG_SENDPAGE_DECRYPTED)
/* Setsockoptions(2) level. Thanks to BSD these must match IPPROTO_xxx */
#define SOL_IP 0
* [PATCH net-next v6 02/14] tls: Allow MSG_SPLICE_PAGES but treat it as normal sendmsg
2023-06-07 18:19 [PATCH net-next v6 00/14] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
2023-06-07 18:19 ` [PATCH net-next v6 01/14] net: Block MSG_SENDPAGE_* from being passed to sendmsg() by userspace David Howells
@ 2023-06-07 18:19 ` David Howells
2023-06-07 18:19 ` [PATCH net-next v6 03/14] splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage() David Howells
From: David Howells @ 2023-06-07 18:19 UTC
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel
Allow MSG_SPLICE_PAGES to be specified to sendmsg() but treat it as a
normal sendmsg() for now. This means the data will simply be copied until
MSG_SPLICE_PAGES handling is implemented.
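A small model of the flag check in tls_sw_sendmsg() after this change, as a sketch: the flag values are illustrative (check include/linux/socket.h), MSG_CMSG_COMPAT is left out, and tls_sw_check_sendmsg_flags() is a hypothetical helper rather than a function in net/tls.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative flag values (assumption: check include/linux/socket.h). */
#define MSG_OOB          0x1
#define MSG_DONTWAIT     0x40
#define MSG_NOSIGNAL     0x4000
#define MSG_MORE         0x8000
#define MSG_SPLICE_PAGES 0x8000000

/* Flags tls_sw_sendmsg() accepts after this patch (MSG_CMSG_COMPAT omitted). */
#define TLS_SW_SENDMSG_FLAGS \
	(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_SPLICE_PAGES)

static_assert((TLS_SW_SENDMSG_FLAGS & MSG_OOB) == 0,
	      "out-of-band data is still rejected");

/*
 * Hypothetical helper reproducing the up-front check: any flag outside the
 * accepted set fails the call with -EOPNOTSUPP.  MSG_SPLICE_PAGES now
 * passes, but the data is still copied as for an ordinary sendmsg().
 */
static int tls_sw_check_sendmsg_flags(unsigned int flags)
{
	if (flags & ~TLS_SW_SENDMSG_FLAGS)
		return -EOPNOTSUPP;
	return 0;
}
```

Accepting the flag up front means a splice that sets MSG_SPLICE_PAGES is not refused by TLS, even though the pages are not yet spliced.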
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
net/tls/tls_device.c | 3 ++-
net/tls/tls_sw.c | 2 +-
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index a959572a816f..9ef766e41c7a 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -447,7 +447,8 @@ static int tls_push_data(struct sock *sk,
long timeo;
if (flags &
- ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_SENDPAGE_NOTLAST))
+ ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_SENDPAGE_NOTLAST |
+ MSG_SPLICE_PAGES))
return -EOPNOTSUPP;
if (unlikely(sk->sk_err))
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 1a53c8f481e9..38acc27a0dd0 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -955,7 +955,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
int pending;
if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
- MSG_CMSG_COMPAT))
+ MSG_CMSG_COMPAT | MSG_SPLICE_PAGES))
return -EOPNOTSUPP;
ret = mutex_lock_interruptible(&tls_ctx->tx_lock);
* [PATCH net-next v6 03/14] splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage()
2023-06-07 18:19 [PATCH net-next v6 00/14] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
2023-06-07 18:19 ` [PATCH net-next v6 01/14] net: Block MSG_SENDPAGE_* from being passed to sendmsg() by userspace David Howells
2023-06-07 18:19 ` [PATCH net-next v6 02/14] tls: Allow MSG_SPLICE_PAGES but treat it as normal sendmsg David Howells
@ 2023-06-07 18:19 ` David Howells
2023-06-07 18:19 ` [PATCH net-next v6 04/14] splice, net: Add a splice_eof op to file-ops and socket-ops David Howells
From: David Howells @ 2023-06-07 18:19 UTC
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel
Replace generic_splice_sendpage() + splice_from_pipe() + pipe_to_sendpage()
with a net-specific handler, splice_to_socket(), that calls sendmsg() with
MSG_SPLICE_PAGES set instead of calling ->sendpage().
MSG_MORE is used to indicate whether the sendmsg() is expected to be
followed by more data.
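How splice_to_socket() in the diff below derives MSG_MORE for each batch, written as a hypothetical helper, batch_msg_flags(). The flag values are illustrative assumptions, and the real code inspects the pipe ring directly rather than taking the remain/queued arguments used here.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values (assumption: check include/linux/splice.h and socket.h). */
#define SPLICE_F_MORE    0x04
#define MSG_MORE         0x8000
#define MSG_SPLICE_PAGES 0x8000000

static_assert((MSG_MORE & MSG_SPLICE_PAGES) == 0, "flags must be distinct bits");

/*
 * Hypothetical model of the msg_flags chosen for one sendmsg() batch:
 *   splice_flags - flags the caller passed to splice()
 *   remain       - bytes of the request this batch did not cover
 *   queued       - pipe buffers still occupied after this batch
 */
static unsigned int batch_msg_flags(unsigned int splice_flags, size_t remain,
				    unsigned int queued)
{
	unsigned int msg_flags = MSG_SPLICE_PAGES;

	if (splice_flags & SPLICE_F_MORE)
		msg_flags |= MSG_MORE;	/* the caller promised more data */
	if (remain && queued > 0)
		msg_flags |= MSG_MORE;	/* more is already waiting in the pipe */
	return msg_flags;
}
```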
This allows multiple pipe-buffer pages to be passed in a single call in a
BVEC iterator, allowing the processing to be pushed down to a loop in the
protocol driver. This helps pave the way for passing multipage folios down
too.
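The gather and consume steps of splice_to_socket() can be modelled in userspace C. This is a simplified sketch, not the kernel code: struct model_buf and struct model_seg are hypothetical stand-ins for struct pipe_buffer and struct bio_vec, pipe_buf_confirm() and the PAGE_SIZE cap are omitted, and the gather loop moves to the next ring slot after taking from a buffer so that no buffer is gathered twice.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SEGS 16	/* the patch gathers into struct bio_vec bvec[16] */

/* Hypothetical stand-in for struct pipe_buffer (only the fields used here). */
struct model_buf {
	size_t offset;
	size_t len;
};

/* Hypothetical stand-in for one struct bio_vec entry. */
struct model_seg {
	size_t slot;	/* ring index the segment came from */
	size_t offset;
	size_t len;
};

/*
 * Gather up to MAX_SEGS segments from ring slots [tail, head), taking at most
 * @remain bytes and skipping empty buffers, so that one sendmsg() can cover
 * them all.  Returns the segment count; *total receives the byte count.
 */
static unsigned int gather_segments(const struct model_buf *bufs,
				    unsigned int mask, unsigned int head,
				    unsigned int tail, size_t remain,
				    struct model_seg segs[MAX_SEGS],
				    size_t *total)
{
	unsigned int bc = 0;
	size_t wanted = remain;

	while (tail != head && remain > 0 && bc < MAX_SEGS) {
		const struct model_buf *buf = &bufs[tail & mask];
		size_t seg = remain < buf->len ? remain : buf->len;

		if (seg) {
			segs[bc].slot = tail & mask;
			segs[bc].offset = buf->offset;
			segs[bc].len = seg;
			bc++;
			remain -= seg;
		}
		tail++;	/* each slot is visited at most once */
	}
	*total = wanted - remain;
	return bc;
}

/*
 * After sendmsg() accepted @sent bytes (which must not exceed what is queued),
 * trim or retire buffers starting at @tail.  Returns the new tail.
 */
static unsigned int consume_sent(struct model_buf *bufs, unsigned int mask,
				 unsigned int tail, size_t sent)
{
	while (sent > 0) {
		struct model_buf *buf = &bufs[tail & mask];
		size_t seg = sent < buf->len ? sent : buf->len;

		buf->offset += seg;
		buf->len -= seg;
		sent -= seg;
		if (!buf->len)
			tail++;	/* the kernel releases the buffer here */
	}
	return tail;
}
```

A short send simply leaves the first unsent buffer trimmed in place, so the next iteration of the outer loop resumes from it.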
Protocols that haven't been converted to handle MSG_SPLICE_PAGES yet should
just ignore it and do a normal sendmsg() for now, although that may be a
bit slower as everything may be copied.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
Notes:
ver #5)
- Preclear ret just in case len is 0.
fs/splice.c | 158 +++++++++++++++++++++++++++++++++--------
include/linux/fs.h | 2 -
include/linux/splice.h | 2 +
net/socket.c | 26 +------
4 files changed, 131 insertions(+), 57 deletions(-)
diff --git a/fs/splice.c b/fs/splice.c
index 3e06611d19ae..e337630aed64 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -33,6 +33,7 @@
#include <linux/fsnotify.h>
#include <linux/security.h>
#include <linux/gfp.h>
+#include <linux/net.h>
#include <linux/socket.h>
#include <linux/sched/signal.h>
@@ -448,30 +449,6 @@ const struct pipe_buf_operations nosteal_pipe_buf_ops = {
};
EXPORT_SYMBOL(nosteal_pipe_buf_ops);
-/*
- * Send 'sd->len' bytes to socket from 'sd->file' at position 'sd->pos'
- * using sendpage(). Return the number of bytes sent.
- */
-static int pipe_to_sendpage(struct pipe_inode_info *pipe,
- struct pipe_buffer *buf, struct splice_desc *sd)
-{
- struct file *file = sd->u.file;
- loff_t pos = sd->pos;
- int more;
-
- if (!likely(file->f_op->sendpage))
- return -EINVAL;
-
- more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0;
-
- if (sd->len < sd->total_len &&
- pipe_occupancy(pipe->head, pipe->tail) > 1)
- more |= MSG_SENDPAGE_NOTLAST;
-
- return file->f_op->sendpage(file, buf->page, buf->offset,
- sd->len, &pos, more);
-}
-
static void wakeup_pipe_writers(struct pipe_inode_info *pipe)
{
smp_mb();
@@ -652,7 +629,7 @@ static void splice_from_pipe_end(struct pipe_inode_info *pipe, struct splice_des
* Description:
* This function does little more than loop over the pipe and call
* @actor to do the actual moving of a single struct pipe_buffer to
- * the desired destination. See pipe_to_file, pipe_to_sendpage, or
+ * the desired destination. See pipe_to_file, pipe_to_sendmsg, or
* pipe_to_user.
*
*/
@@ -833,8 +810,9 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
EXPORT_SYMBOL(iter_file_splice_write);
+#ifdef CONFIG_NET
/**
- * generic_splice_sendpage - splice data from a pipe to a socket
+ * splice_to_socket - splice data from a pipe to a socket
* @pipe: pipe to splice from
* @out: socket to write to
* @ppos: position in @out
@@ -846,13 +824,131 @@ EXPORT_SYMBOL(iter_file_splice_write);
* is involved.
*
*/
-ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe, struct file *out,
- loff_t *ppos, size_t len, unsigned int flags)
+ssize_t splice_to_socket(struct pipe_inode_info *pipe, struct file *out,
+ loff_t *ppos, size_t len, unsigned int flags)
{
- return splice_from_pipe(pipe, out, ppos, len, flags, pipe_to_sendpage);
-}
+ struct socket *sock = sock_from_file(out);
+ struct bio_vec bvec[16];
+ struct msghdr msg = {};
+ ssize_t ret = 0;
+ size_t spliced = 0;
+ bool need_wakeup = false;
+
+ pipe_lock(pipe);
+
+ while (len > 0) {
+ unsigned int head, tail, mask, bc = 0;
+ size_t remain = len;
+
+ /*
+ * Check for signal early to make process killable when there
+ * are always buffers available
+ */
+ ret = -ERESTARTSYS;
+ if (signal_pending(current))
+ break;
-EXPORT_SYMBOL(generic_splice_sendpage);
+ while (pipe_empty(pipe->head, pipe->tail)) {
+ ret = 0;
+ if (!pipe->writers)
+ goto out;
+
+ if (spliced)
+ goto out;
+
+ ret = -EAGAIN;
+ if (flags & SPLICE_F_NONBLOCK)
+ goto out;
+
+ ret = -ERESTARTSYS;
+ if (signal_pending(current))
+ goto out;
+
+ if (need_wakeup) {
+ wakeup_pipe_writers(pipe);
+ need_wakeup = false;
+ }
+
+ pipe_wait_readable(pipe);
+ }
+
+ head = pipe->head;
+ tail = pipe->tail;
+ mask = pipe->ring_size - 1;
+
+ while (!pipe_empty(head, tail)) {
+ struct pipe_buffer *buf = &pipe->bufs[tail & mask];
+ size_t seg;
+
+ if (!buf->len) {
+ tail++;
+ continue;
+ }
+
+ seg = min_t(size_t, remain, buf->len);
+ seg = min_t(size_t, seg, PAGE_SIZE);
+
+ ret = pipe_buf_confirm(pipe, buf);
+ if (unlikely(ret)) {
+ if (ret == -ENODATA)
+ ret = 0;
+ break;
+ }
+
+ bvec_set_page(&bvec[bc++], buf->page, seg, buf->offset);
+ remain -= seg;
+ if (seg >= buf->len)
+ tail++;
+ if (bc >= ARRAY_SIZE(bvec))
+ break;
+ }
+
+ if (!bc)
+ break;
+
+ msg.msg_flags = MSG_SPLICE_PAGES;
+ if (flags & SPLICE_F_MORE)
+ msg.msg_flags |= MSG_MORE;
+ if (remain && pipe_occupancy(pipe->head, tail) > 0)
+ msg.msg_flags |= MSG_MORE;
+
+ iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, bvec, bc,
+ len - remain);
+ ret = sock_sendmsg(sock, &msg);
+ if (ret <= 0)
+ break;
+
+ spliced += ret;
+ len -= ret;
+ tail = pipe->tail;
+ while (ret > 0) {
+ struct pipe_buffer *buf = &pipe->bufs[tail & mask];
+ size_t seg = min_t(size_t, ret, buf->len);
+
+ buf->offset += seg;
+ buf->len -= seg;
+ ret -= seg;
+
+ if (!buf->len) {
+ pipe_buf_release(pipe, buf);
+ tail++;
+ }
+ }
+
+ if (tail != pipe->tail) {
+ pipe->tail = tail;
+ if (pipe->files)
+ need_wakeup = true;
+ }
+ }
+
+out:
+ pipe_unlock(pipe);
+ if (need_wakeup)
+ wakeup_pipe_writers(pipe);
+ return spliced ?: ret;
+}
+#endif
static int warn_unsupported(struct file *file, const char *op)
{
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 133f0640fb24..df92f4b3d122 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2759,8 +2759,6 @@ extern ssize_t generic_file_splice_read(struct file *, loff_t *,
struct pipe_inode_info *, size_t, unsigned int);
extern ssize_t iter_file_splice_write(struct pipe_inode_info *,
struct file *, loff_t *, size_t, unsigned int);
-extern ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe,
- struct file *out, loff_t *, size_t len, unsigned int flags);
extern long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
loff_t *opos, size_t len, unsigned int flags);
diff --git a/include/linux/splice.h b/include/linux/splice.h
index a55179fd60fc..991ae318b6eb 100644
--- a/include/linux/splice.h
+++ b/include/linux/splice.h
@@ -84,6 +84,8 @@ extern long do_splice(struct file *in, loff_t *off_in,
extern long do_tee(struct file *in, struct file *out, size_t len,
unsigned int flags);
+extern ssize_t splice_to_socket(struct pipe_inode_info *pipe, struct file *out,
+ loff_t *ppos, size_t len, unsigned int flags);
/*
* for dynamic pipe sizing
diff --git a/net/socket.c b/net/socket.c
index 3df96e9ba4e2..c4d9104418c8 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -57,6 +57,7 @@
#include <linux/mm.h>
#include <linux/socket.h>
#include <linux/file.h>
+#include <linux/splice.h>
#include <linux/net.h>
#include <linux/interrupt.h>
#include <linux/thread_info.h>
@@ -126,8 +127,6 @@ static long compat_sock_ioctl(struct file *file,
unsigned int cmd, unsigned long arg);
#endif
static int sock_fasync(int fd, struct file *filp, int on);
-static ssize_t sock_sendpage(struct file *file, struct page *page,
- int offset, size_t size, loff_t *ppos, int more);
static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
struct pipe_inode_info *pipe, size_t len,
unsigned int flags);
@@ -162,8 +161,7 @@ static const struct file_operations socket_file_ops = {
.mmap = sock_mmap,
.release = sock_close,
.fasync = sock_fasync,
- .sendpage = sock_sendpage,
- .splice_write = generic_splice_sendpage,
+ .splice_write = splice_to_socket,
.splice_read = sock_splice_read,
.show_fdinfo = sock_show_fdinfo,
};
@@ -1066,26 +1064,6 @@ int kernel_recvmsg(struct socket *sock, struct msghdr *msg,
}
EXPORT_SYMBOL(kernel_recvmsg);
-static ssize_t sock_sendpage(struct file *file, struct page *page,
- int offset, size_t size, loff_t *ppos, int more)
-{
- struct socket *sock;
- int flags;
- int ret;
-
- sock = file->private_data;
-
- flags = (file->f_flags & O_NONBLOCK) ? MSG_DONTWAIT : 0;
- /* more is a combination of MSG_MORE and MSG_SENDPAGE_NOTLAST */
- flags |= more;
-
- ret = kernel_sendpage(sock, page, offset, size, flags);
-
- if (trace_sock_send_length_enabled())
- call_trace_sock_send_length(sock->sk, ret, 0);
- return ret;
-}
-
static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
struct pipe_inode_info *pipe, size_t len,
unsigned int flags)
* [PATCH net-next v6 04/14] splice, net: Add a splice_eof op to file-ops and socket-ops
2023-06-07 18:19 [PATCH net-next v6 00/14] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
2023-06-07 18:19 ` [PATCH net-next v6 03/14] splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage() David Howells
@ 2023-06-07 18:19 ` David Howells
2023-06-07 18:19 ` [PATCH net-next v6 05/14] tls/sw: Use splice_eof() to flush David Howells
From: David Howells @ 2023-06-07 18:19 UTC
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel, Christoph Hellwig, Al Viro, Jan Kara,
Jeff Layton, David Hildenbrand, Christian Brauner, linux-fsdevel,
linux-block
Add an optional method, ->splice_eof(), to struct file_operations and
struct proto_ops to allow splice to indicate the premature termination of a
splice.
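The method is optional at every level: splice calls it only if the file supplies one, and the socket wrapper forwards only if the protocol does. A sketch of that dispatch chain with hypothetical, cut-down structures (not the kernel's struct file_operations or struct proto_ops), mirroring direct_file_splice_eof() and sock_splice_eof() in the diff below:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for struct socket, proto_ops and file_operations. */
struct model_socket;
struct model_file;

struct model_proto_ops {
	void (*splice_eof)(struct model_socket *sock);	/* optional */
};

struct model_socket {
	const struct model_proto_ops *ops;
	unsigned int flushes;
};

struct model_file_ops {
	void (*splice_eof)(struct model_file *file);	/* optional */
};

struct model_file {
	const struct model_file_ops *f_op;
	void *private_data;	/* the socket, as with socket files */
};

/* Like sock_splice_eof(): forward to the protocol only if it has the hook. */
static void model_sock_splice_eof(struct model_file *file)
{
	struct model_socket *sock = file->private_data;

	assert(sock != NULL);
	if (sock->ops->splice_eof)
		sock->ops->splice_eof(sock);
}

/* Like direct_file_splice_eof(): the file-level method is optional too. */
static void model_file_splice_eof(struct model_file *file)
{
	if (file->f_op->splice_eof)
		file->f_op->splice_eof(file);
}

/* Example protocol hook that just records that a flush was requested. */
static void counting_splice_eof(struct model_socket *sock)
{
	sock->flushes++;
}

static const struct model_proto_ops flushing_proto = { .splice_eof = counting_splice_eof };
static const struct model_proto_ops plain_proto = { .splice_eof = NULL };
static const struct model_file_ops socket_fops = { .splice_eof = model_sock_splice_eof };
```

Protocols that leave the hook NULL are unaffected, which is why the later patches only need to fill it in where a flush is meaningful.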
This is called if sendfile() or splice() encounters all of the following
conditions inside splice_direct_to_actor():
(1) the user did not set SPLICE_F_MORE (splice only), and
(2) an EOF condition occurred (->splice_read() returned 0), and
(3) we haven't read enough to fulfill the request (i.e. len > 0 still), and
(4) we have already spliced at least one byte.
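The four conditions map onto the check at the read_failure label in the diff below. As a sketch, with a hypothetical helper name and the kernel's local variables turned into parameters:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper mirroring the read_failure test in
 * splice_direct_to_actor():
 *   ret   - result of the read step (0 means EOF, < 0 an error)
 *   more  - whether SPLICE_F_MORE was requested
 *   len   - bytes of the request still outstanding
 *   bytes - bytes already spliced to the destination
 */
static bool should_signal_splice_eof(long ret, bool more, size_t len,
				     size_t bytes)
{
	return ret == 0 &&	/* (2) EOF, not an error */
	       !more &&		/* (1) no SPLICE_F_MORE */
	       len > 0 &&	/* (3) request not fulfilled */
	       bytes > 0;	/* (4) something was already sent */
}
```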
A further patch will modify the behaviour of SPLICE_F_MORE to always be
passed to the actor if either the user set it or we haven't yet read
sufficient data to fulfill the request.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jan Kara <jack@suse.cz>
cc: Jeff Layton <jlayton@kernel.org>
cc: David Hildenbrand <david@redhat.com>
cc: Christian Brauner <brauner@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-mm@kvack.org
cc: netdev@vger.kernel.org
---
fs/splice.c | 31 ++++++++++++++++++++++++++++++-
include/linux/fs.h | 1 +
include/linux/net.h | 1 +
include/linux/splice.h | 1 +
include/net/sock.h | 1 +
net/socket.c | 10 ++++++++++
6 files changed, 44 insertions(+), 1 deletion(-)
diff --git a/fs/splice.c b/fs/splice.c
index e337630aed64..67dbd85db207 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -969,6 +969,17 @@ static long do_splice_from(struct pipe_inode_info *pipe, struct file *out,
return out->f_op->splice_write(pipe, out, ppos, len, flags);
}
+/*
+ * Indicate to the caller that there was a premature EOF when reading from the
+ * source and the caller didn't indicate they would be sending more data after
+ * this.
+ */
+static void do_splice_eof(struct splice_desc *sd)
+{
+ if (sd->splice_eof)
+ sd->splice_eof(sd);
+}
+
/*
* Attempt to initiate a splice from a file to a pipe.
*/
@@ -1068,7 +1079,7 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
ret = do_splice_to(in, &pos, pipe, len, flags);
if (unlikely(ret <= 0))
- goto out_release;
+ goto read_failure;
read_len = ret;
sd->total_len = read_len;
@@ -1108,6 +1119,15 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
file_accessed(in);
return bytes;
+read_failure:
+ /*
+ * If the user did *not* set SPLICE_F_MORE *and* we didn't hit that
+ * "use all of len" case that cleared SPLICE_F_MORE, *and* we did a
+ * "->splice_in()" that returned EOF (ie zero) *and* we have sent at
+ * least 1 byte *then* we will also do the ->splice_eof() call.
+ */
+ if (ret == 0 && !more && len > 0 && bytes)
+ do_splice_eof(sd);
out_release:
/*
* If we did an incomplete transfer we must release
@@ -1136,6 +1156,14 @@ static int direct_splice_actor(struct pipe_inode_info *pipe,
sd->flags);
}
+static void direct_file_splice_eof(struct splice_desc *sd)
+{
+ struct file *file = sd->u.file;
+
+ if (file->f_op->splice_eof)
+ file->f_op->splice_eof(file);
+}
+
/**
* do_splice_direct - splices data directly between two files
* @in: file to splice from
@@ -1161,6 +1189,7 @@ long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
.flags = flags,
.pos = *ppos,
.u.file = out,
+ .splice_eof = direct_file_splice_eof,
.opos = opos,
};
long ret;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index df92f4b3d122..de2cb1132f07 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1796,6 +1796,7 @@ struct file_operations {
int (*flock) (struct file *, int, struct file_lock *);
ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int);
+ void (*splice_eof)(struct file *file);
int (*setlease)(struct file *, long, struct file_lock **, void **);
long (*fallocate)(struct file *file, int mode, loff_t offset,
loff_t len);
diff --git a/include/linux/net.h b/include/linux/net.h
index b73ad8e3c212..8defc8f1d82e 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -210,6 +210,7 @@ struct proto_ops {
int offset, size_t size, int flags);
ssize_t (*splice_read)(struct socket *sock, loff_t *ppos,
struct pipe_inode_info *pipe, size_t len, unsigned int flags);
+ void (*splice_eof)(struct socket *sock);
int (*set_peek_off)(struct sock *sk, int val);
int (*peek_len)(struct socket *sock);
diff --git a/include/linux/splice.h b/include/linux/splice.h
index 991ae318b6eb..4fab18a6e371 100644
--- a/include/linux/splice.h
+++ b/include/linux/splice.h
@@ -38,6 +38,7 @@ struct splice_desc {
struct file *file; /* file to read/write */
void *data; /* cookie */
} u;
+ void (*splice_eof)(struct splice_desc *sd); /* Unexpected EOF handler */
loff_t pos; /* file position */
loff_t *opos; /* sendfile: output position */
size_t num_spliced; /* number of bytes already spliced */
diff --git a/include/net/sock.h b/include/net/sock.h
index b418425d7230..ae2d74a0bc4c 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1271,6 +1271,7 @@ struct proto {
size_t len, int flags, int *addr_len);
int (*sendpage)(struct sock *sk, struct page *page,
int offset, size_t size, int flags);
+ void (*splice_eof)(struct socket *sock);
int (*bind)(struct sock *sk,
struct sockaddr *addr, int addr_len);
int (*bind_add)(struct sock *sk,
diff --git a/net/socket.c b/net/socket.c
index c4d9104418c8..b778fc03c6e0 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -130,6 +130,7 @@ static int sock_fasync(int fd, struct file *filp, int on);
static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
struct pipe_inode_info *pipe, size_t len,
unsigned int flags);
+static void sock_splice_eof(struct file *file);
#ifdef CONFIG_PROC_FS
static void sock_show_fdinfo(struct seq_file *m, struct file *f)
@@ -163,6 +164,7 @@ static const struct file_operations socket_file_ops = {
.fasync = sock_fasync,
.splice_write = splice_to_socket,
.splice_read = sock_splice_read,
+ .splice_eof = sock_splice_eof,
.show_fdinfo = sock_show_fdinfo,
};
@@ -1076,6 +1078,14 @@ static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
return sock->ops->splice_read(sock, ppos, pipe, len, flags);
}
+static void sock_splice_eof(struct file *file)
+{
+ struct socket *sock = file->private_data;
+
+ if (sock->ops->splice_eof)
+ sock->ops->splice_eof(sock);
+}
+
static ssize_t sock_read_iter(struct kiocb *iocb, struct iov_iter *to)
{
struct file *file = iocb->ki_filp;
* [PATCH net-next v6 05/14] tls/sw: Use splice_eof() to flush
2023-06-07 18:19 [PATCH net-next v6 00/14] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
2023-06-07 18:19 ` [PATCH net-next v6 04/14] splice, net: Add a splice_eof op to file-ops and socket-ops David Howells
@ 2023-06-07 18:19 ` David Howells
2023-06-07 18:19 ` [PATCH net-next v6 06/14] tls/device: " David Howells
From: David Howells @ 2023-06-07 18:19 UTC
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel
Allow splice to end a TLS record after a splice/sendfile ends prematurely
due to an EOF condition (->splice_read() returned 0), if splice had called
TLS's sendmsg() with MSG_MORE set even though the user didn't set
SPLICE_F_MORE.
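As a rough model of what the TLS flush achieves: data sent with MSG_MORE accumulates in an open record, and ->splice_eof() closes that record when no further data is coming. The structure and function names below are hypothetical, encryption and transmission are reduced to a counter, and the real tls_sw_splice_eof() additionally runs the BPF verdict and waits for pending async encryptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_REC_MAX 64	/* hypothetical record capacity */

/* Hypothetical TLS transmit context holding at most one open record. */
struct model_tls_tx {
	char rec[MODEL_REC_MAX];
	size_t rec_len;			/* bytes in the open record */
	unsigned int records_sent;
};

/* Close the open record; real TLS would encrypt and transmit it here. */
static void close_record(struct model_tls_tx *tx)
{
	tx->records_sent++;
	tx->rec_len = 0;
}

/* sendmsg(): append data, and close the record unless MSG_MORE was set. */
static void model_sendmsg(struct model_tls_tx *tx, const char *data,
			  size_t len, bool more)
{
	assert(tx->rec_len + len <= MODEL_REC_MAX);
	memcpy(tx->rec + tx->rec_len, data, len);
	tx->rec_len += len;
	if (!more)
		close_record(tx);
}

/* splice_eof(): close any record that a MSG_MORE send left open. */
static void model_splice_eof(struct model_tls_tx *tx)
{
	if (tx->rec_len)
		close_record(tx);
}
```

Without the splice_eof step, the record opened by the MSG_MORE send would stay pending until some later write arrived.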
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
net/tls/tls.h | 1 +
net/tls/tls_main.c | 2 ++
net/tls/tls_sw.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 77 insertions(+)
diff --git a/net/tls/tls.h b/net/tls/tls.h
index 0672acab2773..4922668fefaa 100644
--- a/net/tls/tls.h
+++ b/net/tls/tls.h
@@ -97,6 +97,7 @@ void tls_update_rx_zc_capable(struct tls_context *tls_ctx);
void tls_sw_strparser_arm(struct sock *sk, struct tls_context *ctx);
void tls_sw_strparser_done(struct tls_context *tls_ctx);
int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
+void tls_sw_splice_eof(struct socket *sock);
int tls_sw_sendpage_locked(struct sock *sk, struct page *page,
int offset, size_t size, int flags);
int tls_sw_sendpage(struct sock *sk, struct page *page,
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index e02a0d882ed3..82ec5c654f32 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -957,6 +957,7 @@ static void build_proto_ops(struct proto_ops ops[TLS_NUM_CONFIG][TLS_NUM_CONFIG]
ops[TLS_BASE][TLS_BASE] = *base;
ops[TLS_SW ][TLS_BASE] = ops[TLS_BASE][TLS_BASE];
+ ops[TLS_SW ][TLS_BASE].splice_eof = tls_sw_splice_eof;
ops[TLS_SW ][TLS_BASE].sendpage_locked = tls_sw_sendpage_locked;
ops[TLS_BASE][TLS_SW ] = ops[TLS_BASE][TLS_BASE];
@@ -1027,6 +1028,7 @@ static void build_protos(struct proto prot[TLS_NUM_CONFIG][TLS_NUM_CONFIG],
prot[TLS_SW][TLS_BASE] = prot[TLS_BASE][TLS_BASE];
prot[TLS_SW][TLS_BASE].sendmsg = tls_sw_sendmsg;
+ prot[TLS_SW][TLS_BASE].splice_eof = tls_sw_splice_eof;
prot[TLS_SW][TLS_BASE].sendpage = tls_sw_sendpage;
prot[TLS_BASE][TLS_SW] = prot[TLS_BASE][TLS_BASE];
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 38acc27a0dd0..a2fb0256ff1c 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1157,6 +1157,80 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
return copied > 0 ? copied : ret;
}
+/*
+ * Handle unexpected EOF during splice without SPLICE_F_MORE set.
+ */
+void tls_sw_splice_eof(struct socket *sock)
+{
+ struct sock *sk = sock->sk;
+ struct tls_context *tls_ctx = tls_get_ctx(sk);
+ struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
+ struct tls_rec *rec;
+ struct sk_msg *msg_pl;
+ ssize_t copied = 0;
+ bool retrying = false;
+ int ret = 0;
+ int pending;
+
+ if (!ctx->open_rec)
+ return;
+
+ mutex_lock(&tls_ctx->tx_lock);
+ lock_sock(sk);
+
+retry:
+ rec = ctx->open_rec;
+ if (!rec)
+ goto unlock;
+
+ msg_pl = &rec->msg_plaintext;
+
+ /* Check the BPF advisor and perform transmission. */
+ ret = bpf_exec_tx_verdict(msg_pl, sk, false, TLS_RECORD_TYPE_DATA,
+ &copied, 0);
+ switch (ret) {
+ case 0:
+ case -EAGAIN:
+ if (retrying)
+ goto unlock;
+ retrying = true;
+ goto retry;
+ case -EINPROGRESS:
+ break;
+ default:
+ goto unlock;
+ }
+
+ /* Wait for pending encryptions to get completed */
+ spin_lock_bh(&ctx->encrypt_compl_lock);
+ ctx->async_notify = true;
+
+ pending = atomic_read(&ctx->encrypt_pending);
+ spin_unlock_bh(&ctx->encrypt_compl_lock);
+ if (pending)
+ crypto_wait_req(-EINPROGRESS, &ctx->async_wait);
+ else
+ reinit_completion(&ctx->async_wait.completion);
+
+ /* There can be no concurrent accesses, since we have no pending
+ * encrypt operations
+ */
+ WRITE_ONCE(ctx->async_notify, false);
+
+ if (ctx->async_wait.err)
+ goto unlock;
+
+ /* Transmit if any encryptions have completed */
+ if (test_and_clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask)) {
+ cancel_delayed_work(&ctx->tx_work.work);
+ tls_tx_records(sk, 0);
+ }
+
+unlock:
+ release_sock(sk);
+ mutex_unlock(&tls_ctx->tx_lock);
+}
+
static int tls_sw_do_sendpage(struct sock *sk, struct page *page,
int offset, size_t size, int flags)
{
* [PATCH net-next v6 06/14] tls/device: Use splice_eof() to flush
2023-06-07 18:19 [PATCH net-next v6 00/14] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
2023-06-07 18:19 ` [PATCH net-next v6 05/14] tls/sw: Use splice_eof() to flush David Howells
@ 2023-06-07 18:19 ` David Howells
2023-06-07 18:19 ` [PATCH net-next v6 07/14] ipv4, ipv6: " David Howells
From: David Howells @ 2023-06-07 18:19 UTC
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel
Allow splice to end a TLS record after a splice/sendfile ends prematurely
due to an EOF condition (->splice_read() returned 0), if splice had called
TLS's sendmsg() with MSG_MORE set even though the user didn't set
SPLICE_F_MORE.
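tls_device_splice_eof() in the diff below follows a check, lock, re-check pattern: it returns early without taking tx_lock when no partially sent record exists, and checks again once the lock is held. A minimal userspace sketch of that pattern, assuming pthreads and C11 atomics; struct model_tx and its counters are hypothetical, and the model uses a single mutex where the patch takes both tls_ctx->tx_lock and the socket lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the TX state touched by tls_device_splice_eof(). */
struct model_tx {
	pthread_mutex_t tx_lock;
	atomic_bool partial_record;	/* stands in for tls_is_partially_sent_record() */
	unsigned int lock_taken;	/* instrumentation for the example */
	unsigned int pushes;
};

static void model_device_splice_eof(struct model_tx *tx)
{
	/* Unlocked fast path: with nothing pending, skip the lock entirely. */
	if (!atomic_load(&tx->partial_record))
		return;

	pthread_mutex_lock(&tx->tx_lock);
	tx->lock_taken++;
	/* Re-check under the lock: a concurrent sender may have finished it. */
	if (atomic_load(&tx->partial_record)) {
		tx->pushes++;	/* where the patch calls tls_push_data() with an empty iterator */
		atomic_store(&tx->partial_record, false);
	}
	pthread_mutex_unlock(&tx->tx_lock);
}
```

The fast path keeps the common case (nothing left to flush) free of lock traffic, while the re-check keeps the flush correct if the state changed before the lock was taken.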
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
net/tls/tls.h | 1 +
net/tls/tls_device.c | 23 +++++++++++++++++++++++
net/tls/tls_main.c | 2 ++
3 files changed, 26 insertions(+)
diff --git a/net/tls/tls.h b/net/tls/tls.h
index 4922668fefaa..d002c3af1966 100644
--- a/net/tls/tls.h
+++ b/net/tls/tls.h
@@ -116,6 +116,7 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
size_t len, unsigned int flags);
int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
+void tls_device_splice_eof(struct socket *sock);
int tls_device_sendpage(struct sock *sk, struct page *page,
int offset, size_t size, int flags);
int tls_tx_records(struct sock *sk, int flags);
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index 9ef766e41c7a..439be833dcf9 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -590,6 +590,29 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
return rc;
}
+void tls_device_splice_eof(struct socket *sock)
+{
+ struct sock *sk = sock->sk;
+ struct tls_context *tls_ctx = tls_get_ctx(sk);
+ union tls_iter_offset iter;
+ struct iov_iter iov_iter = {};
+
+ if (!tls_is_partially_sent_record(tls_ctx))
+ return;
+
+ mutex_lock(&tls_ctx->tx_lock);
+ lock_sock(sk);
+
+ if (tls_is_partially_sent_record(tls_ctx)) {
+ iov_iter_bvec(&iov_iter, ITER_SOURCE, NULL, 0, 0);
+ iter.msg_iter = &iov_iter;
+ tls_push_data(sk, iter, 0, 0, TLS_RECORD_TYPE_DATA, NULL);
+ }
+
+ release_sock(sk);
+ mutex_unlock(&tls_ctx->tx_lock);
+}
+
int tls_device_sendpage(struct sock *sk, struct page *page,
int offset, size_t size, int flags)
{
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 82ec5c654f32..7b9c83dd7de2 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -1044,10 +1044,12 @@ static void build_protos(struct proto prot[TLS_NUM_CONFIG][TLS_NUM_CONFIG],
#ifdef CONFIG_TLS_DEVICE
prot[TLS_HW][TLS_BASE] = prot[TLS_BASE][TLS_BASE];
prot[TLS_HW][TLS_BASE].sendmsg = tls_device_sendmsg;
+ prot[TLS_HW][TLS_BASE].splice_eof = tls_device_splice_eof;
prot[TLS_HW][TLS_BASE].sendpage = tls_device_sendpage;
prot[TLS_HW][TLS_SW] = prot[TLS_BASE][TLS_SW];
prot[TLS_HW][TLS_SW].sendmsg = tls_device_sendmsg;
+ prot[TLS_HW][TLS_SW].splice_eof = tls_device_splice_eof;
prot[TLS_HW][TLS_SW].sendpage = tls_device_sendpage;
prot[TLS_BASE][TLS_HW] = prot[TLS_BASE][TLS_SW];
* [PATCH net-next v6 07/14] ipv4, ipv6: Use splice_eof() to flush
2023-06-07 18:19 [PATCH net-next v6 00/14] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
` (5 preceding siblings ...)
2023-06-07 18:19 ` [PATCH net-next v6 06/14] tls/device: " David Howells
@ 2023-06-07 18:19 ` David Howells
2023-06-07 18:19 ` [PATCH net-next v6 08/14] chelsio/chtls: " David Howells
` (7 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: David Howells @ 2023-06-07 18:19 UTC
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel, Kuniyuki Iwashima
Allow splice to undo the effects of MSG_MORE when a splice/sendfile ends
prematurely on an EOF condition (->splice_read() returned 0) after splice
had already called sendmsg() with MSG_MORE set, even though the user didn't
set MSG_MORE.
For UDP, a pending packet will not be emitted if the socket is closed
before it is flushed; with this change, it will be flushed by ->splice_eof().
For TCP, it's not clear that MSG_MORE is actually effective.
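A minimal userspace model of the flush path described above may help. It is a sketch under stated assumptions, not the kernel code: toy_sock, toy_proto, toy_sendmsg(), toy_udp_splice_eof(), toy_inet_splice_eof() and TOY_MSG_MORE are all hypothetical names. Data sent with MORE is held back, and an optional ->splice_eof()-style hook pushes it out unless the user corked the socket, loosely mirroring the checks in udp_splice_eof() and the NULL check in inet_splice_eof().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag for this model only; not the real MSG_MORE value. */
#define TOY_MSG_MORE 0x1

struct toy_sock;

/* Protocol hooks, loosely mirroring struct proto: splice_eof is optional. */
struct toy_proto {
	void (*splice_eof)(struct toy_sock *sk);
};

struct toy_sock {
	const struct toy_proto *prot;
	char pending[64];	/* data held back because MORE was signalled */
	size_t pending_len;
	int corked;		/* user asked to cork (cf. UDP_CORK) */
	int datagrams_sent;	/* datagrams actually emitted */
	size_t last_len;	/* length of the last emitted datagram */
};

/* Emit everything that is pending as one datagram. */
static void toy_push_pending(struct toy_sock *sk)
{
	sk->datagrams_sent++;
	sk->last_len = sk->pending_len;
	sk->pending_len = 0;
}

/* Append data; emit a datagram only when MORE is not signalled. */
static void toy_sendmsg(struct toy_sock *sk, const char *buf, size_t len,
			int flags)
{
	if (sk->pending_len + len > sizeof(sk->pending))
		len = sizeof(sk->pending) - sk->pending_len;
	memcpy(sk->pending + sk->pending_len, buf, len);
	sk->pending_len += len;
	if (!(flags & TOY_MSG_MORE) && !sk->corked)
		toy_push_pending(sk);
}

/* Like udp_splice_eof(): do nothing if nothing is pending or user corked. */
static void toy_udp_splice_eof(struct toy_sock *sk)
{
	if (sk->pending_len == 0 || sk->corked)
		return;
	toy_push_pending(sk);
}

/* Like inet_splice_eof(): call the protocol hook only if one exists. */
static void toy_inet_splice_eof(struct toy_sock *sk)
{
	if (sk->prot->splice_eof)
		sk->prot->splice_eof(sk);
}

static const struct toy_proto toy_udp_prot = {
	.splice_eof = toy_udp_splice_eof,
};
```

In this model, a splice that sent "hello" with MORE and then hit EOF leaves nothing emitted until the hook runs; a second call to the hook is a no-op because nothing is pending, and a corked socket is left alone.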
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Kuniyuki Iwashima <kuniyu@amazon.com>
cc: Eric Dumazet <edumazet@google.com>
cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
cc: David Ahern <dsahern@kernel.org>
cc: "David S. Miller" <davem@davemloft.net>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
Notes:
ver #6)
- In inet_splice_eof(), use prot after deref of sk->sk_prot.
- In udpv6_splice_eof(), use udp_v6_push_pending_frames().
- In udpv6_splice_eof(), don't check for AF_INET.
include/net/inet_common.h | 1 +
include/net/tcp.h | 1 +
include/net/udp.h | 1 +
net/ipv4/af_inet.c | 18 ++++++++++++++++++
net/ipv4/tcp.c | 16 ++++++++++++++++
net/ipv4/tcp_ipv4.c | 1 +
net/ipv4/udp.c | 16 ++++++++++++++++
net/ipv6/af_inet6.c | 1 +
net/ipv6/tcp_ipv6.c | 1 +
net/ipv6/udp.c | 15 +++++++++++++++
10 files changed, 71 insertions(+)
diff --git a/include/net/inet_common.h b/include/net/inet_common.h
index 77f4b0ef5b92..a75333342c4e 100644
--- a/include/net/inet_common.h
+++ b/include/net/inet_common.h
@@ -35,6 +35,7 @@ void __inet_accept(struct socket *sock, struct socket *newsock,
struct sock *newsk);
int inet_send_prepare(struct sock *sk);
int inet_sendmsg(struct socket *sock, struct msghdr *msg, size_t size);
+void inet_splice_eof(struct socket *sock);
ssize_t inet_sendpage(struct socket *sock, struct page *page, int offset,
size_t size, int flags);
int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 68990a8f556a..49611af31bb7 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -327,6 +327,7 @@ int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size);
int tcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg, int *copied,
size_t size, struct ubuf_info *uarg);
+void tcp_splice_eof(struct socket *sock);
int tcp_sendpage(struct sock *sk, struct page *page, int offset, size_t size,
int flags);
int tcp_sendpage_locked(struct sock *sk, struct page *page, int offset,
diff --git a/include/net/udp.h b/include/net/udp.h
index 5cad44318d71..4ed0b47c5582 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -278,6 +278,7 @@ int udp_get_port(struct sock *sk, unsigned short snum,
int udp_err(struct sk_buff *, u32);
int udp_abort(struct sock *sk, int err);
int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len);
+void udp_splice_eof(struct socket *sock);
int udp_push_pending_frames(struct sock *sk);
void udp_flush_pending_frames(struct sock *sk);
int udp_cmsg_send(struct sock *sk, struct msghdr *msg, u16 *gso_size);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index b5735b3551cf..fd233c4195ac 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -831,6 +831,21 @@ int inet_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
}
EXPORT_SYMBOL(inet_sendmsg);
+void inet_splice_eof(struct socket *sock)
+{
+ const struct proto *prot;
+ struct sock *sk = sock->sk;
+
+ if (unlikely(inet_send_prepare(sk)))
+ return;
+
+ /* IPV6_ADDRFORM can change sk->sk_prot under us. */
+ prot = READ_ONCE(sk->sk_prot);
+ if (prot->splice_eof)
+ prot->splice_eof(sock);
+}
+EXPORT_SYMBOL_GPL(inet_splice_eof);
+
ssize_t inet_sendpage(struct socket *sock, struct page *page, int offset,
size_t size, int flags)
{
@@ -1050,6 +1065,7 @@ const struct proto_ops inet_stream_ops = {
#ifdef CONFIG_MMU
.mmap = tcp_mmap,
#endif
+ .splice_eof = inet_splice_eof,
.sendpage = inet_sendpage,
.splice_read = tcp_splice_read,
.read_sock = tcp_read_sock,
@@ -1084,6 +1100,7 @@ const struct proto_ops inet_dgram_ops = {
.read_skb = udp_read_skb,
.recvmsg = inet_recvmsg,
.mmap = sock_no_mmap,
+ .splice_eof = inet_splice_eof,
.sendpage = inet_sendpage,
.set_peek_off = sk_set_peek_off,
#ifdef CONFIG_COMPAT
@@ -1115,6 +1132,7 @@ static const struct proto_ops inet_sockraw_ops = {
.sendmsg = inet_sendmsg,
.recvmsg = inet_recvmsg,
.mmap = sock_no_mmap,
+ .splice_eof = inet_splice_eof,
.sendpage = inet_sendpage,
#ifdef CONFIG_COMPAT
.compat_ioctl = inet_compat_ioctl,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 53b7751b68e1..09f03221a6f1 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1371,6 +1371,22 @@ int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
}
EXPORT_SYMBOL(tcp_sendmsg);
+void tcp_splice_eof(struct socket *sock)
+{
+ struct sock *sk = sock->sk;
+ struct tcp_sock *tp = tcp_sk(sk);
+ int mss_now, size_goal;
+
+ if (!tcp_write_queue_tail(sk))
+ return;
+
+ lock_sock(sk);
+ mss_now = tcp_send_mss(sk, &size_goal, 0);
+ tcp_push(sk, 0, mss_now, tp->nonagle, size_goal);
+ release_sock(sk);
+}
+EXPORT_SYMBOL_GPL(tcp_splice_eof);
+
/*
* Handle reading urgent data. BSD has very simple semantics for
* this, no blocking and very strange errors 8)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 53e9ce2f05bb..84a5d557dc1a 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -3116,6 +3116,7 @@ struct proto tcp_prot = {
.keepalive = tcp_set_keepalive,
.recvmsg = tcp_recvmsg,
.sendmsg = tcp_sendmsg,
+ .splice_eof = tcp_splice_eof,
.sendpage = tcp_sendpage,
.backlog_rcv = tcp_v4_do_rcv,
.release_cb = tcp_release_cb,
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index fd3dae081f3a..df5e407286d7 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1324,6 +1324,21 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
}
EXPORT_SYMBOL(udp_sendmsg);
+void udp_splice_eof(struct socket *sock)
+{
+ struct sock *sk = sock->sk;
+ struct udp_sock *up = udp_sk(sk);
+
+ if (!up->pending || READ_ONCE(up->corkflag))
+ return;
+
+ lock_sock(sk);
+ if (up->pending && !READ_ONCE(up->corkflag))
+ udp_push_pending_frames(sk);
+ release_sock(sk);
+}
+EXPORT_SYMBOL_GPL(udp_splice_eof);
+
int udp_sendpage(struct sock *sk, struct page *page, int offset,
size_t size, int flags)
{
@@ -2918,6 +2933,7 @@ struct proto udp_prot = {
.getsockopt = udp_getsockopt,
.sendmsg = udp_sendmsg,
.recvmsg = udp_recvmsg,
+ .splice_eof = udp_splice_eof,
.sendpage = udp_sendpage,
.release_cb = ip4_datagram_release_cb,
.hash = udp_lib_hash,
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 2bbf13216a3d..564942bee067 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -695,6 +695,7 @@ const struct proto_ops inet6_stream_ops = {
#ifdef CONFIG_MMU
.mmap = tcp_mmap,
#endif
+ .splice_eof = inet_splice_eof,
.sendpage = inet_sendpage,
.sendmsg_locked = tcp_sendmsg_locked,
.sendpage_locked = tcp_sendpage_locked,
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index d657713d1c71..c17c8ff94b79 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -2150,6 +2150,7 @@ struct proto tcpv6_prot = {
.keepalive = tcp_set_keepalive,
.recvmsg = tcp_recvmsg,
.sendmsg = tcp_sendmsg,
+ .splice_eof = tcp_splice_eof,
.sendpage = tcp_sendpage,
.backlog_rcv = tcp_v6_do_rcv,
.release_cb = tcp_release_cb,
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index e5a337e6b970..317b01c9bc39 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1653,6 +1653,20 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
}
EXPORT_SYMBOL(udpv6_sendmsg);
+static void udpv6_splice_eof(struct socket *sock)
+{
+ struct sock *sk = sock->sk;
+ struct udp_sock *up = udp_sk(sk);
+
+ if (!up->pending || READ_ONCE(up->corkflag))
+ return;
+
+ lock_sock(sk);
+ if (up->pending && !READ_ONCE(up->corkflag))
+ udp_v6_push_pending_frames(sk);
+ release_sock(sk);
+}
+
void udpv6_destroy_sock(struct sock *sk)
{
struct udp_sock *up = udp_sk(sk);
@@ -1764,6 +1778,7 @@ struct proto udpv6_prot = {
.getsockopt = udpv6_getsockopt,
.sendmsg = udpv6_sendmsg,
.recvmsg = udpv6_recvmsg,
+ .splice_eof = udpv6_splice_eof,
.release_cb = ip6_datagram_release_cb,
.hash = udp_lib_hash,
.unhash = udp_lib_unhash,
* [PATCH net-next v6 08/14] chelsio/chtls: Use splice_eof() to flush
2023-06-07 18:19 [PATCH net-next v6 00/14] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
` (6 preceding siblings ...)
2023-06-07 18:19 ` [PATCH net-next v6 07/14] ipv4, ipv6: " David Howells
@ 2023-06-07 18:19 ` David Howells
2023-06-07 18:19 ` [PATCH net-next v6 09/14] kcm: " David Howells
` (6 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: David Howells @ 2023-06-07 18:19 UTC
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel, Ayush Sawal
Allow splice to end a Chelsio TLS record when a splice/sendfile ends
prematurely on an EOF condition (->splice_read() returned 0) after splice
had already called sendmsg() with MSG_MORE set, even though the user didn't
set MSG_MORE.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Ayush Sawal <ayush.sawal@chelsio.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls.h | 1 +
.../net/ethernet/chelsio/inline_crypto/chtls/chtls_io.c | 9 +++++++++
.../ethernet/chelsio/inline_crypto/chtls/chtls_main.c | 1 +
3 files changed, 11 insertions(+)
diff --git a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls.h b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls.h
index 41714203ace8..da4818d2c856 100644
--- a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls.h
+++ b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls.h
@@ -568,6 +568,7 @@ void chtls_destroy_sock(struct sock *sk);
int chtls_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
int chtls_recvmsg(struct sock *sk, struct msghdr *msg,
size_t len, int flags, int *addr_len);
+void chtls_splice_eof(struct socket *sock);
int chtls_sendpage(struct sock *sk, struct page *page,
int offset, size_t size, int flags);
int send_tx_flowc_wr(struct sock *sk, int compl,
diff --git a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_io.c b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_io.c
index 5724bbbb6ee0..e08ac960c967 100644
--- a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_io.c
+++ b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_io.c
@@ -1237,6 +1237,15 @@ int chtls_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
goto done;
}
+void chtls_splice_eof(struct socket *sock)
+{
+ struct sock *sk = sock->sk;
+
+ lock_sock(sk);
+ chtls_tcp_push(sk, 0);
+ release_sock(sk);
+}
+
int chtls_sendpage(struct sock *sk, struct page *page,
int offset, size_t size, int flags)
{
diff --git a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_main.c b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_main.c
index 1e55b12fee51..6b6787eafd2f 100644
--- a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_main.c
+++ b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_main.c
@@ -606,6 +606,7 @@ static void __init chtls_init_ulp_ops(void)
chtls_cpl_prot.destroy = chtls_destroy_sock;
chtls_cpl_prot.shutdown = chtls_shutdown;
chtls_cpl_prot.sendmsg = chtls_sendmsg;
+ chtls_cpl_prot.splice_eof = chtls_splice_eof;
chtls_cpl_prot.sendpage = chtls_sendpage;
chtls_cpl_prot.recvmsg = chtls_recvmsg;
chtls_cpl_prot.setsockopt = chtls_setsockopt;
* [PATCH net-next v6 09/14] kcm: Use splice_eof() to flush
2023-06-07 18:19 [PATCH net-next v6 00/14] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
` (7 preceding siblings ...)
2023-06-07 18:19 ` [PATCH net-next v6 08/14] chelsio/chtls: " David Howells
@ 2023-06-07 18:19 ` David Howells
2023-06-07 18:19 ` [PATCH net-next v6 10/14] splice, net: Fix SPLICE_F_MORE signalling in splice_direct_to_actor() David Howells
` (5 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: David Howells @ 2023-06-07 18:19 UTC
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel, Tom Herbert, Tom Herbert, Cong Wang
Allow splice to undo the effects of MSG_MORE when a splice/sendfile ends
prematurely on an EOF condition (->splice_read() returned 0) after splice
had already called sendmsg() with MSG_MORE set, even though the user didn't
set MSG_MORE.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Tom Herbert <tom@herbertland.com>
cc: Tom Herbert <tom@quantonium.net>
cc: Cong Wang <cong.wang@bytedance.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
Notes:
ver #6)
- In kcm_splice_eof(), use skb_queue_empty_lockless().
net/kcm/kcmsock.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
index ba22af16b96d..7dee74430b59 100644
--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -968,6 +968,19 @@ static int kcm_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
return err;
}
+static void kcm_splice_eof(struct socket *sock)
+{
+ struct sock *sk = sock->sk;
+ struct kcm_sock *kcm = kcm_sk(sk);
+
+ if (skb_queue_empty_lockless(&sk->sk_write_queue))
+ return;
+
+ lock_sock(sk);
+ kcm_write_msgs(kcm);
+ release_sock(sk);
+}
+
static ssize_t kcm_sendpage(struct socket *sock, struct page *page,
int offset, size_t size, int flags)
@@ -1773,6 +1786,7 @@ static const struct proto_ops kcm_dgram_ops = {
.sendmsg = kcm_sendmsg,
.recvmsg = kcm_recvmsg,
.mmap = sock_no_mmap,
+ .splice_eof = kcm_splice_eof,
.sendpage = kcm_sendpage,
};
@@ -1794,6 +1808,7 @@ static const struct proto_ops kcm_seqpacket_ops = {
.sendmsg = kcm_sendmsg,
.recvmsg = kcm_recvmsg,
.mmap = sock_no_mmap,
+ .splice_eof = kcm_splice_eof,
.sendpage = kcm_sendpage,
.splice_read = kcm_splice_read,
};
* [PATCH net-next v6 10/14] splice, net: Fix SPLICE_F_MORE signalling in splice_direct_to_actor()
2023-06-07 18:19 [PATCH net-next v6 00/14] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
` (8 preceding siblings ...)
2023-06-07 18:19 ` [PATCH net-next v6 09/14] kcm: " David Howells
@ 2023-06-07 18:19 ` David Howells
2023-06-07 18:19 ` [PATCH net-next v6 11/14] tls/sw: Support MSG_SPLICE_PAGES David Howells
` (4 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: David Howells @ 2023-06-07 18:19 UTC
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel, Christoph Hellwig, Al Viro, Jan Kara,
Jeff Layton, David Hildenbrand, Christian Brauner, linux-fsdevel,
linux-block
splice_direct_to_actor() doesn't manage SPLICE_F_MORE correctly[1], and,
as a result, it incorrectly signals or fails to signal MSG_MORE when
splicing to a socket. The problem I'm seeing happens when a short splice
occurs because we got a short read due to hitting EOF on a file: as the
length read (read_len) is less than the remaining size to be spliced (len),
SPLICE_F_MORE (and thus MSG_MORE) is set.
The issue is that, for the moment, we have no way to know *why* the short
read occurred and so can't make a good decision on whether we *should* keep
MSG_MORE set.
MSG_SENDPAGE_NOTLAST was added to work around this, but it is also set
incorrectly under some circumstances, for example if a short read fills a
single pipe_buffer but the next read would return more (seqfile can do
this).
This was observed with the multi_chunk_sendfile tests in the tls kselftest
program. Some of those tests would hang and time out when the last chunk
of the file was smaller than the sendfile request size:
build/kselftest/net/tls -r tls.12_aes_gcm.multi_chunk_sendfile
This has been observed before[2] and worked around in AF_TLS[3].
Fix this by making splice_direct_to_actor() always signal SPLICE_F_MORE if
we haven't yet hit the requested operation size. SPLICE_F_MORE remains
signalled if the user passed it in to splice() but otherwise gets cleared
when we've read sufficient data to fulfill the request.
If, however, we get a premature EOF from ->splice_read(), have sent at
least one byte and SPLICE_F_MORE was not set by the caller, ->splice_eof()
will be invoked.
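The new flag handling can be modelled as a small simulation. This is a hedged sketch, not fs/splice.c: toy_splice_direct(), struct toy_trace and TOY_SPLICE_F_MORE are hypothetical, the actor is assumed to consume everything it is given, and the EOF branch encodes the condition described above (premature EOF, at least one byte sent, MORE not set by the caller).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for SPLICE_F_MORE in this model. */
#define TOY_SPLICE_F_MORE 0x4

/* What the modelled socket actor observed. */
struct toy_trace {
	int calls;		/* number of actor calls */
	int more_on_last_call;	/* was MORE set on the final actor call? */
	int eof_called;		/* would ->splice_eof() be invoked? */
};

/*
 * file_size bytes are available, the caller asked for total_len bytes and
 * each ->splice_read() returns at most chunk bytes. caller_flags may
 * contain TOY_SPLICE_F_MORE. Returns the number of bytes spliced.
 */
static size_t toy_splice_direct(size_t file_size, size_t total_len,
				size_t chunk, unsigned int caller_flags,
				struct toy_trace *tr)
{
	int more = caller_flags & TOY_SPLICE_F_MORE;
	/* Signal MORE until the request has been fulfilled. */
	unsigned int flags = caller_flags | TOY_SPLICE_F_MORE;
	size_t pos = 0, bytes = 0, len = total_len;

	tr->calls = tr->more_on_last_call = tr->eof_called = 0;
	while (len) {
		size_t read_len = chunk < len ? chunk : len;

		if (read_len > file_size - pos)
			read_len = file_size - pos;	/* short read at EOF */
		if (read_len == 0) {
			/* Premature EOF: ask the output to flush. */
			if (bytes && !more)
				tr->eof_called = 1;
			break;
		}
		/* Enough data now: drop MORE unless the caller set it. */
		if (read_len >= len && !more)
			flags &= ~TOY_SPLICE_F_MORE;

		tr->calls++;
		tr->more_on_last_call = !!(flags & TOY_SPLICE_F_MORE);
		pos += read_len;
		bytes += read_len;
		len -= read_len;
	}
	return bytes;
}
```

For example, with a 10-byte file, a 16-byte request and 4-byte reads, the model makes three actor calls, the last of which still carries MORE, and then reports that ->splice_eof() should be invoked; with a 16-byte file the final call clears MORE and no EOF hook is needed.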
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jan Kara <jack@suse.cz>
cc: Jeff Layton <jlayton@kernel.org>
cc: David Hildenbrand <david@redhat.com>
cc: Christian Brauner <brauner@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-mm@kvack.org
cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/499791.1685485603@warthog.procyon.org.uk/ [1]
Link: https://lore.kernel.org/r/1591392508-14592-1-git-send-email-pooja.trivedi@stackpath.com/ [2]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=d452d48b9f8b1a7f8152d33ef52cfd7fe1735b0a [3]
---
Notes:
ver #4)
- Use ->splice_eof() to signal a premature EOF to the splice output.
fs/splice.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/fs/splice.c b/fs/splice.c
index 67dbd85db207..67ddaac1f5c5 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1063,13 +1063,17 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
*/
bytes = 0;
len = sd->total_len;
+
+ /* Don't block on output, we have to drain the direct pipe. */
flags = sd->flags;
+ sd->flags &= ~SPLICE_F_NONBLOCK;
/*
- * Don't block on output, we have to drain the direct pipe.
+ * We signal MORE until we've read sufficient data to fulfill the
+ * request and we keep signalling it if the caller set it.
*/
- sd->flags &= ~SPLICE_F_NONBLOCK;
more = sd->flags & SPLICE_F_MORE;
+ sd->flags |= SPLICE_F_MORE;
WARN_ON_ONCE(!pipe_empty(pipe->head, pipe->tail));
@@ -1085,14 +1089,12 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
sd->total_len = read_len;
/*
- * If more data is pending, set SPLICE_F_MORE
- * If this is the last data and SPLICE_F_MORE was not set
- * initially, clears it.
+ * If we now have sufficient data to fulfill the request then
+ * we clear SPLICE_F_MORE if it was not set initially.
*/
- if (read_len < len)
- sd->flags |= SPLICE_F_MORE;
- else if (!more)
+ if (read_len >= len && !more)
sd->flags &= ~SPLICE_F_MORE;
+
/*
* NOTE: nonblocking mode only applies to the input. We
* must not do the output in nonblocking mode as then we
* [PATCH net-next v6 11/14] tls/sw: Support MSG_SPLICE_PAGES
2023-06-07 18:19 [PATCH net-next v6 00/14] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
` (9 preceding siblings ...)
2023-06-07 18:19 ` [PATCH net-next v6 10/14] splice, net: Fix SPLICE_F_MORE signalling in splice_direct_to_actor() David Howells
@ 2023-06-07 18:19 ` David Howells
2023-06-07 18:19 ` [PATCH net-next v6 12/14] tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES David Howells
` (3 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: David Howells @ 2023-06-07 18:19 UTC
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel
Make TLS's sendmsg() support MSG_SPLICE_PAGES. This causes pages to be
spliced from the source iterator if possible.
This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
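The page-extraction loop in tls_sw_sendmsg_splice() in the diff that follows can be pictured with a simplified model in which page fragments are attached by length only and nothing is copied. It is a sketch under assumptions: toy_iter, toy_msg, toy_extract_one_page(), toy_sendmsg_splice() and the four-fragment TOY_MSG_MAX_FRAGS limit are hypothetical stand-ins for iov_iter_extract_pages() with maxpages == 1 and for sk_msg_full(). Unlike the real do/while loop, the model checks for a full message before the first extraction so its fixed-size array cannot overflow.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096
#define TOY_MSG_MAX_FRAGS 4	/* hypothetical stand-in for the sk_msg limit */

/* A flattened source: count bytes starting at offset off in a page. */
struct toy_iter {
	size_t off;	/* offset into the current page */
	size_t count;	/* bytes remaining in the source */
};

/* The plaintext message: fragment lengths only, no data. */
struct toy_msg {
	size_t frag_len[TOY_MSG_MAX_FRAGS];
	int nr_frags;
	size_t size;
};

/* Extract at most one page's worth of the source (cf. maxpages == 1). */
static size_t toy_extract_one_page(struct toy_iter *it, size_t max)
{
	size_t part = TOY_PAGE_SIZE - it->off;

	if (part > max)
		part = max;
	if (part > it->count)
		part = it->count;
	it->off = (it->off + part) % TOY_PAGE_SIZE;
	it->count -= part;
	return part;
}

/* Attach fragments until the budget is spent or the message is full. */
static int toy_sendmsg_splice(struct toy_iter *it, struct toy_msg *msg,
			      size_t try_to_copy, size_t *copied)
{
	while (try_to_copy && msg->nr_frags < TOY_MSG_MAX_FRAGS) {
		size_t part = toy_extract_one_page(it, try_to_copy);

		if (part == 0)
			return -1;	/* models the -EIO path */
		msg->frag_len[msg->nr_frags++] = part;
		msg->size += part;
		*copied += part;
		try_to_copy -= part;
	}
	return 0;
}
```

Starting 1000 bytes into a page with 10000 bytes to send, the model attaches three fragments of 3096, 4096 and 2808 bytes; with a larger source it stops once the message holds TOY_MSG_MAX_FRAGS fragments, leaving the rest in the iterator for the next record.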
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
Notes:
ver #6)
- In tls_sw_sendmsg_splice(), remove unused put_page.
- In tls_sw_sendmsg(), don't set pending_open_record_frags twice.
ver #2)
- "rls_" should be "tls_".
net/tls/tls_sw.c | 41 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 41 insertions(+)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index a2fb0256ff1c..2d2bb933d2a6 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -931,6 +931,35 @@ static int tls_sw_push_pending_record(struct sock *sk, int flags)
&copied, flags);
}
+static int tls_sw_sendmsg_splice(struct sock *sk, struct msghdr *msg,
+ struct sk_msg *msg_pl, size_t try_to_copy,
+ ssize_t *copied)
+{
+ struct page *page = NULL, **pages = &page;
+
+ do {
+ ssize_t part;
+ size_t off;
+
+ part = iov_iter_extract_pages(&msg->msg_iter, &pages,
+ try_to_copy, 1, 0, &off);
+ if (part <= 0)
+ return part ?: -EIO;
+
+ if (WARN_ON_ONCE(!sendpage_ok(page))) {
+ iov_iter_revert(&msg->msg_iter, part);
+ return -EIO;
+ }
+
+ sk_msg_page_add(msg_pl, page, part, off);
+ sk_mem_charge(sk, part);
+ *copied += part;
+ try_to_copy -= part;
+ } while (try_to_copy && !sk_msg_full(msg_pl));
+
+ return 0;
+}
+
int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
{
long timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
@@ -1020,6 +1049,17 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
full_record = true;
}
+ if (try_to_copy && (msg->msg_flags & MSG_SPLICE_PAGES)) {
+ ret = tls_sw_sendmsg_splice(sk, msg, msg_pl,
+ try_to_copy, &copied);
+ if (ret < 0)
+ goto send_end;
+ tls_ctx->pending_open_record_frags = true;
+ if (full_record || eor || sk_msg_full(msg_pl))
+ goto copied;
+ continue;
+ }
+
if (!is_kvec && (full_record || eor) && !async_capable) {
u32 first = msg_pl->sg.end;
@@ -1084,6 +1124,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
*/
tls_ctx->pending_open_record_frags = true;
copied += try_to_copy;
+copied:
if (full_record || eor) {
ret = bpf_exec_tx_verdict(msg_pl, sk, full_record,
record_type, &copied,
* [PATCH net-next v6 12/14] tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES
2023-06-07 18:19 [PATCH net-next v6 00/14] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
` (10 preceding siblings ...)
2023-06-07 18:19 ` [PATCH net-next v6 11/14] tls/sw: Support MSG_SPLICE_PAGES David Howells
@ 2023-06-07 18:19 ` David Howells
2023-06-07 18:19 ` [PATCH net-next v6 13/14] tls/device: Support MSG_SPLICE_PAGES David Howells
` (2 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: David Howells @ 2023-06-07 18:19 UTC
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel, bpf
Convert tls_sw_sendpage() and tls_sw_sendpage_locked() to use sendmsg()
with MSG_SPLICE_PAGES rather than directly splicing in the pages itself.
[!] Note that tls_sw_sendpage_locked() appears to have the wrong locking
upstream. I think the caller will only hold the socket lock, but it
should hold tls_ctx->tx_lock too.
This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
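The conversion amounts to wrapping the single page fragment in a one-element bio_vec iterator and translating MSG_SENDPAGE_NOTLAST into MSG_MORE before calling the sendmsg path. The following sketch models just that step; the DEMO_* flag values, struct demo_bvec, struct demo_msghdr and demo_sendpage_to_msg() are hypothetical and do not match the kernel's constants or structures (the real flags are defined in include/linux/socket.h).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for this model only. */
#define DEMO_MSG_DONTWAIT         0x0001
#define DEMO_MSG_MORE             0x0002
#define DEMO_MSG_SENDPAGE_NOTLAST 0x0004
#define DEMO_MSG_SPLICE_PAGES     0x0008

/* One page fragment, in the spirit of struct bio_vec. */
struct demo_bvec {
	const void *page;
	size_t len;
	size_t offset;
};

/* Just the parts of a message header this model needs. */
struct demo_msghdr {
	unsigned int flags;
	const struct demo_bvec *iter;	/* single-segment source */
	size_t iter_count;		/* number of segments */
	size_t iter_len;		/* total bytes in the source */
};

/* Turn sendpage() arguments into an equivalent sendmsg() request. */
static void demo_sendpage_to_msg(const void *page, size_t offset, size_t size,
				 unsigned int flags, struct demo_bvec *bvec,
				 struct demo_msghdr *msg)
{
	msg->flags = flags | DEMO_MSG_SPLICE_PAGES;
	if (flags & DEMO_MSG_SENDPAGE_NOTLAST)
		msg->flags |= DEMO_MSG_MORE;	/* "not last" means "more to come" */
	bvec->page = page;
	bvec->len = size;
	bvec->offset = offset;
	msg->iter = bvec;
	msg->iter_count = 1;
	msg->iter_len = size;
}
```

Keeping the translation in the sendpage wrappers means the sendmsg path only has to understand MSG_MORE and MSG_SPLICE_PAGES, which is what lets the separate tls_sw_do_sendpage() implementation be deleted.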
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
cc: bpf@vger.kernel.org
---
net/tls/tls_sw.c | 173 ++++++++++-------------------------------------
1 file changed, 35 insertions(+), 138 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 2d2bb933d2a6..319f61590d2c 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -960,7 +960,8 @@ static int tls_sw_sendmsg_splice(struct sock *sk, struct msghdr *msg,
return 0;
}
-int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
+static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg,
+ size_t size)
{
long timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
struct tls_context *tls_ctx = tls_get_ctx(sk);
@@ -983,15 +984,6 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
int ret = 0;
int pending;
- if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
- MSG_CMSG_COMPAT | MSG_SPLICE_PAGES))
- return -EOPNOTSUPP;
-
- ret = mutex_lock_interruptible(&tls_ctx->tx_lock);
- if (ret)
- return ret;
- lock_sock(sk);
-
if (unlikely(msg->msg_controllen)) {
ret = tls_process_cmsg(sk, msg, &record_type);
if (ret) {
@@ -1192,10 +1184,27 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
send_end:
ret = sk_stream_error(sk, msg->msg_flags, ret);
+ return copied > 0 ? copied : ret;
+}
+int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
+{
+ struct tls_context *tls_ctx = tls_get_ctx(sk);
+ int ret;
+
+ if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
+ MSG_CMSG_COMPAT | MSG_SPLICE_PAGES |
+ MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY))
+ return -EOPNOTSUPP;
+
+ ret = mutex_lock_interruptible(&tls_ctx->tx_lock);
+ if (ret)
+ return ret;
+ lock_sock(sk);
+ ret = tls_sw_sendmsg_locked(sk, msg, size);
release_sock(sk);
mutex_unlock(&tls_ctx->tx_lock);
- return copied > 0 ? copied : ret;
+ return ret;
}
/*
@@ -1272,151 +1281,39 @@ void tls_sw_splice_eof(struct socket *sock)
mutex_unlock(&tls_ctx->tx_lock);
}
-static int tls_sw_do_sendpage(struct sock *sk, struct page *page,
- int offset, size_t size, int flags)
-{
- long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
- struct tls_context *tls_ctx = tls_get_ctx(sk);
- struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
- struct tls_prot_info *prot = &tls_ctx->prot_info;
- unsigned char record_type = TLS_RECORD_TYPE_DATA;
- struct sk_msg *msg_pl;
- struct tls_rec *rec;
- int num_async = 0;
- ssize_t copied = 0;
- bool full_record;
- int record_room;
- int ret = 0;
- bool eor;
-
- eor = !(flags & MSG_SENDPAGE_NOTLAST);
- sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk);
-
- /* Call the sk_stream functions to manage the sndbuf mem. */
- while (size > 0) {
- size_t copy, required_size;
-
- if (sk->sk_err) {
- ret = -sk->sk_err;
- goto sendpage_end;
- }
-
- if (ctx->open_rec)
- rec = ctx->open_rec;
- else
- rec = ctx->open_rec = tls_get_rec(sk);
- if (!rec) {
- ret = -ENOMEM;
- goto sendpage_end;
- }
-
- msg_pl = &rec->msg_plaintext;
-
- full_record = false;
- record_room = TLS_MAX_PAYLOAD_SIZE - msg_pl->sg.size;
- copy = size;
- if (copy >= record_room) {
- copy = record_room;
- full_record = true;
- }
-
- required_size = msg_pl->sg.size + copy + prot->overhead_size;
-
- if (!sk_stream_memory_free(sk))
- goto wait_for_sndbuf;
-alloc_payload:
- ret = tls_alloc_encrypted_msg(sk, required_size);
- if (ret) {
- if (ret != -ENOSPC)
- goto wait_for_memory;
-
- /* Adjust copy according to the amount that was
- * actually allocated. The difference is due
- * to max sg elements limit
- */
- copy -= required_size - msg_pl->sg.size;
- full_record = true;
- }
-
- sk_msg_page_add(msg_pl, page, copy, offset);
- sk_mem_charge(sk, copy);
-
- offset += copy;
- size -= copy;
- copied += copy;
-
- tls_ctx->pending_open_record_frags = true;
- if (full_record || eor || sk_msg_full(msg_pl)) {
- ret = bpf_exec_tx_verdict(msg_pl, sk, full_record,
- record_type, &copied, flags);
- if (ret) {
- if (ret == -EINPROGRESS)
- num_async++;
- else if (ret == -ENOMEM)
- goto wait_for_memory;
- else if (ret != -EAGAIN) {
- if (ret == -ENOSPC)
- ret = 0;
- goto sendpage_end;
- }
- }
- }
- continue;
-wait_for_sndbuf:
- set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
-wait_for_memory:
- ret = sk_stream_wait_memory(sk, &timeo);
- if (ret) {
- if (ctx->open_rec)
- tls_trim_both_msgs(sk, msg_pl->sg.size);
- goto sendpage_end;
- }
-
- if (ctx->open_rec)
- goto alloc_payload;
- }
-
- if (num_async) {
- /* Transmit if any encryptions have completed */
- if (test_and_clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask)) {
- cancel_delayed_work(&ctx->tx_work.work);
- tls_tx_records(sk, flags);
- }
- }
-sendpage_end:
- ret = sk_stream_error(sk, flags, ret);
- return copied > 0 ? copied : ret;
-}
-
int tls_sw_sendpage_locked(struct sock *sk, struct page *page,
int offset, size_t size, int flags)
{
+ struct bio_vec bvec;
+ struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
+
if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY |
MSG_NO_SHARED_FRAGS))
return -EOPNOTSUPP;
+ if (flags & MSG_SENDPAGE_NOTLAST)
+ msg.msg_flags |= MSG_MORE;
- return tls_sw_do_sendpage(sk, page, offset, size, flags);
+ bvec_set_page(&bvec, page, size, offset);
+ iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+ return tls_sw_sendmsg_locked(sk, &msg, size);
}
int tls_sw_sendpage(struct sock *sk, struct page *page,
int offset, size_t size, int flags)
{
- struct tls_context *tls_ctx = tls_get_ctx(sk);
- int ret;
+ struct bio_vec bvec;
+ struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY))
return -EOPNOTSUPP;
+ if (flags & MSG_SENDPAGE_NOTLAST)
+ msg.msg_flags |= MSG_MORE;
- ret = mutex_lock_interruptible(&tls_ctx->tx_lock);
- if (ret)
- return ret;
- lock_sock(sk);
- ret = tls_sw_do_sendpage(sk, page, offset, size, flags);
- release_sock(sk);
- mutex_unlock(&tls_ctx->tx_lock);
- return ret;
+ bvec_set_page(&bvec, page, size, offset);
+ iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+ return tls_sw_sendmsg(sk, &msg, size);
}
static int
* [PATCH net-next v6 13/14] tls/device: Support MSG_SPLICE_PAGES
2023-06-07 18:19 [PATCH net-next v6 00/14] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
` (11 preceding siblings ...)
2023-06-07 18:19 ` [PATCH net-next v6 12/14] tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES David Howells
@ 2023-06-07 18:19 ` David Howells
2023-06-07 18:19 ` [PATCH net-next v6 14/14] tls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES David Howells
2023-06-09 3:40 ` [PATCH net-next v6 00/14] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS patchwork-bot+netdevbpf
14 siblings, 0 replies; 16+ messages in thread
From: David Howells @ 2023-06-07 18:19 UTC
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel
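Make TLS's device sendmsg() support MSG_SPLICE_PAGES. If zerocopy sendfile
is enabled on the socket (tls_ctx->zerocopy_sendfile), this causes pages to
be spliced directly from the source iterator; otherwise MSG_SPLICE_PAGES is
cleared and the data is copied as for a normal sendmsg().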
This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
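A userspace way to reach this path is sendfile() into a socket: with this series the socket end of a splice is fed by sendmsg(MSG_SPLICE_PAGES) (patch 03/14), and on a device-offloaded kTLS socket that sendmsg() ends up in tls_push_data(). The sketch below is a minimal illustration under loud assumptions: an AF_UNIX stream socketpair stands in for the kTLS socket (real kTLS needs a completed TLS handshake and offload-capable hardware, both out of scope), so nothing is encrypted, and sendfile_roundtrip() is a hypothetical helper, not an existing API. On a real kTLS socket, zerocopy_sendfile would also have to be enabled, which as far as we know is done with the TLS_TX_ZEROCOPY_RO socket option.

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/sendfile.h>

/*
 * Hypothetical helper: stage `len` bytes in a temporary file, sendfile()
 * them into one end of an AF_UNIX stream socketpair (a stand-in for a
 * kTLS socket; no TLS here) and read them back from the other end.
 * Keep `len` below the socket buffer size: nothing reads concurrently.
 * Returns the number of bytes read back, or -1 on error.
 */
static ssize_t sendfile_roundtrip(const char *data, size_t len,
                                  char *out, size_t out_len)
{
    int sv[2];
    off_t off = 0;
    ssize_t total = -1;
    FILE *f;

    if (out_len < len)
        return -1;
    f = tmpfile();
    if (!f)
        return -1;
    if (fwrite(data, 1, len, f) != len || fflush(f) != 0)
        goto close_file;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        goto close_file;

    /* sendfile() is built on splice; on kernels with this series the
     * socket end is fed by sendmsg(MSG_SPLICE_PAGES). */
    while ((size_t)off < len) {
        ssize_t n = sendfile(sv[0], fileno(f), &off, len - (size_t)off);
        if (n <= 0)
            goto close_socks;
    }

    /* Drain the other end until everything has arrived. */
    total = 0;
    while ((size_t)total < len) {
        ssize_t n = read(sv[1], out + total, len - (size_t)total);
        if (n <= 0) {
            total = -1;
            break;
        }
        total += n;
    }

close_socks:
    close(sv[0]);
    close(sv[1]);
close_file:
    fclose(f);
    return total;
}
```

The bytes that arrive are the same with or without this series; what changes is the kernel-internal route, which now goes through ->sendmsg() rather than ->sendpage().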
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
net/tls/tls_device.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index 439be833dcf9..bb3bb523544e 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -509,6 +509,29 @@ static int tls_push_data(struct sock *sk,
tls_append_frag(record, &zc_pfrag, copy);
iter_offset.offset += copy;
+ } else if (copy && (flags & MSG_SPLICE_PAGES)) {
+ struct page_frag zc_pfrag;
+ struct page **pages = &zc_pfrag.page;
+ size_t off;
+
+ rc = iov_iter_extract_pages(iter_offset.msg_iter,
+ &pages, copy, 1, 0, &off);
+ if (rc <= 0) {
+ if (rc == 0)
+ rc = -EIO;
+ goto handle_error;
+ }
+ copy = rc;
+
+ if (WARN_ON_ONCE(!sendpage_ok(zc_pfrag.page))) {
+ iov_iter_revert(iter_offset.msg_iter, copy);
+ rc = -EIO;
+ goto handle_error;
+ }
+
+ zc_pfrag.offset = off;
+ zc_pfrag.size = copy;
+ tls_append_frag(record, &zc_pfrag, copy);
} else if (copy) {
copy = min_t(size_t, copy, pfrag->size - pfrag->offset);
@@ -572,6 +595,9 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
union tls_iter_offset iter;
int rc;
+ if (!tls_ctx->zerocopy_sendfile)
+ msg->msg_flags &= ~MSG_SPLICE_PAGES;
+
mutex_lock(&tls_ctx->tx_lock);
lock_sock(sk);
* [PATCH net-next v6 14/14] tls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES
2023-06-07 18:19 [PATCH net-next v6 00/14] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
` (12 preceding siblings ...)
2023-06-07 18:19 ` [PATCH net-next v6 13/14] tls/device: Support MSG_SPLICE_PAGES David Howells
@ 2023-06-07 18:19 ` David Howells
2023-06-09 3:40 ` [PATCH net-next v6 00/14] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS patchwork-bot+netdevbpf
14 siblings, 0 replies; 16+ messages in thread
From: David Howells @ 2023-06-07 18:19 UTC
To: netdev, Linus Torvalds
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel
Convert tls_device_sendpage() to use sendmsg() with MSG_SPLICE_PAGES rather
than splicing in the pages itself. With that, the tls_iter_offset union is
no longer necessary and can be replaced with an iov_iter pointer, and the
zc_page argument to tls_push_data() can be removed.
This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
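After this change tls_device_sendpage() is a thin wrapper: it describes (page, offset, size) as a one-element bio_vec iterator, maps MSG_SENDPAGE_NOTLAST to MSG_MORE, rejects MSG_OOB and hands off to tls_device_sendmsg(). bio_vec and MSG_SENDPAGE_NOTLAST are kernel-internal, so the sketch below only mirrors the shape of that wrapper with userspace stand-ins, as an assumption-laden analogue rather than the kernel code: a single struct iovec replaces the bio_vec, a hypothetical `notlast` argument replaces MSG_SENDPAGE_NOTLAST, and send_range() is an illustrative name, not an existing API.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/socket.h>

/*
 * Hypothetical userspace analogue of the converted sendpage wrapper.
 * buf[offset, offset + size) plays the part of (page, offset, size) and is
 * described as a one-segment iterator; `notlast` plays the part of
 * MSG_SENDPAGE_NOTLAST; sendmsg() does the actual work.
 */
static ssize_t send_range(int fd, const char *buf, size_t offset,
                          size_t size, int flags, bool notlast)
{
    /* One segment, like bvec_set_page() + iov_iter_bvec(..., 1, size). */
    struct iovec iov = {
        .iov_base = (void *)(buf + offset),
        .iov_len  = size,
    };
    struct msghdr msg = {
        .msg_iov    = &iov,
        .msg_iovlen = 1,
    };

    /* The converted tls_device_sendpage() refuses MSG_OOB with -EOPNOTSUPP. */
    if (flags & MSG_OOB) {
        errno = EOPNOTSUPP;
        return -1;
    }
    /* "Not the last page" becomes "more data follows". */
    if (notlast)
        flags |= MSG_MORE;
    return sendmsg(fd, &msg, flags);
}
```

Keeping the wrapper this thin is the point of the conversion: in the patch, taking tx_lock and the socket lock, deciding on zerocopy and building records all happen in the sendmsg() path, so sendpage only has to translate its arguments.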
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
net/tls/tls_device.c | 92 +++++++++++---------------------------------
1 file changed, 23 insertions(+), 69 deletions(-)
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index bb3bb523544e..b4864d55900f 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -422,16 +422,10 @@ static int tls_device_copy_data(void *addr, size_t bytes, struct iov_iter *i)
return 0;
}
-union tls_iter_offset {
- struct iov_iter *msg_iter;
- int offset;
-};
-
static int tls_push_data(struct sock *sk,
- union tls_iter_offset iter_offset,
+ struct iov_iter *iter,
size_t size, int flags,
- unsigned char record_type,
- struct page *zc_page)
+ unsigned char record_type)
{
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_prot_info *prot = &tls_ctx->prot_info;
@@ -500,22 +494,13 @@ static int tls_push_data(struct sock *sk,
record = ctx->open_record;
copy = min_t(size_t, size, max_open_record_len - record->len);
- if (copy && zc_page) {
- struct page_frag zc_pfrag;
-
- zc_pfrag.page = zc_page;
- zc_pfrag.offset = iter_offset.offset;
- zc_pfrag.size = copy;
- tls_append_frag(record, &zc_pfrag, copy);
-
- iter_offset.offset += copy;
- } else if (copy && (flags & MSG_SPLICE_PAGES)) {
+ if (copy && (flags & MSG_SPLICE_PAGES)) {
struct page_frag zc_pfrag;
struct page **pages = &zc_pfrag.page;
size_t off;
- rc = iov_iter_extract_pages(iter_offset.msg_iter,
- &pages, copy, 1, 0, &off);
+ rc = iov_iter_extract_pages(iter, &pages,
+ copy, 1, 0, &off);
if (rc <= 0) {
if (rc == 0)
rc = -EIO;
@@ -524,7 +509,7 @@ static int tls_push_data(struct sock *sk,
copy = rc;
if (WARN_ON_ONCE(!sendpage_ok(zc_pfrag.page))) {
- iov_iter_revert(iter_offset.msg_iter, copy);
+ iov_iter_revert(iter, copy);
rc = -EIO;
goto handle_error;
}
@@ -537,7 +522,7 @@ static int tls_push_data(struct sock *sk,
rc = tls_device_copy_data(page_address(pfrag->page) +
pfrag->offset, copy,
- iter_offset.msg_iter);
+ iter);
if (rc)
goto handle_error;
tls_append_frag(record, pfrag, copy);
@@ -592,7 +577,6 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
{
unsigned char record_type = TLS_RECORD_TYPE_DATA;
struct tls_context *tls_ctx = tls_get_ctx(sk);
- union tls_iter_offset iter;
int rc;
if (!tls_ctx->zerocopy_sendfile)
@@ -607,8 +591,8 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
goto out;
}
- iter.msg_iter = &msg->msg_iter;
- rc = tls_push_data(sk, iter, size, msg->msg_flags, record_type, NULL);
+ rc = tls_push_data(sk, &msg->msg_iter, size, msg->msg_flags,
+ record_type);
out:
release_sock(sk);
@@ -620,8 +604,7 @@ void tls_device_splice_eof(struct socket *sock)
{
struct sock *sk = sock->sk;
struct tls_context *tls_ctx = tls_get_ctx(sk);
- union tls_iter_offset iter;
- struct iov_iter iov_iter = {};
+ struct iov_iter iter = {};
if (!tls_is_partially_sent_record(tls_ctx))
return;
@@ -630,9 +613,8 @@ void tls_device_splice_eof(struct socket *sock)
lock_sock(sk);
if (tls_is_partially_sent_record(tls_ctx)) {
- iov_iter_bvec(&iov_iter, ITER_SOURCE, NULL, 0, 0);
- iter.msg_iter = &iov_iter;
- tls_push_data(sk, iter, 0, 0, TLS_RECORD_TYPE_DATA, NULL);
+ iov_iter_bvec(&iter, ITER_SOURCE, NULL, 0, 0);
+ tls_push_data(sk, &iter, 0, 0, TLS_RECORD_TYPE_DATA);
}
release_sock(sk);
@@ -642,44 +624,18 @@ void tls_device_splice_eof(struct socket *sock)
int tls_device_sendpage(struct sock *sk, struct page *page,
int offset, size_t size, int flags)
{
- struct tls_context *tls_ctx = tls_get_ctx(sk);
- union tls_iter_offset iter_offset;
- struct iov_iter msg_iter;
- char *kaddr;
- struct kvec iov;
- int rc;
+ struct bio_vec bvec;
+ struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
if (flags & MSG_SENDPAGE_NOTLAST)
- flags |= MSG_MORE;
-
- mutex_lock(&tls_ctx->tx_lock);
- lock_sock(sk);
+ msg.msg_flags |= MSG_MORE;
- if (flags & MSG_OOB) {
- rc = -EOPNOTSUPP;
- goto out;
- }
-
- if (tls_ctx->zerocopy_sendfile) {
- iter_offset.offset = offset;
- rc = tls_push_data(sk, iter_offset, size,
- flags, TLS_RECORD_TYPE_DATA, page);
- goto out;
- }
-
- kaddr = kmap(page);
- iov.iov_base = kaddr + offset;
- iov.iov_len = size;
- iov_iter_kvec(&msg_iter, ITER_SOURCE, &iov, 1, size);
- iter_offset.msg_iter = &msg_iter;
- rc = tls_push_data(sk, iter_offset, size, flags, TLS_RECORD_TYPE_DATA,
- NULL);
- kunmap(page);
+ if (flags & MSG_OOB)
+ return -EOPNOTSUPP;
-out:
- release_sock(sk);
- mutex_unlock(&tls_ctx->tx_lock);
- return rc;
+ bvec_set_page(&bvec, page, size, offset);
+ iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+ return tls_device_sendmsg(sk, &msg, size);
}
struct tls_record_info *tls_get_record(struct tls_offload_context_tx *context,
@@ -744,12 +700,10 @@ EXPORT_SYMBOL(tls_get_record);
static int tls_device_push_pending_record(struct sock *sk, int flags)
{
- union tls_iter_offset iter;
- struct iov_iter msg_iter;
+ struct iov_iter iter;
- iov_iter_kvec(&msg_iter, ITER_SOURCE, NULL, 0, 0);
- iter.msg_iter = &msg_iter;
- return tls_push_data(sk, iter, 0, flags, TLS_RECORD_TYPE_DATA, NULL);
+ iov_iter_kvec(&iter, ITER_SOURCE, NULL, 0, 0);
+ return tls_push_data(sk, &iter, 0, flags, TLS_RECORD_TYPE_DATA);
}
void tls_device_write_space(struct sock *sk, struct tls_context *ctx)
* Re: [PATCH net-next v6 00/14] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS
2023-06-07 18:19 [PATCH net-next v6 00/14] splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS David Howells
` (13 preceding siblings ...)
2023-06-07 18:19 ` [PATCH net-next v6 14/14] tls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES David Howells
@ 2023-06-09 3:40 ` patchwork-bot+netdevbpf
14 siblings, 0 replies; 16+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-06-09 3:40 UTC
To: David Howells
Cc: netdev, torvalds, chuck.lever, borisp, john.fastabend, kuba,
davem, edumazet, pabeni, willemdebruijn.kernel, dsahern, willy,
axboe, linux-mm, linux-kernel
Hello:
This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Wed, 7 Jun 2023 19:19:06 +0100 you wrote:
> Here are patches to do the following:
>
> (1) Block MSG_SENDPAGE_* flags from leaking into ->sendmsg() from
> userspace, whilst allowing splice_to_socket() to pass them in.
>
> (2) Allow MSG_SPLICE_PAGES to be passed into tls_*_sendmsg(). Until
> support is added, it will be ignored and a splice-driven sendmsg()
> will be treated like a normal sendmsg(). TCP, UDP, AF_UNIX and
> Chelsio-TLS already handle the flag in net-next.
>
> [...]
Here is the summary with links:
- [net-next,v6,01/14] net: Block MSG_SENDPAGE_* from being passed to sendmsg() by userspace
https://git.kernel.org/netdev/net-next/c/4fe38acdac8a
- [net-next,v6,02/14] tls: Allow MSG_SPLICE_PAGES but treat it as normal sendmsg
https://git.kernel.org/netdev/net-next/c/81840b3b91aa
- [net-next,v6,03/14] splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage()
https://git.kernel.org/netdev/net-next/c/2dc334f1a63a
- [net-next,v6,04/14] splice, net: Add a splice_eof op to file-ops and socket-ops
https://git.kernel.org/netdev/net-next/c/2bfc66850952
- [net-next,v6,05/14] tls/sw: Use splice_eof() to flush
https://git.kernel.org/netdev/net-next/c/df720d288dbb
- [net-next,v6,06/14] tls/device: Use splice_eof() to flush
https://git.kernel.org/netdev/net-next/c/d4c1e80b0d1b
- [net-next,v6,07/14] ipv4, ipv6: Use splice_eof() to flush
https://git.kernel.org/netdev/net-next/c/1d7e4538a546
- [net-next,v6,08/14] chelsio/chtls: Use splice_eof() to flush
https://git.kernel.org/netdev/net-next/c/c289a1601abd
- [net-next,v6,09/14] kcm: Use splice_eof() to flush
https://git.kernel.org/netdev/net-next/c/951ace995138
- [net-next,v6,10/14] splice, net: Fix SPLICE_F_MORE signalling in splice_direct_to_actor()
https://git.kernel.org/netdev/net-next/c/219d92056ba3
- [net-next,v6,11/14] tls/sw: Support MSG_SPLICE_PAGES
https://git.kernel.org/netdev/net-next/c/fe1e81d4f73b
- [net-next,v6,12/14] tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES
https://git.kernel.org/netdev/net-next/c/45e5be844ab6
- [net-next,v6,13/14] tls/device: Support MSG_SPLICE_PAGES
https://git.kernel.org/netdev/net-next/c/24763c9c0980
- [net-next,v6,14/14] tls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES
https://git.kernel.org/netdev/net-next/c/3dc8976c7ad6
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html