* [PATCH net-next v2 1/6] splice, net: Fix MSG_MORE signalling in splice_direct_to_actor()
2023-05-31 12:45 [PATCH net-next v2 0/6] splice, net: Handle MSG_SPLICE_PAGES in AF_TLS David Howells
@ 2023-05-31 12:45 ` David Howells
2023-05-31 12:45 ` [PATCH net-next v2 2/6] net: Block MSG_SENDPAGE_* from being passed to sendmsg() by userspace David Howells
` (4 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: David Howells @ 2023-05-31 12:45 UTC (permalink / raw)
To: netdev
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel, Christoph Hellwig, Linus Torvalds,
Al Viro, Jan Kara, Jeff Layton, David Hildenbrand,
Christian Brauner, linux-fsdevel, linux-block
splice_direct_to_actor() doesn't manage SPLICE_F_MORE correctly - and, as a
result, incorrectly signals MSG_MORE when splicing to a socket. The
problem happens when a short splice occurs because we got a short read due
to hitting the EOF on a file. Because the length read (read_len) is less
than the remaining size to be spliced (len), SPLICE_F_MORE is set.
This causes MSG_MORE to be set by pipe_to_sendpage(), indicating to the
network protocol that more data is to be expected. With the changes I want
to make to switch from using sendpage to using sendmsg(MSG_SPLICE_PAGES),
MSG_MORE needs to work properly.
This was observed with the multi_chunk_sendfile tests in the tls kselftest
program. Some of those tests would hang and time out when the last chunk
of file was less than the sendfile request size.
This has been observed before[1] and worked around in AF_TLS[2].
Fix this by checking to see if the source file is seekable if we get a
short read and, if it is, checking to see if we hit the file size. This
should also work for block devices.
This won't help procfiles and suchlike as they're zero length files that
can be read from[3]. To handle that, should splice make a zero-length call
with SPLICE_F_MORE cleared (assuming it wasn't set by userspace via
splice()) if it gets a zero-length read?
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christoph Hellwig <hch@lst.de>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jan Kara <jack@suse.cz>
cc: Jeff Layton <jlayton@kernel.org>
cc: David Hildenbrand <david@redhat.com>
cc: Christian Brauner <brauner@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-mm@kvack.org
cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/netdev/1591392508-14592-1-git-send-email-pooja.trivedi@stackpath.com/ [1]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=d452d48b9f8b1a7f8152d33ef52cfd7fe1735b0a [2]
Link: https://lore.kernel.org/r/CAHk-=wjDq5_wLWrapzFiJ3ZNn6aGFWeMJpAj5q+4z-Ok8DD9dA@mail.gmail.com/ [3]
---
fs/splice.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/fs/splice.c b/fs/splice.c
index 3e06611d19ae..a7cf216c02a7 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -982,10 +982,21 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
* If this is the last data and SPLICE_F_MORE was not set
* initially, clears it.
*/
- if (read_len < len)
- sd->flags |= SPLICE_F_MORE;
- else if (!more)
+ if (read_len < len) {
+ struct inode *ii = in->f_mapping->host;
+
+ if (ii->i_fop->llseek != noop_llseek &&
+ pos >= i_size_read(ii)) {
+ if (!more)
+ sd->flags &= ~SPLICE_F_MORE;
+ } else {
+ sd->flags |= SPLICE_F_MORE;
+ }
+
+ } else if (!more) {
sd->flags &= ~SPLICE_F_MORE;
+ }
+
/*
* NOTE: nonblocking mode only applies to the input. We
* must not do the output in nonblocking mode as then we
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH net-next v2 2/6] net: Block MSG_SENDPAGE_* from being passed to sendmsg() by userspace
2023-05-31 12:45 [PATCH net-next v2 0/6] splice, net: Handle MSG_SPLICE_PAGES in AF_TLS David Howells
2023-05-31 12:45 ` [PATCH net-next v2 1/6] splice, net: Fix MSG_MORE signalling in splice_direct_to_actor() David Howells
@ 2023-05-31 12:45 ` David Howells
2023-05-31 17:02 ` Simon Horman
2023-05-31 12:45 ` [PATCH net-next v2 3/6] tls/sw: Support MSG_SPLICE_PAGES David Howells
` (3 subsequent siblings)
5 siblings, 1 reply; 8+ messages in thread
From: David Howells @ 2023-05-31 12:45 UTC (permalink / raw)
To: netdev
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel
It is necessary to allow MSG_SENDPAGE_* to be passed into ->sendmsg() to
allow sendmsg(MSG_SPLICE_PAGES) to replace ->sendpage(). Unblocking them
in the network protocol, however, allows these flags to be passed in by
userspace too[1].
Fix this by marking MSG_SENDPAGE_NOPOLICY, MSG_SENDPAGE_NOTLAST and
MSG_SENDPAGE_DECRYPTED as internal flags, which causes sendmsg() to object
if they are passed to sendmsg() by userspace. Network protocol ->sendmsg()
implementations can then allow them through.
Note that it should be possible to remove MSG_SENDPAGE_NOTLAST once
sendpage is removed as a whole slew of pages will be passed in in one go by
splice through sendmsg, with MSG_MORE being set if it has more data waiting
in the pipe.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20230526181338.03a99016@kernel.org/ [1]
---
include/linux/socket.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/linux/socket.h b/include/linux/socket.h
index bd1cc3238851..3fd3436bc09f 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -339,7 +339,9 @@ struct ucred {
#endif
/* Flags to be cleared on entry by sendmsg and sendmmsg syscalls */
-#define MSG_INTERNAL_SENDMSG_FLAGS (MSG_SPLICE_PAGES)
+#define MSG_INTERNAL_SENDMSG_FLAGS \
+ (MSG_SPLICE_PAGES | MSG_SENDPAGE_NOPOLICY | MSG_SENDPAGE_NOTLAST | \
+ MSG_SENDPAGE_DECRYPTED)
/* Setsockoptions(2) level. Thanks to BSD these must match IPPROTO_xxx */
#define SOL_IP 0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH net-next v2 2/6] net: Block MSG_SENDPAGE_* from being passed to sendmsg() by userspace
2023-05-31 12:45 ` [PATCH net-next v2 2/6] net: Block MSG_SENDPAGE_* from being passed to sendmsg() by userspace David Howells
@ 2023-05-31 17:02 ` Simon Horman
0 siblings, 0 replies; 8+ messages in thread
From: Simon Horman @ 2023-05-31 17:02 UTC (permalink / raw)
To: David Howells
Cc: netdev, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel
On Wed, May 31, 2023 at 01:45:24PM +0100, David Howells wrote:
> It is necessary to allow MSG_SENDPAGE_* to be passed into ->sendmsg() to
> allow sendmsg(MSG_SPLICE_PAGES) to replace ->sendpage(). Unblocking them
> in the network protocol, however, allows these flags to be passed in by
> userspace too[1].
>
> Fix this by marking MSG_SENDPAGE_NOPOLICY, MSG_SENDPAGE_NOTLAST and
> MSG_SENDPAGE_DECRYPTED as internal flags, which causes sendmsg() to object
> if they are passed to sendmsg() by userspace. Network protocol ->sendmsg()
> implementations can then allow them through.
>
> Note that it should be possible to remove MSG_SENDPAGE_NOTLAST once
> sendpage is removed as a whole slew of pages will be passed in in one go by
Hi David,
on the off-chance that you need to respin for some other reason:
s/in in/in/
> splice through sendmsg, with MSG_MORE being set if it has more data waiting
> in the pipe.
...
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH net-next v2 3/6] tls/sw: Support MSG_SPLICE_PAGES
2023-05-31 12:45 [PATCH net-next v2 0/6] splice, net: Handle MSG_SPLICE_PAGES in AF_TLS David Howells
2023-05-31 12:45 ` [PATCH net-next v2 1/6] splice, net: Fix MSG_MORE signalling in splice_direct_to_actor() David Howells
2023-05-31 12:45 ` [PATCH net-next v2 2/6] net: Block MSG_SENDPAGE_* from being passed to sendmsg() by userspace David Howells
@ 2023-05-31 12:45 ` David Howells
2023-05-31 12:45 ` [PATCH net-next v2 4/6] tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES David Howells
` (2 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: David Howells @ 2023-05-31 12:45 UTC (permalink / raw)
To: netdev
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel
Make TLS's sendmsg() support MSG_SPLICE_PAGES. This causes pages to be
spliced from the source iterator if possible.
This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
Notes:
ver #2)
- "rls_" should be "tls_".
net/tls/tls_sw.c | 46 +++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 45 insertions(+), 1 deletion(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 6e6a7c37d685..60d807b63f75 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -929,6 +929,38 @@ static int tls_sw_push_pending_record(struct sock *sk, int flags)
&copied, flags);
}
+static int tls_sw_sendmsg_splice(struct sock *sk, struct msghdr *msg,
+ struct sk_msg *msg_pl, size_t try_to_copy,
+ ssize_t *copied)
+{
+ struct page *page = NULL, **pages = &page;
+
+ do {
+ ssize_t part;
+ size_t off;
+ bool put = false;
+
+ part = iov_iter_extract_pages(&msg->msg_iter, &pages,
+ try_to_copy, 1, 0, &off);
+ if (part <= 0)
+ return part ?: -EIO;
+
+ if (WARN_ON_ONCE(!sendpage_ok(page))) {
+ iov_iter_revert(&msg->msg_iter, part);
+ return -EIO;
+ }
+
+ sk_msg_page_add(msg_pl, page, part, off);
+ sk_mem_charge(sk, part);
+ if (put)
+ put_page(page);
+ *copied += part;
+ try_to_copy -= part;
+ } while (try_to_copy && !sk_msg_full(msg_pl));
+
+ return 0;
+}
+
int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
{
long timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
@@ -1018,6 +1050,17 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
full_record = true;
}
+ if (try_to_copy && (msg->msg_flags & MSG_SPLICE_PAGES)) {
+ ret = tls_sw_sendmsg_splice(sk, msg, msg_pl,
+ try_to_copy, &copied);
+ if (ret < 0)
+ goto send_end;
+ tls_ctx->pending_open_record_frags = true;
+ if (full_record || eor || sk_msg_full(msg_pl))
+ goto copied;
+ continue;
+ }
+
if (!is_kvec && (full_record || eor) && !async_capable) {
u32 first = msg_pl->sg.end;
@@ -1080,8 +1123,9 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
/* Open records defined only if successfully copied, otherwise
* we would trim the sg but not reset the open record frags.
*/
- tls_ctx->pending_open_record_frags = true;
copied += try_to_copy;
+copied:
+ tls_ctx->pending_open_record_frags = true;
if (full_record || eor) {
ret = bpf_exec_tx_verdict(msg_pl, sk, full_record,
record_type, &copied,
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH net-next v2 4/6] tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES
2023-05-31 12:45 [PATCH net-next v2 0/6] splice, net: Handle MSG_SPLICE_PAGES in AF_TLS David Howells
` (2 preceding siblings ...)
2023-05-31 12:45 ` [PATCH net-next v2 3/6] tls/sw: Support MSG_SPLICE_PAGES David Howells
@ 2023-05-31 12:45 ` David Howells
2023-05-31 12:45 ` [PATCH net-next v2 5/6] tls/device: Support MSG_SPLICE_PAGES David Howells
2023-05-31 12:45 ` [PATCH net-next v2 6/6] tls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES David Howells
5 siblings, 0 replies; 8+ messages in thread
From: David Howells @ 2023-05-31 12:45 UTC (permalink / raw)
To: netdev
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel, bpf
Convert tls_sw_sendpage() and tls_sw_sendpage_locked() to use sendmsg()
with MSG_SPLICE_PAGES rather than directly splicing in the pages itself.
[!] Note that tls_sw_sendpage_locked() appears to have the wrong locking
upstream. I think the caller will only hold the socket lock, but it
should hold tls_ctx->tx_lock too.
This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
cc: bpf@vger.kernel.org
---
net/tls/tls_sw.c | 165 +++++++++--------------------------------------
1 file changed, 31 insertions(+), 134 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 60d807b63f75..f63e4405cf34 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -961,7 +961,8 @@ static int tls_sw_sendmsg_splice(struct sock *sk, struct msghdr *msg,
return 0;
}
-int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
+static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg,
+ size_t size)
{
long timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
struct tls_context *tls_ctx = tls_get_ctx(sk);
@@ -984,15 +985,6 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
int ret = 0;
int pending;
- if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
- MSG_CMSG_COMPAT))
- return -EOPNOTSUPP;
-
- ret = mutex_lock_interruptible(&tls_ctx->tx_lock);
- if (ret)
- return ret;
- lock_sock(sk);
-
if (unlikely(msg->msg_controllen)) {
ret = tls_process_cmsg(sk, msg, &record_type);
if (ret) {
@@ -1193,157 +1185,62 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
send_end:
ret = sk_stream_error(sk, msg->msg_flags, ret);
-
- release_sock(sk);
- mutex_unlock(&tls_ctx->tx_lock);
return copied > 0 ? copied : ret;
}
-static int tls_sw_do_sendpage(struct sock *sk, struct page *page,
- int offset, size_t size, int flags)
+int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
{
- long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
struct tls_context *tls_ctx = tls_get_ctx(sk);
- struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
- struct tls_prot_info *prot = &tls_ctx->prot_info;
- unsigned char record_type = TLS_RECORD_TYPE_DATA;
- struct sk_msg *msg_pl;
- struct tls_rec *rec;
- int num_async = 0;
- ssize_t copied = 0;
- bool full_record;
- int record_room;
- int ret = 0;
- bool eor;
-
- eor = !(flags & MSG_SENDPAGE_NOTLAST);
- sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk);
-
- /* Call the sk_stream functions to manage the sndbuf mem. */
- while (size > 0) {
- size_t copy, required_size;
-
- if (sk->sk_err) {
- ret = -sk->sk_err;
- goto sendpage_end;
- }
-
- if (ctx->open_rec)
- rec = ctx->open_rec;
- else
- rec = ctx->open_rec = tls_get_rec(sk);
- if (!rec) {
- ret = -ENOMEM;
- goto sendpage_end;
- }
-
- msg_pl = &rec->msg_plaintext;
-
- full_record = false;
- record_room = TLS_MAX_PAYLOAD_SIZE - msg_pl->sg.size;
- copy = size;
- if (copy >= record_room) {
- copy = record_room;
- full_record = true;
- }
-
- required_size = msg_pl->sg.size + copy + prot->overhead_size;
-
- if (!sk_stream_memory_free(sk))
- goto wait_for_sndbuf;
-alloc_payload:
- ret = tls_alloc_encrypted_msg(sk, required_size);
- if (ret) {
- if (ret != -ENOSPC)
- goto wait_for_memory;
-
- /* Adjust copy according to the amount that was
- * actually allocated. The difference is due
- * to max sg elements limit
- */
- copy -= required_size - msg_pl->sg.size;
- full_record = true;
- }
-
- sk_msg_page_add(msg_pl, page, copy, offset);
- sk_mem_charge(sk, copy);
-
- offset += copy;
- size -= copy;
- copied += copy;
-
- tls_ctx->pending_open_record_frags = true;
- if (full_record || eor || sk_msg_full(msg_pl)) {
- ret = bpf_exec_tx_verdict(msg_pl, sk, full_record,
- record_type, &copied, flags);
- if (ret) {
- if (ret == -EINPROGRESS)
- num_async++;
- else if (ret == -ENOMEM)
- goto wait_for_memory;
- else if (ret != -EAGAIN) {
- if (ret == -ENOSPC)
- ret = 0;
- goto sendpage_end;
- }
- }
- }
- continue;
-wait_for_sndbuf:
- set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
-wait_for_memory:
- ret = sk_stream_wait_memory(sk, &timeo);
- if (ret) {
- if (ctx->open_rec)
- tls_trim_both_msgs(sk, msg_pl->sg.size);
- goto sendpage_end;
- }
+ int ret;
- if (ctx->open_rec)
- goto alloc_payload;
- }
+ if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
+ MSG_CMSG_COMPAT | MSG_SPLICE_PAGES |
+ MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY))
+ return -EOPNOTSUPP;
- if (num_async) {
- /* Transmit if any encryptions have completed */
- if (test_and_clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask)) {
- cancel_delayed_work(&ctx->tx_work.work);
- tls_tx_records(sk, flags);
- }
- }
-sendpage_end:
- ret = sk_stream_error(sk, flags, ret);
- return copied > 0 ? copied : ret;
+ ret = mutex_lock_interruptible(&tls_ctx->tx_lock);
+ if (ret)
+ return ret;
+ lock_sock(sk);
+ ret = tls_sw_sendmsg_locked(sk, msg, size);
+ release_sock(sk);
+ mutex_unlock(&tls_ctx->tx_lock);
+ return ret;
}
int tls_sw_sendpage_locked(struct sock *sk, struct page *page,
int offset, size_t size, int flags)
{
+ struct bio_vec bvec;
+ struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
+
if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY |
MSG_NO_SHARED_FRAGS))
return -EOPNOTSUPP;
+ if (flags & MSG_SENDPAGE_NOTLAST)
+ msg.msg_flags |= MSG_MORE;
- return tls_sw_do_sendpage(sk, page, offset, size, flags);
+ bvec_set_page(&bvec, page, size, offset);
+ iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+ return tls_sw_sendmsg_locked(sk, &msg, size);
}
int tls_sw_sendpage(struct sock *sk, struct page *page,
int offset, size_t size, int flags)
{
- struct tls_context *tls_ctx = tls_get_ctx(sk);
- int ret;
+ struct bio_vec bvec;
+ struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY))
return -EOPNOTSUPP;
+ if (flags & MSG_SENDPAGE_NOTLAST)
+ msg.msg_flags |= MSG_MORE;
- ret = mutex_lock_interruptible(&tls_ctx->tx_lock);
- if (ret)
- return ret;
- lock_sock(sk);
- ret = tls_sw_do_sendpage(sk, page, offset, size, flags);
- release_sock(sk);
- mutex_unlock(&tls_ctx->tx_lock);
- return ret;
+ bvec_set_page(&bvec, page, size, offset);
+ iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+ return tls_sw_sendmsg(sk, &msg, size);
}
static int
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH net-next v2 5/6] tls/device: Support MSG_SPLICE_PAGES
2023-05-31 12:45 [PATCH net-next v2 0/6] splice, net: Handle MSG_SPLICE_PAGES in AF_TLS David Howells
` (3 preceding siblings ...)
2023-05-31 12:45 ` [PATCH net-next v2 4/6] tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES David Howells
@ 2023-05-31 12:45 ` David Howells
2023-05-31 12:45 ` [PATCH net-next v2 6/6] tls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES David Howells
5 siblings, 0 replies; 8+ messages in thread
From: David Howells @ 2023-05-31 12:45 UTC (permalink / raw)
To: netdev
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel
Make TLS's device sendmsg() support MSG_SPLICE_PAGES. This causes pages to
be spliced from the source iterator if possible.
This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
net/tls/tls_device.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index a959572a816f..221fb30cba51 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -508,6 +508,29 @@ static int tls_push_data(struct sock *sk,
tls_append_frag(record, &zc_pfrag, copy);
iter_offset.offset += copy;
+ } else if (copy && (flags & MSG_SPLICE_PAGES)) {
+ struct page_frag zc_pfrag;
+ struct page **pages = &zc_pfrag.page;
+ size_t off;
+
+ rc = iov_iter_extract_pages(iter_offset.msg_iter,
+ &pages, copy, 1, 0, &off);
+ if (rc <= 0) {
+ if (rc == 0)
+ rc = -EIO;
+ goto handle_error;
+ }
+ copy = rc;
+
+ if (WARN_ON_ONCE(!sendpage_ok(zc_pfrag.page))) {
+ iov_iter_revert(iter_offset.msg_iter, copy);
+ rc = -EIO;
+ goto handle_error;
+ }
+
+ zc_pfrag.offset = off;
+ zc_pfrag.size = copy;
+ tls_append_frag(record, &zc_pfrag, copy);
} else if (copy) {
copy = min_t(size_t, copy, pfrag->size - pfrag->offset);
@@ -571,6 +594,9 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
union tls_iter_offset iter;
int rc;
+ if (!tls_ctx->zerocopy_sendfile)
+ msg->msg_flags &= ~MSG_SPLICE_PAGES;
+
mutex_lock(&tls_ctx->tx_lock);
lock_sock(sk);
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH net-next v2 6/6] tls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES
2023-05-31 12:45 [PATCH net-next v2 0/6] splice, net: Handle MSG_SPLICE_PAGES in AF_TLS David Howells
` (4 preceding siblings ...)
2023-05-31 12:45 ` [PATCH net-next v2 5/6] tls/device: Support MSG_SPLICE_PAGES David Howells
@ 2023-05-31 12:45 ` David Howells
5 siblings, 0 replies; 8+ messages in thread
From: David Howells @ 2023-05-31 12:45 UTC (permalink / raw)
To: netdev
Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
linux-mm, linux-kernel
Convert tls_device_sendpage() to use sendmsg() with MSG_SPLICE_PAGES rather
than directly splicing in the pages itself. With that, the tls_iter_offset
union is no longer necessary and can be replaced with an iov_iter pointer
and the zc_page argument to tls_push_data() can also be removed.
This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
net/tls/tls_device.c | 84 +++++++++++---------------------------------
1 file changed, 20 insertions(+), 64 deletions(-)
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index 221fb30cba51..684171363dd9 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -422,16 +422,10 @@ static int tls_device_copy_data(void *addr, size_t bytes, struct iov_iter *i)
return 0;
}
-union tls_iter_offset {
- struct iov_iter *msg_iter;
- int offset;
-};
-
static int tls_push_data(struct sock *sk,
- union tls_iter_offset iter_offset,
+ struct iov_iter *iter,
size_t size, int flags,
- unsigned char record_type,
- struct page *zc_page)
+ unsigned char record_type)
{
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_prot_info *prot = &tls_ctx->prot_info;
@@ -499,22 +493,13 @@ static int tls_push_data(struct sock *sk,
record = ctx->open_record;
copy = min_t(size_t, size, max_open_record_len - record->len);
- if (copy && zc_page) {
- struct page_frag zc_pfrag;
-
- zc_pfrag.page = zc_page;
- zc_pfrag.offset = iter_offset.offset;
- zc_pfrag.size = copy;
- tls_append_frag(record, &zc_pfrag, copy);
-
- iter_offset.offset += copy;
- } else if (copy && (flags & MSG_SPLICE_PAGES)) {
+ if (copy && (flags & MSG_SPLICE_PAGES)) {
struct page_frag zc_pfrag;
struct page **pages = &zc_pfrag.page;
size_t off;
- rc = iov_iter_extract_pages(iter_offset.msg_iter,
- &pages, copy, 1, 0, &off);
+ rc = iov_iter_extract_pages(iter, &pages,
+ copy, 1, 0, &off);
if (rc <= 0) {
if (rc == 0)
rc = -EIO;
@@ -523,7 +508,7 @@ static int tls_push_data(struct sock *sk,
copy = rc;
if (WARN_ON_ONCE(!sendpage_ok(zc_pfrag.page))) {
- iov_iter_revert(iter_offset.msg_iter, copy);
+ iov_iter_revert(iter, copy);
rc = -EIO;
goto handle_error;
}
@@ -536,7 +521,7 @@ static int tls_push_data(struct sock *sk,
rc = tls_device_copy_data(page_address(pfrag->page) +
pfrag->offset, copy,
- iter_offset.msg_iter);
+ iter);
if (rc)
goto handle_error;
tls_append_frag(record, pfrag, copy);
@@ -591,7 +576,6 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
{
unsigned char record_type = TLS_RECORD_TYPE_DATA;
struct tls_context *tls_ctx = tls_get_ctx(sk);
- union tls_iter_offset iter;
int rc;
if (!tls_ctx->zerocopy_sendfile)
@@ -606,8 +590,8 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
goto out;
}
- iter.msg_iter = &msg->msg_iter;
- rc = tls_push_data(sk, iter, size, msg->msg_flags, record_type, NULL);
+ rc = tls_push_data(sk, &msg->msg_iter, size, msg->msg_flags,
+ record_type);
out:
release_sock(sk);
@@ -618,44 +602,18 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
int tls_device_sendpage(struct sock *sk, struct page *page,
int offset, size_t size, int flags)
{
- struct tls_context *tls_ctx = tls_get_ctx(sk);
- union tls_iter_offset iter_offset;
- struct iov_iter msg_iter;
- char *kaddr;
- struct kvec iov;
- int rc;
+ struct bio_vec bvec;
+ struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
if (flags & MSG_SENDPAGE_NOTLAST)
- flags |= MSG_MORE;
-
- mutex_lock(&tls_ctx->tx_lock);
- lock_sock(sk);
+ msg.msg_flags |= MSG_MORE;
- if (flags & MSG_OOB) {
- rc = -EOPNOTSUPP;
- goto out;
- }
-
- if (tls_ctx->zerocopy_sendfile) {
- iter_offset.offset = offset;
- rc = tls_push_data(sk, iter_offset, size,
- flags, TLS_RECORD_TYPE_DATA, page);
- goto out;
- }
-
- kaddr = kmap(page);
- iov.iov_base = kaddr + offset;
- iov.iov_len = size;
- iov_iter_kvec(&msg_iter, ITER_SOURCE, &iov, 1, size);
- iter_offset.msg_iter = &msg_iter;
- rc = tls_push_data(sk, iter_offset, size, flags, TLS_RECORD_TYPE_DATA,
- NULL);
- kunmap(page);
+ if (flags & MSG_OOB)
+ return -EOPNOTSUPP;
-out:
- release_sock(sk);
- mutex_unlock(&tls_ctx->tx_lock);
- return rc;
+ bvec_set_page(&bvec, page, size, offset);
+ iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+ return tls_device_sendmsg(sk, &msg, size);
}
struct tls_record_info *tls_get_record(struct tls_offload_context_tx *context,
@@ -720,12 +678,10 @@ EXPORT_SYMBOL(tls_get_record);
static int tls_device_push_pending_record(struct sock *sk, int flags)
{
- union tls_iter_offset iter;
- struct iov_iter msg_iter;
+ struct iov_iter iter;
- iov_iter_kvec(&msg_iter, ITER_SOURCE, NULL, 0, 0);
- iter.msg_iter = &msg_iter;
- return tls_push_data(sk, iter, 0, flags, TLS_RECORD_TYPE_DATA, NULL);
+ iov_iter_kvec(&iter, ITER_SOURCE, NULL, 0, 0);
+ return tls_push_data(sk, &iter, 0, flags, TLS_RECORD_TYPE_DATA);
}
void tls_device_write_space(struct sock *sk, struct tls_context *ctx)
^ permalink raw reply related [flat|nested] 8+ messages in thread