* [PATCH net-next v2 0/6] splice, net: Handle MSG_SPLICE_PAGES in AF_TLS
From: David Howells @ 2023-05-31 12:45 UTC
  To: netdev
  Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
	Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
	Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
	linux-mm, linux-kernel

Here are patches to make AF_TLS handle the MSG_SPLICE_PAGES internal
sendmsg() flag.  MSG_SPLICE_PAGES is an internal hint that tells the
protocol that it should splice in the supplied pages if it can.  AF_TLS's
sendpage implementations are then turned into wrappers around such a
sendmsg() call.
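
For reference, the wrappers in patches 4 and 6 boil down to the following
pattern (simplified here; the real ones also police the flags and map
MSG_SENDPAGE_NOTLAST to MSG_MORE):

	int tls_sw_sendpage(struct sock *sk, struct page *page,
			    int offset, size_t size, int flags)
	{
		struct bio_vec bvec;
		struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };

		/* Describe the page fragment and hand it over by reference
		 * rather than copying the data.
		 */
		bvec_set_page(&bvec, page, size, offset);
		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
		return tls_sw_sendmsg(sk, &msg, size);
	}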

The following changes are made:

 (1) The first patch fixes MSG_MORE signalling in splice_direct_to_actor().

     The problem is that MSG_MORE is asserted when a splice from a file
     comes up short because we hit the EOF[1].  This is fixed by checking
     for the EOF, provided the file is seekable.

 (2) Implement MSG_SPLICE_PAGES support in the AF_TLS-sw sendmsg and make
     tls_sw_sendpage() just a wrapper around sendmsg().

 (3) Implement MSG_SPLICE_PAGES support in AF_TLS-device and make
     tls_device_sendpage() just a wrapper around sendmsg().

I've pushed the patches here also:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=sendpage-2-tls

David

Changes
=======
ver #2)
 - Dropped the slab data copying.
 - "rls_" should be "tls_".
 - Attempted to fix splice_direct_to_actor().
 - Blocked MSG_SENDPAGE_* from being set by userspace.

Link: https://lore.kernel.org/r/499791.1685485603@warthog.procyon.org.uk/ [1]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=51c78a4d532efe9543a4df019ff405f05c6157f6 # part 1
Link: https://lore.kernel.org/r/20230524153311.3625329-1-dhowells@redhat.com/ # v1

David Howells (6):
  splice, net: Fix MSG_MORE signalling in splice_direct_to_actor()
  net: Block MSG_SENDPAGE_* from being passed to sendmsg() by userspace
  tls/sw: Support MSG_SPLICE_PAGES
  tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES
  tls/device: Support MSG_SPLICE_PAGES
  tls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES

 fs/splice.c            |  17 +++-
 include/linux/socket.h |   4 +-
 net/tls/tls_device.c   |  94 ++++++++----------
 net/tls/tls_sw.c       | 211 +++++++++++++++--------------------------
 4 files changed, 131 insertions(+), 195 deletions(-)




* [PATCH net-next v2 1/6] splice, net: Fix MSG_MORE signalling in splice_direct_to_actor()
From: David Howells @ 2023-05-31 12:45 UTC
  To: netdev
  Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
	Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
	Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
	linux-mm, linux-kernel, Christoph Hellwig, Linus Torvalds,
	Al Viro, Jan Kara, Jeff Layton, David Hildenbrand,
	Christian Brauner, linux-fsdevel, linux-block

splice_direct_to_actor() doesn't manage SPLICE_F_MORE correctly and, as a
result, incorrectly signals MSG_MORE when splicing to a socket.  The
problem arises when a splice comes up short because the read hit the EOF
of the source file.  Because the length read (read_len) is then less than
the remaining size to be spliced (len), SPLICE_F_MORE is set.

This causes MSG_MORE to be set by pipe_to_sendpage(), indicating to the
network protocol that more data is to be expected.  With the changes I want
to make to switch from using sendpage to using sendmsg(MSG_SPLICE_PAGES),
MSG_MORE needs to work properly.

This was observed with the multi_chunk_sendfile tests in the tls kselftest
program.  Some of those tests would hang and time out when the last chunk
of file was less than the sendfile request size.
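
Roughly, the failing pattern is equivalent to this userspace sequence (an
illustration only, not the actual kselftest code; the fds are assumed to
be set up elsewhere):

	/* Send a file over a kTLS socket in fixed-size chunks.  When the
	 * final chunk is shorter than the request, the sender used to
	 * leave MSG_MORE asserted, so the last record was never flushed
	 * and the receiver blocked.
	 */
	off_t pos = 0;
	ssize_t n;

	while ((n = sendfile(tls_fd, file_fd, &pos, 65536)) > 0)
		;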

This has been observed before[1] and worked around in AF_TLS[2].

Fix this by checking, when we get a short read, whether the source file is
seekable and, if it is, whether we have hit the file size.  This should
also work for block devices.

This won't help procfs files and suchlike, as they are nominally
zero-length files that can nevertheless be read from[3].  To handle those,
should splice make a zero-length call with SPLICE_F_MORE cleared (assuming
it wasn't set by userspace via splice()) if it gets a zero-length read?

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christoph Hellwig <hch@lst.de>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jan Kara <jack@suse.cz>
cc: Jeff Layton <jlayton@kernel.org>
cc: David Hildenbrand <david@redhat.com>
cc: Christian Brauner <brauner@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-mm@kvack.org
cc: netdev@vger.kernel.org

Link: https://lore.kernel.org/netdev/1591392508-14592-1-git-send-email-pooja.trivedi@stackpath.com/ [1]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=d452d48b9f8b1a7f8152d33ef52cfd7fe1735b0a [2]
Link: https://lore.kernel.org/r/CAHk-=wjDq5_wLWrapzFiJ3ZNn6aGFWeMJpAj5q+4z-Ok8DD9dA@mail.gmail.com/ [3]
---
 fs/splice.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 3e06611d19ae..a7cf216c02a7 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -982,10 +982,21 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
 		 * If this is the last data and SPLICE_F_MORE was not set
 		 * initially, clears it.
 		 */
-		if (read_len < len)
-			sd->flags |= SPLICE_F_MORE;
-		else if (!more)
+		if (read_len < len) {
+			struct inode *ii = in->f_mapping->host;
+
+			if (ii->i_fop->llseek != noop_llseek &&
+			    pos >= i_size_read(ii)) {
+				if (!more)
+					sd->flags &= ~SPLICE_F_MORE;
+			} else {
+				sd->flags |= SPLICE_F_MORE;
+			}
+
+		} else if (!more) {
 			sd->flags &= ~SPLICE_F_MORE;
+		}
+
 		/*
 		 * NOTE: nonblocking mode only applies to the input. We
 		 * must not do the output in nonblocking mode as then we




* [PATCH net-next v2 2/6] net: Block MSG_SENDPAGE_* from being passed to sendmsg() by userspace
From: David Howells @ 2023-05-31 12:45 UTC
  To: netdev
  Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
	Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
	Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
	linux-mm, linux-kernel

It is necessary to allow MSG_SENDPAGE_* to be passed into ->sendmsg() to
allow sendmsg(MSG_SPLICE_PAGES) to replace ->sendpage().  Unblocking them
in the network protocol, however, allows these flags to be passed in by
userspace too[1].

Fix this by marking MSG_SENDPAGE_NOPOLICY, MSG_SENDPAGE_NOTLAST and
MSG_SENDPAGE_DECRYPTED as internal flags, which causes sendmsg() to object
if they are passed to sendmsg() by userspace.  Network protocol ->sendmsg()
implementations can then allow them through.

Note that it should be possible to remove MSG_SENDPAGE_NOTLAST once
sendpage is removed as a whole slew of pages will be passed in in one go by
splice through sendmsg, with MSG_MORE being set if it has more data waiting
in the pipe.
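
Per the existing comment on MSG_INTERNAL_SENDMSG_FLAGS ("Flags to be
cleared on entry by sendmsg and sendmmsg syscalls"), the effect on the
syscall path amounts to this (illustrative only; see net/socket.c for the
actual code):

	/* Strip internal-only bits so userspace can't smuggle them in;
	 * in-kernel callers invoking ->sendmsg() directly may still set
	 * them.
	 */
	msg->msg_flags &= ~MSG_INTERNAL_SENDMSG_FLAGS;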

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20230526181338.03a99016@kernel.org/ [1]
---
 include/linux/socket.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index bd1cc3238851..3fd3436bc09f 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -339,7 +339,9 @@ struct ucred {
 #endif
 
 /* Flags to be cleared on entry by sendmsg and sendmmsg syscalls */
-#define MSG_INTERNAL_SENDMSG_FLAGS (MSG_SPLICE_PAGES)
+#define MSG_INTERNAL_SENDMSG_FLAGS \
+	(MSG_SPLICE_PAGES | MSG_SENDPAGE_NOPOLICY | MSG_SENDPAGE_NOTLAST | \
+	 MSG_SENDPAGE_DECRYPTED)
 
 /* Setsockoptions(2) level. Thanks to BSD these must match IPPROTO_xxx */
 #define SOL_IP		0




* [PATCH net-next v2 3/6] tls/sw: Support MSG_SPLICE_PAGES
From: David Howells @ 2023-05-31 12:45 UTC
  To: netdev
  Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
	Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
	Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
	linux-mm, linux-kernel

Make TLS's sendmsg() support MSG_SPLICE_PAGES.  This causes pages to be
spliced from the source iterator if possible.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
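
For the splice path specifically: once patch 4 is applied, a userspace
splice() to a kTLS socket reaches this code through the sendpage wrapper,
e.g. (a sketch; the fds are assumed to be set up already):

	/* file -> pipe -> TLS socket; the second splice() ends up in
	 * tls_sw_sendmsg() with MSG_SPLICE_PAGES set by the wrapper.
	 */
	splice(file_fd, NULL, pipe_wr, NULL, len, 0);
	splice(pipe_rd, NULL, tls_fd, NULL, len, 0);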

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---

Notes:
    ver #2)
     - "rls_" should be "tls_".

 net/tls/tls_sw.c | 46 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 6e6a7c37d685..60d807b63f75 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -929,6 +929,38 @@ static int tls_sw_push_pending_record(struct sock *sk, int flags)
 				   &copied, flags);
 }
 
+static int tls_sw_sendmsg_splice(struct sock *sk, struct msghdr *msg,
+				 struct sk_msg *msg_pl, size_t try_to_copy,
+				 ssize_t *copied)
+{
+	struct page *page = NULL, **pages = &page;
+
+	do {
+		ssize_t part;
+		size_t off;
+		bool put = false;
+
+		part = iov_iter_extract_pages(&msg->msg_iter, &pages,
+					      try_to_copy, 1, 0, &off);
+		if (part <= 0)
+			return part ?: -EIO;
+
+		if (WARN_ON_ONCE(!sendpage_ok(page))) {
+			iov_iter_revert(&msg->msg_iter, part);
+			return -EIO;
+		}
+
+		sk_msg_page_add(msg_pl, page, part, off);
+		sk_mem_charge(sk, part);
+		if (put)
+			put_page(page);
+		*copied += part;
+		try_to_copy -= part;
+	} while (try_to_copy && !sk_msg_full(msg_pl));
+
+	return 0;
+}
+
 int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 {
 	long timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
@@ -1018,6 +1050,17 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 			full_record = true;
 		}
 
+		if (try_to_copy && (msg->msg_flags & MSG_SPLICE_PAGES)) {
+			ret = tls_sw_sendmsg_splice(sk, msg, msg_pl,
+						    try_to_copy, &copied);
+			if (ret < 0)
+				goto send_end;
+			tls_ctx->pending_open_record_frags = true;
+			if (full_record || eor || sk_msg_full(msg_pl))
+				goto copied;
+			continue;
+		}
+
 		if (!is_kvec && (full_record || eor) && !async_capable) {
 			u32 first = msg_pl->sg.end;
 
@@ -1080,8 +1123,9 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 		/* Open records defined only if successfully copied, otherwise
 		 * we would trim the sg but not reset the open record frags.
 		 */
-		tls_ctx->pending_open_record_frags = true;
 		copied += try_to_copy;
+copied:
+		tls_ctx->pending_open_record_frags = true;
 		if (full_record || eor) {
 			ret = bpf_exec_tx_verdict(msg_pl, sk, full_record,
 						  record_type, &copied,




* [PATCH net-next v2 4/6] tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES
From: David Howells @ 2023-05-31 12:45 UTC
  To: netdev
  Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
	Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
	Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
	linux-mm, linux-kernel, bpf

Convert tls_sw_sendpage() and tls_sw_sendpage_locked() to use sendmsg()
with MSG_SPLICE_PAGES rather than directly splicing in the pages themselves.

[!] Note that tls_sw_sendpage_locked() appears to have the wrong locking
    upstream.  I think the caller will only hold the socket lock, but it
    should hold tls_ctx->tx_lock too.
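
    For comparison, tls_sw_sendmsg() in this patch takes both locks around
    tls_sw_sendmsg_locked(); the ordering (excerpted from the change below)
    is:

	ret = mutex_lock_interruptible(&tls_ctx->tx_lock);
	if (ret)
		return ret;
	lock_sock(sk);
	ret = tls_sw_sendmsg_locked(sk, msg, size);
	release_sock(sk);
	mutex_unlock(&tls_ctx->tx_lock);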

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
cc: bpf@vger.kernel.org
---
 net/tls/tls_sw.c | 165 +++++++++--------------------------------------
 1 file changed, 31 insertions(+), 134 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 60d807b63f75..f63e4405cf34 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -961,7 +961,8 @@ static int tls_sw_sendmsg_splice(struct sock *sk, struct msghdr *msg,
 	return 0;
 }
 
-int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
+static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg,
+				 size_t size)
 {
 	long timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
@@ -984,15 +985,6 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 	int ret = 0;
 	int pending;
 
-	if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
-			       MSG_CMSG_COMPAT))
-		return -EOPNOTSUPP;
-
-	ret = mutex_lock_interruptible(&tls_ctx->tx_lock);
-	if (ret)
-		return ret;
-	lock_sock(sk);
-
 	if (unlikely(msg->msg_controllen)) {
 		ret = tls_process_cmsg(sk, msg, &record_type);
 		if (ret) {
@@ -1193,157 +1185,62 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 
 send_end:
 	ret = sk_stream_error(sk, msg->msg_flags, ret);
-
-	release_sock(sk);
-	mutex_unlock(&tls_ctx->tx_lock);
 	return copied > 0 ? copied : ret;
 }
 
-static int tls_sw_do_sendpage(struct sock *sk, struct page *page,
-			      int offset, size_t size, int flags)
+int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 {
-	long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
-	struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
-	struct tls_prot_info *prot = &tls_ctx->prot_info;
-	unsigned char record_type = TLS_RECORD_TYPE_DATA;
-	struct sk_msg *msg_pl;
-	struct tls_rec *rec;
-	int num_async = 0;
-	ssize_t copied = 0;
-	bool full_record;
-	int record_room;
-	int ret = 0;
-	bool eor;
-
-	eor = !(flags & MSG_SENDPAGE_NOTLAST);
-	sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk);
-
-	/* Call the sk_stream functions to manage the sndbuf mem. */
-	while (size > 0) {
-		size_t copy, required_size;
-
-		if (sk->sk_err) {
-			ret = -sk->sk_err;
-			goto sendpage_end;
-		}
-
-		if (ctx->open_rec)
-			rec = ctx->open_rec;
-		else
-			rec = ctx->open_rec = tls_get_rec(sk);
-		if (!rec) {
-			ret = -ENOMEM;
-			goto sendpage_end;
-		}
-
-		msg_pl = &rec->msg_plaintext;
-
-		full_record = false;
-		record_room = TLS_MAX_PAYLOAD_SIZE - msg_pl->sg.size;
-		copy = size;
-		if (copy >= record_room) {
-			copy = record_room;
-			full_record = true;
-		}
-
-		required_size = msg_pl->sg.size + copy + prot->overhead_size;
-
-		if (!sk_stream_memory_free(sk))
-			goto wait_for_sndbuf;
-alloc_payload:
-		ret = tls_alloc_encrypted_msg(sk, required_size);
-		if (ret) {
-			if (ret != -ENOSPC)
-				goto wait_for_memory;
-
-			/* Adjust copy according to the amount that was
-			 * actually allocated. The difference is due
-			 * to max sg elements limit
-			 */
-			copy -= required_size - msg_pl->sg.size;
-			full_record = true;
-		}
-
-		sk_msg_page_add(msg_pl, page, copy, offset);
-		sk_mem_charge(sk, copy);
-
-		offset += copy;
-		size -= copy;
-		copied += copy;
-
-		tls_ctx->pending_open_record_frags = true;
-		if (full_record || eor || sk_msg_full(msg_pl)) {
-			ret = bpf_exec_tx_verdict(msg_pl, sk, full_record,
-						  record_type, &copied, flags);
-			if (ret) {
-				if (ret == -EINPROGRESS)
-					num_async++;
-				else if (ret == -ENOMEM)
-					goto wait_for_memory;
-				else if (ret != -EAGAIN) {
-					if (ret == -ENOSPC)
-						ret = 0;
-					goto sendpage_end;
-				}
-			}
-		}
-		continue;
-wait_for_sndbuf:
-		set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
-wait_for_memory:
-		ret = sk_stream_wait_memory(sk, &timeo);
-		if (ret) {
-			if (ctx->open_rec)
-				tls_trim_both_msgs(sk, msg_pl->sg.size);
-			goto sendpage_end;
-		}
+	int ret;
 
-		if (ctx->open_rec)
-			goto alloc_payload;
-	}
+	if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
+			       MSG_CMSG_COMPAT | MSG_SPLICE_PAGES |
+			       MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY))
+		return -EOPNOTSUPP;
 
-	if (num_async) {
-		/* Transmit if any encryptions have completed */
-		if (test_and_clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask)) {
-			cancel_delayed_work(&ctx->tx_work.work);
-			tls_tx_records(sk, flags);
-		}
-	}
-sendpage_end:
-	ret = sk_stream_error(sk, flags, ret);
-	return copied > 0 ? copied : ret;
+	ret = mutex_lock_interruptible(&tls_ctx->tx_lock);
+	if (ret)
+		return ret;
+	lock_sock(sk);
+	ret = tls_sw_sendmsg_locked(sk, msg, size);
+	release_sock(sk);
+	mutex_unlock(&tls_ctx->tx_lock);
+	return ret;
 }
 
 int tls_sw_sendpage_locked(struct sock *sk, struct page *page,
 			   int offset, size_t size, int flags)
 {
+	struct bio_vec bvec;
+	struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
+
 	if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
 		      MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY |
 		      MSG_NO_SHARED_FRAGS))
 		return -EOPNOTSUPP;
+	if (flags & MSG_SENDPAGE_NOTLAST)
+		msg.msg_flags |= MSG_MORE;
 
-	return tls_sw_do_sendpage(sk, page, offset, size, flags);
+	bvec_set_page(&bvec, page, size, offset);
+	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+	return tls_sw_sendmsg_locked(sk, &msg, size);
 }
 
 int tls_sw_sendpage(struct sock *sk, struct page *page,
 		    int offset, size_t size, int flags)
 {
-	struct tls_context *tls_ctx = tls_get_ctx(sk);
-	int ret;
+	struct bio_vec bvec;
+	struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
 
 	if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
 		      MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY))
 		return -EOPNOTSUPP;
+	if (flags & MSG_SENDPAGE_NOTLAST)
+		msg.msg_flags |= MSG_MORE;
 
-	ret = mutex_lock_interruptible(&tls_ctx->tx_lock);
-	if (ret)
-		return ret;
-	lock_sock(sk);
-	ret = tls_sw_do_sendpage(sk, page, offset, size, flags);
-	release_sock(sk);
-	mutex_unlock(&tls_ctx->tx_lock);
-	return ret;
+	bvec_set_page(&bvec, page, size, offset);
+	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+	return tls_sw_sendmsg(sk, &msg, size);
 }
 
 static int




* [PATCH net-next v2 5/6] tls/device: Support MSG_SPLICE_PAGES
From: David Howells @ 2023-05-31 12:45 UTC
  To: netdev
  Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
	Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
	Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
	linux-mm, linux-kernel

Make TLS's device sendmsg() support MSG_SPLICE_PAGES.  This causes pages to
be spliced from the source iterator if possible.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
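
One design point, visible in the change to tls_device_sendmsg() below:
splicing is only attempted if the connection allows zerocopy sendfile;
otherwise the flag is stripped and the data is copied into the record as
before:

	if (!tls_ctx->zerocopy_sendfile)
		msg->msg_flags &= ~MSG_SPLICE_PAGES;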

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
 net/tls/tls_device.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index a959572a816f..221fb30cba51 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -508,6 +508,29 @@ static int tls_push_data(struct sock *sk,
 			tls_append_frag(record, &zc_pfrag, copy);
 
 			iter_offset.offset += copy;
+		} else if (copy && (flags & MSG_SPLICE_PAGES)) {
+			struct page_frag zc_pfrag;
+			struct page **pages = &zc_pfrag.page;
+			size_t off;
+
+			rc = iov_iter_extract_pages(iter_offset.msg_iter,
+						    &pages, copy, 1, 0, &off);
+			if (rc <= 0) {
+				if (rc == 0)
+					rc = -EIO;
+				goto handle_error;
+			}
+			copy = rc;
+
+			if (WARN_ON_ONCE(!sendpage_ok(zc_pfrag.page))) {
+				iov_iter_revert(iter_offset.msg_iter, copy);
+				rc = -EIO;
+				goto handle_error;
+			}
+
+			zc_pfrag.offset = off;
+			zc_pfrag.size = copy;
+			tls_append_frag(record, &zc_pfrag, copy);
 		} else if (copy) {
 			copy = min_t(size_t, copy, pfrag->size - pfrag->offset);
 
@@ -571,6 +594,9 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 	union tls_iter_offset iter;
 	int rc;
 
+	if (!tls_ctx->zerocopy_sendfile)
+		msg->msg_flags &= ~MSG_SPLICE_PAGES;
+
 	mutex_lock(&tls_ctx->tx_lock);
 	lock_sock(sk);
 




* [PATCH net-next v2 6/6] tls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES
From: David Howells @ 2023-05-31 12:45 UTC
  To: netdev
  Cc: David Howells, Chuck Lever, Boris Pismenny, John Fastabend,
	Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
	Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
	linux-mm, linux-kernel

Convert tls_device_sendpage() to use sendmsg() with MSG_SPLICE_PAGES rather
than directly splicing in the pages itself.  With that, the tls_iter_offset
union is no longer necessary and can be replaced with an iov_iter pointer,
and the zc_page argument to tls_push_data() can also be removed.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
 net/tls/tls_device.c | 84 +++++++++++---------------------------------
 1 file changed, 20 insertions(+), 64 deletions(-)

diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index 221fb30cba51..684171363dd9 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -422,16 +422,10 @@ static int tls_device_copy_data(void *addr, size_t bytes, struct iov_iter *i)
 	return 0;
 }
 
-union tls_iter_offset {
-	struct iov_iter *msg_iter;
-	int offset;
-};
-
 static int tls_push_data(struct sock *sk,
-			 union tls_iter_offset iter_offset,
+			 struct iov_iter *iter,
 			 size_t size, int flags,
-			 unsigned char record_type,
-			 struct page *zc_page)
+			 unsigned char record_type)
 {
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
 	struct tls_prot_info *prot = &tls_ctx->prot_info;
@@ -499,22 +493,13 @@ static int tls_push_data(struct sock *sk,
 		record = ctx->open_record;
 
 		copy = min_t(size_t, size, max_open_record_len - record->len);
-		if (copy && zc_page) {
-			struct page_frag zc_pfrag;
-
-			zc_pfrag.page = zc_page;
-			zc_pfrag.offset = iter_offset.offset;
-			zc_pfrag.size = copy;
-			tls_append_frag(record, &zc_pfrag, copy);
-
-			iter_offset.offset += copy;
-		} else if (copy && (flags & MSG_SPLICE_PAGES)) {
+		if (copy && (flags & MSG_SPLICE_PAGES)) {
 			struct page_frag zc_pfrag;
 			struct page **pages = &zc_pfrag.page;
 			size_t off;
 
-			rc = iov_iter_extract_pages(iter_offset.msg_iter,
-						    &pages, copy, 1, 0, &off);
+			rc = iov_iter_extract_pages(iter, &pages,
+						    copy, 1, 0, &off);
 			if (rc <= 0) {
 				if (rc == 0)
 					rc = -EIO;
@@ -523,7 +508,7 @@ static int tls_push_data(struct sock *sk,
 			copy = rc;
 
 			if (WARN_ON_ONCE(!sendpage_ok(zc_pfrag.page))) {
-				iov_iter_revert(iter_offset.msg_iter, copy);
+				iov_iter_revert(iter, copy);
 				rc = -EIO;
 				goto handle_error;
 			}
@@ -536,7 +521,7 @@ static int tls_push_data(struct sock *sk,
 
 			rc = tls_device_copy_data(page_address(pfrag->page) +
 						  pfrag->offset, copy,
-						  iter_offset.msg_iter);
+						  iter);
 			if (rc)
 				goto handle_error;
 			tls_append_frag(record, pfrag, copy);
@@ -591,7 +576,6 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 {
 	unsigned char record_type = TLS_RECORD_TYPE_DATA;
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
-	union tls_iter_offset iter;
 	int rc;
 
 	if (!tls_ctx->zerocopy_sendfile)
@@ -606,8 +590,8 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 			goto out;
 	}
 
-	iter.msg_iter = &msg->msg_iter;
-	rc = tls_push_data(sk, iter, size, msg->msg_flags, record_type, NULL);
+	rc = tls_push_data(sk, &msg->msg_iter, size, msg->msg_flags,
+			   record_type);
 
 out:
 	release_sock(sk);
@@ -618,44 +602,18 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 int tls_device_sendpage(struct sock *sk, struct page *page,
 			int offset, size_t size, int flags)
 {
-	struct tls_context *tls_ctx = tls_get_ctx(sk);
-	union tls_iter_offset iter_offset;
-	struct iov_iter msg_iter;
-	char *kaddr;
-	struct kvec iov;
-	int rc;
+	struct bio_vec bvec;
+	struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
 
 	if (flags & MSG_SENDPAGE_NOTLAST)
-		flags |= MSG_MORE;
-
-	mutex_lock(&tls_ctx->tx_lock);
-	lock_sock(sk);
+		msg.msg_flags |= MSG_MORE;
 
-	if (flags & MSG_OOB) {
-		rc = -EOPNOTSUPP;
-		goto out;
-	}
-
-	if (tls_ctx->zerocopy_sendfile) {
-		iter_offset.offset = offset;
-		rc = tls_push_data(sk, iter_offset, size,
-				   flags, TLS_RECORD_TYPE_DATA, page);
-		goto out;
-	}
-
-	kaddr = kmap(page);
-	iov.iov_base = kaddr + offset;
-	iov.iov_len = size;
-	iov_iter_kvec(&msg_iter, ITER_SOURCE, &iov, 1, size);
-	iter_offset.msg_iter = &msg_iter;
-	rc = tls_push_data(sk, iter_offset, size, flags, TLS_RECORD_TYPE_DATA,
-			   NULL);
-	kunmap(page);
+	if (flags & MSG_OOB)
+		return -EOPNOTSUPP;
 
-out:
-	release_sock(sk);
-	mutex_unlock(&tls_ctx->tx_lock);
-	return rc;
+	bvec_set_page(&bvec, page, size, offset);
+	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+	return tls_device_sendmsg(sk, &msg, size);
 }
 
 struct tls_record_info *tls_get_record(struct tls_offload_context_tx *context,
@@ -720,12 +678,10 @@ EXPORT_SYMBOL(tls_get_record);
 
 static int tls_device_push_pending_record(struct sock *sk, int flags)
 {
-	union tls_iter_offset iter;
-	struct iov_iter msg_iter;
+	struct iov_iter iter;
 
-	iov_iter_kvec(&msg_iter, ITER_SOURCE, NULL, 0, 0);
-	iter.msg_iter = &msg_iter;
-	return tls_push_data(sk, iter, 0, flags, TLS_RECORD_TYPE_DATA, NULL);
+	iov_iter_kvec(&iter, ITER_SOURCE, NULL, 0, 0);
+	return tls_push_data(sk, &iter, 0, flags, TLS_RECORD_TYPE_DATA);
 }
 
 void tls_device_write_space(struct sock *sk, struct tls_context *ctx)




* Re: [PATCH net-next v2 2/6] net: Block MSG_SENDPAGE_* from being passed to sendmsg() by userspace
From: Simon Horman @ 2023-05-31 17:02 UTC
  To: David Howells
  Cc: netdev, Chuck Lever, Boris Pismenny, John Fastabend,
	Jakub Kicinski, David S. Miller, Eric Dumazet, Paolo Abeni,
	Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe,
	linux-mm, linux-kernel

On Wed, May 31, 2023 at 01:45:24PM +0100, David Howells wrote:
> It is necessary to allow MSG_SENDPAGE_* to be passed into ->sendmsg() to
> allow sendmsg(MSG_SPLICE_PAGES) to replace ->sendpage().  Unblocking them
> in the network protocol, however, allows these flags to be passed in by
> userspace too[1].
> 
> Fix this by marking MSG_SENDPAGE_NOPOLICY, MSG_SENDPAGE_NOTLAST and
> MSG_SENDPAGE_DECRYPTED as internal flags, which causes sendmsg() to object
> if they are passed to sendmsg() by userspace.  Network protocol ->sendmsg()
> implementations can then allow them through.
> 
> Note that it should be possible to remove MSG_SENDPAGE_NOTLAST once
> sendpage is removed as a whole slew of pages will be passed in in one go by

Hi David,

on the off-chance that you need to respin for some other reason:

	s/in in/in/

> splice through sendmsg, with MSG_MORE being set if it has more data waiting
> in the pipe.

...


