* [PATCH net-next v10 00/16] splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1
@ 2023-05-22 12:11 David Howells
  2023-05-22 12:11 ` [PATCH net-next v10 01/16] net: Declare MSG_SPLICE_PAGES internal sendmsg() flag David Howells
                   ` (16 more replies)
  0 siblings, 17 replies; 43+ messages in thread
From: David Howells @ 2023-05-22 12:11 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm

Here's the first tranche of patches towards providing a MSG_SPLICE_PAGES
internal sendmsg flag that is intended to replace the ->sendpage() op with
calls to sendmsg().  MSG_SPLICE_PAGES is a hint that tells the protocol
that it should splice the pages supplied if it can and copy them if not.

This will allow splice to pass multiple pages in a single call and allow
certain parts of higher protocols (e.g. sunrpc, iwarp) to pass an entire
message in one go rather than having to send it piecemeal.  This should
also make it easier to handle the splicing of multipage folios.

A helper, skb_splice_from_iter(), is provided to do the work of splicing or
copying data from an iterator.  If a page is determined to be unspliceable
(such as being in the slab), then the helper will give an error.

Note that this facility is not made available to userspace and does not
provide any sort of callback.
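
For in-kernel callers, the intended usage pattern looks something like the
following (a minimal sketch distilled from the conversions later in the
series; example_splice_page() is an illustrative name, not part of the
patches):

	#include <linux/bvec.h>
	#include <linux/net.h>
	#include <linux/socket.h>
	#include <linux/uio.h>

	/* Sketch: splice one page into a connected socket by pointing a
	 * BVEC iterator at it and passing MSG_SPLICE_PAGES to sendmsg().
	 * This mirrors what the converted ->sendpage() wrappers in this
	 * series do.
	 */
	static int example_splice_page(struct socket *sock, struct page *page,
				       int offset, size_t size, bool more)
	{
		struct bio_vec bvec;
		struct msghdr msg = { .msg_flags = MSG_SPLICE_PAGES };

		if (more)
			msg.msg_flags |= MSG_MORE;
		bvec_set_page(&bvec, page, size, offset);
		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
		return sock_sendmsg(sock, &msg);
	}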

This set consists of the following parts:

 (1) Define the MSG_SPLICE_PAGES flag and prevent sys_sendmsg() from being
     able to set it.

 (2) Add an extra argument to skb_append_pagefrags() so that something
     other than MAX_SKB_FRAGS can be used (sysctl_max_skb_frags for
     example).

 (3) Add the skb_splice_from_iter() helper to handle splicing pages into
     skbuffs for MSG_SPLICE_PAGES that can be shared by TCP, IP/UDP and
     AF_UNIX.

 (4) Implement MSG_SPLICE_PAGES support in TCP.

 (5) Make do_tcp_sendpages() just wrap sendmsg() and then fold it into its
     various callers.

 (6) Implement MSG_SPLICE_PAGES support in IP and make udp_sendpage() just
     a wrapper around sendmsg().

 (7) Implement MSG_SPLICE_PAGES support in IP6/UDP6.

 (8) Implement MSG_SPLICE_PAGES support in AF_UNIX.

 (9) Make AF_UNIX copy unspliceable pages.

I've pushed the patches here also:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=sendpage-1

The follow-on patches are on branch iov-sendpage on the same tree.

David

Changes
=======
ver #10)
 - Rebase.
 - Fix patch subject to refer to unix_stream_sendpage() not udp_sendpage().

ver #9)
 - Fix a merge conflict with commit eea96a3e2c909.

ver #8)
 - Order local variables in reverse xmas tree order.
 - Remove duplicate coalescence check.
 - Warn if sendpage_ok() fails.

ver #7)
 - Rebase after merge window.
 - In ____sys_sendmsg(), clear internal flags before setting msg_flags.
 - Clear internal flags in uring io_send{,_zc}().
 - Export skb_splice_from_iter().
 - Missed changing a "zc = 1" in tcp_sendmsg_locked().
 - Remove now-unused csum_page() from UDP.
 - Add a patch to make AF_UNIX sendpage() just a wrapper around sendmsg().
 - Return an error if !sendpage_ok() rather than copying for now.
 - Drop the page frag allocator patches for the moment.

ver #6)
 - Removed a couple of leftover page pointer declarations.
 - In TCP, set zc to 0/MSG_ZEROCOPY/MSG_SPLICE_PAGES rather than 0/1/2.
 - Add a max-frags argument to skb_append_pagefrags().
 - Extract the AF_UNIX helper out into a common helper and use it for
   IP/UDP and TCP too.
 - udp_sendpage() shouldn't lock the socket around udp_sendmsg().
 - udp_sendpage() should only set MSG_MORE if MSG_SENDPAGE_NOTLAST is set.
 - In siw, don't clear MSG_SPLICE_PAGES on the last page.

ver #5)
 - Dropped the samples patch as it causes lots of failures in the patchwork
   32-bit builds due to apparent libc userspace header issues.
 - Made the pagefrag alloc patches alter the Google gve driver too.
 - Rearranged the patches to put the support in IP before altering UDP.

ver #4)
 - Added some sample socket-I/O programs into samples/net/.
 - Fix a missing page-get in AF_KCM.
 - Init the sgtable and mark the end in AF_ALG when calling
   netfs_extract_iter_to_sg().
 - Add a destructor func for page frag caches prior to generalising it and
   making it per-cpu.

ver #3)
 - Dropped the iterator-of-iterators patch.
 - Only expunge MSG_SPLICE_PAGES in sys_send[m]msg, not sys_recv[m]msg.
 - Split MSG_SPLICE_PAGES code in __ip_append_data() out into helper
   functions.
 - Implement MSG_SPLICE_PAGES support in __ip6_append_data() using the
   above helper functions.
 - Rename 'xlength' to 'initial_length'.
 - Minimise the changes to sunrpc for the moment.
 - Don't give -EOPNOTSUPP if NETIF_F_SG not available, just copy instead.
 - Implemented MSG_SPLICE_PAGES support in the TLS, Chelsio-TLS and AF_KCM
   code.

ver #2)
 - Overhauled the page_frag_alloc() allocator: large folios and per-cpu.
   - Got rid of my own zerocopy allocator.
 - Use iov_iter_extract_pages() rather than poking in iter->bvec.
 - Made page splicing fall back to page copying on a page-by-page basis.
 - Made splice_to_socket() pass 16 pipe buffers at a time.
 - Made AF_ALG/hash use finup/digest where possible in sendmsg.
 - Added an iterator-of-iterators, ITER_ITERLIST.
 - Made sunrpc use the iterator-of-iterators.
 - Converted more drivers.

Link: https://lore.kernel.org/r/20230316152618.711970-1-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20230329141354.516864-1-dhowells@redhat.com/ # v2
Link: https://lore.kernel.org/r/20230331160914.1608208-1-dhowells@redhat.com/ # v3
Link: https://lore.kernel.org/r/20230405165339.3468808-1-dhowells@redhat.com/ # v4
Link: https://lore.kernel.org/r/20230406094245.3633290-1-dhowells@redhat.com/ # v5
Link: https://lore.kernel.org/r/20230411160902.4134381-1-dhowells@redhat.com/ # v6
Link: https://lore.kernel.org/r/20230515093345.396978-1-dhowells@redhat.com/ # v7
Link: https://lore.kernel.org/r/20230518113453.1350757-1-dhowells@redhat.com/ # v8

David Howells (16):
  net: Declare MSG_SPLICE_PAGES internal sendmsg() flag
  net: Pass max frags into skb_append_pagefrags()
  net: Add a function to splice pages into an skbuff for
    MSG_SPLICE_PAGES
  tcp: Support MSG_SPLICE_PAGES
  tcp: Convert do_tcp_sendpages() to use MSG_SPLICE_PAGES
  tcp_bpf: Inline do_tcp_sendpages as it's now a wrapper around
    tcp_sendmsg
  espintcp: Inline do_tcp_sendpages()
  tls: Inline do_tcp_sendpages()
  siw: Inline do_tcp_sendpages()
  tcp: Fold do_tcp_sendpages() into tcp_sendpage_locked()
  ip, udp: Support MSG_SPLICE_PAGES
  ip6, udp6: Support MSG_SPLICE_PAGES
  udp: Convert udp_sendpage() to use MSG_SPLICE_PAGES
  ip: Remove ip_append_page()
  af_unix: Support MSG_SPLICE_PAGES
  unix: Convert unix_stream_sendpage() to use MSG_SPLICE_PAGES

 drivers/infiniband/sw/siw/siw_qp_tx.c |  17 +-
 include/linux/skbuff.h                |   5 +-
 include/linux/socket.h                |   3 +
 include/net/ip.h                      |   2 -
 include/net/tcp.h                     |   2 -
 include/net/tls.h                     |   2 +-
 io_uring/net.c                        |   2 +
 net/core/skbuff.c                     |  92 ++++++++++-
 net/ipv4/ip_output.c                  | 164 +++-----------------
 net/ipv4/tcp.c                        | 214 ++++++--------------------
 net/ipv4/tcp_bpf.c                    |  20 ++-
 net/ipv4/udp.c                        |  51 +-----
 net/ipv6/ip6_output.c                 |  17 ++
 net/socket.c                          |   2 +
 net/tls/tls_main.c                    |  24 +--
 net/unix/af_unix.c                    | 183 +++++-----------------
 net/xfrm/espintcp.c                   |  10 +-
 17 files changed, 278 insertions(+), 532 deletions(-)



* [PATCH net-next v10 01/16] net: Declare MSG_SPLICE_PAGES internal sendmsg() flag
  2023-05-22 12:11 [PATCH net-next v10 00/16] splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1 David Howells
@ 2023-05-22 12:11 ` David Howells
  2023-05-22 12:11 ` [PATCH net-next v10 02/16] net: Pass max frags into skb_append_pagefrags() David Howells
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 43+ messages in thread
From: David Howells @ 2023-05-22 12:11 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm, Willem de Bruijn,
	io-uring

Declare MSG_SPLICE_PAGES, an internal sendmsg() flag that hints to a
network protocol that it should splice pages from the source iterator, if
it can, rather than copying the data.  This flag is added to the set of
flags that the sendmsg syscalls clear on entry.

This is intended as a replacement for the ->sendpage() op, allowing a way
to splice in several multipage folios in one go.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: io-uring@vger.kernel.org
cc: netdev@vger.kernel.org
---

Notes:
    ver #7)
     - In ____sys_sendmsg(), clear internal flags before setting msg_flags.
     - Clear internal flags in uring io_send{,_zc}().

 include/linux/socket.h | 3 +++
 io_uring/net.c         | 2 ++
 net/socket.c           | 2 ++
 3 files changed, 7 insertions(+)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 13c3a237b9c9..bd1cc3238851 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -327,6 +327,7 @@ struct ucred {
 					  */
 
 #define MSG_ZEROCOPY	0x4000000	/* Use user data in kernel path */
+#define MSG_SPLICE_PAGES 0x8000000	/* Splice the pages from the iterator in sendmsg() */
 #define MSG_FASTOPEN	0x20000000	/* Send data in TCP SYN */
 #define MSG_CMSG_CLOEXEC 0x40000000	/* Set close_on_exec for file
 					   descriptor received through
@@ -337,6 +338,8 @@ struct ucred {
 #define MSG_CMSG_COMPAT	0		/* We never have 32 bit fixups */
 #endif
 
+/* Flags to be cleared on entry by sendmsg and sendmmsg syscalls */
+#define MSG_INTERNAL_SENDMSG_FLAGS (MSG_SPLICE_PAGES)
 
 /* Setsockoptions(2) level. Thanks to BSD these must match IPPROTO_xxx */
 #define SOL_IP		0
diff --git a/io_uring/net.c b/io_uring/net.c
index 89e839013837..f7cbb3c7a575 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -389,6 +389,7 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
 	if (flags & MSG_WAITALL)
 		min_ret = iov_iter_count(&msg.msg_iter);
 
+	flags &= ~MSG_INTERNAL_SENDMSG_FLAGS;
 	msg.msg_flags = flags;
 	ret = sock_sendmsg(sock, &msg);
 	if (ret < min_ret) {
@@ -1136,6 +1137,7 @@ int io_send_zc(struct io_kiocb *req, unsigned int issue_flags)
 		msg_flags |= MSG_DONTWAIT;
 	if (msg_flags & MSG_WAITALL)
 		min_ret = iov_iter_count(&msg.msg_iter);
+	msg_flags &= ~MSG_INTERNAL_SENDMSG_FLAGS;
 
 	msg.msg_flags = msg_flags;
 	msg.msg_ubuf = &io_notif_to_data(zc->notif)->uarg;
diff --git a/net/socket.c b/net/socket.c
index b7e01d0fe082..3df96e9ba4e2 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2138,6 +2138,7 @@ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags,
 		msg.msg_name = (struct sockaddr *)&address;
 		msg.msg_namelen = addr_len;
 	}
+	flags &= ~MSG_INTERNAL_SENDMSG_FLAGS;
 	if (sock->file->f_flags & O_NONBLOCK)
 		flags |= MSG_DONTWAIT;
 	msg.msg_flags = flags;
@@ -2483,6 +2484,7 @@ static int ____sys_sendmsg(struct socket *sock, struct msghdr *msg_sys,
 		msg_sys->msg_control = ctl_buf;
 		msg_sys->msg_control_is_user = false;
 	}
+	flags &= ~MSG_INTERNAL_SENDMSG_FLAGS;
 	msg_sys->msg_flags = flags;
 
 	if (sock->file->f_flags & O_NONBLOCK)



* [PATCH net-next v10 02/16] net: Pass max frags into skb_append_pagefrags()
  2023-05-22 12:11 [PATCH net-next v10 00/16] splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1 David Howells
  2023-05-22 12:11 ` [PATCH net-next v10 01/16] net: Declare MSG_SPLICE_PAGES internal sendmsg() flag David Howells
@ 2023-05-22 12:11 ` David Howells
  2023-05-22 12:11 ` [PATCH net-next v10 03/16] net: Add a function to splice pages into an skbuff for MSG_SPLICE_PAGES David Howells
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 43+ messages in thread
From: David Howells @ 2023-05-22 12:11 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm

Pass the maximum number of fragments into skb_append_pagefrags() rather
than using MAX_SKB_FRAGS so that it can be used from code that wants to
specify sysctl_max_skb_frags.
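
For example, a caller that wants to honour the sysctl limit rather than the
compile-time maximum can now do something like the following (a sketch; this
is the way the splicing helper added in the next patch uses the new
argument):

	/* Sketch: cap the fragment count at the sysctl limit instead of
	 * MAX_SKB_FRAGS.  skb_append_pagefrags() returns -EMSGSIZE if no
	 * further fragment can be added.
	 */
	if (skb_append_pagefrags(skb, page, offset, len,
				 READ_ONCE(sysctl_max_skb_frags)) < 0)
		return -EMSGSIZE;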

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: David Ahern <dsahern@kernel.org>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
 include/linux/skbuff.h | 2 +-
 net/core/skbuff.c      | 4 ++--
 net/ipv4/ip_output.c   | 3 ++-
 net/unix/af_unix.c     | 2 +-
 4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 8cff3d817131..15011408c47c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1383,7 +1383,7 @@ static inline int skb_pad(struct sk_buff *skb, int pad)
 #define dev_kfree_skb(a)	consume_skb(a)
 
 int skb_append_pagefrags(struct sk_buff *skb, struct page *page,
-			 int offset, size_t size);
+			 int offset, size_t size, size_t max_frags);
 
 struct skb_seq_state {
 	__u32		lower_offset;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 6724a84ebb09..7f53dcb26ad3 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4188,13 +4188,13 @@ unsigned int skb_find_text(struct sk_buff *skb, unsigned int from,
 EXPORT_SYMBOL(skb_find_text);
 
 int skb_append_pagefrags(struct sk_buff *skb, struct page *page,
-			 int offset, size_t size)
+			 int offset, size_t size, size_t max_frags)
 {
 	int i = skb_shinfo(skb)->nr_frags;
 
 	if (skb_can_coalesce(skb, i, page, offset)) {
 		skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], size);
-	} else if (i < MAX_SKB_FRAGS) {
+	} else if (i < max_frags) {
 		skb_zcopy_downgrade_managed(skb);
 		get_page(page);
 		skb_fill_page_desc_noacc(skb, i, page, offset, size);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 61892268e8a6..52fc840898d8 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1450,7 +1450,8 @@ ssize_t	ip_append_page(struct sock *sk, struct flowi4 *fl4, struct page *page,
 		if (len > size)
 			len = size;
 
-		if (skb_append_pagefrags(skb, page, offset, len)) {
+		if (skb_append_pagefrags(skb, page, offset, len,
+					 MAX_SKB_FRAGS)) {
 			err = -EMSGSIZE;
 			goto error;
 		}
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index cc695c9f09ec..dd55506b4632 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2349,7 +2349,7 @@ static ssize_t unix_stream_sendpage(struct socket *socket, struct page *page,
 		newskb = NULL;
 	}
 
-	if (skb_append_pagefrags(skb, page, offset, size)) {
+	if (skb_append_pagefrags(skb, page, offset, size, MAX_SKB_FRAGS)) {
 		tail = skb;
 		goto alloc_skb;
 	}



* [PATCH net-next v10 03/16] net: Add a function to splice pages into an skbuff for MSG_SPLICE_PAGES
  2023-05-22 12:11 [PATCH net-next v10 00/16] splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1 David Howells
  2023-05-22 12:11 ` [PATCH net-next v10 01/16] net: Declare MSG_SPLICE_PAGES internal sendmsg() flag David Howells
  2023-05-22 12:11 ` [PATCH net-next v10 02/16] net: Pass max frags into skb_append_pagefrags() David Howells
@ 2023-05-22 12:11 ` David Howells
  2023-05-24 12:24   ` Yunsheng Lin
  2023-05-24 13:21   ` David Howells
  2023-05-22 12:11 ` [PATCH net-next v10 04/16] tcp: Support MSG_SPLICE_PAGES David Howells
                   ` (13 subsequent siblings)
  16 siblings, 2 replies; 43+ messages in thread
From: David Howells @ 2023-05-22 12:11 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm

Add a function to handle MSG_SPLICE_PAGES being passed internally to
sendmsg().  Pages are spliced into the given socket buffer if possible; if
a page turns out to be unspliceable (e.g. it's a slab page or has a zero
refcount), the function returns an error.
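
A protocol's sendmsg() path is then expected to call it along these lines
(a sketch only; the surrounding skb allocation, queueing and accounting are
protocol-specific and the local variable names are illustrative):

	/* Sketch: splice from the caller's iterator into the tail skb when
	 * MSG_SPLICE_PAGES is set.  -EMSGSIZE means the skb has no room for
	 * more fragments (so start a new one); other errors fail the send.
	 */
	if (msg->msg_flags & MSG_SPLICE_PAGES) {
		ssize_t n;

		n = skb_splice_from_iter(skb, &msg->msg_iter, copy,
					 sk->sk_allocation);
		if (n < 0)
			return n;
		copied += n;
	}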

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: David Ahern <dsahern@kernel.org>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---

Notes:
    ver #8)
     - Order local variables in reverse xmas tree order.
     - Remove duplicate coalescence check.
     - Warn if sendpage_ok() fails.
    
    ver #7)
     - Export function.
     - Never copy data, return -EIO if sendpage_ok() returns false.

 include/linux/skbuff.h |  3 ++
 net/core/skbuff.c      | 88 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 91 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 15011408c47c..1b2ebf6113e0 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -5097,5 +5097,8 @@ static inline void skb_mark_for_recycle(struct sk_buff *skb)
 #endif
 }
 
+ssize_t skb_splice_from_iter(struct sk_buff *skb, struct iov_iter *iter,
+			     ssize_t maxsize, gfp_t gfp);
+
 #endif	/* __KERNEL__ */
 #endif	/* _LINUX_SKBUFF_H */
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7f53dcb26ad3..f4a5b51aed22 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -6892,3 +6892,91 @@ nodefer:	__kfree_skb(skb);
 	if (unlikely(kick) && !cmpxchg(&sd->defer_ipi_scheduled, 0, 1))
 		smp_call_function_single_async(cpu, &sd->defer_csd);
 }
+
+static void skb_splice_csum_page(struct sk_buff *skb, struct page *page,
+				 size_t offset, size_t len)
+{
+	const char *kaddr;
+	__wsum csum;
+
+	kaddr = kmap_local_page(page);
+	csum = csum_partial(kaddr + offset, len, 0);
+	kunmap_local(kaddr);
+	skb->csum = csum_block_add(skb->csum, csum, skb->len);
+}
+
+/**
+ * skb_splice_from_iter - Splice (or copy) pages to skbuff
+ * @skb: The buffer to add pages to
+ * @iter: Iterator representing the pages to be added
+ * @maxsize: Maximum amount of pages to be added
+ * @gfp: Allocation flags
+ *
+ * This is a common helper function for supporting MSG_SPLICE_PAGES.  It
+ * extracts pages from an iterator and adds them to the socket buffer if
+ * possible, copying them to fragments if not possible (such as if they're slab
+ * pages).
+ *
+ * Returns the amount of data spliced/copied or -EMSGSIZE if there's
+ * insufficient space in the buffer to transfer anything.
+ */
+ssize_t skb_splice_from_iter(struct sk_buff *skb, struct iov_iter *iter,
+			     ssize_t maxsize, gfp_t gfp)
+{
+	size_t frag_limit = READ_ONCE(sysctl_max_skb_frags);
+	struct page *pages[8], **ppages = pages;
+	ssize_t spliced = 0, ret = 0;
+	unsigned int i;
+
+	while (iter->count > 0) {
+		ssize_t space, nr;
+		size_t off, len;
+
+		ret = -EMSGSIZE;
+		space = frag_limit - skb_shinfo(skb)->nr_frags;
+		if (space < 0)
+			break;
+
+		/* We might be able to coalesce without increasing nr_frags */
+		nr = clamp_t(size_t, space, 1, ARRAY_SIZE(pages));
+
+		len = iov_iter_extract_pages(iter, &ppages, maxsize, nr, 0, &off);
+		if (len <= 0) {
+			ret = len ?: -EIO;
+			break;
+		}
+
+		i = 0;
+		do {
+			struct page *page = pages[i++];
+			size_t part = min_t(size_t, PAGE_SIZE - off, len);
+
+			ret = -EIO;
+			if (WARN_ON_ONCE(!sendpage_ok(page)))
+				goto out;
+
+			ret = skb_append_pagefrags(skb, page, off, part,
+						   frag_limit);
+			if (ret < 0) {
+				iov_iter_revert(iter, len);
+				goto out;
+			}
+
+			if (skb->ip_summed == CHECKSUM_NONE)
+				skb_splice_csum_page(skb, page, off, part);
+
+			off = 0;
+			spliced += part;
+			maxsize -= part;
+			len -= part;
+		} while (len > 0);
+
+		if (maxsize <= 0)
+			break;
+	}
+
+out:
+	skb_len_add(skb, spliced);
+	return spliced ?: ret;
+}
+EXPORT_SYMBOL(skb_splice_from_iter);



* [PATCH net-next v10 04/16] tcp: Support MSG_SPLICE_PAGES
  2023-05-22 12:11 [PATCH net-next v10 00/16] splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1 David Howells
                   ` (2 preceding siblings ...)
  2023-05-22 12:11 ` [PATCH net-next v10 03/16] net: Add a function to splice pages into an skbuff for MSG_SPLICE_PAGES David Howells
@ 2023-05-22 12:11 ` David Howells
  2023-05-22 12:11 ` [PATCH net-next v10 05/16] tcp: Convert do_tcp_sendpages() to use MSG_SPLICE_PAGES David Howells
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 43+ messages in thread
From: David Howells @ 2023-05-22 12:11 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm

Make TCP's sendmsg() support MSG_SPLICE_PAGES.  This causes pages to be
spliced or copied (if it cannot be spliced) from the source iterator.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
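
For illustration, the caller side then reduces to something like the
following (a sketch; it is essentially what the do_tcp_sendpages()
conversion in the next patch does):

	struct bio_vec bvec;
	struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES };

	bvec_set_page(&bvec, page, size, offset);
	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
	ret = tcp_sendmsg_locked(sk, &msg, size);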

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: David Ahern <dsahern@kernel.org>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---

Notes:
    ver #9)
     - Fix a merge conflict with commit eea96a3e2c909.
    
    ver #7)
     - Missed changing a "zc = 1" in tcp_sendmsg_locked().
    
    ver #6)
     - Set zc to 0/MSG_ZEROCOPY/MSG_SPLICE_PAGES rather than 0/1/2.
     - Use common helper.

 net/ipv4/tcp.c | 43 ++++++++++++++++++++++++++++++++++++-------
 1 file changed, 36 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 3d18e295bb2f..2d61150d01f1 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1223,7 +1223,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 	int flags, err, copied = 0;
 	int mss_now = 0, size_goal, copied_syn = 0;
 	int process_backlog = 0;
-	bool zc = false;
+	int zc = 0;
 	long timeo;
 
 	flags = msg->msg_flags;
@@ -1231,7 +1231,8 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 	if ((flags & MSG_ZEROCOPY) && size) {
 		if (msg->msg_ubuf) {
 			uarg = msg->msg_ubuf;
-			zc = sk->sk_route_caps & NETIF_F_SG;
+			if (sk->sk_route_caps & NETIF_F_SG)
+				zc = MSG_ZEROCOPY;
 		} else if (sock_flag(sk, SOCK_ZEROCOPY)) {
 			skb = tcp_write_queue_tail(sk);
 			uarg = msg_zerocopy_realloc(sk, size, skb_zcopy(skb));
@@ -1239,10 +1240,14 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 				err = -ENOBUFS;
 				goto out_err;
 			}
-			zc = sk->sk_route_caps & NETIF_F_SG;
-			if (!zc)
+			if (sk->sk_route_caps & NETIF_F_SG)
+				zc = MSG_ZEROCOPY;
+			else
 				uarg_to_msgzc(uarg)->zerocopy = 0;
 		}
+	} else if (unlikely(msg->msg_flags & MSG_SPLICE_PAGES) && size) {
+		if (sk->sk_route_caps & NETIF_F_SG)
+			zc = MSG_SPLICE_PAGES;
 	}
 
 	if (unlikely(flags & MSG_FASTOPEN || inet_sk(sk)->defer_connect) &&
@@ -1305,7 +1310,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 		goto do_error;
 
 	while (msg_data_left(msg)) {
-		int copy = 0;
+		ssize_t copy = 0;
 
 		skb = tcp_write_queue_tail(sk);
 		if (skb)
@@ -1346,7 +1351,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 		if (copy > msg_data_left(msg))
 			copy = msg_data_left(msg);
 
-		if (!zc) {
+		if (zc == 0) {
 			bool merge = true;
 			int i = skb_shinfo(skb)->nr_frags;
 			struct page_frag *pfrag = sk_page_frag(sk);
@@ -1391,7 +1396,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 				page_ref_inc(pfrag->page);
 			}
 			pfrag->offset += copy;
-		} else {
+		} else if (zc == MSG_ZEROCOPY)  {
 			/* First append to a fragless skb builds initial
 			 * pure zerocopy skb
 			 */
@@ -1412,6 +1417,30 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 			if (err < 0)
 				goto do_error;
 			copy = err;
+		} else if (zc == MSG_SPLICE_PAGES) {
+			/* Splice in data if we can; copy if we can't. */
+			if (tcp_downgrade_zcopy_pure(sk, skb))
+				goto wait_for_space;
+			copy = tcp_wmem_schedule(sk, copy);
+			if (!copy)
+				goto wait_for_space;
+
+			err = skb_splice_from_iter(skb, &msg->msg_iter, copy,
+						   sk->sk_allocation);
+			if (err < 0) {
+				if (err == -EMSGSIZE) {
+					tcp_mark_push(tp, skb);
+					goto new_segment;
+				}
+				goto do_error;
+			}
+			copy = err;
+
+			if (!(flags & MSG_NO_SHARED_FRAGS))
+				skb_shinfo(skb)->flags |= SKBFL_SHARED_FRAG;
+
+			sk_wmem_queued_add(sk, copy);
+			sk_mem_charge(sk, copy);
 		}
 
 		if (!copied)



* [PATCH net-next v10 05/16] tcp: Convert do_tcp_sendpages() to use MSG_SPLICE_PAGES
  2023-05-22 12:11 [PATCH net-next v10 00/16] splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1 David Howells
                   ` (3 preceding siblings ...)
  2023-05-22 12:11 ` [PATCH net-next v10 04/16] tcp: Support MSG_SPLICE_PAGES David Howells
@ 2023-05-22 12:11 ` David Howells
  2023-05-22 12:11 ` [PATCH net-next v10 06/16] tcp_bpf: Inline do_tcp_sendpages as it's now a wrapper around tcp_sendmsg David Howells
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 43+ messages in thread
From: David Howells @ 2023-05-22 12:11 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm

Convert do_tcp_sendpages() to use sendmsg() with MSG_SPLICE_PAGES rather
than directly splicing in the pages itself.  do_tcp_sendpages() can then be
inlined into its callers in subsequent patches.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: David Ahern <dsahern@kernel.org>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
 net/ipv4/tcp.c | 158 +++----------------------------------------------
 1 file changed, 7 insertions(+), 151 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 2d61150d01f1..f3a0c02678e0 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -974,163 +974,19 @@ static int tcp_wmem_schedule(struct sock *sk, int copy)
 	return min(copy, sk->sk_forward_alloc);
 }
 
-static struct sk_buff *tcp_build_frag(struct sock *sk, int size_goal, int flags,
-				      struct page *page, int offset, size_t *size)
-{
-	struct sk_buff *skb = tcp_write_queue_tail(sk);
-	struct tcp_sock *tp = tcp_sk(sk);
-	bool can_coalesce;
-	int copy, i;
-
-	if (!skb || (copy = size_goal - skb->len) <= 0 ||
-	    !tcp_skb_can_collapse_to(skb)) {
-new_segment:
-		if (!sk_stream_memory_free(sk))
-			return NULL;
-
-		skb = tcp_stream_alloc_skb(sk, 0, sk->sk_allocation,
-					   tcp_rtx_and_write_queues_empty(sk));
-		if (!skb)
-			return NULL;
-
-#ifdef CONFIG_TLS_DEVICE
-		skb->decrypted = !!(flags & MSG_SENDPAGE_DECRYPTED);
-#endif
-		tcp_skb_entail(sk, skb);
-		copy = size_goal;
-	}
-
-	if (copy > *size)
-		copy = *size;
-
-	i = skb_shinfo(skb)->nr_frags;
-	can_coalesce = skb_can_coalesce(skb, i, page, offset);
-	if (!can_coalesce && i >= READ_ONCE(sysctl_max_skb_frags)) {
-		tcp_mark_push(tp, skb);
-		goto new_segment;
-	}
-	if (tcp_downgrade_zcopy_pure(sk, skb))
-		return NULL;
-
-	copy = tcp_wmem_schedule(sk, copy);
-	if (!copy)
-		return NULL;
-
-	if (can_coalesce) {
-		skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
-	} else {
-		get_page(page);
-		skb_fill_page_desc_noacc(skb, i, page, offset, copy);
-	}
-
-	if (!(flags & MSG_NO_SHARED_FRAGS))
-		skb_shinfo(skb)->flags |= SKBFL_SHARED_FRAG;
-
-	skb->len += copy;
-	skb->data_len += copy;
-	skb->truesize += copy;
-	sk_wmem_queued_add(sk, copy);
-	sk_mem_charge(sk, copy);
-	WRITE_ONCE(tp->write_seq, tp->write_seq + copy);
-	TCP_SKB_CB(skb)->end_seq += copy;
-	tcp_skb_pcount_set(skb, 0);
-
-	*size = copy;
-	return skb;
-}
-
 ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
 			 size_t size, int flags)
 {
-	struct tcp_sock *tp = tcp_sk(sk);
-	int mss_now, size_goal;
-	int err;
-	ssize_t copied;
-	long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
-
-	if (IS_ENABLED(CONFIG_DEBUG_VM) &&
-	    WARN_ONCE(!sendpage_ok(page),
-		      "page must not be a Slab one and have page_count > 0"))
-		return -EINVAL;
-
-	/* Wait for a connection to finish. One exception is TCP Fast Open
-	 * (passive side) where data is allowed to be sent before a connection
-	 * is fully established.
-	 */
-	if (((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) &&
-	    !tcp_passive_fastopen(sk)) {
-		err = sk_stream_wait_connect(sk, &timeo);
-		if (err != 0)
-			goto out_err;
-	}
+	struct bio_vec bvec;
+	struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
 
-	sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk);
+	bvec_set_page(&bvec, page, size, offset);
+	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
 
-	mss_now = tcp_send_mss(sk, &size_goal, flags);
-	copied = 0;
+	if (flags & MSG_SENDPAGE_NOTLAST)
+		msg.msg_flags |= MSG_MORE;
 
-	err = -EPIPE;
-	if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
-		goto out_err;
-
-	while (size > 0) {
-		struct sk_buff *skb;
-		size_t copy = size;
-
-		skb = tcp_build_frag(sk, size_goal, flags, page, offset, &copy);
-		if (!skb)
-			goto wait_for_space;
-
-		if (!copied)
-			TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_PSH;
-
-		copied += copy;
-		offset += copy;
-		size -= copy;
-		if (!size)
-			goto out;
-
-		if (skb->len < size_goal || (flags & MSG_OOB))
-			continue;
-
-		if (forced_push(tp)) {
-			tcp_mark_push(tp, skb);
-			__tcp_push_pending_frames(sk, mss_now, TCP_NAGLE_PUSH);
-		} else if (skb == tcp_send_head(sk))
-			tcp_push_one(sk, mss_now);
-		continue;
-
-wait_for_space:
-		set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
-		tcp_push(sk, flags & ~MSG_MORE, mss_now,
-			 TCP_NAGLE_PUSH, size_goal);
-
-		err = sk_stream_wait_memory(sk, &timeo);
-		if (err != 0)
-			goto do_error;
-
-		mss_now = tcp_send_mss(sk, &size_goal, flags);
-	}
-
-out:
-	if (copied) {
-		tcp_tx_timestamp(sk, sk->sk_tsflags);
-		if (!(flags & MSG_SENDPAGE_NOTLAST))
-			tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
-	}
-	return copied;
-
-do_error:
-	tcp_remove_empty_skb(sk);
-	if (copied)
-		goto out;
-out_err:
-	/* make sure we wake any epoll edge trigger waiter */
-	if (unlikely(tcp_rtx_and_write_queues_empty(sk) && err == -EAGAIN)) {
-		sk->sk_write_space(sk);
-		tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);
-	}
-	return sk_stream_error(sk, flags, err);
+	return tcp_sendmsg_locked(sk, &msg, size);
 }
 EXPORT_SYMBOL_GPL(do_tcp_sendpages);
 



* [PATCH net-next v10 06/16] tcp_bpf: Inline do_tcp_sendpages as it's now a wrapper around tcp_sendmsg
  2023-05-22 12:11 [PATCH net-next v10 00/16] splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1 David Howells
                   ` (4 preceding siblings ...)
  2023-05-22 12:11 ` [PATCH net-next v10 05/16] tcp: Convert do_tcp_sendpages() to use MSG_SPLICE_PAGES David Howells
@ 2023-05-22 12:11 ` David Howells
  2023-05-22 12:11 ` [PATCH net-next v10 07/16] espintcp: Inline do_tcp_sendpages() David Howells
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 43+ messages in thread
From: David Howells @ 2023-05-22 12:11 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm, John Fastabend,
	Jakub Sitnicki, bpf

do_tcp_sendpages() is now just a small wrapper around tcp_sendmsg_locked(),
so inline it.  This is part of replacing ->sendpage() with a call to
sendmsg() with MSG_SPLICE_PAGES set.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jakub Sitnicki <jakub@cloudflare.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
cc: bpf@vger.kernel.org
---
 net/ipv4/tcp_bpf.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 2e9547467edb..0291d15acd19 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -72,11 +72,13 @@ static int tcp_bpf_push(struct sock *sk, struct sk_msg *msg, u32 apply_bytes,
 {
 	bool apply = apply_bytes;
 	struct scatterlist *sge;
+	struct msghdr msghdr = { .msg_flags = flags | MSG_SPLICE_PAGES, };
 	struct page *page;
 	int size, ret = 0;
 	u32 off;
 
 	while (1) {
+		struct bio_vec bvec;
 		bool has_tx_ulp;
 
 		sge = sk_msg_elem(msg, msg->sg.start);
@@ -88,16 +90,18 @@ static int tcp_bpf_push(struct sock *sk, struct sk_msg *msg, u32 apply_bytes,
 		tcp_rate_check_app_limited(sk);
 retry:
 		has_tx_ulp = tls_sw_has_ctx_tx(sk);
-		if (has_tx_ulp) {
-			flags |= MSG_SENDPAGE_NOPOLICY;
-			ret = kernel_sendpage_locked(sk,
-						     page, off, size, flags);
-		} else {
-			ret = do_tcp_sendpages(sk, page, off, size, flags);
-		}
+		if (has_tx_ulp)
+			msghdr.msg_flags |= MSG_SENDPAGE_NOPOLICY;
 
+		if (flags & MSG_SENDPAGE_NOTLAST)
+			msghdr.msg_flags |= MSG_MORE;
+
+		bvec_set_page(&bvec, page, size, off);
+		iov_iter_bvec(&msghdr.msg_iter, ITER_SOURCE, &bvec, 1, size);
+		ret = tcp_sendmsg_locked(sk, &msghdr, size);
 		if (ret <= 0)
 			return ret;
+
 		if (apply)
 			apply_bytes -= ret;
 		msg->sg.size -= ret;
@@ -404,7 +408,7 @@ static int tcp_bpf_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 	long timeo;
 	int flags;
 
-	/* Don't let internal do_tcp_sendpages() flags through */
+	/* Don't let internal sendpage flags through */
 	flags = (msg->msg_flags & ~MSG_SENDPAGE_DECRYPTED);
 	flags |= MSG_NO_SHARED_FRAGS;
 



* [PATCH net-next v10 07/16] espintcp: Inline do_tcp_sendpages()
  2023-05-22 12:11 [PATCH net-next v10 00/16] splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1 David Howells
                   ` (5 preceding siblings ...)
  2023-05-22 12:11 ` [PATCH net-next v10 06/16] tcp_bpf: Inline do_tcp_sendpages as it's now a wrapper around tcp_sendmsg David Howells
@ 2023-05-22 12:11 ` David Howells
  2023-05-22 12:11 ` [PATCH net-next v10 08/16] tls: " David Howells
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 43+ messages in thread
From: David Howells @ 2023-05-22 12:11 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm, Steffen Klassert,
	Herbert Xu

do_tcp_sendpages() is now just a small wrapper around tcp_sendmsg_locked(),
so inline it, allowing do_tcp_sendpages() to be removed.  This is part of
replacing ->sendpage() with a call to sendmsg() with MSG_SPLICE_PAGES set.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steffen Klassert <steffen.klassert@secunet.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: David Ahern <dsahern@kernel.org>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
 net/xfrm/espintcp.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/net/xfrm/espintcp.c b/net/xfrm/espintcp.c
index 872b80188e83..3504925babdb 100644
--- a/net/xfrm/espintcp.c
+++ b/net/xfrm/espintcp.c
@@ -205,14 +205,16 @@ static int espintcp_sendskb_locked(struct sock *sk, struct espintcp_msg *emsg,
 static int espintcp_sendskmsg_locked(struct sock *sk,
 				     struct espintcp_msg *emsg, int flags)
 {
+	struct msghdr msghdr = { .msg_flags = flags | MSG_SPLICE_PAGES, };
 	struct sk_msg *skmsg = &emsg->skmsg;
 	struct scatterlist *sg;
 	int done = 0;
 	int ret;
 
-	flags |= MSG_SENDPAGE_NOTLAST;
+	msghdr.msg_flags |= MSG_SENDPAGE_NOTLAST;
 	sg = &skmsg->sg.data[skmsg->sg.start];
 	do {
+		struct bio_vec bvec;
 		size_t size = sg->length - emsg->offset;
 		int offset = sg->offset + emsg->offset;
 		struct page *p;
@@ -220,11 +222,13 @@ static int espintcp_sendskmsg_locked(struct sock *sk,
 		emsg->offset = 0;
 
 		if (sg_is_last(sg))
-			flags &= ~MSG_SENDPAGE_NOTLAST;
+			msghdr.msg_flags &= ~MSG_SENDPAGE_NOTLAST;
 
 		p = sg_page(sg);
 retry:
-		ret = do_tcp_sendpages(sk, p, offset, size, flags);
+		bvec_set_page(&bvec, p, size, offset);
+		iov_iter_bvec(&msghdr.msg_iter, ITER_SOURCE, &bvec, 1, size);
+		ret = tcp_sendmsg_locked(sk, &msghdr, size);
 		if (ret < 0) {
 			emsg->offset = offset - sg->offset;
 			skmsg->sg.start += done;



* [PATCH net-next v10 08/16] tls: Inline do_tcp_sendpages()
  2023-05-22 12:11 [PATCH net-next v10 00/16] splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1 David Howells
                   ` (6 preceding siblings ...)
  2023-05-22 12:11 ` [PATCH net-next v10 07/16] espintcp: Inline do_tcp_sendpages() David Howells
@ 2023-05-22 12:11 ` David Howells
  2023-06-07 14:17   ` Tariq Toukan
  2023-06-07 15:03   ` David Howells
  2023-05-22 12:11 ` [PATCH net-next v10 09/16] siw: " David Howells
                   ` (8 subsequent siblings)
  16 siblings, 2 replies; 43+ messages in thread
From: David Howells @ 2023-05-22 12:11 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm, Boris Pismenny,
	John Fastabend

do_tcp_sendpages() is now just a small wrapper around tcp_sendmsg_locked(),
so inline it, allowing do_tcp_sendpages() to be removed.  This is part of
replacing ->sendpage() with a call to sendmsg() with MSG_SPLICE_PAGES set.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
 include/net/tls.h  |  2 +-
 net/tls/tls_main.c | 24 +++++++++++++++---------
 2 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 6056ce5a2aa5..5791ca7a189c 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -258,7 +258,7 @@ struct tls_context {
 	struct scatterlist *partially_sent_record;
 	u16 partially_sent_offset;
 
-	bool in_tcp_sendpages;
+	bool splicing_pages;
 	bool pending_open_record_frags;
 
 	struct mutex tx_lock; /* protects partially_sent_* fields and
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index f2e7302a4d96..3d45fdb5c4e9 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -125,7 +125,10 @@ int tls_push_sg(struct sock *sk,
 		u16 first_offset,
 		int flags)
 {
-	int sendpage_flags = flags | MSG_SENDPAGE_NOTLAST;
+	struct bio_vec bvec;
+	struct msghdr msg = {
+		.msg_flags = MSG_SENDPAGE_NOTLAST | MSG_SPLICE_PAGES | flags,
+	};
 	int ret = 0;
 	struct page *p;
 	size_t size;
@@ -134,16 +137,19 @@ int tls_push_sg(struct sock *sk,
 	size = sg->length - offset;
 	offset += sg->offset;
 
-	ctx->in_tcp_sendpages = true;
+	ctx->splicing_pages = true;
 	while (1) {
 		if (sg_is_last(sg))
-			sendpage_flags = flags;
+			msg.msg_flags = flags;
 
 		/* is sending application-limited? */
 		tcp_rate_check_app_limited(sk);
 		p = sg_page(sg);
 retry:
-		ret = do_tcp_sendpages(sk, p, offset, size, sendpage_flags);
+		bvec_set_page(&bvec, p, size, offset);
+		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+
+		ret = tcp_sendmsg_locked(sk, &msg, size);
 
 		if (ret != size) {
 			if (ret > 0) {
@@ -155,7 +161,7 @@ int tls_push_sg(struct sock *sk,
 			offset -= sg->offset;
 			ctx->partially_sent_offset = offset;
 			ctx->partially_sent_record = (void *)sg;
-			ctx->in_tcp_sendpages = false;
+			ctx->splicing_pages = false;
 			return ret;
 		}
 
@@ -169,7 +175,7 @@ int tls_push_sg(struct sock *sk,
 		size = sg->length;
 	}
 
-	ctx->in_tcp_sendpages = false;
+	ctx->splicing_pages = false;
 
 	return 0;
 }
@@ -247,11 +253,11 @@ static void tls_write_space(struct sock *sk)
 {
 	struct tls_context *ctx = tls_get_ctx(sk);
 
-	/* If in_tcp_sendpages call lower protocol write space handler
+	/* If splicing_pages call lower protocol write space handler
 	 * to ensure we wake up any waiting operations there. For example
-	 * if do_tcp_sendpages where to call sk_wait_event.
+	 * if splicing pages where to call sk_wait_event.
 	 */
-	if (ctx->in_tcp_sendpages) {
+	if (ctx->splicing_pages) {
 		ctx->sk_write_space(sk);
 		return;
 	}



* [PATCH net-next v10 09/16] siw: Inline do_tcp_sendpages()
  2023-05-22 12:11 [PATCH net-next v10 00/16] splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1 David Howells
                   ` (7 preceding siblings ...)
  2023-05-22 12:11 ` [PATCH net-next v10 08/16] tls: " David Howells
@ 2023-05-22 12:11 ` David Howells
  2023-05-22 12:11 ` [PATCH net-next v10 10/16] tcp: Fold do_tcp_sendpages() into tcp_sendpage_locked() David Howells
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 43+ messages in thread
From: David Howells @ 2023-05-22 12:11 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm, Bernard Metzler,
	Tom Talpey, Jason Gunthorpe, Leon Romanovsky, linux-rdma

do_tcp_sendpages() is now just a small wrapper around tcp_sendmsg_locked(),
so inline it, allowing do_tcp_sendpages() to be removed.  This is part of
replacing ->sendpage() with a call to sendmsg() with MSG_SPLICE_PAGES set.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
cc: Jason Gunthorpe <jgg@ziepe.ca>
cc: Leon Romanovsky <leon@kernel.org>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-rdma@vger.kernel.org
cc: netdev@vger.kernel.org
---

Notes:
    ver #6)
     - Don't clear MSG_SPLICE_PAGES on the last page.

 drivers/infiniband/sw/siw/siw_qp_tx.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c
index 4b292e0504f1..ffb16beb6c30 100644
--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
@@ -312,7 +312,7 @@ static int siw_tx_ctrl(struct siw_iwarp_tx *c_tx, struct socket *s,
 }
 
 /*
- * 0copy TCP transmit interface: Use do_tcp_sendpages.
+ * 0copy TCP transmit interface: Use MSG_SPLICE_PAGES.
  *
  * Using sendpage to push page by page appears to be less efficient
  * than using sendmsg, even if data are copied.
@@ -323,20 +323,27 @@ static int siw_tx_ctrl(struct siw_iwarp_tx *c_tx, struct socket *s,
 static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
 			     size_t size)
 {
+	struct bio_vec bvec;
+	struct msghdr msg = {
+		.msg_flags = (MSG_MORE | MSG_DONTWAIT | MSG_SENDPAGE_NOTLAST |
+			      MSG_SPLICE_PAGES),
+	};
 	struct sock *sk = s->sk;
-	int i = 0, rv = 0, sent = 0,
-	    flags = MSG_MORE | MSG_DONTWAIT | MSG_SENDPAGE_NOTLAST;
+	int i = 0, rv = 0, sent = 0;
 
 	while (size) {
 		size_t bytes = min_t(size_t, PAGE_SIZE - offset, size);
 
 		if (size + offset <= PAGE_SIZE)
-			flags = MSG_MORE | MSG_DONTWAIT;
+			msg.msg_flags &= ~MSG_SENDPAGE_NOTLAST;
 
 		tcp_rate_check_app_limited(sk);
+		bvec_set_page(&bvec, page[i], bytes, offset);
+		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+
 try_page_again:
 		lock_sock(sk);
-		rv = do_tcp_sendpages(sk, page[i], offset, bytes, flags);
+		rv = tcp_sendmsg_locked(sk, &msg, size);
 		release_sock(sk);
 
 		if (rv > 0) {



* [PATCH net-next v10 10/16] tcp: Fold do_tcp_sendpages() into tcp_sendpage_locked()
  2023-05-22 12:11 [PATCH net-next v10 00/16] splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1 David Howells
                   ` (8 preceding siblings ...)
  2023-05-22 12:11 ` [PATCH net-next v10 09/16] siw: " David Howells
@ 2023-05-22 12:11 ` David Howells
  2023-05-22 12:11 ` [PATCH net-next v10 11/16] ip, udp: Support MSG_SPLICE_PAGES David Howells
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 43+ messages in thread
From: David Howells @ 2023-05-22 12:11 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm

Fold do_tcp_sendpages() into its last remaining caller,
tcp_sendpage_locked().

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric Dumazet <edumazet@google.com>
cc: David Ahern <dsahern@kernel.org>
cc: "David S. Miller" <davem@davemloft.net>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---
 include/net/tcp.h |  2 --
 net/ipv4/tcp.c    | 21 +++++++--------------
 2 files changed, 7 insertions(+), 16 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 04a31643cda3..02a6cff1827e 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -333,8 +333,6 @@ int tcp_sendpage(struct sock *sk, struct page *page, int offset, size_t size,
 		 int flags);
 int tcp_sendpage_locked(struct sock *sk, struct page *page, int offset,
 			size_t size, int flags);
-ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
-		 size_t size, int flags);
 int tcp_send_mss(struct sock *sk, int *size_goal, int flags);
 void tcp_push(struct sock *sk, int flags, int mss_now, int nonagle,
 	      int size_goal);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index f3a0c02678e0..e9506cebecce 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -974,12 +974,17 @@ static int tcp_wmem_schedule(struct sock *sk, int copy)
 	return min(copy, sk->sk_forward_alloc);
 }
 
-ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
-			 size_t size, int flags)
+int tcp_sendpage_locked(struct sock *sk, struct page *page, int offset,
+			size_t size, int flags)
 {
 	struct bio_vec bvec;
 	struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
 
+	if (!(sk->sk_route_caps & NETIF_F_SG))
+		return sock_no_sendpage_locked(sk, page, offset, size, flags);
+
+	tcp_rate_check_app_limited(sk);  /* is sending application-limited? */
+
 	bvec_set_page(&bvec, page, size, offset);
 	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
 
@@ -988,18 +993,6 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
 
 	return tcp_sendmsg_locked(sk, &msg, size);
 }
-EXPORT_SYMBOL_GPL(do_tcp_sendpages);
-
-int tcp_sendpage_locked(struct sock *sk, struct page *page, int offset,
-			size_t size, int flags)
-{
-	if (!(sk->sk_route_caps & NETIF_F_SG))
-		return sock_no_sendpage_locked(sk, page, offset, size, flags);
-
-	tcp_rate_check_app_limited(sk);  /* is sending application-limited? */
-
-	return do_tcp_sendpages(sk, page, offset, size, flags);
-}
 EXPORT_SYMBOL_GPL(tcp_sendpage_locked);
 
 int tcp_sendpage(struct sock *sk, struct page *page, int offset,



* [PATCH net-next v10 11/16] ip, udp: Support MSG_SPLICE_PAGES
  2023-05-22 12:11 [PATCH net-next v10 00/16] splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1 David Howells
                   ` (9 preceding siblings ...)
  2023-05-22 12:11 ` [PATCH net-next v10 10/16] tcp: Fold do_tcp_sendpages() into tcp_sendpage_locked() David Howells
@ 2023-05-22 12:11 ` David Howells
  2023-05-22 12:11 ` [PATCH net-next v10 12/16] ip6, udp6: " David Howells
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 43+ messages in thread
From: David Howells @ 2023-05-22 12:11 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm

Make IP/UDP sendmsg() support MSG_SPLICE_PAGES.  This causes pages to be
spliced from the source iterator.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
cc: David Ahern <dsahern@kernel.org>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---

Notes:
    ver #6)
     - Use common helper.

 net/ipv4/ip_output.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 52fc840898d8..c7db973b5d29 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1048,6 +1048,14 @@ static int __ip_append_data(struct sock *sk,
 				skb_zcopy_set(skb, uarg, &extra_uref);
 			}
 		}
+	} else if ((flags & MSG_SPLICE_PAGES) && length) {
+		if (inet->hdrincl)
+			return -EPERM;
+		if (rt->dst.dev->features & NETIF_F_SG)
+			/* We need an empty buffer to attach stuff to */
+			paged = true;
+		else
+			flags &= ~MSG_SPLICE_PAGES;
 	}
 
 	cork->length += length;
@@ -1207,6 +1215,15 @@ static int __ip_append_data(struct sock *sk,
 				err = -EFAULT;
 				goto error;
 			}
+		} else if (flags & MSG_SPLICE_PAGES) {
+			struct msghdr *msg = from;
+
+			err = skb_splice_from_iter(skb, &msg->msg_iter, copy,
+						   sk->sk_allocation);
+			if (err < 0)
+				goto error;
+			copy = err;
+			wmem_alloc_delta += copy;
 		} else if (!zc) {
 			int i = skb_shinfo(skb)->nr_frags;
 



* [PATCH net-next v10 12/16] ip6, udp6: Support MSG_SPLICE_PAGES
  2023-05-22 12:11 [PATCH net-next v10 00/16] splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1 David Howells
                   ` (10 preceding siblings ...)
  2023-05-22 12:11 ` [PATCH net-next v10 11/16] ip, udp: Support MSG_SPLICE_PAGES David Howells
@ 2023-05-22 12:11 ` David Howells
  2023-05-22 12:11 ` [PATCH net-next v10 13/16] udp: Convert udp_sendpage() to use MSG_SPLICE_PAGES David Howells
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 43+ messages in thread
From: David Howells @ 2023-05-22 12:11 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm

Make IP6/UDP6 sendmsg() support MSG_SPLICE_PAGES.  This causes pages to be
spliced from the source iterator if possible, copying the data if not.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
cc: David Ahern <dsahern@kernel.org>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---

Notes:
    ver #6)
     - Use common helper.

 net/ipv6/ip6_output.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 9554cf46ed88..c722cb881b2d 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1589,6 +1589,14 @@ static int __ip6_append_data(struct sock *sk,
 				skb_zcopy_set(skb, uarg, &extra_uref);
 			}
 		}
+	} else if ((flags & MSG_SPLICE_PAGES) && length) {
+		if (inet_sk(sk)->hdrincl)
+			return -EPERM;
+		if (rt->dst.dev->features & NETIF_F_SG)
+			/* We need an empty buffer to attach stuff to */
+			paged = true;
+		else
+			flags &= ~MSG_SPLICE_PAGES;
 	}
 
 	/*
@@ -1778,6 +1786,15 @@ static int __ip6_append_data(struct sock *sk,
 				err = -EFAULT;
 				goto error;
 			}
+		} else if (flags & MSG_SPLICE_PAGES) {
+			struct msghdr *msg = from;
+
+			err = skb_splice_from_iter(skb, &msg->msg_iter, copy,
+						   sk->sk_allocation);
+			if (err < 0)
+				goto error;
+			copy = err;
+			wmem_alloc_delta += copy;
 		} else if (!zc) {
 			int i = skb_shinfo(skb)->nr_frags;
 



* [PATCH net-next v10 13/16] udp: Convert udp_sendpage() to use MSG_SPLICE_PAGES
  2023-05-22 12:11 [PATCH net-next v10 00/16] splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1 David Howells
                   ` (11 preceding siblings ...)
  2023-05-22 12:11 ` [PATCH net-next v10 12/16] ip6, udp6: " David Howells
@ 2023-05-22 12:11 ` David Howells
  2023-05-22 12:11 ` [PATCH net-next v10 14/16] ip: Remove ip_append_page() David Howells
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 43+ messages in thread
From: David Howells @ 2023-05-22 12:11 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm

Convert udp_sendpage() to use sendmsg() with MSG_SPLICE_PAGES rather than
directly splicing in the pages itself.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
cc: David Ahern <dsahern@kernel.org>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---

Notes:
    ver #6)
     - udp_sendpage() shouldn't lock the socket around udp_sendmsg().
     - udp_sendpage() should only set MSG_MORE if MSG_SENDPAGE_NOTLAST is set.

 net/ipv4/udp.c | 51 ++++++--------------------------------------------
 1 file changed, 6 insertions(+), 45 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index aa32afd871ee..2879dc6d66ea 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1332,54 +1332,15 @@ EXPORT_SYMBOL(udp_sendmsg);
 int udp_sendpage(struct sock *sk, struct page *page, int offset,
 		 size_t size, int flags)
 {
-	struct inet_sock *inet = inet_sk(sk);
-	struct udp_sock *up = udp_sk(sk);
-	int ret;
+	struct bio_vec bvec;
+	struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES };
 
 	if (flags & MSG_SENDPAGE_NOTLAST)
-		flags |= MSG_MORE;
-
-	if (!up->pending) {
-		struct msghdr msg = {	.msg_flags = flags|MSG_MORE };
-
-		/* Call udp_sendmsg to specify destination address which
-		 * sendpage interface can't pass.
-		 * This will succeed only when the socket is connected.
-		 */
-		ret = udp_sendmsg(sk, &msg, 0);
-		if (ret < 0)
-			return ret;
-	}
-
-	lock_sock(sk);
+		msg.msg_flags |= MSG_MORE;
 
-	if (unlikely(!up->pending)) {
-		release_sock(sk);
-
-		net_dbg_ratelimited("cork failed\n");
-		return -EINVAL;
-	}
-
-	ret = ip_append_page(sk, &inet->cork.fl.u.ip4,
-			     page, offset, size, flags);
-	if (ret == -EOPNOTSUPP) {
-		release_sock(sk);
-		return sock_no_sendpage(sk->sk_socket, page, offset,
-					size, flags);
-	}
-	if (ret < 0) {
-		udp_flush_pending_frames(sk);
-		goto out;
-	}
-
-	up->len += size;
-	if (!(READ_ONCE(up->corkflag) || (flags&MSG_MORE)))
-		ret = udp_push_pending_frames(sk);
-	if (!ret)
-		ret = size;
-out:
-	release_sock(sk);
-	return ret;
+	bvec_set_page(&bvec, page, size, offset);
+	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+	return udp_sendmsg(sk, &msg, size);
 }
 
 #define UDP_SKB_IS_STATELESS 0x80000000


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH net-next v10 14/16] ip: Remove ip_append_page()
  2023-05-22 12:11 [PATCH net-next v10 00/16] splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1 David Howells
                   ` (12 preceding siblings ...)
  2023-05-22 12:11 ` [PATCH net-next v10 13/16] udp: Convert udp_sendpage() to use MSG_SPLICE_PAGES David Howells
@ 2023-05-22 12:11 ` David Howells
  2023-05-22 12:11 ` [PATCH net-next v10 15/16] af_unix: Support MSG_SPLICE_PAGES David Howells
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 43+ messages in thread
From: David Howells @ 2023-05-22 12:11 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm

ip_append_page() is no longer used with the removal of udp_sendpage(), so
remove it.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
cc: David Ahern <dsahern@kernel.org>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---

Notes:
    ver #7)
     - Remove now-unused csum_page().

 include/net/ip.h     |   2 -
 net/ipv4/ip_output.c | 148 ++-----------------------------------------
 2 files changed, 4 insertions(+), 146 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index c3fffaa92d6e..7627a4df893b 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -220,8 +220,6 @@ int ip_append_data(struct sock *sk, struct flowi4 *fl4,
 		   unsigned int flags);
 int ip_generic_getfrag(void *from, char *to, int offset, int len, int odd,
 		       struct sk_buff *skb);
-ssize_t ip_append_page(struct sock *sk, struct flowi4 *fl4, struct page *page,
-		       int offset, size_t size, int flags);
 struct sk_buff *__ip_make_skb(struct sock *sk, struct flowi4 *fl4,
 			      struct sk_buff_head *queue,
 			      struct inet_cork *cork);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index c7db973b5d29..553c740a6bfb 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -946,17 +946,6 @@ ip_generic_getfrag(void *from, char *to, int offset, int len, int odd, struct sk
 }
 EXPORT_SYMBOL(ip_generic_getfrag);
 
-static inline __wsum
-csum_page(struct page *page, int offset, int copy)
-{
-	char *kaddr;
-	__wsum csum;
-	kaddr = kmap(page);
-	csum = csum_partial(kaddr + offset, copy, 0);
-	kunmap(page);
-	return csum;
-}
-
 static int __ip_append_data(struct sock *sk,
 			    struct flowi4 *fl4,
 			    struct sk_buff_head *queue,
@@ -1327,10 +1316,10 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
 }
 
 /*
- *	ip_append_data() and ip_append_page() can make one large IP datagram
- *	from many pieces of data. Each pieces will be holded on the socket
- *	until ip_push_pending_frames() is called. Each piece can be a page
- *	or non-page data.
+ *	ip_append_data() can make one large IP datagram from many pieces of
+ *	data.  Each piece will be held on the socket until
+ *	ip_push_pending_frames() is called. Each piece can be a page or
+ *	non-page data.
  *
  *	Not only UDP, other transport protocols - e.g. raw sockets - can use
  *	this interface potentially.
@@ -1363,135 +1352,6 @@ int ip_append_data(struct sock *sk, struct flowi4 *fl4,
 				from, length, transhdrlen, flags);
 }
 
-ssize_t	ip_append_page(struct sock *sk, struct flowi4 *fl4, struct page *page,
-		       int offset, size_t size, int flags)
-{
-	struct inet_sock *inet = inet_sk(sk);
-	struct sk_buff *skb;
-	struct rtable *rt;
-	struct ip_options *opt = NULL;
-	struct inet_cork *cork;
-	int hh_len;
-	int mtu;
-	int len;
-	int err;
-	unsigned int maxfraglen, fragheaderlen, fraggap, maxnonfragsize;
-
-	if (inet->hdrincl)
-		return -EPERM;
-
-	if (flags&MSG_PROBE)
-		return 0;
-
-	if (skb_queue_empty(&sk->sk_write_queue))
-		return -EINVAL;
-
-	cork = &inet->cork.base;
-	rt = (struct rtable *)cork->dst;
-	if (cork->flags & IPCORK_OPT)
-		opt = cork->opt;
-
-	if (!(rt->dst.dev->features & NETIF_F_SG))
-		return -EOPNOTSUPP;
-
-	hh_len = LL_RESERVED_SPACE(rt->dst.dev);
-	mtu = cork->gso_size ? IP_MAX_MTU : cork->fragsize;
-
-	fragheaderlen = sizeof(struct iphdr) + (opt ? opt->optlen : 0);
-	maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen;
-	maxnonfragsize = ip_sk_ignore_df(sk) ? 0xFFFF : mtu;
-
-	if (cork->length + size > maxnonfragsize - fragheaderlen) {
-		ip_local_error(sk, EMSGSIZE, fl4->daddr, inet->inet_dport,
-			       mtu - (opt ? opt->optlen : 0));
-		return -EMSGSIZE;
-	}
-
-	skb = skb_peek_tail(&sk->sk_write_queue);
-	if (!skb)
-		return -EINVAL;
-
-	cork->length += size;
-
-	while (size > 0) {
-		/* Check if the remaining data fits into current packet. */
-		len = mtu - skb->len;
-		if (len < size)
-			len = maxfraglen - skb->len;
-
-		if (len <= 0) {
-			struct sk_buff *skb_prev;
-			int alloclen;
-
-			skb_prev = skb;
-			fraggap = skb_prev->len - maxfraglen;
-
-			alloclen = fragheaderlen + hh_len + fraggap + 15;
-			skb = sock_wmalloc(sk, alloclen, 1, sk->sk_allocation);
-			if (unlikely(!skb)) {
-				err = -ENOBUFS;
-				goto error;
-			}
-
-			/*
-			 *	Fill in the control structures
-			 */
-			skb->ip_summed = CHECKSUM_NONE;
-			skb->csum = 0;
-			skb_reserve(skb, hh_len);
-
-			/*
-			 *	Find where to start putting bytes.
-			 */
-			skb_put(skb, fragheaderlen + fraggap);
-			skb_reset_network_header(skb);
-			skb->transport_header = (skb->network_header +
-						 fragheaderlen);
-			if (fraggap) {
-				skb->csum = skb_copy_and_csum_bits(skb_prev,
-								   maxfraglen,
-						    skb_transport_header(skb),
-								   fraggap);
-				skb_prev->csum = csum_sub(skb_prev->csum,
-							  skb->csum);
-				pskb_trim_unique(skb_prev, maxfraglen);
-			}
-
-			/*
-			 * Put the packet on the pending queue.
-			 */
-			__skb_queue_tail(&sk->sk_write_queue, skb);
-			continue;
-		}
-
-		if (len > size)
-			len = size;
-
-		if (skb_append_pagefrags(skb, page, offset, len,
-					 MAX_SKB_FRAGS)) {
-			err = -EMSGSIZE;
-			goto error;
-		}
-
-		if (skb->ip_summed == CHECKSUM_NONE) {
-			__wsum csum;
-			csum = csum_page(page, offset, len);
-			skb->csum = csum_block_add(skb->csum, csum, skb->len);
-		}
-
-		skb_len_add(skb, len);
-		refcount_add(len, &sk->sk_wmem_alloc);
-		offset += len;
-		size -= len;
-	}
-	return 0;
-
-error:
-	cork->length -= size;
-	IP_INC_STATS(sock_net(sk), IPSTATS_MIB_OUTDISCARDS);
-	return err;
-}
-
 static void ip_cork_release(struct inet_cork *cork)
 {
 	cork->flags &= ~IPCORK_OPT;


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH net-next v10 15/16] af_unix: Support MSG_SPLICE_PAGES
  2023-05-22 12:11 [PATCH net-next v10 00/16] splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1 David Howells
                   ` (13 preceding siblings ...)
  2023-05-22 12:11 ` [PATCH net-next v10 14/16] ip: Remove ip_append_page() David Howells
@ 2023-05-22 12:11 ` David Howells
  2023-05-22 12:11 ` [PATCH net-next v10 16/16] unix: Convert unix_stream_sendpage() to use MSG_SPLICE_PAGES David Howells
  2023-05-24  4:20 ` [PATCH net-next v10 00/16] splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1 patchwork-bot+netdevbpf
  16 siblings, 0 replies; 43+ messages in thread
From: David Howells @ 2023-05-22 12:11 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm, Kuniyuki Iwashima

Make AF_UNIX sendmsg() support MSG_SPLICE_PAGES, splicing in pages from the
source iterator if possible and copying the data in otherwise.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Kuniyuki Iwashima <kuniyu@amazon.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---

Notes:
    ver #6)
     - Use common helper.

 net/unix/af_unix.c | 49 +++++++++++++++++++++++++++++++---------------
 1 file changed, 33 insertions(+), 16 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index dd55506b4632..976bc1c5e11b 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2200,19 +2200,25 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 	while (sent < len) {
 		size = len - sent;
 
-		/* Keep two messages in the pipe so it schedules better */
-		size = min_t(int, size, (sk->sk_sndbuf >> 1) - 64);
+		if (unlikely(msg->msg_flags & MSG_SPLICE_PAGES)) {
+			skb = sock_alloc_send_pskb(sk, 0, 0,
+						   msg->msg_flags & MSG_DONTWAIT,
+						   &err, 0);
+		} else {
+			/* Keep two messages in the pipe so it schedules better */
+			size = min_t(int, size, (sk->sk_sndbuf >> 1) - 64);
 
-		/* allow fallback to order-0 allocations */
-		size = min_t(int, size, SKB_MAX_HEAD(0) + UNIX_SKB_FRAGS_SZ);
+			/* allow fallback to order-0 allocations */
+			size = min_t(int, size, SKB_MAX_HEAD(0) + UNIX_SKB_FRAGS_SZ);
 
-		data_len = max_t(int, 0, size - SKB_MAX_HEAD(0));
+			data_len = max_t(int, 0, size - SKB_MAX_HEAD(0));
 
-		data_len = min_t(size_t, size, PAGE_ALIGN(data_len));
+			data_len = min_t(size_t, size, PAGE_ALIGN(data_len));
 
-		skb = sock_alloc_send_pskb(sk, size - data_len, data_len,
-					   msg->msg_flags & MSG_DONTWAIT, &err,
-					   get_order(UNIX_SKB_FRAGS_SZ));
+			skb = sock_alloc_send_pskb(sk, size - data_len, data_len,
+						   msg->msg_flags & MSG_DONTWAIT, &err,
+						   get_order(UNIX_SKB_FRAGS_SZ));
+		}
 		if (!skb)
 			goto out_err;
 
@@ -2224,13 +2230,24 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 		}
 		fds_sent = true;
 
-		skb_put(skb, size - data_len);
-		skb->data_len = data_len;
-		skb->len = size;
-		err = skb_copy_datagram_from_iter(skb, 0, &msg->msg_iter, size);
-		if (err) {
-			kfree_skb(skb);
-			goto out_err;
+		if (unlikely(msg->msg_flags & MSG_SPLICE_PAGES)) {
+			err = skb_splice_from_iter(skb, &msg->msg_iter, size,
+						   sk->sk_allocation);
+			if (err < 0) {
+				kfree_skb(skb);
+				goto out_err;
+			}
+			size = err;
+			refcount_add(size, &sk->sk_wmem_alloc);
+		} else {
+			skb_put(skb, size - data_len);
+			skb->data_len = data_len;
+			skb->len = size;
+			err = skb_copy_datagram_from_iter(skb, 0, &msg->msg_iter, size);
+			if (err) {
+				kfree_skb(skb);
+				goto out_err;
+			}
 		}
 
 		unix_state_lock(other);


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH net-next v10 16/16] unix: Convert unix_stream_sendpage() to use MSG_SPLICE_PAGES
  2023-05-22 12:11 [PATCH net-next v10 00/16] splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1 David Howells
                   ` (14 preceding siblings ...)
  2023-05-22 12:11 ` [PATCH net-next v10 15/16] af_unix: Support MSG_SPLICE_PAGES David Howells
@ 2023-05-22 12:11 ` David Howells
  2023-05-24  4:20 ` [PATCH net-next v10 00/16] splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1 patchwork-bot+netdevbpf
  16 siblings, 0 replies; 43+ messages in thread
From: David Howells @ 2023-05-22 12:11 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm, Kuniyuki Iwashima

Convert unix_stream_sendpage() to use sendmsg() with MSG_SPLICE_PAGES
rather than directly splicing in the pages itself.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Kuniyuki Iwashima <kuniyu@amazon.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---

Notes:
    ver #10)
     - Fix subject to refer to unix_stream_sendpage() not udp_sendpage().

 net/unix/af_unix.c | 134 +++------------------------------------------
 1 file changed, 7 insertions(+), 127 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 976bc1c5e11b..115436ce1f8a 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1839,24 +1839,6 @@ static void maybe_add_creds(struct sk_buff *skb, const struct socket *sock,
 	}
 }
 
-static int maybe_init_creds(struct scm_cookie *scm,
-			    struct socket *socket,
-			    const struct sock *other)
-{
-	int err;
-	struct msghdr msg = { .msg_controllen = 0 };
-
-	err = scm_send(socket, &msg, scm, false);
-	if (err)
-		return err;
-
-	if (unix_passcred_enabled(socket, other)) {
-		scm->pid = get_pid(task_tgid(current));
-		current_uid_gid(&scm->creds.uid, &scm->creds.gid);
-	}
-	return err;
-}
-
 static bool unix_skb_scm_eq(struct sk_buff *skb,
 			    struct scm_cookie *scm)
 {
@@ -2292,117 +2274,15 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 static ssize_t unix_stream_sendpage(struct socket *socket, struct page *page,
 				    int offset, size_t size, int flags)
 {
-	int err;
-	bool send_sigpipe = false;
-	bool init_scm = true;
-	struct scm_cookie scm;
-	struct sock *other, *sk = socket->sk;
-	struct sk_buff *skb, *newskb = NULL, *tail = NULL;
-
-	if (flags & MSG_OOB)
-		return -EOPNOTSUPP;
+	struct bio_vec bvec;
+	struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES };
 
-	other = unix_peer(sk);
-	if (!other || sk->sk_state != TCP_ESTABLISHED)
-		return -ENOTCONN;
-
-	if (false) {
-alloc_skb:
-		unix_state_unlock(other);
-		mutex_unlock(&unix_sk(other)->iolock);
-		newskb = sock_alloc_send_pskb(sk, 0, 0, flags & MSG_DONTWAIT,
-					      &err, 0);
-		if (!newskb)
-			goto err;
-	}
-
-	/* we must acquire iolock as we modify already present
-	 * skbs in the sk_receive_queue and mess with skb->len
-	 */
-	err = mutex_lock_interruptible(&unix_sk(other)->iolock);
-	if (err) {
-		err = flags & MSG_DONTWAIT ? -EAGAIN : -ERESTARTSYS;
-		goto err;
-	}
-
-	if (sk->sk_shutdown & SEND_SHUTDOWN) {
-		err = -EPIPE;
-		send_sigpipe = true;
-		goto err_unlock;
-	}
-
-	unix_state_lock(other);
+	if (flags & MSG_SENDPAGE_NOTLAST)
+		msg.msg_flags |= MSG_MORE;
 
-	if (sock_flag(other, SOCK_DEAD) ||
-	    other->sk_shutdown & RCV_SHUTDOWN) {
-		err = -EPIPE;
-		send_sigpipe = true;
-		goto err_state_unlock;
-	}
-
-	if (init_scm) {
-		err = maybe_init_creds(&scm, socket, other);
-		if (err)
-			goto err_state_unlock;
-		init_scm = false;
-	}
-
-	skb = skb_peek_tail(&other->sk_receive_queue);
-	if (tail && tail == skb) {
-		skb = newskb;
-	} else if (!skb || !unix_skb_scm_eq(skb, &scm)) {
-		if (newskb) {
-			skb = newskb;
-		} else {
-			tail = skb;
-			goto alloc_skb;
-		}
-	} else if (newskb) {
-		/* this is fast path, we don't necessarily need to
-		 * call to kfree_skb even though with newskb == NULL
-		 * this - does no harm
-		 */
-		consume_skb(newskb);
-		newskb = NULL;
-	}
-
-	if (skb_append_pagefrags(skb, page, offset, size, MAX_SKB_FRAGS)) {
-		tail = skb;
-		goto alloc_skb;
-	}
-
-	skb->len += size;
-	skb->data_len += size;
-	skb->truesize += size;
-	refcount_add(size, &sk->sk_wmem_alloc);
-
-	if (newskb) {
-		err = unix_scm_to_skb(&scm, skb, false);
-		if (err)
-			goto err_state_unlock;
-		spin_lock(&other->sk_receive_queue.lock);
-		__skb_queue_tail(&other->sk_receive_queue, newskb);
-		spin_unlock(&other->sk_receive_queue.lock);
-	}
-
-	unix_state_unlock(other);
-	mutex_unlock(&unix_sk(other)->iolock);
-
-	other->sk_data_ready(other);
-	scm_destroy(&scm);
-	return size;
-
-err_state_unlock:
-	unix_state_unlock(other);
-err_unlock:
-	mutex_unlock(&unix_sk(other)->iolock);
-err:
-	kfree_skb(newskb);
-	if (send_sigpipe && !(flags & MSG_NOSIGNAL))
-		send_sig(SIGPIPE, current, 0);
-	if (!init_scm)
-		scm_destroy(&scm);
-	return err;
+	bvec_set_page(&bvec, page, size, offset);
+	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+	return unix_stream_sendmsg(socket, &msg, size);
 }
 
 static int unix_seqpacket_sendmsg(struct socket *sock, struct msghdr *msg,


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v10 00/16] splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1
  2023-05-22 12:11 [PATCH net-next v10 00/16] splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1 David Howells
                   ` (15 preceding siblings ...)
  2023-05-22 12:11 ` [PATCH net-next v10 16/16] unix: Convert unix_stream_sendpage() to use MSG_SPLICE_PAGES David Howells
@ 2023-05-24  4:20 ` patchwork-bot+netdevbpf
  16 siblings, 0 replies; 43+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-05-24  4:20 UTC (permalink / raw)
  To: David Howells
  Cc: netdev, davem, edumazet, kuba, pabeni, willemdebruijn.kernel,
	dsahern, willy, viro, hch, axboe, jlayton, brauner, chuck.lever,
	torvalds, linux-fsdevel, linux-kernel, linux-mm

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 22 May 2023 13:11:09 +0100 you wrote:
> Here's the first tranche of patches towards providing a MSG_SPLICE_PAGES
> internal sendmsg flag that is intended to replace the ->sendpage() op with
> calls to sendmsg().  MSG_SPLICE_PAGES is a hint that tells the protocol
> that it should splice the pages supplied if it can and copy them if not.
> 
> This will allow splice to pass multiple pages in a single call and allow
> certain parts of higher protocols (e.g. sunrpc, iwarp) to pass an entire
> message in one go rather than having to send them piecemeal.  This should
> also make it easier to handle the splicing of multipage folios.
> 
> [...]

Here is the summary with links:
  - [net-next,v10,01/16] net: Declare MSG_SPLICE_PAGES internal sendmsg() flag
    https://git.kernel.org/netdev/net-next/c/b841b901c452
  - [net-next,v10,02/16] net: Pass max frags into skb_append_pagefrags()
    https://git.kernel.org/netdev/net-next/c/96449f902407
  - [net-next,v10,03/16] net: Add a function to splice pages into an skbuff for MSG_SPLICE_PAGES
    https://git.kernel.org/netdev/net-next/c/2e910b95329c
  - [net-next,v10,04/16] tcp: Support MSG_SPLICE_PAGES
    https://git.kernel.org/netdev/net-next/c/270a1c3de47e
  - [net-next,v10,05/16] tcp: Convert do_tcp_sendpages() to use MSG_SPLICE_PAGES
    https://git.kernel.org/netdev/net-next/c/c5c37af6ecad
  - [net-next,v10,06/16] tcp_bpf: Inline do_tcp_sendpages as it's now a wrapper around tcp_sendmsg
    https://git.kernel.org/netdev/net-next/c/ebf2e8860eea
  - [net-next,v10,07/16] espintcp: Inline do_tcp_sendpages()
    https://git.kernel.org/netdev/net-next/c/7f8816ab4bae
  - [net-next,v10,08/16] tls: Inline do_tcp_sendpages()
    https://git.kernel.org/netdev/net-next/c/e117dcfd646e
  - [net-next,v10,09/16] siw: Inline do_tcp_sendpages()
    https://git.kernel.org/netdev/net-next/c/c2ff29e99a76
  - [net-next,v10,10/16] tcp: Fold do_tcp_sendpages() into tcp_sendpage_locked()
    https://git.kernel.org/netdev/net-next/c/5367f9bbb86a
  - [net-next,v10,11/16] ip, udp: Support MSG_SPLICE_PAGES
    https://git.kernel.org/netdev/net-next/c/7da0dde68486
  - [net-next,v10,12/16] ip6, udp6: Support MSG_SPLICE_PAGES
    https://git.kernel.org/netdev/net-next/c/6d8192bd69bb
  - [net-next,v10,13/16] udp: Convert udp_sendpage() to use MSG_SPLICE_PAGES
    https://git.kernel.org/netdev/net-next/c/7ac7c987850c
  - [net-next,v10,14/16] ip: Remove ip_append_page()
    https://git.kernel.org/netdev/net-next/c/c49cf2663291
  - [net-next,v10,15/16] af_unix: Support MSG_SPLICE_PAGES
    https://git.kernel.org/netdev/net-next/c/a0dbf5f818f9
  - [net-next,v10,16/16] unix: Convert unix_stream_sendpage() to use MSG_SPLICE_PAGES
    https://git.kernel.org/netdev/net-next/c/57d44a354a43

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v10 03/16] net: Add a function to splice pages into an skbuff for MSG_SPLICE_PAGES
  2023-05-22 12:11 ` [PATCH net-next v10 03/16] net: Add a function to splice pages into an skbuff for MSG_SPLICE_PAGES David Howells
@ 2023-05-24 12:24   ` Yunsheng Lin
  2023-05-24 13:21   ` David Howells
  1 sibling, 0 replies; 43+ messages in thread
From: Yunsheng Lin @ 2023-05-24 12:24 UTC (permalink / raw)
  To: David Howells, netdev
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Willem de Bruijn, David Ahern, Matthew Wilcox, Al Viro,
	Christoph Hellwig, Jens Axboe, Jeff Layton, Christian Brauner,
	Chuck Lever III, Linus Torvalds, linux-fsdevel, linux-kernel,
	linux-mm

On 2023/5/22 20:11, David Howells wrote:

Hi, David

I am not very familiar with 'struct iov_iter' yet; just two
questions below.

> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 7f53dcb26ad3..f4a5b51aed22 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -6892,3 +6892,91 @@ nodefer:	__kfree_skb(skb);
>  	if (unlikely(kick) && !cmpxchg(&sd->defer_ipi_scheduled, 0, 1))
>  		smp_call_function_single_async(cpu, &sd->defer_csd);
>  }
> +
> +static void skb_splice_csum_page(struct sk_buff *skb, struct page *page,
> +				 size_t offset, size_t len)
> +{
> +	const char *kaddr;
> +	__wsum csum;
> +
> +	kaddr = kmap_local_page(page);
> +	csum = csum_partial(kaddr + offset, len, 0);
> +	kunmap_local(kaddr);
> +	skb->csum = csum_block_add(skb->csum, csum, skb->len);
> +}
> +
> +/**
> + * skb_splice_from_iter - Splice (or copy) pages to skbuff
> + * @skb: The buffer to add pages to
> + * @iter: Iterator representing the pages to be added
> + * @maxsize: Maximum amount of pages to be added
> + * @gfp: Allocation flags
> + *
> + * This is a common helper function for supporting MSG_SPLICE_PAGES.  It
> + * extracts pages from an iterator and adds them to the socket buffer if
> + * possible, copying them to fragments if not possible (such as if they're slab
> + * pages).
> + *
> + * Returns the amount of data spliced/copied or -EMSGSIZE if there's

I am not seeing any copying done directly in skb_splice_from_iter();
maybe iov_iter_extract_pages() has done the copying for it?

> + * insufficient space in the buffer to transfer anything.
> + */
> +ssize_t skb_splice_from_iter(struct sk_buff *skb, struct iov_iter *iter,
> +			     ssize_t maxsize, gfp_t gfp)
> +{
> +	size_t frag_limit = READ_ONCE(sysctl_max_skb_frags);
> +	struct page *pages[8], **ppages = pages;
> +	ssize_t spliced = 0, ret = 0;
> +	unsigned int i;
> +
> +	while (iter->count > 0) {
> +		ssize_t space, nr;
> +		size_t off, len;
> +
> +		ret = -EMSGSIZE;
> +		space = frag_limit - skb_shinfo(skb)->nr_frags;
> +		if (space < 0)
> +			break;
> +
> +		/* We might be able to coalesce without increasing nr_frags */
> +		nr = clamp_t(size_t, space, 1, ARRAY_SIZE(pages));
> +
> +		len = iov_iter_extract_pages(iter, &ppages, maxsize, nr, 0, &off);
> +		if (len <= 0) {
> +			ret = len ?: -EIO;
> +			break;
> +		}
> +
> +		i = 0;
> +		do {
> +			struct page *page = pages[i++];
> +			size_t part = min_t(size_t, PAGE_SIZE - off, len);
> +
> +			ret = -EIO;
> +			if (WARN_ON_ONCE(!sendpage_ok(page)))
> +				goto out;
> +
> +			ret = skb_append_pagefrags(skb, page, off, part,
> +						   frag_limit);
> +			if (ret < 0) {
> +				iov_iter_revert(iter, len);

I am not sure I understand the error handling here: doesn't 'len'
indicate the remaining size of the data to be appended to the skb? Maybe
we should revert the size of the data already appended to the skb here?
Does 'spliced' need to be adjusted accordingly?

> +				goto out;
> +			}
> +
> +			if (skb->ip_summed == CHECKSUM_NONE)
> +				skb_splice_csum_page(skb, page, off, part);
> +
> +			off = 0;
> +			spliced += part;
> +			maxsize -= part;
> +			len -= part;
> +		} while (len > 0);
> +
> +		if (maxsize <= 0)
> +			break;
> +	}
> +
> +out:
> +	skb_len_add(skb, spliced);
> +	return spliced ?: ret;
> +}
> +EXPORT_SYMBOL(skb_splice_from_iter);
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v10 03/16] net: Add a function to splice pages into an skbuff for MSG_SPLICE_PAGES
  2023-05-22 12:11 ` [PATCH net-next v10 03/16] net: Add a function to splice pages into an skbuff for MSG_SPLICE_PAGES David Howells
  2023-05-24 12:24   ` Yunsheng Lin
@ 2023-05-24 13:21   ` David Howells
  1 sibling, 0 replies; 43+ messages in thread
From: David Howells @ 2023-05-24 13:21 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: dhowells, netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm

Yunsheng Lin <linyunsheng@huawei.com> wrote:

> > + * Returns the amount of data spliced/copied or -EMSGSIZE if there's
> 
> I am not seeing any copying done directly in the skb_splice_from_iter(),
> maybe iov_iter_extract_pages() has done copying for it?

Ah, I took the code for that out and deferred it.  The comment needs amending.

> > +			ret = skb_append_pagefrags(skb, page, off, part,
> > +						   frag_limit);
> > +			if (ret < 0) {
> > +				iov_iter_revert(iter, len);
> 
> I am not sure I understand the error handling here, doesn't 'len'
> indicate the remaining size of the data to be appended to skb,

Yes.

> maybe we should revert the size of data that is already appended to skb
> here?  Does 'spliced' need to be adjusted accordingly?

Neither.

> I am not very familiar with the 'struct iov_iter' yet

An iov_iter struct is a cursor over a buffer.  It advances as we draw data or
space from that buffer.  Sometimes we overdraw and have to back up a bit -
hence the revert function.  It could possibly be renamed to something more
appropriate as (if/when ITER_PIPE is removed) it doesn't actually change the
buffer.

So looking at skb_splice_from_iter():

iov_iter_extract_pages() is used to get a list of pages from the buffer that
we think we're going to be able to handle.  If the buffer is of type IOVEC or
UBUF those pages would have pins inserted into them also; otherwise no pin or
ref will be taken on them.  MSG_SPLICE_PAGES should not be used with IOVEC or
UBUF types for the moment as the network layer does not yet handle pins.

iov_iter_extract_pages() will advance the iterator past the page fragments it
has returned.  If skb_append_pagefrags() indicates that it could not attach
the page, this isn't necessarily fatal - it could return -EMSGSIZE to indicate
there was no space, in which case we return to the caller to create a new
skbuff.

If a non-fatal error occurs, we may already have committed some parts of the
buffer to the skbuff and rewinding into that part of the buffer would cause a
repeat of the data which would be bad.

What the iov_iter_revert() is doing is rewinding iterator back past the part
of the extracted pages that we didn't get to use so that we will pick up where
we left off next time we're called.  It does *not* and must not revert the
data we've already transferred.

Arguably, I should revert when I return -EIO because sendpage_ok() returned
false, but that's a fatal error.
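
In code terms, the relevant shape of the loop (condensed from the helper
quoted above, with the allocation, checksum and outer-loop details dropped)
is:

	len = iov_iter_extract_pages(iter, &ppages, maxsize, nr, 0, &off);
	/* The iterator has now advanced over everything extracted above. */

	do {
		size_t part = min_t(size_t, PAGE_SIZE - off, len);

		ret = skb_append_pagefrags(skb, pages[i++], off, part,
					   frag_limit);
		if (ret < 0) {
			/* Wind back over the unused remainder only; data
			 * already attached to the skbuff stays consumed so
			 * it isn't transferred twice.
			 */
			iov_iter_revert(iter, len);
			goto out;
		}

		off = 0;
		spliced += part;
		len -= part;
	} while (len > 0);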

David


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v10 08/16] tls: Inline do_tcp_sendpages()
  2023-05-22 12:11 ` [PATCH net-next v10 08/16] tls: " David Howells
@ 2023-06-07 14:17   ` Tariq Toukan
  2023-06-07 15:03   ` David Howells
  1 sibling, 0 replies; 43+ messages in thread
From: Tariq Toukan @ 2023-06-07 14:17 UTC (permalink / raw)
  To: David Howells, netdev
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Willem de Bruijn, David Ahern, Matthew Wilcox, Al Viro,
	Christoph Hellwig, Jens Axboe, Jeff Layton, Christian Brauner,
	Chuck Lever III, Linus Torvalds, linux-fsdevel, linux-kernel,
	linux-mm, Boris Pismenny, John Fastabend, Gal Pressman, ranro,
	samiram, drort, Tariq Toukan



On 22/05/2023 15:11, David Howells wrote:
> do_tcp_sendpages() is now just a small wrapper around tcp_sendmsg_locked(),
> so inline it, allowing do_tcp_sendpages() to be removed.  This is part of
> replacing ->sendpage() with a call to sendmsg() with MSG_SPLICE_PAGES set.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Boris Pismenny <borisp@nvidia.com>
> cc: John Fastabend <john.fastabend@gmail.com>
> cc: Jakub Kicinski <kuba@kernel.org>
> cc: "David S. Miller" <davem@davemloft.net>
> cc: Eric Dumazet <edumazet@google.com>
> cc: Paolo Abeni <pabeni@redhat.com>
> cc: Jens Axboe <axboe@kernel.dk>
> cc: Matthew Wilcox <willy@infradead.org>
> cc: netdev@vger.kernel.org
> ---

Hi,

My team spotted a new degradation in TLS TX device offload, bisected to 
this patch.

 From a quick look at the patch, it's not clear to me what's going wrong.
Please let us know of any helpful information that we can provide to
help with the debugging.

Regards,
Tariq

Reproduce Flow:
client / server test using nginx and wrk (nothing special/custom about
the apps used).

client:
/opt/mellanox/iproute2/sbin/ip link set dev eth3 up
/opt/mellanox/iproute2/sbin/ip addr add 11.141.46.9/16 dev eth3

server:
/opt/mellanox/iproute2/sbin/ip link set dev eth3 up
/opt/mellanox/iproute2/sbin/ip addr add 11.141.46.10/16 dev eth3

client:
/auto/sw/regression/sw_net_ver_tools/ktls/tools/x86_64/nginx_openssl_3_0_0 
-p /usr/bin/drivertest_rpms/ktls/nginx/
/opt/mellanox/iproute2/sbin/ss -i src [11.141.46.9]

server:
/auto/sw/regression/sw_net_ver_tools/ktls/tools/x86_64/wrk_openssl_3_0_0 
-b11.141.46.10 -t4 -c874 -d14 --timeout 5s 
https://11.141.46.9:20443/256000b.img

client:
dmesg
/auto/sw/regression/sw_net_ver_tools/ktls/tools/x86_64/nginx_openssl_3_0_0 
-p /usr/bin/drivertest_rpms/ktls/nginx/ -s stop


[root@c-141-46-1-009 ~]# dmesg
------------[ cut here ]------------
WARNING: CPU: 1 PID: 977 at net/core/skbuff.c:6957 
skb_splice_from_iter+0x102/0x300
Modules linked in: rpcrdma rdma_ucm ib_iser libiscsi 
scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib 
ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink 
nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 
auth_rpcgss oid_registry overlay mlx5_core zram zsmalloc fuse
CPU: 1 PID: 977 Comm: nginx_openssl_3 Not tainted 
6.4.0-rc3_for_upstream_min_debug_2023_06_01_23_04 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:skb_splice_from_iter+0x102/0x300
Code: ef 48 8b 55 08 f6 c2 01 0f 85 54 01 00 00 8b 0d 98 cf 5f 01 48 89 
ea 85 c9 0f 8f 4c 01 00 00 48 8b 12 80 e6 02 74 48 49 89 dd <0f> 0b 48 
c7 c1 fb ff ff ff 45 01 65 70 45 01 65 74 45 01 a5 d0 00
RSP: 0018:ffff8881045abaa0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88814370fe00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffea00051123c0 RDI: ffff88814370fe00
RBP: ffffea0005112400 R08: 0000000000000011 R09: 0000000000003ffd
R10: 0000000000003ffd R11: 0000000000000008 R12: 0000000000002e6e
R13: ffff88814370fe00 R14: ffff8881045abae8 R15: 000000000000118f
FS:  00007f6e23043740(0000) GS:ffff88852c880000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000009c6c00 CR3: 000000013b791001 CR4: 0000000000370ea0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:

  ? kmalloc_reserve+0x86/0xe0
  tcp_sendmsg_locked+0x33e/0xd40
  tls_push_sg+0xdd/0x230
  tls_push_data+0x673/0x920
  tls_device_sendmsg+0x6e/0xc0
  sock_sendmsg+0x38/0x60
  sock_write_iter+0x97/0x100
  vfs_write+0x2df/0x380
  ksys_write+0xa7/0xe0
  do_syscall_64+0x3d/0x90
  entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7f6e22f018b7
Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e 
fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 
f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
RSP: 002b:00007ffdb528a2f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000004000 RCX: 00007f6e22f018b7
RDX: 0000000000004000 RSI: 00000000025cdef0 RDI: 0000000000000028
RBP: 00000000020103c0 R08: 00007ffdb5289a90 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000025cdef0
R13: 000000000204fca0 R14: 0000000000004000 R15: 0000000000004000

---[ end trace 0000000000000000 ]---



>   include/net/tls.h  |  2 +-
>   net/tls/tls_main.c | 24 +++++++++++++++---------
>   2 files changed, 16 insertions(+), 10 deletions(-)
> 
> diff --git a/include/net/tls.h b/include/net/tls.h
> index 6056ce5a2aa5..5791ca7a189c 100644
> --- a/include/net/tls.h
> +++ b/include/net/tls.h
> @@ -258,7 +258,7 @@ struct tls_context {
>   	struct scatterlist *partially_sent_record;
>   	u16 partially_sent_offset;
>   
> -	bool in_tcp_sendpages;
> +	bool splicing_pages;
>   	bool pending_open_record_frags;
>   
>   	struct mutex tx_lock; /* protects partially_sent_* fields and
> diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
> index f2e7302a4d96..3d45fdb5c4e9 100644
> --- a/net/tls/tls_main.c
> +++ b/net/tls/tls_main.c
> @@ -125,7 +125,10 @@ int tls_push_sg(struct sock *sk,
>   		u16 first_offset,
>   		int flags)
>   {
> -	int sendpage_flags = flags | MSG_SENDPAGE_NOTLAST;
> +	struct bio_vec bvec;
> +	struct msghdr msg = {
> +		.msg_flags = MSG_SENDPAGE_NOTLAST | MSG_SPLICE_PAGES | flags,
> +	};
>   	int ret = 0;
>   	struct page *p;
>   	size_t size;
> @@ -134,16 +137,19 @@ int tls_push_sg(struct sock *sk,
>   	size = sg->length - offset;
>   	offset += sg->offset;
>   
> -	ctx->in_tcp_sendpages = true;
> +	ctx->splicing_pages = true;
>   	while (1) {
>   		if (sg_is_last(sg))
> -			sendpage_flags = flags;
> +			msg.msg_flags = flags;
>   
>   		/* is sending application-limited? */
>   		tcp_rate_check_app_limited(sk);
>   		p = sg_page(sg);
>   retry:
> -		ret = do_tcp_sendpages(sk, p, offset, size, sendpage_flags);
> +		bvec_set_page(&bvec, p, size, offset);
> +		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
> +
> +		ret = tcp_sendmsg_locked(sk, &msg, size);
>   
>   		if (ret != size) {
>   			if (ret > 0) {
> @@ -155,7 +161,7 @@ int tls_push_sg(struct sock *sk,
>   			offset -= sg->offset;
>   			ctx->partially_sent_offset = offset;
>   			ctx->partially_sent_record = (void *)sg;
> -			ctx->in_tcp_sendpages = false;
> +			ctx->splicing_pages = false;
>   			return ret;
>   		}
>   
> @@ -169,7 +175,7 @@ int tls_push_sg(struct sock *sk,
>   		size = sg->length;
>   	}
>   
> -	ctx->in_tcp_sendpages = false;
> +	ctx->splicing_pages = false;
>   
>   	return 0;
>   }
> @@ -247,11 +253,11 @@ static void tls_write_space(struct sock *sk)
>   {
>   	struct tls_context *ctx = tls_get_ctx(sk);
>   
> -	/* If in_tcp_sendpages call lower protocol write space handler
> +	/* If splicing_pages call lower protocol write space handler
>   	 * to ensure we wake up any waiting operations there. For example
> -	 * if do_tcp_sendpages where to call sk_wait_event.
> +	 * if splicing pages where to call sk_wait_event.
>   	 */
> -	if (ctx->in_tcp_sendpages) {
> +	if (ctx->splicing_pages) {
>   		ctx->sk_write_space(sk);
>   		return;
>   	}
> 
> 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v10 08/16] tls: Inline do_tcp_sendpages()
  2023-05-22 12:11 ` [PATCH net-next v10 08/16] tls: " David Howells
  2023-06-07 14:17   ` Tariq Toukan
@ 2023-06-07 15:03   ` David Howells
  2023-06-13 11:15     ` Tariq Toukan
  1 sibling, 1 reply; 43+ messages in thread
From: David Howells @ 2023-06-07 15:03 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: dhowells, netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm, Boris Pismenny,
	John Fastabend, Gal Pressman, ranro, samiram, drort,
	Tariq Toukan

Tariq Toukan <ttoukan.linux@gmail.com> wrote:

> My team spotted a new degradation in TLS TX device offload, bisected to this
> patch.

I presume you're using some hardware (I'm guessing Mellanox?) that can
actually do TLS offload?  Unfortunately, I don't have any hardware that can do
this, so I can't test the tls_device stuff.

> From a quick look at the patch, it's not clear to me what's going wrong.
> Please let us know of any helpful information that we can provide to help in
> the debug.

Can you find out what source line this corresponds to?

	RIP: 0010:skb_splice_from_iter+0x102/0x300

Assuming you're building your own kernel, something like the following might
do the trick:

	echo "RIP: 0010:skb_splice_from_iter+0x102/0x300" |
	./scripts/decode_stacktrace.sh /my/built/vmlinux /my/build/tree

if you run it in the kernel source tree you're using and substitute the
paths to vmlinux and the build tree for modules.

David


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v10 08/16] tls: Inline do_tcp_sendpages()
  2023-06-07 15:03   ` David Howells
@ 2023-06-13 11:15     ` Tariq Toukan
  2023-06-19  8:23       ` Tariq Toukan
  2023-06-19  9:35       ` David Howells
  0 siblings, 2 replies; 43+ messages in thread
From: Tariq Toukan @ 2023-06-13 11:15 UTC (permalink / raw)
  To: David Howells
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm, Boris Pismenny,
	John Fastabend, Gal Pressman, ranro, samiram, drort,
	Tariq Toukan



On 07/06/2023 18:03, David Howells wrote:
> Tariq Toukan <ttoukan.linux@gmail.com> wrote:
> 
>> My team spotted a new degradation in TLS TX device offload, bisected to this
>> patch.
> 
> I presume you're using some hardware (I'm guessing Mellanox?) that can
> actually do TLS offload?  Unfortunately, I don't have any hardware that can do
> this, so I can't test the tls_device stuff.
> 
>>  From a quick look at the patch, it's not clear to me what's going wrong.
>> Please let us know of any helpful information that we can provide to help in
>> the debug.
> 
> Can you find out what source line this corresponds to?
> 
> 	RIP: 0010:skb_splice_from_iter+0x102/0x300
> 
> Assuming you're building your own kernel, something like the following might
> do the trick:
> 
> 	echo "RIP: 0010:skb_splice_from_iter+0x102/0x300" |
> 	./scripts/decode_stacktrace.sh /my/built/vmlinux /my/build/tree
> 

Hi,

It's:
RIP: 0010:skb_splice_from_iter (/usr/linux/net/core/skbuff.c:6957)

which corresponds to this line:
                         if (WARN_ON_ONCE(!sendpage_ok(page)))

> if you run it in the kernel source tree you're using and substitute the
> paths to vmlinux and the build tree for modules.
> 
> David
> 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v10 08/16] tls: Inline do_tcp_sendpages()
  2023-06-13 11:15     ` Tariq Toukan
@ 2023-06-19  8:23       ` Tariq Toukan
  2023-06-19  9:35       ` David Howells
  1 sibling, 0 replies; 43+ messages in thread
From: Tariq Toukan @ 2023-06-19  8:23 UTC (permalink / raw)
  To: David Howells
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm, Boris Pismenny,
	John Fastabend, Gal Pressman, ranro, samiram, drort,
	Tariq Toukan



On 13/06/2023 14:15, Tariq Toukan wrote:
> 
> 
> On 07/06/2023 18:03, David Howells wrote:
>> Tariq Toukan <ttoukan.linux@gmail.com> wrote:
>>
>>> My team spotted a new degradation in TLS TX device offload, bisected 
>>> to this
>>> patch.
>>
>> I presume you're using some hardware (I'm guessing Mellanox?) that can
>> actually do TLS offload?  Unfortunately, I don't have any hardware 
>> that can do
>> this, so I can't test the tls_device stuff.
>>
>>>  From a quick look at the patch, it's not clear to me what's going 
>>> wrong.
>>> Please let us know of any helpful information that we can provide to 
>>> help in
>>> the debug.
>>
>> Can you find out what source line this corresponds to?
>>
>>     RIP: 0010:skb_splice_from_iter+0x102/0x300
>>
>> Assuming you're building your own kernel, something like the following 
>> might
>> do the trick:
>>
>>     echo "RIP: 0010:skb_splice_from_iter+0x102/0x300" |
>>     ./scripts/decode_stacktrace.sh /my/built/vmlinux /my/build/tree
>>
> 
> Hi,
> 
> It's:
> RIP: 0010:skb_splice_from_iter (/usr/linux/net/core/skbuff.c:6957)
> 
> which corresponds to this line:
>                          if (WARN_ON_ONCE(!sendpage_ok(page)))
> 

Hi David,
Any other debug information that we can provide to progress with the 
analysis?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v10 08/16] tls: Inline do_tcp_sendpages()
  2023-06-13 11:15     ` Tariq Toukan
  2023-06-19  8:23       ` Tariq Toukan
@ 2023-06-19  9:35       ` David Howells
  2023-06-27 16:49         ` Tariq Toukan
                           ` (2 more replies)
  1 sibling, 3 replies; 43+ messages in thread
From: David Howells @ 2023-06-19  9:35 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: dhowells, netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm, Boris Pismenny,
	John Fastabend, Gal Pressman, ranro, samiram, drort,
	Tariq Toukan

Tariq Toukan <ttoukan.linux@gmail.com> wrote:

> Any other debug information that we can provide to progress with the analysis?

Can you see if the problem still happens on this branch of my tree?

https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=sendpage-3-frag

It eases the restriction that the WARN_ON is warning about by (in patch 1[*])
copying slab objects into page fragments.

David

[*] "net: Copy slab data for sendmsg(MSG_SPLICE_PAGES)"


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v10 08/16] tls: Inline do_tcp_sendpages()
  2023-06-19  9:35       ` David Howells
@ 2023-06-27 16:49         ` Tariq Toukan
  2023-06-30 17:21           ` Jakub Kicinski
  2023-06-27 16:55         ` David Howells
  2023-06-27 17:06         ` David Howells
  2 siblings, 1 reply; 43+ messages in thread
From: Tariq Toukan @ 2023-06-27 16:49 UTC (permalink / raw)
  To: David Howells
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm, Boris Pismenny,
	John Fastabend, Gal Pressman, ranro, samiram, drort,
	Tariq Toukan



On 19/06/2023 12:35, David Howells wrote:
> Tariq Toukan <ttoukan.linux@gmail.com> wrote:
> 
>> Any other debug information that we can provide to progress with the analysis?
> 
> Can you see if the problem still happens on this branch of my tree?
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=sendpage-3-frag
> 
> It eases the restriction that the WARN_ON is warning about by (in patch 1[*])
> copying slab objects into page fragments.
> 
> David
> 
> [*] "net: Copy slab data for sendmsg(MSG_SPLICE_PAGES)"
> 

Hi David,

Unfortunately, it still happens:

------------[ cut here ]------------
WARNING: CPU: 2 PID: 93427 at net/core/skbuff.c:7013 
skb_splice_from_iter+0x299/0x550
Modules linked in: bonding nf_tables vfio_pci ip_gre geneve ib_umad 
rdma_ucm ipip tunnel4 ip6_gre gre ip6_tunnel tunnel6 ib_ipoib 
mlx5_vfio_pci vfio_pci_core mlx5_ib ib_uverbs mlx5_core sch_mqprio 
sch_mqprio_lib sch_netem iptable_raw vfio_iommu_type1 vfio openvswitch 
nsh rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm 
ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink 
xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss 
oid_registry overlay zram zsmalloc fuse [last unloaded: nf_tables]
CPU: 2 PID: 93427 Comm: nginx_openssl_3 Tainted: G        W 
6.4.0-rc6_net_next_mlx5_9b6e6b6 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:skb_splice_from_iter+0x299/0x550
Code: 49 8b 57 08 f6 c2 01 0f 85 89 01 00 00 8b 0d 22 b3 4a 01 4c 89 fa 
85 c9 0f 8f 81 01 00 00 48 8b 12 80 e6 02 0f 84 a3 00 00 00 <0f> 0b 48 
c7 c1 fb ff ff ff 44 01 6b 70 44 01 6b 74 44 01 ab d0 00
RSP: 0018:ffff8882a16d3a80 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88821a89ee00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffea0004c2cfc0 RDI: ffff88821a89ee00
RBP: 0000000000000f34 R08: 0000000000000011 R09: 0000000000000f34
R10: 0000000000000000 R11: 000000000000000d R12: 0000000000000004
R13: 0000000000002d0f R14: 0000000000000f34 R15: ffffea0004c2d000
FS:  00007f5c383eb740(0000) GS:ffff88885f880000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000002ea5000 CR3: 0000000264ffe006 CR4: 0000000000370ea0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  <TASK>
  ? __warn+0x79/0x120
  ? skb_splice_from_iter+0x299/0x550
  ? report_bug+0x17c/0x190
  ? handle_bug+0x3c/0x60
  ? exc_invalid_op+0x14/0x70
  ? asm_exc_invalid_op+0x16/0x20
  ? skb_splice_from_iter+0x299/0x550
  tcp_sendmsg_locked+0x375/0xd00
  tls_push_sg+0xdd/0x230
  tls_push_data+0x6de/0xb00
  tls_device_sendmsg+0x7a/0xd0
  sock_sendmsg+0x38/0x60
  sock_write_iter+0x97/0x100
  vfs_write+0x2df/0x380
  ksys_write+0xa7/0xe0
  do_syscall_64+0x3d/0x90
  entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7f5c381018b7
Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e 
fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 
f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
RSP: 002b:00007ffee9750848 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000004000 RCX: 00007f5c381018b7
RDX: 0000000000004000 RSI: 0000000002ea2dc0 RDI: 00000000000000d1
RBP: 0000000001d1bbe0 R08: 00007ffee974ffe0 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000002ea2dc0
R13: 0000000001d2a7e0 R14: 0000000000004000 R15: 0000000000004000
  </TASK>
---[ end trace 0000000000000000 ]---

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v10 08/16] tls: Inline do_tcp_sendpages()
  2023-06-19  9:35       ` David Howells
  2023-06-27 16:49         ` Tariq Toukan
@ 2023-06-27 16:55         ` David Howells
  2023-06-27 17:06         ` David Howells
  2 siblings, 0 replies; 43+ messages in thread
From: David Howells @ 2023-06-27 16:55 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: dhowells, netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm, Boris Pismenny,
	John Fastabend, Gal Pressman, ranro, samiram, drort,
	Tariq Toukan

Tariq Toukan <ttoukan.linux@gmail.com> wrote:

> WARNING: CPU: 2 PID: 93427 at net/core/skbuff.c:7013

Is that this line for you:

			} else if (WARN_ON_ONCE(!sendpage_ok(page))) {

If so, it's not slab data, but we've got a page with a 0 refcount from
somewhere.
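
For reference, sendpage_ok() boils down to roughly the following test
(paraphrased from include/linux/net.h), so the warning means the page being
spliced is either a slab allocation or has a zero refcount:

	static inline bool sendpage_ok(struct page *page)
	{
		return !PageSlab(page) && page_count(page) >= 1;
	}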

David


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v10 08/16] tls: Inline do_tcp_sendpages()
  2023-06-19  9:35       ` David Howells
  2023-06-27 16:49         ` Tariq Toukan
  2023-06-27 16:55         ` David Howells
@ 2023-06-27 17:06         ` David Howells
  2 siblings, 0 replies; 43+ messages in thread
From: David Howells @ 2023-06-27 17:06 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: dhowells, netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm, Boris Pismenny,
	John Fastabend, Gal Pressman, ranro, samiram, drort,
	Tariq Toukan

Can you try net-next/main?  Parts of the branch you're trying have been
dropped.

David


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v10 08/16] tls: Inline do_tcp_sendpages()
  2023-06-27 16:49         ` Tariq Toukan
@ 2023-06-30 17:21           ` Jakub Kicinski
  2023-07-04 20:06             ` Tariq Toukan
  2023-08-10 13:07             ` David Howells
  0 siblings, 2 replies; 43+ messages in thread
From: Jakub Kicinski @ 2023-06-30 17:21 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David Howells, netdev, David S. Miller, Eric Dumazet,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm, Boris Pismenny,
	John Fastabend, Gal Pressman, ranro, samiram, drort,
	Tariq Toukan

On Tue, 27 Jun 2023 19:49:22 +0300 Tariq Toukan wrote:
> Unfortunately, it still happens:
> 
> ------------[ cut here ]------------
> WARNING: CPU: 2 PID: 93427 at net/core/skbuff.c:7013 

I can't repro it on net-next with basic TLS 1.2 sendmsg/stream
test + device offload, let us know if you still see it.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v10 08/16] tls: Inline do_tcp_sendpages()
  2023-06-30 17:21           ` Jakub Kicinski
@ 2023-07-04 20:06             ` Tariq Toukan
  2023-07-05 16:19               ` Jakub Kicinski
  2023-08-10 13:07             ` David Howells
  1 sibling, 1 reply; 43+ messages in thread
From: Tariq Toukan @ 2023-07-04 20:06 UTC (permalink / raw)
  To: Jakub Kicinski, David Howells
  Cc: netdev, David S. Miller, Eric Dumazet, Paolo Abeni,
	Willem de Bruijn, David Ahern, Matthew Wilcox, Al Viro,
	Christoph Hellwig, Jens Axboe, Jeff Layton, Christian Brauner,
	Chuck Lever III, Linus Torvalds, linux-fsdevel, linux-kernel,
	linux-mm, Boris Pismenny, John Fastabend, Gal Pressman, ranro,
	samiram, drort, Tariq Toukan



On 30/06/2023 20:21, Jakub Kicinski wrote:
> On Tue, 27 Jun 2023 19:49:22 +0300 Tariq Toukan wrote:
>> Unfortunately, it still happens:
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 2 PID: 93427 at net/core/skbuff.c:7013
> 
> I can't repro it on net-next with basic TLS 1.2 sendmsg/stream
> test + device offload, let us know if you still see it.

Hi,

Unfortunately, it still repros for us.

We are collecting more info on how the repro is affected by the 
different parameters.

Regards,
Tariq

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v10 08/16] tls: Inline do_tcp_sendpages()
  2023-07-04 20:06             ` Tariq Toukan
@ 2023-07-05 16:19               ` Jakub Kicinski
  2023-07-23  6:35                 ` Tariq Toukan
  2023-07-26 10:51                 ` David Howells
  0 siblings, 2 replies; 43+ messages in thread
From: Jakub Kicinski @ 2023-07-05 16:19 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David Howells, netdev, David S. Miller, Eric Dumazet,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm, Boris Pismenny,
	John Fastabend, Gal Pressman, ranro, samiram, drort,
	Tariq Toukan

On Tue, 4 Jul 2023 23:06:02 +0300 Tariq Toukan wrote:
> Unfortunately, it still repros for us.
> 
> We are collecting more info on how the repro is affected by the 
> different parameters.

Consider configuring kdump for your test env. Debugging is super easy
if one has the vmcore available.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v10 08/16] tls: Inline do_tcp_sendpages()
  2023-07-05 16:19               ` Jakub Kicinski
@ 2023-07-23  6:35                 ` Tariq Toukan
  2023-07-26  0:30                   ` Jakub Kicinski
  2023-07-26 10:51                 ` David Howells
  1 sibling, 1 reply; 43+ messages in thread
From: Tariq Toukan @ 2023-07-23  6:35 UTC (permalink / raw)
  To: Jakub Kicinski, David Howells
  Cc: netdev, David S. Miller, Eric Dumazet, Paolo Abeni,
	Willem de Bruijn, David Ahern, Matthew Wilcox, Al Viro,
	Christoph Hellwig, Jens Axboe, Jeff Layton, Christian Brauner,
	Chuck Lever III, Linus Torvalds, linux-fsdevel, linux-kernel,
	linux-mm, Boris Pismenny, John Fastabend, Gal Pressman, ranro,
	samiram, drort, Tariq Toukan



On 05/07/2023 19:19, Jakub Kicinski wrote:
> On Tue, 4 Jul 2023 23:06:02 +0300 Tariq Toukan wrote:
>> Unfortunately, it still repros for us.
>>
>> We are collecting more info on how the repro is affected by the
>> different parameters.
> 
> Consider configuring kdump for your test env. Debugging is super easy
> if one has the vmcore available.

Hi Jakub, David,

We repro the issue on the server side using this client command:
$ wrk -b2.2.2.2 -t4 -c1000 -d5 --timeout 5s 
https://2.2.2.3:20443/256000b.img

Port 20443 is configured with:
     ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256;
     sendfile    off;


Important:
1. Couldn't repro with files smaller than 40KB.
2. Couldn't repro with "sendfile    on;"

In addition, we collected the vmcore (forced by panic_on_warn), it can 
be downloaded from here:
https://drive.google.com/file/d/1Fi2dzgq6k2hb2L_kwyntRjfLF6_RmbxB/view?usp=sharing

Regards,
Tariq

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v10 08/16] tls: Inline do_tcp_sendpages()
  2023-07-23  6:35                 ` Tariq Toukan
@ 2023-07-26  0:30                   ` Jakub Kicinski
  2023-07-26 19:20                     ` Tariq Toukan
  2023-08-03 11:47                     ` Tariq Toukan
  0 siblings, 2 replies; 43+ messages in thread
From: Jakub Kicinski @ 2023-07-26  0:30 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David Howells, netdev, David S. Miller, Eric Dumazet,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm, Boris Pismenny,
	John Fastabend, Gal Pressman, ranro, samiram, drort,
	Tariq Toukan

On Sun, 23 Jul 2023 09:35:56 +0300 Tariq Toukan wrote:
> Hi Jakub, David,
> 
> We repro the issue on the server side using this client command:
> $ wrk -b2.2.2.2 -t4 -c1000 -d5 --timeout 5s 
> https://2.2.2.3:20443/256000b.img
> 
> Port 20443 is configured with:
>      ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256;
>      sendfile    off;
> 
> 
> Important:
> 1. Couldn't repro with files smaller than 40KB.
> 2. Couldn't repro with "sendfile    on;"
> 
> In addition, we collected the vmcore (forced by panic_on_warn), it can 
> be downloaded from here:
> https://drive.google.com/file/d/1Fi2dzgq6k2hb2L_kwyntRjfLF6_RmbxB/view?usp=sharing

This has no symbols :(

There is a small bug in this commit, we should always set SPLICE.
But I don't see how that'd cause the warning you're seeing.
Does your build have CONFIG_DEBUG_VM enabled?

-->8-------------------------

From: Jakub Kicinski <kuba@kernel.org>
Date: Tue, 25 Jul 2023 17:03:25 -0700
Subject: net: tls: set MSG_SPLICE_PAGES consistently

We used to change the flags for the last segment, because
non-last segments had the MSG_SENDPAGE_NOTLAST flag set.
That flag is no longer a thing so remove the setting.

Since flags most likely don't have MSG_SPLICE_PAGES set
this avoids passing parts of the sg as splice and parts
as non-splice.

... tags ...
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 net/tls/tls_main.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index b6896126bb92..4a8ee2f6badb 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -139,9 +139,6 @@ int tls_push_sg(struct sock *sk,
 
 	ctx->splicing_pages = true;
 	while (1) {
-		if (sg_is_last(sg))
-			msg.msg_flags = flags;
-
 		/* is sending application-limited? */
 		tcp_rate_check_app_limited(sk);
 		p = sg_page(sg);
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v10 08/16] tls: Inline do_tcp_sendpages()
  2023-07-05 16:19               ` Jakub Kicinski
  2023-07-23  6:35                 ` Tariq Toukan
@ 2023-07-26 10:51                 ` David Howells
  2023-07-26 11:43                   ` Tariq Toukan
  1 sibling, 1 reply; 43+ messages in thread
From: David Howells @ 2023-07-26 10:51 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: dhowells, Jakub Kicinski, netdev, David S. Miller, Eric Dumazet,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm, Boris Pismenny,
	John Fastabend, Gal Pressman, ranro, samiram, drort,
	Tariq Toukan

Tariq Toukan <ttoukan.linux@gmail.com> wrote:

> We repro the issue on the server side using this client command:
> $ wrk -b2.2.2.2 -t4 -c1000 -d5 --timeout 5s https://2.2.2.3:20443/256000b.img

What's wrk?

David


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v10 08/16] tls: Inline do_tcp_sendpages()
  2023-07-26 10:51                 ` David Howells
@ 2023-07-26 11:43                   ` Tariq Toukan
  2023-07-26 14:57                     ` Jakub Kicinski
  0 siblings, 1 reply; 43+ messages in thread
From: Tariq Toukan @ 2023-07-26 11:43 UTC (permalink / raw)
  To: David Howells
  Cc: Jakub Kicinski, netdev, David S. Miller, Eric Dumazet,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm, Boris Pismenny,
	John Fastabend, Gal Pressman, ranro, samiram, drort,
	Tariq Toukan



On 26/07/2023 13:51, David Howells wrote:
> Tariq Toukan <ttoukan.linux@gmail.com> wrote:
> 
>> We repro the issue on the server side using this client command:
>> $ wrk -b2.2.2.2 -t4 -c1000 -d5 --timeout 5s https://2.2.2.3:20443/256000b.img
> 
> What's wrk?
> 
> David
> 

Pretty well-known and standard client app.
wrk - a HTTP benchmarking tool
https://github.com/wg/wrk

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v10 08/16] tls: Inline do_tcp_sendpages()
  2023-07-26 11:43                   ` Tariq Toukan
@ 2023-07-26 14:57                     ` Jakub Kicinski
  0 siblings, 0 replies; 43+ messages in thread
From: Jakub Kicinski @ 2023-07-26 14:57 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David Howells, netdev, David S. Miller, Eric Dumazet,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm, Boris Pismenny,
	John Fastabend, Gal Pressman, ranro, samiram, drort,
	Tariq Toukan

On Wed, 26 Jul 2023 14:43:35 +0300 Tariq Toukan wrote:
> >> We repro the issue on the server side using this client command:
> >> $ wrk -b2.2.2.2 -t4 -c1000 -d5 --timeout 5s https://2.2.2.3:20443/256000b.img  
> > 
> > What's wrk?
> > 
> > David
> >   
> 
> Pretty well-known and standard client app.
> wrk - a HTTP benchmarking tool
> https://github.com/wg/wrk

Let us know if your build has CONFIG_DEBUG_VM, please.
In the old code the warning was gated by this config, so the bug
may be older; we have only just started reporting it.
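
For context, the warning reported above is the sendpage_ok() check in
the new splice path, which is no longer gated by CONFIG_DEBUG_VM.  A
minimal sketch of the shape of that check (not verbatim kernel code;
the exact error value is a guess):

	#include <linux/net.h>		/* sendpage_ok() */
	#include <linux/bug.h>		/* WARN_ON_ONCE() */
	#include <linux/errno.h>
	#include <linux/mm_types.h>	/* struct page */

	static int splice_check_sketch(struct page *page)
	{
		/* Fires on any config, not only CONFIG_DEBUG_VM=y. */
		if (WARN_ON_ONCE(!sendpage_ok(page)))
			return -EINVAL;	/* error value is a guess */
		return 0;
	}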

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v10 08/16] tls: Inline do_tcp_sendpages()
  2023-07-26  0:30                   ` Jakub Kicinski
@ 2023-07-26 19:20                     ` Tariq Toukan
  2023-07-26 20:08                       ` Jakub Kicinski
  2023-08-03 11:47                     ` Tariq Toukan
  1 sibling, 1 reply; 43+ messages in thread
From: Tariq Toukan @ 2023-07-26 19:20 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David Howells, netdev, David S. Miller, Eric Dumazet,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm, Boris Pismenny,
	John Fastabend, Gal Pressman, ranro, samiram, drort,
	Tariq Toukan



On 26/07/2023 3:30, Jakub Kicinski wrote:
> On Sun, 23 Jul 2023 09:35:56 +0300 Tariq Toukan wrote:
>> Hi Jakub, David,
>>
>> We repro the issue on the server side using this client command:
>> $ wrk -b2.2.2.2 -t4 -c1000 -d5 --timeout 5s
>> https://2.2.2.3:20443/256000b.img
>>
>> Port 20443 is configured with:
>>       ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256;
>>       sendfile    off;
>>
>>
>> Important:
>> 1. Couldn't repro with files smaller than 40KB.
>> 2. Couldn't repro with "sendfile    on;"
>>
>> In addition, we collected the vmcore (forced by panic_on_warn), it can
>> be downloaded from here:
>> https://drive.google.com/file/d/1Fi2dzgq6k2hb2L_kwyntRjfLF6_RmbxB/view?usp=sharing
> 
> This has no symbols :(
> 

Uh.. :/
I'll try to fix this and re-generate.

> There is a small bug in this commit, we should always set SPLICE.
> But I don't see how that'd cause the warning you're seeing.
> Does your build have CONFIG_DEBUG_VM enabled?

No.

# CONFIG_DEBUG_VM is not set
# CONFIG_DEBUG_VM_PGTABLE is not set

> 
> -->8-------------------------
> 
> From: Jakub Kicinski <kuba@kernel.org>
> Date: Tue, 25 Jul 2023 17:03:25 -0700
> Subject: net: tls: set MSG_SPLICE_PAGES consistently
> 
> We used to change the flags for the last segment, because
> non-last segments had the MSG_SENDPAGE_NOTLAST flag set.
> That flag is no longer a thing so remove the setting.
> 
> Since flags most likely don't have MSG_SPLICE_PAGES set
> this avoids passing parts of the sg as splice and parts
> as non-splice.
> 
> ... tags ...
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
>   net/tls/tls_main.c | 3 ---
>   1 file changed, 3 deletions(-)
> 
> diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
> index b6896126bb92..4a8ee2f6badb 100644
> --- a/net/tls/tls_main.c
> +++ b/net/tls/tls_main.c
> @@ -139,9 +139,6 @@ int tls_push_sg(struct sock *sk,
>   
>   	ctx->splicing_pages = true;
>   	while (1) {
> -		if (sg_is_last(sg))
> -			msg.msg_flags = flags;
> -
>   		/* is sending application-limited? */
>   		tcp_rate_check_app_limited(sk);
>   		p = sg_page(sg);

I'll test this anyway tomorrow and update.

Regards,
Tariq

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v10 08/16] tls: Inline do_tcp_sendpages()
  2023-07-26 19:20                     ` Tariq Toukan
@ 2023-07-26 20:08                       ` Jakub Kicinski
  2023-08-03 11:52                         ` Tariq Toukan
  0 siblings, 1 reply; 43+ messages in thread
From: Jakub Kicinski @ 2023-07-26 20:08 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David Howells, netdev, David S. Miller, Eric Dumazet,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm, Boris Pismenny,
	John Fastabend, Gal Pressman, ranro, samiram, drort,
	Tariq Toukan

On Wed, 26 Jul 2023 22:20:42 +0300 Tariq Toukan wrote:
> > There is a small bug in this commit, we should always set SPLICE.
> > But I don't see how that'd cause the warning you're seeing.
> > Does your build have CONFIG_DEBUG_VM enabled?  
> 
> No.
> 
> # CONFIG_DEBUG_VM is not set
> # CONFIG_DEBUG_VM_PGTABLE is not set

Try testing v6.3 with DEBUG_VM enabled or just remove the IS_ENABLED()
from: https://github.com/torvalds/linux/blob/v6.4/net/ipv4/tcp.c#L1051
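
Concretely, "remove the IS_ENABLED()" means making the gated
sendpage_ok() check in do_tcp_sendpages() unconditional, roughly as in
the sketch below (from memory of that code, with an illustrative
message string; not a real patch):

	#include <linux/net.h>		/* sendpage_ok() */
	#include <linux/bug.h>		/* WARN_ONCE() */
	#include <linux/errno.h>
	#include <linux/mm_types.h>	/* struct page */

	static int degated_check_sketch(struct page *page)
	{
		/* With the IS_ENABLED() dropped, the check fires on any
		 * config, which is what testing an older kernel for the
		 * same warning needs.
		 */
		if (/* IS_ENABLED(CONFIG_DEBUG_VM) && */
		    WARN_ONCE(!sendpage_ok(page), "unspliceable page\n"))
			return -EINVAL;
		return 0;
	}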

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v10 08/16] tls: Inline do_tcp_sendpages()
  2023-07-26  0:30                   ` Jakub Kicinski
  2023-07-26 19:20                     ` Tariq Toukan
@ 2023-08-03 11:47                     ` Tariq Toukan
  2023-08-04  3:12                       ` Jakub Kicinski
  1 sibling, 1 reply; 43+ messages in thread
From: Tariq Toukan @ 2023-08-03 11:47 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David Howells, netdev, David S. Miller, Eric Dumazet,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm, Boris Pismenny,
	John Fastabend, Gal Pressman, ranro, samiram, drort,
	Tariq Toukan



On 26/07/2023 3:30, Jakub Kicinski wrote:
> On Sun, 23 Jul 2023 09:35:56 +0300 Tariq Toukan wrote:
>> Hi Jakub, David,
>>
>> We repro the issue on the server side using this client command:
>> $ wrk -b2.2.2.2 -t4 -c1000 -d5 --timeout 5s
>> https://2.2.2.3:20443/256000b.img
>>
>> Port 20443 is configured with:
>>       ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256;
>>       sendfile    off;
>>
>>
>> Important:
>> 1. Couldn't repro with files smaller than 40KB.
>> 2. Couldn't repro with "sendfile    on;"
>>
>> In addition, we collected the vmcore (forced by panic_on_warn), it can
>> be downloaded from here:
>> https://drive.google.com/file/d/1Fi2dzgq6k2hb2L_kwyntRjfLF6_RmbxB/view?usp=sharing
> 
> This has no symbols :(
> 
> There is a small bug in this commit, we should always set SPLICE.
> But I don't see how that'd cause the warning you're seeing.
> Does your build have CONFIG_DEBUG_VM enabled?
> 
> -->8-------------------------
> 
> From: Jakub Kicinski <kuba@kernel.org>
> Date: Tue, 25 Jul 2023 17:03:25 -0700
> Subject: net: tls: set MSG_SPLICE_PAGES consistently
> 
> We used to change the flags for the last segment, because
> non-last segments had the MSG_SENDPAGE_NOTLAST flag set.
> That flag is no longer a thing so remove the setting.
> 
> Since flags most likely don't have MSG_SPLICE_PAGES set
> this avoids passing parts of the sg as splice and parts
> as non-splice.
> 
> ... tags ...
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
>   net/tls/tls_main.c | 3 ---
>   1 file changed, 3 deletions(-)
> 
> diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
> index b6896126bb92..4a8ee2f6badb 100644
> --- a/net/tls/tls_main.c
> +++ b/net/tls/tls_main.c
> @@ -139,9 +139,6 @@ int tls_push_sg(struct sock *sk,
>   
>   	ctx->splicing_pages = true;
>   	while (1) {
> -		if (sg_is_last(sg))
> -			msg.msg_flags = flags;
> -
>   		/* is sending application-limited? */
>   		tcp_rate_check_app_limited(sk);
>   		p = sg_page(sg);

Hi Jakub,

When applying this patch, repro disappears! :)
Apparently it is related to the warning.
Please go on and submit it.

Tested-by: Tariq Toukan <tariqt@nvidia.com>

We are going to run more comprehensive tests; I'll let you know if we
find anything unusual.

Regards,
Tariq

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v10 08/16] tls: Inline do_tcp_sendpages()
  2023-07-26 20:08                       ` Jakub Kicinski
@ 2023-08-03 11:52                         ` Tariq Toukan
  0 siblings, 0 replies; 43+ messages in thread
From: Tariq Toukan @ 2023-08-03 11:52 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David Howells, netdev, David S. Miller, Eric Dumazet,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm, Boris Pismenny,
	John Fastabend, Gal Pressman, ranro, samiram, drort,
	Tariq Toukan



On 26/07/2023 23:08, Jakub Kicinski wrote:
> On Wed, 26 Jul 2023 22:20:42 +0300 Tariq Toukan wrote:
>>> There is a small bug in this commit, we should always set SPLICE.
>>> But I don't see how that'd cause the warning you're seeing.
>>> Does your build have CONFIG_DEBUG_VM enabled?
>>
>> No.
>>
>> # CONFIG_DEBUG_VM is not set
>> # CONFIG_DEBUG_VM_PGTABLE is not set
> 
> Try testing v6.3 with DEBUG_VM enabled or just remove the IS_ENABLED()
> from: https://github.com/torvalds/linux/blob/v6.4/net/ipv4/tcp.c#L1051

Tested. It doesn't repro.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v10 08/16] tls: Inline do_tcp_sendpages()
  2023-08-03 11:47                     ` Tariq Toukan
@ 2023-08-04  3:12                       ` Jakub Kicinski
  2023-08-08  7:29                         ` Tariq Toukan
  0 siblings, 1 reply; 43+ messages in thread
From: Jakub Kicinski @ 2023-08-04  3:12 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David Howells, netdev, David S. Miller, Eric Dumazet,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm, Boris Pismenny,
	John Fastabend, Gal Pressman, ranro, samiram, drort,
	Tariq Toukan

On Thu, 3 Aug 2023 14:47:35 +0300 Tariq Toukan wrote:
> When applying this patch, repro disappears! :)
> Apparently it is related to the warning.
> Please go on and submit it.

I have no idea how. I found a different bug, staring at this code
for another hour. But I still don't get how we can avoid UaF on
a page by having the TCP take a ref on it rather than copy it.

If anything we should have 2 refs on any page in the sg, one because
it's on the sg, and another held by the re-tx handling.

So I'm afraid we're papering over something here :( We need to keep
digging.
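
As a purely illustrative sketch of that invariant (not the actual TLS
or TCP code), each user of the page would hold its own reference:

	#include <linux/mm.h>		/* get_page() / put_page() */

	/* Illustrative only: a page queued for transmission is pinned once
	 * for the sg entry kept around for re-tx and once for the skb frag
	 * handed to TCP, so neither side can free it under the other.
	 */
	static void pin_for_tx_sketch(struct page *page)
	{
		get_page(page);		/* ref for the sg / re-tx record */
		get_page(page);		/* ref for the spliced skb frag */
	}

	static void unpin_after_tx_sketch(struct page *page)
	{
		put_page(page);		/* dropped when TCP frees the skb */
		put_page(page);		/* dropped when the record is closed */
	}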

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v10 08/16] tls: Inline do_tcp_sendpages()
  2023-08-04  3:12                       ` Jakub Kicinski
@ 2023-08-08  7:29                         ` Tariq Toukan
  0 siblings, 0 replies; 43+ messages in thread
From: Tariq Toukan @ 2023-08-08  7:29 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David Howells, netdev, David S. Miller, Eric Dumazet,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm, Boris Pismenny,
	John Fastabend, Gal Pressman, ranro, samiram, drort,
	Tariq Toukan



On 04/08/2023 6:12, Jakub Kicinski wrote:
> On Thu, 3 Aug 2023 14:47:35 +0300 Tariq Toukan wrote:
>> When applying this patch, repro disappears! :)
>> Apparently it is related to the warning.
>> Please go on and submit it.
> 
> I have no idea how. I found a different bug, staring at this code
> for another hour. But I still don't get how we can avoid UaF on
> a page by having the TCP take a ref on it rather than copy it.
> 
> If anything we should have 2 refs on any page in the sg, one because
> it's on the sg, and another held by the re-tx handling.
> 
> So I'm afraid we're papering over something here :( We need to keep
> digging.

Hi Jakub,
I'm glad to see that you already nailed the other bug and merged the fix.

To update: we ran comprehensive TLS testing on a branch that contains
your proposed fix (net: tls: set MSG_SPLICE_PAGES consistently) and
does not contain the other fix (net: tls: avoid discarding data on
record close).

Except for one "known" issue (we'll discuss it in a second), the runs 
look clean.
No more traces or encrypt/decrypt error counters. Your proposed fix 
seems to work and causes no degradation.
How do you suggest proceeding here?

One mysterious remaining issue, which I already reported some time ago 
but couldn't effectively debug due to other TLS bugs, is the increase of 
TlsDecryptError / TlsEncryptError counters when running kTLS offloaded 
traffic during bond creation on some other interface.
Weird...

We should start giving it the needed attention now that the other issues 
seem to be resolved.

Regards,
Tariq

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v10 08/16] tls: Inline do_tcp_sendpages()
  2023-06-30 17:21           ` Jakub Kicinski
  2023-07-04 20:06             ` Tariq Toukan
@ 2023-08-10 13:07             ` David Howells
  1 sibling, 0 replies; 43+ messages in thread
From: David Howells @ 2023-08-10 13:07 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: dhowells, Jakub Kicinski, netdev, David S. Miller, Eric Dumazet,
	Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox,
	Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
	Christian Brauner, Chuck Lever III, Linus Torvalds,
	linux-fsdevel, linux-kernel, linux-mm, Boris Pismenny,
	John Fastabend, Gal Pressman, ranro, samiram, drort,
	Tariq Toukan

Tariq Toukan <ttoukan.linux@gmail.com> wrote:

> We are collecting more info on how the repro is affected by the different
> parameters.

I'm wondering if userspace is feeding the unspliceable page in somehow.  Could
you try running with the attached changes?  It might help catch the point at
which the offending page is first spliced into the pipe and any backtrace
might help localise the driver that's producing it.

Thanks,
David
---
diff --git a/fs/splice.c b/fs/splice.c
index 3e2a31e1ce6a..877df1de3863 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -218,6 +218,8 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
 	while (!pipe_full(head, tail, pipe->max_usage)) {
 		struct pipe_buffer *buf = &pipe->bufs[head & mask];
 
+		WARN_ON_ONCE(!sendpage_ok(spd->pages[page_nr]));
+
 		buf->page = spd->pages[page_nr];
 		buf->offset = spd->partial[page_nr].offset;
 		buf->len = spd->partial[page_nr].len;
@@ -252,6 +254,8 @@ ssize_t add_to_pipe(struct pipe_inode_info *pipe, struct pipe_buffer *buf)
 	unsigned int mask = pipe->ring_size - 1;
 	int ret;
 
+	WARN_ON_ONCE(!sendpage_ok(buf->page));
+
 	if (unlikely(!pipe->readers)) {
 		send_sig(SIGPIPE, current, 0);
 		ret = -EPIPE;
@@ -861,6 +865,8 @@ ssize_t splice_to_socket(struct pipe_inode_info *pipe, struct file *out,
 				break;
 			}
 
+			WARN_ON_ONCE(!sendpage_ok(buf->page));
+
 			bvec_set_page(&bvec[bc++], buf->page, seg, buf->offset);
 			remain -= seg;
 			if (remain == 0 || bc >= ARRAY_SIZE(bvec))
@@ -1411,6 +1417,8 @@ static int iter_to_pipe(struct iov_iter *from,
 		for (i = 0; i < n; i++) {
 			int size = min_t(int, left, PAGE_SIZE - start);
 
+			WARN_ON_ONCE(!sendpage_ok(pages[i]));
+
 			buf.page = pages[i];
 			buf.offset = start;
 			buf.len = size;


^ permalink raw reply related	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2023-08-10 13:08 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-22 12:11 [PATCH net-next v10 00/16] splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1 David Howells
2023-05-22 12:11 ` [PATCH net-next v10 01/16] net: Declare MSG_SPLICE_PAGES internal sendmsg() flag David Howells
2023-05-22 12:11 ` [PATCH net-next v10 02/16] net: Pass max frags into skb_append_pagefrags() David Howells
2023-05-22 12:11 ` [PATCH net-next v10 03/16] net: Add a function to splice pages into an skbuff for MSG_SPLICE_PAGES David Howells
2023-05-24 12:24   ` Yunsheng Lin
2023-05-24 13:21   ` David Howells
2023-05-22 12:11 ` [PATCH net-next v10 04/16] tcp: Support MSG_SPLICE_PAGES David Howells
2023-05-22 12:11 ` [PATCH net-next v10 05/16] tcp: Convert do_tcp_sendpages() to use MSG_SPLICE_PAGES David Howells
2023-05-22 12:11 ` [PATCH net-next v10 06/16] tcp_bpf: Inline do_tcp_sendpages as it's now a wrapper around tcp_sendmsg David Howells
2023-05-22 12:11 ` [PATCH net-next v10 07/16] espintcp: Inline do_tcp_sendpages() David Howells
2023-05-22 12:11 ` [PATCH net-next v10 08/16] tls: " David Howells
2023-06-07 14:17   ` Tariq Toukan
2023-06-07 15:03   ` David Howells
2023-06-13 11:15     ` Tariq Toukan
2023-06-19  8:23       ` Tariq Toukan
2023-06-19  9:35       ` David Howells
2023-06-27 16:49         ` Tariq Toukan
2023-06-30 17:21           ` Jakub Kicinski
2023-07-04 20:06             ` Tariq Toukan
2023-07-05 16:19               ` Jakub Kicinski
2023-07-23  6:35                 ` Tariq Toukan
2023-07-26  0:30                   ` Jakub Kicinski
2023-07-26 19:20                     ` Tariq Toukan
2023-07-26 20:08                       ` Jakub Kicinski
2023-08-03 11:52                         ` Tariq Toukan
2023-08-03 11:47                     ` Tariq Toukan
2023-08-04  3:12                       ` Jakub Kicinski
2023-08-08  7:29                         ` Tariq Toukan
2023-07-26 10:51                 ` David Howells
2023-07-26 11:43                   ` Tariq Toukan
2023-07-26 14:57                     ` Jakub Kicinski
2023-08-10 13:07             ` David Howells
2023-06-27 16:55         ` David Howells
2023-06-27 17:06         ` David Howells
2023-05-22 12:11 ` [PATCH net-next v10 09/16] siw: " David Howells
2023-05-22 12:11 ` [PATCH net-next v10 10/16] tcp: Fold do_tcp_sendpages() into tcp_sendpage_locked() David Howells
2023-05-22 12:11 ` [PATCH net-next v10 11/16] ip, udp: Support MSG_SPLICE_PAGES David Howells
2023-05-22 12:11 ` [PATCH net-next v10 12/16] ip6, udp6: " David Howells
2023-05-22 12:11 ` [PATCH net-next v10 13/16] udp: Convert udp_sendpage() to use MSG_SPLICE_PAGES David Howells
2023-05-22 12:11 ` [PATCH net-next v10 14/16] ip: Remove ip_append_page() David Howells
2023-05-22 12:11 ` [PATCH net-next v10 15/16] af_unix: Support MSG_SPLICE_PAGES David Howells
2023-05-22 12:11 ` [PATCH net-next v10 16/16] unix: Convert unix_stream_sendpage() to use MSG_SPLICE_PAGES David Howells
2023-05-24  4:20 ` [PATCH net-next v10 00/16] splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1 patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).