* [PATCH net-next v5 00/16] splice, net: Switch over users of sendpage() and remove it
@ 2023-06-23 22:54 David Howells
  2023-06-23 22:54 ` [PATCH net-next v5 01/16] tcp_bpf, smc, tls, espintcp, siw: Reduce MSG_SENDPAGE_NOTLAST usage David Howells
                   ` (16 more replies)
  0 siblings, 17 replies; 29+ messages in thread
From: David Howells @ 2023-06-23 22:54 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Alexander Duyck, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern,
	Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel

Here's the final set of patches towards the removal of sendpage.  All the
drivers that use sendpage() get switched over to using sendmsg() with
MSG_SPLICE_PAGES.

The following changes are made:

 (1) Make the protocol drivers behave according to MSG_MORE, not
     MSG_SENDPAGE_NOTLAST.  The latter is then only used inside the
     sendpage() wrappers, where it simply turns on MSG_MORE.

 (2) Fix ocfs2 to allocate its global protocol buffers with folio_alloc()
     rather than kzalloc() so as not to invoke the !sendpage_ok warning in
     skb_splice_from_iter().

 (3) Make ceph/rds, skb_send_sock, dlm, nvme, smc, ocfs2, drbd and iscsi
     use sendmsg(), not sendpage(), and make them specify MSG_MORE instead
     of MSG_SENDPAGE_NOTLAST (the general shape of this conversion is
     sketched just after this list).

 (4) Kill off sendpage and clean up MSG_SENDPAGE_NOTLAST.
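
For reference, the general shape of the conversion looks roughly like the
sketch below.  This is purely illustrative: the helper name and parameters
are invented for this summary and don't appear in any of the patches.

    #include <linux/net.h>
    #include <linux/socket.h>
    #include <linux/bvec.h>
    #include <linux/uio.h>

    /* Hypothetical helper: send one page with sendmsg(MSG_SPLICE_PAGES)
     * instead of kernel_sendpage().  Whether further data will follow is
     * now conveyed with MSG_MORE rather than MSG_SENDPAGE_NOTLAST.
     */
    static int xmit_one_page(struct socket *sock, struct page *page,
                             size_t offset, size_t len, bool more)
    {
        struct bio_vec bvec;
        struct msghdr msg = {
            .msg_flags = MSG_SPLICE_PAGES | MSG_DONTWAIT,
        };

        if (more)
            msg.msg_flags |= MSG_MORE;

        bvec_set_page(&bvec, page, len, offset);
        iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len);
        return sock_sendmsg(sock, &msg);
    }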

I've pushed the patches here also:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=sendpage-3-killsb

David

Changes
=======
ver #5)
 - Switch condition for setting MSG_MORE in write_partial_message_data()
 - Split iscsi changes into client and target patches

ver #4)
 - Drop the copy-slab-to-fragment code from skb_splice_from_iter().
 - Slim down the driver changes to only do one page at a time.
 - Drop the siw and drbd page aggregation single-sendmsg patches.
 - Make nvme/host cancel MSG_SPLICE_PAGES if !sendpage_ok.
 - Break out the ocfs2 protocol buffer allocation fix into its own patch.
 - Simplify the ocfs2 protocol data transmission.

ver #3)
 - Remove duplicate decl of skb_splice_from_iter().
 - In tcp_bpf, reset msg_flags on each iteration to clear MSG_MORE.
 - In tcp_bpf, set MSG_MORE if there's more data in the sk_msg.
 - Split the nvme patch into host and target patches.

ver #2)
 - Wrapped some lines at 80.
 - Fixed parameter to put_cpu_ptr() to have an '&'.
 - Use "unsigned int" rather than "unsigned".
 - Removed duplicate word in comment.
 - Filled in the commit message on the last patch.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=51c78a4d532efe9543a4df019ff405f05c6157f6 # part 1
Link: https://lore.kernel.org/r/20230616161301.622169-1-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20230617121146.716077-1-dhowells@redhat.com/ # v2
Link: https://lore.kernel.org/r/20230620145338.1300897-1-dhowells@redhat.com/ # v3





David Howells (16):
  tcp_bpf, smc, tls, espintcp, siw: Reduce MSG_SENDPAGE_NOTLAST usage
  net: Use sendmsg(MSG_SPLICE_PAGES) not sendpage in skb_send_sock()
  ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage
  ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage()
  rds: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage
  dlm: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage
  nvme-tcp: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage
  nvmet-tcp: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage
  smc: Drop smc_sendpage() in favour of smc_sendmsg() + MSG_SPLICE_PAGES
  drbd: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage()
  scsi: iscsi_tcp: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage
  scsi: target: iscsi: Use sendmsg(MSG_SPLICE_PAGES) rather than
    sendpage
  ocfs2: Fix use of slab data with sendpage
  ocfs2: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage()
  sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES)
  net: Kill MSG_SENDPAGE_NOTLAST

 Documentation/bpf/map_sockmap.rst             | 10 +-
 Documentation/filesystems/locking.rst         |  2 -
 Documentation/filesystems/vfs.rst             |  1 -
 Documentation/networking/scaling.rst          |  4 +-
 crypto/af_alg.c                               | 28 ------
 crypto/algif_aead.c                           | 22 +----
 crypto/algif_rng.c                            |  2 -
 crypto/algif_skcipher.c                       | 14 ---
 drivers/block/drbd/drbd_main.c                | 12 ++-
 drivers/infiniband/sw/siw/siw_qp_tx.c         |  5 +-
 .../chelsio/inline_crypto/chtls/chtls.h       |  2 -
 .../chelsio/inline_crypto/chtls/chtls_io.c    | 14 ---
 .../chelsio/inline_crypto/chtls/chtls_main.c  |  1 -
 drivers/nvme/host/tcp.c                       | 49 +++++-----
 drivers/nvme/target/tcp.c                     | 46 ++++++----
 drivers/scsi/iscsi_tcp.c                      | 26 ++----
 drivers/scsi/iscsi_tcp.h                      |  2 -
 drivers/target/iscsi/iscsi_target_util.c      | 15 +--
 fs/dlm/lowcomms.c                             | 10 +-
 fs/nfsd/vfs.c                                 |  2 +-
 fs/ocfs2/cluster/tcp.c                        | 38 ++++----
 include/crypto/if_alg.h                       |  2 -
 include/linux/net.h                           |  8 --
 include/linux/socket.h                        |  4 +-
 include/net/inet_common.h                     |  2 -
 include/net/sock.h                            |  6 --
 include/net/tcp.h                             |  4 -
 net/appletalk/ddp.c                           |  1 -
 net/atm/pvc.c                                 |  1 -
 net/atm/svc.c                                 |  1 -
 net/ax25/af_ax25.c                            |  1 -
 net/caif/caif_socket.c                        |  2 -
 net/can/bcm.c                                 |  1 -
 net/can/isotp.c                               |  1 -
 net/can/j1939/socket.c                        |  1 -
 net/can/raw.c                                 |  1 -
 net/ceph/messenger_v1.c                       | 60 ++++--------
 net/ceph/messenger_v2.c                       | 91 ++++---------------
 net/core/skbuff.c                             | 50 +++++-----
 net/core/sock.c                               | 35 +------
 net/dccp/ipv4.c                               |  1 -
 net/dccp/ipv6.c                               |  1 -
 net/ieee802154/socket.c                       |  2 -
 net/ipv4/af_inet.c                            | 21 -----
 net/ipv4/tcp.c                                | 43 +--------
 net/ipv4/tcp_bpf.c                            | 28 +-----
 net/ipv4/tcp_ipv4.c                           |  1 -
 net/ipv4/udp.c                                | 15 ---
 net/ipv4/udp_impl.h                           |  2 -
 net/ipv4/udplite.c                            |  1 -
 net/ipv6/af_inet6.c                           |  3 -
 net/ipv6/raw.c                                |  1 -
 net/ipv6/tcp_ipv6.c                           |  1 -
 net/kcm/kcmsock.c                             | 20 ----
 net/key/af_key.c                              |  1 -
 net/l2tp/l2tp_ip.c                            |  1 -
 net/l2tp/l2tp_ip6.c                           |  1 -
 net/llc/af_llc.c                              |  1 -
 net/mctp/af_mctp.c                            |  1 -
 net/mptcp/protocol.c                          |  2 -
 net/netlink/af_netlink.c                      |  1 -
 net/netrom/af_netrom.c                        |  1 -
 net/packet/af_packet.c                        |  2 -
 net/phonet/socket.c                           |  2 -
 net/qrtr/af_qrtr.c                            |  1 -
 net/rds/af_rds.c                              |  1 -
 net/rds/tcp_send.c                            | 23 ++---
 net/rose/af_rose.c                            |  1 -
 net/rxrpc/af_rxrpc.c                          |  1 -
 net/sctp/protocol.c                           |  1 -
 net/smc/af_smc.c                              | 29 ------
 net/smc/smc_stats.c                           |  2 +-
 net/smc/smc_stats.h                           |  1 -
 net/smc/smc_tx.c                              | 19 +---
 net/smc/smc_tx.h                              |  2 -
 net/socket.c                                  | 48 ----------
 net/tipc/socket.c                             |  3 -
 net/tls/tls.h                                 |  6 --
 net/tls/tls_device.c                          | 24 +----
 net/tls/tls_main.c                            |  9 +-
 net/tls/tls_sw.c                              | 37 +-------
 net/unix/af_unix.c                            | 19 ----
 net/vmw_vsock/af_vsock.c                      |  3 -
 net/x25/af_x25.c                              |  1 -
 net/xdp/xsk.c                                 |  1 -
 net/xfrm/espintcp.c                           | 10 +-
 .../perf/trace/beauty/include/linux/socket.h  |  1 -
 tools/perf/trace/beauty/msg_flags.c           |  5 +-
 88 files changed, 230 insertions(+), 748 deletions(-)



* [PATCH net-next v5 01/16] tcp_bpf, smc, tls, espintcp, siw: Reduce MSG_SENDPAGE_NOTLAST usage
  2023-06-23 22:54 [PATCH net-next v5 00/16] splice, net: Switch over users of sendpage() and remove it David Howells
@ 2023-06-23 22:54 ` David Howells
  2023-06-23 22:54 ` [PATCH net-next v5 02/16] net: Use sendmsg(MSG_SPLICE_PAGES) not sendpage in skb_send_sock() David Howells
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: David Howells @ 2023-06-23 22:54 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Alexander Duyck, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern,
	Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel,
	Bernard Metzler, Jason Gunthorpe, Leon Romanovsky,
	John Fastabend, Jakub Sitnicki, Karsten Graul, Wenjia Zhang,
	Jan Karcher, D. Wythe, Tony Lu, Wen Gu, Boris Pismenny,
	Steffen Klassert, Herbert Xu, bpf, linux-s390, linux-rdma

As MSG_SENDPAGE_NOTLAST is being phased out along with sendpage(), don't
use it any deeper in the stack than the sendpage methods themselves;
instead, translate it to MSG_MORE at that boundary and use that.
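
Distilled down, and with a made-up function name standing in for the various
per-protocol wrappers, the pattern is roughly:

    /* Sketch only: a remaining ->sendpage() entry point translates
     * MSG_SENDPAGE_NOTLAST into MSG_MORE at the boundary; everything
     * deeper in tests MSG_MORE exclusively.
     * (Needs <linux/net.h>, <linux/bvec.h>, <linux/uio.h>.)
     */
    static ssize_t example_sendpage(struct socket *sock, struct page *page,
                                    int offset, size_t size, int flags)
    {
        struct bio_vec bvec;
        struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };

        if (flags & MSG_SENDPAGE_NOTLAST)
            msg.msg_flags |= MSG_MORE;

        bvec_set_page(&bvec, page, size, offset);
        iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
        return sock_sendmsg(sock, &msg);
    }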

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
cc: Bernard Metzler <bmt@zurich.ibm.com>
cc: Jason Gunthorpe <jgg@ziepe.ca>
cc: Leon Romanovsky <leon@kernel.org>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jakub Sitnicki <jakub@cloudflare.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: David Ahern <dsahern@kernel.org>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Karsten Graul <kgraul@linux.ibm.com>
cc: Wenjia Zhang <wenjia@linux.ibm.com>
cc: Jan Karcher <jaka@linux.ibm.com>
cc: "D. Wythe" <alibuda@linux.alibaba.com>
cc: Tony Lu <tonylu@linux.alibaba.com>
cc: Wen Gu <guwen@linux.alibaba.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: Steffen Klassert <steffen.klassert@secunet.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: netdev@vger.kernel.org
cc: bpf@vger.kernel.org
cc: linux-s390@vger.kernel.org
cc: linux-rdma@vger.kernel.org
---

Notes:
    ver #3)
     - In tcp_bpf, reset msg_flags on each iteration to clear MSG_MORE.
     - In tcp_bpf, set MSG_MORE if there's more data in the sk_msg.

 drivers/infiniband/sw/siw/siw_qp_tx.c |  5 ++---
 net/ipv4/tcp_bpf.c                    |  5 +++--
 net/smc/smc_tx.c                      |  6 ++++--
 net/tls/tls_device.c                  |  4 ++--
 net/xfrm/espintcp.c                   | 10 ++++++----
 5 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c
index ffb16beb6c30..7c7a51d36d0c 100644
--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
@@ -325,8 +325,7 @@ static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
 {
 	struct bio_vec bvec;
 	struct msghdr msg = {
-		.msg_flags = (MSG_MORE | MSG_DONTWAIT | MSG_SENDPAGE_NOTLAST |
-			      MSG_SPLICE_PAGES),
+		.msg_flags = (MSG_MORE | MSG_DONTWAIT | MSG_SPLICE_PAGES),
 	};
 	struct sock *sk = s->sk;
 	int i = 0, rv = 0, sent = 0;
@@ -335,7 +334,7 @@ static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
 		size_t bytes = min_t(size_t, PAGE_SIZE - offset, size);
 
 		if (size + offset <= PAGE_SIZE)
-			msg.msg_flags &= ~MSG_SENDPAGE_NOTLAST;
+			msg.msg_flags &= ~MSG_MORE;
 
 		tcp_rate_check_app_limited(sk);
 		bvec_set_page(&bvec, page[i], bytes, offset);
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 5a84053ac62b..31d6005cea9b 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -88,9 +88,9 @@ static int bpf_tcp_ingress(struct sock *sk, struct sk_psock *psock,
 static int tcp_bpf_push(struct sock *sk, struct sk_msg *msg, u32 apply_bytes,
 			int flags, bool uncharge)
 {
+	struct msghdr msghdr = {};
 	bool apply = apply_bytes;
 	struct scatterlist *sge;
-	struct msghdr msghdr = { .msg_flags = flags | MSG_SPLICE_PAGES, };
 	struct page *page;
 	int size, ret = 0;
 	u32 off;
@@ -107,11 +107,12 @@ static int tcp_bpf_push(struct sock *sk, struct sk_msg *msg, u32 apply_bytes,
 
 		tcp_rate_check_app_limited(sk);
 retry:
+		msghdr.msg_flags = flags | MSG_SPLICE_PAGES;
 		has_tx_ulp = tls_sw_has_ctx_tx(sk);
 		if (has_tx_ulp)
 			msghdr.msg_flags |= MSG_SENDPAGE_NOPOLICY;
 
-		if (flags & MSG_SENDPAGE_NOTLAST)
+		if (size < sge->length && msg->sg.start != msg->sg.end)
 			msghdr.msg_flags |= MSG_MORE;
 
 		bvec_set_page(&bvec, page, size, off);
diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
index 45128443f1f1..9b9e0a190734 100644
--- a/net/smc/smc_tx.c
+++ b/net/smc/smc_tx.c
@@ -168,8 +168,7 @@ static bool smc_tx_should_cork(struct smc_sock *smc, struct msghdr *msg)
 	 * should known how/when to uncork it.
 	 */
 	if ((msg->msg_flags & MSG_MORE ||
-	     smc_tx_is_corked(smc) ||
-	     msg->msg_flags & MSG_SENDPAGE_NOTLAST) &&
+	     smc_tx_is_corked(smc)) &&
 	    atomic_read(&conn->sndbuf_space))
 		return true;
 
@@ -306,6 +305,9 @@ int smc_tx_sendpage(struct smc_sock *smc, struct page *page, int offset,
 	struct kvec iov;
 	int rc;
 
+	if (flags & MSG_SENDPAGE_NOTLAST)
+		msg.msg_flags |= MSG_MORE;
+
 	iov.iov_base = kaddr + offset;
 	iov.iov_len = size;
 	iov_iter_kvec(&msg.msg_iter, ITER_SOURCE, &iov, 1, size);
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index b82770f68807..975299d7213b 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -449,7 +449,7 @@ static int tls_push_data(struct sock *sk,
 		return -sk->sk_err;
 
 	flags |= MSG_SENDPAGE_DECRYPTED;
-	tls_push_record_flags = flags | MSG_SENDPAGE_NOTLAST;
+	tls_push_record_flags = flags | MSG_MORE;
 
 	timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
 	if (tls_is_partially_sent_record(tls_ctx)) {
@@ -532,7 +532,7 @@ static int tls_push_data(struct sock *sk,
 		if (!size) {
 last_record:
 			tls_push_record_flags = flags;
-			if (flags & (MSG_SENDPAGE_NOTLAST | MSG_MORE)) {
+			if (flags & MSG_MORE) {
 				more = true;
 				break;
 			}
diff --git a/net/xfrm/espintcp.c b/net/xfrm/espintcp.c
index 3504925babdb..d3b3f9e720b3 100644
--- a/net/xfrm/espintcp.c
+++ b/net/xfrm/espintcp.c
@@ -205,13 +205,15 @@ static int espintcp_sendskb_locked(struct sock *sk, struct espintcp_msg *emsg,
 static int espintcp_sendskmsg_locked(struct sock *sk,
 				     struct espintcp_msg *emsg, int flags)
 {
-	struct msghdr msghdr = { .msg_flags = flags | MSG_SPLICE_PAGES, };
+	struct msghdr msghdr = {
+		.msg_flags = flags | MSG_SPLICE_PAGES | MSG_MORE,
+	};
 	struct sk_msg *skmsg = &emsg->skmsg;
+	bool more = flags & MSG_MORE;
 	struct scatterlist *sg;
 	int done = 0;
 	int ret;
 
-	msghdr.msg_flags |= MSG_SENDPAGE_NOTLAST;
 	sg = &skmsg->sg.data[skmsg->sg.start];
 	do {
 		struct bio_vec bvec;
@@ -221,8 +223,8 @@ static int espintcp_sendskmsg_locked(struct sock *sk,
 
 		emsg->offset = 0;
 
-		if (sg_is_last(sg))
-			msghdr.msg_flags &= ~MSG_SENDPAGE_NOTLAST;
+		if (sg_is_last(sg) && !more)
+			msghdr.msg_flags &= ~MSG_MORE;
 
 		p = sg_page(sg);
 retry:



* [PATCH net-next v5 02/16] net: Use sendmsg(MSG_SPLICE_PAGES) not sendpage in skb_send_sock()
  2023-06-23 22:54 [PATCH net-next v5 00/16] splice, net: Switch over users of sendpage() and remove it David Howells
  2023-06-23 22:54 ` [PATCH net-next v5 01/16] tcp_bpf, smc, tls, espintcp, siw: Reduce MSG_SENDPAGE_NOTLAST usage David Howells
@ 2023-06-23 22:54 ` David Howells
  2023-06-23 22:55 ` [PATCH net-next v5 03/16] ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage David Howells
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: David Howells @ 2023-06-23 22:54 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Alexander Duyck, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern,
	Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel

Use sendmsg() with MSG_SPLICE_PAGES rather than sendpage in
skb_send_sock().  This causes pages to be spliced from the source iterator
if possible.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Note that this could perhaps be improved to fill out a bvec array with all
the frags and then make a single sendmsg call, possibly sticking the header
on the front also.
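
A very rough sketch of what that batching could look like (not part of this
patch; the helper name is invented, and partial sends, error handling and
the header are all ignored):

    /* Describe every frag of an skb in one bvec array and hand the lot to
     * a single sendmsg() call.  A real version would not put the array on
     * the stack and would cope with partial sends.
     * (Needs <linux/skbuff.h>, <linux/net.h>, <linux/bvec.h>, <linux/uio.h>.)
     */
    static int send_all_frags(struct socket *sock, struct sk_buff *skb)
    {
        struct bio_vec bvec[MAX_SKB_FRAGS];
        struct msghdr msg = {
            .msg_flags = MSG_SPLICE_PAGES | MSG_DONTWAIT,
        };
        size_t total = 0;
        int i;

        for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
            const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];

            bvec_set_page(&bvec[i], skb_frag_page(frag),
                          skb_frag_size(frag), skb_frag_off(frag));
            total += skb_frag_size(frag);
        }

        iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, bvec,
                      skb_shinfo(skb)->nr_frags, total);
        return sock_sendmsg(sock, &msg);
    }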

Signed-off-by: David Howells <dhowells@redhat.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: netdev@vger.kernel.org
---

Notes:
    ver #2)
     - Wrap lines at 80.

 net/core/skbuff.c | 50 ++++++++++++++++++++++++++---------------------
 1 file changed, 28 insertions(+), 22 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index fee2b1c105fe..6c5915efbc17 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2989,32 +2989,32 @@ int skb_splice_bits(struct sk_buff *skb, struct sock *sk, unsigned int offset,
 }
 EXPORT_SYMBOL_GPL(skb_splice_bits);
 
-static int sendmsg_unlocked(struct sock *sk, struct msghdr *msg,
-			    struct kvec *vec, size_t num, size_t size)
+static int sendmsg_locked(struct sock *sk, struct msghdr *msg)
 {
 	struct socket *sock = sk->sk_socket;
+	size_t size = msg_data_left(msg);
 
 	if (!sock)
 		return -EINVAL;
-	return kernel_sendmsg(sock, msg, vec, num, size);
+
+	if (!sock->ops->sendmsg_locked)
+		return sock_no_sendmsg_locked(sk, msg, size);
+
+	return sock->ops->sendmsg_locked(sk, msg, size);
 }
 
-static int sendpage_unlocked(struct sock *sk, struct page *page, int offset,
-			     size_t size, int flags)
+static int sendmsg_unlocked(struct sock *sk, struct msghdr *msg)
 {
 	struct socket *sock = sk->sk_socket;
 
 	if (!sock)
 		return -EINVAL;
-	return kernel_sendpage(sock, page, offset, size, flags);
+	return sock_sendmsg(sock, msg);
 }
 
-typedef int (*sendmsg_func)(struct sock *sk, struct msghdr *msg,
-			    struct kvec *vec, size_t num, size_t size);
-typedef int (*sendpage_func)(struct sock *sk, struct page *page, int offset,
-			     size_t size, int flags);
+typedef int (*sendmsg_func)(struct sock *sk, struct msghdr *msg);
 static int __skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset,
-			   int len, sendmsg_func sendmsg, sendpage_func sendpage)
+			   int len, sendmsg_func sendmsg)
 {
 	unsigned int orig_len = len;
 	struct sk_buff *head = skb;
@@ -3034,8 +3034,9 @@ static int __skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset,
 		memset(&msg, 0, sizeof(msg));
 		msg.msg_flags = MSG_DONTWAIT;
 
-		ret = INDIRECT_CALL_2(sendmsg, kernel_sendmsg_locked,
-				      sendmsg_unlocked, sk, &msg, &kv, 1, slen);
+		iov_iter_kvec(&msg.msg_iter, ITER_SOURCE, &kv, 1, slen);
+		ret = INDIRECT_CALL_2(sendmsg, sendmsg_locked,
+				      sendmsg_unlocked, sk, &msg);
 		if (ret <= 0)
 			goto error;
 
@@ -3066,11 +3067,18 @@ static int __skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset,
 		slen = min_t(size_t, len, skb_frag_size(frag) - offset);
 
 		while (slen) {
-			ret = INDIRECT_CALL_2(sendpage, kernel_sendpage_locked,
-					      sendpage_unlocked, sk,
-					      skb_frag_page(frag),
-					      skb_frag_off(frag) + offset,
-					      slen, MSG_DONTWAIT);
+			struct bio_vec bvec;
+			struct msghdr msg = {
+				.msg_flags = MSG_SPLICE_PAGES | MSG_DONTWAIT,
+			};
+
+			bvec_set_page(&bvec, skb_frag_page(frag), slen,
+				      skb_frag_off(frag) + offset);
+			iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1,
+				      slen);
+
+			ret = INDIRECT_CALL_2(sendmsg, sendmsg_locked,
+					      sendmsg_unlocked, sk, &msg);
 			if (ret <= 0)
 				goto error;
 
@@ -3107,16 +3115,14 @@ static int __skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset,
 int skb_send_sock_locked(struct sock *sk, struct sk_buff *skb, int offset,
 			 int len)
 {
-	return __skb_send_sock(sk, skb, offset, len, kernel_sendmsg_locked,
-			       kernel_sendpage_locked);
+	return __skb_send_sock(sk, skb, offset, len, sendmsg_locked);
 }
 EXPORT_SYMBOL_GPL(skb_send_sock_locked);
 
 /* Send skb data on a socket. Socket must be unlocked. */
 int skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset, int len)
 {
-	return __skb_send_sock(sk, skb, offset, len, sendmsg_unlocked,
-			       sendpage_unlocked);
+	return __skb_send_sock(sk, skb, offset, len, sendmsg_unlocked);
 }
 
 /**



* [PATCH net-next v5 03/16] ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage
  2023-06-23 22:54 [PATCH net-next v5 00/16] splice, net: Switch over users of sendpage() and remove it David Howells
  2023-06-23 22:54 ` [PATCH net-next v5 01/16] tcp_bpf, smc, tls, espintcp, siw: Reduce MSG_SENDPAGE_NOTLAST usage David Howells
  2023-06-23 22:54 ` [PATCH net-next v5 02/16] net: Use sendmsg(MSG_SPLICE_PAGES) not sendpage in skb_send_sock() David Howells
@ 2023-06-23 22:55 ` David Howells
  2023-06-25 12:20   ` Ilya Dryomov
                     ` (2 more replies)
  2023-06-23 22:55 ` [PATCH net-next v5 04/16] ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage() David Howells
                   ` (13 subsequent siblings)
  16 siblings, 3 replies; 29+ messages in thread
From: David Howells @ 2023-06-23 22:55 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Alexander Duyck, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern,
	Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel, Ilya Dryomov,
	Xiubo Li, Jeff Layton, ceph-devel

Use sendmsg() and MSG_SPLICE_PAGES rather than sendpage in ceph when
transmitting data.  For the moment, this can only transmit one page at a
time because of the architecture of net/ceph/, but if
write_partial_message_data() can be given a bvec[] at a time by the
iteration code, this would allow pages to be sent in a batch.
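
The MSG_MORE condition mentioned in the ver #5 note below boils down to "is
there more of the message left after this page?".  As a standalone sketch
with invented names (the real code drives this from the ceph data cursor):

    /* Send one page of a larger message; "resid" is how much of the
     * message remains to be sent, counting this chunk.
     */
    static int send_chunk(struct socket *sock, struct page *page,
                          size_t offset, size_t len, size_t resid)
    {
        struct bio_vec bvec;
        struct msghdr msg = { .msg_flags = MSG_SPLICE_PAGES, };

        if (len != resid)   /* more of the message follows */
            msg.msg_flags |= MSG_MORE;

        bvec_set_page(&bvec, page, len, offset);
        iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len);
        return sock_sendmsg(sock, &msg);
    }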

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Xiubo Li <xiubli@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: ceph-devel@vger.kernel.org
cc: netdev@vger.kernel.org
---

Notes:
    ver #5)
     - Switch condition for setting MSG_MORE in write_partial_message_data()

 net/ceph/messenger_v1.c | 60 ++++++++++++++---------------------------
 1 file changed, 20 insertions(+), 40 deletions(-)

diff --git a/net/ceph/messenger_v1.c b/net/ceph/messenger_v1.c
index d664cb1593a7..814579f27f04 100644
--- a/net/ceph/messenger_v1.c
+++ b/net/ceph/messenger_v1.c
@@ -74,37 +74,6 @@ static int ceph_tcp_sendmsg(struct socket *sock, struct kvec *iov,
 	return r;
 }
 
-/*
- * @more: either or both of MSG_MORE and MSG_SENDPAGE_NOTLAST
- */
-static int ceph_tcp_sendpage(struct socket *sock, struct page *page,
-			     int offset, size_t size, int more)
-{
-	ssize_t (*sendpage)(struct socket *sock, struct page *page,
-			    int offset, size_t size, int flags);
-	int flags = MSG_DONTWAIT | MSG_NOSIGNAL | more;
-	int ret;
-
-	/*
-	 * sendpage cannot properly handle pages with page_count == 0,
-	 * we need to fall back to sendmsg if that's the case.
-	 *
-	 * Same goes for slab pages: skb_can_coalesce() allows
-	 * coalescing neighboring slab objects into a single frag which
-	 * triggers one of hardened usercopy checks.
-	 */
-	if (sendpage_ok(page))
-		sendpage = sock->ops->sendpage;
-	else
-		sendpage = sock_no_sendpage;
-
-	ret = sendpage(sock, page, offset, size, flags);
-	if (ret == -EAGAIN)
-		ret = 0;
-
-	return ret;
-}
-
 static void con_out_kvec_reset(struct ceph_connection *con)
 {
 	BUG_ON(con->v1.out_skip);
@@ -464,7 +433,6 @@ static int write_partial_message_data(struct ceph_connection *con)
 	struct ceph_msg *msg = con->out_msg;
 	struct ceph_msg_data_cursor *cursor = &msg->cursor;
 	bool do_datacrc = !ceph_test_opt(from_msgr(con->msgr), NOCRC);
-	int more = MSG_MORE | MSG_SENDPAGE_NOTLAST;
 	u32 crc;
 
 	dout("%s %p msg %p\n", __func__, con, msg);
@@ -482,6 +450,10 @@ static int write_partial_message_data(struct ceph_connection *con)
 	 */
 	crc = do_datacrc ? le32_to_cpu(msg->footer.data_crc) : 0;
 	while (cursor->total_resid) {
+		struct bio_vec bvec;
+		struct msghdr msghdr = {
+			.msg_flags = MSG_SPLICE_PAGES,
+		};
 		struct page *page;
 		size_t page_offset;
 		size_t length;
@@ -493,10 +465,13 @@ static int write_partial_message_data(struct ceph_connection *con)
 		}
 
 		page = ceph_msg_data_next(cursor, &page_offset, &length);
-		if (length == cursor->total_resid)
-			more = MSG_MORE;
-		ret = ceph_tcp_sendpage(con->sock, page, page_offset, length,
-					more);
+		if (length != cursor->total_resid)
+			msghdr.msg_flags |= MSG_MORE;
+
+		bvec_set_page(&bvec, page, length, page_offset);
+		iov_iter_bvec(&msghdr.msg_iter, ITER_SOURCE, &bvec, 1, length);
+
+		ret = sock_sendmsg(con->sock, &msghdr);
 		if (ret <= 0) {
 			if (do_datacrc)
 				msg->footer.data_crc = cpu_to_le32(crc);
@@ -526,7 +501,10 @@ static int write_partial_message_data(struct ceph_connection *con)
  */
 static int write_partial_skip(struct ceph_connection *con)
 {
-	int more = MSG_MORE | MSG_SENDPAGE_NOTLAST;
+	struct bio_vec bvec;
+	struct msghdr msghdr = {
+		.msg_flags = MSG_SPLICE_PAGES | MSG_MORE,
+	};
 	int ret;
 
 	dout("%s %p %d left\n", __func__, con, con->v1.out_skip);
@@ -534,9 +512,11 @@ static int write_partial_skip(struct ceph_connection *con)
 		size_t size = min(con->v1.out_skip, (int)PAGE_SIZE);
 
 		if (size == con->v1.out_skip)
-			more = MSG_MORE;
-		ret = ceph_tcp_sendpage(con->sock, ceph_zero_page, 0, size,
-					more);
+			msghdr.msg_flags &= ~MSG_MORE;
+		bvec_set_page(&bvec, ZERO_PAGE(0), size, 0);
+		iov_iter_bvec(&msghdr.msg_iter, ITER_SOURCE, &bvec, 1, size);
+
+		ret = sock_sendmsg(con->sock, &msghdr);
 		if (ret <= 0)
 			goto out;
 		con->v1.out_skip -= ret;



* [PATCH net-next v5 04/16] ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage()
  2023-06-23 22:54 [PATCH net-next v5 00/16] splice, net: Switch over users of sendpage() and remove it David Howells
                   ` (2 preceding siblings ...)
  2023-06-23 22:55 ` [PATCH net-next v5 03/16] ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage David Howells
@ 2023-06-23 22:55 ` David Howells
  2023-06-25 12:34   ` Ilya Dryomov
  2023-06-26 15:30   ` David Howells
  2023-06-23 22:55 ` [PATCH net-next v5 05/16] rds: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage David Howells
                   ` (12 subsequent siblings)
  16 siblings, 2 replies; 29+ messages in thread
From: David Howells @ 2023-06-23 22:55 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Alexander Duyck, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern,
	Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel, Ilya Dryomov,
	Xiubo Li, Jeff Layton, ceph-devel

Use sendmsg() and MSG_SPLICE_PAGES rather than sendpage in ceph when
transmitting data.  For the moment, this can only transmit one page at a
time because of the architecture of net/ceph/, but if
write_partial_message_data() can be given a bvec[] at a time by the
iteration code, this would allow pages to be sent in a batch.
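
For the messenger_v2 path, the prepared ITER_BVEC iterator is handed to a
single sendmsg() call on a corked socket.  A stripped-down sketch of that
shape, with invented names and reduced error handling:

    /* Send whatever a prepared bvec iterator describes in one go.  The
     * caller keeps the socket corked, so MSG_MORE isn't needed; the
     * iterator is only advanced by however much was accepted.
     */
    static int send_prepared_iter(struct socket *sock, struct iov_iter *it,
                                  bool splice)
    {
        struct msghdr msg = {
            .msg_iter  = *it,
            .msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL,
        };
        int ret;

        if (splice)
            msg.msg_flags |= MSG_SPLICE_PAGES;

        ret = sock_sendmsg(sock, &msg);
        if (ret > 0)
            iov_iter_advance(it, ret);
        else if (ret == -EAGAIN)
            ret = 0;    /* socket full; caller waits and retries */
        return ret;
    }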

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Xiubo Li <xiubli@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: ceph-devel@vger.kernel.org
cc: netdev@vger.kernel.org
---
 net/ceph/messenger_v2.c | 91 +++++++++--------------------------------
 1 file changed, 19 insertions(+), 72 deletions(-)

diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
index 301a991dc6a6..87ac97073e75 100644
--- a/net/ceph/messenger_v2.c
+++ b/net/ceph/messenger_v2.c
@@ -117,91 +117,38 @@ static int ceph_tcp_recv(struct ceph_connection *con)
 	return ret;
 }
 
-static int do_sendmsg(struct socket *sock, struct iov_iter *it)
-{
-	struct msghdr msg = { .msg_flags = CEPH_MSG_FLAGS };
-	int ret;
-
-	msg.msg_iter = *it;
-	while (iov_iter_count(it)) {
-		ret = sock_sendmsg(sock, &msg);
-		if (ret <= 0) {
-			if (ret == -EAGAIN)
-				ret = 0;
-			return ret;
-		}
-
-		iov_iter_advance(it, ret);
-	}
-
-	WARN_ON(msg_data_left(&msg));
-	return 1;
-}
-
-static int do_try_sendpage(struct socket *sock, struct iov_iter *it)
-{
-	struct msghdr msg = { .msg_flags = CEPH_MSG_FLAGS };
-	struct bio_vec bv;
-	int ret;
-
-	if (WARN_ON(!iov_iter_is_bvec(it)))
-		return -EINVAL;
-
-	while (iov_iter_count(it)) {
-		/* iov_iter_iovec() for ITER_BVEC */
-		bvec_set_page(&bv, it->bvec->bv_page,
-			      min(iov_iter_count(it),
-				  it->bvec->bv_len - it->iov_offset),
-			      it->bvec->bv_offset + it->iov_offset);
-
-		/*
-		 * sendpage cannot properly handle pages with
-		 * page_count == 0, we need to fall back to sendmsg if
-		 * that's the case.
-		 *
-		 * Same goes for slab pages: skb_can_coalesce() allows
-		 * coalescing neighboring slab objects into a single frag
-		 * which triggers one of hardened usercopy checks.
-		 */
-		if (sendpage_ok(bv.bv_page)) {
-			ret = sock->ops->sendpage(sock, bv.bv_page,
-						  bv.bv_offset, bv.bv_len,
-						  CEPH_MSG_FLAGS);
-		} else {
-			iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bv, 1, bv.bv_len);
-			ret = sock_sendmsg(sock, &msg);
-		}
-		if (ret <= 0) {
-			if (ret == -EAGAIN)
-				ret = 0;
-			return ret;
-		}
-
-		iov_iter_advance(it, ret);
-	}
-
-	return 1;
-}
-
 /*
  * Write as much as possible.  The socket is expected to be corked,
- * so we don't bother with MSG_MORE/MSG_SENDPAGE_NOTLAST here.
+ * so we don't bother with MSG_MORE here.
  *
  * Return:
- *   1 - done, nothing (else) to write
+ *  >0 - done, nothing (else) to write
  *   0 - socket is full, need to wait
  *  <0 - error
  */
 static int ceph_tcp_send(struct ceph_connection *con)
 {
+	struct msghdr msg = {
+		.msg_iter	= con->v2.out_iter,
+		.msg_flags	= CEPH_MSG_FLAGS,
+	};
 	int ret;
 
+	if (WARN_ON(!iov_iter_is_bvec(&con->v2.out_iter)))
+		return -EINVAL;
+
+	if (con->v2.out_iter_sendpage)
+		msg.msg_flags |= MSG_SPLICE_PAGES;
+
 	dout("%s con %p have %zu try_sendpage %d\n", __func__, con,
 	     iov_iter_count(&con->v2.out_iter), con->v2.out_iter_sendpage);
-	if (con->v2.out_iter_sendpage)
-		ret = do_try_sendpage(con->sock, &con->v2.out_iter);
-	else
-		ret = do_sendmsg(con->sock, &con->v2.out_iter);
+
+	ret = sock_sendmsg(con->sock, &msg);
+	if (ret > 0)
+		iov_iter_advance(&con->v2.out_iter, ret);
+	else if (ret == -EAGAIN)
+		ret = 0;
+
 	dout("%s con %p ret %d left %zu\n", __func__, con, ret,
 	     iov_iter_count(&con->v2.out_iter));
 	return ret;



* [PATCH net-next v5 05/16] rds: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage
  2023-06-23 22:54 [PATCH net-next v5 00/16] splice, net: Switch over users of sendpage() and remove it David Howells
                   ` (3 preceding siblings ...)
  2023-06-23 22:55 ` [PATCH net-next v5 04/16] ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage() David Howells
@ 2023-06-23 22:55 ` David Howells
  2023-06-23 22:55 ` [PATCH net-next v5 06/16] dlm: " David Howells
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: David Howells @ 2023-06-23 22:55 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Alexander Duyck, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern,
	Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel,
	Santosh Shilimkar, linux-rdma, rds-devel

When transmitting data, call down into TCP using a single sendmsg with
MSG_SPLICE_PAGES to indicate that content should be spliced.

To make this work, the data is assembled in a bio_vec array and attached to
a BVEC-type iterator.
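
The assembly is just a matter of describing the current scatterlist entry
as a single-element bvec, roughly as in this sketch (invented helper name;
the real code also tracks how much of each entry has been sent):

    /* Splice one scatterlist entry, starting "off" bytes in, into the
     * socket.  (Needs <linux/scatterlist.h>, <linux/net.h>, <linux/bvec.h>.)
     */
    static int send_sg_entry(struct socket *sock, struct scatterlist *sg,
                             unsigned int off, bool more)
    {
        struct bio_vec bvec;
        struct msghdr msg = {
            .msg_flags = MSG_SPLICE_PAGES | MSG_DONTWAIT | MSG_NOSIGNAL,
        };
        unsigned int len = sg->length - off;

        if (more)
            msg.msg_flags |= MSG_MORE;

        bvec_set_page(&bvec, sg_page(sg), len, sg->offset + off);
        iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len);
        return sock_sendmsg(sock, &msg);
    }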

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-rdma@vger.kernel.org
cc: rds-devel@oss.oracle.com
cc: netdev@vger.kernel.org
---

Notes:
    ver #4)
     - Reduce change to only call sendmsg on a page at a time.

 net/rds/tcp_send.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/net/rds/tcp_send.c b/net/rds/tcp_send.c
index 8c4d1d6e9249..7d284ac7e81a 100644
--- a/net/rds/tcp_send.c
+++ b/net/rds/tcp_send.c
@@ -72,9 +72,10 @@ int rds_tcp_xmit(struct rds_connection *conn, struct rds_message *rm,
 {
 	struct rds_conn_path *cp = rm->m_inc.i_conn_path;
 	struct rds_tcp_connection *tc = cp->cp_transport_data;
+	struct msghdr msg = {};
+	struct bio_vec bvec;
 	int done = 0;
 	int ret = 0;
-	int more;
 
 	if (hdr_off == 0) {
 		/*
@@ -111,15 +112,17 @@ int rds_tcp_xmit(struct rds_connection *conn, struct rds_message *rm,
 			goto out;
 	}
 
-	more = rm->data.op_nents > 1 ? (MSG_MORE | MSG_SENDPAGE_NOTLAST) : 0;
 	while (sg < rm->data.op_nents) {
-		int flags = MSG_DONTWAIT | MSG_NOSIGNAL | more;
-
-		ret = tc->t_sock->ops->sendpage(tc->t_sock,
-						sg_page(&rm->data.op_sg[sg]),
-						rm->data.op_sg[sg].offset + off,
-						rm->data.op_sg[sg].length - off,
-						flags);
+		msg.msg_flags = MSG_SPLICE_PAGES | MSG_DONTWAIT | MSG_NOSIGNAL;
+		if (sg + 1 < rm->data.op_nents)
+			msg.msg_flags |= MSG_MORE;
+
+		bvec_set_page(&bvec, sg_page(&rm->data.op_sg[sg]),
+			      rm->data.op_sg[sg].length - off,
+			      rm->data.op_sg[sg].offset + off);
+		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1,
+			      rm->data.op_sg[sg].length - off);
+		ret = sock_sendmsg(tc->t_sock, &msg);
 		rdsdebug("tcp sendpage %p:%u:%u ret %d\n", (void *)sg_page(&rm->data.op_sg[sg]),
 			 rm->data.op_sg[sg].offset + off, rm->data.op_sg[sg].length - off,
 			 ret);
@@ -132,8 +135,6 @@ int rds_tcp_xmit(struct rds_connection *conn, struct rds_message *rm,
 			off = 0;
 			sg++;
 		}
-		if (sg == rm->data.op_nents - 1)
-			more = 0;
 	}
 
 out:



* [PATCH net-next v5 06/16] dlm: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage
  2023-06-23 22:54 [PATCH net-next v5 00/16] splice, net: Switch over users of sendpage() and remove it David Howells
                   ` (4 preceding siblings ...)
  2023-06-23 22:55 ` [PATCH net-next v5 05/16] rds: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage David Howells
@ 2023-06-23 22:55 ` David Howells
  2023-06-23 22:55 ` [PATCH net-next v5 07/16] nvme-tcp: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage David Howells
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: David Howells @ 2023-06-23 22:55 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Alexander Duyck, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern,
	Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel,
	Christine Caulfield, David Teigland, cluster-devel

When transmitting data, call down a layer using a single sendmsg with
MSG_SPLICE_PAGES to indicate that content should be spliced, rather than
using sendpage.  This allows ->sendpage() to be replaced by something that
can handle multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christine Caulfield <ccaulfie@redhat.com>
cc: David Teigland <teigland@redhat.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: cluster-devel@redhat.com
cc: netdev@vger.kernel.org
---
 fs/dlm/lowcomms.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 3d3802c47b8b..5c12d8cdfc16 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -1395,8 +1395,11 @@ int dlm_lowcomms_resend_msg(struct dlm_msg *msg)
 /* Send a message */
 static int send_to_sock(struct connection *con)
 {
-	const int msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL;
 	struct writequeue_entry *e;
+	struct bio_vec bvec;
+	struct msghdr msg = {
+		.msg_flags = MSG_SPLICE_PAGES | MSG_DONTWAIT | MSG_NOSIGNAL,
+	};
 	int len, offset, ret;
 
 	spin_lock_bh(&con->writequeue_lock);
@@ -1412,8 +1415,9 @@ static int send_to_sock(struct connection *con)
 	WARN_ON_ONCE(len == 0 && e->users == 0);
 	spin_unlock_bh(&con->writequeue_lock);
 
-	ret = kernel_sendpage(con->sock, e->page, offset, len,
-			      msg_flags);
+	bvec_set_page(&bvec, e->page, len, offset);
+	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len);
+	ret = sock_sendmsg(con->sock, &msg);
 	trace_dlm_send(con->nodeid, ret);
 	if (ret == -EAGAIN || ret == 0) {
 		lock_sock(con->sock->sk);



* [PATCH net-next v5 07/16] nvme-tcp: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage
  2023-06-23 22:54 [PATCH net-next v5 00/16] splice, net: Switch over users of sendpage() and remove it David Howells
                   ` (5 preceding siblings ...)
  2023-06-23 22:55 ` [PATCH net-next v5 06/16] dlm: " David Howells
@ 2023-06-23 22:55 ` David Howells
  2023-06-23 22:55 ` [PATCH net-next v5 08/16] nvmet-tcp: " David Howells
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: David Howells @ 2023-06-23 22:55 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Alexander Duyck, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern,
	Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel,
	Sagi Grimberg, Willem de Bruijn, Keith Busch, Jens Axboe,
	Christoph Hellwig, Chaitanya Kulkarni, linux-nvme

When transmitting data, call down into TCP using a sendmsg with
MSG_SPLICE_PAGES instead of sendpage.
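
As noted in the ver #4 changelog below, MSG_SPLICE_PAGES is dropped again
for pages that fail sendpage_ok(), so those are copied rather than spliced.
A minimal sketch of that fallback, with an invented helper name:

    /* If the page can't safely be spliced (slab memory, zero refcount),
     * clear MSG_SPLICE_PAGES so the data is copied instead; the
     * sock_sendmsg() call itself is the same either way.
     */
    static int send_page_maybe_spliced(struct socket *sock,
                                       struct page *page, size_t offset,
                                       size_t len, bool last)
    {
        struct bio_vec bvec;
        struct msghdr msg = {
            .msg_flags = MSG_DONTWAIT | MSG_SPLICE_PAGES,
        };

        msg.msg_flags |= last ? MSG_EOR : MSG_MORE;
        if (!sendpage_ok(page))
            msg.msg_flags &= ~MSG_SPLICE_PAGES;

        bvec_set_page(&bvec, page, len, offset);
        iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len);
        return sock_sendmsg(sock, &msg);
    }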

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Willem de Bruijn <willemb@google.com>
cc: Keith Busch <kbusch@kernel.org>
cc: Jens Axboe <axboe@fb.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Chaitanya Kulkarni <kch@nvidia.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-nvme@lists.infradead.org
cc: netdev@vger.kernel.org
---

Notes:
    ver #4)
     - Cancel MSG_SPLICE_PAGES if the page being sent fails sendpage_ok().
    
    ver #3)
     - Split nvme/host from nvme/target changes.
    
    ver #2)
     - Wrap lines at 80.

 drivers/nvme/host/tcp.c | 49 +++++++++++++++++++++++------------------
 1 file changed, 27 insertions(+), 22 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index bf0230442d57..47ae17f16c05 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -997,25 +997,28 @@ static int nvme_tcp_try_send_data(struct nvme_tcp_request *req)
 	u32 h2cdata_left = req->h2cdata_left;
 
 	while (true) {
+		struct bio_vec bvec;
+		struct msghdr msg = {
+			.msg_flags = MSG_DONTWAIT | MSG_SPLICE_PAGES,
+		};
 		struct page *page = nvme_tcp_req_cur_page(req);
 		size_t offset = nvme_tcp_req_cur_offset(req);
 		size_t len = nvme_tcp_req_cur_length(req);
 		bool last = nvme_tcp_pdu_last_send(req, len);
 		int req_data_sent = req->data_sent;
-		int ret, flags = MSG_DONTWAIT;
+		int ret;
 
 		if (last && !queue->data_digest && !nvme_tcp_queue_more(queue))
-			flags |= MSG_EOR;
+			msg.msg_flags |= MSG_EOR;
 		else
-			flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST;
+			msg.msg_flags |= MSG_MORE;
 
-		if (sendpage_ok(page)) {
-			ret = kernel_sendpage(queue->sock, page, offset, len,
-					flags);
-		} else {
-			ret = sock_no_sendpage(queue->sock, page, offset, len,
-					flags);
-		}
+		if (!sendpage_ok(page))
+			msg.msg_flags &= ~MSG_SPLICE_PAGES;
+
+		bvec_set_page(&bvec, page, len, offset);
+		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len);
+		ret = sock_sendmsg(queue->sock, &msg);
 		if (ret <= 0)
 			return ret;
 
@@ -1054,22 +1057,24 @@ static int nvme_tcp_try_send_cmd_pdu(struct nvme_tcp_request *req)
 {
 	struct nvme_tcp_queue *queue = req->queue;
 	struct nvme_tcp_cmd_pdu *pdu = nvme_tcp_req_cmd_pdu(req);
+	struct bio_vec bvec;
+	struct msghdr msg = { .msg_flags = MSG_DONTWAIT | MSG_SPLICE_PAGES, };
 	bool inline_data = nvme_tcp_has_inline_data(req);
 	u8 hdgst = nvme_tcp_hdgst_len(queue);
 	int len = sizeof(*pdu) + hdgst - req->offset;
-	int flags = MSG_DONTWAIT;
 	int ret;
 
 	if (inline_data || nvme_tcp_queue_more(queue))
-		flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST;
+		msg.msg_flags |= MSG_MORE;
 	else
-		flags |= MSG_EOR;
+		msg.msg_flags |= MSG_EOR;
 
 	if (queue->hdr_digest && !req->offset)
 		nvme_tcp_hdgst(queue->snd_hash, pdu, sizeof(*pdu));
 
-	ret = kernel_sendpage(queue->sock, virt_to_page(pdu),
-			offset_in_page(pdu) + req->offset, len,  flags);
+	bvec_set_virt(&bvec, (void *)pdu + req->offset, len);
+	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len);
+	ret = sock_sendmsg(queue->sock, &msg);
 	if (unlikely(ret <= 0))
 		return ret;
 
@@ -1093,6 +1098,8 @@ static int nvme_tcp_try_send_data_pdu(struct nvme_tcp_request *req)
 {
 	struct nvme_tcp_queue *queue = req->queue;
 	struct nvme_tcp_data_pdu *pdu = nvme_tcp_req_data_pdu(req);
+	struct bio_vec bvec;
+	struct msghdr msg = { .msg_flags = MSG_DONTWAIT | MSG_MORE, };
 	u8 hdgst = nvme_tcp_hdgst_len(queue);
 	int len = sizeof(*pdu) - req->offset + hdgst;
 	int ret;
@@ -1101,13 +1108,11 @@ static int nvme_tcp_try_send_data_pdu(struct nvme_tcp_request *req)
 		nvme_tcp_hdgst(queue->snd_hash, pdu, sizeof(*pdu));
 
 	if (!req->h2cdata_left)
-		ret = kernel_sendpage(queue->sock, virt_to_page(pdu),
-				offset_in_page(pdu) + req->offset, len,
-				MSG_DONTWAIT | MSG_MORE | MSG_SENDPAGE_NOTLAST);
-	else
-		ret = sock_no_sendpage(queue->sock, virt_to_page(pdu),
-				offset_in_page(pdu) + req->offset, len,
-				MSG_DONTWAIT | MSG_MORE);
+		msg.msg_flags |= MSG_SPLICE_PAGES;
+
+	bvec_set_virt(&bvec, (void *)pdu + req->offset, len);
+	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len);
+	ret = sock_sendmsg(queue->sock, &msg);
 	if (unlikely(ret <= 0))
 		return ret;
 



* [PATCH net-next v5 08/16] nvmet-tcp: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage
  2023-06-23 22:54 [PATCH net-next v5 00/16] splice, net: Switch over users of sendpage() and remove it David Howells
                   ` (6 preceding siblings ...)
  2023-06-23 22:55 ` [PATCH net-next v5 07/16] nvme-tcp: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage David Howells
@ 2023-06-23 22:55 ` David Howells
  2023-06-23 22:55 ` [PATCH net-next v5 09/16] smc: Drop smc_sendpage() in favour of smc_sendmsg() + MSG_SPLICE_PAGES David Howells
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: David Howells @ 2023-06-23 22:55 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Alexander Duyck, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern,
	Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel,
	Sagi Grimberg, Willem de Bruijn, Keith Busch, Jens Axboe,
	Christoph Hellwig, Chaitanya Kulkarni, linux-nvme

When transmitting data, call down into TCP using a single sendmsg with
MSG_SPLICE_PAGES to indicate that content should be spliced rather than
copied, instead of calling sendpage.
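
The PDU headers aren't in page form to start with, so they are attached
with bvec_set_virt(), which describes a kernel virtual address as a
page/offset pair.  A minimal sketch with an invented helper name:

    /* Queue a header that lives in ordinary kernel memory onto the same
     * bvec path that is used for payload pages.
     */
    static int send_pdu_hdr(struct socket *sock, void *hdr, size_t len,
                            bool more)
    {
        struct bio_vec bvec;
        struct msghdr msg = {
            .msg_flags = MSG_DONTWAIT | MSG_SPLICE_PAGES,
        };

        msg.msg_flags |= more ? MSG_MORE : MSG_EOR;

        bvec_set_virt(&bvec, hdr, len);
        iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len);
        return sock_sendmsg(sock, &msg);
    }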

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Willem de Bruijn <willemb@google.com>
cc: Keith Busch <kbusch@kernel.org>
cc: Jens Axboe <axboe@fb.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Chaitanya Kulkarni <kch@nvidia.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-nvme@lists.infradead.org
cc: netdev@vger.kernel.org
---

Notes:
    ver #3)
     - Split nvme/host from nvme/target changes.
    
    ver #2)
     - Wrap lines at 80.

 drivers/nvme/target/tcp.c | 46 ++++++++++++++++++++++++---------------
 1 file changed, 29 insertions(+), 17 deletions(-)

diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index ed98df72c76b..868aa4de2e4c 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -576,13 +576,17 @@ static void nvmet_tcp_execute_request(struct nvmet_tcp_cmd *cmd)
 
 static int nvmet_try_send_data_pdu(struct nvmet_tcp_cmd *cmd)
 {
+	struct msghdr msg = {
+		.msg_flags = MSG_DONTWAIT | MSG_MORE | MSG_SPLICE_PAGES,
+	};
+	struct bio_vec bvec;
 	u8 hdgst = nvmet_tcp_hdgst_len(cmd->queue);
 	int left = sizeof(*cmd->data_pdu) - cmd->offset + hdgst;
 	int ret;
 
-	ret = kernel_sendpage(cmd->queue->sock, virt_to_page(cmd->data_pdu),
-			offset_in_page(cmd->data_pdu) + cmd->offset,
-			left, MSG_DONTWAIT | MSG_MORE | MSG_SENDPAGE_NOTLAST);
+	bvec_set_virt(&bvec, (void *)cmd->data_pdu + cmd->offset, left);
+	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, left);
+	ret = sock_sendmsg(cmd->queue->sock, &msg);
 	if (ret <= 0)
 		return ret;
 
@@ -603,17 +607,21 @@ static int nvmet_try_send_data(struct nvmet_tcp_cmd *cmd, bool last_in_batch)
 	int ret;
 
 	while (cmd->cur_sg) {
+		struct msghdr msg = {
+			.msg_flags = MSG_DONTWAIT | MSG_SPLICE_PAGES,
+		};
 		struct page *page = sg_page(cmd->cur_sg);
+		struct bio_vec bvec;
 		u32 left = cmd->cur_sg->length - cmd->offset;
-		int flags = MSG_DONTWAIT;
 
 		if ((!last_in_batch && cmd->queue->send_list_len) ||
 		    cmd->wbytes_done + left < cmd->req.transfer_len ||
 		    queue->data_digest || !queue->nvme_sq.sqhd_disabled)
-			flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST;
+			msg.msg_flags |= MSG_MORE;
 
-		ret = kernel_sendpage(cmd->queue->sock, page, cmd->offset,
-					left, flags);
+		bvec_set_page(&bvec, page, left, cmd->offset);
+		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, left);
+		ret = sock_sendmsg(cmd->queue->sock, &msg);
 		if (ret <= 0)
 			return ret;
 
@@ -649,18 +657,20 @@ static int nvmet_try_send_data(struct nvmet_tcp_cmd *cmd, bool last_in_batch)
 static int nvmet_try_send_response(struct nvmet_tcp_cmd *cmd,
 		bool last_in_batch)
 {
+	struct msghdr msg = { .msg_flags = MSG_DONTWAIT | MSG_SPLICE_PAGES, };
+	struct bio_vec bvec;
 	u8 hdgst = nvmet_tcp_hdgst_len(cmd->queue);
 	int left = sizeof(*cmd->rsp_pdu) - cmd->offset + hdgst;
-	int flags = MSG_DONTWAIT;
 	int ret;
 
 	if (!last_in_batch && cmd->queue->send_list_len)
-		flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST;
+		msg.msg_flags |= MSG_MORE;
 	else
-		flags |= MSG_EOR;
+		msg.msg_flags |= MSG_EOR;
 
-	ret = kernel_sendpage(cmd->queue->sock, virt_to_page(cmd->rsp_pdu),
-		offset_in_page(cmd->rsp_pdu) + cmd->offset, left, flags);
+	bvec_set_virt(&bvec, (void *)cmd->rsp_pdu + cmd->offset, left);
+	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, left);
+	ret = sock_sendmsg(cmd->queue->sock, &msg);
 	if (ret <= 0)
 		return ret;
 	cmd->offset += ret;
@@ -677,18 +687,20 @@ static int nvmet_try_send_response(struct nvmet_tcp_cmd *cmd,
 
 static int nvmet_try_send_r2t(struct nvmet_tcp_cmd *cmd, bool last_in_batch)
 {
+	struct msghdr msg = { .msg_flags = MSG_DONTWAIT | MSG_SPLICE_PAGES, };
+	struct bio_vec bvec;
 	u8 hdgst = nvmet_tcp_hdgst_len(cmd->queue);
 	int left = sizeof(*cmd->r2t_pdu) - cmd->offset + hdgst;
-	int flags = MSG_DONTWAIT;
 	int ret;
 
 	if (!last_in_batch && cmd->queue->send_list_len)
-		flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST;
+		msg.msg_flags |= MSG_MORE;
 	else
-		flags |= MSG_EOR;
+		msg.msg_flags |= MSG_EOR;
 
-	ret = kernel_sendpage(cmd->queue->sock, virt_to_page(cmd->r2t_pdu),
-		offset_in_page(cmd->r2t_pdu) + cmd->offset, left, flags);
+	bvec_set_virt(&bvec, (void *)cmd->r2t_pdu + cmd->offset, left);
+	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, left);
+	ret = sock_sendmsg(cmd->queue->sock, &msg);
 	if (ret <= 0)
 		return ret;
 	cmd->offset += ret;



* [PATCH net-next v5 09/16] smc: Drop smc_sendpage() in favour of smc_sendmsg() + MSG_SPLICE_PAGES
  2023-06-23 22:54 [PATCH net-next v5 00/16] splice, net: Switch over users of sendpage() and remove it David Howells
                   ` (7 preceding siblings ...)
  2023-06-23 22:55 ` [PATCH net-next v5 08/16] nvmet-tcp: " David Howells
@ 2023-06-23 22:55 ` David Howells
  2023-06-23 22:55 ` [PATCH net-next v5 10/16] drbd: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage() David Howells
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: David Howells @ 2023-06-23 22:55 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Alexander Duyck, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern,
	Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel,
	Karsten Graul, Wenjia Zhang, Jan Karcher, D. Wythe, Tony Lu,
	Wen Gu, linux-s390

Drop the smc_sendpage() code as smc_sendmsg() just passes the call down to
the underlying TCP socket and smc_tx_sendpage() is just a wrapper around
its sendmsg implementation.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Karsten Graul <kgraul@linux.ibm.com>
cc: Wenjia Zhang <wenjia@linux.ibm.com>
cc: Jan Karcher <jaka@linux.ibm.com>
cc: "D. Wythe" <alibuda@linux.alibaba.com>
cc: Tony Lu <tonylu@linux.alibaba.com>
cc: Wen Gu <guwen@linux.alibaba.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-s390@vger.kernel.org
cc: netdev@vger.kernel.org
---
 net/smc/af_smc.c    | 29 -----------------------------
 net/smc/smc_stats.c |  2 +-
 net/smc/smc_stats.h |  1 -
 net/smc/smc_tx.c    | 19 -------------------
 net/smc/smc_tx.h    |  2 --
 5 files changed, 1 insertion(+), 52 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 538e9c6ec8c9..a7f887d91d89 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -3133,34 +3133,6 @@ static int smc_ioctl(struct socket *sock, unsigned int cmd,
 	return put_user(answ, (int __user *)arg);
 }
 
-static ssize_t smc_sendpage(struct socket *sock, struct page *page,
-			    int offset, size_t size, int flags)
-{
-	struct sock *sk = sock->sk;
-	struct smc_sock *smc;
-	int rc = -EPIPE;
-
-	smc = smc_sk(sk);
-	lock_sock(sk);
-	if (sk->sk_state != SMC_ACTIVE) {
-		release_sock(sk);
-		goto out;
-	}
-	release_sock(sk);
-	if (smc->use_fallback) {
-		rc = kernel_sendpage(smc->clcsock, page, offset,
-				     size, flags);
-	} else {
-		lock_sock(sk);
-		rc = smc_tx_sendpage(smc, page, offset, size, flags);
-		release_sock(sk);
-		SMC_STAT_INC(smc, sendpage_cnt);
-	}
-
-out:
-	return rc;
-}
-
 /* Map the affected portions of the rmbe into an spd, note the number of bytes
  * to splice in conn->splice_pending, and press 'go'. Delays consumer cursor
  * updates till whenever a respective page has been fully processed.
@@ -3232,7 +3204,6 @@ static const struct proto_ops smc_sock_ops = {
 	.sendmsg	= smc_sendmsg,
 	.recvmsg	= smc_recvmsg,
 	.mmap		= sock_no_mmap,
-	.sendpage	= smc_sendpage,
 	.splice_read	= smc_splice_read,
 };
 
diff --git a/net/smc/smc_stats.c b/net/smc/smc_stats.c
index e80e34f7ac15..ca14c0f3a07d 100644
--- a/net/smc/smc_stats.c
+++ b/net/smc/smc_stats.c
@@ -227,7 +227,7 @@ static int smc_nl_fill_stats_tech_data(struct sk_buff *skb,
 			      SMC_NLA_STATS_PAD))
 		goto errattr;
 	if (nla_put_u64_64bit(skb, SMC_NLA_STATS_T_SENDPAGE_CNT,
-			      smc_tech->sendpage_cnt,
+			      0,
 			      SMC_NLA_STATS_PAD))
 		goto errattr;
 	if (nla_put_u64_64bit(skb, SMC_NLA_STATS_T_CORK_CNT,
diff --git a/net/smc/smc_stats.h b/net/smc/smc_stats.h
index 84b7ecd8c05c..b60fe1eb37ab 100644
--- a/net/smc/smc_stats.h
+++ b/net/smc/smc_stats.h
@@ -71,7 +71,6 @@ struct smc_stats_tech {
 	u64			clnt_v2_succ_cnt;
 	u64			srv_v1_succ_cnt;
 	u64			srv_v2_succ_cnt;
-	u64			sendpage_cnt;
 	u64			urg_data_cnt;
 	u64			splice_cnt;
 	u64			cork_cnt;
diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
index 9b9e0a190734..3b0ff3b589c7 100644
--- a/net/smc/smc_tx.c
+++ b/net/smc/smc_tx.c
@@ -297,25 +297,6 @@ int smc_tx_sendmsg(struct smc_sock *smc, struct msghdr *msg, size_t len)
 	return rc;
 }
 
-int smc_tx_sendpage(struct smc_sock *smc, struct page *page, int offset,
-		    size_t size, int flags)
-{
-	struct msghdr msg = {.msg_flags = flags};
-	char *kaddr = kmap(page);
-	struct kvec iov;
-	int rc;
-
-	if (flags & MSG_SENDPAGE_NOTLAST)
-		msg.msg_flags |= MSG_MORE;
-
-	iov.iov_base = kaddr + offset;
-	iov.iov_len = size;
-	iov_iter_kvec(&msg.msg_iter, ITER_SOURCE, &iov, 1, size);
-	rc = smc_tx_sendmsg(smc, &msg, size);
-	kunmap(page);
-	return rc;
-}
-
 /***************************** sndbuf consumer *******************************/
 
 /* sndbuf consumer: actual data transfer of one target chunk with ISM write */
diff --git a/net/smc/smc_tx.h b/net/smc/smc_tx.h
index 34b578498b1f..a59f370b8b43 100644
--- a/net/smc/smc_tx.h
+++ b/net/smc/smc_tx.h
@@ -31,8 +31,6 @@ void smc_tx_pending(struct smc_connection *conn);
 void smc_tx_work(struct work_struct *work);
 void smc_tx_init(struct smc_sock *smc);
 int smc_tx_sendmsg(struct smc_sock *smc, struct msghdr *msg, size_t len);
-int smc_tx_sendpage(struct smc_sock *smc, struct page *page, int offset,
-		    size_t size, int flags);
 int smc_tx_sndbuf_nonempty(struct smc_connection *conn);
 void smc_tx_sndbuf_nonfull(struct smc_sock *smc);
 void smc_tx_consumer_update(struct smc_connection *conn, bool force);



* [PATCH net-next v5 10/16] drbd: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage()
  2023-06-23 22:54 [PATCH net-next v5 00/16] splice, net: Switch over users of sendpage() and remove it David Howells
                   ` (8 preceding siblings ...)
  2023-06-23 22:55 ` [PATCH net-next v5 09/16] smc: Drop smc_sendpage() in favour of smc_sendmsg() + MSG_SPLICE_PAGES David Howells
@ 2023-06-23 22:55 ` David Howells
  2023-06-23 22:55 ` [PATCH net-next v5 11/16] scsi: iscsi_tcp: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage David Howells
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: David Howells @ 2023-06-23 22:55 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Alexander Duyck, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern,
	Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel,
	Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder,
	drbd-dev, linux-block

Use sendmsg() conditionally with MSG_SPLICE_PAGES in _drbd_send_page()
rather than calling sendpage() or _drbd_no_send_page().

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Philipp Reisner <philipp.reisner@linbit.com>
cc: Lars Ellenberg <lars.ellenberg@linbit.com>
cc: "Christoph Böhmwalder" <christoph.boehmwalder@linbit.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: drbd-dev@lists.linbit.com
cc: linux-block@vger.kernel.org
cc: netdev@vger.kernel.org
---

Notes:
    ver #4)
     - Don't look at msg.msg_iter after calling sendmsg.  There's no guarantee
       it has changed.
    
    ver #2)
     - Wrap lines at 80.

 drivers/block/drbd/drbd_main.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 83987e7a5ef2..ea82d6733313 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -1540,6 +1540,8 @@ static int _drbd_send_page(struct drbd_peer_device *peer_device, struct page *pa
 		    int offset, size_t size, unsigned msg_flags)
 {
 	struct socket *socket = peer_device->connection->data.socket;
+	struct msghdr msg = { .msg_flags = msg_flags, };
+	struct bio_vec bvec;
 	int len = size;
 	int err = -EIO;
 
@@ -1549,15 +1551,17 @@ static int _drbd_send_page(struct drbd_peer_device *peer_device, struct page *pa
 	 * put_page(); and would cause either a VM_BUG directly, or
 	 * __page_cache_release a page that would actually still be referenced
 	 * by someone, leading to some obscure delayed Oops somewhere else. */
-	if (drbd_disable_sendpage || !sendpage_ok(page))
-		return _drbd_no_send_page(peer_device, page, offset, size, msg_flags);
+	if (!drbd_disable_sendpage && sendpage_ok(page))
+		msg.msg_flags |= MSG_NOSIGNAL | MSG_SPLICE_PAGES;
 
-	msg_flags |= MSG_NOSIGNAL;
 	drbd_update_congested(peer_device->connection);
 	do {
 		int sent;
 
-		sent = socket->ops->sendpage(socket, page, offset, len, msg_flags);
+		bvec_set_page(&bvec, page, offset, len);
+		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len);
+
+		sent = sock_sendmsg(socket, &msg);
 		if (sent <= 0) {
 			if (sent == -EAGAIN) {
 				if (we_should_drop_the_connection(peer_device->connection, socket))


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH net-next v5 11/16] scsi: iscsi_tcp: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage
  2023-06-23 22:54 [PATCH net-next v5 00/16] splice, net: Switch over users of sendpage() and remove it David Howells
                   ` (9 preceding siblings ...)
  2023-06-23 22:55 ` [PATCH net-next v5 10/16] drbd: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage() David Howells
@ 2023-06-23 22:55 ` David Howells
  2023-06-27 19:02   ` Chris Leech
  2023-06-23 22:55 ` [PATCH net-next v5 12/16] scsi: target: iscsi: " David Howells
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 29+ messages in thread
From: David Howells @ 2023-06-23 22:55 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Alexander Duyck, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern,
	Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel,
	Mike Christie, Lee Duncan, Chris Leech, James E.J. Bottomley,
	Martin K. Petersen, Al Viro, open-iscsi, linux-scsi,
	target-devel

Use sendmsg() with MSG_SPLICE_PAGES rather than sendpage.  This allows
multiple pages and multipage folios to be passed through.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
cc: Lee Duncan <lduncan@suse.com>
cc: Chris Leech <cleech@redhat.com>
cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
cc: "Martin K. Petersen" <martin.petersen@oracle.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: open-iscsi@googlegroups.com
cc: linux-scsi@vger.kernel.org
cc: target-devel@vger.kernel.org
cc: netdev@vger.kernel.org
---

Notes:
    ver #5)
     - Split iscsi changes into client and target patches

 drivers/scsi/iscsi_tcp.c | 26 ++++++++++----------------
 drivers/scsi/iscsi_tcp.h |  2 --
 2 files changed, 10 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 9637d4bc2bc9..9ab8555180a3 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -301,35 +301,32 @@ static int iscsi_sw_tcp_xmit_segment(struct iscsi_tcp_conn *tcp_conn,
 
 	while (!iscsi_tcp_segment_done(tcp_conn, segment, 0, r)) {
 		struct scatterlist *sg;
+		struct msghdr msg = {};
+		struct bio_vec bv;
 		unsigned int offset, copy;
-		int flags = 0;
 
 		r = 0;
 		offset = segment->copied;
 		copy = segment->size - offset;
 
 		if (segment->total_copied + segment->size < segment->total_size)
-			flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST;
+			msg.msg_flags |= MSG_MORE;
 
 		if (tcp_sw_conn->queue_recv)
-			flags |= MSG_DONTWAIT;
+			msg.msg_flags |= MSG_DONTWAIT;
 
-		/* Use sendpage if we can; else fall back to sendmsg */
 		if (!segment->data) {
+			if (!tcp_conn->iscsi_conn->datadgst_en)
+				msg.msg_flags |= MSG_SPLICE_PAGES;
 			sg = segment->sg;
 			offset += segment->sg_offset + sg->offset;
-			r = tcp_sw_conn->sendpage(sk, sg_page(sg), offset,
-						  copy, flags);
+			bvec_set_page(&bv, sg_page(sg), copy, offset);
 		} else {
-			struct msghdr msg = { .msg_flags = flags };
-			struct kvec iov = {
-				.iov_base = segment->data + offset,
-				.iov_len = copy
-			};
-
-			r = kernel_sendmsg(sk, &msg, &iov, 1, copy);
+			bvec_set_virt(&bv, segment->data + offset, copy);
 		}
+		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bv, 1, copy);
 
+		r = sock_sendmsg(sk, &msg);
 		if (r < 0) {
 			iscsi_tcp_segment_unmap(segment);
 			return r;
@@ -746,7 +743,6 @@ iscsi_sw_tcp_conn_bind(struct iscsi_cls_session *cls_session,
 	sock_no_linger(sk);
 
 	iscsi_sw_tcp_conn_set_callbacks(conn);
-	tcp_sw_conn->sendpage = tcp_sw_conn->sock->ops->sendpage;
 	/*
 	 * set receive state machine into initial state
 	 */
@@ -777,8 +773,6 @@ static int iscsi_sw_tcp_conn_set_param(struct iscsi_cls_conn *cls_conn,
 			return -ENOTCONN;
 		}
 		iscsi_set_param(cls_conn, param, buf, buflen);
-		tcp_sw_conn->sendpage = conn->datadgst_en ?
-			sock_no_sendpage : tcp_sw_conn->sock->ops->sendpage;
 		mutex_unlock(&tcp_sw_conn->sock_lock);
 		break;
 	case ISCSI_PARAM_MAX_R2T:
diff --git a/drivers/scsi/iscsi_tcp.h b/drivers/scsi/iscsi_tcp.h
index 68e14a344904..89a6fc552f0b 100644
--- a/drivers/scsi/iscsi_tcp.h
+++ b/drivers/scsi/iscsi_tcp.h
@@ -47,8 +47,6 @@ struct iscsi_sw_tcp_conn {
 	/* MIB custom statistics */
 	uint32_t		sendpage_failures_cnt;
 	uint32_t		discontiguous_hdr_cnt;
-
-	ssize_t (*sendpage)(struct socket *, struct page *, int, size_t, int);
 };
 
 struct iscsi_sw_tcp_host {


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH net-next v5 12/16] scsi: target: iscsi: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage
  2023-06-23 22:54 [PATCH net-next v5 00/16] splice, net: Switch over users of sendpage() and remove it David Howells
                   ` (10 preceding siblings ...)
  2023-06-23 22:55 ` [PATCH net-next v5 11/16] scsi: iscsi_tcp: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage David Howells
@ 2023-06-23 22:55 ` David Howells
  2023-06-23 22:55 ` [PATCH net-next v5 13/16] ocfs2: Fix use of slab data with sendpage David Howells
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: David Howells @ 2023-06-23 22:55 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Alexander Duyck, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern,
	Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel,
	Mike Christie, Maurizio Lombardi, James E.J. Bottomley,
	Martin K. Petersen, Al Viro, open-iscsi, linux-scsi,
	target-devel

Use sendmsg() with MSG_SPLICE_PAGES rather than sendpage.  This allows
multiple pages and multipage folios to be passed through.

TODO: iscsit_fe_sendpage_sg() should perhaps set up a bio_vec array covering
the entire set of pages it's going to transfer, plus two more entries for the
header and trailer (with page fragments to hold the header and trailer), and
then call sendmsg() once for the entire message.
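
A rough sketch of that idea (the helper name, the GFP flags and the
assumption that the header and trailer already live in non-slab,
page-backed memory are all illustrative, not taken from the patch):

	#include <linux/net.h>
	#include <linux/bvec.h>
	#include <linux/uio.h>
	#include <linux/scatterlist.h>
	#include <linux/slab.h>

	static int example_send_whole_pdu(struct socket *sock,
					  void *hdr, size_t hdr_len,
					  struct scatterlist *sg, int nr_sg,
					  void *trailer, size_t trl_len)
	{
		struct msghdr msg = { .msg_flags = MSG_SPLICE_PAGES, };
		size_t total = hdr_len + trl_len;
		struct bio_vec *bv;
		int i = 0, ret;

		bv = kcalloc(nr_sg + 2, sizeof(*bv), GFP_KERNEL);
		if (!bv)
			return -ENOMEM;

		/* One bio_vec array: header, data pages, then trailer. */
		bvec_set_virt(&bv[i++], hdr, hdr_len);
		for (; sg && i <= nr_sg; sg = sg_next(sg)) {
			bvec_set_page(&bv[i++], sg_page(sg), sg->length,
				      sg->offset);
			total += sg->length;
		}
		bvec_set_virt(&bv[i++], trailer, trl_len);

		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, bv, i, total);
		ret = sock_sendmsg(sock, &msg);	/* one call for the lot */
		kfree(bv);
		return ret;
	}

A real version would still have to loop on partial sends, as the existing
code does.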

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Mike Christie <michael.christie@oracle.com>
cc: Maurizio Lombardi <mlombard@redhat.com>
cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
cc: "Martin K. Petersen" <martin.petersen@oracle.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: open-iscsi@googlegroups.com
cc: linux-scsi@vger.kernel.org
cc: target-devel@vger.kernel.org
cc: netdev@vger.kernel.org
---

Notes:
    ver #5)
     - Split iscsi changes into client and target patches
    
    ver #2)
     - Wrap lines at 80.

 drivers/target/iscsi/iscsi_target_util.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
index b14835fcb033..6231fa4ef5c6 100644
--- a/drivers/target/iscsi/iscsi_target_util.c
+++ b/drivers/target/iscsi/iscsi_target_util.c
@@ -1129,6 +1129,8 @@ int iscsit_fe_sendpage_sg(
 	struct iscsit_conn *conn)
 {
 	struct scatterlist *sg = cmd->first_data_sg;
+	struct bio_vec bvec;
+	struct msghdr msghdr = { .msg_flags = MSG_SPLICE_PAGES,	};
 	struct kvec iov;
 	u32 tx_hdr_size, data_len;
 	u32 offset = cmd->first_data_sg_off;
@@ -1172,17 +1174,18 @@ int iscsit_fe_sendpage_sg(
 		u32 space = (sg->length - offset);
 		u32 sub_len = min_t(u32, data_len, space);
 send_pg:
-		tx_sent = conn->sock->ops->sendpage(conn->sock,
-					sg_page(sg), sg->offset + offset, sub_len, 0);
+		bvec_set_page(&bvec, sg_page(sg), sub_len, sg->offset + offset);
+		iov_iter_bvec(&msghdr.msg_iter, ITER_SOURCE, &bvec, 1, sub_len);
+
+		tx_sent = conn->sock->ops->sendmsg(conn->sock, &msghdr,
+						   sub_len);
 		if (tx_sent != sub_len) {
 			if (tx_sent == -EAGAIN) {
-				pr_err("tcp_sendpage() returned"
-						" -EAGAIN\n");
+				pr_err("sendmsg/splice returned -EAGAIN\n");
 				goto send_pg;
 			}
 
-			pr_err("tcp_sendpage() failure: %d\n",
-					tx_sent);
+			pr_err("sendmsg/splice failure: %d\n", tx_sent);
 			return -1;
 		}
 


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH net-next v5 13/16] ocfs2: Fix use of slab data with sendpage
  2023-06-23 22:54 [PATCH net-next v5 00/16] splice, net: Switch over users of sendpage() and remove it David Howells
                   ` (11 preceding siblings ...)
  2023-06-23 22:55 ` [PATCH net-next v5 12/16] scsi: target: iscsi: " David Howells
@ 2023-06-23 22:55 ` David Howells
  2023-06-23 22:55 ` [PATCH net-next v5 14/16] ocfs2: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage() David Howells
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: David Howells @ 2023-06-23 22:55 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Alexander Duyck, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern,
	Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel, Mark Fasheh,
	Kurt Hackel, Joel Becker, Joseph Qi, ocfs2-devel

ocfs2 uses kzalloc() to allocate buffers for o2net_hand, o2net_keep_req and
o2net_keep_resp and then passes these to sendpage.  This isn't really
allowed as the lifetime of slab objects is not controlled by page ref -
though in this case it will probably work.  sendmsg() with MSG_SPLICE_PAGES
will, however, print a warning and give an error.
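
For reference, the check that trips here is sendpage_ok() from
include/linux/net.h, which is roughly:

	static inline bool sendpage_ok(struct page *page)
	{
		return !PageSlab(page) && page_count(page) >= 1;
	}

kzalloc() hands back slab memory, so PageSlab() is true for those buffers
and skb_splice_from_iter() warns and refuses to splice them; a page taken
from folio_alloc() has its lifetime governed by its refcount and passes
the check.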

Fix it to use folio_alloc() instead, allocating a single folio that holds
the handshake message, the keepalive request and the keepalive reply.

Fixes: 98211489d414 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Mark Fasheh <mark@fasheh.com>
cc: Kurt Hackel <kurt.hackel@oracle.com>
cc: Joel Becker <jlbec@evilplan.org>
cc: Joseph Qi <joseph.qi@linux.alibaba.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: ocfs2-devel@oss.oracle.com
cc: netdev@vger.kernel.org
---
 fs/ocfs2/cluster/tcp.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c
index aecbd712a00c..929a1133bc18 100644
--- a/fs/ocfs2/cluster/tcp.c
+++ b/fs/ocfs2/cluster/tcp.c
@@ -2087,18 +2087,24 @@ void o2net_stop_listening(struct o2nm_node *node)
 
 int o2net_init(void)
 {
+	struct folio *folio;
+	void *p;
 	unsigned long i;
 
 	o2quo_init();
-
 	o2net_debugfs_init();
 
-	o2net_hand = kzalloc(sizeof(struct o2net_handshake), GFP_KERNEL);
-	o2net_keep_req = kzalloc(sizeof(struct o2net_msg), GFP_KERNEL);
-	o2net_keep_resp = kzalloc(sizeof(struct o2net_msg), GFP_KERNEL);
-	if (!o2net_hand || !o2net_keep_req || !o2net_keep_resp)
+	folio = folio_alloc(GFP_KERNEL | __GFP_ZERO, 0);
+	if (!folio)
 		goto out;
 
+	p = folio_address(folio);
+	o2net_hand = p;
+	p += sizeof(struct o2net_handshake);
+	o2net_keep_req = p;
+	p += sizeof(struct o2net_msg);
+	o2net_keep_resp = p;
+
 	o2net_hand->protocol_version = cpu_to_be64(O2NET_PROTOCOL_VERSION);
 	o2net_hand->connector_id = cpu_to_be64(1);
 
@@ -2124,9 +2130,6 @@ int o2net_init(void)
 	return 0;
 
 out:
-	kfree(o2net_hand);
-	kfree(o2net_keep_req);
-	kfree(o2net_keep_resp);
 	o2net_debugfs_exit();
 	o2quo_exit();
 	return -ENOMEM;
@@ -2135,8 +2138,6 @@ int o2net_init(void)
 void o2net_exit(void)
 {
 	o2quo_exit();
-	kfree(o2net_hand);
-	kfree(o2net_keep_req);
-	kfree(o2net_keep_resp);
 	o2net_debugfs_exit();
+	folio_put(virt_to_folio(o2net_hand));
 }


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH net-next v5 14/16] ocfs2: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage()
  2023-06-23 22:54 [PATCH net-next v5 00/16] splice, net: Switch over users of sendpage() and remove it David Howells
                   ` (12 preceding siblings ...)
  2023-06-23 22:55 ` [PATCH net-next v5 13/16] ocfs2: Fix use of slab data with sendpage David Howells
@ 2023-06-23 22:55 ` David Howells
  2023-06-23 22:55 ` [PATCH net-next v5 15/16] sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) David Howells
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: David Howells @ 2023-06-23 22:55 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Alexander Duyck, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern,
	Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel, Mark Fasheh,
	Joel Becker, Joseph Qi, ocfs2-devel

Switch ocfs2 from using sendpage() to using sendmsg() + MSG_SPLICE_PAGES so
that sendpage can be phased out.

Signed-off-by: David Howells <dhowells@redhat.com>

cc: Mark Fasheh <mark@fasheh.com>
cc: Joel Becker <jlbec@evilplan.org>
cc: Joseph Qi <joseph.qi@linux.alibaba.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: ocfs2-devel@oss.oracle.com
cc: netdev@vger.kernel.org
---

Notes:
    ver #4)
     - Use folio_alloc() for o2net_hand, o2net_keep_req and o2net_keep_resp.
    
    ver #2)
     - Wrap lines at 80.

 fs/ocfs2/cluster/tcp.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c
index 929a1133bc18..960080753d3b 100644
--- a/fs/ocfs2/cluster/tcp.c
+++ b/fs/ocfs2/cluster/tcp.c
@@ -930,19 +930,22 @@ static int o2net_send_tcp_msg(struct socket *sock, struct kvec *vec,
 }
 
 static void o2net_sendpage(struct o2net_sock_container *sc,
-			   void *kmalloced_virt,
-			   size_t size)
+			   void *virt, size_t size)
 {
 	struct o2net_node *nn = o2net_nn_from_num(sc->sc_node->nd_num);
+	struct msghdr msg = {};
+	struct bio_vec bv;
 	ssize_t ret;
 
+	bvec_set_virt(&bv, virt, size);
+	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bv, 1, size);
+
 	while (1) {
+		msg.msg_flags = MSG_DONTWAIT | MSG_SPLICE_PAGES;
 		mutex_lock(&sc->sc_send_lock);
-		ret = sc->sc_sock->ops->sendpage(sc->sc_sock,
-						 virt_to_page(kmalloced_virt),
-						 offset_in_page(kmalloced_virt),
-						 size, MSG_DONTWAIT);
+		ret = sock_sendmsg(sc->sc_sock, &msg);
 		mutex_unlock(&sc->sc_send_lock);
+
 		if (ret == size)
 			break;
 		if (ret == (ssize_t)-EAGAIN) {


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH net-next v5 15/16] sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES)
  2023-06-23 22:54 [PATCH net-next v5 00/16] splice, net: Switch over users of sendpage() and remove it David Howells
                   ` (13 preceding siblings ...)
  2023-06-23 22:55 ` [PATCH net-next v5 14/16] ocfs2: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage() David Howells
@ 2023-06-23 22:55 ` David Howells
  2023-06-23 22:55 ` [PATCH net-next v5 16/16] net: Kill MSG_SENDPAGE_NOTLAST David Howells
  2023-06-24 23:00 ` [PATCH net-next v5 00/16] splice, net: Switch over users of sendpage() and remove it patchwork-bot+netdevbpf
  16 siblings, 0 replies; 29+ messages in thread
From: David Howells @ 2023-06-23 22:55 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Alexander Duyck, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern,
	Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel,
	Marc Kleine-Budde, bpf, dccp, linux-afs, linux-arm-msm,
	linux-can, linux-crypto, linux-doc, linux-fsdevel, linux-hams,
	linux-nfs, linux-rdma, linux-sctp, linux-wpan, linux-x25, mptcp,
	rds-devel, tipc-discussion, virtualization

Remove ->sendpage() and ->sendpage_locked().  sendmsg() with
MSG_SPLICE_PAGES should be used instead.  This allows multiple pages and
multipage folios to be passed through.
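
For anything still using kernel_sendpage() or a ->sendpage() op out of
tree, the replacement has the same shape as the wrappers this patch
deletes - a sketch, with an illustrative helper name:

	#include <linux/net.h>
	#include <linux/bvec.h>
	#include <linux/uio.h>

	static int example_sock_send_page(struct socket *sock,
					  struct page *page,
					  unsigned int offset, size_t len,
					  bool more_to_come)
	{
		struct msghdr msg = { .msg_flags = MSG_SPLICE_PAGES, };
		struct bio_vec bvec;

		/* MSG_MORE now carries what MSG_SENDPAGE_NOTLAST used to. */
		if (more_to_come)
			msg.msg_flags |= MSG_MORE;

		bvec_set_page(&bvec, page, len, offset);
		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len);
		return sock_sendmsg(sock, &msg);
	}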

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for net/can
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: bpf@vger.kernel.org
cc: dccp@vger.kernel.org
cc: linux-afs@lists.infradead.org
cc: linux-arm-msm@vger.kernel.org
cc: linux-can@vger.kernel.org
cc: linux-crypto@vger.kernel.org
cc: linux-doc@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-hams@vger.kernel.org
cc: linux-nfs@vger.kernel.org
cc: linux-rdma@vger.kernel.org
cc: linux-sctp@vger.kernel.org
cc: linux-wpan@vger.kernel.org
cc: linux-x25@vger.kernel.org
cc: mptcp@lists.linux.dev
cc: netdev@vger.kernel.org
cc: rds-devel@oss.oracle.com
cc: tipc-discussion@lists.sourceforge.net
cc: virtualization@lists.linux-foundation.org
---

Notes:
    ver #2)
     - Removed duplicate word in comment.

 Documentation/bpf/map_sockmap.rst             | 10 ++--
 Documentation/filesystems/locking.rst         |  2 -
 Documentation/filesystems/vfs.rst             |  1 -
 Documentation/networking/scaling.rst          |  4 +-
 crypto/af_alg.c                               | 28 -----------
 crypto/algif_aead.c                           | 22 ++-------
 crypto/algif_rng.c                            |  2 -
 crypto/algif_skcipher.c                       | 14 ------
 .../chelsio/inline_crypto/chtls/chtls.h       |  2 -
 .../chelsio/inline_crypto/chtls/chtls_io.c    | 14 ------
 .../chelsio/inline_crypto/chtls/chtls_main.c  |  1 -
 fs/nfsd/vfs.c                                 |  2 +-
 include/crypto/if_alg.h                       |  2 -
 include/linux/net.h                           |  8 ----
 include/net/inet_common.h                     |  2 -
 include/net/sock.h                            |  6 ---
 include/net/tcp.h                             |  4 --
 net/appletalk/ddp.c                           |  1 -
 net/atm/pvc.c                                 |  1 -
 net/atm/svc.c                                 |  1 -
 net/ax25/af_ax25.c                            |  1 -
 net/caif/caif_socket.c                        |  2 -
 net/can/bcm.c                                 |  1 -
 net/can/isotp.c                               |  1 -
 net/can/j1939/socket.c                        |  1 -
 net/can/raw.c                                 |  1 -
 net/core/sock.c                               | 35 +-------------
 net/dccp/ipv4.c                               |  1 -
 net/dccp/ipv6.c                               |  1 -
 net/ieee802154/socket.c                       |  2 -
 net/ipv4/af_inet.c                            | 21 --------
 net/ipv4/tcp.c                                | 43 ++---------------
 net/ipv4/tcp_bpf.c                            | 23 +--------
 net/ipv4/tcp_ipv4.c                           |  1 -
 net/ipv4/udp.c                                | 15 ------
 net/ipv4/udp_impl.h                           |  2 -
 net/ipv4/udplite.c                            |  1 -
 net/ipv6/af_inet6.c                           |  3 --
 net/ipv6/raw.c                                |  1 -
 net/ipv6/tcp_ipv6.c                           |  1 -
 net/kcm/kcmsock.c                             | 20 --------
 net/key/af_key.c                              |  1 -
 net/l2tp/l2tp_ip.c                            |  1 -
 net/l2tp/l2tp_ip6.c                           |  1 -
 net/llc/af_llc.c                              |  1 -
 net/mctp/af_mctp.c                            |  1 -
 net/mptcp/protocol.c                          |  2 -
 net/netlink/af_netlink.c                      |  1 -
 net/netrom/af_netrom.c                        |  1 -
 net/packet/af_packet.c                        |  2 -
 net/phonet/socket.c                           |  2 -
 net/qrtr/af_qrtr.c                            |  1 -
 net/rds/af_rds.c                              |  1 -
 net/rose/af_rose.c                            |  1 -
 net/rxrpc/af_rxrpc.c                          |  1 -
 net/sctp/protocol.c                           |  1 -
 net/socket.c                                  | 48 -------------------
 net/tipc/socket.c                             |  3 --
 net/tls/tls.h                                 |  6 ---
 net/tls/tls_device.c                          | 17 -------
 net/tls/tls_main.c                            |  7 ---
 net/tls/tls_sw.c                              | 35 --------------
 net/unix/af_unix.c                            | 19 --------
 net/vmw_vsock/af_vsock.c                      |  3 --
 net/x25/af_x25.c                              |  1 -
 net/xdp/xsk.c                                 |  1 -
 66 files changed, 20 insertions(+), 442 deletions(-)

diff --git a/Documentation/bpf/map_sockmap.rst b/Documentation/bpf/map_sockmap.rst
index cc92047c6630..2d630686a00b 100644
--- a/Documentation/bpf/map_sockmap.rst
+++ b/Documentation/bpf/map_sockmap.rst
@@ -240,11 +240,11 @@ offsets into ``msg``, respectively.
 If a program of type ``BPF_PROG_TYPE_SK_MSG`` is run on a ``msg`` it can only
 parse data that the (``data``, ``data_end``) pointers have already consumed.
 For ``sendmsg()`` hooks this is likely the first scatterlist element. But for
-calls relying on the ``sendpage`` handler (e.g., ``sendfile()``) this will be
-the range (**0**, **0**) because the data is shared with user space and by
-default the objective is to avoid allowing user space to modify data while (or
-after) BPF verdict is being decided. This helper can be used to pull in data
-and to set the start and end pointers to given values. Data will be copied if
+calls relying on MSG_SPLICE_PAGES (e.g., ``sendfile()``) this will be the
+range (**0**, **0**) because the data is shared with user space and by default
+the objective is to avoid allowing user space to modify data while (or after)
+BPF verdict is being decided. This helper can be used to pull in data and to
+set the start and end pointers to given values. Data will be copied if
 necessary (i.e., if data was not linear and if start and end pointers do not
 point to the same chunk).
 
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index aa1a233b0fa8..ed148919e11a 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -521,8 +521,6 @@ prototypes::
 	int (*fsync) (struct file *, loff_t start, loff_t end, int datasync);
 	int (*fasync) (int, struct file *, int);
 	int (*lock) (struct file *, int, struct file_lock *);
-	ssize_t (*sendpage) (struct file *, struct page *, int, size_t,
-			loff_t *, int);
 	unsigned long (*get_unmapped_area)(struct file *, unsigned long,
 			unsigned long, unsigned long, unsigned long);
 	int (*check_flags)(int);
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 769be5230210..cb2a97e49872 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -1086,7 +1086,6 @@ This describes how the VFS can manipulate an open file.  As of kernel
 		int (*fsync) (struct file *, loff_t, loff_t, int datasync);
 		int (*fasync) (int, struct file *, int);
 		int (*lock) (struct file *, int, struct file_lock *);
-		ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
 		unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
 		int (*check_flags)(int);
 		int (*flock) (struct file *, int, struct file_lock *);
diff --git a/Documentation/networking/scaling.rst b/Documentation/networking/scaling.rst
index 3d435caa3ef2..92c9fb46d6a2 100644
--- a/Documentation/networking/scaling.rst
+++ b/Documentation/networking/scaling.rst
@@ -269,8 +269,8 @@ a single application thread handles flows with many different flow hashes.
 rps_sock_flow_table is a global flow table that contains the *desired* CPU
 for flows: the CPU that is currently processing the flow in userspace.
 Each table value is a CPU index that is updated during calls to recvmsg
-and sendmsg (specifically, inet_recvmsg(), inet_sendmsg(), inet_sendpage()
-and tcp_splice_read()).
+and sendmsg (specifically, inet_recvmsg(), inet_sendmsg() and
+tcp_splice_read()).
 
 When the scheduler moves a thread to a new CPU while it has outstanding
 receive packets on the old CPU, packets may arrive out of order. To
diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index 7d4b6016b83d..11229f3bcf84 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -482,7 +482,6 @@ static const struct proto_ops alg_proto_ops = {
 	.listen		=	sock_no_listen,
 	.shutdown	=	sock_no_shutdown,
 	.mmap		=	sock_no_mmap,
-	.sendpage	=	sock_no_sendpage,
 	.sendmsg	=	sock_no_sendmsg,
 	.recvmsg	=	sock_no_recvmsg,
 
@@ -1106,33 +1105,6 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size,
 }
 EXPORT_SYMBOL_GPL(af_alg_sendmsg);
 
-/**
- * af_alg_sendpage - sendpage system call handler
- * @sock: socket of connection to user space to write to
- * @page: data to send
- * @offset: offset into page to begin sending
- * @size: length of data
- * @flags: message send/receive flags
- *
- * This is a generic implementation of sendpage to fill ctx->tsgl_list.
- */
-ssize_t af_alg_sendpage(struct socket *sock, struct page *page,
-			int offset, size_t size, int flags)
-{
-	struct bio_vec bvec;
-	struct msghdr msg = {
-		.msg_flags = flags | MSG_SPLICE_PAGES,
-	};
-
-	if (flags & MSG_SENDPAGE_NOTLAST)
-		msg.msg_flags |= MSG_MORE;
-
-	bvec_set_page(&bvec, page, size, offset);
-	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
-	return sock_sendmsg(sock, &msg);
-}
-EXPORT_SYMBOL_GPL(af_alg_sendpage);
-
 /**
  * af_alg_free_resources - release resources required for crypto request
  * @areq: Request holding the TX and RX SGL
diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c
index 35bfa283748d..7d58cbbce4af 100644
--- a/crypto/algif_aead.c
+++ b/crypto/algif_aead.c
@@ -9,10 +9,10 @@
  * The following concept of the memory management is used:
  *
  * The kernel maintains two SGLs, the TX SGL and the RX SGL. The TX SGL is
- * filled by user space with the data submitted via sendpage. Filling up
- * the TX SGL does not cause a crypto operation -- the data will only be
- * tracked by the kernel. Upon receipt of one recvmsg call, the caller must
- * provide a buffer which is tracked with the RX SGL.
+ * filled by user space with the data submitted via sendmsg (maybe with
+ * MSG_SPLICE_PAGES).  Filling up the TX SGL does not cause a crypto operation
+ * -- the data will only be tracked by the kernel. Upon receipt of one recvmsg
+ * call, the caller must provide a buffer which is tracked with the RX SGL.
  *
  * During the processing of the recvmsg operation, the cipher request is
  * allocated and prepared. As part of the recvmsg operation, the processed
@@ -370,7 +370,6 @@ static struct proto_ops algif_aead_ops = {
 
 	.release	=	af_alg_release,
 	.sendmsg	=	aead_sendmsg,
-	.sendpage	=	af_alg_sendpage,
 	.recvmsg	=	aead_recvmsg,
 	.poll		=	af_alg_poll,
 };
@@ -422,18 +421,6 @@ static int aead_sendmsg_nokey(struct socket *sock, struct msghdr *msg,
 	return aead_sendmsg(sock, msg, size);
 }
 
-static ssize_t aead_sendpage_nokey(struct socket *sock, struct page *page,
-				       int offset, size_t size, int flags)
-{
-	int err;
-
-	err = aead_check_key(sock);
-	if (err)
-		return err;
-
-	return af_alg_sendpage(sock, page, offset, size, flags);
-}
-
 static int aead_recvmsg_nokey(struct socket *sock, struct msghdr *msg,
 				  size_t ignored, int flags)
 {
@@ -461,7 +448,6 @@ static struct proto_ops algif_aead_ops_nokey = {
 
 	.release	=	af_alg_release,
 	.sendmsg	=	aead_sendmsg_nokey,
-	.sendpage	=	aead_sendpage_nokey,
 	.recvmsg	=	aead_recvmsg_nokey,
 	.poll		=	af_alg_poll,
 };
diff --git a/crypto/algif_rng.c b/crypto/algif_rng.c
index 407408c43730..10c41adac3b1 100644
--- a/crypto/algif_rng.c
+++ b/crypto/algif_rng.c
@@ -174,7 +174,6 @@ static struct proto_ops algif_rng_ops = {
 	.bind		=	sock_no_bind,
 	.accept		=	sock_no_accept,
 	.sendmsg	=	sock_no_sendmsg,
-	.sendpage	=	sock_no_sendpage,
 
 	.release	=	af_alg_release,
 	.recvmsg	=	rng_recvmsg,
@@ -192,7 +191,6 @@ static struct proto_ops __maybe_unused algif_rng_test_ops = {
 	.mmap		=	sock_no_mmap,
 	.bind		=	sock_no_bind,
 	.accept		=	sock_no_accept,
-	.sendpage	=	sock_no_sendpage,
 
 	.release	=	af_alg_release,
 	.recvmsg	=	rng_test_recvmsg,
diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
index b1f321b9f846..9ada9b741af8 100644
--- a/crypto/algif_skcipher.c
+++ b/crypto/algif_skcipher.c
@@ -194,7 +194,6 @@ static struct proto_ops algif_skcipher_ops = {
 
 	.release	=	af_alg_release,
 	.sendmsg	=	skcipher_sendmsg,
-	.sendpage	=	af_alg_sendpage,
 	.recvmsg	=	skcipher_recvmsg,
 	.poll		=	af_alg_poll,
 };
@@ -246,18 +245,6 @@ static int skcipher_sendmsg_nokey(struct socket *sock, struct msghdr *msg,
 	return skcipher_sendmsg(sock, msg, size);
 }
 
-static ssize_t skcipher_sendpage_nokey(struct socket *sock, struct page *page,
-				       int offset, size_t size, int flags)
-{
-	int err;
-
-	err = skcipher_check_key(sock);
-	if (err)
-		return err;
-
-	return af_alg_sendpage(sock, page, offset, size, flags);
-}
-
 static int skcipher_recvmsg_nokey(struct socket *sock, struct msghdr *msg,
 				  size_t ignored, int flags)
 {
@@ -285,7 +272,6 @@ static struct proto_ops algif_skcipher_ops_nokey = {
 
 	.release	=	af_alg_release,
 	.sendmsg	=	skcipher_sendmsg_nokey,
-	.sendpage	=	skcipher_sendpage_nokey,
 	.recvmsg	=	skcipher_recvmsg_nokey,
 	.poll		=	af_alg_poll,
 };
diff --git a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls.h b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls.h
index da4818d2c856..68562a82d036 100644
--- a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls.h
+++ b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls.h
@@ -569,8 +569,6 @@ int chtls_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
 int chtls_recvmsg(struct sock *sk, struct msghdr *msg,
 		  size_t len, int flags, int *addr_len);
 void chtls_splice_eof(struct socket *sock);
-int chtls_sendpage(struct sock *sk, struct page *page,
-		   int offset, size_t size, int flags);
 int send_tx_flowc_wr(struct sock *sk, int compl,
 		     u32 snd_nxt, u32 rcv_nxt);
 void chtls_tcp_push(struct sock *sk, int flags);
diff --git a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_io.c b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_io.c
index e08ac960c967..5fc64e47568a 100644
--- a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_io.c
+++ b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_io.c
@@ -1246,20 +1246,6 @@ void chtls_splice_eof(struct socket *sock)
 	release_sock(sk);
 }
 
-int chtls_sendpage(struct sock *sk, struct page *page,
-		   int offset, size_t size, int flags)
-{
-	struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
-	struct bio_vec bvec;
-
-	if (flags & MSG_SENDPAGE_NOTLAST)
-		msg.msg_flags |= MSG_MORE;
-
-	bvec_set_page(&bvec, page, size, offset);
-	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
-	return chtls_sendmsg(sk, &msg, size);
-}
-
 static void chtls_select_window(struct sock *sk)
 {
 	struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
diff --git a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_main.c b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_main.c
index 6b6787eafd2f..455a54708be4 100644
--- a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_main.c
+++ b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_main.c
@@ -607,7 +607,6 @@ static void __init chtls_init_ulp_ops(void)
 	chtls_cpl_prot.shutdown		= chtls_shutdown;
 	chtls_cpl_prot.sendmsg		= chtls_sendmsg;
 	chtls_cpl_prot.splice_eof	= chtls_splice_eof;
-	chtls_cpl_prot.sendpage		= chtls_sendpage;
 	chtls_cpl_prot.recvmsg		= chtls_recvmsg;
 	chtls_cpl_prot.setsockopt	= chtls_setsockopt;
 	chtls_cpl_prot.getsockopt	= chtls_getsockopt;
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index db67f8e19344..8879e207ff5a 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -936,7 +936,7 @@ nfsd_open_verified(struct svc_rqst *rqstp, struct svc_fh *fhp, int may_flags,
 
 /*
  * Grab and keep cached pages associated with a file in the svc_rqst
- * so that they can be passed to the network sendmsg/sendpage routines
+ * so that they can be passed to the network sendmsg routines
  * directly. They will be released after the sending has completed.
  *
  * Return values: Number of bytes consumed, or -EIO if there are no
diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h
index 34224e77f5a2..ef8ce86b1f78 100644
--- a/include/crypto/if_alg.h
+++ b/include/crypto/if_alg.h
@@ -229,8 +229,6 @@ void af_alg_wmem_wakeup(struct sock *sk);
 int af_alg_wait_for_data(struct sock *sk, unsigned flags, unsigned min);
 int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size,
 		   unsigned int ivsize);
-ssize_t af_alg_sendpage(struct socket *sock, struct page *page,
-			int offset, size_t size, int flags);
 void af_alg_free_resources(struct af_alg_async_req *areq);
 void af_alg_async_cb(void *data, int err);
 __poll_t af_alg_poll(struct file *file, struct socket *sock,
diff --git a/include/linux/net.h b/include/linux/net.h
index 23324e9a2b3d..41c608c1b02c 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -207,8 +207,6 @@ struct proto_ops {
 				      size_t total_len, int flags);
 	int		(*mmap)	     (struct file *file, struct socket *sock,
 				      struct vm_area_struct * vma);
-	ssize_t		(*sendpage)  (struct socket *sock, struct page *page,
-				      int offset, size_t size, int flags);
 	ssize_t 	(*splice_read)(struct socket *sock,  loff_t *ppos,
 				       struct pipe_inode_info *pipe, size_t len, unsigned int flags);
 	void		(*splice_eof)(struct socket *sock);
@@ -222,8 +220,6 @@ struct proto_ops {
 				     sk_read_actor_t recv_actor);
 	/* This is different from read_sock(), it reads an entire skb at a time. */
 	int		(*read_skb)(struct sock *sk, skb_read_actor_t recv_actor);
-	int		(*sendpage_locked)(struct sock *sk, struct page *page,
-					   int offset, size_t size, int flags);
 	int		(*sendmsg_locked)(struct sock *sk, struct msghdr *msg,
 					  size_t size);
 	int		(*set_rcvlowat)(struct sock *sk, int val);
@@ -341,10 +337,6 @@ int kernel_connect(struct socket *sock, struct sockaddr *addr, int addrlen,
 		   int flags);
 int kernel_getsockname(struct socket *sock, struct sockaddr *addr);
 int kernel_getpeername(struct socket *sock, struct sockaddr *addr);
-int kernel_sendpage(struct socket *sock, struct page *page, int offset,
-		    size_t size, int flags);
-int kernel_sendpage_locked(struct sock *sk, struct page *page, int offset,
-			   size_t size, int flags);
 int kernel_sock_shutdown(struct socket *sock, enum sock_shutdown_cmd how);
 
 /* Routine returns the IP overhead imposed by a (caller-protected) socket. */
diff --git a/include/net/inet_common.h b/include/net/inet_common.h
index a75333342c4e..b86b8e21de7f 100644
--- a/include/net/inet_common.h
+++ b/include/net/inet_common.h
@@ -36,8 +36,6 @@ void __inet_accept(struct socket *sock, struct socket *newsock,
 int inet_send_prepare(struct sock *sk);
 int inet_sendmsg(struct socket *sock, struct msghdr *msg, size_t size);
 void inet_splice_eof(struct socket *sock);
-ssize_t inet_sendpage(struct socket *sock, struct page *page, int offset,
-		      size_t size, int flags);
 int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
 		 int flags);
 int inet_shutdown(struct socket *sock, int how);
diff --git a/include/net/sock.h b/include/net/sock.h
index 62a1b99da349..121284f455a8 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1277,8 +1277,6 @@ struct proto {
 					   size_t len);
 	int			(*recvmsg)(struct sock *sk, struct msghdr *msg,
 					   size_t len, int flags, int *addr_len);
-	int			(*sendpage)(struct sock *sk, struct page *page,
-					int offset, size_t size, int flags);
 	void			(*splice_eof)(struct socket *sock);
 	int			(*bind)(struct sock *sk,
 					struct sockaddr *addr, int addr_len);
@@ -1919,10 +1917,6 @@ int sock_no_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t len);
 int sock_no_recvmsg(struct socket *, struct msghdr *, size_t, int);
 int sock_no_mmap(struct file *file, struct socket *sock,
 		 struct vm_area_struct *vma);
-ssize_t sock_no_sendpage(struct socket *sock, struct page *page, int offset,
-			 size_t size, int flags);
-ssize_t sock_no_sendpage_locked(struct sock *sk, struct page *page,
-				int offset, size_t size, int flags);
 
 /*
  * Functions to fill in entries in struct proto_ops when a protocol
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 9c08eab647a2..95e4507febed 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -328,10 +328,6 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size);
 int tcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg, int *copied,
 			 size_t size, struct ubuf_info *uarg);
 void tcp_splice_eof(struct socket *sock);
-int tcp_sendpage(struct sock *sk, struct page *page, int offset, size_t size,
-		 int flags);
-int tcp_sendpage_locked(struct sock *sk, struct page *page, int offset,
-			size_t size, int flags);
 int tcp_send_mss(struct sock *sk, int *size_goal, int flags);
 int tcp_wmem_schedule(struct sock *sk, int copy);
 void tcp_push(struct sock *sk, int flags, int mss_now, int nonagle,
diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c
index a06f4d4a6f47..8978fb6212ff 100644
--- a/net/appletalk/ddp.c
+++ b/net/appletalk/ddp.c
@@ -1929,7 +1929,6 @@ static const struct proto_ops atalk_dgram_ops = {
 	.sendmsg	= atalk_sendmsg,
 	.recvmsg	= atalk_recvmsg,
 	.mmap		= sock_no_mmap,
-	.sendpage	= sock_no_sendpage,
 };
 
 static struct notifier_block ddp_notifier = {
diff --git a/net/atm/pvc.c b/net/atm/pvc.c
index 53e7d3f39e26..66d9a9bd5896 100644
--- a/net/atm/pvc.c
+++ b/net/atm/pvc.c
@@ -126,7 +126,6 @@ static const struct proto_ops pvc_proto_ops = {
 	.sendmsg =	vcc_sendmsg,
 	.recvmsg =	vcc_recvmsg,
 	.mmap =		sock_no_mmap,
-	.sendpage =	sock_no_sendpage,
 };
 
 
diff --git a/net/atm/svc.c b/net/atm/svc.c
index d83556d8beb9..36a814f1fbd1 100644
--- a/net/atm/svc.c
+++ b/net/atm/svc.c
@@ -654,7 +654,6 @@ static const struct proto_ops svc_proto_ops = {
 	.sendmsg =	vcc_sendmsg,
 	.recvmsg =	vcc_recvmsg,
 	.mmap =		sock_no_mmap,
-	.sendpage =	sock_no_sendpage,
 };
 
 
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index d8da400cb4de..5db805d5f74d 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -2022,7 +2022,6 @@ static const struct proto_ops ax25_proto_ops = {
 	.sendmsg	= ax25_sendmsg,
 	.recvmsg	= ax25_recvmsg,
 	.mmap		= sock_no_mmap,
-	.sendpage	= sock_no_sendpage,
 };
 
 /*
diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index 4eebcc66c19a..9c82698da4f5 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -976,7 +976,6 @@ static const struct proto_ops caif_seqpacket_ops = {
 	.sendmsg = caif_seqpkt_sendmsg,
 	.recvmsg = caif_seqpkt_recvmsg,
 	.mmap = sock_no_mmap,
-	.sendpage = sock_no_sendpage,
 };
 
 static const struct proto_ops caif_stream_ops = {
@@ -996,7 +995,6 @@ static const struct proto_ops caif_stream_ops = {
 	.sendmsg = caif_stream_sendmsg,
 	.recvmsg = caif_stream_recvmsg,
 	.mmap = sock_no_mmap,
-	.sendpage = sock_no_sendpage,
 };
 
 /* This function is called when a socket is finally destroyed. */
diff --git a/net/can/bcm.c b/net/can/bcm.c
index a962ec2b8ba5..9ba35685b043 100644
--- a/net/can/bcm.c
+++ b/net/can/bcm.c
@@ -1703,7 +1703,6 @@ static const struct proto_ops bcm_ops = {
 	.sendmsg       = bcm_sendmsg,
 	.recvmsg       = bcm_recvmsg,
 	.mmap          = sock_no_mmap,
-	.sendpage      = sock_no_sendpage,
 };
 
 static struct proto bcm_proto __read_mostly = {
diff --git a/net/can/isotp.c b/net/can/isotp.c
index 84f9aba02901..1f25b45868cf 100644
--- a/net/can/isotp.c
+++ b/net/can/isotp.c
@@ -1699,7 +1699,6 @@ static const struct proto_ops isotp_ops = {
 	.sendmsg = isotp_sendmsg,
 	.recvmsg = isotp_recvmsg,
 	.mmap = sock_no_mmap,
-	.sendpage = sock_no_sendpage,
 };
 
 static struct proto isotp_proto __read_mostly = {
diff --git a/net/can/j1939/socket.c b/net/can/j1939/socket.c
index 35970c25496a..feaec4ad6d16 100644
--- a/net/can/j1939/socket.c
+++ b/net/can/j1939/socket.c
@@ -1306,7 +1306,6 @@ static const struct proto_ops j1939_ops = {
 	.sendmsg = j1939_sk_sendmsg,
 	.recvmsg = j1939_sk_recvmsg,
 	.mmap = sock_no_mmap,
-	.sendpage = sock_no_sendpage,
 };
 
 static struct proto j1939_proto __read_mostly = {
diff --git a/net/can/raw.c b/net/can/raw.c
index f64469b98260..15c79b079184 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -962,7 +962,6 @@ static const struct proto_ops raw_ops = {
 	.sendmsg       = raw_sendmsg,
 	.recvmsg       = raw_recvmsg,
 	.mmap          = sock_no_mmap,
-	.sendpage      = sock_no_sendpage,
 };
 
 static struct proto raw_proto __read_mostly = {
diff --git a/net/core/sock.c b/net/core/sock.c
index cff3e82514d1..5c4948fe8ab1 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3267,36 +3267,6 @@ void __receive_sock(struct file *file)
 	}
 }
 
-ssize_t sock_no_sendpage(struct socket *sock, struct page *page, int offset, size_t size, int flags)
-{
-	ssize_t res;
-	struct msghdr msg = {.msg_flags = flags};
-	struct kvec iov;
-	char *kaddr = kmap(page);
-	iov.iov_base = kaddr + offset;
-	iov.iov_len = size;
-	res = kernel_sendmsg(sock, &msg, &iov, 1, size);
-	kunmap(page);
-	return res;
-}
-EXPORT_SYMBOL(sock_no_sendpage);
-
-ssize_t sock_no_sendpage_locked(struct sock *sk, struct page *page,
-				int offset, size_t size, int flags)
-{
-	ssize_t res;
-	struct msghdr msg = {.msg_flags = flags};
-	struct kvec iov;
-	char *kaddr = kmap(page);
-
-	iov.iov_base = kaddr + offset;
-	iov.iov_len = size;
-	res = kernel_sendmsg_locked(sk, &msg, &iov, 1, size);
-	kunmap(page);
-	return res;
-}
-EXPORT_SYMBOL(sock_no_sendpage_locked);
-
 /*
  *	Default Socket Callbacks
  */
@@ -4052,7 +4022,7 @@ static void proto_seq_printf(struct seq_file *seq, struct proto *proto)
 {
 
 	seq_printf(seq, "%-9s %4u %6d  %6ld   %-3s %6u   %-3s  %-10s "
-			"%2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c\n",
+			"%2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c\n",
 		   proto->name,
 		   proto->obj_size,
 		   sock_prot_inuse_get(seq_file_net(seq), proto),
@@ -4073,7 +4043,6 @@ static void proto_seq_printf(struct seq_file *seq, struct proto *proto)
 		   proto_method_implemented(proto->getsockopt),
 		   proto_method_implemented(proto->sendmsg),
 		   proto_method_implemented(proto->recvmsg),
-		   proto_method_implemented(proto->sendpage),
 		   proto_method_implemented(proto->bind),
 		   proto_method_implemented(proto->backlog_rcv),
 		   proto_method_implemented(proto->hash),
@@ -4094,7 +4063,7 @@ static int proto_seq_show(struct seq_file *seq, void *v)
 			   "maxhdr",
 			   "slab",
 			   "module",
-			   "cl co di ac io in de sh ss gs se re sp bi br ha uh gp em\n");
+			   "cl co di ac io in de sh ss gs se re bi br ha uh gp em\n");
 	else
 		proto_seq_printf(seq, list_entry(v, struct proto, node));
 	return 0;
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 3ab68415d121..fa8079303cb0 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -1010,7 +1010,6 @@ static const struct proto_ops inet_dccp_ops = {
 	.sendmsg	   = inet_sendmsg,
 	.recvmsg	   = sock_common_recvmsg,
 	.mmap		   = sock_no_mmap,
-	.sendpage	   = sock_no_sendpage,
 };
 
 static struct inet_protosw dccp_v4_protosw = {
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 93c98990d726..7249ef218178 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -1087,7 +1087,6 @@ static const struct proto_ops inet6_dccp_ops = {
 	.sendmsg	   = inet_sendmsg,
 	.recvmsg	   = sock_common_recvmsg,
 	.mmap		   = sock_no_mmap,
-	.sendpage	   = sock_no_sendpage,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	   = inet6_compat_ioctl,
 #endif
diff --git a/net/ieee802154/socket.c b/net/ieee802154/socket.c
index 9c124705120d..00302e8b9615 100644
--- a/net/ieee802154/socket.c
+++ b/net/ieee802154/socket.c
@@ -426,7 +426,6 @@ static const struct proto_ops ieee802154_raw_ops = {
 	.sendmsg	   = ieee802154_sock_sendmsg,
 	.recvmsg	   = sock_common_recvmsg,
 	.mmap		   = sock_no_mmap,
-	.sendpage	   = sock_no_sendpage,
 };
 
 /* DGRAM Sockets (802.15.4 dataframes) */
@@ -989,7 +988,6 @@ static const struct proto_ops ieee802154_dgram_ops = {
 	.sendmsg	   = ieee802154_sock_sendmsg,
 	.recvmsg	   = sock_common_recvmsg,
 	.mmap		   = sock_no_mmap,
-	.sendpage	   = sock_no_sendpage,
 };
 
 static void ieee802154_sock_destruct(struct sock *sk)
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 38e649fb4474..9b2ca2fcc5a1 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -847,23 +847,6 @@ void inet_splice_eof(struct socket *sock)
 }
 EXPORT_SYMBOL_GPL(inet_splice_eof);
 
-ssize_t inet_sendpage(struct socket *sock, struct page *page, int offset,
-		      size_t size, int flags)
-{
-	struct sock *sk = sock->sk;
-	const struct proto *prot;
-
-	if (unlikely(inet_send_prepare(sk)))
-		return -EAGAIN;
-
-	/* IPV6_ADDRFORM can change sk->sk_prot under us. */
-	prot = READ_ONCE(sk->sk_prot);
-	if (prot->sendpage)
-		return prot->sendpage(sk, page, offset, size, flags);
-	return sock_no_sendpage(sock, page, offset, size, flags);
-}
-EXPORT_SYMBOL(inet_sendpage);
-
 INDIRECT_CALLABLE_DECLARE(int udp_recvmsg(struct sock *, struct msghdr *,
 					  size_t, int, int *));
 int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
@@ -1067,12 +1050,10 @@ const struct proto_ops inet_stream_ops = {
 	.mmap		   = tcp_mmap,
 #endif
 	.splice_eof	   = inet_splice_eof,
-	.sendpage	   = inet_sendpage,
 	.splice_read	   = tcp_splice_read,
 	.read_sock	   = tcp_read_sock,
 	.read_skb	   = tcp_read_skb,
 	.sendmsg_locked    = tcp_sendmsg_locked,
-	.sendpage_locked   = tcp_sendpage_locked,
 	.peek_len	   = tcp_peek_len,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	   = inet_compat_ioctl,
@@ -1102,7 +1083,6 @@ const struct proto_ops inet_dgram_ops = {
 	.recvmsg	   = inet_recvmsg,
 	.mmap		   = sock_no_mmap,
 	.splice_eof	   = inet_splice_eof,
-	.sendpage	   = inet_sendpage,
 	.set_peek_off	   = sk_set_peek_off,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	   = inet_compat_ioctl,
@@ -1134,7 +1114,6 @@ static const struct proto_ops inet_sockraw_ops = {
 	.recvmsg	   = inet_recvmsg,
 	.mmap		   = sock_no_mmap,
 	.splice_eof	   = inet_splice_eof,
-	.sendpage	   = inet_sendpage,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	   = inet_compat_ioctl,
 #endif
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0e21ea92dc1d..38f9250999cc 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -923,11 +923,10 @@ int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
 	return mss_now;
 }
 
-/* In some cases, both sendpage() and sendmsg() could have added
- * an skb to the write queue, but failed adding payload on it.
- * We need to remove it to consume less memory, but more
- * importantly be able to generate EPOLLOUT for Edge Trigger epoll()
- * users.
+/* In some cases, sendmsg() could have added an skb to the write queue,
+ * but failed adding payload on it.  We need to remove it to consume less
+ * memory, but more importantly be able to generate EPOLLOUT for Edge Trigger
+ * epoll() users.
  */
 void tcp_remove_empty_skb(struct sock *sk)
 {
@@ -975,40 +974,6 @@ int tcp_wmem_schedule(struct sock *sk, int copy)
 	return min(copy, sk->sk_forward_alloc);
 }
 
-int tcp_sendpage_locked(struct sock *sk, struct page *page, int offset,
-			size_t size, int flags)
-{
-	struct bio_vec bvec;
-	struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
-
-	if (!(sk->sk_route_caps & NETIF_F_SG))
-		return sock_no_sendpage_locked(sk, page, offset, size, flags);
-
-	tcp_rate_check_app_limited(sk);  /* is sending application-limited? */
-
-	bvec_set_page(&bvec, page, size, offset);
-	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
-
-	if (flags & MSG_SENDPAGE_NOTLAST)
-		msg.msg_flags |= MSG_MORE;
-
-	return tcp_sendmsg_locked(sk, &msg, size);
-}
-EXPORT_SYMBOL_GPL(tcp_sendpage_locked);
-
-int tcp_sendpage(struct sock *sk, struct page *page, int offset,
-		 size_t size, int flags)
-{
-	int ret;
-
-	lock_sock(sk);
-	ret = tcp_sendpage_locked(sk, page, offset, size, flags);
-	release_sock(sk);
-
-	return ret;
-}
-EXPORT_SYMBOL(tcp_sendpage);
-
 void tcp_free_fastopen_req(struct tcp_sock *tp)
 {
 	if (tp->fastopen_req) {
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 31d6005cea9b..81f0dff69e0b 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -486,7 +486,7 @@ static int tcp_bpf_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 	long timeo;
 	int flags;
 
-	/* Don't let internal sendpage flags through */
+	/* Don't let internal flags through */
 	flags = (msg->msg_flags & ~MSG_SENDPAGE_DECRYPTED);
 	flags |= MSG_NO_SHARED_FRAGS;
 
@@ -566,23 +566,6 @@ static int tcp_bpf_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 	return copied ? copied : err;
 }
 
-static int tcp_bpf_sendpage(struct sock *sk, struct page *page, int offset,
-			    size_t size, int flags)
-{
-	struct bio_vec bvec;
-	struct msghdr msg = {
-		.msg_flags = flags | MSG_SPLICE_PAGES,
-	};
-
-	bvec_set_page(&bvec, page, size, offset);
-	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
-
-	if (flags & MSG_SENDPAGE_NOTLAST)
-		msg.msg_flags |= MSG_MORE;
-
-	return tcp_bpf_sendmsg(sk, &msg, size);
-}
-
 enum {
 	TCP_BPF_IPV4,
 	TCP_BPF_IPV6,
@@ -612,7 +595,6 @@ static void tcp_bpf_rebuild_protos(struct proto prot[TCP_BPF_NUM_CFGS],
 
 	prot[TCP_BPF_TX]			= prot[TCP_BPF_BASE];
 	prot[TCP_BPF_TX].sendmsg		= tcp_bpf_sendmsg;
-	prot[TCP_BPF_TX].sendpage		= tcp_bpf_sendpage;
 
 	prot[TCP_BPF_RX]			= prot[TCP_BPF_BASE];
 	prot[TCP_BPF_RX].recvmsg		= tcp_bpf_recvmsg_parser;
@@ -647,8 +629,7 @@ static int tcp_bpf_assert_proto_ops(struct proto *ops)
 	 * indeed valid assumptions.
 	 */
 	return ops->recvmsg  == tcp_recvmsg &&
-	       ops->sendmsg  == tcp_sendmsg &&
-	       ops->sendpage == tcp_sendpage ? 0 : -ENOTSUPP;
+	       ops->sendmsg  == tcp_sendmsg ? 0 : -ENOTSUPP;
 }
 
 int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 84a5d557dc1a..a228cdb23831 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -3117,7 +3117,6 @@ struct proto tcp_prot = {
 	.recvmsg		= tcp_recvmsg,
 	.sendmsg		= tcp_sendmsg,
 	.splice_eof		= tcp_splice_eof,
-	.sendpage		= tcp_sendpage,
 	.backlog_rcv		= tcp_v4_do_rcv,
 	.release_cb		= tcp_release_cb,
 	.hash			= inet_hash,
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 48fdcd3cad9c..42a96b3547c9 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1340,20 +1340,6 @@ void udp_splice_eof(struct socket *sock)
 }
 EXPORT_SYMBOL_GPL(udp_splice_eof);
 
-int udp_sendpage(struct sock *sk, struct page *page, int offset,
-		 size_t size, int flags)
-{
-	struct bio_vec bvec;
-	struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES };
-
-	if (flags & MSG_SENDPAGE_NOTLAST)
-		msg.msg_flags |= MSG_MORE;
-
-	bvec_set_page(&bvec, page, size, offset);
-	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
-	return udp_sendmsg(sk, &msg, size);
-}
-
 #define UDP_SKB_IS_STATELESS 0x80000000
 
 /* all head states (dst, sk, nf conntrack) except skb extensions are
@@ -2933,7 +2919,6 @@ struct proto udp_prot = {
 	.sendmsg		= udp_sendmsg,
 	.recvmsg		= udp_recvmsg,
 	.splice_eof		= udp_splice_eof,
-	.sendpage		= udp_sendpage,
 	.release_cb		= ip4_datagram_release_cb,
 	.hash			= udp_lib_hash,
 	.unhash			= udp_lib_unhash,
diff --git a/net/ipv4/udp_impl.h b/net/ipv4/udp_impl.h
index 4ba7a88a1b1d..e1ff3a375996 100644
--- a/net/ipv4/udp_impl.h
+++ b/net/ipv4/udp_impl.h
@@ -19,8 +19,6 @@ int udp_getsockopt(struct sock *sk, int level, int optname,
 
 int udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags,
 		int *addr_len);
-int udp_sendpage(struct sock *sk, struct page *page, int offset, size_t size,
-		 int flags);
 void udp_destroy_sock(struct sock *sk);
 
 #ifdef CONFIG_PROC_FS
diff --git a/net/ipv4/udplite.c b/net/ipv4/udplite.c
index 143f93a12f25..39ecdad1b50c 100644
--- a/net/ipv4/udplite.c
+++ b/net/ipv4/udplite.c
@@ -56,7 +56,6 @@ struct proto 	udplite_prot = {
 	.getsockopt	   = udp_getsockopt,
 	.sendmsg	   = udp_sendmsg,
 	.recvmsg	   = udp_recvmsg,
-	.sendpage	   = udp_sendpage,
 	.hash		   = udp_lib_hash,
 	.unhash		   = udp_lib_unhash,
 	.rehash		   = udp_v4_rehash,
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index b3451cf47d29..5d593ddc0347 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -696,9 +696,7 @@ const struct proto_ops inet6_stream_ops = {
 	.mmap		   = tcp_mmap,
 #endif
 	.splice_eof	   = inet_splice_eof,
-	.sendpage	   = inet_sendpage,
 	.sendmsg_locked    = tcp_sendmsg_locked,
-	.sendpage_locked   = tcp_sendpage_locked,
 	.splice_read	   = tcp_splice_read,
 	.read_sock	   = tcp_read_sock,
 	.read_skb	   = tcp_read_skb,
@@ -729,7 +727,6 @@ const struct proto_ops inet6_dgram_ops = {
 	.recvmsg	   = inet6_recvmsg,		/* retpoline's sake */
 	.read_skb	   = udp_read_skb,
 	.mmap		   = sock_no_mmap,
-	.sendpage	   = sock_no_sendpage,
 	.set_peek_off	   = sk_set_peek_off,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	   = inet6_compat_ioctl,
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index c9caeb5a43ed..ac1cef094c5f 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -1296,7 +1296,6 @@ const struct proto_ops inet6_sockraw_ops = {
 	.sendmsg	   = inet_sendmsg,		/* ok		*/
 	.recvmsg	   = sock_common_recvmsg,	/* ok		*/
 	.mmap		   = sock_no_mmap,
-	.sendpage	   = sock_no_sendpage,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	   = inet6_compat_ioctl,
 #endif
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index c17c8ff94b79..40dd92a2f480 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -2151,7 +2151,6 @@ struct proto tcpv6_prot = {
 	.recvmsg		= tcp_recvmsg,
 	.sendmsg		= tcp_sendmsg,
 	.splice_eof		= tcp_splice_eof,
-	.sendpage		= tcp_sendpage,
 	.backlog_rcv		= tcp_v6_do_rcv,
 	.release_cb		= tcp_release_cb,
 	.hash			= inet6_hash,
diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
index d75d775e9462..b512f9958e43 100644
--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -961,24 +961,6 @@ static void kcm_splice_eof(struct socket *sock)
 	release_sock(sk);
 }
 
-static ssize_t kcm_sendpage(struct socket *sock, struct page *page,
-			    int offset, size_t size, int flags)
-
-{
-	struct bio_vec bvec;
-	struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
-
-	if (flags & MSG_SENDPAGE_NOTLAST)
-		msg.msg_flags |= MSG_MORE;
-
-	if (flags & MSG_OOB)
-		return -EOPNOTSUPP;
-
-	bvec_set_page(&bvec, page, size, offset);
-	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
-	return kcm_sendmsg(sock, &msg, size);
-}
-
 static int kcm_recvmsg(struct socket *sock, struct msghdr *msg,
 		       size_t len, int flags)
 {
@@ -1767,7 +1749,6 @@ static const struct proto_ops kcm_dgram_ops = {
 	.recvmsg =	kcm_recvmsg,
 	.mmap =		sock_no_mmap,
 	.splice_eof =	kcm_splice_eof,
-	.sendpage =	kcm_sendpage,
 };
 
 static const struct proto_ops kcm_seqpacket_ops = {
@@ -1789,7 +1770,6 @@ static const struct proto_ops kcm_seqpacket_ops = {
 	.recvmsg =	kcm_recvmsg,
 	.mmap =		sock_no_mmap,
 	.splice_eof =	kcm_splice_eof,
-	.sendpage =	kcm_sendpage,
 	.splice_read =	kcm_splice_read,
 };
 
diff --git a/net/key/af_key.c b/net/key/af_key.c
index 31ab12fd720a..ede3c6a60353 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -3761,7 +3761,6 @@ static const struct proto_ops pfkey_ops = {
 	.listen		=	sock_no_listen,
 	.shutdown	=	sock_no_shutdown,
 	.mmap		=	sock_no_mmap,
-	.sendpage	=	sock_no_sendpage,
 
 	/* Now the operations that really occur. */
 	.release	=	pfkey_release,
diff --git a/net/l2tp/l2tp_ip.c b/net/l2tp/l2tp_ip.c
index 2b795c1064f5..f9073bc7281f 100644
--- a/net/l2tp/l2tp_ip.c
+++ b/net/l2tp/l2tp_ip.c
@@ -624,7 +624,6 @@ static const struct proto_ops l2tp_ip_ops = {
 	.sendmsg	   = inet_sendmsg,
 	.recvmsg	   = sock_common_recvmsg,
 	.mmap		   = sock_no_mmap,
-	.sendpage	   = sock_no_sendpage,
 };
 
 static struct inet_protosw l2tp_ip_protosw = {
diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index 5137ea1861ce..b1623f9c4f92 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -751,7 +751,6 @@ static const struct proto_ops l2tp_ip6_ops = {
 	.sendmsg	   = inet_sendmsg,
 	.recvmsg	   = sock_common_recvmsg,
 	.mmap		   = sock_no_mmap,
-	.sendpage	   = sock_no_sendpage,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	   = inet6_compat_ioctl,
 #endif
diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index 9ffbc667be6c..57c35c960b2c 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -1232,7 +1232,6 @@ static const struct proto_ops llc_ui_ops = {
 	.sendmsg     = llc_ui_sendmsg,
 	.recvmsg     = llc_ui_recvmsg,
 	.mmap	     = sock_no_mmap,
-	.sendpage    = sock_no_sendpage,
 };
 
 static const char llc_proc_err_msg[] __initconst =
diff --git a/net/mctp/af_mctp.c b/net/mctp/af_mctp.c
index bb4bd0b6a4f7..f6be58b68c6f 100644
--- a/net/mctp/af_mctp.c
+++ b/net/mctp/af_mctp.c
@@ -485,7 +485,6 @@ static const struct proto_ops mctp_dgram_ops = {
 	.sendmsg	= mctp_sendmsg,
 	.recvmsg	= mctp_recvmsg,
 	.mmap		= sock_no_mmap,
-	.sendpage	= sock_no_sendpage,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	= mctp_compat_ioctl,
 #endif
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 992b89c75631..e67983bd3bc7 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3891,7 +3891,6 @@ static const struct proto_ops mptcp_stream_ops = {
 	.sendmsg	   = inet_sendmsg,
 	.recvmsg	   = inet_recvmsg,
 	.mmap		   = sock_no_mmap,
-	.sendpage	   = inet_sendpage,
 };
 
 static struct inet_protosw mptcp_protosw = {
@@ -3986,7 +3985,6 @@ static const struct proto_ops mptcp_v6_stream_ops = {
 	.sendmsg	   = inet6_sendmsg,
 	.recvmsg	   = inet6_recvmsg,
 	.mmap		   = sock_no_mmap,
-	.sendpage	   = inet_sendpage,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	   = inet6_compat_ioctl,
 #endif
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index cbd9aa7ee24a..39cfb778ebc5 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2815,7 +2815,6 @@ static const struct proto_ops netlink_ops = {
 	.sendmsg =	netlink_sendmsg,
 	.recvmsg =	netlink_recvmsg,
 	.mmap =		sock_no_mmap,
-	.sendpage =	sock_no_sendpage,
 };
 
 static const struct net_proto_family netlink_family_ops = {
diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index 5a4cb796150f..eb8ccbd58df7 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -1364,7 +1364,6 @@ static const struct proto_ops nr_proto_ops = {
 	.sendmsg	=	nr_sendmsg,
 	.recvmsg	=	nr_recvmsg,
 	.mmap		=	sock_no_mmap,
-	.sendpage	=	sock_no_sendpage,
 };
 
 static struct notifier_block nr_dev_notifier = {
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index a2dbeb264f26..85ff90a03b0c 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -4621,7 +4621,6 @@ static const struct proto_ops packet_ops_spkt = {
 	.sendmsg =	packet_sendmsg_spkt,
 	.recvmsg =	packet_recvmsg,
 	.mmap =		sock_no_mmap,
-	.sendpage =	sock_no_sendpage,
 };
 
 static const struct proto_ops packet_ops = {
@@ -4643,7 +4642,6 @@ static const struct proto_ops packet_ops = {
 	.sendmsg =	packet_sendmsg,
 	.recvmsg =	packet_recvmsg,
 	.mmap =		packet_mmap,
-	.sendpage =	sock_no_sendpage,
 };
 
 static const struct net_proto_family packet_family_ops = {
diff --git a/net/phonet/socket.c b/net/phonet/socket.c
index 967f9b4dc026..1018340d89a7 100644
--- a/net/phonet/socket.c
+++ b/net/phonet/socket.c
@@ -441,7 +441,6 @@ const struct proto_ops phonet_dgram_ops = {
 	.sendmsg	= pn_socket_sendmsg,
 	.recvmsg	= sock_common_recvmsg,
 	.mmap		= sock_no_mmap,
-	.sendpage	= sock_no_sendpage,
 };
 
 const struct proto_ops phonet_stream_ops = {
@@ -462,7 +461,6 @@ const struct proto_ops phonet_stream_ops = {
 	.sendmsg	= pn_socket_sendmsg,
 	.recvmsg	= sock_common_recvmsg,
 	.mmap		= sock_no_mmap,
-	.sendpage	= sock_no_sendpage,
 };
 EXPORT_SYMBOL(phonet_stream_ops);
 
diff --git a/net/qrtr/af_qrtr.c b/net/qrtr/af_qrtr.c
index 76f0434d3d06..78beb74146e7 100644
--- a/net/qrtr/af_qrtr.c
+++ b/net/qrtr/af_qrtr.c
@@ -1244,7 +1244,6 @@ static const struct proto_ops qrtr_proto_ops = {
 	.shutdown	= sock_no_shutdown,
 	.release	= qrtr_release,
 	.mmap		= sock_no_mmap,
-	.sendpage	= sock_no_sendpage,
 };
 
 static struct proto qrtr_proto = {
diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
index 3ff6995244e5..01c4cdfef45d 100644
--- a/net/rds/af_rds.c
+++ b/net/rds/af_rds.c
@@ -653,7 +653,6 @@ static const struct proto_ops rds_proto_ops = {
 	.sendmsg =	rds_sendmsg,
 	.recvmsg =	rds_recvmsg,
 	.mmap =		sock_no_mmap,
-	.sendpage =	sock_no_sendpage,
 };
 
 static void rds_sock_destruct(struct sock *sk)
diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index ca2b17f32670..49dafe9ac72f 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -1496,7 +1496,6 @@ static const struct proto_ops rose_proto_ops = {
 	.sendmsg	=	rose_sendmsg,
 	.recvmsg	=	rose_recvmsg,
 	.mmap		=	sock_no_mmap,
-	.sendpage	=	sock_no_sendpage,
 };
 
 static struct notifier_block rose_dev_notifier = {
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index da0b3b5157d5..f2cf4aa99db2 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -954,7 +954,6 @@ static const struct proto_ops rxrpc_rpc_ops = {
 	.sendmsg	= rxrpc_sendmsg,
 	.recvmsg	= rxrpc_recvmsg,
 	.mmap		= sock_no_mmap,
-	.sendpage	= sock_no_sendpage,
 };
 
 static struct proto rxrpc_proto = {
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 664d1f2e9121..274d07bd774f 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -1133,7 +1133,6 @@ static const struct proto_ops inet_seqpacket_ops = {
 	.sendmsg	   = inet_sendmsg,
 	.recvmsg	   = inet_recvmsg,
 	.mmap		   = sock_no_mmap,
-	.sendpage	   = sock_no_sendpage,
 };
 
 /* Registration with AF_INET family.  */
diff --git a/net/socket.c b/net/socket.c
index b778fc03c6e0..8c3c8b29995a 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -3552,54 +3552,6 @@ int kernel_getpeername(struct socket *sock, struct sockaddr *addr)
 }
 EXPORT_SYMBOL(kernel_getpeername);
 
-/**
- *	kernel_sendpage - send a &page through a socket (kernel space)
- *	@sock: socket
- *	@page: page
- *	@offset: page offset
- *	@size: total size in bytes
- *	@flags: flags (MSG_DONTWAIT, ...)
- *
- *	Returns the total amount sent in bytes or an error.
- */
-
-int kernel_sendpage(struct socket *sock, struct page *page, int offset,
-		    size_t size, int flags)
-{
-	if (sock->ops->sendpage) {
-		/* Warn in case the improper page to zero-copy send */
-		WARN_ONCE(!sendpage_ok(page), "improper page for zero-copy send");
-		return sock->ops->sendpage(sock, page, offset, size, flags);
-	}
-	return sock_no_sendpage(sock, page, offset, size, flags);
-}
-EXPORT_SYMBOL(kernel_sendpage);
-
-/**
- *	kernel_sendpage_locked - send a &page through the locked sock (kernel space)
- *	@sk: sock
- *	@page: page
- *	@offset: page offset
- *	@size: total size in bytes
- *	@flags: flags (MSG_DONTWAIT, ...)
- *
- *	Returns the total amount sent in bytes or an error.
- *	Caller must hold @sk.
- */
-
-int kernel_sendpage_locked(struct sock *sk, struct page *page, int offset,
-			   size_t size, int flags)
-{
-	struct socket *sock = sk->sk_socket;
-
-	if (sock->ops->sendpage_locked)
-		return sock->ops->sendpage_locked(sk, page, offset, size,
-						  flags);
-
-	return sock_no_sendpage_locked(sk, page, offset, size, flags);
-}
-EXPORT_SYMBOL(kernel_sendpage_locked);
-
 /**
  *	kernel_sock_shutdown - shut down part of a full-duplex connection (kernel space)
  *	@sock: socket
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index dd73d71c02a9..ef8e5139a873 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -3375,7 +3375,6 @@ static const struct proto_ops msg_ops = {
 	.sendmsg	= tipc_sendmsg,
 	.recvmsg	= tipc_recvmsg,
 	.mmap		= sock_no_mmap,
-	.sendpage	= sock_no_sendpage
 };
 
 static const struct proto_ops packet_ops = {
@@ -3396,7 +3395,6 @@ static const struct proto_ops packet_ops = {
 	.sendmsg	= tipc_send_packet,
 	.recvmsg	= tipc_recvmsg,
 	.mmap		= sock_no_mmap,
-	.sendpage	= sock_no_sendpage
 };
 
 static const struct proto_ops stream_ops = {
@@ -3417,7 +3415,6 @@ static const struct proto_ops stream_ops = {
 	.sendmsg	= tipc_sendstream,
 	.recvmsg	= tipc_recvstream,
 	.mmap		= sock_no_mmap,
-	.sendpage	= sock_no_sendpage
 };
 
 static const struct net_proto_family tipc_family_ops = {
diff --git a/net/tls/tls.h b/net/tls/tls.h
index d002c3af1966..86cef1c68e03 100644
--- a/net/tls/tls.h
+++ b/net/tls/tls.h
@@ -98,10 +98,6 @@ void tls_sw_strparser_arm(struct sock *sk, struct tls_context *ctx);
 void tls_sw_strparser_done(struct tls_context *tls_ctx);
 int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
 void tls_sw_splice_eof(struct socket *sock);
-int tls_sw_sendpage_locked(struct sock *sk, struct page *page,
-			   int offset, size_t size, int flags);
-int tls_sw_sendpage(struct sock *sk, struct page *page,
-		    int offset, size_t size, int flags);
 void tls_sw_cancel_work_tx(struct tls_context *tls_ctx);
 void tls_sw_release_resources_tx(struct sock *sk);
 void tls_sw_free_ctx_tx(struct tls_context *tls_ctx);
@@ -117,8 +113,6 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
 
 int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
 void tls_device_splice_eof(struct socket *sock);
-int tls_device_sendpage(struct sock *sk, struct page *page,
-			int offset, size_t size, int flags);
 int tls_tx_records(struct sock *sk, int flags);
 
 void tls_sw_write_space(struct sock *sk, struct tls_context *ctx);
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index 975299d7213b..840ee06f1708 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -621,23 +621,6 @@ void tls_device_splice_eof(struct socket *sock)
 	mutex_unlock(&tls_ctx->tx_lock);
 }
 
-int tls_device_sendpage(struct sock *sk, struct page *page,
-			int offset, size_t size, int flags)
-{
-	struct bio_vec bvec;
-	struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
-
-	if (flags & MSG_SENDPAGE_NOTLAST)
-		msg.msg_flags |= MSG_MORE;
-
-	if (flags & MSG_OOB)
-		return -EOPNOTSUPP;
-
-	bvec_set_page(&bvec, page, size, offset);
-	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
-	return tls_device_sendmsg(sk, &msg, size);
-}
-
 struct tls_record_info *tls_get_record(struct tls_offload_context_tx *context,
 				       u32 seq, u64 *p_record_sn)
 {
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 7b9c83dd7de2..d5ed4d47b16e 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -958,7 +958,6 @@ static void build_proto_ops(struct proto_ops ops[TLS_NUM_CONFIG][TLS_NUM_CONFIG]
 
 	ops[TLS_SW  ][TLS_BASE] = ops[TLS_BASE][TLS_BASE];
 	ops[TLS_SW  ][TLS_BASE].splice_eof	= tls_sw_splice_eof;
-	ops[TLS_SW  ][TLS_BASE].sendpage_locked	= tls_sw_sendpage_locked;
 
 	ops[TLS_BASE][TLS_SW  ] = ops[TLS_BASE][TLS_BASE];
 	ops[TLS_BASE][TLS_SW  ].splice_read	= tls_sw_splice_read;
@@ -970,17 +969,14 @@ static void build_proto_ops(struct proto_ops ops[TLS_NUM_CONFIG][TLS_NUM_CONFIG]
 
 #ifdef CONFIG_TLS_DEVICE
 	ops[TLS_HW  ][TLS_BASE] = ops[TLS_BASE][TLS_BASE];
-	ops[TLS_HW  ][TLS_BASE].sendpage_locked	= NULL;
 
 	ops[TLS_HW  ][TLS_SW  ] = ops[TLS_BASE][TLS_SW  ];
-	ops[TLS_HW  ][TLS_SW  ].sendpage_locked	= NULL;
 
 	ops[TLS_BASE][TLS_HW  ] = ops[TLS_BASE][TLS_SW  ];
 
 	ops[TLS_SW  ][TLS_HW  ] = ops[TLS_SW  ][TLS_SW  ];
 
 	ops[TLS_HW  ][TLS_HW  ] = ops[TLS_HW  ][TLS_SW  ];
-	ops[TLS_HW  ][TLS_HW  ].sendpage_locked	= NULL;
 #endif
 #ifdef CONFIG_TLS_TOE
 	ops[TLS_HW_RECORD][TLS_HW_RECORD] = *base;
@@ -1029,7 +1025,6 @@ static void build_protos(struct proto prot[TLS_NUM_CONFIG][TLS_NUM_CONFIG],
 	prot[TLS_SW][TLS_BASE] = prot[TLS_BASE][TLS_BASE];
 	prot[TLS_SW][TLS_BASE].sendmsg		= tls_sw_sendmsg;
 	prot[TLS_SW][TLS_BASE].splice_eof	= tls_sw_splice_eof;
-	prot[TLS_SW][TLS_BASE].sendpage		= tls_sw_sendpage;
 
 	prot[TLS_BASE][TLS_SW] = prot[TLS_BASE][TLS_BASE];
 	prot[TLS_BASE][TLS_SW].recvmsg		  = tls_sw_recvmsg;
@@ -1045,12 +1040,10 @@ static void build_protos(struct proto prot[TLS_NUM_CONFIG][TLS_NUM_CONFIG],
 	prot[TLS_HW][TLS_BASE] = prot[TLS_BASE][TLS_BASE];
 	prot[TLS_HW][TLS_BASE].sendmsg		= tls_device_sendmsg;
 	prot[TLS_HW][TLS_BASE].splice_eof	= tls_device_splice_eof;
-	prot[TLS_HW][TLS_BASE].sendpage		= tls_device_sendpage;
 
 	prot[TLS_HW][TLS_SW] = prot[TLS_BASE][TLS_SW];
 	prot[TLS_HW][TLS_SW].sendmsg		= tls_device_sendmsg;
 	prot[TLS_HW][TLS_SW].splice_eof		= tls_device_splice_eof;
-	prot[TLS_HW][TLS_SW].sendpage		= tls_device_sendpage;
 
 	prot[TLS_BASE][TLS_HW] = prot[TLS_BASE][TLS_SW];
 
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 319f61590d2c..9b3aa89a4292 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1281,41 +1281,6 @@ void tls_sw_splice_eof(struct socket *sock)
 	mutex_unlock(&tls_ctx->tx_lock);
 }
 
-int tls_sw_sendpage_locked(struct sock *sk, struct page *page,
-			   int offset, size_t size, int flags)
-{
-	struct bio_vec bvec;
-	struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
-
-	if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
-		      MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY |
-		      MSG_NO_SHARED_FRAGS))
-		return -EOPNOTSUPP;
-	if (flags & MSG_SENDPAGE_NOTLAST)
-		msg.msg_flags |= MSG_MORE;
-
-	bvec_set_page(&bvec, page, size, offset);
-	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
-	return tls_sw_sendmsg_locked(sk, &msg, size);
-}
-
-int tls_sw_sendpage(struct sock *sk, struct page *page,
-		    int offset, size_t size, int flags)
-{
-	struct bio_vec bvec;
-	struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
-
-	if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
-		      MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY))
-		return -EOPNOTSUPP;
-	if (flags & MSG_SENDPAGE_NOTLAST)
-		msg.msg_flags |= MSG_MORE;
-
-	bvec_set_page(&bvec, page, size, offset);
-	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
-	return tls_sw_sendmsg(sk, &msg, size);
-}
-
 static int
 tls_rx_rec_wait(struct sock *sk, struct sk_psock *psock, bool nonblock,
 		bool released)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 73c61a010b01..3953daa2e1d0 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -758,8 +758,6 @@ static int unix_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned lon
 static int unix_shutdown(struct socket *, int);
 static int unix_stream_sendmsg(struct socket *, struct msghdr *, size_t);
 static int unix_stream_recvmsg(struct socket *, struct msghdr *, size_t, int);
-static ssize_t unix_stream_sendpage(struct socket *, struct page *, int offset,
-				    size_t size, int flags);
 static ssize_t unix_stream_splice_read(struct socket *,  loff_t *ppos,
 				       struct pipe_inode_info *, size_t size,
 				       unsigned int flags);
@@ -852,7 +850,6 @@ static const struct proto_ops unix_stream_ops = {
 	.recvmsg =	unix_stream_recvmsg,
 	.read_skb =	unix_stream_read_skb,
 	.mmap =		sock_no_mmap,
-	.sendpage =	unix_stream_sendpage,
 	.splice_read =	unix_stream_splice_read,
 	.set_peek_off =	unix_set_peek_off,
 	.show_fdinfo =	unix_show_fdinfo,
@@ -878,7 +875,6 @@ static const struct proto_ops unix_dgram_ops = {
 	.read_skb =	unix_read_skb,
 	.recvmsg =	unix_dgram_recvmsg,
 	.mmap =		sock_no_mmap,
-	.sendpage =	sock_no_sendpage,
 	.set_peek_off =	unix_set_peek_off,
 	.show_fdinfo =	unix_show_fdinfo,
 };
@@ -902,7 +898,6 @@ static const struct proto_ops unix_seqpacket_ops = {
 	.sendmsg =	unix_seqpacket_sendmsg,
 	.recvmsg =	unix_seqpacket_recvmsg,
 	.mmap =		sock_no_mmap,
-	.sendpage =	sock_no_sendpage,
 	.set_peek_off =	unix_set_peek_off,
 	.show_fdinfo =	unix_show_fdinfo,
 };
@@ -2294,20 +2289,6 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 	return sent ? : err;
 }
 
-static ssize_t unix_stream_sendpage(struct socket *socket, struct page *page,
-				    int offset, size_t size, int flags)
-{
-	struct bio_vec bvec;
-	struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES };
-
-	if (flags & MSG_SENDPAGE_NOTLAST)
-		msg.msg_flags |= MSG_MORE;
-
-	bvec_set_page(&bvec, page, size, offset);
-	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
-	return unix_stream_sendmsg(socket, &msg, size);
-}
-
 static int unix_seqpacket_sendmsg(struct socket *sock, struct msghdr *msg,
 				  size_t len)
 {
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index efb8a0937a13..020cf17ab7e4 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1306,7 +1306,6 @@ static const struct proto_ops vsock_dgram_ops = {
 	.sendmsg = vsock_dgram_sendmsg,
 	.recvmsg = vsock_dgram_recvmsg,
 	.mmap = sock_no_mmap,
-	.sendpage = sock_no_sendpage,
 	.read_skb = vsock_read_skb,
 };
 
@@ -2234,7 +2233,6 @@ static const struct proto_ops vsock_stream_ops = {
 	.sendmsg = vsock_connectible_sendmsg,
 	.recvmsg = vsock_connectible_recvmsg,
 	.mmap = sock_no_mmap,
-	.sendpage = sock_no_sendpage,
 	.set_rcvlowat = vsock_set_rcvlowat,
 	.read_skb = vsock_read_skb,
 };
@@ -2257,7 +2255,6 @@ static const struct proto_ops vsock_seqpacket_ops = {
 	.sendmsg = vsock_connectible_sendmsg,
 	.recvmsg = vsock_connectible_recvmsg,
 	.mmap = sock_no_mmap,
-	.sendpage = sock_no_sendpage,
 	.read_skb = vsock_read_skb,
 };
 
diff --git a/net/x25/af_x25.c b/net/x25/af_x25.c
index 5c7ad301d742..0fb5143bec7a 100644
--- a/net/x25/af_x25.c
+++ b/net/x25/af_x25.c
@@ -1757,7 +1757,6 @@ static const struct proto_ops x25_proto_ops = {
 	.sendmsg =	x25_sendmsg,
 	.recvmsg =	x25_recvmsg,
 	.mmap =		sock_no_mmap,
-	.sendpage =	sock_no_sendpage,
 };
 
 static struct packet_type x25_packet_type __read_mostly = {
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index cc1e7f15fa73..5a8c0dd250af 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -1389,7 +1389,6 @@ static const struct proto_ops xsk_proto_ops = {
 	.sendmsg	= xsk_sendmsg,
 	.recvmsg	= xsk_recvmsg,
 	.mmap		= xsk_mmap,
-	.sendpage	= sock_no_sendpage,
 };
 
 static void xsk_destruct(struct sock *sk)


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH net-next v5 16/16] net: Kill MSG_SENDPAGE_NOTLAST
  2023-06-23 22:54 [PATCH net-next v5 00/16] splice, net: Switch over users of sendpage() and remove it David Howells
                   ` (14 preceding siblings ...)
  2023-06-23 22:55 ` [PATCH net-next v5 15/16] sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) David Howells
@ 2023-06-23 22:55 ` David Howells
  2023-06-24 23:00 ` [PATCH net-next v5 00/16] splice, net: Switch over users of sendpage() and remove it patchwork-bot+netdevbpf
  16 siblings, 0 replies; 29+ messages in thread
From: David Howells @ 2023-06-23 22:55 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Alexander Duyck, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern,
	Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel, bpf, dccp,
	linux-afs, linux-arm-msm, linux-can, linux-crypto, linux-doc,
	linux-hams, linux-perf-users, linux-rdma, linux-sctp, linux-wpan,
	linux-x25, mptcp, rds-devel, tipc-discussion, virtualization

Now that ->sendpage() has been removed, MSG_SENDPAGE_NOTLAST can be cleaned
up.  Things were converted to use MSG_MORE instead; the protocol sendpage
stubs that translated MSG_SENDPAGE_NOTLAST into MSG_MORE are gone, so the
flag itself is no longer necessary.
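
For reference, each removed wrapper followed the same pattern (a minimal
sketch modelled on the stubs deleted in the previous patch, such as
unix_stream_sendpage(); proto_sendpage()/proto_sendmsg() are illustrative
names, not real functions):

static ssize_t proto_sendpage(struct socket *sock, struct page *page,
			      int offset, size_t size, int flags)
{
	struct bio_vec bvec;
	struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES };

	/* Translate the sendpage-only MSG_SENDPAGE_NOTLAST flag into
	 * MSG_MORE for the sendmsg() path.
	 */
	if (flags & MSG_SENDPAGE_NOTLAST)
		msg.msg_flags |= MSG_MORE;

	bvec_set_page(&bvec, page, size, offset);
	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
	return proto_sendmsg(sock, &msg, size);
}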

Signed-off-by: David Howells <dhowells@redhat.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: bpf@vger.kernel.org
cc: dccp@vger.kernel.org
cc: linux-afs@lists.infradead.org
cc: linux-arm-msm@vger.kernel.org
cc: linux-can@vger.kernel.org
cc: linux-crypto@vger.kernel.org
cc: linux-doc@vger.kernel.org
cc: linux-hams@vger.kernel.org
cc: linux-perf-users@vger.kernel.org
cc: linux-rdma@vger.kernel.org
cc: linux-sctp@vger.kernel.org
cc: linux-wpan@vger.kernel.org
cc: linux-x25@vger.kernel.org
cc: mptcp@lists.linux.dev
cc: netdev@vger.kernel.org
cc: rds-devel@oss.oracle.com
cc: tipc-discussion@lists.sourceforge.net
cc: virtualization@lists.linux-foundation.org
---

Notes:
    ver #3)
     - tcp_bpf is now handled by an earlier patch.

 include/linux/socket.h                         | 4 +---
 net/tls/tls_device.c                           | 3 +--
 net/tls/tls_main.c                             | 2 +-
 net/tls/tls_sw.c                               | 2 +-
 tools/perf/trace/beauty/include/linux/socket.h | 1 -
 tools/perf/trace/beauty/msg_flags.c            | 5 +----
 6 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 58204700018a..39b74d83c7c4 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -319,7 +319,6 @@ struct ucred {
 #define MSG_MORE	0x8000	/* Sender will send more */
 #define MSG_WAITFORONE	0x10000	/* recvmmsg(): block until 1+ packets avail */
 #define MSG_SENDPAGE_NOPOLICY 0x10000 /* sendpage() internal : do no apply policy */
-#define MSG_SENDPAGE_NOTLAST 0x20000 /* sendpage() internal : not the last page */
 #define MSG_BATCH	0x40000 /* sendmmsg(): more messages coming */
 #define MSG_EOF         MSG_FIN
 #define MSG_NO_SHARED_FRAGS 0x80000 /* sendpage() internal : page frags are not shared */
@@ -341,8 +340,7 @@ struct ucred {
 
 /* Flags to be cleared on entry by sendmsg and sendmmsg syscalls */
 #define MSG_INTERNAL_SENDMSG_FLAGS \
-	(MSG_SPLICE_PAGES | MSG_SENDPAGE_NOPOLICY | MSG_SENDPAGE_NOTLAST | \
-	 MSG_SENDPAGE_DECRYPTED)
+	(MSG_SPLICE_PAGES | MSG_SENDPAGE_NOPOLICY | MSG_SENDPAGE_DECRYPTED)
 
 /* Setsockoptions(2) level. Thanks to BSD these must match IPPROTO_xxx */
 #define SOL_IP		0
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index 840ee06f1708..2021fe557e50 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -441,8 +441,7 @@ static int tls_push_data(struct sock *sk,
 	long timeo;
 
 	if (flags &
-	    ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_SENDPAGE_NOTLAST |
-	      MSG_SPLICE_PAGES))
+	    ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_SPLICE_PAGES))
 		return -EOPNOTSUPP;
 
 	if (unlikely(sk->sk_err))
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index d5ed4d47b16e..b6896126bb92 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -127,7 +127,7 @@ int tls_push_sg(struct sock *sk,
 {
 	struct bio_vec bvec;
 	struct msghdr msg = {
-		.msg_flags = MSG_SENDPAGE_NOTLAST | MSG_SPLICE_PAGES | flags,
+		.msg_flags = MSG_SPLICE_PAGES | flags,
 	};
 	int ret = 0;
 	struct page *p;
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 9b3aa89a4292..53f944e6d8ef 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1194,7 +1194,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 
 	if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
 			       MSG_CMSG_COMPAT | MSG_SPLICE_PAGES |
-			       MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY))
+			       MSG_SENDPAGE_NOPOLICY))
 		return -EOPNOTSUPP;
 
 	ret = mutex_lock_interruptible(&tls_ctx->tx_lock);
diff --git a/tools/perf/trace/beauty/include/linux/socket.h b/tools/perf/trace/beauty/include/linux/socket.h
index 13c3a237b9c9..3bef212a24d7 100644
--- a/tools/perf/trace/beauty/include/linux/socket.h
+++ b/tools/perf/trace/beauty/include/linux/socket.h
@@ -318,7 +318,6 @@ struct ucred {
 #define MSG_MORE	0x8000	/* Sender will send more */
 #define MSG_WAITFORONE	0x10000	/* recvmmsg(): block until 1+ packets avail */
 #define MSG_SENDPAGE_NOPOLICY 0x10000 /* sendpage() internal : do no apply policy */
-#define MSG_SENDPAGE_NOTLAST 0x20000 /* sendpage() internal : not the last page */
 #define MSG_BATCH	0x40000 /* sendmmsg(): more messages coming */
 #define MSG_EOF         MSG_FIN
 #define MSG_NO_SHARED_FRAGS 0x80000 /* sendpage() internal : page frags are not shared */
diff --git a/tools/perf/trace/beauty/msg_flags.c b/tools/perf/trace/beauty/msg_flags.c
index ea68db08b8e7..5cdebd7ece7e 100644
--- a/tools/perf/trace/beauty/msg_flags.c
+++ b/tools/perf/trace/beauty/msg_flags.c
@@ -8,9 +8,6 @@
 #ifndef MSG_WAITFORONE
 #define MSG_WAITFORONE		   0x10000
 #endif
-#ifndef MSG_SENDPAGE_NOTLAST
-#define MSG_SENDPAGE_NOTLAST	   0x20000
-#endif
 #ifndef MSG_FASTOPEN
 #define MSG_FASTOPEN		0x20000000
 #endif
@@ -50,7 +47,7 @@ static size_t syscall_arg__scnprintf_msg_flags(char *bf, size_t size,
 	P_MSG_FLAG(NOSIGNAL);
 	P_MSG_FLAG(MORE);
 	P_MSG_FLAG(WAITFORONE);
-	P_MSG_FLAG(SENDPAGE_NOTLAST);
+	P_MSG_FLAG(SPLICE_PAGES);
 	P_MSG_FLAG(FASTOPEN);
 	P_MSG_FLAG(CMSG_CLOEXEC);
 #undef P_MSG_FLAG


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next v5 00/16] splice, net: Switch over users of sendpage() and remove it
  2023-06-23 22:54 [PATCH net-next v5 00/16] splice, net: Switch over users of sendpage() and remove it David Howells
                   ` (15 preceding siblings ...)
  2023-06-23 22:55 ` [PATCH net-next v5 16/16] net: Kill MSG_SENDPAGE_NOTLAST David Howells
@ 2023-06-24 23:00 ` patchwork-bot+netdevbpf
  16 siblings, 0 replies; 29+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-06-24 23:00 UTC (permalink / raw)
  To: David Howells
  Cc: netdev, alexander.duyck, davem, edumazet, kuba, pabeni,
	willemdebruijn.kernel, dsahern, willy, axboe, linux-mm,
	linux-kernel

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 23 Jun 2023 23:54:57 +0100 you wrote:
> Here's the final set of patches towards the removal of sendpage.  All the
> drivers that use sendpage() get switched over to using sendmsg() with
> MSG_SPLICE_PAGES.
> 
> The following changes are made:
> 
>  (1) Make the protocol drivers behave according to MSG_MORE, not
>      MSG_SENDPAGE_NOTLAST.  The latter is restricted to turning on MSG_MORE
>      in the sendpage() wrappers.
> 
> [...]

Here is the summary with links:
  - [net-next,v5,01/16] tcp_bpf, smc, tls, espintcp, siw: Reduce MSG_SENDPAGE_NOTLAST usage
    https://git.kernel.org/netdev/net-next/c/f8dd95b29d7e
  - [net-next,v5,02/16] net: Use sendmsg(MSG_SPLICE_PAGES) not sendpage in skb_send_sock()
    https://git.kernel.org/netdev/net-next/c/c729ed6f5be5
  - [net-next,v5,03/16] ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage
    https://git.kernel.org/netdev/net-next/c/40a8c17aa770
  - [net-next,v5,04/16] ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage()
    https://git.kernel.org/netdev/net-next/c/fa094ccae1e7
  - [net-next,v5,05/16] rds: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage
    https://git.kernel.org/netdev/net-next/c/572efade27c5
  - [net-next,v5,06/16] dlm: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage
    https://git.kernel.org/netdev/net-next/c/a1a5e8752786
  - [net-next,v5,07/16] nvme-tcp: Use sendmsg(MSG_SPLICE_PAGES) rather then sendpage
    https://git.kernel.org/netdev/net-next/c/7769887817c3
  - [net-next,v5,08/16] nvmet-tcp: Use sendmsg(MSG_SPLICE_PAGES) rather then sendpage
    https://git.kernel.org/netdev/net-next/c/c336a79983c7
  - [net-next,v5,09/16] smc: Drop smc_sendpage() in favour of smc_sendmsg() + MSG_SPLICE_PAGES
    https://git.kernel.org/netdev/net-next/c/2f8bc2bbb0fa
  - [net-next,v5,10/16] drbd: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage()
    https://git.kernel.org/netdev/net-next/c/eeac7405c735
  - [net-next,v5,11/16] scsi: iscsi_tcp: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage
    https://git.kernel.org/netdev/net-next/c/fa8df3435727
  - [net-next,v5,12/16] scsi: target: iscsi: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage
    https://git.kernel.org/netdev/net-next/c/d2fe21077d6d
  - [net-next,v5,13/16] ocfs2: Fix use of slab data with sendpage
    https://git.kernel.org/netdev/net-next/c/86d7bd6e66e9
  - [net-next,v5,14/16] ocfs2: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage()
    https://git.kernel.org/netdev/net-next/c/e52828cc0109
  - [net-next,v5,15/16] sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES)
    https://git.kernel.org/netdev/net-next/c/dc97391e6610
  - [net-next,v5,16/16] net: Kill MSG_SENDPAGE_NOTLAST
    https://git.kernel.org/netdev/net-next/c/b848b26c6672

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next v5 03/16] ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage
  2023-06-23 22:55 ` [PATCH net-next v5 03/16] ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage David Howells
@ 2023-06-25 12:20   ` Ilya Dryomov
  2023-06-26 14:00   ` David Howells
  2023-06-26 15:12   ` David Howells
  2 siblings, 0 replies; 29+ messages in thread
From: Ilya Dryomov @ 2023-06-25 12:20 UTC (permalink / raw)
  To: David Howells
  Cc: netdev, Alexander Duyck, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern,
	Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel, Xiubo Li,
	Jeff Layton, ceph-devel

On Sat, Jun 24, 2023 at 12:55 AM David Howells <dhowells@redhat.com> wrote:
>
> Use sendmsg() and MSG_SPLICE_PAGES rather than sendpage in ceph when
> transmitting data.  For the moment, this can only transmit one page at a
> time because of the architecture of net/ceph/, but if
> write_partial_message_data() can be given a bvec[] at a time by the
> iteration code, this would allow pages to be sent in a batch.
>
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Ilya Dryomov <idryomov@gmail.com>
> cc: Xiubo Li <xiubli@redhat.com>
> cc: Jeff Layton <jlayton@kernel.org>
> cc: "David S. Miller" <davem@davemloft.net>
> cc: Eric Dumazet <edumazet@google.com>
> cc: Jakub Kicinski <kuba@kernel.org>
> cc: Paolo Abeni <pabeni@redhat.com>
> cc: Jens Axboe <axboe@kernel.dk>
> cc: Matthew Wilcox <willy@infradead.org>
> cc: ceph-devel@vger.kernel.org
> cc: netdev@vger.kernel.org
> ---
>
> Notes:
>     ver #5)
>      - Switch condition for setting MSG_MORE in write_partial_message_data()
>
>  net/ceph/messenger_v1.c | 60 ++++++++++++++---------------------------
>  1 file changed, 20 insertions(+), 40 deletions(-)
>
> diff --git a/net/ceph/messenger_v1.c b/net/ceph/messenger_v1.c
> index d664cb1593a7..814579f27f04 100644
> --- a/net/ceph/messenger_v1.c
> +++ b/net/ceph/messenger_v1.c
> @@ -74,37 +74,6 @@ static int ceph_tcp_sendmsg(struct socket *sock, struct kvec *iov,
>         return r;
>  }
>
> -/*
> - * @more: either or both of MSG_MORE and MSG_SENDPAGE_NOTLAST
> - */
> -static int ceph_tcp_sendpage(struct socket *sock, struct page *page,
> -                            int offset, size_t size, int more)
> -{
> -       ssize_t (*sendpage)(struct socket *sock, struct page *page,
> -                           int offset, size_t size, int flags);
> -       int flags = MSG_DONTWAIT | MSG_NOSIGNAL | more;
> -       int ret;
> -
> -       /*
> -        * sendpage cannot properly handle pages with page_count == 0,
> -        * we need to fall back to sendmsg if that's the case.
> -        *
> -        * Same goes for slab pages: skb_can_coalesce() allows
> -        * coalescing neighboring slab objects into a single frag which
> -        * triggers one of hardened usercopy checks.
> -        */
> -       if (sendpage_ok(page))
> -               sendpage = sock->ops->sendpage;
> -       else
> -               sendpage = sock_no_sendpage;
> -
> -       ret = sendpage(sock, page, offset, size, flags);
> -       if (ret == -EAGAIN)
> -               ret = 0;
> -
> -       return ret;
> -}
> -
>  static void con_out_kvec_reset(struct ceph_connection *con)
>  {
>         BUG_ON(con->v1.out_skip);
> @@ -464,7 +433,6 @@ static int write_partial_message_data(struct ceph_connection *con)
>         struct ceph_msg *msg = con->out_msg;
>         struct ceph_msg_data_cursor *cursor = &msg->cursor;
>         bool do_datacrc = !ceph_test_opt(from_msgr(con->msgr), NOCRC);
> -       int more = MSG_MORE | MSG_SENDPAGE_NOTLAST;
>         u32 crc;
>
>         dout("%s %p msg %p\n", __func__, con, msg);
> @@ -482,6 +450,10 @@ static int write_partial_message_data(struct ceph_connection *con)
>          */
>         crc = do_datacrc ? le32_to_cpu(msg->footer.data_crc) : 0;
>         while (cursor->total_resid) {
> +               struct bio_vec bvec;
> +               struct msghdr msghdr = {
> +                       .msg_flags = MSG_SPLICE_PAGES,

Hi David,

This appears to be losing MSG_DONTWAIT | MSG_NOSIGNAL flags which were
set previously?

> +               };
>                 struct page *page;
>                 size_t page_offset;
>                 size_t length;
> @@ -493,10 +465,13 @@ static int write_partial_message_data(struct ceph_connection *con)
>                 }
>
>                 page = ceph_msg_data_next(cursor, &page_offset, &length);
> -               if (length == cursor->total_resid)
> -                       more = MSG_MORE;
> -               ret = ceph_tcp_sendpage(con->sock, page, page_offset, length,
> -                                       more);
> +               if (length != cursor->total_resid)
> +                       msghdr.msg_flags |= MSG_MORE;
> +
> +               bvec_set_page(&bvec, page, length, page_offset);
> +               iov_iter_bvec(&msghdr.msg_iter, ITER_SOURCE, &bvec, 1, length);
> +
> +               ret = sock_sendmsg(con->sock, &msghdr);
>                 if (ret <= 0) {

And this is losing munging -EAGAIN -> 0?

>                         if (do_datacrc)
>                                 msg->footer.data_crc = cpu_to_le32(crc);
> @@ -526,7 +501,10 @@ static int write_partial_message_data(struct ceph_connection *con)
>   */
>  static int write_partial_skip(struct ceph_connection *con)
>  {
> -       int more = MSG_MORE | MSG_SENDPAGE_NOTLAST;
> +       struct bio_vec bvec;
> +       struct msghdr msghdr = {
> +               .msg_flags = MSG_SPLICE_PAGES | MSG_MORE,
> +       };
>         int ret;
>
>         dout("%s %p %d left\n", __func__, con, con->v1.out_skip);
> @@ -534,9 +512,11 @@ static int write_partial_skip(struct ceph_connection *con)
>                 size_t size = min(con->v1.out_skip, (int)PAGE_SIZE);
>
>                 if (size == con->v1.out_skip)
> -                       more = MSG_MORE;
> -               ret = ceph_tcp_sendpage(con->sock, ceph_zero_page, 0, size,
> -                                       more);
> +                       msghdr.msg_flags &= ~MSG_MORE;
> +               bvec_set_page(&bvec, ZERO_PAGE(0), size, 0);
> +               iov_iter_bvec(&msghdr.msg_iter, ITER_SOURCE, &bvec, 1, size);
> +
> +               ret = sock_sendmsg(con->sock, &msghdr);
>                 if (ret <= 0)

Same here...  I would suggest that you keep the ceph_tcp_sendpage() function
and make only minimal modifications to avoid regressions.

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next v5 04/16] ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage()
  2023-06-23 22:55 ` [PATCH net-next v5 04/16] ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage() David Howells
@ 2023-06-25 12:34   ` Ilya Dryomov
  2023-06-26 15:30   ` David Howells
  1 sibling, 0 replies; 29+ messages in thread
From: Ilya Dryomov @ 2023-06-25 12:34 UTC (permalink / raw)
  To: David Howells
  Cc: netdev, Alexander Duyck, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern,
	Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel, Xiubo Li,
	Jeff Layton, ceph-devel

On Sat, Jun 24, 2023 at 12:55 AM David Howells <dhowells@redhat.com> wrote:
>
> Use sendmsg() and MSG_SPLICE_PAGES rather than sendpage in ceph when
> transmitting data.  For the moment, this can only transmit one page at a
> time because of the architecture of net/ceph/, but if
> write_partial_message_data() can be given a bvec[] at a time by the

Hi David,

write_partial_message_data() is net/ceph/messenger_v1.c specific, so it
doesn't apply here.  I would suggest squashing the two net/ceph patches
into one since even the titles are the same.

Also, we tend to use "libceph: " prefix for net/ceph changes.

> iteration code, this would allow pages to be sent in a batch.
>
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Ilya Dryomov <idryomov@gmail.com>
> cc: Xiubo Li <xiubli@redhat.com>
> cc: Jeff Layton <jlayton@kernel.org>
> cc: "David S. Miller" <davem@davemloft.net>
> cc: Eric Dumazet <edumazet@google.com>
> cc: Jakub Kicinski <kuba@kernel.org>
> cc: Paolo Abeni <pabeni@redhat.com>
> cc: Jens Axboe <axboe@kernel.dk>
> cc: Matthew Wilcox <willy@infradead.org>
> cc: ceph-devel@vger.kernel.org
> cc: netdev@vger.kernel.org
> ---
>  net/ceph/messenger_v2.c | 91 +++++++++--------------------------------
>  1 file changed, 19 insertions(+), 72 deletions(-)
>
> diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
> index 301a991dc6a6..87ac97073e75 100644
> --- a/net/ceph/messenger_v2.c
> +++ b/net/ceph/messenger_v2.c
> @@ -117,91 +117,38 @@ static int ceph_tcp_recv(struct ceph_connection *con)
>         return ret;
>  }
>
> -static int do_sendmsg(struct socket *sock, struct iov_iter *it)
> -{
> -       struct msghdr msg = { .msg_flags = CEPH_MSG_FLAGS };
> -       int ret;
> -
> -       msg.msg_iter = *it;
> -       while (iov_iter_count(it)) {
> -               ret = sock_sendmsg(sock, &msg);
> -               if (ret <= 0) {
> -                       if (ret == -EAGAIN)
> -                               ret = 0;
> -                       return ret;
> -               }
> -
> -               iov_iter_advance(it, ret);
> -       }
> -
> -       WARN_ON(msg_data_left(&msg));
> -       return 1;
> -}
> -
> -static int do_try_sendpage(struct socket *sock, struct iov_iter *it)
> -{
> -       struct msghdr msg = { .msg_flags = CEPH_MSG_FLAGS };
> -       struct bio_vec bv;
> -       int ret;
> -
> -       if (WARN_ON(!iov_iter_is_bvec(it)))
> -               return -EINVAL;
> -
> -       while (iov_iter_count(it)) {
> -               /* iov_iter_iovec() for ITER_BVEC */
> -               bvec_set_page(&bv, it->bvec->bv_page,
> -                             min(iov_iter_count(it),
> -                                 it->bvec->bv_len - it->iov_offset),
> -                             it->bvec->bv_offset + it->iov_offset);
> -
> -               /*
> -                * sendpage cannot properly handle pages with
> -                * page_count == 0, we need to fall back to sendmsg if
> -                * that's the case.
> -                *
> -                * Same goes for slab pages: skb_can_coalesce() allows
> -                * coalescing neighboring slab objects into a single frag
> -                * which triggers one of hardened usercopy checks.
> -                */
> -               if (sendpage_ok(bv.bv_page)) {
> -                       ret = sock->ops->sendpage(sock, bv.bv_page,
> -                                                 bv.bv_offset, bv.bv_len,
> -                                                 CEPH_MSG_FLAGS);
> -               } else {
> -                       iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bv, 1, bv.bv_len);
> -                       ret = sock_sendmsg(sock, &msg);
> -               }
> -               if (ret <= 0) {
> -                       if (ret == -EAGAIN)
> -                               ret = 0;
> -                       return ret;
> -               }
> -
> -               iov_iter_advance(it, ret);
> -       }
> -
> -       return 1;
> -}
> -
>  /*
>   * Write as much as possible.  The socket is expected to be corked,
> - * so we don't bother with MSG_MORE/MSG_SENDPAGE_NOTLAST here.
> + * so we don't bother with MSG_MORE here.
>   *
>   * Return:
> - *   1 - done, nothing (else) to write
> + *  >0 - done, nothing (else) to write

It would be nice to avoid making tweaks like this to the outer
interface as part of switching to a new internal API.

>   *   0 - socket is full, need to wait
>   *  <0 - error
>   */
>  static int ceph_tcp_send(struct ceph_connection *con)
>  {
> +       struct msghdr msg = {
> +               .msg_iter       = con->v2.out_iter,
> +               .msg_flags      = CEPH_MSG_FLAGS,
> +       };
>         int ret;
>
> +       if (WARN_ON(!iov_iter_is_bvec(&con->v2.out_iter)))
> +               return -EINVAL;

Previously, this WARN_ON + error applied only to the "try sendpage"
path.  There is a ton of kvec usage in net/ceph/messenger_v2.c, so I'm
pretty sure that placing it here breaks everything.

> +
> +       if (con->v2.out_iter_sendpage)
> +               msg.msg_flags |= MSG_SPLICE_PAGES;
> +
>         dout("%s con %p have %zu try_sendpage %d\n", __func__, con,
>              iov_iter_count(&con->v2.out_iter), con->v2.out_iter_sendpage);
> -       if (con->v2.out_iter_sendpage)
> -               ret = do_try_sendpage(con->sock, &con->v2.out_iter);
> -       else
> -               ret = do_sendmsg(con->sock, &con->v2.out_iter);
> +
> +       ret = sock_sendmsg(con->sock, &msg);
> +       if (ret > 0)
> +               iov_iter_advance(&con->v2.out_iter, ret);
> +       else if (ret == -EAGAIN)
> +               ret = 0;

Hrm, is sock_sendmsg() now guaranteed to exhaust the iterator (i.e.
a "short write" is no longer possible)?  Unless that is the case, this
is not an equivalent transformation.

This is actually the reason for

>   * Return:
>   *   1 - done, nothing (else) to write

specification which you also tweaked.  It doesn't make sense for
ceph_tcp_send() to return the number of bytes sent because the caller
expects everything to be sent when a positive number is returned.

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next v5 03/16] ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage
  2023-06-23 22:55 ` [PATCH net-next v5 03/16] ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage David Howells
  2023-06-25 12:20   ` Ilya Dryomov
@ 2023-06-26 14:00   ` David Howells
  2023-06-26 15:41     ` Ilya Dryomov
  2023-06-26 16:44     ` David Howells
  2023-06-26 15:12   ` David Howells
  2 siblings, 2 replies; 29+ messages in thread
From: David Howells @ 2023-06-26 14:00 UTC (permalink / raw)
  To: Ilya Dryomov
  Cc: dhowells, netdev, Alexander Duyck, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern,
	Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel, Xiubo Li,
	Jeff Layton, ceph-devel

Ilya Dryomov <idryomov@gmail.com> wrote:

> Same here...  I would suggest that you keep the ceph_tcp_sendpage() function
> and make only minimal modifications to avoid regressions.

This is now committed to net-next.  I can bring ceph_tcp_sendpage() back into
existence or fix it in place for now if you have a preference.

Note that I'm working on patches to rework the libceph transmission path so
that it isn't dealing with transmitting a single page at a time, but it's not
ready yet.

David


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next v5 03/16] ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage
  2023-06-23 22:55 ` [PATCH net-next v5 03/16] ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage David Howells
  2023-06-25 12:20   ` Ilya Dryomov
  2023-06-26 14:00   ` David Howells
@ 2023-06-26 15:12   ` David Howells
  2023-06-26 15:52     ` Ilya Dryomov
  2 siblings, 1 reply; 29+ messages in thread
From: David Howells @ 2023-06-26 15:12 UTC (permalink / raw)
  To: Ilya Dryomov
  Cc: dhowells, netdev, Alexander Duyck, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern,
	Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel, Xiubo Li,
	Jeff Layton, ceph-devel

Ilya Dryomov <idryomov@gmail.com> wrote:

> > -       int flags = MSG_DONTWAIT | MSG_NOSIGNAL | more;

Btw, why are you setting MSG_DONTWAIT?  If you're in the middle of
transmitting a message on a TCP socket, surely you can't just switch to
transmitting a different message on the same socket without doing some sort of
reframing?

David


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next v5 04/16] ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage()
  2023-06-23 22:55 ` [PATCH net-next v5 04/16] ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage() David Howells
  2023-06-25 12:34   ` Ilya Dryomov
@ 2023-06-26 15:30   ` David Howells
  2023-06-26 16:07     ` Ilya Dryomov
  2023-06-26 17:01     ` David Howells
  1 sibling, 2 replies; 29+ messages in thread
From: David Howells @ 2023-06-26 15:30 UTC (permalink / raw)
  To: Ilya Dryomov
  Cc: dhowells, netdev, Alexander Duyck, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern,
	Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel, Xiubo Li,
	Jeff Layton, ceph-devel

Ilya Dryomov <idryomov@gmail.com> wrote:

> write_partial_message_data() is net/ceph/messenger_v1.c specific, so it
> doesn't apply here.  I would suggest squashing the two net/ceph patches
> into one since even the titles are the same.

I would, but they're now applied to net-next, so we need to patch that.

> >   * Write as much as possible.  The socket is expected to be corked,
> > - * so we don't bother with MSG_MORE/MSG_SENDPAGE_NOTLAST here.
> > + * so we don't bother with MSG_MORE here.
> >   *
> >   * Return:
> > - *   1 - done, nothing (else) to write
> > + *  >0 - done, nothing (else) to write
> 
> It would be nice to avoid making tweaks like this to the outer
> interface as part of switching to a new internal API.

Ok.  I'll change that and wrap the sendmsg in a loop.  Though, as I asked in
an earlier reply, why is MSG_DONTWAIT used here?
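
For illustration, such a loop might follow the shape of the do_sendmsg()
helper the patch removed - a minimal sketch only (ceph_tcp_send_loop() is
an illustrative name, not code from this series):

static int ceph_tcp_send_loop(struct socket *sock, struct msghdr *msg)
{
	int ret;

	/* sock_sendmsg() advances msg->msg_iter as it transmits, so keep
	 * going until the iterator is drained, preserving the old return
	 * contract: 1 = done, 0 = socket full (-EAGAIN), <0 = error.
	 */
	while (msg_data_left(msg)) {
		ret = sock_sendmsg(sock, msg);
		if (ret <= 0)
			return ret == -EAGAIN ? 0 : ret;
	}
	return 1;
}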

> > +       if (WARN_ON(!iov_iter_is_bvec(&con->v2.out_iter)))
> > +               return -EINVAL;
> 
> Previously, this WARN_ON + error applied only to the "try sendpage"
> path.  There is a ton of kvec usage in net/ceph/messenger_v2.c, so I'm
> pretty sure that placing it here breaks everything.

This should have been removed as MSG_SPLICE_PAGES now accepts KVEC and XARRAY
iterators also.

Btw, is it feasible to use con->v2.out_iter_sendpage to apply MSG_SPLICE_PAGES
to the iterator to be transmitted as a whole?  It seems to be set depending on
iterator type.

David


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next v5 03/16] ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage
  2023-06-26 14:00   ` David Howells
@ 2023-06-26 15:41     ` Ilya Dryomov
  2023-06-26 16:44     ` David Howells
  1 sibling, 0 replies; 29+ messages in thread
From: Ilya Dryomov @ 2023-06-26 15:41 UTC (permalink / raw)
  To: David Howells
  Cc: netdev, Alexander Duyck, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern,
	Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel, Xiubo Li,
	Jeff Layton, ceph-devel

On Mon, Jun 26, 2023 at 4:00 PM David Howells <dhowells@redhat.com> wrote:
>
> Ilya Dryomov <idryomov@gmail.com> wrote:
>
> > Same here...  I would suggest that you keep the ceph_tcp_sendpage() function
> > and make only minimal modifications to avoid regressions.
>
> This is now committed to net-next.

This needs to be dropped from linux-next because both this and
especially the other (net/ceph/messenger_v2.c) patch introduce
regressions.

> I can bring ceph_tcp_sendpage() back into
> existence or fix it in place for now if you have a preference.

I already mentioned that I would prefer if ceph_tcp_sendpage() was
brought back into existence.

>
> Note that I'm working on patches to rework the libceph transmission path so
> that it isn't dealing with transmitting a single page at a time, but it's not
> ready yet.

That is a worthwhile improvement now that sock_sendmsg() can take
advantage of multiple pages!  It would be pretty invasive though, so
I think ceph_tcp_sendpage() should remain in place until then.

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next v5 03/16] ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage
  2023-06-26 15:12   ` David Howells
@ 2023-06-26 15:52     ` Ilya Dryomov
  0 siblings, 0 replies; 29+ messages in thread
From: Ilya Dryomov @ 2023-06-26 15:52 UTC (permalink / raw)
  To: David Howells
  Cc: netdev, Alexander Duyck, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern,
	Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel, Xiubo Li,
	Jeff Layton, ceph-devel

On Mon, Jun 26, 2023 at 5:12 PM David Howells <dhowells@redhat.com> wrote:
>
> Ilya Dryomov <idryomov@gmail.com> wrote:
>
> > > -       int flags = MSG_DONTWAIT | MSG_NOSIGNAL | more;
>
> Btw, why are you setting MSG_DONTWAIT?  If you're in the middle of
> transmitting a message on a TCP socket, surely you can't just switch to
> transmitting a different message on the same socket without doing some sort of
> reframing?

We don't want to hog kworker threads.  You are correct that we can't
switch to transmitting a different message on the same socket but Ceph
is massively parallel and there can be dozens or even hundreds of other
sockets to work on.

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next v5 04/16] ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage()
  2023-06-26 15:30   ` David Howells
@ 2023-06-26 16:07     ` Ilya Dryomov
  2023-06-26 17:01     ` David Howells
  1 sibling, 0 replies; 29+ messages in thread
From: Ilya Dryomov @ 2023-06-26 16:07 UTC (permalink / raw)
  To: David Howells
  Cc: netdev, Alexander Duyck, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern,
	Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel, Xiubo Li,
	Jeff Layton, ceph-devel

On Mon, Jun 26, 2023 at 5:30 PM David Howells <dhowells@redhat.com> wrote:
>
> Ilya Dryomov <idryomov@gmail.com> wrote:
>
> > write_partial_message_data() is net/ceph/messenger_v1.c specific, so it
> > doesn't apply here.  I would suggest squashing the two net/ceph patches
> > into one since even the titles are the same.
>
> I would, but they're now applied to net-next, so we need to patch that.

I don't see a problem with that given that the patches themselves have
major issues (i.e. it's not just a commit message/title nit).

>
> > >   * Write as much as possible.  The socket is expected to be corked,
> > > - * so we don't bother with MSG_MORE/MSG_SENDPAGE_NOTLAST here.
> > > + * so we don't bother with MSG_MORE here.
> > >   *
> > >   * Return:
> > > - *   1 - done, nothing (else) to write
> > > + *  >0 - done, nothing (else) to write
> >
> > It would be nice to avoid making tweaks like this to the outer
> > interface as part of switching to a new internal API.
>
> Ok.  I'll change that and wrap the sendmsg in a loop.  Though, as I asked in
> an earlier reply, why is MSG_DONTWAIT used here?

See my reply there.

>
> > > +       if (WARN_ON(!iov_iter_is_bvec(&con->v2.out_iter)))
> > > +               return -EINVAL;
> >
> > Previously, this WARN_ON + error applied only to the "try sendpage"
> > path.  There is a ton of kvec usage in net/ceph/messenger_v2.c, so I'm
> > pretty sure that placing it here breaks everything.
>
> This should have been removed as MSG_SPLICE_PAGES now accepts KVEC and XARRAY
> iterators also.
>
> Btw, is it feasible to use con->v2.out_iter_sendpage to apply MSG_SPLICE_PAGES
> to the iterator to be transmitted as a whole?  It seems to be set depending on
> iterator type.

I'm not sure I understand what you mean by "transmitted as a whole".
con->v2.out_iter_sendpage is set only when zerocopy is desired.  If the
underlying data is not guaranteed to remain stable, zerocopy behavior
is not safe.

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next v5 03/16] ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage
  2023-06-26 14:00   ` David Howells
  2023-06-26 15:41     ` Ilya Dryomov
@ 2023-06-26 16:44     ` David Howells
  1 sibling, 0 replies; 29+ messages in thread
From: David Howells @ 2023-06-26 16:44 UTC (permalink / raw)
  To: Ilya Dryomov
  Cc: dhowells, netdev, Alexander Duyck, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern,
	Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel, Xiubo Li,
	Jeff Layton, ceph-devel

Ilya Dryomov <idryomov@gmail.com> wrote:

> > This is now committed to net-next.
> 
> This needs to be dropped from linux-next because both this and
> especially the other (net/ceph/messenger_v2.c) patch introduce
> regressions.

net-next, not linux-next.  I'm not sure they drop things from there rather
than reverting them.

David


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next v5 04/16] ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage()
  2023-06-26 15:30   ` David Howells
  2023-06-26 16:07     ` Ilya Dryomov
@ 2023-06-26 17:01     ` David Howells
  1 sibling, 0 replies; 29+ messages in thread
From: David Howells @ 2023-06-26 17:01 UTC (permalink / raw)
  To: Ilya Dryomov
  Cc: dhowells, netdev, Alexander Duyck, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern,
	Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel, Xiubo Li,
	Jeff Layton, ceph-devel

Ilya Dryomov <idryomov@gmail.com> wrote:

> > Btw, is it feasible to use con->v2.out_iter_sendpage to apply
> > MSG_SPLICE_PAGES to the iterator to be transmitted as a whole?  It seems
> > to be set depending on iterator type.
> 
> I'm not sure I understand what you mean by "transmitted as a whole".
> con->v2.out_iter_sendpage is set only when zerocopy is desired.  If the
> underlying data is not guaranteed to remain stable, zerocopy behavior
> is not safe.

I think I need to reinstate the per-page sendpage_ok() check here also -
though Al pointed out it isn't sufficiently exhaustive: there are pages for
which sendpage_ok() returns true that still shouldn't be passed to sendpage().
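
A minimal sketch of what reinstating that check could look like - clear
MSG_SPLICE_PAGES (so sendmsg() falls back to copying the data) whenever
sendpage_ok() fails, much as the old ceph_tcp_sendpage() fell back to
sock_no_sendpage() and as the nvme-tcp conversion in this series does;
send_one_page() is an illustrative name:

static int send_one_page(struct socket *sock, struct page *page,
			 int offset, size_t length, bool more)
{
	struct bio_vec bvec;
	struct msghdr msg = {
		.msg_flags = MSG_SPLICE_PAGES | MSG_DONTWAIT | MSG_NOSIGNAL,
	};

	if (more)
		msg.msg_flags |= MSG_MORE;

	/* Only splice pages that sendpage_ok() approves of; otherwise
	 * drop MSG_SPLICE_PAGES and let sendmsg() copy the data.
	 */
	if (!sendpage_ok(page))
		msg.msg_flags &= ~MSG_SPLICE_PAGES;

	bvec_set_page(&bvec, page, length, offset);
	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, length);
	return sock_sendmsg(sock, &msg);
}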

I'll whip up a patch to partially revert this also.

David


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next v5 11/16] scsi: iscsi_tcp: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage
  2023-06-23 22:55 ` [PATCH net-next v5 11/16] scsi: iscsi_tcp: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage David Howells
@ 2023-06-27 19:02   ` Chris Leech
  0 siblings, 0 replies; 29+ messages in thread
From: Chris Leech @ 2023-06-27 19:02 UTC (permalink / raw)
  To: David Howells
  Cc: netdev, Alexander Duyck, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern,
	Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel,
	Mike Christie, Lee Duncan, James E.J. Bottomley,
	Martin K. Petersen, Al Viro, open-iscsi, linux-scsi,
	target-devel

On Fri, Jun 23, 2023 at 11:55:08PM +0100, David Howells wrote:
> Use sendmsg() with MSG_SPLICE_PAGES rather than sendpage.  This allows
> multiple pages and multipage folios to be passed through.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> Reviewed-by: Mike Christie <michael.christie@oracle.com>
> cc: Lee Duncan <lduncan@suse.com>
> cc: Chris Leech <cleech@redhat.com>
> cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
> cc: "Martin K. Petersen" <martin.petersen@oracle.com>
> cc: "David S. Miller" <davem@davemloft.net>
> cc: Eric Dumazet <edumazet@google.com>
> cc: Jakub Kicinski <kuba@kernel.org>
> cc: Paolo Abeni <pabeni@redhat.com>
> cc: Jens Axboe <axboe@kernel.dk>
> cc: Matthew Wilcox <willy@infradead.org>
> cc: Al Viro <viro@zeniv.linux.org.uk>
> cc: open-iscsi@googlegroups.com
> cc: linux-scsi@vger.kernel.org
> cc: target-devel@vger.kernel.org
> cc: netdev@vger.kernel.org
> ---
> 
> Notes:
>     ver #5)
>      - Split iscsi changes into client and target patches
> 
>  drivers/scsi/iscsi_tcp.c | 26 ++++++++++----------------
>  drivers/scsi/iscsi_tcp.h |  2 --
>  2 files changed, 10 insertions(+), 18 deletions(-)

This seems good to me.

Reviewed-by: Chris Leech <cleech@redhat.com>
 
> diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
> index 9637d4bc2bc9..9ab8555180a3 100644
> --- a/drivers/scsi/iscsi_tcp.c
> +++ b/drivers/scsi/iscsi_tcp.c
> @@ -301,35 +301,32 @@ static int iscsi_sw_tcp_xmit_segment(struct iscsi_tcp_conn *tcp_conn,
>  
>  	while (!iscsi_tcp_segment_done(tcp_conn, segment, 0, r)) {
>  		struct scatterlist *sg;
> +		struct msghdr msg = {};
> +		struct bio_vec bv;
>  		unsigned int offset, copy;
> -		int flags = 0;
>  
>  		r = 0;
>  		offset = segment->copied;
>  		copy = segment->size - offset;
>  
>  		if (segment->total_copied + segment->size < segment->total_size)
> -			flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST;
> +			msg.msg_flags |= MSG_MORE;
>  
>  		if (tcp_sw_conn->queue_recv)
> -			flags |= MSG_DONTWAIT;
> +			msg.msg_flags |= MSG_DONTWAIT;
>  
> -		/* Use sendpage if we can; else fall back to sendmsg */
>  		if (!segment->data) {
> +			if (!tcp_conn->iscsi_conn->datadgst_en)
> +				msg.msg_flags |= MSG_SPLICE_PAGES;
>  			sg = segment->sg;
>  			offset += segment->sg_offset + sg->offset;
> -			r = tcp_sw_conn->sendpage(sk, sg_page(sg), offset,
> -						  copy, flags);
> +			bvec_set_page(&bv, sg_page(sg), copy, offset);
>  		} else {
> -			struct msghdr msg = { .msg_flags = flags };
> -			struct kvec iov = {
> -				.iov_base = segment->data + offset,
> -				.iov_len = copy
> -			};
> -
> -			r = kernel_sendmsg(sk, &msg, &iov, 1, copy);
> +			bvec_set_virt(&bv, segment->data + offset, copy);
>  		}
> +		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bv, 1, copy);
>  
> +		r = sock_sendmsg(sk, &msg);
>  		if (r < 0) {
>  			iscsi_tcp_segment_unmap(segment);
>  			return r;
> @@ -746,7 +743,6 @@ iscsi_sw_tcp_conn_bind(struct iscsi_cls_session *cls_session,
>  	sock_no_linger(sk);
>  
>  	iscsi_sw_tcp_conn_set_callbacks(conn);
> -	tcp_sw_conn->sendpage = tcp_sw_conn->sock->ops->sendpage;
>  	/*
>  	 * set receive state machine into initial state
>  	 */
> @@ -777,8 +773,6 @@ static int iscsi_sw_tcp_conn_set_param(struct iscsi_cls_conn *cls_conn,
>  			return -ENOTCONN;
>  		}
>  		iscsi_set_param(cls_conn, param, buf, buflen);
> -		tcp_sw_conn->sendpage = conn->datadgst_en ?
> -			sock_no_sendpage : tcp_sw_conn->sock->ops->sendpage;
>  		mutex_unlock(&tcp_sw_conn->sock_lock);
>  		break;
>  	case ISCSI_PARAM_MAX_R2T:
> diff --git a/drivers/scsi/iscsi_tcp.h b/drivers/scsi/iscsi_tcp.h
> index 68e14a344904..89a6fc552f0b 100644
> --- a/drivers/scsi/iscsi_tcp.h
> +++ b/drivers/scsi/iscsi_tcp.h
> @@ -47,8 +47,6 @@ struct iscsi_sw_tcp_conn {
>  	/* MIB custom statistics */
>  	uint32_t		sendpage_failures_cnt;
>  	uint32_t		discontiguous_hdr_cnt;
> -
> -	ssize_t (*sendpage)(struct socket *, struct page *, int, size_t, int);
>  };
>  
>  struct iscsi_sw_tcp_host {
> 
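
For reference, the pattern the hunk above converges on, reduced to a
self-contained sketch.  xmit_chunk() and its parameter list are hypothetical;
the bvec_set_page()/bvec_set_virt(), iov_iter_bvec() and sock_sendmsg() calls
are the kernel APIs used in the diff:

	#include <linux/bvec.h>
	#include <linux/net.h>
	#include <linux/socket.h>
	#include <linux/uio.h>

	/*
	 * Illustrative helper mirroring the converted
	 * iscsi_sw_tcp_xmit_segment() path: wrap either a page fragment or
	 * a linear kernel buffer in a single bio_vec and hand it to
	 * sock_sendmsg().
	 */
	static int xmit_chunk(struct socket *sock, struct page *page,
			      unsigned int offset, void *data,
			      unsigned int len, bool more, bool digest_en)
	{
		struct bio_vec bv;
		struct msghdr msg = { .msg_flags = more ? MSG_MORE : 0 };

		if (page) {
			/* Page-backed data: ask for splicing unless a data
			 * digest has to be computed over the payload
			 * (mirrors the datadgst_en check in the hunk above).
			 */
			if (!digest_en)
				msg.msg_flags |= MSG_SPLICE_PAGES;
			bvec_set_page(&bv, page, len, offset);
		} else {
			/* Linear buffer: wrap it in a bvec and let
			 * sendmsg() copy it.
			 */
			bvec_set_virt(&bv, data, len);
		}

		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bv, 1, len);
		return sock_sendmsg(sock, &msg);
	}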



end of thread

Thread overview: 29+ messages
2023-06-23 22:54 [PATCH net-next v5 00/16] splice, net: Switch over users of sendpage() and remove it David Howells
2023-06-23 22:54 ` [PATCH net-next v5 01/16] tcp_bpf, smc, tls, espintcp, siw: Reduce MSG_SENDPAGE_NOTLAST usage David Howells
2023-06-23 22:54 ` [PATCH net-next v5 02/16] net: Use sendmsg(MSG_SPLICE_PAGES) not sendpage in skb_send_sock() David Howells
2023-06-23 22:55 ` [PATCH net-next v5 03/16] ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage David Howells
2023-06-25 12:20   ` Ilya Dryomov
2023-06-26 14:00   ` David Howells
2023-06-26 15:41     ` Ilya Dryomov
2023-06-26 16:44     ` David Howells
2023-06-26 15:12   ` David Howells
2023-06-26 15:52     ` Ilya Dryomov
2023-06-23 22:55 ` [PATCH net-next v5 04/16] ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage() David Howells
2023-06-25 12:34   ` Ilya Dryomov
2023-06-26 15:30   ` David Howells
2023-06-26 16:07     ` Ilya Dryomov
2023-06-26 17:01     ` David Howells
2023-06-23 22:55 ` [PATCH net-next v5 05/16] rds: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage David Howells
2023-06-23 22:55 ` [PATCH net-next v5 06/16] dlm: " David Howells
2023-06-23 22:55 ` [PATCH net-next v5 07/16] nvme-tcp: Use sendmsg(MSG_SPLICE_PAGES) rather then sendpage David Howells
2023-06-23 22:55 ` [PATCH net-next v5 08/16] nvmet-tcp: " David Howells
2023-06-23 22:55 ` [PATCH net-next v5 09/16] smc: Drop smc_sendpage() in favour of smc_sendmsg() + MSG_SPLICE_PAGES David Howells
2023-06-23 22:55 ` [PATCH net-next v5 10/16] drbd: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage() David Howells
2023-06-23 22:55 ` [PATCH net-next v5 11/16] scsi: iscsi_tcp: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage David Howells
2023-06-27 19:02   ` Chris Leech
2023-06-23 22:55 ` [PATCH net-next v5 12/16] scsi: target: iscsi: " David Howells
2023-06-23 22:55 ` [PATCH net-next v5 13/16] ocfs2: Fix use of slab data with sendpage David Howells
2023-06-23 22:55 ` [PATCH net-next v5 14/16] ocfs2: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage() David Howells
2023-06-23 22:55 ` [PATCH net-next v5 15/16] sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) David Howells
2023-06-23 22:55 ` [PATCH net-next v5 16/16] net: Kill MSG_SENDPAGE_NOTLAST David Howells
2023-06-24 23:00 ` [PATCH net-next v5 00/16] splice, net: Switch over users of sendpage() and remove it patchwork-bot+netdevbpf
