From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Chuck Lever <chuck.lever@oracle.com>,
Daire Byrne <daire@dneg.com>, Sasha Levin <sashal@kernel.org>,
linux-nfs@vger.kernel.org, netdev@vger.kernel.org
Subject: [PATCH AUTOSEL 5.10 03/45] SUNRPC: Handle TCP socket sends with kernel_sendpage() again
Date: Tue, 19 Jan 2021 20:25:20 -0500 [thread overview]
Message-ID: <20210120012602.769683-3-sashal@kernel.org> (raw)
In-Reply-To: <20210120012602.769683-1-sashal@kernel.org>
From: Chuck Lever <chuck.lever@oracle.com>
[ Upstream commit 4a85a6a3320b4a622315d2e0ea91a1d2b013bce4 ]
Daire Byrne reports a ~50% aggregrate throughput regression on his
Linux NFS server after commit da1661b93bf4 ("SUNRPC: Teach server to
use xprt_sock_sendmsg for socket sends"), which replaced
kernel_send_page() calls in NFSD's socket send path with calls to
sock_sendmsg() using iov_iter.
Investigation showed that tcp_sendmsg() was not using zero-copy to
send the xdr_buf's bvec pages, but instead was relying on memcpy.
This means copying every byte of a large NFS READ payload.
It looks like TLS sockets do indeed support a ->sendpage method,
so it's really not necessary to use xprt_sock_sendmsg() to support
TLS fully on the server. A mechanical reversion of da1661b93bf4 is
not possible at this point, but we can re-implement the server's
TCP socket sendmsg path using kernel_sendpage().
Reported-by: Daire Byrne <daire@dneg.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=209439
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/sunrpc/svcsock.c | 86 +++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 85 insertions(+), 1 deletion(-)
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index c2752e2b9ce34..4404c491eb388 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1062,6 +1062,90 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp)
return 0; /* record not complete */
}
+static int svc_tcp_send_kvec(struct socket *sock, const struct kvec *vec,
+ int flags)
+{
+ return kernel_sendpage(sock, virt_to_page(vec->iov_base),
+ offset_in_page(vec->iov_base),
+ vec->iov_len, flags);
+}
+
+/*
+ * kernel_sendpage() is used exclusively to reduce the number of
+ * copy operations in this path. Therefore the caller must ensure
+ * that the pages backing @xdr are unchanging.
+ *
+ * In addition, the logic assumes that * .bv_len is never larger
+ * than PAGE_SIZE.
+ */
+static int svc_tcp_sendmsg(struct socket *sock, struct msghdr *msg,
+ struct xdr_buf *xdr, rpc_fraghdr marker,
+ unsigned int *sentp)
+{
+ const struct kvec *head = xdr->head;
+ const struct kvec *tail = xdr->tail;
+ struct kvec rm = {
+ .iov_base = &marker,
+ .iov_len = sizeof(marker),
+ };
+ int flags, ret;
+
+ *sentp = 0;
+ xdr_alloc_bvec(xdr, GFP_KERNEL);
+
+ msg->msg_flags = MSG_MORE;
+ ret = kernel_sendmsg(sock, msg, &rm, 1, rm.iov_len);
+ if (ret < 0)
+ return ret;
+ *sentp += ret;
+ if (ret != rm.iov_len)
+ return -EAGAIN;
+
+ flags = head->iov_len < xdr->len ? MSG_MORE | MSG_SENDPAGE_NOTLAST : 0;
+ ret = svc_tcp_send_kvec(sock, head, flags);
+ if (ret < 0)
+ return ret;
+ *sentp += ret;
+ if (ret != head->iov_len)
+ goto out;
+
+ if (xdr->page_len) {
+ unsigned int offset, len, remaining;
+ struct bio_vec *bvec;
+
+ bvec = xdr->bvec;
+ offset = xdr->page_base;
+ remaining = xdr->page_len;
+ flags = MSG_MORE | MSG_SENDPAGE_NOTLAST;
+ while (remaining > 0) {
+ if (remaining <= PAGE_SIZE && tail->iov_len == 0)
+ flags = 0;
+ len = min(remaining, bvec->bv_len);
+ ret = kernel_sendpage(sock, bvec->bv_page,
+ bvec->bv_offset + offset,
+ len, flags);
+ if (ret < 0)
+ return ret;
+ *sentp += ret;
+ if (ret != len)
+ goto out;
+ remaining -= len;
+ offset = 0;
+ bvec++;
+ }
+ }
+
+ if (tail->iov_len) {
+ ret = svc_tcp_send_kvec(sock, tail, 0);
+ if (ret < 0)
+ return ret;
+ *sentp += ret;
+ }
+
+out:
+ return 0;
+}
+
/**
* svc_tcp_sendto - Send out a reply on a TCP socket
* @rqstp: completed svc_rqst
@@ -1089,7 +1173,7 @@ static int svc_tcp_sendto(struct svc_rqst *rqstp)
mutex_lock(&xprt->xpt_mutex);
if (svc_xprt_is_dead(xprt))
goto out_notconn;
- err = xprt_sock_sendmsg(svsk->sk_sock, &msg, xdr, 0, marker, &sent);
+ err = svc_tcp_sendmsg(svsk->sk_sock, &msg, xdr, marker, &sent);
xdr_free_bvec(xdr);
trace_svcsock_tcp_send(xprt, err < 0 ? err : sent);
if (err < 0 || sent != (xdr->len + sizeof(marker)))
--
2.27.0
next prev parent reply other threads:[~2021-01-20 1:29 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-01-20 1:25 [PATCH AUTOSEL 5.10 01/45] ASoC: Intel: haswell: Add missing pm_ops Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 02/45] ASoC: rt711: mutex between calibration and power state changes Sasha Levin
2021-01-20 1:25 ` Sasha Levin [this message]
2021-02-08 19:34 ` [PATCH AUTOSEL 5.10 03/45] SUNRPC: Handle TCP socket sends with kernel_sendpage() again Trond Myklebust
2021-02-08 19:48 ` Chuck Lever
2021-02-08 20:12 ` Trond Myklebust
2021-02-08 20:17 ` Chuck Lever
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 04/45] ASoC: AMD Renoir - add DMI entry for Lenovo ThinkPad E14 Gen 2 Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 05/45] HID: multitouch: Enable multi-input for Synaptics pointstick/touchpad device Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 06/45] HID: sony: select CONFIG_CRC32 Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 07/45] dm integrity: select CRYPTO_SKCIPHER Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 08/45] x86/hyperv: Fix kexec panic/hang issues Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 09/45] scsi: ufs: Relax the condition of UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 10/45] scsi: ufs: Correct the LUN used in eh_device_reset_handler() callback Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 11/45] scsi: qedi: Correct max length of CHAP secret Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 12/45] scsi: scsi_debug: Fix memleak in scsi_debug_init() Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 13/45] scsi: sd: Suppress spurious errors when WRITE SAME is being disabled Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 14/45] riscv: Fix kernel time_init() Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 15/45] riscv: Fix sifive serial driver Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 16/45] riscv: Enable interrupts during syscalls with M-Mode Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 17/45] HID: logitech-dj: add the G602 receiver Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 18/45] HID: Ignore battery for Elan touchscreen on ASUS UX550 Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 19/45] clk: tegra30: Add hda clock default rates to clock driver Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 20/45] ALSA: hda/tegra: fix tegra-hda on tegra30 soc Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 21/45] r8152: Add Lenovo Powered USB-C Travel Hub Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 22/45] riscv: cacheinfo: Fix using smp_processor_id() in preemptible Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 23/45] arm64: make atomic helpers __always_inline Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 24/45] xen: Fix event channel callback via INTX/GSI Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 25/45] x86/xen: Add xen_no_vector_callback option to test PCI INTX delivery Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 26/45] x86/xen: Fix xen_hvm_smp_init() when vector callback not available Sasha Levin
2021-01-20 1:35 ` Boris Ostrovsky
2021-01-24 13:11 ` Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 27/45] net: stmmac: use __napi_schedule() for PREEMPT_RT Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 28/45] net: stmmac: Fixed mtu channged by cache aligned Sasha Levin
2021-01-20 6:08 ` Jakub Kicinski
2021-01-20 14:26 ` Sasha Levin
2021-01-21 14:39 ` [Linux-stm32] " Ahmad Fatoum
2021-01-21 16:02 ` Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 29/45] dts: phy: fix missing mdio device and probe failure of vsc8541-01 device Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 30/45] dts: phy: add GPIO number and active state used for phy reset Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 31/45] riscv: defconfig: enable gpio support for HiFive Unleashed Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 32/45] drm/amdgpu/psp: fix psp gfx ctrl cmds Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 33/45] drm/amd/display: disable dcn10 pipe split by default Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 34/45] HID: logitech-hidpp: Add product ID for MX Ergo in Bluetooth mode Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 35/45] drm/amd/display: Fix to be able to stop crc calculation Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 36/45] drm/nouveau/bios: fix issue shadowing expansion ROMs Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 37/45] drm/nouveau/privring: ack interrupts the same way as RM Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 38/45] drm/nouveau/i2c/gm200: increase width of aux semaphore owner fields Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 39/45] drm/nouveau/mmu: fix vram heap sizing Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 40/45] drm/nouveau/kms/nv50-: fix case where notifier buffer is at offset 0 Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 41/45] io_uring: flush timeouts that should already have expired Sasha Levin
2021-01-20 1:25 ` [PATCH AUTOSEL 5.10 42/45] libperf tests: If a test fails return non-zero Sasha Levin
2021-01-20 1:26 ` [PATCH AUTOSEL 5.10 43/45] libperf tests: Fail when failing to get a tracepoint id Sasha Levin
2021-01-20 1:26 ` [PATCH AUTOSEL 5.10 44/45] RISC-V: Set current memblock limit Sasha Levin
2021-01-20 1:26 ` [PATCH AUTOSEL 5.10 45/45] RISC-V: Fix maximum allowed phsyical memory for RV32 Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210120012602.769683-3-sashal@kernel.org \
--to=sashal@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=daire@dneg.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).