From: Tom Talpey <tom@talpey.com>
To: Chuck Lever <chuck.lever@oracle.com>,
linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org
Subject: Re: [PATCH v2 5/6] xprtrdma: Pad optimization, revisited
Date: Wed, 3 Feb 2021 13:13:59 -0500
Message-ID: <f2aad824-4449-be60-a39f-bb317764b090@talpey.com>
In-Reply-To: <161236945965.1030487.13894327853038566730.stgit@manet.1015granger.net>

This is a safe and obviously warranted processing revision.
The changelog is quite an eyeful for a one-liner, and maybe only
makes sense to the truly dedicated reader. But...

Reviewed-By: Tom Talpey <tom@talpey.com>

On 2/3/2021 11:24 AM, Chuck Lever wrote:
> The NetApp Linux team discovered that with NFS/RDMA servers that do
> not support RFC 8797, the Linux client is forming NFSv4.x WRITE
> requests incorrectly.
>
> In this case, the Linux NFS client disables implicit chunk round-up
> for odd-length Read and Write chunks. The goal was to support old
> servers that needed that padding to be sent explicitly by clients.
>
> In that case the Linux NFS client included the tail kvec in the
> Read chunk, since the tail contains any needed padding. That meant a
> separate memory registration was needed for the tail kvec, adding to the cost
> of forming such requests. To avoid that cost for a mere 3 bytes of
> zeroes that are always ignored by receivers, we try to use implicit
> roundup when possible.
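
For readers following the "3 bytes of zeroes" remark: XDR encodes data in 4-byte units, so an odd-length payload needs between 1 and 3 pad bytes. The sketch below only illustrates that arithmetic. The names `XDR_UNIT`, `pad_bytes` and `padded_len` are hypothetical and are not taken from the patch or from the kernel sources.

```c
#include <stddef.h>

/* XDR encodes data in 4-byte units (assumed constant, hypothetical name). */
#define XDR_UNIT 4u

/* Zero bytes needed to round len up to a 4-byte boundary (0..3). */
static size_t pad_bytes(size_t len)
{
	return (XDR_UNIT - (len & (XDR_UNIT - 1))) & (XDR_UNIT - 1);
}

/* Length of the payload once the XDR pad is included. */
static size_t padded_len(size_t len)
{
	return len + pad_bytes(len);
}
```

With these helpers a 4097-byte WRITE payload needs 3 pad bytes, while a 4096-byte payload needs none, which is why the pad is never worth a separate memory registration.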
>
> For NFSv4.x, the tail kvec also sometimes contains a trailing
> GETATTR operation. The Linux NFS client is unintentionally
> including that GETATTR operation in the Read chunk as well as
> inline. Fortunately, servers ignore this craziness and go about
> their normal business.
>
> The fix is simply to /never/ include the tail kvec when forming a
> data payload Read chunk.
>
> Note that since commit 9ed5af268e88 ("SUNRPC: Clean up the handling
> of page padding in rpc_prepare_reply_pages()") the NFS client passes
> payload data to the transport with the padding in xdr->pages instead
> of in the send buffer's tail kvec. So now the Linux NFS client
> appends XDR padding to all odd-sized Read chunks. This shouldn't be
> a problem because:
>
> - RFC 8166-compliant servers are supposed to work with or without
>   that XDR padding in Read chunks.
>
> - Since the padding is now in the same memory region as the data
>   payload, a separate memory registration is not needed. In
>   addition, the link layer extends data in RDMA Read responses to
>   4-byte boundaries anyway. Thus there is now no savings when the
>   padding is not included.
>
> Because older kernels include the payload's XDR padding in the
> tail kvec, a fix there will be more complicated. Thus backporting
> this patch is not recommended.
>
> Reported-by: Olga Kornievskaia <Olga.Kornievskaia@netapp.com>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> net/sunrpc/xprtrdma/rpc_rdma.c | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
> index f0af89a43efd..f1b52f9ab242 100644
> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
> @@ -257,10 +257,7 @@ rpcrdma_convert_iovs(struct rpcrdma_xprt *r_xprt, struct xdr_buf *xdrbuf,
>  			page_base = 0;
>  	}
>  
> -	/* When encoding a Read chunk, the tail iovec contains an
> -	 * XDR pad and may be omitted.
> -	 */
> -	if (type == rpcrdma_readch && r_xprt->rx_ep->re_implicit_roundup)
> +	if (type == rpcrdma_readch)
>  		goto out;
>  
>  	/* When encoding a Write chunk, some servers need to see an
>
>
>
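
As a reading aid for the one-line change above, here is a minimal stand-alone model of the Read chunk check before and after the patch. It is a sketch under stated assumptions, not the kernel code: `enum chunk_type` and both function names are hypothetical, and only the Read chunk branch shown in the diff is modeled.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's chunk types; only the Read
 * chunk case from the diff is modeled here.
 */
enum chunk_type { CHUNK_READ, CHUNK_OTHER };

/* Before the patch: the tail kvec was left out of a Read chunk only
 * when implicit round-up was enabled.
 */
static bool readch_skips_tail_before(enum chunk_type type,
				     bool implicit_roundup)
{
	return type == CHUNK_READ && implicit_roundup;
}

/* After the patch: the tail kvec is never part of a Read chunk,
 * regardless of the round-up setting.
 */
static bool readch_skips_tail_after(enum chunk_type type,
				    bool implicit_roundup)
{
	(void)implicit_roundup;	/* no longer consulted */
	return type == CHUNK_READ;
}
```

The only input for which the two versions differ is a Read chunk with implicit round-up disabled, which is the case where the tail kvec (and any trailing GETATTR in it) used to be pulled into the chunk.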
Thread overview: 16+ messages
2021-02-03 16:23 [PATCH v2 0/6] RPC/RDMA client fixes Chuck Lever
2021-02-03 16:23 ` [PATCH v2 1/6] xprtrdma: Remove FMR support in rpcrdma_convert_iovs() Chuck Lever
2021-02-03 18:06 ` Tom Talpey
2021-02-03 18:09 ` Chuck Lever
2021-02-03 16:24 ` [PATCH v2 2/6] xprtrdma: Simplify rpcrdma_convert_kvec() and frwr_map() Chuck Lever
2021-02-03 18:07 ` Tom Talpey
2021-02-03 16:24 ` [PATCH v2 3/6] xprtrdma: Refactor invocations of offset_in_page() Chuck Lever
2021-02-03 18:09 ` Tom Talpey
2021-02-03 18:11 ` Chuck Lever
2021-02-03 18:19 ` Tom Talpey
2021-02-03 16:24 ` [PATCH v2 4/6] rpcrdma: Fix comments about reverse-direction operation Chuck Lever
2021-02-03 18:10 ` Tom Talpey
2021-02-03 16:24 ` [PATCH v2 5/6] xprtrdma: Pad optimization, revisited Chuck Lever
2021-02-03 18:13 ` Tom Talpey [this message]
2021-02-03 16:24 ` [PATCH v2 6/6] rpcrdma: Capture bytes received in Receive completion tracepoints Chuck Lever
2021-02-03 18:14 ` Tom Talpey