linux-nfs.vger.kernel.org archive mirror
From: Tom Talpey <tom@talpey.com>
To: Chuck Lever <chuck.lever@oracle.com>,
	linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org
Subject: Re: [PATCH v2 5/6] xprtrdma: Pad optimization, revisited
Date: Wed, 3 Feb 2021 13:13:59 -0500
Message-ID: <f2aad824-4449-be60-a39f-bb317764b090@talpey.com>
In-Reply-To: <161236945965.1030487.13894327853038566730.stgit@manet.1015granger.net>

This is a safe and obviously warranted processing revision.

The changelog is quite an eyeful for a one-liner, and maybe only
makes sense to the truly dedicated reader. But...

Reviewed-by: Tom Talpey <tom@talpey.com>

On 2/3/2021 11:24 AM, Chuck Lever wrote:
> The NetApp Linux team discovered that with NFS/RDMA servers that do
> not support RFC 8797, the Linux client is forming NFSv4.x WRITE
> requests incorrectly.
> 
> In this case, the Linux NFS client disables implicit chunk round-up
> for odd-length Read and Write chunks. The goal was to support old
> servers that needed that padding to be sent explicitly by clients.
> 
> In that case the Linux NFS client included the tail kvec in the Read chunk,
> since the tail contains any needed padding. That meant a separate
> memory registration is needed for the tail kvec, adding to the cost
> of forming such requests. To avoid that cost for a mere 3 bytes of
> zeroes that are always ignored by receivers, we try to use implicit
> roundup when possible.
> 
> For NFSv4.x, the tail kvec also sometimes contains a trailing
> GETATTR operation. The Linux NFS client is unintentionally
> including that GETATTR operation in the Read chunk as well as
> inline. Fortunately, servers ignore this craziness and go about
> their normal business.
> 
> The fix is simply to /never/ include the tail kvec when forming a
> data payload Read chunk.
> 
> Note that since commit 9ed5af268e88 ("SUNRPC: Clean up the handling
> of page padding in rpc_prepare_reply_pages()") the NFS client passes
> payload data to the transport with the padding in xdr->pages instead
> of in the send buffer's tail kvec. So now the Linux NFS client
> appends XDR padding to all odd-sized Read chunks. This shouldn't be
> a problem because:
> 
>   - RFC 8166-compliant servers are supposed to work with or without
>     that XDR padding in Read chunks.
> 
>   - Since the padding is now in the same memory region as the data
>     payload, a separate memory registration is not needed. In
>     addition, the link layer extends data in RDMA Read responses to
>     4-byte boundaries anyway. Thus there are now no savings when the
>     padding is not included.
> 
> Because older kernels include the payload's XDR padding in the
> tail kvec, a fix there will be more complicated. Thus backporting
> this patch is not recommended.
> 
> Reported-by: Olga Kornievskaia <Olga.Kornievskaia@netapp.com>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>   net/sunrpc/xprtrdma/rpc_rdma.c |    5 +----
>   1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
> index f0af89a43efd..f1b52f9ab242 100644
> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
> @@ -257,10 +257,7 @@ rpcrdma_convert_iovs(struct rpcrdma_xprt *r_xprt, struct xdr_buf *xdrbuf,
>   		page_base = 0;
>   	}
>   
> -	/* When encoding a Read chunk, the tail iovec contains an
> -	 * XDR pad and may be omitted.
> -	 */
> -	if (type == rpcrdma_readch && r_xprt->rx_ep->re_implicit_roundup)
> +	if (type == rpcrdma_readch)
>   		goto out;
>   
>   	/* When encoding a Write chunk, some servers need to see an
> 
> 
> 
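
For readers following along, the pad arithmetic and the one-line change
above can be sketched in userspace C. This is an illustrative model only,
not the kernel code: xdr_pad_size(), enum chunk_type, and the two
readch_skips_tail_*() helpers are hypothetical names introduced here, and
only the Read chunk branch shown in the diff is modeled.

```c
#include <stdbool.h>
#include <stddef.h>

/* XDR encodes everything in 4-byte units. */
#define XDR_UNIT 4u

/* Hypothetical stand-in for the chunk types the transport distinguishes. */
enum chunk_type {
	CHUNK_NOCH,
	CHUNK_READCH,
	CHUNK_WRITECH,
};

/* Zero bytes needed to round len up to the next 4-byte boundary (0..3). */
static inline size_t xdr_pad_size(size_t len)
{
	return (XDR_UNIT - (len & (XDR_UNIT - 1))) & (XDR_UNIT - 1);
}

/*
 * Before the patch: the tail kvec is left out of a Read chunk only
 * when implicit round-up is enabled for the connection.
 */
static inline bool readch_skips_tail_old(enum chunk_type type,
					 bool implicit_roundup)
{
	return type == CHUNK_READCH && implicit_roundup;
}

/*
 * After the patch: the tail kvec is never part of a Read chunk,
 * regardless of the implicit round-up setting.
 */
static inline bool readch_skips_tail_new(enum chunk_type type)
{
	return type == CHUNK_READCH;
}
```

With a 5-byte payload, xdr_pad_size(5) gives the "mere 3 bytes of zeroes"
the changelog mentions. In this model, the old check would still pull the
tail (and any trailing GETATTR in it) into the Read chunk whenever
implicit round-up was disabled; the new check does not.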

