From: Tom Talpey <tom@talpey.com>
To: Chuck Lever <chuck.lever@oracle.com>,
	linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org
Subject: Re: [PATCH v2 5/6] xprtrdma: Pad optimization, revisited
Date: Wed, 3 Feb 2021 13:13:59 -0500
Message-ID: <f2aad824-4449-be60-a39f-bb317764b090@talpey.com>
In-Reply-To: <161236945965.1030487.13894327853038566730.stgit@manet.1015granger.net>

This is a safe and obviously warranted processing revision.

The changelog is quite an eyeful for a one-liner, and maybe only
makes sense to the truly dedicated reader. But...

Reviewed-by: Tom Talpey <tom@talpey.com>

On 2/3/2021 11:24 AM, Chuck Lever wrote:
> The NetApp Linux team discovered that with NFS/RDMA servers that do
> not support RFC 8797, the Linux client is forming NFSv4.x WRITE
> requests incorrectly.
> 
> In this case, the Linux NFS client disables implicit chunk round-up
> for odd-length Read and Write chunks. The goal was to support old
> servers that needed that padding to be sent explicitly by clients.
> 
> In that case the Linux NFS client included the tail kvec in the Read chunk,
> since the tail contains any needed padding. That meant a separate
> memory registration is needed for the tail kvec, adding to the cost
> of forming such requests. To avoid that cost for a mere 3 bytes of
> zeroes that are always ignored by receivers, we try to use implicit
> roundup when possible.
> 
> For NFSv4.x, the tail kvec also sometimes contains a trailing
> GETATTR operation. The Linux NFS client is unintentionally
> including that GETATTR operation in the Read chunk as well as
> inline. Fortunately, servers ignore this craziness and go about
> their normal business.
> 
> The fix is simply to /never/ include the tail kvec when forming a
> data payload Read chunk.
> 
> Note that since commit 9ed5af268e88 ("SUNRPC: Clean up the handling
> of page padding in rpc_prepare_reply_pages()") the NFS client passes
> payload data to the transport with the padding in xdr->pages instead
> of in the send buffer's tail kvec. So now the Linux NFS client
> appends XDR padding to all odd-sized Read chunks. This shouldn't be
> a problem because:
> 
>   - RFC 8166-compliant servers are supposed to work with or without
>     that XDR padding in Read chunks.
> 
>   - Since the padding is now in the same memory region as the data
>     payload, a separate memory registration is not needed. In
>     addition, the link layer extends data in RDMA Read responses to
>     4-byte boundaries anyway. Thus there is now no savings when the
>     padding is not included.
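
For readers following the padding arithmetic above, here is a minimal
sketch of the XDR round-up. XDR aligns opaque data to 4-byte units, so an
odd-sized payload needs at most 3 bytes of zero padding. The helper names
example_xdr_pad() and example_xdr_padded_len() are hypothetical and exist
only for this illustration. They are not taken from the patch.

```c
#include <stddef.h>

/* XDR encodes data in 4-byte units. */
#define EXAMPLE_XDR_UNIT 4u

/*
 * Hypothetical helper: number of zero bytes (0..3) needed to round
 * a payload of 'len' bytes up to the next XDR boundary.
 */
static size_t example_xdr_pad(size_t len)
{
	return (EXAMPLE_XDR_UNIT - (len & (EXAMPLE_XDR_UNIT - 1))) &
	       (EXAMPLE_XDR_UNIT - 1);
}

/*
 * Hypothetical helper: payload length once the XDR pad is appended.
 * This models a Read chunk that carries data and padding together.
 */
static size_t example_xdr_padded_len(size_t len)
{
	return len + example_xdr_pad(len);
}
```

With these definitions a 4097-byte WRITE payload needs 3 pad bytes and
occupies 4100 bytes, while a 4096-byte payload needs none.
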
> 
> Because older kernels include the payload's XDR padding in the
> tail kvec, a fix there will be more complicated. Thus backporting
> this patch is not recommended.
> 
> Reported-by: Olga Kornievskaia <Olga.Kornievskaia@netapp.com>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>   net/sunrpc/xprtrdma/rpc_rdma.c |    5 +----
>   1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
> index f0af89a43efd..f1b52f9ab242 100644
> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
> @@ -257,10 +257,7 @@ rpcrdma_convert_iovs(struct rpcrdma_xprt *r_xprt, struct xdr_buf *xdrbuf,
>   		page_base = 0;
>   	}
>   
> -	/* When encoding a Read chunk, the tail iovec contains an
> -	 * XDR pad and may be omitted.
> -	 */
> -	if (type == rpcrdma_readch && r_xprt->rx_ep->re_implicit_roundup)
> +	if (type == rpcrdma_readch)
>   		goto out;
>   
>   	/* When encoding a Write chunk, some servers need to see an
> 
> 
> 
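
To make the one-line change easier to see, here is a small stand-alone
model of the Read chunk test in the hunk above. It is only a sketch: the
enum and function names are hypothetical, and it covers nothing beyond
the single condition shown in the diff. The Write chunk handling that
follows in rpcrdma_convert_iovs() is deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the transport's chunk types. */
enum example_chunk {
	EXAMPLE_READCH,		/* data payload Read chunk */
	EXAMPLE_OTHER		/* any other chunk type */
};

/*
 * Before the patch: the tail kvec is skipped for a Read chunk only
 * when implicit round-up is enabled for the connection.
 */
static bool example_skip_tail_old(enum example_chunk type,
				  bool implicit_roundup)
{
	return type == EXAMPLE_READCH && implicit_roundup;
}

/*
 * After the patch: the tail kvec is never part of a Read chunk,
 * whatever the round-up setting is.
 */
static bool example_skip_tail_new(enum example_chunk type,
				  bool implicit_roundup)
{
	(void)implicit_roundup;	/* no longer consulted */
	return type == EXAMPLE_READCH;
}
```

The two functions differ only for a Read chunk with implicit round-up
disabled, which is the case described in the changelog: the old test let
the tail, and any trailing GETATTR in it, into the Read chunk.
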

Thread overview: 16+ messages
2021-02-03 16:23 [PATCH v2 0/6] RPC/RDMA client fixes Chuck Lever
2021-02-03 16:23 ` [PATCH v2 1/6] xprtrdma: Remove FMR support in rpcrdma_convert_iovs() Chuck Lever
2021-02-03 18:06   ` Tom Talpey
2021-02-03 18:09     ` Chuck Lever
2021-02-03 16:24 ` [PATCH v2 2/6] xprtrdma: Simplify rpcrdma_convert_kvec() and frwr_map() Chuck Lever
2021-02-03 18:07   ` Tom Talpey
2021-02-03 16:24 ` [PATCH v2 3/6] xprtrdma: Refactor invocations of offset_in_page() Chuck Lever
2021-02-03 18:09   ` Tom Talpey
2021-02-03 18:11     ` Chuck Lever
2021-02-03 18:19       ` Tom Talpey
2021-02-03 16:24 ` [PATCH v2 4/6] rpcrdma: Fix comments about reverse-direction operation Chuck Lever
2021-02-03 18:10   ` Tom Talpey
2021-02-03 16:24 ` [PATCH v2 5/6] xprtrdma: Pad optimization, revisited Chuck Lever
2021-02-03 18:13   ` Tom Talpey [this message]
2021-02-03 16:24 ` [PATCH v2 6/6] rpcrdma: Capture bytes received in Receive completion tracepoints Chuck Lever
2021-02-03 18:14   ` Tom Talpey
