linux-nfs.vger.kernel.org archive mirror
From: Chuck Lever <chuck.lever@oracle.com>
To: Tom Talpey <tom@talpey.com>
Cc: linux-rdma@vger.kernel.org,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH RFC] svcrdma: Ignore source port when computing DRC hash
Date: Wed, 5 Jun 2019 13:25:03 -0400	[thread overview]
Message-ID: <955993A4-0626-4819-BC6F-306A50E2E048@oracle.com> (raw)
In-Reply-To: <9E0019E1-1C1B-465C-B2BF-76372029ABD8@talpey.com>

Hi Tom-

> On Jun 5, 2019, at 12:43 PM, Tom Talpey <tom@talpey.com> wrote:
> 
> On 6/5/2019 8:15 AM, Chuck Lever wrote:
>> The DRC is not working at all after an RPC/RDMA transport reconnect.
>> The problem is that the new connection uses a different source port,
>> which defeats DRC hash.
>> 
>> An NFS/RDMA client's source port is meaningless for RDMA transports.
>> The transport layer typically sets the source port value on the
>> connection to a random ephemeral port. The server already ignores it
>> for the "secure port" check. See commit 16e4d93f6de7 ("NFSD: Ignore
>> client's source port on RDMA transports").
> 
> Where does the entropy come from, then, for the server to not
> match other requests from other mount points on this same client?

The first ~200 bytes of each RPC Call message.

[ Note that this has some fun ramifications for calls with small
RPC headers that use Read chunks. ]


> Any time an XID happens to match on a second mount, it will trigger
> incorrect server processing, won't it?

Not a risk for clients that use only a single transport per
client-server pair.


> And since RDMA is capable of
> such high IOPS, the likelihood seems rather high.

Only when the server's durable storage is slow enough to cause
some RPC requests to have extremely high latency.

And, most clients use an atomic counter for their XIDs, so they
are also likely to wrap that counter over some long-pending RPC
request.

The only real answer here is NFSv4 sessions.


> Missing the cache
> might actually be safer than hitting, in this case.

Remember that on RPC/RDMA, _any_ retransmit requires a fresh
connection, and that includes NFSv3: the connection must be
replaced to reset credit accounting after half of an RPC
Call/Reply pair is lost.

I can very quickly reproduce bad (non-deterministic) behavior
by running a software build on an NFSv3 on RDMA mount point
with disconnect injection. If the DRC issue is addressed, the
software build runs to completion.

IMO we can't leave things the way they are.


> Tom.
> 
>> I'm not sure why I never noticed this before.
>> 
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> Cc: stable@vger.kernel.org
>> ---
>>  net/sunrpc/xprtrdma/svc_rdma_transport.c |    7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>> 
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> index 027a3b0..1b3700b 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> @@ -211,9 +211,14 @@ static void handle_connect_req(struct rdma_cm_id *new_cma_id,
>>  	/* Save client advertised inbound read limit for use later in accept. */
>>  	newxprt->sc_ord = param->initiator_depth;
>> 
>> -	/* Set the local and remote addresses in the transport */
>>  	sa = (struct sockaddr *)&newxprt->sc_cm_id->route.addr.dst_addr;
>>  	svc_xprt_set_remote(&newxprt->sc_xprt, sa, svc_addr_len(sa));
>> +	/* The remote port is arbitrary and not under the control of the
>> +	 * ULP. Set it to a fixed value so that the DRC continues to work
>> +	 * after a reconnect.
>> +	 */
>> +	rpc_set_port((struct sockaddr *)&newxprt->sc_xprt.xpt_remote, 0);
>> +
>>  	sa = (struct sockaddr *)&newxprt->sc_cm_id->route.addr.src_addr;
>>  	svc_xprt_set_local(&newxprt->sc_xprt, sa, svc_addr_len(sa));
>> 
>> 
>> 
>> 

--
Chuck Lever




Thread overview: 19+ messages
2019-06-05 12:15 [PATCH RFC] svcrdma: Ignore source port when computing DRC hash Chuck Lever
2019-06-05 15:57 ` Olga Kornievskaia
2019-06-05 17:28   ` Chuck Lever
2019-06-06 18:13     ` Olga Kornievskaia
2019-06-06 18:33       ` Chuck Lever
2019-06-07 15:43         ` Olga Kornievskaia
2019-06-10 14:38           ` Tom Talpey
2019-06-05 16:43 ` Tom Talpey
2019-06-05 17:25   ` Chuck Lever [this message]
2019-06-10 14:50     ` Tom Talpey
2019-06-10 17:50       ` Chuck Lever
2019-06-10 19:14         ` Tom Talpey
2019-06-10 21:57           ` Chuck Lever
2019-06-10 22:13             ` Tom Talpey
2019-06-11  0:07               ` Tom Talpey
2019-06-11 14:25                 ` Chuck Lever
2019-06-11 14:23               ` Chuck Lever
2019-06-06 13:08 ` Sasha Levin
2019-06-06 13:24   ` Chuck Lever
