linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC] xprtrdma: Move initial Receive posting
@ 2021-08-17 18:24 Chuck Lever
  2021-08-18  1:32 ` Tom Talpey
  0 siblings, 1 reply; 3+ messages in thread
From: Chuck Lever @ 2021-08-17 18:24 UTC (permalink / raw)
  To: haakon.bugge; +Cc: linux-rdma, linux-nfs

Håkon Bugge points out that rdma_create_qp() is not supposed to
return a QP that is ready for Receives to be posted. It so happens
that ours does that, but the IBTA spec (12.9.7.1) states that a
transition to INIT happens only after REQ has been sent.

In future kernels, QPs returned from rdma_create_qp() might not
be in a state where posting Receives will succeed. This patch
is a pre-requisite to changing the legacy behavior of
rdma_create_qp().

The first available moment after cm_send_req where xprtrdma can
post Receives is when the RDMA core reports the QP connection
has been established.

Note that xprtrdma has posted Receives just after rdma_create_qp()
since 8d4fb8ff427a ("xprtrdma: Fix disconnect regression"). To avoid
regressing 8d4fb8ff427a, xprtrdma needs to ensure that initial
Receive WRs are posted before pending RPCs are awoken. It appears
that the current logic does provide that guarantee.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/verbs.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index aaec3c9be8db..87ae62cdea18 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -520,12 +520,6 @@ int rpcrdma_xprt_connect(struct rpcrdma_xprt *r_xprt)
 	xprt_clear_connected(xprt);
 	rpcrdma_reset_cwnd(r_xprt);
 
-	/* Bump the ep's reference count while there are
-	 * outstanding Receives.
-	 */
-	rpcrdma_ep_get(ep);
-	rpcrdma_post_recvs(r_xprt, 1, true);
-
 	rc = rdma_connect(ep->re_id, &ep->re_remote_cma);
 	if (rc)
 		goto out;
@@ -539,6 +533,12 @@ int rpcrdma_xprt_connect(struct rpcrdma_xprt *r_xprt)
 		goto out;
 	}
 
+	/* Bump the ep's reference count while there are
+	 * outstanding Receives.
+	 */
+	rpcrdma_ep_get(ep);
+	rpcrdma_post_recvs(r_xprt, 1, true);
+
 	rc = rpcrdma_sendctxs_create(r_xprt);
 	if (rc) {
 		rc = -ENOTCONN;



^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH RFC] xprtrdma: Move initial Receive posting
  2021-08-17 18:24 [PATCH RFC] xprtrdma: Move initial Receive posting Chuck Lever
@ 2021-08-18  1:32 ` Tom Talpey
  2021-08-18  2:32   ` Chuck Lever III
  0 siblings, 1 reply; 3+ messages in thread
From: Tom Talpey @ 2021-08-18  1:32 UTC (permalink / raw)
  To: Chuck Lever, haakon.bugge; +Cc: linux-rdma, linux-nfs

> The first available moment after cm_send_req where xprtrdma can post
> Receives is when the RDMA core reports the QP connection has been
> established.

This has significant implications to upper layer protocols. It
means that the listener cannot safely send the first message on
a new connection. This constrains the upper layer protocol to
be client/server style, with the active-side ULP first to send.

If the CM is changed in the way described here, peer-to-peer
protocols will be rendered unusable, as the passive side cannot
reliably send the first message since the connection will have
no receive posted. The MPA Extensions in RFC6581 discuss this,
and support peer-to-peer connection models with the "A" flag
(section 9.2) enabling passive-first ULPs.

Posting receives and awakening client processing as proposed below
does not close this race. A passive-side-first protocol will have
already begun to send, regardless of this rearrangement. It's an
inherent race and will not interoperate reliably.

Why change the CM API? The IB spec is not authoritative on this,
and there currently is no bug, right?

Tom.

On 8/17/2021 2:24 PM, Chuck Lever wrote:
> Håkon Bugge points out that rdma_create_qp() is not supposed to 
> return a QP that is ready for Receives to be posted. It so happens 
> that ours does that, but the IBTA spec (12.9.7.1) states that a 
> transition to INIT happens only after REQ has been sent.
> 
> In future kernels, QPs returned from rdma_create_qp() might not be in
> a state where posting Receives will succeed. This patch is a
> pre-requisite to changing the legacy behavior of rdma_create_qp().
> 
> The first available moment after cm_send_req where xprtrdma can post
> Receives is when the RDMA core reports the QP connection has been
> established.
> 
> Note that xprtrdma has posted Receives just after rdma_create_qp() 
> since 8d4fb8ff427a ("xprtrdma: Fix disconnect regression"). To avoid 
> regressing 8d4fb8ff427a, xprtrdma needs to ensure that initial 
> Receive WRs are posted before pending RPCs are awoken. It appears 
> that the current logic does provide that guarantee.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> --- 
> net/sunrpc/xprtrdma/verbs.c |   12 ++++++------ 1 file changed, 6
> insertions(+), 6 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/verbs.c
> b/net/sunrpc/xprtrdma/verbs.c index aaec3c9be8db..87ae62cdea18
> 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++
> b/net/sunrpc/xprtrdma/verbs.c @@ -520,12 +520,6 @@ int
> rpcrdma_xprt_connect(struct rpcrdma_xprt *r_xprt) 
> xprt_clear_connected(xprt); rpcrdma_reset_cwnd(r_xprt);
> 
> -	/* Bump the ep's reference count while there are -	 * outstanding
> Receives. -	 */ -	rpcrdma_ep_get(ep); -	rpcrdma_post_recvs(r_xprt, 1,
> true); - rc = rdma_connect(ep->re_id, &ep->re_remote_cma); if (rc) 
> goto out; @@ -539,6 +533,12 @@ int rpcrdma_xprt_connect(struct
> rpcrdma_xprt *r_xprt) goto out; }
> 
> +	/* Bump the ep's reference count while there are +	 * outstanding
> Receives. +	 */ +	rpcrdma_ep_get(ep); +	rpcrdma_post_recvs(r_xprt, 1,
> true); + rc = rpcrdma_sendctxs_create(r_xprt); if (rc) { rc =
> -ENOTCONN;
> 
> 
> 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH RFC] xprtrdma: Move initial Receive posting
  2021-08-18  1:32 ` Tom Talpey
@ 2021-08-18  2:32   ` Chuck Lever III
  0 siblings, 0 replies; 3+ messages in thread
From: Chuck Lever III @ 2021-08-18  2:32 UTC (permalink / raw)
  To: Tom Talpey; +Cc: Haakon Bugge, linux-rdma, Linux NFS Mailing List



> On Aug 17, 2021, at 9:32 PM, Tom Talpey <tom@talpey.com> wrote:
> 
>> The first available moment after cm_send_req where xprtrdma can post
>> Receives is when the RDMA core reports the QP connection has been
>> established.

For reference, I should have included a link to Håkon's recent
remarks about this issue:

https://lore.kernel.org/linux-rdma/A3297641-63BA-4DF1-886A-3620E2A40BA3@oracle.com/


> This has significant implications to upper layer protocols. It
> means that the listener cannot safely send the first message on
> a new connection. This constrains the upper layer protocol to
> be client/server style, with the active-side ULP first to send.

The current situation is that the consumer doesn't get control
again until it gets RDMA_CM_EVENT_ESTABLISHED, which is a bit
late, I agree. We could plumb a callback that is invoked at
the "Send REQ" step for the purpose of populating the Receive
Queue on the active side.

Alternately, xprtrdma's connect worker can explicitly move a
new QP to INIT state if it wants to post Receives early. I
don't see a hard requirement for rdma_create_qp() to do that
for us.


> If the CM is changed in the way described here, peer-to-peer
> protocols will be rendered unusable, as the passive side cannot
> reliably send the first message since the connection will have
> no receive posted. The MPA Extensions in RFC6581 discuss this,
> and support peer-to-peer connection models with the "A" flag
> (section 9.2) enabling passive-first ULPs.

At the moment, possibly the only passive-first ULP in the
Linux kernel is RDS.


> Posting receives and awakening client processing as proposed below
> does not close this race. A passive-side-first protocol will have
> already begun to send, regardless of this rearrangement. It's an
> inherent race and will not interoperate reliably.

Sure, but as far as I can tell, RPC-over-RDMA v1 does not
have an issue here?


> Why change the CM API? The IB spec is not authoritative on this,
> and there currently is no bug, right?

The minor bug is described in the link above.


--
Chuck Lever




^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-08-18  2:32 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-17 18:24 [PATCH RFC] xprtrdma: Move initial Receive posting Chuck Lever
2021-08-18  1:32 ` Tom Talpey
2021-08-18  2:32   ` Chuck Lever III

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).