* [PATCH 0/8] NFS/RDMA client-side patches for v4.17
@ 2018-02-28 20:30 Chuck Lever
  2018-02-28 20:30 ` [PATCH 1/8] xprtrdma: Fix latency regression on NUMA NFS/RDMA clients Chuck Lever
                   ` (7 more replies)
  0 siblings, 8 replies; 12+ messages in thread
From: Chuck Lever @ 2018-02-28 20:30 UTC (permalink / raw)
  To: anna.schumaker; +Cc: linux-rdma, linux-nfs

Hi Anna-

Starting set of NFS/RDMA client-side changes for the next merge
window. These are clean-ups and simple fixes. Let me know what you
think.

---

Chuck Lever (8):
      xprtrdma: Fix latency regression on NUMA NFS/RDMA clients
      xprtrdma: Remove arbitrary limit on initiator depth
      xprtrdma: Remove xprt-specific connect cookie
      xprtrdma: ->send_request returns -EAGAIN when there are no free MRs
      xprtrdma: Reduce number of MRs created by rpcrdma_mrs_create
      xprtrdma: "Support" call-only RPCs
      xprtrdma: Chain Send to FastReg WRs
      xprtrdma: Move creation of rl_rdmabuf to rpcrdma_create_req


 include/linux/sunrpc/clnt.h       |    7 +++++
 net/sunrpc/sunrpc.h               |    6 ----
 net/sunrpc/xprtrdma/backchannel.c |    7 -----
 net/sunrpc/xprtrdma/fmr_ops.c     |   13 ++++++++-
 net/sunrpc/xprtrdma/frwr_ops.c    |   53 +++++++++++++++++++++++++------------
 net/sunrpc/xprtrdma/rpc_rdma.c    |   32 +++++++++++++++-------
 net/sunrpc/xprtrdma/transport.c   |   42 ++++++-----------------------
 net/sunrpc/xprtrdma/verbs.c       |   31 ++++++++++++++--------
 net/sunrpc/xprtrdma/xprt_rdma.h   |    4 +--
 9 files changed, 108 insertions(+), 87 deletions(-)

--
Chuck Lever

* [PATCH 1/8] xprtrdma: Fix latency regression on NUMA NFS/RDMA clients
  2018-02-28 20:30 [PATCH 0/8] NFS/RDMA client-side patches for v4.17 Chuck Lever
@ 2018-02-28 20:30 ` Chuck Lever
  2018-02-28 20:30 ` [PATCH 2/8] xprtrdma: Remove arbitrary limit on initiator depth Chuck Lever
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Chuck Lever @ 2018-02-28 20:30 UTC (permalink / raw)
  To: anna.schumaker; +Cc: linux-rdma, linux-nfs

With v4.15, on one of my NFS/RDMA clients I measured nearly a
doubling in the latency of small read and write system calls. There
was no change in server round-trip time. The extra latency appears
across the whole RPC execution path.

"git bisect" settled on commit ccede7598588 ("xprtrdma: Spread reply
processing over more CPUs") .

After some experimentation, I found that leaving the WQ bound and
allowing the scheduler to pick the dispatch CPU seems to eliminate
the long latencies, and it does not introduce any new regressions.

The fix is implemented by reverting only the part of
commit ccede7598588 ("xprtrdma: Spread reply processing over more
CPUs") that dispatches RPC replies specifically on the CPU where the
matching RPC call was made.

Interestingly, saving the CPU number and later queuing reply
processing there was effective _only_ for NFS READ and WRITE
requests. On my NUMA client, in-kernel RPC reply processing for
asynchronous RPCs was dispatched on the same CPU where the RPC call
was made, as expected. However, synchronous RPCs seem to get their
replies dispatched on some CPU other than the one where the call was
made, every time.

Fixes: ccede7598588 ("xprtrdma: Spread reply processing over ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/rpc_rdma.c  |    2 +-
 net/sunrpc/xprtrdma/transport.c |    2 --
 net/sunrpc/xprtrdma/xprt_rdma.h |    1 -
 3 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index f0855a9..4bc0f4d 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -1366,7 +1366,7 @@ void rpcrdma_reply_handler(struct rpcrdma_rep *rep)
 
 	trace_xprtrdma_reply(rqst->rq_task, rep, req, credits);
 
-	queue_work_on(req->rl_cpu, rpcrdma_receive_wq, &rep->rr_work);
+	queue_work(rpcrdma_receive_wq, &rep->rr_work);
 	return;
 
 out_badstatus:
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 4b1ecfe..f86021e 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -52,7 +52,6 @@
 #include <linux/slab.h>
 #include <linux/seq_file.h>
 #include <linux/sunrpc/addr.h>
-#include <linux/smp.h>
 
 #include "xprt_rdma.h"
 
@@ -651,7 +650,6 @@
 	if (!rpcrdma_get_recvbuf(r_xprt, req, rqst->rq_rcvsize, flags))
 		goto out_fail;
 
-	req->rl_cpu = smp_processor_id();
 	req->rl_connect_cookie = 0;	/* our reserved value */
 	rpcrdma_set_xprtdata(rqst, req);
 	rqst->rq_buffer = req->rl_sendbuf->rg_base;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 69883a9..430a6de 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -334,7 +334,6 @@ enum {
 struct rpcrdma_buffer;
 struct rpcrdma_req {
 	struct list_head	rl_list;
-	int			rl_cpu;
 	unsigned int		rl_connect_cookie;
 	struct rpcrdma_buffer	*rl_buffer;
 	struct rpcrdma_rep	*rl_reply;


* [PATCH 2/8] xprtrdma: Remove arbitrary limit on initiator depth
  2018-02-28 20:30 [PATCH 0/8] NFS/RDMA client-side patches for v4.17 Chuck Lever
  2018-02-28 20:30 ` [PATCH 1/8] xprtrdma: Fix latency regression on NUMA NFS/RDMA clients Chuck Lever
@ 2018-02-28 20:30 ` Chuck Lever
  2018-02-28 20:30 ` [PATCH 3/8] xprtrdma: Remove xprt-specific connect cookie Chuck Lever
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Chuck Lever @ 2018-02-28 20:30 UTC (permalink / raw)
  To: anna.schumaker; +Cc: linux-rdma, linux-nfs

Clean up: We need to check only that the value does not exceed the
range of the u8 field it's going into.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/verbs.c |    7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index e6f84a6..7cc3465 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -589,11 +589,8 @@
 
 	/* Client offers RDMA Read but does not initiate */
 	ep->rep_remote_cma.initiator_depth = 0;
-	if (ia->ri_device->attrs.max_qp_rd_atom > 32)	/* arbitrary but <= 255 */
-		ep->rep_remote_cma.responder_resources = 32;
-	else
-		ep->rep_remote_cma.responder_resources =
-						ia->ri_device->attrs.max_qp_rd_atom;
+	ep->rep_remote_cma.responder_resources =
+		min_t(int, U8_MAX, ia->ri_device->attrs.max_qp_rd_atom);
 
 	/* Limit transport retries so client can detect server
 	 * GID changes quickly. RPC layer handles re-establishing


* [PATCH 3/8] xprtrdma: Remove xprt-specific connect cookie
  2018-02-28 20:30 [PATCH 0/8] NFS/RDMA client-side patches for v4.17 Chuck Lever
  2018-02-28 20:30 ` [PATCH 1/8] xprtrdma: Fix latency regression on NUMA NFS/RDMA clients Chuck Lever
  2018-02-28 20:30 ` [PATCH 2/8] xprtrdma: Remove arbitrary limit on initiator depth Chuck Lever
@ 2018-02-28 20:30 ` Chuck Lever
  2018-02-28 20:30 ` [PATCH 4/8] xprtrdma: ->send_request returns -EAGAIN when there are no free MRs Chuck Lever
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Chuck Lever @ 2018-02-28 20:30 UTC (permalink / raw)
  To: anna.schumaker; +Cc: linux-rdma, linux-nfs

Clean up: The generic rq_connect_cookie is sufficient to detect RPC
Call retransmission.
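
Condensed from the hunks below (not new API), the mechanism is: the
generic cookie is bumped on every connection-state change, and
->send_request compares the rqst's stamped cookie against the
transport's current one:

	/* rdma_cm event handler: new epoch on connect and disconnect */
	case RDMA_CM_EVENT_ESTABLISHED:
		++xprt->rx_xprt.connect_cookie;
		...
	case RDMA_CM_EVENT_DISCONNECTED:
		++xprt->rx_xprt.connect_cookie;
		...

	/* ->send_request: an unchanged cookie means this Call already
	 * went out on this connection, so it is a retransmit.
	 */
	if (rqst->rq_connect_cookie == xprt->connect_cookie)
		goto drop_connection;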

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/transport.c |    6 +-----
 net/sunrpc/xprtrdma/verbs.c     |    2 ++
 net/sunrpc/xprtrdma/xprt_rdma.h |    1 -
 3 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index f86021e..47b4604 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -236,8 +236,6 @@
 	struct rpc_xprt *xprt = &r_xprt->rx_xprt;
 
 	spin_lock_bh(&xprt->transport_lock);
-	if (++xprt->connect_cookie == 0)	/* maintain a reserved value */
-		++xprt->connect_cookie;
 	if (ep->rep_connected > 0) {
 		if (!xprt_test_and_set_connected(xprt))
 			xprt_wake_pending_tasks(xprt, 0);
@@ -650,7 +648,6 @@
 	if (!rpcrdma_get_recvbuf(r_xprt, req, rqst->rq_rcvsize, flags))
 		goto out_fail;
 
-	req->rl_connect_cookie = 0;	/* our reserved value */
 	rpcrdma_set_xprtdata(rqst, req);
 	rqst->rq_buffer = req->rl_sendbuf->rg_base;
 	rqst->rq_rbuffer = req->rl_recvbuf->rg_base;
@@ -721,9 +718,8 @@
 		rpcrdma_recv_buffer_get(req);
 
 	/* Must suppress retransmit to maintain credits */
-	if (req->rl_connect_cookie == xprt->connect_cookie)
+	if (rqst->rq_connect_cookie == xprt->connect_cookie)
 		goto drop_connection;
-	req->rl_connect_cookie = xprt->connect_cookie;
 
 	__set_bit(RPCRDMA_REQ_F_PENDING, &req->rl_flags);
 	if (rpcrdma_ep_post(&r_xprt->rx_ia, &r_xprt->rx_ep, req))
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 7cc3465..520e7e4 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -255,6 +255,7 @@
 		/* Return 1 to ensure the core destroys the id. */
 		return 1;
 	case RDMA_CM_EVENT_ESTABLISHED:
+		++xprt->rx_xprt.connect_cookie;
 		connstate = 1;
 		rpcrdma_update_connect_private(xprt, &event->param.conn);
 		goto connected;
@@ -273,6 +274,7 @@
 			connstate = -EAGAIN;
 		goto connected;
 	case RDMA_CM_EVENT_DISCONNECTED:
+		++xprt->rx_xprt.connect_cookie;
 		connstate = -ECONNABORTED;
 connected:
 		xprt->rx_buf.rb_credits = 1;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 430a6de..29ea0b4 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -334,7 +334,6 @@ enum {
 struct rpcrdma_buffer;
 struct rpcrdma_req {
 	struct list_head	rl_list;
-	unsigned int		rl_connect_cookie;
 	struct rpcrdma_buffer	*rl_buffer;
 	struct rpcrdma_rep	*rl_reply;
 	struct xdr_stream	rl_stream;


* [PATCH 4/8] xprtrdma: ->send_request returns -EAGAIN when there are no free MRs
  2018-02-28 20:30 [PATCH 0/8] NFS/RDMA client-side patches for v4.17 Chuck Lever
                   ` (2 preceding siblings ...)
  2018-02-28 20:30 ` [PATCH 3/8] xprtrdma: Remove xprt-specific connect cookie Chuck Lever
@ 2018-02-28 20:30 ` Chuck Lever
  2018-02-28 20:30 ` [PATCH 5/8] xprtrdma: Reduce number of MRs created by rpcrdma_mrs_create Chuck Lever
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Chuck Lever @ 2018-02-28 20:30 UTC (permalink / raw)
  To: anna.schumaker; +Cc: linux-rdma, linux-nfs

Currently, when the MR free list is exhausted during marshaling, the
RPC/RDMA transport places the RPC task on the delayq, which forces a
wait of HZ >> 2 jiffies (a quarter second) before the marshal and
send are retried.

With this change, the transport instead places such an RPC task on
the pending queue, and wakes it as soon as more MRs have been
created. Creating more MRs typically takes less than a millisecond,
and this waking mechanism is less deadlock-prone.

Moreover, the waiting RPC task holds the transport's write lock,
which blocks the transport from sending RPCs. Therefore, faster
recovery from MR exhaustion is desirable.

This is the same mechanism that the TCP transport utilizes when
handling write buffer space exhaustion.
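
Condensed from the hunks below, the wait/wake pairing works like
this (a sketch, not a complete listing):

	/* chunk encoding: no free MRs -- park the task until more exist */
	out_maperr:
		if (PTR_ERR(seg) == -EAGAIN)
			xprt_wait_for_buffer_space(rqst->rq_task, NULL);
		return PTR_ERR(seg);

	/* rpcrdma_mrs_create: fresh MRs -- wake parked tasks */
	xprt_write_space(&r_xprt->rx_xprt);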

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/fmr_ops.c   |    2 +-
 net/sunrpc/xprtrdma/frwr_ops.c  |    2 +-
 net/sunrpc/xprtrdma/rpc_rdma.c  |   30 +++++++++++++++++++++---------
 net/sunrpc/xprtrdma/transport.c |    3 ++-
 net/sunrpc/xprtrdma/verbs.c     |    3 ++-
 5 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index d5f95bb..629e539 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -191,7 +191,7 @@ enum {
 
 	mr = rpcrdma_mr_get(r_xprt);
 	if (!mr)
-		return ERR_PTR(-ENOBUFS);
+		return ERR_PTR(-EAGAIN);
 
 	pageoff = offset_in_page(seg1->mr_offset);
 	seg1->mr_offset -= pageoff;	/* start of page */
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 90f688f..e21781c 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -367,7 +367,7 @@
 			rpcrdma_mr_defer_recovery(mr);
 		mr = rpcrdma_mr_get(r_xprt);
 		if (!mr)
-			return ERR_PTR(-ENOBUFS);
+			return ERR_PTR(-EAGAIN);
 	} while (mr->frwr.fr_state != FRWR_IS_INVALID);
 	frwr = &mr->frwr;
 	frwr->fr_state = FRWR_IS_VALID;
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 4bc0f4d..e8adad3 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -365,7 +365,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
 		seg = r_xprt->rx_ia.ri_ops->ro_map(r_xprt, seg, nsegs,
 						   false, &mr);
 		if (IS_ERR(seg))
-			return PTR_ERR(seg);
+			goto out_maperr;
 		rpcrdma_mr_push(mr, &req->rl_registered);
 
 		if (encode_read_segment(xdr, mr, pos) < 0)
@@ -377,6 +377,11 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
 	} while (nsegs);
 
 	return 0;
+
+out_maperr:
+	if (PTR_ERR(seg) == -EAGAIN)
+		xprt_wait_for_buffer_space(rqst->rq_task, NULL);
+	return PTR_ERR(seg);
 }
 
 /* Register and XDR encode the Write list. Supports encoding a list
@@ -423,7 +428,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
 		seg = r_xprt->rx_ia.ri_ops->ro_map(r_xprt, seg, nsegs,
 						   true, &mr);
 		if (IS_ERR(seg))
-			return PTR_ERR(seg);
+			goto out_maperr;
 		rpcrdma_mr_push(mr, &req->rl_registered);
 
 		if (encode_rdma_segment(xdr, mr) < 0)
@@ -440,6 +445,11 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
 	*segcount = cpu_to_be32(nchunks);
 
 	return 0;
+
+out_maperr:
+	if (PTR_ERR(seg) == -EAGAIN)
+		xprt_wait_for_buffer_space(rqst->rq_task, NULL);
+	return PTR_ERR(seg);
 }
 
 /* Register and XDR encode the Reply chunk. Supports encoding an array
@@ -481,7 +491,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
 		seg = r_xprt->rx_ia.ri_ops->ro_map(r_xprt, seg, nsegs,
 						   true, &mr);
 		if (IS_ERR(seg))
-			return PTR_ERR(seg);
+			goto out_maperr;
 		rpcrdma_mr_push(mr, &req->rl_registered);
 
 		if (encode_rdma_segment(xdr, mr) < 0)
@@ -498,6 +508,11 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
 	*segcount = cpu_to_be32(nchunks);
 
 	return 0;
+
+out_maperr:
+	if (PTR_ERR(seg) == -EAGAIN)
+		xprt_wait_for_buffer_space(rqst->rq_task, NULL);
+	return PTR_ERR(seg);
 }
 
 /**
@@ -724,8 +739,8 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
  * Returns:
  *	%0 if the RPC was sent successfully,
  *	%-ENOTCONN if the connection was lost,
- *	%-EAGAIN if not enough pages are available for on-demand reply buffer,
- *	%-ENOBUFS if no MRs are available to register chunks,
+ *	%-EAGAIN if the caller should call again with the same arguments,
+ *	%-ENOBUFS if the caller should call again after a delay,
  *	%-EMSGSIZE if the transport header is too small,
  *	%-EIO if a permanent problem occurred while marshaling.
  */
@@ -868,10 +883,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
 	return 0;
 
 out_err:
-	if (ret != -ENOBUFS) {
-		pr_err("rpcrdma: header marshaling failed (%d)\n", ret);
-		r_xprt->rx_stats.failed_marshal_count++;
-	}
+	r_xprt->rx_stats.failed_marshal_count++;
 	return ret;
 }
 
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 47b4604..0819689 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -689,7 +689,8 @@
  * Returns:
  *	%0 if the RPC message has been sent
  *	%-ENOTCONN if the caller should reconnect and call again
- *	%-ENOBUFS if the caller should call again later
+ *	%-EAGAIN if the caller should call again
+ *	%-ENOBUFS if the caller should call again after a delay
  *	%-EIO if a permanent error occurred and the request was not
  *		sent. Do not try to send this message again.
  */
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 520e7e4..d36c18f 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1048,8 +1048,9 @@ void rpcrdma_sendctx_put_locked(struct rpcrdma_sendctx *sc)
 	list_splice(&all, &buf->rb_all);
 	r_xprt->rx_stats.mrs_allocated += count;
 	spin_unlock(&buf->rb_mrlock);
-
 	trace_xprtrdma_createmrs(r_xprt, count);
+
+	xprt_write_space(&r_xprt->rx_xprt);
 }
 
 static void


* [PATCH 5/8] xprtrdma: Reduce number of MRs created by rpcrdma_mrs_create
  2018-02-28 20:30 [PATCH 0/8] NFS/RDMA client-side patches for v4.17 Chuck Lever
                   ` (3 preceding siblings ...)
  2018-02-28 20:30 ` [PATCH 4/8] xprtrdma: ->send_request returns -EAGAIN when there are no free MRs Chuck Lever
@ 2018-02-28 20:30 ` Chuck Lever
  2018-02-28 20:30 ` [PATCH 6/8] xprtrdma: "Support" call-only RPCs Chuck Lever
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Chuck Lever @ 2018-02-28 20:30 UTC (permalink / raw)
  To: anna.schumaker; +Cc: linux-rdma, linux-nfs

Create fewer MRs on average. Many workloads don't need as many as
32 MRs, and the transport can now quickly restock the MR free list.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/verbs.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index d36c18f..ab86724 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1023,7 +1023,7 @@ void rpcrdma_sendctx_put_locked(struct rpcrdma_sendctx *sc)
 	LIST_HEAD(free);
 	LIST_HEAD(all);
 
-	for (count = 0; count < 32; count++) {
+	for (count = 0; count < 3; count++) {
 		struct rpcrdma_mr *mr;
 		int rc;
 


* [PATCH 6/8] xprtrdma: "Support" call-only RPCs
  2018-02-28 20:30 [PATCH 0/8] NFS/RDMA client-side patches for v4.17 Chuck Lever
                   ` (4 preceding siblings ...)
  2018-02-28 20:30 ` [PATCH 5/8] xprtrdma: Reduce number of MRs created by rpcrdma_mrs_create Chuck Lever
@ 2018-02-28 20:30 ` Chuck Lever
  2018-02-28 20:30 ` [PATCH 7/8] xprtrdma: Chain Send to FastReg WRs Chuck Lever
  2018-02-28 20:31 ` [PATCH 8/8] xprtrdma: Move creation of rl_rdmabuf to rpcrdma_create_req Chuck Lever
  7 siblings, 0 replies; 12+ messages in thread
From: Chuck Lever @ 2018-02-28 20:30 UTC (permalink / raw)
  To: anna.schumaker; +Cc: linux-rdma, linux-nfs

RPC-over-RDMA version 1 credit accounting relies on there being a
response message for every RPC Call. This means that RPC procedures
that have no reply will disrupt credit accounting, in just the same
way a retransmit would (a retransmit is sent precisely because no
reply has arrived). Deal with the "no reply" case the same way: drop
the connection to reset the credit grant.
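
Whether an RPC expects a reply is inferred from its procedure's
decode method (see rpc_reply_expected() in the hunk below). As an
illustration, a hypothetical call-only procedure entry -- the name
and procedure number here are invented, not part of this series --
would simply leave p_decode unset:

	static const struct rpc_procinfo example_callonly_proc = {
		.p_proc   = 42,			/* hypothetical */
		.p_encode = example_enc_args,	/* hypothetical encoder */
		.p_decode = NULL,	/* no Reply is ever expected */
	};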

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/clnt.h     |    7 +++++++
 net/sunrpc/sunrpc.h             |    6 ------
 net/sunrpc/xprtrdma/transport.c |    6 ++++++
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index ed761f7..9b11b6a 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -217,5 +217,12 @@ int		rpc_clnt_setup_test_and_add_xprt(struct rpc_clnt *,
 bool rpc_clnt_xprt_switch_has_addr(struct rpc_clnt *clnt,
 			const struct sockaddr *sap);
 void rpc_cleanup_clids(void);
+
+static inline int rpc_reply_expected(struct rpc_task *task)
+{
+	return (task->tk_msg.rpc_proc != NULL) &&
+		(task->tk_msg.rpc_proc->p_decode != NULL);
+}
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_SUNRPC_CLNT_H */
diff --git a/net/sunrpc/sunrpc.h b/net/sunrpc/sunrpc.h
index f2b7cb5..09a0315 100644
--- a/net/sunrpc/sunrpc.h
+++ b/net/sunrpc/sunrpc.h
@@ -37,12 +37,6 @@ struct rpc_buffer {
 	char	data[];
 };
 
-static inline int rpc_reply_expected(struct rpc_task *task)
-{
-	return (task->tk_msg.rpc_proc != NULL) &&
-		(task->tk_msg.rpc_proc->p_decode != NULL);
-}
-
 static inline int sock_is_loopback(struct sock *sk)
 {
 	struct dst_entry *dst;
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 0819689..7e39faa 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -728,6 +728,12 @@
 
 	rqst->rq_xmit_bytes_sent += rqst->rq_snd_buf.len;
 	rqst->rq_bytes_sent = 0;
+
+	/* An RPC with no reply will throw off credit accounting,
+	 * so drop the connection to reset the credit grant.
+	 */
+	if (!rpc_reply_expected(task))
+		goto drop_connection;
 	return 0;
 
 failed_marshal:


* [PATCH 7/8] xprtrdma: Chain Send to FastReg WRs
  2018-02-28 20:30 [PATCH 0/8] NFS/RDMA client-side patches for v4.17 Chuck Lever
                   ` (5 preceding siblings ...)
  2018-02-28 20:30 ` [PATCH 6/8] xprtrdma: "Support" call-only RPCs Chuck Lever
@ 2018-02-28 20:30 ` Chuck Lever
  2018-02-28 21:51   ` Anna Schumaker
  2018-02-28 20:31 ` [PATCH 8/8] xprtrdma: Move creation of rl_rdmabuf to rpcrdma_create_req Chuck Lever
  7 siblings, 1 reply; 12+ messages in thread
From: Chuck Lever @ 2018-02-28 20:30 UTC (permalink / raw)
  To: anna.schumaker; +Cc: linux-rdma, linux-nfs

With FRWR, the client transport can perform memory registration and
post a Send with just a single ib_post_send.

This reduces contention between the send_request path and the Send
Completion handlers, and reduces the overhead of registering a chunk
that has multiple segments.
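
The chaining itself is ordinary ib_send_wr linking (condensed from
the frwr_op_send hunk below): each FastReg WR's wr.next points at
the WR posted after it, ending at the Send WR, and the whole chain
goes down in one verb call:

	post_wr = &req->rl_sendctx->sc_wr;	/* the Send WR */
	list_for_each_entry(mr, &req->rl_registered, mr_list) {
		mr->frwr.fr_regwr.wr.next = post_wr;
		post_wr = &mr->frwr.fr_regwr.wr;
	}
	return ib_post_send(ia->ri_id->qp, post_wr, &bad_wr);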

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/fmr_ops.c   |   11 ++++++++
 net/sunrpc/xprtrdma/frwr_ops.c  |   51 +++++++++++++++++++++++++++------------
 net/sunrpc/xprtrdma/verbs.c     |    3 +-
 net/sunrpc/xprtrdma/xprt_rdma.h |    2 ++
 4 files changed, 49 insertions(+), 18 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index 629e539..5cc68a8 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -251,6 +251,16 @@ enum {
 	return ERR_PTR(-EIO);
 }
 
+/* Post Send WR containing the RPC Call message.
+ */
+static int
+fmr_op_send(struct rpcrdma_ia *ia, struct rpcrdma_req *req)
+{
+	struct ib_send_wr *bad_wr;
+
+	return ib_post_send(ia->ri_id->qp, &req->rl_sendctx->sc_wr, &bad_wr);
+}
+
 /* Invalidate all memory regions that were registered for "req".
  *
  * Sleeps until it is safe for the host CPU to access the
@@ -305,6 +315,7 @@ enum {
 
 const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = {
 	.ro_map				= fmr_op_map,
+	.ro_send			= fmr_op_send,
 	.ro_unmap_sync			= fmr_op_unmap_sync,
 	.ro_recover_mr			= fmr_op_recover_mr,
 	.ro_open			= fmr_op_open,
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index e21781c..c5743a0 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -357,8 +357,7 @@
 	struct rpcrdma_mr *mr;
 	struct ib_mr *ibmr;
 	struct ib_reg_wr *reg_wr;
-	struct ib_send_wr *bad_wr;
-	int rc, i, n;
+	int i, n;
 	u8 key;
 
 	mr = NULL;
@@ -407,22 +406,12 @@
 	ib_update_fast_reg_key(ibmr, ++key);
 
 	reg_wr = &frwr->fr_regwr;
-	reg_wr->wr.next = NULL;
-	reg_wr->wr.opcode = IB_WR_REG_MR;
-	frwr->fr_cqe.done = frwr_wc_fastreg;
-	reg_wr->wr.wr_cqe = &frwr->fr_cqe;
-	reg_wr->wr.num_sge = 0;
-	reg_wr->wr.send_flags = 0;
 	reg_wr->mr = ibmr;
 	reg_wr->key = ibmr->rkey;
 	reg_wr->access = writing ?
 			 IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
 			 IB_ACCESS_REMOTE_READ;
 
-	rc = ib_post_send(ia->ri_id->qp, &reg_wr->wr, &bad_wr);
-	if (rc)
-		goto out_senderr;
-
 	mr->mr_handle = ibmr->rkey;
 	mr->mr_length = ibmr->length;
 	mr->mr_offset = ibmr->iova;
@@ -442,11 +431,40 @@
 	       frwr->fr_mr, n, mr->mr_nents);
 	rpcrdma_mr_defer_recovery(mr);
 	return ERR_PTR(-EIO);
+}
 
-out_senderr:
-	pr_err("rpcrdma: FRWR registration ib_post_send returned %i\n", rc);
-	rpcrdma_mr_defer_recovery(mr);
-	return ERR_PTR(-ENOTCONN);
+/* Post Send WR containing the RPC Call message.
+ *
+ * For FRMR, chain any FastReg WRs to the Send WR. Only a
+ * single ib_post_send call is needed to register memory
+ * and then post the Send WR.
+ */
+static int
+frwr_op_send(struct rpcrdma_ia *ia, struct rpcrdma_req *req)
+{
+	struct ib_send_wr *post_wr, *bad_wr;
+	struct rpcrdma_mr *mr;
+
+	post_wr = &req->rl_sendctx->sc_wr;
+	list_for_each_entry(mr, &req->rl_registered, mr_list) {
+		struct rpcrdma_frwr *frwr;
+
+		frwr = &mr->frwr;
+
+		frwr->fr_cqe.done = frwr_wc_fastreg;
+		frwr->fr_regwr.wr.next = post_wr;
+		frwr->fr_regwr.wr.wr_cqe = &frwr->fr_cqe;
+		frwr->fr_regwr.wr.num_sge = 0;
+		frwr->fr_regwr.wr.opcode = IB_WR_REG_MR;
+		frwr->fr_regwr.wr.send_flags = 0;
+
+		post_wr = &frwr->fr_regwr.wr;
+	}
+
+	/* If ib_post_send fails, the next ->send_request for
+	 * @req will queue these MWs for recovery.
+	 */
+	return ib_post_send(ia->ri_id->qp, post_wr, &bad_wr);
 }
 
 /* Handle a remotely invalidated mr on the @mrs list
@@ -561,6 +579,7 @@
 
 const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = {
 	.ro_map				= frwr_op_map,
+	.ro_send			= frwr_op_send,
 	.ro_reminv			= frwr_op_reminv,
 	.ro_unmap_sync			= frwr_op_unmap_sync,
 	.ro_recover_mr			= frwr_op_recover_mr,
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index ab86724..626fd30 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1535,7 +1535,6 @@ struct rpcrdma_regbuf *
 		struct rpcrdma_req *req)
 {
 	struct ib_send_wr *send_wr = &req->rl_sendctx->sc_wr;
-	struct ib_send_wr *send_wr_fail;
 	int rc;
 
 	if (req->rl_reply) {
@@ -1554,7 +1553,7 @@ struct rpcrdma_regbuf *
 		--ep->rep_send_count;
 	}
 
-	rc = ib_post_send(ia->ri_id->qp, send_wr, &send_wr_fail);
+	rc = ia->ri_ops->ro_send(ia, req);
 	trace_xprtrdma_post_send(req, rc);
 	if (rc)
 		return -ENOTCONN;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 29ea0b4..3d3b423 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -472,6 +472,8 @@ struct rpcrdma_memreg_ops {
 			(*ro_map)(struct rpcrdma_xprt *,
 				  struct rpcrdma_mr_seg *, int, bool,
 				  struct rpcrdma_mr **);
+	int		(*ro_send)(struct rpcrdma_ia *ia,
+				   struct rpcrdma_req *req);
 	void		(*ro_reminv)(struct rpcrdma_rep *rep,
 				     struct list_head *mrs);
 	void		(*ro_unmap_sync)(struct rpcrdma_xprt *,


* [PATCH 8/8] xprtrdma: Move creation of rl_rdmabuf to rpcrdma_create_req
  2018-02-28 20:30 [PATCH 0/8] NFS/RDMA client-side patches for v4.17 Chuck Lever
                   ` (6 preceding siblings ...)
  2018-02-28 20:30 ` [PATCH 7/8] xprtrdma: Chain Send to FastReg WRs Chuck Lever
@ 2018-02-28 20:31 ` Chuck Lever
  7 siblings, 0 replies; 12+ messages in thread
From: Chuck Lever @ 2018-02-28 20:31 UTC (permalink / raw)
  To: anna.schumaker; +Cc: linux-rdma, linux-nfs

Refactor: Both rpcrdma_create_req call sites have to allocate the
buffer where the transport header is built, so just move that
allocation into rpcrdma_create_req.

This buffer is a fixed size. No information needed for this
allocation is available in call_allocate that is not also available
when the transport is created.

The original purpose of allocating these buffers on demand was to
reduce the possibility that an allocation failure during transport
creation would hork the mount operation in low-memory scenarios.
Some relief for this rare possibility is coming in the next few
patches.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/backchannel.c |    7 -------
 net/sunrpc/xprtrdma/transport.c   |   25 -------------------------
 net/sunrpc/xprtrdma/verbs.c       |   14 ++++++++++++--
 3 files changed, 12 insertions(+), 34 deletions(-)

diff --git a/net/sunrpc/xprtrdma/backchannel.c b/net/sunrpc/xprtrdma/backchannel.c
index ed1a4a3..47ebac9 100644
--- a/net/sunrpc/xprtrdma/backchannel.c
+++ b/net/sunrpc/xprtrdma/backchannel.c
@@ -44,13 +44,6 @@ static int rpcrdma_bc_setup_rqst(struct rpcrdma_xprt *r_xprt,
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
-	rb = rpcrdma_alloc_regbuf(RPCRDMA_HDRBUF_SIZE,
-				  DMA_TO_DEVICE, GFP_KERNEL);
-	if (IS_ERR(rb))
-		goto out_fail;
-	req->rl_rdmabuf = rb;
-	xdr_buf_init(&req->rl_hdrbuf, rb->rg_base, rdmab_length(rb));
-
 	size = r_xprt->rx_data.inline_rsize;
 	rb = rpcrdma_alloc_regbuf(size, DMA_TO_DEVICE, GFP_KERNEL);
 	if (IS_ERR(rb))
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 7e39faa..67e4386 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -537,29 +537,6 @@
 	}
 }
 
-/* Allocate a fixed-size buffer in which to construct and send the
- * RPC-over-RDMA header for this request.
- */
-static bool
-rpcrdma_get_rdmabuf(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
-		    gfp_t flags)
-{
-	size_t size = RPCRDMA_HDRBUF_SIZE;
-	struct rpcrdma_regbuf *rb;
-
-	if (req->rl_rdmabuf)
-		return true;
-
-	rb = rpcrdma_alloc_regbuf(size, DMA_TO_DEVICE, flags);
-	if (IS_ERR(rb))
-		return false;
-
-	r_xprt->rx_stats.hardway_register_count += size;
-	req->rl_rdmabuf = rb;
-	xdr_buf_init(&req->rl_hdrbuf, rb->rg_base, rdmab_length(rb));
-	return true;
-}
-
 static bool
 rpcrdma_get_sendbuf(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
 		    size_t size, gfp_t flags)
@@ -641,8 +618,6 @@
 	if (RPC_IS_SWAPPER(task))
 		flags = __GFP_MEMALLOC | GFP_NOWAIT | __GFP_NOWARN;
 
-	if (!rpcrdma_get_rdmabuf(r_xprt, req, flags))
-		goto out_fail;
 	if (!rpcrdma_get_sendbuf(r_xprt, req, rqst->rq_callsize, flags))
 		goto out_fail;
 	if (!rpcrdma_get_recvbuf(r_xprt, req, rqst->rq_rcvsize, flags))
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 626fd30..6a7a5a2 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1068,17 +1068,27 @@ struct rpcrdma_req *
 rpcrdma_create_req(struct rpcrdma_xprt *r_xprt)
 {
 	struct rpcrdma_buffer *buffer = &r_xprt->rx_buf;
+	struct rpcrdma_regbuf *rb;
 	struct rpcrdma_req *req;
 
 	req = kzalloc(sizeof(*req), GFP_KERNEL);
 	if (req == NULL)
 		return ERR_PTR(-ENOMEM);
 
+	rb = rpcrdma_alloc_regbuf(RPCRDMA_HDRBUF_SIZE,
+				  DMA_TO_DEVICE, GFP_KERNEL);
+	if (IS_ERR(rb)) {
+		kfree(req);
+		return ERR_PTR(-ENOMEM);
+	}
+	req->rl_rdmabuf = rb;
+	xdr_buf_init(&req->rl_hdrbuf, rb->rg_base, rdmab_length(rb));
+	req->rl_buffer = buffer;
+	INIT_LIST_HEAD(&req->rl_registered);
+
 	spin_lock(&buffer->rb_reqslock);
 	list_add(&req->rl_all, &buffer->rb_allreqs);
 	spin_unlock(&buffer->rb_reqslock);
-	req->rl_buffer = &r_xprt->rx_buf;
-	INIT_LIST_HEAD(&req->rl_registered);
 	return req;
 }
 


* Re: [PATCH 7/8] xprtrdma: Chain Send to FastReg WRs
  2018-02-28 20:30 ` [PATCH 7/8] xprtrdma: Chain Send to FastReg WRs Chuck Lever
@ 2018-02-28 21:51   ` Anna Schumaker
  2018-02-28 22:59     ` Jason Gunthorpe
  0 siblings, 1 reply; 12+ messages in thread
From: Anna Schumaker @ 2018-02-28 21:51 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-rdma, linux-nfs



On 02/28/2018 03:30 PM, Chuck Lever wrote:
> With FRWR, the client transport can perform memory registration and
> post a Send with just a single ib_post_send.
> 
> This reduces contention between the send_request path and the Send
> Completion handlers, and reduces the overhead of registering a chunk
> that has multiple segments.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  net/sunrpc/xprtrdma/fmr_ops.c   |   11 ++++++++
>  net/sunrpc/xprtrdma/frwr_ops.c  |   51 +++++++++++++++++++++++++++------------
>  net/sunrpc/xprtrdma/verbs.c     |    3 +-
>  net/sunrpc/xprtrdma/xprt_rdma.h |    2 ++
>  4 files changed, 49 insertions(+), 18 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
> index 629e539..5cc68a8 100644
> --- a/net/sunrpc/xprtrdma/fmr_ops.c
> +++ b/net/sunrpc/xprtrdma/fmr_ops.c
> @@ -251,6 +251,16 @@ enum {
>  	return ERR_PTR(-EIO);
>  }
>  
> +/* Post Send WR containing the RPC Call message.
> + */
> +static int
> +fmr_op_send(struct rpcrdma_ia *ia, struct rpcrdma_req *req)
> +{
> +	struct ib_send_wr *bad_wr;
> +
> +	return ib_post_send(ia->ri_id->qp, &req->rl_sendctx->sc_wr, &bad_wr);

I wish there was a bad_wr null-check in ib_post_send() (or in the
infiniband drivers) so you don't have to declare a variable that's
never used again.  Coordinating that might be more work than it's
worth, though.

Anna

> +}
> +
>  /* Invalidate all memory regions that were registered for "req".
>   *
>   * Sleeps until it is safe for the host CPU to access the
> @@ -305,6 +315,7 @@ enum {
>  
>  const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = {
>  	.ro_map				= fmr_op_map,
> +	.ro_send			= fmr_op_send,
>  	.ro_unmap_sync			= fmr_op_unmap_sync,
>  	.ro_recover_mr			= fmr_op_recover_mr,
>  	.ro_open			= fmr_op_open,
> diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
> index e21781c..c5743a0 100644
> --- a/net/sunrpc/xprtrdma/frwr_ops.c
> +++ b/net/sunrpc/xprtrdma/frwr_ops.c
> @@ -357,8 +357,7 @@
>  	struct rpcrdma_mr *mr;
>  	struct ib_mr *ibmr;
>  	struct ib_reg_wr *reg_wr;
> -	struct ib_send_wr *bad_wr;
> -	int rc, i, n;
> +	int i, n;
>  	u8 key;
>  
>  	mr = NULL;
> @@ -407,22 +406,12 @@
>  	ib_update_fast_reg_key(ibmr, ++key);
>  
>  	reg_wr = &frwr->fr_regwr;
> -	reg_wr->wr.next = NULL;
> -	reg_wr->wr.opcode = IB_WR_REG_MR;
> -	frwr->fr_cqe.done = frwr_wc_fastreg;
> -	reg_wr->wr.wr_cqe = &frwr->fr_cqe;
> -	reg_wr->wr.num_sge = 0;
> -	reg_wr->wr.send_flags = 0;
>  	reg_wr->mr = ibmr;
>  	reg_wr->key = ibmr->rkey;
>  	reg_wr->access = writing ?
>  			 IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
>  			 IB_ACCESS_REMOTE_READ;
>  
> -	rc = ib_post_send(ia->ri_id->qp, &reg_wr->wr, &bad_wr);
> -	if (rc)
> -		goto out_senderr;
> -
>  	mr->mr_handle = ibmr->rkey;
>  	mr->mr_length = ibmr->length;
>  	mr->mr_offset = ibmr->iova;
> @@ -442,11 +431,40 @@
>  	       frwr->fr_mr, n, mr->mr_nents);
>  	rpcrdma_mr_defer_recovery(mr);
>  	return ERR_PTR(-EIO);
> +}
>  
> -out_senderr:
> -	pr_err("rpcrdma: FRWR registration ib_post_send returned %i\n", rc);
> -	rpcrdma_mr_defer_recovery(mr);
> -	return ERR_PTR(-ENOTCONN);
> +/* Post Send WR containing the RPC Call message.
> + *
> + * For FRMR, chain any FastReg WRs to the Send WR. Only a
> + * single ib_post_send call is needed to register memory
> + * and then post the Send WR.
> + */
> +static int
> +frwr_op_send(struct rpcrdma_ia *ia, struct rpcrdma_req *req)
> +{
> +	struct ib_send_wr *post_wr, *bad_wr;
> +	struct rpcrdma_mr *mr;
> +
> +	post_wr = &req->rl_sendctx->sc_wr;
> +	list_for_each_entry(mr, &req->rl_registered, mr_list) {
> +		struct rpcrdma_frwr *frwr;
> +
> +		frwr = &mr->frwr;
> +
> +		frwr->fr_cqe.done = frwr_wc_fastreg;
> +		frwr->fr_regwr.wr.next = post_wr;
> +		frwr->fr_regwr.wr.wr_cqe = &frwr->fr_cqe;
> +		frwr->fr_regwr.wr.num_sge = 0;
> +		frwr->fr_regwr.wr.opcode = IB_WR_REG_MR;
> +		frwr->fr_regwr.wr.send_flags = 0;
> +
> +		post_wr = &frwr->fr_regwr.wr;
> +	}
> +
> +	/* If ib_post_send fails, the next ->send_request for
> +	 * @req will queue these MWs for recovery.
> +	 */
> +	return ib_post_send(ia->ri_id->qp, post_wr, &bad_wr);
>  }
>  
>  /* Handle a remotely invalidated mr on the @mrs list
> @@ -561,6 +579,7 @@
>  
>  const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = {
>  	.ro_map				= frwr_op_map,
> +	.ro_send			= frwr_op_send,
>  	.ro_reminv			= frwr_op_reminv,
>  	.ro_unmap_sync			= frwr_op_unmap_sync,
>  	.ro_recover_mr			= frwr_op_recover_mr,
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index ab86724..626fd30 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -1535,7 +1535,6 @@ struct rpcrdma_regbuf *
>  		struct rpcrdma_req *req)
>  {
>  	struct ib_send_wr *send_wr = &req->rl_sendctx->sc_wr;
> -	struct ib_send_wr *send_wr_fail;
>  	int rc;
>  
>  	if (req->rl_reply) {
> @@ -1554,7 +1553,7 @@ struct rpcrdma_regbuf *
>  		--ep->rep_send_count;
>  	}
>  
> -	rc = ib_post_send(ia->ri_id->qp, send_wr, &send_wr_fail);
> +	rc = ia->ri_ops->ro_send(ia, req);
>  	trace_xprtrdma_post_send(req, rc);
>  	if (rc)
>  		return -ENOTCONN;
> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
> index 29ea0b4..3d3b423 100644
> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
> @@ -472,6 +472,8 @@ struct rpcrdma_memreg_ops {
>  			(*ro_map)(struct rpcrdma_xprt *,
>  				  struct rpcrdma_mr_seg *, int, bool,
>  				  struct rpcrdma_mr **);
> +	int		(*ro_send)(struct rpcrdma_ia *ia,
> +				   struct rpcrdma_req *req);
>  	void		(*ro_reminv)(struct rpcrdma_rep *rep,
>  				     struct list_head *mrs);
>  	void		(*ro_unmap_sync)(struct rpcrdma_xprt *,
> 

* Re: [PATCH 7/8] xprtrdma: Chain Send to FastReg WRs
  2018-02-28 21:51   ` Anna Schumaker
@ 2018-02-28 22:59     ` Jason Gunthorpe
  2018-02-28 23:04       ` Chuck Lever
  0 siblings, 1 reply; 12+ messages in thread
From: Jason Gunthorpe @ 2018-02-28 22:59 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: Chuck Lever, linux-rdma, linux-nfs

On Wed, Feb 28, 2018 at 04:51:11PM -0500, Anna Schumaker wrote:
> 
> 
> On 02/28/2018 03:30 PM, Chuck Lever wrote:
> > With FRWR, the client transport can perform memory registration and
> > post a Send with just a single ib_post_send.
> > 
> > This reduces contention between the send_request path and the Send
> > Completion handlers, and reduces the overhead of registering a chunk
> > that has multiple segments.
> > 
> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> >  net/sunrpc/xprtrdma/fmr_ops.c   |   11 ++++++++
> >  net/sunrpc/xprtrdma/frwr_ops.c  |   51 +++++++++++++++++++++++++++------------
> >  net/sunrpc/xprtrdma/verbs.c     |    3 +-
> >  net/sunrpc/xprtrdma/xprt_rdma.h |    2 ++
> >  4 files changed, 49 insertions(+), 18 deletions(-)
> > 
> > diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
> > index 629e539..5cc68a8 100644
> > +++ b/net/sunrpc/xprtrdma/fmr_ops.c
> > @@ -251,6 +251,16 @@ enum {
> >  	return ERR_PTR(-EIO);
> >  }
> >  
> > +/* Post Send WR containing the RPC Call message.
> > + */
> > +static int
> > +fmr_op_send(struct rpcrdma_ia *ia, struct rpcrdma_req *req)
> > +{
> > +	struct ib_send_wr *bad_wr;
> > +
> > +	return ib_post_send(ia->ri_id->qp, &req->rl_sendctx->sc_wr, &bad_wr);
> 
> I wish there was a bad_wr null-check in ib_post_send() (or in the
> infiniband drivers) so you don't have to declare a variable that's
> never used again.  Coordinating that might be more work than it's
> worth, though.

It is a good point, I actually don't think we have any user in kernel
of bad_wr ..

Would prefer to just drop the parameter and add a new function call if
really, really needed.

Jason

* Re: [PATCH 7/8] xprtrdma: Chain Send to FastReg WRs
  2018-02-28 22:59     ` Jason Gunthorpe
@ 2018-02-28 23:04       ` Chuck Lever
  0 siblings, 0 replies; 12+ messages in thread
From: Chuck Lever @ 2018-02-28 23:04 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Anna Schumaker, linux-rdma, Linux NFS Mailing List



> On Feb 28, 2018, at 5:59 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Feb 28, 2018 at 04:51:11PM -0500, Anna Schumaker wrote:
>>
>>
>> On 02/28/2018 03:30 PM, Chuck Lever wrote:
>>> With FRWR, the client transport can perform memory registration and
>>> post a Send with just a single ib_post_send.
>>>
>>> This reduces contention between the send_request path and the Send
>>> Completion handlers, and reduces the overhead of registering a chunk
>>> that has multiple segments.
>>>
>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>> net/sunrpc/xprtrdma/fmr_ops.c   |   11 ++++++++
>>> net/sunrpc/xprtrdma/frwr_ops.c  |   51 +++++++++++++++++++++++++++------------
>>> net/sunrpc/xprtrdma/verbs.c     |    3 +-
>>> net/sunrpc/xprtrdma/xprt_rdma.h |    2 ++
>>> 4 files changed, 49 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
>>> index 629e539..5cc68a8 100644
>>> +++ b/net/sunrpc/xprtrdma/fmr_ops.c
>>> @@ -251,6 +251,16 @@ enum {
>>> 	return ERR_PTR(-EIO);
>>> }
>>>
>>> +/* Post Send WR containing the RPC Call message.
>>> + */
>>> +static int
>>> +fmr_op_send(struct rpcrdma_ia *ia, struct rpcrdma_req *req)
>>> +{
>>> +	struct ib_send_wr *bad_wr;
>>> +
>>> +	return ib_post_send(ia->ri_id->qp, &req->rl_sendctx->sc_wr, &bad_wr);
>>
>> I wish there was a bad_wr null-check in ib_post_send() (or in the
>> infiniband drivers) so you don't have to declare a variable that's
>> never used again.  Coordinating that might be more work than it's
>> worth, though.
>
> It is a good point, I actually don't think we have any user in kernel
> of bad_wr ..

Yes, frwr_op_unmap_sync() uses the 3rd argument, and I'm about to add
a call site for ib_post_recv which uses that argument as well.


> Would prefer to just drop the parameter and add a new function call if
> really, really needed.

You could do something like

  ib_post_send_one(qp, wr);

for the common case; and likewise for ib_post_recv.
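
One possible shape for that helper (a sketch only; ib_post_send_one
is a suggested name, not an existing verb):

	/* Post a single Send WR, discarding the bad_wr out-parameter
	 * that callers of the one-WR case never look at.
	 */
	static inline int ib_post_send_one(struct ib_qp *qp,
					   struct ib_send_wr *wr)
	{
		struct ib_send_wr *bad_wr;

		return ib_post_send(qp, wr, &bad_wr);
	}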

--
Chuck Lever



