* [PATCH v1 00/19] NFS/RDMA server for-next
@ 2018-05-07 19:26 Chuck Lever
  2018-05-07 19:26 ` [PATCH v1 01/19] svcrdma: Add proper SPDX tags for NetApp-contributed source Chuck Lever
                   ` (18 more replies)
  0 siblings, 19 replies; 29+ messages in thread
From: Chuck Lever @ 2018-05-07 19:26 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

Hi Bruce-

Here are all the patches I'd like to see merged into the next kernel
(v4.18 or v5.0) if possible. The main changes are:

 - Added trace points to svcrdma
 - Post Recv WRs in Receive completion handler
 - Handle Send WRs with fewer page allocations
 - Lots of clean up that results from these changes

The svc_rdma_recv_ctxt and svc_rdma_send_ctxt changes improve the
efficiency of the transport receive and send paths by reducing
memory allocation and DMA mapping activity per RPC. Posting Recv
WRs in the Receive completion handler keeps Receive Queue handling
from bouncing among all the CPUs.
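
As an illustration of that last point, here is a rough sketch (not
code from the series itself) of posting the replacement Recv WR
directly in the Receive completion handler. The function name and
the use of cq_context to carry the QP are assumptions made for this
example only:

#include <linux/printk.h>
#include <linux/string.h>
#include <rdma/ib_verbs.h>

/* Receive completion handler: consume the completion, then
 * immediately replenish the Receive Queue from the same CPU,
 * rather than deferring ib_post_recv() to another context.
 */
static void example_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
{
	struct ib_qp *qp = cq->cq_context;	/* assumed: QP stashed here */
	struct ib_recv_wr recv_wr, *bad_recv_wr;

	/* ... hand the completed Receive buffer up to the RPC layer ... */

	/* A real handler would fill in recv_wr.wr_cqe, .sg_list, and
	 * .num_sge from a cached receive context before posting.
	 */
	memset(&recv_wr, 0, sizeof(recv_wr));
	if (ib_post_recv(qp, &recv_wr, &bad_recv_wr))
		pr_warn("example: failed to replenish the Receive Queue\n");
}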

---

Chuck Lever (19):
      svcrdma: Add proper SPDX tags for NetApp-contributed source
      svcrdma: Use passed-in net namespace when creating RDMA listener
      xprtrdma: Prepare RPC/RDMA includes for server-side trace points
      svcrdma: Trace key RPC/RDMA protocol events
      svcrdma: Trace key RDMA API events
      svcrdma: Introduce svc_rdma_recv_ctxt
      svcrdma: Remove sc_rq_depth
      svcrdma: Simplify svc_rdma_recv_ctxt_put
      svcrdma: Preserve Receive buffer until svc_rdma_sendto
      svcrdma: Persistently allocate and DMA-map Receive buffers
      svcrdma: Allocate recv_ctxt's on CPU handling Receives
      svcrdma: Refactor svc_rdma_dma_map_buf
      svcrdma: Clean up Send SGE accounting
      svcrdma: Introduce svc_rdma_send_ctxt
      svcrdma: Don't overrun the SGE array in svc_rdma_send_ctxt
      svcrdma: Remove post_send_wr
      svcrdma: Simplify svc_rdma_send()
      svcrdma: Persistently allocate and DMA-map Send buffers
      svcrdma: Remove unused svc_rdma_op_ctxt


 include/linux/sunrpc/svc_rdma.h            |   95 ++---
 include/trace/events/rpcrdma.h             |  584 ++++++++++++++++++++++++++++
 net/sunrpc/xprtrdma/backchannel.c          |    2 
 net/sunrpc/xprtrdma/fmr_ops.c              |    3 
 net/sunrpc/xprtrdma/frwr_ops.c             |    2 
 net/sunrpc/xprtrdma/module.c               |    4 
 net/sunrpc/xprtrdma/rpc_rdma.c             |    7 
 net/sunrpc/xprtrdma/svc_rdma.c             |    3 
 net/sunrpc/xprtrdma/svc_rdma_backchannel.c |   54 +--
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c    |  439 +++++++++++++++------
 net/sunrpc/xprtrdma/svc_rdma_rw.c          |  133 +++---
 net/sunrpc/xprtrdma/svc_rdma_sendto.c      |  510 ++++++++++++++++--------
 net/sunrpc/xprtrdma/svc_rdma_transport.c   |  484 ++++-------------------
 net/sunrpc/xprtrdma/transport.c            |    4 
 net/sunrpc/xprtrdma/verbs.c                |    1 
 net/sunrpc/xprtrdma/xprt_rdma.h            |    2 
 16 files changed, 1461 insertions(+), 866 deletions(-)

--
Chuck Lever


* [PATCH v1 01/19] svcrdma: Add proper SPDX tags for NetApp-contributed source
  2018-05-07 19:26 [PATCH v1 00/19] NFS/RDMA server for-next Chuck Lever
@ 2018-05-07 19:26 ` Chuck Lever
  2018-05-09 20:23   ` J. Bruce Fields
  2018-05-07 19:27 ` [PATCH v1 02/19] svcrdma: Use passed-in net namespace when creating RDMA listener Chuck Lever
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 29+ messages in thread
From: Chuck Lever @ 2018-05-07 19:26 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc_rdma.h          |    1 +
 net/sunrpc/xprtrdma/svc_rdma.c           |    1 +
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |    1 +
 net/sunrpc/xprtrdma/svc_rdma_sendto.c    |    1 +
 net/sunrpc/xprtrdma/svc_rdma_transport.c |    1 +
 5 files changed, 5 insertions(+)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 7337e12..88da0c9 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -1,3 +1,4 @@
+/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
 /*
  * Copyright (c) 2005-2006 Network Appliance, Inc. All rights reserved.
  *
diff --git a/net/sunrpc/xprtrdma/svc_rdma.c b/net/sunrpc/xprtrdma/svc_rdma.c
index dd8a431..a490532 100644
--- a/net/sunrpc/xprtrdma/svc_rdma.c
+++ b/net/sunrpc/xprtrdma/svc_rdma.c
@@ -1,3 +1,4 @@
+// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
 /*
  * Copyright (c) 2005-2006 Network Appliance, Inc. All rights reserved.
  *
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 3d45015..9eae95d 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -1,3 +1,4 @@
+// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
 /*
  * Copyright (c) 2016, 2017 Oracle. All rights reserved.
  * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index 649441d..79bd3a3 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -1,3 +1,4 @@
+// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
 /*
  * Copyright (c) 2016 Oracle. All rights reserved.
  * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 96cc8f6..3633254 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -1,3 +1,4 @@
+// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
 /*
  * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
  * Copyright (c) 2005-2007 Network Appliance, Inc. All rights reserved.



* [PATCH v1 02/19] svcrdma: Use passed-in net namespace when creating RDMA listener
  2018-05-07 19:26 [PATCH v1 00/19] NFS/RDMA server for-next Chuck Lever
  2018-05-07 19:26 ` [PATCH v1 01/19] svcrdma: Add proper SPDX tags for NetApp-contributed source Chuck Lever
@ 2018-05-07 19:27 ` Chuck Lever
  2018-05-07 19:27 ` [PATCH v1 03/19] xprtrdma: Prepare RPC/RDMA includes for server-side trace points Chuck Lever
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Chuck Lever @ 2018-05-07 19:27 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

Ensure that each RDMA listener and its child transports are created
in the same net namespace as the user that started the NFS service.
This is similar to how listener sockets are created in
svc_create_socket, and is required to enable support for containers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/svc_rdma_transport.c |   35 +++++++++++++++---------------
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 3633254..22e2595 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -60,7 +60,8 @@
 #define RPCDBG_FACILITY	RPCDBG_SVCXPRT
 
 static int svc_rdma_post_recv(struct svcxprt_rdma *xprt);
-static struct svcxprt_rdma *rdma_create_xprt(struct svc_serv *, int);
+static struct svcxprt_rdma *svc_rdma_create_xprt(struct svc_serv *serv,
+						 struct net *net);
 static struct svc_xprt *svc_rdma_create(struct svc_serv *serv,
 					struct net *net,
 					struct sockaddr *sa, int salen,
@@ -124,7 +125,7 @@ static struct svc_xprt *svc_rdma_bc_create(struct svc_serv *serv,
 	struct svcxprt_rdma *cma_xprt;
 	struct svc_xprt *xprt;
 
-	cma_xprt = rdma_create_xprt(serv, 0);
+	cma_xprt = svc_rdma_create_xprt(serv, net);
 	if (!cma_xprt)
 		return ERR_PTR(-ENOMEM);
 	xprt = &cma_xprt->sc_xprt;
@@ -374,14 +375,16 @@ void svc_rdma_wc_send(struct ib_cq *cq, struct ib_wc *wc)
 	svc_xprt_put(&xprt->sc_xprt);
 }
 
-static struct svcxprt_rdma *rdma_create_xprt(struct svc_serv *serv,
-					     int listener)
+static struct svcxprt_rdma *svc_rdma_create_xprt(struct svc_serv *serv,
+						 struct net *net)
 {
 	struct svcxprt_rdma *cma_xprt = kzalloc(sizeof *cma_xprt, GFP_KERNEL);
 
-	if (!cma_xprt)
+	if (!cma_xprt) {
+		dprintk("svcrdma: failed to create new transport\n");
 		return NULL;
-	svc_xprt_init(&init_net, &svc_rdma_class, &cma_xprt->sc_xprt, serv);
+	}
+	svc_xprt_init(net, &svc_rdma_class, &cma_xprt->sc_xprt, serv);
 	INIT_LIST_HEAD(&cma_xprt->sc_accept_q);
 	INIT_LIST_HEAD(&cma_xprt->sc_rq_dto_q);
 	INIT_LIST_HEAD(&cma_xprt->sc_read_complete_q);
@@ -402,11 +405,6 @@ static struct svcxprt_rdma *rdma_create_xprt(struct svc_serv *serv,
 	 */
 	set_bit(XPT_CONG_CTRL, &cma_xprt->sc_xprt.xpt_flags);
 
-	if (listener) {
-		strcpy(cma_xprt->sc_xprt.xpt_remotebuf, "listener");
-		set_bit(XPT_LISTENER, &cma_xprt->sc_xprt.xpt_flags);
-	}
-
 	return cma_xprt;
 }
 
@@ -505,11 +503,10 @@ static void handle_connect_req(struct rdma_cm_id *new_cma_id,
 	struct sockaddr *sa;
 
 	/* Create a new transport */
-	newxprt = rdma_create_xprt(listen_xprt->sc_xprt.xpt_server, 0);
-	if (!newxprt) {
-		dprintk("svcrdma: failed to create new transport\n");
+	newxprt = svc_rdma_create_xprt(listen_xprt->sc_xprt.xpt_server,
+				       listen_xprt->sc_xprt.xpt_net);
+	if (!newxprt)
 		return;
-	}
 	newxprt->sc_cm_id = new_cma_id;
 	new_cma_id->context = newxprt;
 	dprintk("svcrdma: Creating newxprt=%p, cm_id=%p, listenxprt=%p\n",
@@ -635,16 +632,18 @@ static struct svc_xprt *svc_rdma_create(struct svc_serv *serv,
 	struct svcxprt_rdma *cma_xprt;
 	int ret;
 
-	dprintk("svcrdma: Creating RDMA socket\n");
+	dprintk("svcrdma: Creating RDMA listener\n");
 	if ((sa->sa_family != AF_INET) && (sa->sa_family != AF_INET6)) {
 		dprintk("svcrdma: Address family %d is not supported.\n", sa->sa_family);
 		return ERR_PTR(-EAFNOSUPPORT);
 	}
-	cma_xprt = rdma_create_xprt(serv, 1);
+	cma_xprt = svc_rdma_create_xprt(serv, net);
 	if (!cma_xprt)
 		return ERR_PTR(-ENOMEM);
+	set_bit(XPT_LISTENER, &cma_xprt->sc_xprt.xpt_flags);
+	strcpy(cma_xprt->sc_xprt.xpt_remotebuf, "listener");
 
-	listen_id = rdma_create_id(&init_net, rdma_listen_handler, cma_xprt,
+	listen_id = rdma_create_id(net, rdma_listen_handler, cma_xprt,
 				   RDMA_PS_TCP, IB_QPT_RC);
 	if (IS_ERR(listen_id)) {
 		ret = PTR_ERR(listen_id);



* [PATCH v1 03/19] xprtrdma: Prepare RPC/RDMA includes for server-side trace points
  2018-05-07 19:26 [PATCH v1 00/19] NFS/RDMA server for-next Chuck Lever
  2018-05-07 19:26 ` [PATCH v1 01/19] svcrdma: Add proper SPDX tags for NetApp-contributed source Chuck Lever
  2018-05-07 19:27 ` [PATCH v1 02/19] svcrdma: Use passed-in net namespace when creating RDMA listener Chuck Lever
@ 2018-05-07 19:27 ` Chuck Lever
  2018-05-07 19:27 ` [PATCH v1 04/19] svcrdma: Trace key RPC/RDMA protocol events Chuck Lever
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Chuck Lever @ 2018-05-07 19:27 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

Clean up: Move #include <trace/events/rpcrdma.h> into source files,
similar to how it is done with trace/events/sunrpc.h.

Server-side trace points will be part of the rpcrdma subsystem,
just like the client-side trace points.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/backchannel.c |    1 +
 net/sunrpc/xprtrdma/fmr_ops.c     |    1 +
 net/sunrpc/xprtrdma/frwr_ops.c    |    1 +
 net/sunrpc/xprtrdma/module.c      |    4 +++-
 net/sunrpc/xprtrdma/rpc_rdma.c    |    5 +++--
 net/sunrpc/xprtrdma/svc_rdma.c    |    2 +-
 net/sunrpc/xprtrdma/transport.c   |    1 +
 net/sunrpc/xprtrdma/verbs.c       |    1 +
 net/sunrpc/xprtrdma/xprt_rdma.h   |    2 --
 9 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/net/sunrpc/xprtrdma/backchannel.c b/net/sunrpc/xprtrdma/backchannel.c
index 47ebac9..05c69ac 100644
--- a/net/sunrpc/xprtrdma/backchannel.c
+++ b/net/sunrpc/xprtrdma/backchannel.c
@@ -11,6 +11,7 @@
 #include <linux/sunrpc/svc_xprt.h>
 
 #include "xprt_rdma.h"
+#include <trace/events/rpcrdma.h>
 
 #if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
 # define RPCDBG_FACILITY	RPCDBG_TRANS
diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index 5cc68a8..08de7da 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -21,6 +21,7 @@
  */
 
 #include "xprt_rdma.h"
+#include <trace/events/rpcrdma.h>
 
 #if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
 # define RPCDBG_FACILITY	RPCDBG_TRANS
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index c5743a0..f8312e3 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -73,6 +73,7 @@
 #include <linux/sunrpc/rpc_rdma.h>
 
 #include "xprt_rdma.h"
+#include <trace/events/rpcrdma.h>
 
 #if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
 # define RPCDBG_FACILITY	RPCDBG_TRANS
diff --git a/net/sunrpc/xprtrdma/module.c b/net/sunrpc/xprtrdma/module.c
index a762d19..d95ac07 100644
--- a/net/sunrpc/xprtrdma/module.c
+++ b/net/sunrpc/xprtrdma/module.c
@@ -13,9 +13,11 @@
 
 #include <asm/swab.h>
 
-#define CREATE_TRACE_POINTS
 #include "xprt_rdma.h"
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/rpcrdma.h>
+
 MODULE_AUTHOR("Open Grid Computing and Network Appliance, Inc.");
 MODULE_DESCRIPTION("RPC/RDMA Transport");
 MODULE_LICENSE("Dual BSD/GPL");
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index e8adad3..f358d1e 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -46,10 +46,11 @@
  * to the Linux RPC framework lives.
  */
 
-#include "xprt_rdma.h"
-
 #include <linux/highmem.h>
 
+#include "xprt_rdma.h"
+#include <trace/events/rpcrdma.h>
+
 #if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
 # define RPCDBG_FACILITY	RPCDBG_TRANS
 #endif
diff --git a/net/sunrpc/xprtrdma/svc_rdma.c b/net/sunrpc/xprtrdma/svc_rdma.c
index a490532..357ba90 100644
--- a/net/sunrpc/xprtrdma/svc_rdma.c
+++ b/net/sunrpc/xprtrdma/svc_rdma.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
 /*
+ * Copyright (c) 2015-2018 Oracle.  All rights reserved.
  * Copyright (c) 2005-2006 Network Appliance, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
@@ -47,7 +48,6 @@
 #include <linux/sunrpc/clnt.h>
 #include <linux/sunrpc/sched.h>
 #include <linux/sunrpc/svc_rdma.h>
-#include "xprt_rdma.h"
 
 #define RPCDBG_FACILITY	RPCDBG_SVCXPRT
 
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index cc1aad3..3d1b277 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -54,6 +54,7 @@
 #include <linux/sunrpc/addr.h>
 
 #include "xprt_rdma.h"
+#include <trace/events/rpcrdma.h>
 
 #if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
 # define RPCDBG_FACILITY	RPCDBG_TRANS
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index fe5eaca..a143c59 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -59,6 +59,7 @@
 #include <rdma/ib_cm.h>
 
 #include "xprt_rdma.h"
+#include <trace/events/rpcrdma.h>
 
 /*
  * Globals/Macros
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 3d3b423..3f856c7 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -675,5 +675,3 @@ static inline void rpcrdma_set_xdrlen(struct xdr_buf *xdr, size_t len)
 extern struct xprt_class xprt_rdma_bc;
 
 #endif				/* _LINUX_SUNRPC_XPRT_RDMA_H */
-
-#include <trace/events/rpcrdma.h>



* [PATCH v1 04/19] svcrdma: Trace key RPC/RDMA protocol events
  2018-05-07 19:26 [PATCH v1 00/19] NFS/RDMA server for-next Chuck Lever
                   ` (2 preceding siblings ...)
  2018-05-07 19:27 ` [PATCH v1 03/19] xprtrdma: Prepare RPC/RDMA includes for server-side trace points Chuck Lever
@ 2018-05-07 19:27 ` Chuck Lever
  2018-05-07 19:27 ` [PATCH v1 05/19] svcrdma: Trace key RDMA API events Chuck Lever
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Chuck Lever @ 2018-05-07 19:27 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

This includes:
  * Transport accept and tear-down
  * Decisions about using Write and Reply chunks
  * Each RDMA segment that is handled
  * Whenever an RDMA_ERR is sent

As a clean-up, I've standardized the order of the includes and
removed some now-redundant dprintk call sites.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/trace/events/rpcrdma.h           |  262 ++++++++++++++++++++++++++++++
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |   36 ++--
 net/sunrpc/xprtrdma/svc_rdma_rw.c        |   23 ++-
 net/sunrpc/xprtrdma/svc_rdma_sendto.c    |   19 +-
 net/sunrpc/xprtrdma/svc_rdma_transport.c |   19 +-
 5 files changed, 311 insertions(+), 48 deletions(-)

diff --git a/include/trace/events/rpcrdma.h b/include/trace/events/rpcrdma.h
index 50ed3f8..633520a 100644
--- a/include/trace/events/rpcrdma.h
+++ b/include/trace/events/rpcrdma.h
@@ -1,6 +1,8 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 /*
- * Copyright (c) 2017 Oracle.  All rights reserved.
+ * Copyright (c) 2017, 2018 Oracle.  All rights reserved.
+ *
+ * Trace point definitions for the "rpcrdma" subsystem.
  */
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM rpcrdma
@@ -885,6 +887,264 @@
 DEFINE_CB_EVENT(xprtrdma_cb_call);
 DEFINE_CB_EVENT(xprtrdma_cb_reply);
 
+/**
+ ** Server-side RPC/RDMA events
+ **/
+
+DECLARE_EVENT_CLASS(svcrdma_xprt_event,
+	TP_PROTO(
+		const struct svc_xprt *xprt
+	),
+
+	TP_ARGS(xprt),
+
+	TP_STRUCT__entry(
+		__field(const void *, xprt)
+		__string(addr, xprt->xpt_remotebuf)
+	),
+
+	TP_fast_assign(
+		__entry->xprt = xprt;
+		__assign_str(addr, xprt->xpt_remotebuf);
+	),
+
+	TP_printk("xprt=%p addr=%s",
+		__entry->xprt, __get_str(addr)
+	)
+);
+
+#define DEFINE_XPRT_EVENT(name)						\
+		DEFINE_EVENT(svcrdma_xprt_event, svcrdma_xprt_##name,	\
+				TP_PROTO(				\
+					const struct svc_xprt *xprt	\
+				),					\
+				TP_ARGS(xprt))
+
+DEFINE_XPRT_EVENT(accept);
+DEFINE_XPRT_EVENT(fail);
+DEFINE_XPRT_EVENT(free);
+
+TRACE_DEFINE_ENUM(RDMA_MSG);
+TRACE_DEFINE_ENUM(RDMA_NOMSG);
+TRACE_DEFINE_ENUM(RDMA_MSGP);
+TRACE_DEFINE_ENUM(RDMA_DONE);
+TRACE_DEFINE_ENUM(RDMA_ERROR);
+
+#define show_rpcrdma_proc(x)						\
+		__print_symbolic(x,					\
+				{ RDMA_MSG, "RDMA_MSG" },		\
+				{ RDMA_NOMSG, "RDMA_NOMSG" },		\
+				{ RDMA_MSGP, "RDMA_MSGP" },		\
+				{ RDMA_DONE, "RDMA_DONE" },		\
+				{ RDMA_ERROR, "RDMA_ERROR" })
+
+TRACE_EVENT(svcrdma_decode_rqst,
+	TP_PROTO(
+		__be32 *p,
+		unsigned int hdrlen
+	),
+
+	TP_ARGS(p, hdrlen),
+
+	TP_STRUCT__entry(
+		__field(u32, xid)
+		__field(u32, vers)
+		__field(u32, proc)
+		__field(u32, credits)
+		__field(unsigned int, hdrlen)
+	),
+
+	TP_fast_assign(
+		__entry->xid = be32_to_cpup(p++);
+		__entry->vers = be32_to_cpup(p++);
+		__entry->credits = be32_to_cpup(p++);
+		__entry->proc = be32_to_cpup(p);
+		__entry->hdrlen = hdrlen;
+	),
+
+	TP_printk("xid=0x%08x vers=%u credits=%u proc=%s hdrlen=%u",
+		__entry->xid, __entry->vers, __entry->credits,
+		show_rpcrdma_proc(__entry->proc), __entry->hdrlen)
+);
+
+TRACE_EVENT(svcrdma_decode_short,
+	TP_PROTO(
+		unsigned int hdrlen
+	),
+
+	TP_ARGS(hdrlen),
+
+	TP_STRUCT__entry(
+		__field(unsigned int, hdrlen)
+	),
+
+	TP_fast_assign(
+		__entry->hdrlen = hdrlen;
+	),
+
+	TP_printk("hdrlen=%u", __entry->hdrlen)
+);
+
+DECLARE_EVENT_CLASS(svcrdma_badreq_event,
+	TP_PROTO(
+		__be32 *p
+	),
+
+	TP_ARGS(p),
+
+	TP_STRUCT__entry(
+		__field(u32, xid)
+		__field(u32, vers)
+		__field(u32, proc)
+		__field(u32, credits)
+	),
+
+	TP_fast_assign(
+		__entry->xid = be32_to_cpup(p++);
+		__entry->vers = be32_to_cpup(p++);
+		__entry->credits = be32_to_cpup(p++);
+		__entry->proc = be32_to_cpup(p);
+	),
+
+	TP_printk("xid=0x%08x vers=%u credits=%u proc=%u",
+		__entry->xid, __entry->vers, __entry->credits, __entry->proc)
+);
+
+#define DEFINE_BADREQ_EVENT(name)					\
+		DEFINE_EVENT(svcrdma_badreq_event, svcrdma_decode_##name,\
+				TP_PROTO(				\
+					__be32 *p			\
+				),					\
+				TP_ARGS(p))
+
+DEFINE_BADREQ_EVENT(badvers);
+DEFINE_BADREQ_EVENT(drop);
+DEFINE_BADREQ_EVENT(badproc);
+DEFINE_BADREQ_EVENT(parse);
+
+DECLARE_EVENT_CLASS(svcrdma_segment_event,
+	TP_PROTO(
+		u32 handle,
+		u32 length,
+		u64 offset
+	),
+
+	TP_ARGS(handle, length, offset),
+
+	TP_STRUCT__entry(
+		__field(u32, handle)
+		__field(u32, length)
+		__field(u64, offset)
+	),
+
+	TP_fast_assign(
+		__entry->handle = handle;
+		__entry->length = length;
+		__entry->offset = offset;
+	),
+
+	TP_printk("%u@0x%016llx:0x%08x",
+		__entry->length, (unsigned long long)__entry->offset,
+		__entry->handle
+	)
+);
+
+#define DEFINE_SEGMENT_EVENT(name)					\
+		DEFINE_EVENT(svcrdma_segment_event, svcrdma_encode_##name,\
+				TP_PROTO(				\
+					u32 handle,			\
+					u32 length,			\
+					u64 offset			\
+				),					\
+				TP_ARGS(handle, length, offset))
+
+DEFINE_SEGMENT_EVENT(rseg);
+DEFINE_SEGMENT_EVENT(wseg);
+
+DECLARE_EVENT_CLASS(svcrdma_chunk_event,
+	TP_PROTO(
+		u32 length
+	),
+
+	TP_ARGS(length),
+
+	TP_STRUCT__entry(
+		__field(u32, length)
+	),
+
+	TP_fast_assign(
+		__entry->length = length;
+	),
+
+	TP_printk("length=%u",
+		__entry->length
+	)
+);
+
+#define DEFINE_CHUNK_EVENT(name)					\
+		DEFINE_EVENT(svcrdma_chunk_event, svcrdma_encode_##name,\
+				TP_PROTO(				\
+					u32 length			\
+				),					\
+				TP_ARGS(length))
+
+DEFINE_CHUNK_EVENT(pzr);
+DEFINE_CHUNK_EVENT(write);
+DEFINE_CHUNK_EVENT(reply);
+
+TRACE_EVENT(svcrdma_encode_read,
+	TP_PROTO(
+		u32 length,
+		u32 position
+	),
+
+	TP_ARGS(length, position),
+
+	TP_STRUCT__entry(
+		__field(u32, length)
+		__field(u32, position)
+	),
+
+	TP_fast_assign(
+		__entry->length = length;
+		__entry->position = position;
+	),
+
+	TP_printk("length=%u position=%u",
+		__entry->length, __entry->position
+	)
+);
+
+DECLARE_EVENT_CLASS(svcrdma_error_event,
+	TP_PROTO(
+		__be32 xid
+	),
+
+	TP_ARGS(xid),
+
+	TP_STRUCT__entry(
+		__field(u32, xid)
+	),
+
+	TP_fast_assign(
+		__entry->xid = be32_to_cpu(xid);
+	),
+
+	TP_printk("xid=0x%08x",
+		__entry->xid
+	)
+);
+
+#define DEFINE_ERROR_EVENT(name)					\
+		DEFINE_EVENT(svcrdma_error_event, svcrdma_err_##name,	\
+				TP_PROTO(				\
+					__be32 xid			\
+				),					\
+				TP_ARGS(xid))
+
+DEFINE_ERROR_EVENT(vers);
+DEFINE_ERROR_EVENT(chunk);
+
 #endif /* _TRACE_RPCRDMA_H */
 
 #include <trace/define_trace.h>
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 9eae95d..78ca580 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -93,17 +93,19 @@
  * (see rdma_read_complete() below).
  */
 
+#include <linux/spinlock.h>
 #include <asm/unaligned.h>
 #include <rdma/ib_verbs.h>
 #include <rdma/rdma_cm.h>
 
-#include <linux/spinlock.h>
-
 #include <linux/sunrpc/xdr.h>
 #include <linux/sunrpc/debug.h>
 #include <linux/sunrpc/rpc_rdma.h>
 #include <linux/sunrpc/svc_rdma.h>
 
+#include "xprt_rdma.h"
+#include <trace/events/rpcrdma.h>
+
 #define RPCDBG_FACILITY	RPCDBG_SVCXPRT
 
 /*
@@ -295,7 +297,6 @@ static int svc_rdma_xdr_decode_req(struct xdr_buf *rq_arg)
 {
 	__be32 *p, *end, *rdma_argp;
 	unsigned int hdr_len;
-	char *proc;
 
 	/* Verify that there's enough bytes for header + something */
 	if (rq_arg->len <= RPCRDMA_HDRLEN_ERR)
@@ -307,10 +308,8 @@ static int svc_rdma_xdr_decode_req(struct xdr_buf *rq_arg)
 
 	switch (*(rdma_argp + 3)) {
 	case rdma_msg:
-		proc = "RDMA_MSG";
 		break;
 	case rdma_nomsg:
-		proc = "RDMA_NOMSG";
 		break;
 
 	case rdma_done:
@@ -340,30 +339,27 @@ static int svc_rdma_xdr_decode_req(struct xdr_buf *rq_arg)
 	hdr_len = (unsigned long)p - (unsigned long)rdma_argp;
 	rq_arg->head[0].iov_len -= hdr_len;
 	rq_arg->len -= hdr_len;
-	dprintk("svcrdma: received %s request for XID 0x%08x, hdr_len=%u\n",
-		proc, be32_to_cpup(rdma_argp), hdr_len);
+	trace_svcrdma_decode_rqst(rdma_argp, hdr_len);
 	return hdr_len;
 
 out_short:
-	dprintk("svcrdma: header too short = %d\n", rq_arg->len);
+	trace_svcrdma_decode_short(rq_arg->len);
 	return -EINVAL;
 
 out_version:
-	dprintk("svcrdma: bad xprt version: %u\n",
-		be32_to_cpup(rdma_argp + 1));
+	trace_svcrdma_decode_badvers(rdma_argp);
 	return -EPROTONOSUPPORT;
 
 out_drop:
-	dprintk("svcrdma: dropping RDMA_DONE/ERROR message\n");
+	trace_svcrdma_decode_drop(rdma_argp);
 	return 0;
 
 out_proc:
-	dprintk("svcrdma: bad rdma procedure (%u)\n",
-		be32_to_cpup(rdma_argp + 3));
+	trace_svcrdma_decode_badproc(rdma_argp);
 	return -EINVAL;
 
 out_inval:
-	dprintk("svcrdma: failed to parse transport header\n");
+	trace_svcrdma_decode_parse(rdma_argp);
 	return -EINVAL;
 }
 
@@ -412,12 +408,16 @@ static void svc_rdma_send_error(struct svcxprt_rdma *xprt,
 	*p++ = *(rdma_argp + 1);
 	*p++ = xprt->sc_fc_credits;
 	*p++ = rdma_error;
-	if (status == -EPROTONOSUPPORT) {
+	switch (status) {
+	case -EPROTONOSUPPORT:
 		*p++ = err_vers;
 		*p++ = rpcrdma_version;
 		*p++ = rpcrdma_version;
-	} else {
+		trace_svcrdma_err_vers(*rdma_argp);
+		break;
+	default:
 		*p++ = err_chunk;
+		trace_svcrdma_err_chunk(*rdma_argp);
 	}
 	length = (unsigned long)p - (unsigned long)err_msgp;
 
@@ -532,8 +532,6 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
 	}
 	spin_unlock(&rdma_xprt->sc_rq_dto_lock);
 
-	dprintk("svcrdma: recvfrom: ctxt=%p on xprt=%p, rqstp=%p\n",
-		ctxt, rdma_xprt, rqstp);
 	atomic_inc(&rdma_stat_recv);
 
 	svc_rdma_build_arg_xdr(rqstp, ctxt);
@@ -559,8 +557,6 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
 
 complete:
 	svc_rdma_put_context(ctxt, 0);
-	dprintk("svcrdma: recvfrom: xprt=%p, rqstp=%p, rq_arg.len=%u\n",
-		rdma_xprt, rqstp, rqstp->rq_arg.len);
 	rqstp->rq_prot = IPPROTO_MAX;
 	svc_xprt_copy_addrs(rqstp, xprt);
 	return rqstp->rq_arg.len;
diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
index 12b9a7e..4b9cb54 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
@@ -5,11 +5,14 @@
  * Use the core R/W API to move RPC-over-RDMA Read and Write chunks.
  */
 
+#include <rdma/rw.h>
+
 #include <linux/sunrpc/rpc_rdma.h>
 #include <linux/sunrpc/svc_rdma.h>
 #include <linux/sunrpc/debug.h>
 
-#include <rdma/rw.h>
+#include "xprt_rdma.h"
+#include <trace/events/rpcrdma.h>
 
 #define RPCDBG_FACILITY	RPCDBG_SVCXPRT
 
@@ -437,6 +440,7 @@ static void svc_rdma_pagelist_to_sg(struct svc_rdma_write_info *info,
 		if (ret < 0)
 			goto out_initerr;
 
+		trace_svcrdma_encode_wseg(seg_handle, write_len, seg_offset);
 		list_add(&ctxt->rw_list, &cc->cc_rwctxts);
 		cc->cc_sqecount += ret;
 		if (write_len == seg_length - info->wi_seg_off) {
@@ -526,6 +530,8 @@ int svc_rdma_send_write_chunk(struct svcxprt_rdma *rdma, __be32 *wr_ch,
 	ret = svc_rdma_post_chunk_ctxt(&info->wi_cc);
 	if (ret < 0)
 		goto out_err;
+
+	trace_svcrdma_encode_write(xdr->page_len);
 	return xdr->page_len;
 
 out_err:
@@ -582,6 +588,8 @@ int svc_rdma_send_reply_chunk(struct svcxprt_rdma *rdma, __be32 *rp_ch,
 	ret = svc_rdma_post_chunk_ctxt(&info->wi_cc);
 	if (ret < 0)
 		goto out_err;
+
+	trace_svcrdma_encode_reply(consumed);
 	return consumed;
 
 out_err:
@@ -606,9 +614,6 @@ static int svc_rdma_build_read_segment(struct svc_rdma_read_info *info,
 		goto out_noctx;
 	ctxt->rw_nents = sge_no;
 
-	dprintk("svcrdma: reading segment %u@0x%016llx:0x%08x (%u sges)\n",
-		len, offset, rkey, sge_no);
-
 	sg = ctxt->rw_sg_table.sgl;
 	for (sge_no = 0; sge_no < ctxt->rw_nents; sge_no++) {
 		seg_len = min_t(unsigned int, len,
@@ -686,6 +691,7 @@ static int svc_rdma_build_read_chunk(struct svc_rqst *rqstp,
 		if (ret < 0)
 			break;
 
+		trace_svcrdma_encode_rseg(rs_handle, rs_length, rs_offset);
 		info->ri_chunklen += rs_length;
 	}
 
@@ -706,9 +712,6 @@ static int svc_rdma_build_normal_read_chunk(struct svc_rqst *rqstp,
 	struct svc_rdma_op_ctxt *head = info->ri_readctxt;
 	int ret;
 
-	dprintk("svcrdma: Reading Read chunk at position %u\n",
-		info->ri_position);
-
 	info->ri_pageno = head->hdr_count;
 	info->ri_pageoff = 0;
 
@@ -716,6 +719,8 @@ static int svc_rdma_build_normal_read_chunk(struct svc_rqst *rqstp,
 	if (ret < 0)
 		goto out;
 
+	trace_svcrdma_encode_read(info->ri_chunklen, info->ri_position);
+
 	/* Split the Receive buffer between the head and tail
 	 * buffers at Read chunk's position. XDR roundup of the
 	 * chunk is not included in either the pagelist or in
@@ -764,8 +769,6 @@ static int svc_rdma_build_pz_read_chunk(struct svc_rqst *rqstp,
 	struct svc_rdma_op_ctxt *head = info->ri_readctxt;
 	int ret;
 
-	dprintk("svcrdma: Reading Position Zero Read chunk\n");
-
 	info->ri_pageno = head->hdr_count - 1;
 	info->ri_pageoff = offset_in_page(head->byte_len);
 
@@ -773,6 +776,8 @@ static int svc_rdma_build_pz_read_chunk(struct svc_rqst *rqstp,
 	if (ret < 0)
 		goto out;
 
+	trace_svcrdma_encode_pzr(info->ri_chunklen);
+
 	head->arg.len += info->ri_chunklen;
 	head->arg.buflen += info->ri_chunklen;
 
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index 79bd3a3..4c58083 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -99,14 +99,19 @@
  * where two different Write segments send portions of the same page.
  */
 
-#include <linux/sunrpc/debug.h>
-#include <linux/sunrpc/rpc_rdma.h>
 #include <linux/spinlock.h>
 #include <asm/unaligned.h>
+
 #include <rdma/ib_verbs.h>
 #include <rdma/rdma_cm.h>
+
+#include <linux/sunrpc/debug.h>
+#include <linux/sunrpc/rpc_rdma.h>
 #include <linux/sunrpc/svc_rdma.h>
 
+#include "xprt_rdma.h"
+#include <trace/events/rpcrdma.h>
+
 #define RPCDBG_FACILITY	RPCDBG_SVCXPRT
 
 static u32 xdr_padsize(u32 len)
@@ -524,12 +529,6 @@ static int svc_rdma_send_reply_msg(struct svcxprt_rdma *rdma,
 	u32 inv_rkey;
 	int ret;
 
-	dprintk("svcrdma: sending %s reply: head=%zu, pagelen=%u, tail=%zu\n",
-		(rp_ch ? "RDMA_NOMSG" : "RDMA_MSG"),
-		rqstp->rq_res.head[0].iov_len,
-		rqstp->rq_res.page_len,
-		rqstp->rq_res.tail[0].iov_len);
-
 	ctxt = svc_rdma_get_context(rdma);
 
 	ret = svc_rdma_map_reply_hdr(rdma, ctxt, rdma_resp,
@@ -580,6 +579,7 @@ static int svc_rdma_send_error_msg(struct svcxprt_rdma *rdma,
 	/* Replace the original transport header with an
 	 * RDMA_ERROR response. XID etc are preserved.
 	 */
+	trace_svcrdma_err_chunk(*rdma_resp);
 	p = rdma_resp + 3;
 	*p++ = rdma_error;
 	*p   = err_chunk;
@@ -635,9 +635,6 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
 	rdma_argp = page_address(rqstp->rq_pages[0]);
 	svc_rdma_get_write_arrays(rdma_argp, &wr_lst, &rp_ch);
 
-	dprintk("svcrdma: preparing response for XID 0x%08x\n",
-		be32_to_cpup(rdma_argp));
-
 	/* Create the RDMA response header. xprt->xpt_mutex,
 	 * acquired in svc_send(), serializes RPC replies. The
 	 * code path below that inserts the credit grant value
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 22e2595..d2cdffa 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -41,21 +41,25 @@
  * Author: Tom Tucker <tom@opengridcomputing.com>
  */
 
-#include <linux/sunrpc/svc_xprt.h>
-#include <linux/sunrpc/addr.h>
-#include <linux/sunrpc/debug.h>
-#include <linux/sunrpc/rpc_rdma.h>
 #include <linux/interrupt.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
 #include <linux/workqueue.h>
+#include <linux/export.h>
+
 #include <rdma/ib_verbs.h>
 #include <rdma/rdma_cm.h>
 #include <rdma/rw.h>
+
+#include <linux/sunrpc/addr.h>
+#include <linux/sunrpc/debug.h>
+#include <linux/sunrpc/rpc_rdma.h>
+#include <linux/sunrpc/svc_xprt.h>
 #include <linux/sunrpc/svc_rdma.h>
-#include <linux/export.h>
+
 #include "xprt_rdma.h"
+#include <trace/events/rpcrdma.h>
 
 #define RPCDBG_FACILITY	RPCDBG_SVCXPRT
 
@@ -862,10 +866,12 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	dprintk("    max_requests    : %d\n", newxprt->sc_max_requests);
 	dprintk("    ord             : %d\n", conn_param.initiator_depth);
 
+	trace_svcrdma_xprt_accept(&newxprt->sc_xprt);
 	return &newxprt->sc_xprt;
 
  errout:
 	dprintk("svcrdma: failure accepting new connection rc=%d.\n", ret);
+	trace_svcrdma_xprt_fail(&newxprt->sc_xprt);
 	/* Take a reference in case the DTO handler runs */
 	svc_xprt_get(&newxprt->sc_xprt);
 	if (newxprt->sc_qp && !IS_ERR(newxprt->sc_qp))
@@ -896,7 +902,6 @@ static void svc_rdma_detach(struct svc_xprt *xprt)
 {
 	struct svcxprt_rdma *rdma =
 		container_of(xprt, struct svcxprt_rdma, sc_xprt);
-	dprintk("svc: svc_rdma_detach(%p)\n", xprt);
 
 	/* Disconnect and flush posted WQE */
 	rdma_disconnect(rdma->sc_cm_id);
@@ -908,7 +913,7 @@ static void __svc_rdma_free(struct work_struct *work)
 		container_of(work, struct svcxprt_rdma, sc_work);
 	struct svc_xprt *xprt = &rdma->sc_xprt;
 
-	dprintk("svcrdma: %s(%p)\n", __func__, rdma);
+	trace_svcrdma_xprt_free(xprt);
 
 	if (rdma->sc_qp && !IS_ERR(rdma->sc_qp))
 		ib_drain_qp(rdma->sc_qp);



* [PATCH v1 05/19] svcrdma: Trace key RDMA API events
  2018-05-07 19:26 [PATCH v1 00/19] NFS/RDMA server for-next Chuck Lever
                   ` (3 preceding siblings ...)
  2018-05-07 19:27 ` [PATCH v1 04/19] svcrdma: Trace key RPC/RDMA protocol events Chuck Lever
@ 2018-05-07 19:27 ` Chuck Lever
  2018-05-07 19:27 ` [PATCH v1 06/19] svcrdma: Introduce svc_rdma_recv_ctxt Chuck Lever
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Chuck Lever @ 2018-05-07 19:27 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

This includes:
  * Posting on the Send and Receive queues
  * Send, Receive, Read, and Write completion
  * Connect upcalls
  * QP errors

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/trace/events/rpcrdma.h             |  322 ++++++++++++++++++++++++++++
 net/sunrpc/xprtrdma/backchannel.c          |    1 
 net/sunrpc/xprtrdma/fmr_ops.c              |    2 
 net/sunrpc/xprtrdma/frwr_ops.c             |    1 
 net/sunrpc/xprtrdma/rpc_rdma.c             |    2 
 net/sunrpc/xprtrdma/svc_rdma_backchannel.c |    3 
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c    |    2 
 net/sunrpc/xprtrdma/svc_rdma_rw.c          |   14 +
 net/sunrpc/xprtrdma/svc_rdma_sendto.c      |    6 -
 net/sunrpc/xprtrdma/svc_rdma_transport.c   |   70 ++----
 net/sunrpc/xprtrdma/transport.c            |    3 
 11 files changed, 374 insertions(+), 52 deletions(-)

diff --git a/include/trace/events/rpcrdma.h b/include/trace/events/rpcrdma.h
index 633520a..094a676 100644
--- a/include/trace/events/rpcrdma.h
+++ b/include/trace/events/rpcrdma.h
@@ -1145,6 +1145,328 @@
 DEFINE_ERROR_EVENT(vers);
 DEFINE_ERROR_EVENT(chunk);
 
+/**
+ ** Server-side RDMA API events
+ **/
+
+TRACE_EVENT(svcrdma_dma_map_page,
+	TP_PROTO(
+		const struct svcxprt_rdma *rdma,
+		const void *page
+	),
+
+	TP_ARGS(rdma, page),
+
+	TP_STRUCT__entry(
+		__field(const void *, page);
+		__string(device, rdma->sc_cm_id->device->name)
+		__string(addr, rdma->sc_xprt.xpt_remotebuf)
+	),
+
+	TP_fast_assign(
+		__entry->page = page;
+		__assign_str(device, rdma->sc_cm_id->device->name);
+		__assign_str(addr, rdma->sc_xprt.xpt_remotebuf);
+	),
+
+	TP_printk("addr=%s device=%s page=%p",
+		__get_str(addr), __get_str(device), __entry->page
+	)
+);
+
+TRACE_EVENT(svcrdma_dma_map_rwctx,
+	TP_PROTO(
+		const struct svcxprt_rdma *rdma,
+		int status
+	),
+
+	TP_ARGS(rdma, status),
+
+	TP_STRUCT__entry(
+		__field(int, status)
+		__string(device, rdma->sc_cm_id->device->name)
+		__string(addr, rdma->sc_xprt.xpt_remotebuf)
+	),
+
+	TP_fast_assign(
+		__entry->status = status;
+		__assign_str(device, rdma->sc_cm_id->device->name);
+		__assign_str(addr, rdma->sc_xprt.xpt_remotebuf);
+	),
+
+	TP_printk("addr=%s device=%s status=%d",
+		__get_str(addr), __get_str(device), __entry->status
+	)
+);
+
+TRACE_EVENT(svcrdma_send_failed,
+	TP_PROTO(
+		const struct svc_rqst *rqst,
+		int status
+	),
+
+	TP_ARGS(rqst, status),
+
+	TP_STRUCT__entry(
+		__field(int, status)
+		__field(u32, xid)
+		__field(const void *, xprt)
+		__string(addr, rqst->rq_xprt->xpt_remotebuf)
+	),
+
+	TP_fast_assign(
+		__entry->status = status;
+		__entry->xid = __be32_to_cpu(rqst->rq_xid);
+		__entry->xprt = rqst->rq_xprt;
+		__assign_str(addr, rqst->rq_xprt->xpt_remotebuf);
+	),
+
+	TP_printk("xprt=%p addr=%s xid=0x%08x status=%d",
+		__entry->xprt, __get_str(addr),
+		__entry->xid, __entry->status
+	)
+);
+
+DECLARE_EVENT_CLASS(svcrdma_sendcomp_event,
+	TP_PROTO(
+		const struct ib_wc *wc
+	),
+
+	TP_ARGS(wc),
+
+	TP_STRUCT__entry(
+		__field(const void *, cqe)
+		__field(unsigned int, status)
+		__field(unsigned int, vendor_err)
+	),
+
+	TP_fast_assign(
+		__entry->cqe = wc->wr_cqe;
+		__entry->status = wc->status;
+		if (wc->status)
+			__entry->vendor_err = wc->vendor_err;
+		else
+			__entry->vendor_err = 0;
+	),
+
+	TP_printk("cqe=%p status=%s (%u/0x%x)",
+		__entry->cqe, rdma_show_wc_status(__entry->status),
+		__entry->status, __entry->vendor_err
+	)
+);
+
+#define DEFINE_SENDCOMP_EVENT(name)					\
+		DEFINE_EVENT(svcrdma_sendcomp_event, svcrdma_wc_##name,	\
+				TP_PROTO(				\
+					const struct ib_wc *wc		\
+				),					\
+				TP_ARGS(wc))
+
+TRACE_EVENT(svcrdma_post_send,
+	TP_PROTO(
+		const struct ib_send_wr *wr,
+		int status
+	),
+
+	TP_ARGS(wr, status),
+
+	TP_STRUCT__entry(
+		__field(const void *, cqe)
+		__field(unsigned int, num_sge)
+		__field(u32, inv_rkey)
+		__field(int, status)
+	),
+
+	TP_fast_assign(
+		__entry->cqe = wr->wr_cqe;
+		__entry->num_sge = wr->num_sge;
+		__entry->inv_rkey = (wr->opcode == IB_WR_SEND_WITH_INV) ?
+					wr->ex.invalidate_rkey : 0;
+		__entry->status = status;
+	),
+
+	TP_printk("cqe=%p num_sge=%u inv_rkey=0x%08x status=%d",
+		__entry->cqe, __entry->num_sge,
+		__entry->inv_rkey, __entry->status
+	)
+);
+
+DEFINE_SENDCOMP_EVENT(send);
+
+TRACE_EVENT(svcrdma_post_recv,
+	TP_PROTO(
+		const struct ib_recv_wr *wr,
+		int status
+	),
+
+	TP_ARGS(wr, status),
+
+	TP_STRUCT__entry(
+		__field(const void *, cqe)
+		__field(int, status)
+	),
+
+	TP_fast_assign(
+		__entry->cqe = wr->wr_cqe;
+		__entry->status = status;
+	),
+
+	TP_printk("cqe=%p status=%d",
+		__entry->cqe, __entry->status
+	)
+);
+
+TRACE_EVENT(svcrdma_wc_receive,
+	TP_PROTO(
+		const struct ib_wc *wc
+	),
+
+	TP_ARGS(wc),
+
+	TP_STRUCT__entry(
+		__field(const void *, cqe)
+		__field(u32, byte_len)
+		__field(unsigned int, status)
+		__field(u32, vendor_err)
+	),
+
+	TP_fast_assign(
+		__entry->cqe = wc->wr_cqe;
+		__entry->status = wc->status;
+		if (wc->status) {
+			__entry->byte_len = 0;
+			__entry->vendor_err = wc->vendor_err;
+		} else {
+			__entry->byte_len = wc->byte_len;
+			__entry->vendor_err = 0;
+		}
+	),
+
+	TP_printk("cqe=%p byte_len=%u status=%s (%u/0x%x)",
+		__entry->cqe, __entry->byte_len,
+		rdma_show_wc_status(__entry->status),
+		__entry->status, __entry->vendor_err
+	)
+);
+
+TRACE_EVENT(svcrdma_post_rw,
+	TP_PROTO(
+		const void *cqe,
+		int sqecount,
+		int status
+	),
+
+	TP_ARGS(cqe, sqecount, status),
+
+	TP_STRUCT__entry(
+		__field(const void *, cqe)
+		__field(int, sqecount)
+		__field(int, status)
+	),
+
+	TP_fast_assign(
+		__entry->cqe = cqe;
+		__entry->sqecount = sqecount;
+		__entry->status = status;
+	),
+
+	TP_printk("cqe=%p sqecount=%d status=%d",
+		__entry->cqe, __entry->sqecount, __entry->status
+	)
+);
+
+DEFINE_SENDCOMP_EVENT(read);
+DEFINE_SENDCOMP_EVENT(write);
+
+TRACE_EVENT(svcrdma_cm_event,
+	TP_PROTO(
+		const struct rdma_cm_event *event,
+		const struct sockaddr *sap
+	),
+
+	TP_ARGS(event, sap),
+
+	TP_STRUCT__entry(
+		__field(unsigned int, event)
+		__field(int, status)
+		__array(__u8, addr, INET6_ADDRSTRLEN + 10)
+	),
+
+	TP_fast_assign(
+		__entry->event = event->event;
+		__entry->status = event->status;
+		snprintf(__entry->addr, sizeof(__entry->addr) - 1,
+			 "%pISpc", sap);
+	),
+
+	TP_printk("addr=%s event=%s (%u/%d)",
+		__entry->addr,
+		rdma_show_cm_event(__entry->event),
+		__entry->event, __entry->status
+	)
+);
+
+TRACE_EVENT(svcrdma_qp_error,
+	TP_PROTO(
+		const struct ib_event *event,
+		const struct sockaddr *sap
+	),
+
+	TP_ARGS(event, sap),
+
+	TP_STRUCT__entry(
+		__field(unsigned int, event)
+		__string(device, event->device->name)
+		__array(__u8, addr, INET6_ADDRSTRLEN + 10)
+	),
+
+	TP_fast_assign(
+		__entry->event = event->event;
+		__assign_str(device, event->device->name);
+		snprintf(__entry->addr, sizeof(__entry->addr) - 1,
+			 "%pISpc", sap);
+	),
+
+	TP_printk("addr=%s dev=%s event=%s (%u)",
+		__entry->addr, __get_str(device),
+		rdma_show_ib_event(__entry->event), __entry->event
+	)
+);
+
+DECLARE_EVENT_CLASS(svcrdma_sendqueue_event,
+	TP_PROTO(
+		const struct svcxprt_rdma *rdma
+	),
+
+	TP_ARGS(rdma),
+
+	TP_STRUCT__entry(
+		__field(int, avail)
+		__field(int, depth)
+		__string(addr, rdma->sc_xprt.xpt_remotebuf)
+	),
+
+	TP_fast_assign(
+		__entry->avail = atomic_read(&rdma->sc_sq_avail);
+		__entry->depth = rdma->sc_sq_depth;
+		__assign_str(addr, rdma->sc_xprt.xpt_remotebuf);
+	),
+
+	TP_printk("addr=%s sc_sq_avail=%d/%d",
+		__get_str(addr), __entry->avail, __entry->depth
+	)
+);
+
+#define DEFINE_SQ_EVENT(name)						\
+		DEFINE_EVENT(svcrdma_sendqueue_event, svcrdma_sq_##name,\
+				TP_PROTO(				\
+					const struct svcxprt_rdma *rdma \
+				),					\
+				TP_ARGS(rdma))
+
+DEFINE_SQ_EVENT(full);
+DEFINE_SQ_EVENT(retry);
+
 #endif /* _TRACE_RPCRDMA_H */
 
 #include <trace/define_trace.h>
diff --git a/net/sunrpc/xprtrdma/backchannel.c b/net/sunrpc/xprtrdma/backchannel.c
index 05c69ac..dbedc87 100644
--- a/net/sunrpc/xprtrdma/backchannel.c
+++ b/net/sunrpc/xprtrdma/backchannel.c
@@ -9,6 +9,7 @@
 #include <linux/sunrpc/xprt.h>
 #include <linux/sunrpc/svc.h>
 #include <linux/sunrpc/svc_xprt.h>
+#include <linux/sunrpc/svc_rdma.h>
 
 #include "xprt_rdma.h"
 #include <trace/events/rpcrdma.h>
diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index 08de7da..c74b415 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -20,6 +20,8 @@
  * verb (fmr_op_unmap).
  */
 
+#include <linux/sunrpc/svc_rdma.h>
+
 #include "xprt_rdma.h"
 #include <trace/events/rpcrdma.h>
 
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index f8312e3..5d6c01c 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -71,6 +71,7 @@
  */
 
 #include <linux/sunrpc/rpc_rdma.h>
+#include <linux/sunrpc/svc_rdma.h>
 
 #include "xprt_rdma.h"
 #include <trace/events/rpcrdma.h>
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index f358d1e..b942d7e 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -48,6 +48,8 @@
 
 #include <linux/highmem.h>
 
+#include <linux/sunrpc/svc_rdma.h>
+
 #include "xprt_rdma.h"
 #include <trace/events/rpcrdma.h>
 
diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
index a73632c..d501521 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
@@ -6,8 +6,11 @@
  */
 
 #include <linux/module.h>
+
 #include <linux/sunrpc/svc_rdma.h>
+
 #include "xprt_rdma.h"
+#include <trace/events/rpcrdma.h>
 
 #define RPCDBG_FACILITY	RPCDBG_SVCXPRT
 
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 78ca580..330d542 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -432,8 +432,6 @@ static void svc_rdma_send_error(struct svcxprt_rdma *xprt,
 
 	ret = svc_rdma_post_send_wr(xprt, ctxt, 1, 0);
 	if (ret) {
-		dprintk("svcrdma: Error %d posting send for protocol error\n",
-			ret);
 		svc_rdma_unmap_dma(ctxt);
 		svc_rdma_put_context(ctxt, 1);
 	}
diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
index 4b9cb54..887ceef 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
@@ -208,6 +208,8 @@ static void svc_rdma_write_done(struct ib_cq *cq, struct ib_wc *wc)
 	struct svc_rdma_write_info *info =
 			container_of(cc, struct svc_rdma_write_info, wi_cc);
 
+	trace_svcrdma_wc_write(wc);
+
 	atomic_add(cc->cc_sqecount, &rdma->sc_sq_avail);
 	wake_up(&rdma->sc_send_wait);
 
@@ -269,6 +271,8 @@ static void svc_rdma_wc_read_done(struct ib_cq *cq, struct ib_wc *wc)
 	struct svc_rdma_read_info *info =
 			container_of(cc, struct svc_rdma_read_info, ri_cc);
 
+	trace_svcrdma_wc_read(wc);
+
 	atomic_add(cc->cc_sqecount, &rdma->sc_sq_avail);
 	wake_up(&rdma->sc_send_wait);
 
@@ -326,18 +330,20 @@ static int svc_rdma_post_chunk_ctxt(struct svc_rdma_chunk_ctxt *cc)
 		if (atomic_sub_return(cc->cc_sqecount,
 				      &rdma->sc_sq_avail) > 0) {
 			ret = ib_post_send(rdma->sc_qp, first_wr, &bad_wr);
+			trace_svcrdma_post_rw(&cc->cc_cqe,
+					      cc->cc_sqecount, ret);
 			if (ret)
 				break;
 			return 0;
 		}
 
-		atomic_inc(&rdma_stat_sq_starve);
+		trace_svcrdma_sq_full(rdma);
 		atomic_add(cc->cc_sqecount, &rdma->sc_sq_avail);
 		wait_event(rdma->sc_send_wait,
 			   atomic_read(&rdma->sc_sq_avail) > cc->cc_sqecount);
+		trace_svcrdma_sq_retry(rdma);
 	} while (1);
 
-	pr_err("svcrdma: ib_post_send failed (%d)\n", ret);
 	set_bit(XPT_CLOSE, &xprt->xpt_flags);
 
 	/* If even one was posted, there will be a completion. */
@@ -466,7 +472,7 @@ static void svc_rdma_pagelist_to_sg(struct svc_rdma_write_info *info,
 
 out_initerr:
 	svc_rdma_put_rw_ctxt(rdma, ctxt);
-	pr_err("svcrdma: failed to map pagelist (%d)\n", ret);
+	trace_svcrdma_dma_map_rwctx(rdma, ret);
 	return -EIO;
 }
 
@@ -661,8 +667,8 @@ static int svc_rdma_build_read_segment(struct svc_rdma_read_info *info,
 	return -EINVAL;
 
 out_initerr:
+	trace_svcrdma_dma_map_rwctx(cc->cc_rdma, ret);
 	svc_rdma_put_rw_ctxt(cc->cc_rdma, ctxt);
-	pr_err("svcrdma: failed to map pagelist (%d)\n", ret);
 	return -EIO;
 }
 
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index 4c58083..fed28de 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -353,7 +353,7 @@ static int svc_rdma_dma_map_page(struct svcxprt_rdma *rdma,
 	return 0;
 
 out_maperr:
-	pr_err("svcrdma: failed to map page\n");
+	trace_svcrdma_dma_map_page(rdma, page);
 	return -EIO;
 }
 
@@ -597,7 +597,6 @@ static int svc_rdma_send_error_msg(struct svcxprt_rdma *rdma,
 	return 0;
 
 err:
-	pr_err("svcrdma: failed to post Send WR (%d)\n", ret);
 	svc_rdma_unmap_dma(ctxt);
 	svc_rdma_put_context(ctxt, 1);
 	return ret;
@@ -690,8 +689,7 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
  err1:
 	put_page(res_page);
  err0:
-	pr_err("svcrdma: Could not send reply, err=%d. Closing transport.\n",
-	       ret);
+	trace_svcrdma_send_failed(rqstp, ret);
 	set_bit(XPT_CLOSE, &xprt->xpt_flags);
 	return -ENOTCONN;
 }
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index d2cdffa..ca9001d 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -275,16 +275,15 @@ static void qp_event_handler(struct ib_event *event, void *context)
 {
 	struct svc_xprt *xprt = context;
 
+	trace_svcrdma_qp_error(event, (struct sockaddr *)&xprt->xpt_remote);
 	switch (event->event) {
 	/* These are considered benign events */
 	case IB_EVENT_PATH_MIG:
 	case IB_EVENT_COMM_EST:
 	case IB_EVENT_SQ_DRAINED:
 	case IB_EVENT_QP_LAST_WQE_REACHED:
-		dprintk("svcrdma: QP event %s (%d) received for QP=%p\n",
-			ib_event_msg(event->event), event->event,
-			event->element.qp);
 		break;
+
 	/* These are considered fatal events */
 	case IB_EVENT_PATH_MIG_ERR:
 	case IB_EVENT_QP_FATAL:
@@ -292,10 +291,6 @@ static void qp_event_handler(struct ib_event *event, void *context)
 	case IB_EVENT_QP_ACCESS_ERR:
 	case IB_EVENT_DEVICE_FATAL:
 	default:
-		dprintk("svcrdma: QP ERROR event %s (%d) received for QP=%p, "
-			"closing transport\n",
-			ib_event_msg(event->event), event->event,
-			event->element.qp);
 		set_bit(XPT_CLOSE, &xprt->xpt_flags);
 		svc_xprt_enqueue(xprt);
 		break;
@@ -314,6 +309,8 @@ static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
 	struct ib_cqe *cqe = wc->wr_cqe;
 	struct svc_rdma_op_ctxt *ctxt;
 
+	trace_svcrdma_wc_receive(wc);
+
 	/* WARNING: Only wc->wr_cqe and wc->status are reliable */
 	ctxt = container_of(cqe, struct svc_rdma_op_ctxt, cqe);
 	svc_rdma_unmap_dma(ctxt);
@@ -360,6 +357,8 @@ void svc_rdma_wc_send(struct ib_cq *cq, struct ib_wc *wc)
 	struct ib_cqe *cqe = wc->wr_cqe;
 	struct svc_rdma_op_ctxt *ctxt;
 
+	trace_svcrdma_wc_send(wc);
+
 	atomic_inc(&xprt->sc_sq_avail);
 	wake_up(&xprt->sc_send_wait);
 
@@ -455,6 +454,7 @@ static struct svcxprt_rdma *svc_rdma_create_xprt(struct svc_serv *serv,
 
 	svc_xprt_get(&xprt->sc_xprt);
 	ret = ib_post_recv(xprt->sc_qp, &recv_wr, &bad_recv_wr);
+	trace_svcrdma_post_recv(&recv_wr, ret);
 	if (ret) {
 		svc_rdma_unmap_dma(ctxt);
 		svc_rdma_put_context(ctxt, 1);
@@ -513,8 +513,6 @@ static void handle_connect_req(struct rdma_cm_id *new_cma_id,
 		return;
 	newxprt->sc_cm_id = new_cma_id;
 	new_cma_id->context = newxprt;
-	dprintk("svcrdma: Creating newxprt=%p, cm_id=%p, listenxprt=%p\n",
-		newxprt, newxprt->sc_cm_id, listen_xprt);
 	svc_rdma_parse_connect_private(newxprt, param);
 
 	/* Save client advertised inbound read limit for use later in accept. */
@@ -545,33 +543,21 @@ static void handle_connect_req(struct rdma_cm_id *new_cma_id,
 static int rdma_listen_handler(struct rdma_cm_id *cma_id,
 			       struct rdma_cm_event *event)
 {
-	struct svcxprt_rdma *xprt = cma_id->context;
+	struct sockaddr *sap = (struct sockaddr *)&cma_id->route.addr.src_addr;
+	struct svcxprt_rdma *rdma = cma_id->context;
 	int ret = 0;
 
+	trace_svcrdma_cm_event(event, sap);
+
 	switch (event->event) {
 	case RDMA_CM_EVENT_CONNECT_REQUEST:
 		dprintk("svcrdma: Connect request on cma_id=%p, xprt = %p, "
-			"event = %s (%d)\n", cma_id, cma_id->context,
+			"event = %s (%d)\n", cma_id, rdma,
 			rdma_event_msg(event->event), event->event);
 		handle_connect_req(cma_id, &event->param.conn);
 		break;
-
-	case RDMA_CM_EVENT_ESTABLISHED:
-		/* Accept complete */
-		dprintk("svcrdma: Connection completed on LISTEN xprt=%p, "
-			"cm_id=%p\n", xprt, cma_id);
-		break;
-
-	case RDMA_CM_EVENT_DEVICE_REMOVAL:
-		dprintk("svcrdma: Device removal xprt=%p, cm_id=%p\n",
-			xprt, cma_id);
-		if (xprt) {
-			set_bit(XPT_CLOSE, &xprt->sc_xprt.xpt_flags);
-			svc_xprt_enqueue(&xprt->sc_xprt);
-		}
-		break;
-
 	default:
+		/* NB: No device removal upcall for INADDR_ANY listeners */
 		dprintk("svcrdma: Unexpected event on listening endpoint %p, "
 			"event = %s (%d)\n", cma_id,
 			rdma_event_msg(event->event), event->event);
@@ -584,9 +570,12 @@ static int rdma_listen_handler(struct rdma_cm_id *cma_id,
 static int rdma_cma_handler(struct rdma_cm_id *cma_id,
 			    struct rdma_cm_event *event)
 {
-	struct svc_xprt *xprt = cma_id->context;
-	struct svcxprt_rdma *rdma =
-		container_of(xprt, struct svcxprt_rdma, sc_xprt);
+	struct sockaddr *sap = (struct sockaddr *)&cma_id->route.addr.dst_addr;
+	struct svcxprt_rdma *rdma = cma_id->context;
+	struct svc_xprt *xprt = &rdma->sc_xprt;
+
+	trace_svcrdma_cm_event(event, sap);
+
 	switch (event->event) {
 	case RDMA_CM_EVENT_ESTABLISHED:
 		/* Accept complete */
@@ -599,21 +588,17 @@ static int rdma_cma_handler(struct rdma_cm_id *cma_id,
 	case RDMA_CM_EVENT_DISCONNECTED:
 		dprintk("svcrdma: Disconnect on DTO xprt=%p, cm_id=%p\n",
 			xprt, cma_id);
-		if (xprt) {
-			set_bit(XPT_CLOSE, &xprt->xpt_flags);
-			svc_xprt_enqueue(xprt);
-			svc_xprt_put(xprt);
-		}
+		set_bit(XPT_CLOSE, &xprt->xpt_flags);
+		svc_xprt_enqueue(xprt);
+		svc_xprt_put(xprt);
 		break;
 	case RDMA_CM_EVENT_DEVICE_REMOVAL:
 		dprintk("svcrdma: Device removal cma_id=%p, xprt = %p, "
 			"event = %s (%d)\n", cma_id, xprt,
 			rdma_event_msg(event->event), event->event);
-		if (xprt) {
-			set_bit(XPT_CLOSE, &xprt->xpt_flags);
-			svc_xprt_enqueue(xprt);
-			svc_xprt_put(xprt);
-		}
+		set_bit(XPT_CLOSE, &xprt->xpt_flags);
+		svc_xprt_enqueue(xprt);
+		svc_xprt_put(xprt);
 		break;
 	default:
 		dprintk("svcrdma: Unexpected event on DTO endpoint %p, "
@@ -1022,13 +1007,13 @@ int svc_rdma_send(struct svcxprt_rdma *xprt, struct ib_send_wr *wr)
 	while (1) {
 		if ((atomic_sub_return(wr_count, &xprt->sc_sq_avail) < 0)) {
 			atomic_inc(&rdma_stat_sq_starve);
-
-			/* Wait until SQ WR available if SQ still full */
+			trace_svcrdma_sq_full(xprt);
 			atomic_add(wr_count, &xprt->sc_sq_avail);
 			wait_event(xprt->sc_send_wait,
 				   atomic_read(&xprt->sc_sq_avail) > wr_count);
 			if (test_bit(XPT_CLOSE, &xprt->sc_xprt.xpt_flags))
 				return -ENOTCONN;
+			trace_svcrdma_sq_retry(xprt);
 			continue;
 		}
 		/* Take a transport ref for each WR posted */
@@ -1037,6 +1022,7 @@ int svc_rdma_send(struct svcxprt_rdma *xprt, struct ib_send_wr *wr)
 
 		/* Bump used SQ WR count and post */
 		ret = ib_post_send(xprt->sc_qp, wr, &bad_wr);
+		trace_svcrdma_post_send(wr, ret);
 		if (ret) {
 			set_bit(XPT_CLOSE, &xprt->sc_xprt.xpt_flags);
 			for (i = 0; i < wr_count; i ++)
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 3d1b277..caca977 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -51,7 +51,10 @@
 #include <linux/module.h>
 #include <linux/slab.h>
 #include <linux/seq_file.h>
+#include <linux/smp.h>
+
 #include <linux/sunrpc/addr.h>
+#include <linux/sunrpc/svc_rdma.h>
 
 #include "xprt_rdma.h"
 #include <trace/events/rpcrdma.h>



* [PATCH v1 06/19] svcrdma: Introduce svc_rdma_recv_ctxt
  2018-05-07 19:26 [PATCH v1 00/19] NFS/RDMA server for-next Chuck Lever
                   ` (4 preceding siblings ...)
  2018-05-07 19:27 ` [PATCH v1 05/19] svcrdma: Trace key RDMA API events Chuck Lever
@ 2018-05-07 19:27 ` Chuck Lever
  2018-05-09 20:48   ` J. Bruce Fields
  2018-05-07 19:27 ` [PATCH v1 07/19] svcrdma: Remove sc_rq_depth Chuck Lever
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 29+ messages in thread
From: Chuck Lever @ 2018-05-07 19:27 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

svc_rdma_op_ctxts are pre-allocated and maintained on a per-xprt
free list. This eliminates the overhead of calling kmalloc / kfree,
both of which grab a globally shared lock that disables interrupts.
To reduce contention further, separate the use of these objects in
the Receive and Send paths in svcrdma.

Subsequent patches will take advantage of this separation by
allocating real resources which are then cached in these objects.
The allocations are freed when the transport is torn down.

I've renamed the structure so that static type checking can be used
to ensure that uses of op_ctxt and recv_ctxt are not confused. As an
additional clean-up, structure fields are renamed to conform with
kernel coding conventions.

As a final clean-up, helpers related to recv_ctxt are moved closer
to the functions that use them.
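
For illustration, the free-list pattern this separation enables
looks roughly like the sketch below. The function name is invented
for the example; only the sc_recv_lock, sc_recv_ctxts, and rc_list
fields come from this patch:

#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/sunrpc/svc_rdma.h>

/* Pop a pre-allocated recv_ctxt off the per-xprt free list instead
 * of calling kmalloc in the Receive path.
 */
static struct svc_rdma_recv_ctxt *
example_recv_ctxt_get(struct svcxprt_rdma *rdma)
{
	struct svc_rdma_recv_ctxt *ctxt;

	spin_lock(&rdma->sc_recv_lock);
	ctxt = list_first_entry_or_null(&rdma->sc_recv_ctxts,
					struct svc_rdma_recv_ctxt, rc_list);
	if (ctxt)
		list_del(&ctxt->rc_list);
	spin_unlock(&rdma->sc_recv_lock);

	/* NULL means the free list is empty; the caller would then
	 * allocate a fresh context (the slow path).
	 */
	return ctxt;
}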

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc_rdma.h          |   24 ++
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |  318 ++++++++++++++++++++++++++----
 net/sunrpc/xprtrdma/svc_rdma_rw.c        |   84 ++++----
 net/sunrpc/xprtrdma/svc_rdma_sendto.c    |    2 
 net/sunrpc/xprtrdma/svc_rdma_transport.c |  142 +------------
 5 files changed, 349 insertions(+), 221 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 88da0c9..37f759d 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -128,6 +128,9 @@ struct svcxprt_rdma {
 	unsigned long	     sc_flags;
 	struct list_head     sc_read_complete_q;
 	struct work_struct   sc_work;
+
+	spinlock_t	     sc_recv_lock;
+	struct list_head     sc_recv_ctxts;
 };
 /* sc_flags */
 #define RDMAXPRT_CONN_PENDING	3
@@ -142,6 +145,19 @@ struct svcxprt_rdma {
 
 #define RPCSVC_MAXPAYLOAD_RDMA	RPCSVC_MAXPAYLOAD
 
+struct svc_rdma_recv_ctxt {
+	struct list_head	rc_list;
+	struct ib_recv_wr	rc_recv_wr;
+	struct ib_cqe		rc_cqe;
+	struct xdr_buf		rc_arg;
+	u32			rc_byte_len;
+	unsigned int		rc_page_count;
+	unsigned int		rc_hdr_count;
+	struct ib_sge		rc_sges[1 +
+					RPCRDMA_MAX_INLINE_THRESH / PAGE_SIZE];
+	struct page		*rc_pages[RPCSVC_MAXPAGES];
+};
+
 /* Track DMA maps for this transport and context */
 static inline void svc_rdma_count_mappings(struct svcxprt_rdma *rdma,
 					   struct svc_rdma_op_ctxt *ctxt)
@@ -155,13 +171,19 @@ extern int svc_rdma_handle_bc_reply(struct rpc_xprt *xprt,
 				    struct xdr_buf *rcvbuf);
 
 /* svc_rdma_recvfrom.c */
+extern void svc_rdma_recv_ctxts_destroy(struct svcxprt_rdma *rdma);
+extern bool svc_rdma_post_recvs(struct svcxprt_rdma *rdma);
+extern void svc_rdma_recv_ctxt_put(struct svcxprt_rdma *rdma,
+				   struct svc_rdma_recv_ctxt *ctxt,
+				   int free_pages);
+extern void svc_rdma_flush_recv_queues(struct svcxprt_rdma *rdma);
 extern int svc_rdma_recvfrom(struct svc_rqst *);
 
 /* svc_rdma_rw.c */
 extern void svc_rdma_destroy_rw_ctxts(struct svcxprt_rdma *rdma);
 extern int svc_rdma_recv_read_chunk(struct svcxprt_rdma *rdma,
 				    struct svc_rqst *rqstp,
-				    struct svc_rdma_op_ctxt *head, __be32 *p);
+				    struct svc_rdma_recv_ctxt *head, __be32 *p);
 extern int svc_rdma_send_write_chunk(struct svcxprt_rdma *rdma,
 				     __be32 *wr_ch, struct xdr_buf *xdr);
 extern int svc_rdma_send_reply_chunk(struct svcxprt_rdma *rdma,
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 330d542..b7d9c55 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
 /*
- * Copyright (c) 2016, 2017 Oracle. All rights reserved.
+ * Copyright (c) 2016-2018 Oracle. All rights reserved.
  * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
  * Copyright (c) 2005-2006 Network Appliance, Inc. All rights reserved.
  *
@@ -61,7 +61,7 @@
  * svc_rdma_recvfrom must post RDMA Reads to pull the RPC Call's
  * data payload from the client. svc_rdma_recvfrom sets up the
  * RDMA Reads using pages in svc_rqst::rq_pages, which are
- * transferred to an svc_rdma_op_ctxt for the duration of the
+ * transferred to an svc_rdma_recv_ctxt for the duration of the
  * I/O. svc_rdma_recvfrom then returns zero, since the RPC message
  * is still not yet ready.
  *
@@ -70,18 +70,18 @@
  * svc_rdma_recvfrom again. This second call may use a different
  * svc_rqst than the first one, thus any information that needs
  * to be preserved across these two calls is kept in an
- * svc_rdma_op_ctxt.
+ * svc_rdma_recv_ctxt.
  *
  * The second call to svc_rdma_recvfrom performs final assembly
  * of the RPC Call message, using the RDMA Read sink pages kept in
- * the svc_rdma_op_ctxt. The xdr_buf is copied from the
- * svc_rdma_op_ctxt to the second svc_rqst. The second call returns
+ * the svc_rdma_recv_ctxt. The xdr_buf is copied from the
+ * svc_rdma_recv_ctxt to the second svc_rqst. The second call returns
  * the length of the completed RPC Call message.
  *
  * Page Management
  *
  * Pages under I/O must be transferred from the first svc_rqst to an
- * svc_rdma_op_ctxt before the first svc_rdma_recvfrom call returns.
+ * svc_rdma_recv_ctxt before the first svc_rdma_recvfrom call returns.
  *
  * The first svc_rqst supplies pages for RDMA Reads. These are moved
  * from rqstp::rq_pages into ctxt::pages. The consumed elements of
@@ -89,7 +89,7 @@
  * svc_rdma_recvfrom call returns.
  *
  * During the second svc_rdma_recvfrom call, RDMA Read sink pages
- * are transferred from the svc_rdma_op_ctxt to the second svc_rqst
+ * are transferred from the svc_rdma_recv_ctxt to the second svc_rqst
  * (see rdma_read_complete() below).
  */
 
@@ -108,13 +108,247 @@
 
 #define RPCDBG_FACILITY	RPCDBG_SVCXPRT
 
+static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc);
+
+static inline struct svc_rdma_recv_ctxt *
+svc_rdma_next_recv_ctxt(struct list_head *list)
+{
+	return list_first_entry_or_null(list, struct svc_rdma_recv_ctxt,
+					rc_list);
+}
+
+/**
+ * svc_rdma_recv_ctxts_destroy - Release all recv_ctxt's for an xprt
+ * @rdma: svcxprt_rdma being torn down
+ *
+ */
+void svc_rdma_recv_ctxts_destroy(struct svcxprt_rdma *rdma)
+{
+	struct svc_rdma_recv_ctxt *ctxt;
+
+	while ((ctxt = svc_rdma_next_recv_ctxt(&rdma->sc_recv_ctxts))) {
+		list_del(&ctxt->rc_list);
+		kfree(ctxt);
+	}
+}
+
+static struct svc_rdma_recv_ctxt *
+svc_rdma_recv_ctxt_get(struct svcxprt_rdma *rdma)
+{
+	struct svc_rdma_recv_ctxt *ctxt;
+
+	spin_lock(&rdma->sc_recv_lock);
+	ctxt = svc_rdma_next_recv_ctxt(&rdma->sc_recv_ctxts);
+	if (!ctxt)
+		goto out_empty;
+	list_del(&ctxt->rc_list);
+	spin_unlock(&rdma->sc_recv_lock);
+
+out:
+	ctxt->rc_recv_wr.num_sge = 0;
+	ctxt->rc_page_count = 0;
+	return ctxt;
+
+out_empty:
+	spin_unlock(&rdma->sc_recv_lock);
+
+	ctxt = kmalloc(sizeof(*ctxt), GFP_KERNEL);
+	if (!ctxt)
+		return NULL;
+	goto out;
+}
+
+static void svc_rdma_recv_ctxt_unmap(struct svcxprt_rdma *rdma,
+				     struct svc_rdma_recv_ctxt *ctxt)
+{
+	struct ib_device *device = rdma->sc_cm_id->device;
+	int i;
+
+	for (i = 0; i < ctxt->rc_recv_wr.num_sge; i++)
+		ib_dma_unmap_page(device,
+				  ctxt->rc_sges[i].addr,
+				  ctxt->rc_sges[i].length,
+				  DMA_FROM_DEVICE);
+}
+
+/**
+ * svc_rdma_recv_ctxt_put - Return recv_ctxt to free list
+ * @rdma: controlling svcxprt_rdma
+ * @ctxt: object to return to the free list
+ * @free_pages: Non-zero if rc_pages should be freed
+ *
+ */
+void svc_rdma_recv_ctxt_put(struct svcxprt_rdma *rdma,
+			    struct svc_rdma_recv_ctxt *ctxt,
+			    int free_pages)
+{
+	unsigned int i;
+
+	if (free_pages)
+		for (i = 0; i < ctxt->rc_page_count; i++)
+			put_page(ctxt->rc_pages[i]);
+	spin_lock(&rdma->sc_recv_lock);
+	list_add(&ctxt->rc_list, &rdma->sc_recv_ctxts);
+	spin_unlock(&rdma->sc_recv_lock);
+}
+
+static int svc_rdma_post_recv(struct svcxprt_rdma *rdma)
+{
+	struct ib_device *device = rdma->sc_cm_id->device;
+	struct svc_rdma_recv_ctxt *ctxt;
+	struct ib_recv_wr *bad_recv_wr;
+	int sge_no, buflen, ret;
+	struct page *page;
+	dma_addr_t pa;
+
+	ctxt = svc_rdma_recv_ctxt_get(rdma);
+	if (!ctxt)
+		return -ENOMEM;
+
+	buflen = 0;
+	ctxt->rc_cqe.done = svc_rdma_wc_receive;
+	for (sge_no = 0; buflen < rdma->sc_max_req_size; sge_no++) {
+		if (sge_no >= rdma->sc_max_sge) {
+			pr_err("svcrdma: Too many sges (%d)\n", sge_no);
+			goto err_put_ctxt;
+		}
+
+		page = alloc_page(GFP_KERNEL);
+		if (!page)
+			goto err_put_ctxt;
+		ctxt->rc_pages[sge_no] = page;
+		ctxt->rc_page_count++;
+
+		pa = ib_dma_map_page(device, ctxt->rc_pages[sge_no],
+				     0, PAGE_SIZE, DMA_FROM_DEVICE);
+		if (ib_dma_mapping_error(device, pa))
+			goto err_put_ctxt;
+		ctxt->rc_sges[sge_no].addr = pa;
+		ctxt->rc_sges[sge_no].length = PAGE_SIZE;
+		ctxt->rc_sges[sge_no].lkey = rdma->sc_pd->local_dma_lkey;
+		ctxt->rc_recv_wr.num_sge++;
+
+		buflen += PAGE_SIZE;
+	}
+	ctxt->rc_recv_wr.next = NULL;
+	ctxt->rc_recv_wr.sg_list = &ctxt->rc_sges[0];
+	ctxt->rc_recv_wr.wr_cqe = &ctxt->rc_cqe;
+
+	svc_xprt_get(&rdma->sc_xprt);
+	ret = ib_post_recv(rdma->sc_qp, &ctxt->rc_recv_wr, &bad_recv_wr);
+	trace_svcrdma_post_recv(&ctxt->rc_recv_wr, ret);
+	if (ret)
+		goto err_post;
+	return 0;
+
+err_put_ctxt:
+	svc_rdma_recv_ctxt_unmap(rdma, ctxt);
+	svc_rdma_recv_ctxt_put(rdma, ctxt, 1);
+	return -ENOMEM;
+err_post:
+	svc_rdma_recv_ctxt_unmap(rdma, ctxt);
+	svc_rdma_recv_ctxt_put(rdma, ctxt, 1);
+	svc_xprt_put(&rdma->sc_xprt);
+	return ret;
+}
+
+/**
+ * svc_rdma_post_recvs - Post initial set of Recv WRs
+ * @rdma: fresh svcxprt_rdma
+ *
+ * Returns true if successful, otherwise false.
+ */
+bool svc_rdma_post_recvs(struct svcxprt_rdma *rdma)
+{
+	unsigned int i;
+	int ret;
+
+	for (i = 0; i < rdma->sc_max_requests; i++) {
+		ret = svc_rdma_post_recv(rdma);
+		if (ret) {
+			pr_err("svcrdma: failure posting recv buffers: %d\n",
+			       ret);
+			return false;
+		}
+	}
+	return true;
+}
+
+/**
+ * svc_rdma_wc_receive - Invoked by RDMA provider for each polled Receive WC
+ * @cq: Completion Queue context
+ * @wc: Work Completion object
+ *
+ * NB: The svc_xprt/svcxprt_rdma is pinned whenever it's possible that
+ * the Receive completion handler could be running.
+ */
+static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
+{
+	struct svcxprt_rdma *rdma = cq->cq_context;
+	struct ib_cqe *cqe = wc->wr_cqe;
+	struct svc_rdma_recv_ctxt *ctxt;
+
+	trace_svcrdma_wc_receive(wc);
+
+	/* WARNING: Only wc->wr_cqe and wc->status are reliable */
+	ctxt = container_of(cqe, struct svc_rdma_recv_ctxt, rc_cqe);
+	svc_rdma_recv_ctxt_unmap(rdma, ctxt);
+
+	if (wc->status != IB_WC_SUCCESS)
+		goto flushed;
+
+	if (svc_rdma_post_recv(rdma))
+		goto post_err;
+
+	/* All wc fields are now known to be valid */
+	ctxt->rc_byte_len = wc->byte_len;
+	spin_lock(&rdma->sc_rq_dto_lock);
+	list_add_tail(&ctxt->rc_list, &rdma->sc_rq_dto_q);
+	spin_unlock(&rdma->sc_rq_dto_lock);
+	set_bit(XPT_DATA, &rdma->sc_xprt.xpt_flags);
+	if (!test_bit(RDMAXPRT_CONN_PENDING, &rdma->sc_flags))
+		svc_xprt_enqueue(&rdma->sc_xprt);
+	goto out;
+
+flushed:
+	if (wc->status != IB_WC_WR_FLUSH_ERR)
+		pr_err("svcrdma: Recv: %s (%u/0x%x)\n",
+		       ib_wc_status_msg(wc->status),
+		       wc->status, wc->vendor_err);
+post_err:
+	svc_rdma_recv_ctxt_put(rdma, ctxt, 1);
+	set_bit(XPT_CLOSE, &rdma->sc_xprt.xpt_flags);
+	svc_xprt_enqueue(&rdma->sc_xprt);
+out:
+	svc_xprt_put(&rdma->sc_xprt);
+}
+
+/**
+ * svc_rdma_flush_recv_queues - Drain pending Receive work
+ * @rdma: svcxprt_rdma being shut down
+ *
+ */
+void svc_rdma_flush_recv_queues(struct svcxprt_rdma *rdma)
+{
+	struct svc_rdma_recv_ctxt *ctxt;
+
+	while ((ctxt = svc_rdma_next_recv_ctxt(&rdma->sc_read_complete_q))) {
+		list_del(&ctxt->rc_list);
+		svc_rdma_recv_ctxt_put(rdma, ctxt, 1);
+	}
+	while ((ctxt = svc_rdma_next_recv_ctxt(&rdma->sc_rq_dto_q))) {
+		list_del(&ctxt->rc_list);
+		svc_rdma_recv_ctxt_put(rdma, ctxt, 1);
+	}
+}
+
 /*
  * Replace the pages in the rq_argpages array with the pages from the SGE in
  * the RDMA_RECV completion. The SGL should contain full pages up until the
  * last one.
  */
 static void svc_rdma_build_arg_xdr(struct svc_rqst *rqstp,
-				   struct svc_rdma_op_ctxt *ctxt)
+				   struct svc_rdma_recv_ctxt *ctxt)
 {
 	struct page *page;
 	int sge_no;
@@ -123,30 +357,30 @@ static void svc_rdma_build_arg_xdr(struct svc_rqst *rqstp,
 	/* The reply path assumes the Call's transport header resides
 	 * in rqstp->rq_pages[0].
 	 */
-	page = ctxt->pages[0];
+	page = ctxt->rc_pages[0];
 	put_page(rqstp->rq_pages[0]);
 	rqstp->rq_pages[0] = page;
 
 	/* Set up the XDR head */
 	rqstp->rq_arg.head[0].iov_base = page_address(page);
 	rqstp->rq_arg.head[0].iov_len =
-		min_t(size_t, ctxt->byte_len, ctxt->sge[0].length);
-	rqstp->rq_arg.len = ctxt->byte_len;
-	rqstp->rq_arg.buflen = ctxt->byte_len;
+		min_t(size_t, ctxt->rc_byte_len, ctxt->rc_sges[0].length);
+	rqstp->rq_arg.len = ctxt->rc_byte_len;
+	rqstp->rq_arg.buflen = ctxt->rc_byte_len;
 
 	/* Compute bytes past head in the SGL */
-	len = ctxt->byte_len - rqstp->rq_arg.head[0].iov_len;
+	len = ctxt->rc_byte_len - rqstp->rq_arg.head[0].iov_len;
 
 	/* If data remains, store it in the pagelist */
 	rqstp->rq_arg.page_len = len;
 	rqstp->rq_arg.page_base = 0;
 
 	sge_no = 1;
-	while (len && sge_no < ctxt->count) {
-		page = ctxt->pages[sge_no];
+	while (len && sge_no < ctxt->rc_recv_wr.num_sge) {
+		page = ctxt->rc_pages[sge_no];
 		put_page(rqstp->rq_pages[sge_no]);
 		rqstp->rq_pages[sge_no] = page;
-		len -= min_t(u32, len, ctxt->sge[sge_no].length);
+		len -= min_t(u32, len, ctxt->rc_sges[sge_no].length);
 		sge_no++;
 	}
 	rqstp->rq_respages = &rqstp->rq_pages[sge_no];
@@ -154,11 +388,11 @@ static void svc_rdma_build_arg_xdr(struct svc_rqst *rqstp,
 
 	/* If not all pages were used from the SGL, free the remaining ones */
 	len = sge_no;
-	while (sge_no < ctxt->count) {
-		page = ctxt->pages[sge_no++];
+	while (sge_no < ctxt->rc_recv_wr.num_sge) {
+		page = ctxt->rc_pages[sge_no++];
 		put_page(page);
 	}
-	ctxt->count = len;
+	ctxt->rc_page_count = len;
 
 	/* Set up tail */
 	rqstp->rq_arg.tail[0].iov_base = NULL;
@@ -364,29 +598,29 @@ static int svc_rdma_xdr_decode_req(struct xdr_buf *rq_arg)
 }
 
 static void rdma_read_complete(struct svc_rqst *rqstp,
-			       struct svc_rdma_op_ctxt *head)
+			       struct svc_rdma_recv_ctxt *head)
 {
 	int page_no;
 
 	/* Copy RPC pages */
-	for (page_no = 0; page_no < head->count; page_no++) {
+	for (page_no = 0; page_no < head->rc_page_count; page_no++) {
 		put_page(rqstp->rq_pages[page_no]);
-		rqstp->rq_pages[page_no] = head->pages[page_no];
+		rqstp->rq_pages[page_no] = head->rc_pages[page_no];
 	}
 
 	/* Point rq_arg.pages past header */
-	rqstp->rq_arg.pages = &rqstp->rq_pages[head->hdr_count];
-	rqstp->rq_arg.page_len = head->arg.page_len;
+	rqstp->rq_arg.pages = &rqstp->rq_pages[head->rc_hdr_count];
+	rqstp->rq_arg.page_len = head->rc_arg.page_len;
 
 	/* rq_respages starts after the last arg page */
 	rqstp->rq_respages = &rqstp->rq_pages[page_no];
 	rqstp->rq_next_page = rqstp->rq_respages + 1;
 
 	/* Rebuild rq_arg head and tail. */
-	rqstp->rq_arg.head[0] = head->arg.head[0];
-	rqstp->rq_arg.tail[0] = head->arg.tail[0];
-	rqstp->rq_arg.len = head->arg.len;
-	rqstp->rq_arg.buflen = head->arg.buflen;
+	rqstp->rq_arg.head[0] = head->rc_arg.head[0];
+	rqstp->rq_arg.tail[0] = head->rc_arg.tail[0];
+	rqstp->rq_arg.len = head->rc_arg.len;
+	rqstp->rq_arg.buflen = head->rc_arg.buflen;
 }
 
 static void svc_rdma_send_error(struct svcxprt_rdma *xprt,
@@ -506,28 +740,26 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
 	struct svc_xprt *xprt = rqstp->rq_xprt;
 	struct svcxprt_rdma *rdma_xprt =
 		container_of(xprt, struct svcxprt_rdma, sc_xprt);
-	struct svc_rdma_op_ctxt *ctxt;
+	struct svc_rdma_recv_ctxt *ctxt;
 	__be32 *p;
 	int ret;
 
 	spin_lock(&rdma_xprt->sc_rq_dto_lock);
-	if (!list_empty(&rdma_xprt->sc_read_complete_q)) {
-		ctxt = list_first_entry(&rdma_xprt->sc_read_complete_q,
-					struct svc_rdma_op_ctxt, list);
-		list_del(&ctxt->list);
+	ctxt = svc_rdma_next_recv_ctxt(&rdma_xprt->sc_read_complete_q);
+	if (ctxt) {
+		list_del(&ctxt->rc_list);
 		spin_unlock(&rdma_xprt->sc_rq_dto_lock);
 		rdma_read_complete(rqstp, ctxt);
 		goto complete;
-	} else if (!list_empty(&rdma_xprt->sc_rq_dto_q)) {
-		ctxt = list_first_entry(&rdma_xprt->sc_rq_dto_q,
-					struct svc_rdma_op_ctxt, list);
-		list_del(&ctxt->list);
-	} else {
+	}
+	ctxt = svc_rdma_next_recv_ctxt(&rdma_xprt->sc_rq_dto_q);
+	if (!ctxt) {
 		/* No new incoming requests, terminate the loop */
 		clear_bit(XPT_DATA, &xprt->xpt_flags);
 		spin_unlock(&rdma_xprt->sc_rq_dto_lock);
 		return 0;
 	}
+	list_del(&ctxt->rc_list);
 	spin_unlock(&rdma_xprt->sc_rq_dto_lock);
 
 	atomic_inc(&rdma_stat_recv);
@@ -545,7 +777,7 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
 	if (svc_rdma_is_backchannel_reply(xprt, p)) {
 		ret = svc_rdma_handle_bc_reply(xprt->xpt_bc_xprt, p,
 					       &rqstp->rq_arg);
-		svc_rdma_put_context(ctxt, 0);
+		svc_rdma_recv_ctxt_put(rdma_xprt, ctxt, 0);
 		return ret;
 	}
 
@@ -554,7 +786,7 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
 		goto out_readchunk;
 
 complete:
-	svc_rdma_put_context(ctxt, 0);
+	svc_rdma_recv_ctxt_put(rdma_xprt, ctxt, 0);
 	rqstp->rq_prot = IPPROTO_MAX;
 	svc_xprt_copy_addrs(rqstp, xprt);
 	return rqstp->rq_arg.len;
@@ -567,16 +799,16 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
 
 out_err:
 	svc_rdma_send_error(rdma_xprt, p, ret);
-	svc_rdma_put_context(ctxt, 0);
+	svc_rdma_recv_ctxt_put(rdma_xprt, ctxt, 0);
 	return 0;
 
 out_postfail:
 	if (ret == -EINVAL)
 		svc_rdma_send_error(rdma_xprt, p, ret);
-	svc_rdma_put_context(ctxt, 1);
+	svc_rdma_recv_ctxt_put(rdma_xprt, ctxt, 1);
 	return ret;
 
 out_drop:
-	svc_rdma_put_context(ctxt, 1);
+	svc_rdma_recv_ctxt_put(rdma_xprt, ctxt, 1);
 	return 0;
 }
diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
index 887ceef..c080ce2 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
- * Copyright (c) 2016 Oracle.  All rights reserved.
+ * Copyright (c) 2016-2018 Oracle.  All rights reserved.
  *
  * Use the core R/W API to move RPC-over-RDMA Read and Write chunks.
  */
@@ -227,7 +227,7 @@ static void svc_rdma_write_done(struct ib_cq *cq, struct ib_wc *wc)
 /* State for pulling a Read chunk.
  */
 struct svc_rdma_read_info {
-	struct svc_rdma_op_ctxt		*ri_readctxt;
+	struct svc_rdma_recv_ctxt	*ri_readctxt;
 	unsigned int			ri_position;
 	unsigned int			ri_pageno;
 	unsigned int			ri_pageoff;
@@ -282,10 +282,10 @@ static void svc_rdma_wc_read_done(struct ib_cq *cq, struct ib_wc *wc)
 			pr_err("svcrdma: read ctx: %s (%u/0x%x)\n",
 			       ib_wc_status_msg(wc->status),
 			       wc->status, wc->vendor_err);
-		svc_rdma_put_context(info->ri_readctxt, 1);
+		svc_rdma_recv_ctxt_put(rdma, info->ri_readctxt, 1);
 	} else {
 		spin_lock(&rdma->sc_rq_dto_lock);
-		list_add_tail(&info->ri_readctxt->list,
+		list_add_tail(&info->ri_readctxt->rc_list,
 			      &rdma->sc_read_complete_q);
 		spin_unlock(&rdma->sc_rq_dto_lock);
 
@@ -607,7 +607,7 @@ static int svc_rdma_build_read_segment(struct svc_rdma_read_info *info,
 				       struct svc_rqst *rqstp,
 				       u32 rkey, u32 len, u64 offset)
 {
-	struct svc_rdma_op_ctxt *head = info->ri_readctxt;
+	struct svc_rdma_recv_ctxt *head = info->ri_readctxt;
 	struct svc_rdma_chunk_ctxt *cc = &info->ri_cc;
 	struct svc_rdma_rw_ctxt *ctxt;
 	unsigned int sge_no, seg_len;
@@ -625,10 +625,10 @@ static int svc_rdma_build_read_segment(struct svc_rdma_read_info *info,
 		seg_len = min_t(unsigned int, len,
 				PAGE_SIZE - info->ri_pageoff);
 
-		head->arg.pages[info->ri_pageno] =
+		head->rc_arg.pages[info->ri_pageno] =
 			rqstp->rq_pages[info->ri_pageno];
 		if (!info->ri_pageoff)
-			head->count++;
+			head->rc_page_count++;
 
 		sg_set_page(sg, rqstp->rq_pages[info->ri_pageno],
 			    seg_len, info->ri_pageoff);
@@ -705,9 +705,9 @@ static int svc_rdma_build_read_chunk(struct svc_rqst *rqstp,
 }
 
 /* Construct RDMA Reads to pull over a normal Read chunk. The chunk
- * data lands in the page list of head->arg.pages.
+ * data lands in the page list of head->rc_arg.pages.
  *
- * Currently NFSD does not look at the head->arg.tail[0] iovec.
+ * Currently NFSD does not look at the head->rc_arg.tail[0] iovec.
  * Therefore, XDR round-up of the Read chunk and trailing
  * inline content must both be added at the end of the pagelist.
  */
@@ -715,10 +715,10 @@ static int svc_rdma_build_normal_read_chunk(struct svc_rqst *rqstp,
 					    struct svc_rdma_read_info *info,
 					    __be32 *p)
 {
-	struct svc_rdma_op_ctxt *head = info->ri_readctxt;
+	struct svc_rdma_recv_ctxt *head = info->ri_readctxt;
 	int ret;
 
-	info->ri_pageno = head->hdr_count;
+	info->ri_pageno = head->rc_hdr_count;
 	info->ri_pageoff = 0;
 
 	ret = svc_rdma_build_read_chunk(rqstp, info, p);
@@ -732,11 +732,11 @@ static int svc_rdma_build_normal_read_chunk(struct svc_rqst *rqstp,
 	 * chunk is not included in either the pagelist or in
 	 * the tail.
 	 */
-	head->arg.tail[0].iov_base =
-		head->arg.head[0].iov_base + info->ri_position;
-	head->arg.tail[0].iov_len =
-		head->arg.head[0].iov_len - info->ri_position;
-	head->arg.head[0].iov_len = info->ri_position;
+	head->rc_arg.tail[0].iov_base =
+		head->rc_arg.head[0].iov_base + info->ri_position;
+	head->rc_arg.tail[0].iov_len =
+		head->rc_arg.head[0].iov_len - info->ri_position;
+	head->rc_arg.head[0].iov_len = info->ri_position;
 
 	/* Read chunk may need XDR roundup (see RFC 8166, s. 3.4.5.2).
 	 *
@@ -749,9 +749,9 @@ static int svc_rdma_build_normal_read_chunk(struct svc_rqst *rqstp,
 	 */
 	info->ri_chunklen = XDR_QUADLEN(info->ri_chunklen) << 2;
 
-	head->arg.page_len = info->ri_chunklen;
-	head->arg.len += info->ri_chunklen;
-	head->arg.buflen += info->ri_chunklen;
+	head->rc_arg.page_len = info->ri_chunklen;
+	head->rc_arg.len += info->ri_chunklen;
+	head->rc_arg.buflen += info->ri_chunklen;
 
 out:
 	return ret;
@@ -760,7 +760,7 @@ static int svc_rdma_build_normal_read_chunk(struct svc_rqst *rqstp,
 /* Construct RDMA Reads to pull over a Position Zero Read chunk.
  * The start of the data lands in the first page just after
  * the Transport header, and the rest lands in the page list of
- * head->arg.pages.
+ * head->rc_arg.pages.
  *
  * Assumptions:
  *	- A PZRC has an XDR-aligned length (no implicit round-up).
@@ -772,11 +772,11 @@ static int svc_rdma_build_pz_read_chunk(struct svc_rqst *rqstp,
 					struct svc_rdma_read_info *info,
 					__be32 *p)
 {
-	struct svc_rdma_op_ctxt *head = info->ri_readctxt;
+	struct svc_rdma_recv_ctxt *head = info->ri_readctxt;
 	int ret;
 
-	info->ri_pageno = head->hdr_count - 1;
-	info->ri_pageoff = offset_in_page(head->byte_len);
+	info->ri_pageno = head->rc_hdr_count - 1;
+	info->ri_pageoff = offset_in_page(head->rc_byte_len);
 
 	ret = svc_rdma_build_read_chunk(rqstp, info, p);
 	if (ret < 0)
@@ -784,22 +784,22 @@ static int svc_rdma_build_pz_read_chunk(struct svc_rqst *rqstp,
 
 	trace_svcrdma_encode_pzr(info->ri_chunklen);
 
-	head->arg.len += info->ri_chunklen;
-	head->arg.buflen += info->ri_chunklen;
+	head->rc_arg.len += info->ri_chunklen;
+	head->rc_arg.buflen += info->ri_chunklen;
 
-	if (head->arg.buflen <= head->sge[0].length) {
+	if (head->rc_arg.buflen <= head->rc_sges[0].length) {
 		/* Transport header and RPC message fit entirely
 		 * in page where head iovec resides.
 		 */
-		head->arg.head[0].iov_len = info->ri_chunklen;
+		head->rc_arg.head[0].iov_len = info->ri_chunklen;
 	} else {
 		/* Transport header and part of RPC message reside
 		 * in the head iovec's page.
 		 */
-		head->arg.head[0].iov_len =
-				head->sge[0].length - head->byte_len;
-		head->arg.page_len =
-				info->ri_chunklen - head->arg.head[0].iov_len;
+		head->rc_arg.head[0].iov_len =
+			head->rc_sges[0].length - head->rc_byte_len;
+		head->rc_arg.page_len =
+			info->ri_chunklen - head->rc_arg.head[0].iov_len;
 	}
 
 out:
@@ -824,24 +824,24 @@ static int svc_rdma_build_pz_read_chunk(struct svc_rqst *rqstp,
  * - All Read segments in @p have the same Position value.
  */
 int svc_rdma_recv_read_chunk(struct svcxprt_rdma *rdma, struct svc_rqst *rqstp,
-			     struct svc_rdma_op_ctxt *head, __be32 *p)
+			     struct svc_rdma_recv_ctxt *head, __be32 *p)
 {
 	struct svc_rdma_read_info *info;
 	struct page **page;
 	int ret;
 
 	/* The request (with page list) is constructed in
-	 * head->arg. Pages involved with RDMA Read I/O are
+	 * head->rc_arg. Pages involved with RDMA Read I/O are
 	 * transferred there.
 	 */
-	head->hdr_count = head->count;
-	head->arg.head[0] = rqstp->rq_arg.head[0];
-	head->arg.tail[0] = rqstp->rq_arg.tail[0];
-	head->arg.pages = head->pages;
-	head->arg.page_base = 0;
-	head->arg.page_len = 0;
-	head->arg.len = rqstp->rq_arg.len;
-	head->arg.buflen = rqstp->rq_arg.buflen;
+	head->rc_hdr_count = head->rc_page_count;
+	head->rc_arg.head[0] = rqstp->rq_arg.head[0];
+	head->rc_arg.tail[0] = rqstp->rq_arg.tail[0];
+	head->rc_arg.pages = head->rc_pages;
+	head->rc_arg.page_base = 0;
+	head->rc_arg.page_len = 0;
+	head->rc_arg.len = rqstp->rq_arg.len;
+	head->rc_arg.buflen = rqstp->rq_arg.buflen;
 
 	info = svc_rdma_read_info_alloc(rdma);
 	if (!info)
@@ -867,7 +867,7 @@ int svc_rdma_recv_read_chunk(struct svcxprt_rdma *rdma, struct svc_rqst *rqstp,
 
 out:
 	/* Read sink pages have been moved from rqstp->rq_pages to
-	 * head->arg.pages. Force svc_recv to refill those slots
+	 * head->rc_arg.pages. Force svc_recv to refill those slots
 	 * in rq_pages.
 	 */
 	for (page = rqstp->rq_pages; page < rqstp->rq_respages; page++)
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index fed28de..a397d9a 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
 /*
- * Copyright (c) 2016 Oracle. All rights reserved.
+ * Copyright (c) 2016-2018 Oracle. All rights reserved.
  * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
  * Copyright (c) 2005-2006 Network Appliance, Inc. All rights reserved.
  *
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index ca9001d..afd5e61 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -63,7 +63,6 @@
 
 #define RPCDBG_FACILITY	RPCDBG_SVCXPRT
 
-static int svc_rdma_post_recv(struct svcxprt_rdma *xprt);
 static struct svcxprt_rdma *svc_rdma_create_xprt(struct svc_serv *serv,
 						 struct net *net);
 static struct svc_xprt *svc_rdma_create(struct svc_serv *serv,
@@ -175,11 +174,7 @@ static bool svc_rdma_prealloc_ctxts(struct svcxprt_rdma *xprt)
 {
 	unsigned int i;
 
-	/* Each RPC/RDMA credit can consume one Receive and
-	 * one Send WQE at the same time.
-	 */
-	i = xprt->sc_sq_depth + xprt->sc_rq_depth;
-
+	i = xprt->sc_sq_depth;
 	while (i--) {
 		struct svc_rdma_op_ctxt *ctxt;
 
@@ -298,54 +293,6 @@ static void qp_event_handler(struct ib_event *event, void *context)
 }
 
 /**
- * svc_rdma_wc_receive - Invoked by RDMA provider for each polled Receive WC
- * @cq:        completion queue
- * @wc:        completed WR
- *
- */
-static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
-{
-	struct svcxprt_rdma *xprt = cq->cq_context;
-	struct ib_cqe *cqe = wc->wr_cqe;
-	struct svc_rdma_op_ctxt *ctxt;
-
-	trace_svcrdma_wc_receive(wc);
-
-	/* WARNING: Only wc->wr_cqe and wc->status are reliable */
-	ctxt = container_of(cqe, struct svc_rdma_op_ctxt, cqe);
-	svc_rdma_unmap_dma(ctxt);
-
-	if (wc->status != IB_WC_SUCCESS)
-		goto flushed;
-
-	/* All wc fields are now known to be valid */
-	ctxt->byte_len = wc->byte_len;
-	spin_lock(&xprt->sc_rq_dto_lock);
-	list_add_tail(&ctxt->list, &xprt->sc_rq_dto_q);
-	spin_unlock(&xprt->sc_rq_dto_lock);
-
-	svc_rdma_post_recv(xprt);
-
-	set_bit(XPT_DATA, &xprt->sc_xprt.xpt_flags);
-	if (test_bit(RDMAXPRT_CONN_PENDING, &xprt->sc_flags))
-		goto out;
-	goto out_enqueue;
-
-flushed:
-	if (wc->status != IB_WC_WR_FLUSH_ERR)
-		pr_err("svcrdma: Recv: %s (%u/0x%x)\n",
-		       ib_wc_status_msg(wc->status),
-		       wc->status, wc->vendor_err);
-	set_bit(XPT_CLOSE, &xprt->sc_xprt.xpt_flags);
-	svc_rdma_put_context(ctxt, 1);
-
-out_enqueue:
-	svc_xprt_enqueue(&xprt->sc_xprt);
-out:
-	svc_xprt_put(&xprt->sc_xprt);
-}
-
-/**
  * svc_rdma_wc_send - Invoked by RDMA provider for each polled Send WC
  * @cq:        completion queue
  * @wc:        completed WR
@@ -392,12 +339,14 @@ static struct svcxprt_rdma *svc_rdma_create_xprt(struct svc_serv *serv,
 	INIT_LIST_HEAD(&cma_xprt->sc_rq_dto_q);
 	INIT_LIST_HEAD(&cma_xprt->sc_read_complete_q);
 	INIT_LIST_HEAD(&cma_xprt->sc_ctxts);
+	INIT_LIST_HEAD(&cma_xprt->sc_recv_ctxts);
 	INIT_LIST_HEAD(&cma_xprt->sc_rw_ctxts);
 	init_waitqueue_head(&cma_xprt->sc_send_wait);
 
 	spin_lock_init(&cma_xprt->sc_lock);
 	spin_lock_init(&cma_xprt->sc_rq_dto_lock);
 	spin_lock_init(&cma_xprt->sc_ctxt_lock);
+	spin_lock_init(&cma_xprt->sc_recv_lock);
 	spin_lock_init(&cma_xprt->sc_rw_ctxt_lock);
 
 	/*
@@ -411,63 +360,6 @@ static struct svcxprt_rdma *svc_rdma_create_xprt(struct svc_serv *serv,
 	return cma_xprt;
 }
 
-static int
-svc_rdma_post_recv(struct svcxprt_rdma *xprt)
-{
-	struct ib_recv_wr recv_wr, *bad_recv_wr;
-	struct svc_rdma_op_ctxt *ctxt;
-	struct page *page;
-	dma_addr_t pa;
-	int sge_no;
-	int buflen;
-	int ret;
-
-	ctxt = svc_rdma_get_context(xprt);
-	buflen = 0;
-	ctxt->direction = DMA_FROM_DEVICE;
-	ctxt->cqe.done = svc_rdma_wc_receive;
-	for (sge_no = 0; buflen < xprt->sc_max_req_size; sge_no++) {
-		if (sge_no >= xprt->sc_max_sge) {
-			pr_err("svcrdma: Too many sges (%d)\n", sge_no);
-			goto err_put_ctxt;
-		}
-		page = alloc_page(GFP_KERNEL);
-		if (!page)
-			goto err_put_ctxt;
-		ctxt->pages[sge_no] = page;
-		pa = ib_dma_map_page(xprt->sc_cm_id->device,
-				     page, 0, PAGE_SIZE,
-				     DMA_FROM_DEVICE);
-		if (ib_dma_mapping_error(xprt->sc_cm_id->device, pa))
-			goto err_put_ctxt;
-		svc_rdma_count_mappings(xprt, ctxt);
-		ctxt->sge[sge_no].addr = pa;
-		ctxt->sge[sge_no].length = PAGE_SIZE;
-		ctxt->sge[sge_no].lkey = xprt->sc_pd->local_dma_lkey;
-		ctxt->count = sge_no + 1;
-		buflen += PAGE_SIZE;
-	}
-	recv_wr.next = NULL;
-	recv_wr.sg_list = &ctxt->sge[0];
-	recv_wr.num_sge = ctxt->count;
-	recv_wr.wr_cqe = &ctxt->cqe;
-
-	svc_xprt_get(&xprt->sc_xprt);
-	ret = ib_post_recv(xprt->sc_qp, &recv_wr, &bad_recv_wr);
-	trace_svcrdma_post_recv(&recv_wr, ret);
-	if (ret) {
-		svc_rdma_unmap_dma(ctxt);
-		svc_rdma_put_context(ctxt, 1);
-		svc_xprt_put(&xprt->sc_xprt);
-	}
-	return ret;
-
- err_put_ctxt:
-	svc_rdma_unmap_dma(ctxt);
-	svc_rdma_put_context(ctxt, 1);
-	return -ENOMEM;
-}
-
 static void
 svc_rdma_parse_connect_private(struct svcxprt_rdma *newxprt,
 			       struct rdma_conn_param *param)
@@ -699,7 +591,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	struct ib_qp_init_attr qp_attr;
 	struct ib_device *dev;
 	struct sockaddr *sap;
-	unsigned int i, ctxts;
+	unsigned int ctxts;
 	int ret = 0;
 
 	listen_rdma = container_of(xprt, struct svcxprt_rdma, sc_xprt);
@@ -804,14 +696,8 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	    !rdma_ib_or_roce(dev, newxprt->sc_port_num))
 		goto errout;
 
-	/* Post receive buffers */
-	for (i = 0; i < newxprt->sc_max_requests; i++) {
-		ret = svc_rdma_post_recv(newxprt);
-		if (ret) {
-			dprintk("svcrdma: failure posting receive buffers\n");
-			goto errout;
-		}
-	}
+	if (!svc_rdma_post_recvs(newxprt))
+		goto errout;
 
 	/* Swap out the handler */
 	newxprt->sc_cm_id->event_handler = rdma_cma_handler;
@@ -908,20 +794,7 @@ static void __svc_rdma_free(struct work_struct *work)
 		pr_err("svcrdma: sc_xprt still in use? (%d)\n",
 		       kref_read(&xprt->xpt_ref));
 
-	while (!list_empty(&rdma->sc_read_complete_q)) {
-		struct svc_rdma_op_ctxt *ctxt;
-		ctxt = list_first_entry(&rdma->sc_read_complete_q,
-					struct svc_rdma_op_ctxt, list);
-		list_del(&ctxt->list);
-		svc_rdma_put_context(ctxt, 1);
-	}
-	while (!list_empty(&rdma->sc_rq_dto_q)) {
-		struct svc_rdma_op_ctxt *ctxt;
-		ctxt = list_first_entry(&rdma->sc_rq_dto_q,
-					struct svc_rdma_op_ctxt, list);
-		list_del(&ctxt->list);
-		svc_rdma_put_context(ctxt, 1);
-	}
+	svc_rdma_flush_recv_queues(rdma);
 
 	/* Warn if we leaked a resource or under-referenced */
 	if (rdma->sc_ctxt_used != 0)
@@ -936,6 +809,7 @@ static void __svc_rdma_free(struct work_struct *work)
 
 	svc_rdma_destroy_rw_ctxts(rdma);
 	svc_rdma_destroy_ctxts(rdma);
+	svc_rdma_recv_ctxts_destroy(rdma);
 
 	/* Destroy the QP if present (not a listener) */
 	if (rdma->sc_qp && !IS_ERR(rdma->sc_qp))


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v1 07/19] svcrdma: Remove sc_rq_depth
  2018-05-07 19:26 [PATCH v1 00/19] NFS/RDMA server for-next Chuck Lever
                   ` (5 preceding siblings ...)
  2018-05-07 19:27 ` [PATCH v1 06/19] svcrdma: Introduce svc_rdma_recv_ctxt Chuck Lever
@ 2018-05-07 19:27 ` Chuck Lever
  2018-05-07 19:27 ` [PATCH v1 08/19] svcrdma: Simplify svc_rdma_recv_ctxt_put Chuck Lever
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Chuck Lever @ 2018-05-07 19:27 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

Clean up: No need to retain sc_rq_depth in struct svcxprt_rdma; it is
used only in svc_rdma_accept().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc_rdma.h          |    1 -
 net/sunrpc/xprtrdma/svc_rdma_transport.c |   17 ++++++++---------
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 37f759d..3cb6631 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -101,7 +101,6 @@ struct svcxprt_rdma {
 
 	atomic_t             sc_sq_avail;	/* SQEs ready to be consumed */
 	unsigned int	     sc_sq_depth;	/* Depth of SQ */
-	unsigned int	     sc_rq_depth;	/* Depth of RQ */
 	__be32		     sc_fc_credits;	/* Forward credits */
 	u32		     sc_max_requests;	/* Max requests */
 	u32		     sc_max_bc_requests;/* Backward credits */
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index afd5e61..20abd3a 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -589,9 +589,9 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	struct rdma_conn_param conn_param;
 	struct rpcrdma_connect_private pmsg;
 	struct ib_qp_init_attr qp_attr;
+	unsigned int ctxts, rq_depth;
 	struct ib_device *dev;
 	struct sockaddr *sap;
-	unsigned int ctxts;
 	int ret = 0;
 
 	listen_rdma = container_of(xprt, struct svcxprt_rdma, sc_xprt);
@@ -622,19 +622,18 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	newxprt->sc_max_req_size = svcrdma_max_req_size;
 	newxprt->sc_max_requests = svcrdma_max_requests;
 	newxprt->sc_max_bc_requests = svcrdma_max_bc_requests;
-	newxprt->sc_rq_depth = newxprt->sc_max_requests +
-			       newxprt->sc_max_bc_requests;
-	if (newxprt->sc_rq_depth > dev->attrs.max_qp_wr) {
+	rq_depth = newxprt->sc_max_requests + newxprt->sc_max_bc_requests;
+	if (rq_depth > dev->attrs.max_qp_wr) {
 		pr_warn("svcrdma: reducing receive depth to %d\n",
 			dev->attrs.max_qp_wr);
-		newxprt->sc_rq_depth = dev->attrs.max_qp_wr;
-		newxprt->sc_max_requests = newxprt->sc_rq_depth - 2;
+		rq_depth = dev->attrs.max_qp_wr;
+		newxprt->sc_max_requests = rq_depth - 2;
 		newxprt->sc_max_bc_requests = 2;
 	}
 	newxprt->sc_fc_credits = cpu_to_be32(newxprt->sc_max_requests);
 	ctxts = rdma_rw_mr_factor(dev, newxprt->sc_port_num, RPCSVC_MAXPAGES);
 	ctxts *= newxprt->sc_max_requests;
-	newxprt->sc_sq_depth = newxprt->sc_rq_depth + ctxts;
+	newxprt->sc_sq_depth = rq_depth + ctxts;
 	if (newxprt->sc_sq_depth > dev->attrs.max_qp_wr) {
 		pr_warn("svcrdma: reducing send depth to %d\n",
 			dev->attrs.max_qp_wr);
@@ -656,7 +655,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 		dprintk("svcrdma: error creating SQ CQ for connect request\n");
 		goto errout;
 	}
-	newxprt->sc_rq_cq = ib_alloc_cq(dev, newxprt, newxprt->sc_rq_depth,
+	newxprt->sc_rq_cq = ib_alloc_cq(dev, newxprt, rq_depth,
 					0, IB_POLL_WORKQUEUE);
 	if (IS_ERR(newxprt->sc_rq_cq)) {
 		dprintk("svcrdma: error creating RQ CQ for connect request\n");
@@ -669,7 +668,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	qp_attr.port_num = newxprt->sc_port_num;
 	qp_attr.cap.max_rdma_ctxs = ctxts;
 	qp_attr.cap.max_send_wr = newxprt->sc_sq_depth - ctxts;
-	qp_attr.cap.max_recv_wr = newxprt->sc_rq_depth;
+	qp_attr.cap.max_recv_wr = rq_depth;
 	qp_attr.cap.max_send_sge = newxprt->sc_max_sge;
 	qp_attr.cap.max_recv_sge = newxprt->sc_max_sge;
 	qp_attr.sq_sig_type = IB_SIGNAL_REQ_WR;


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v1 08/19] svcrdma: Simplify svc_rdma_recv_ctxt_put
  2018-05-07 19:26 [PATCH v1 00/19] NFS/RDMA server for-next Chuck Lever
                   ` (6 preceding siblings ...)
  2018-05-07 19:27 ` [PATCH v1 07/19] svcrdma: Remove sc_rq_depth Chuck Lever
@ 2018-05-07 19:27 ` Chuck Lever
  2018-05-07 19:27 ` [PATCH v1 09/19] svcrdma: Preserve Receive buffer until svc_rdma_sendto Chuck Lever
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Chuck Lever @ 2018-05-07 19:27 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

Currently svc_rdma_recv_ctxt_put's callers have to know whether they
want to free the ctxt's pages or not. This means developers have to
know when and why to set that free_pages argument.

Instead, the ctxt should carry that information with it so that
svc_rdma_recv_ctxt_put does the right thing no matter who is
calling.

We want to keep track of the number of pages in the Receive buffer
separately from the number of pages pulled over by RDMA Read. That
way the correct number of pages is always freed, and the accounting
is explicit.

So now, rc_hdr_count is the number of pages consumed by head[0]
(i.e., the page index where the Read chunk should start); and
rc_page_count is always the number of pages that need to be released
when the ctxt is put.

The @free_pages argument is no longer needed.
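
With that convention, the put path reduces to roughly the following
condensed sketch (returning the ctxt to the sc_recv_ctxts free list
is unchanged and elided here):

void svc_rdma_recv_ctxt_put(struct svcxprt_rdma *rdma,
			    struct svc_rdma_recv_ctxt *ctxt)
{
	unsigned int i;

	/* rc_page_count pages always belong to this ctxt; callers
	 * no longer decide whether they should be released.
	 */
	for (i = 0; i < ctxt->rc_page_count; i++)
		put_page(ctxt->rc_pages[i]);
}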

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc_rdma.h         |    3 +-
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c |   41 +++++++++++++++++--------------
 net/sunrpc/xprtrdma/svc_rdma_rw.c       |    4 ++-
 3 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 3cb6631..f0bd0b6d 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -173,8 +173,7 @@ extern int svc_rdma_handle_bc_reply(struct rpc_xprt *xprt,
 extern void svc_rdma_recv_ctxts_destroy(struct svcxprt_rdma *rdma);
 extern bool svc_rdma_post_recvs(struct svcxprt_rdma *rdma);
 extern void svc_rdma_recv_ctxt_put(struct svcxprt_rdma *rdma,
-				   struct svc_rdma_recv_ctxt *ctxt,
-				   int free_pages);
+				   struct svc_rdma_recv_ctxt *ctxt);
 extern void svc_rdma_flush_recv_queues(struct svcxprt_rdma *rdma);
 extern int svc_rdma_recvfrom(struct svc_rqst *);
 
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index b7d9c55..ecfe7c9 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -175,18 +175,15 @@ static void svc_rdma_recv_ctxt_unmap(struct svcxprt_rdma *rdma,
  * svc_rdma_recv_ctxt_put - Return recv_ctxt to free list
  * @rdma: controlling svcxprt_rdma
  * @ctxt: object to return to the free list
- * @free_pages: Non-zero if rc_pages should be freed
  *
  */
 void svc_rdma_recv_ctxt_put(struct svcxprt_rdma *rdma,
-			    struct svc_rdma_recv_ctxt *ctxt,
-			    int free_pages)
+			    struct svc_rdma_recv_ctxt *ctxt)
 {
 	unsigned int i;
 
-	if (free_pages)
-		for (i = 0; i < ctxt->rc_page_count; i++)
-			put_page(ctxt->rc_pages[i]);
+	for (i = 0; i < ctxt->rc_page_count; i++)
+		put_page(ctxt->rc_pages[i]);
 	spin_lock(&rdma->sc_recv_lock);
 	list_add(&ctxt->rc_list, &rdma->sc_recv_ctxts);
 	spin_unlock(&rdma->sc_recv_lock);
@@ -243,11 +240,11 @@ static int svc_rdma_post_recv(struct svcxprt_rdma *rdma)
 
 err_put_ctxt:
 	svc_rdma_recv_ctxt_unmap(rdma, ctxt);
-	svc_rdma_recv_ctxt_put(rdma, ctxt, 1);
+	svc_rdma_recv_ctxt_put(rdma, ctxt);
 	return -ENOMEM;
 err_post:
 	svc_rdma_recv_ctxt_unmap(rdma, ctxt);
-	svc_rdma_recv_ctxt_put(rdma, ctxt, 1);
+	svc_rdma_recv_ctxt_put(rdma, ctxt);
 	svc_xprt_put(&rdma->sc_xprt);
 	return ret;
 }
@@ -316,7 +313,7 @@ static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
 		       ib_wc_status_msg(wc->status),
 		       wc->status, wc->vendor_err);
 post_err:
-	svc_rdma_recv_ctxt_put(rdma, ctxt, 1);
+	svc_rdma_recv_ctxt_put(rdma, ctxt);
 	set_bit(XPT_CLOSE, &rdma->sc_xprt.xpt_flags);
 	svc_xprt_enqueue(&rdma->sc_xprt);
 out:
@@ -334,11 +331,11 @@ void svc_rdma_flush_recv_queues(struct svcxprt_rdma *rdma)
 
 	while ((ctxt = svc_rdma_next_recv_ctxt(&rdma->sc_read_complete_q))) {
 		list_del(&ctxt->rc_list);
-		svc_rdma_recv_ctxt_put(rdma, ctxt, 1);
+		svc_rdma_recv_ctxt_put(rdma, ctxt);
 	}
 	while ((ctxt = svc_rdma_next_recv_ctxt(&rdma->sc_rq_dto_q))) {
 		list_del(&ctxt->rc_list);
-		svc_rdma_recv_ctxt_put(rdma, ctxt, 1);
+		svc_rdma_recv_ctxt_put(rdma, ctxt);
 	}
 }
 
@@ -383,16 +380,19 @@ static void svc_rdma_build_arg_xdr(struct svc_rqst *rqstp,
 		len -= min_t(u32, len, ctxt->rc_sges[sge_no].length);
 		sge_no++;
 	}
+	ctxt->rc_hdr_count = sge_no;
 	rqstp->rq_respages = &rqstp->rq_pages[sge_no];
 	rqstp->rq_next_page = rqstp->rq_respages + 1;
 
 	/* If not all pages were used from the SGL, free the remaining ones */
-	len = sge_no;
 	while (sge_no < ctxt->rc_recv_wr.num_sge) {
 		page = ctxt->rc_pages[sge_no++];
 		put_page(page);
 	}
-	ctxt->rc_page_count = len;
+
+	/* @ctxt's pages have all been released or moved to @rqstp->rq_pages.
+	 */
+	ctxt->rc_page_count = 0;
 
 	/* Set up tail */
 	rqstp->rq_arg.tail[0].iov_base = NULL;
@@ -602,11 +602,14 @@ static void rdma_read_complete(struct svc_rqst *rqstp,
 {
 	int page_no;
 
-	/* Copy RPC pages */
+	/* Move Read chunk pages to rqstp so that they will be released
+	 * when svc_process is done with them.
+	 */
 	for (page_no = 0; page_no < head->rc_page_count; page_no++) {
 		put_page(rqstp->rq_pages[page_no]);
 		rqstp->rq_pages[page_no] = head->rc_pages[page_no];
 	}
+	head->rc_page_count = 0;
 
 	/* Point rq_arg.pages past header */
 	rqstp->rq_arg.pages = &rqstp->rq_pages[head->rc_hdr_count];
@@ -777,7 +780,7 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
 	if (svc_rdma_is_backchannel_reply(xprt, p)) {
 		ret = svc_rdma_handle_bc_reply(xprt->xpt_bc_xprt, p,
 					       &rqstp->rq_arg);
-		svc_rdma_recv_ctxt_put(rdma_xprt, ctxt, 0);
+		svc_rdma_recv_ctxt_put(rdma_xprt, ctxt);
 		return ret;
 	}
 
@@ -786,7 +789,7 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
 		goto out_readchunk;
 
 complete:
-	svc_rdma_recv_ctxt_put(rdma_xprt, ctxt, 0);
+	svc_rdma_recv_ctxt_put(rdma_xprt, ctxt);
 	rqstp->rq_prot = IPPROTO_MAX;
 	svc_xprt_copy_addrs(rqstp, xprt);
 	return rqstp->rq_arg.len;
@@ -799,16 +802,16 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
 
 out_err:
 	svc_rdma_send_error(rdma_xprt, p, ret);
-	svc_rdma_recv_ctxt_put(rdma_xprt, ctxt, 0);
+	svc_rdma_recv_ctxt_put(rdma_xprt, ctxt);
 	return 0;
 
 out_postfail:
 	if (ret == -EINVAL)
 		svc_rdma_send_error(rdma_xprt, p, ret);
-	svc_rdma_recv_ctxt_put(rdma_xprt, ctxt, 1);
+	svc_rdma_recv_ctxt_put(rdma_xprt, ctxt);
 	return ret;
 
 out_drop:
-	svc_rdma_recv_ctxt_put(rdma_xprt, ctxt, 1);
+	svc_rdma_recv_ctxt_put(rdma_xprt, ctxt);
 	return 0;
 }
diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
index c080ce2..8242aa3 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
@@ -282,7 +282,7 @@ static void svc_rdma_wc_read_done(struct ib_cq *cq, struct ib_wc *wc)
 			pr_err("svcrdma: read ctx: %s (%u/0x%x)\n",
 			       ib_wc_status_msg(wc->status),
 			       wc->status, wc->vendor_err);
-		svc_rdma_recv_ctxt_put(rdma, info->ri_readctxt, 1);
+		svc_rdma_recv_ctxt_put(rdma, info->ri_readctxt);
 	} else {
 		spin_lock(&rdma->sc_rq_dto_lock);
 		list_add_tail(&info->ri_readctxt->rc_list,
@@ -834,7 +834,7 @@ int svc_rdma_recv_read_chunk(struct svcxprt_rdma *rdma, struct svc_rqst *rqstp,
 	 * head->rc_arg. Pages involved with RDMA Read I/O are
 	 * transferred there.
 	 */
-	head->rc_hdr_count = head->rc_page_count;
+	head->rc_page_count = head->rc_hdr_count;
 	head->rc_arg.head[0] = rqstp->rq_arg.head[0];
 	head->rc_arg.tail[0] = rqstp->rq_arg.tail[0];
 	head->rc_arg.pages = head->rc_pages;


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v1 09/19] svcrdma: Preserve Receive buffer until svc_rdma_sendto
  2018-05-07 19:26 [PATCH v1 00/19] NFS/RDMA server for-next Chuck Lever
                   ` (7 preceding siblings ...)
  2018-05-07 19:27 ` [PATCH v1 08/19] svcrdma: Simplify svc_rdma_recv_ctxt_put Chuck Lever
@ 2018-05-07 19:27 ` Chuck Lever
  2018-05-09 21:03   ` J. Bruce Fields
  2018-05-07 19:27 ` [PATCH v1 10/19] svcrdma: Persistently allocate and DMA-map Receive buffers Chuck Lever
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 29+ messages in thread
From: Chuck Lever @ 2018-05-07 19:27 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

Rather than releasing the incoming svc_rdma_recv_ctxt at the end of
svc_rdma_recvfrom, hold onto it until svc_rdma_sendto.

This permits the contents of the Receive buffer to be preserved
through svc_process and then referenced directly in sendto as it
constructs Write and Reply chunks to return to the client.

The real changes will come in subsequent patches.

Note: I cannot use ->xpo_release_rqst for this purpose because that
is called _before_ ->xpo_sendto. svc_rdma_sendto uses information in
the received Call transport header, which is preserved in the RPC's
Receive buffer, to construct the Reply transport header.

The historical comment in svc_send() isn't helpful: it is already
obvious that ->xpo_release_rqst is being called before ->xpo_sendto,
but there is no explanation for this ordering going back to the
beginning of the git era.
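
Concretely, the handoff amounts to the small pattern sketched here,
condensed from the diff below (error paths omitted):

	/* svc_rdma_recvfrom: keep the Receive context for the reply */
	rqstp->rq_xprt_ctxt = ctxt;

	/* svc_rdma_sendto: release it only after the Reply is built */
	struct svc_rdma_recv_ctxt *rctxt = rqstp->rq_xprt_ctxt;

	/* ... construct and post the Reply ... */

	rqstp->rq_xprt_ctxt = NULL;
	svc_rdma_recv_ctxt_put(rdma, rctxt);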

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c |    2 +-
 net/sunrpc/xprtrdma/svc_rdma_sendto.c   |   14 +++++++++++---
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index ecfe7c9..d9fef52 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -789,7 +789,7 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
 		goto out_readchunk;
 
 complete:
-	svc_rdma_recv_ctxt_put(rdma_xprt, ctxt);
+	rqstp->rq_xprt_ctxt = ctxt;
 	rqstp->rq_prot = IPPROTO_MAX;
 	svc_xprt_copy_addrs(rqstp, xprt);
 	return rqstp->rq_arg.len;
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index a397d9a..cbbde70 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -623,6 +623,7 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
 	struct svc_xprt *xprt = rqstp->rq_xprt;
 	struct svcxprt_rdma *rdma =
 		container_of(xprt, struct svcxprt_rdma, sc_xprt);
+	struct svc_rdma_recv_ctxt *rctxt = rqstp->rq_xprt_ctxt;
 	__be32 *p, *rdma_argp, *rdma_resp, *wr_lst, *rp_ch;
 	struct xdr_buf *xdr = &rqstp->rq_res;
 	struct page *res_page;
@@ -675,7 +676,12 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
 				      wr_lst, rp_ch);
 	if (ret < 0)
 		goto err0;
-	return 0;
+	ret = 0;
+
+out:
+	rqstp->rq_xprt_ctxt = NULL;
+	svc_rdma_recv_ctxt_put(rdma, rctxt);
+	return ret;
 
  err2:
 	if (ret != -E2BIG && ret != -EINVAL)
@@ -684,12 +690,14 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
 	ret = svc_rdma_send_error_msg(rdma, rdma_resp, rqstp);
 	if (ret < 0)
 		goto err0;
-	return 0;
+	ret = 0;
+	goto out;
 
  err1:
 	put_page(res_page);
  err0:
 	trace_svcrdma_send_failed(rqstp, ret);
 	set_bit(XPT_CLOSE, &xprt->xpt_flags);
-	return -ENOTCONN;
+	ret = -ENOTCONN;
+	goto out;
 }


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v1 10/19] svcrdma: Persistently allocate and DMA-map Receive buffers
  2018-05-07 19:26 [PATCH v1 00/19] NFS/RDMA server for-next Chuck Lever
                   ` (8 preceding siblings ...)
  2018-05-07 19:27 ` [PATCH v1 09/19] svcrdma: Preserve Receive buffer until svc_rdma_sendto Chuck Lever
@ 2018-05-07 19:27 ` Chuck Lever
  2018-05-09 21:18   ` J. Bruce Fields
  2018-05-07 19:27 ` [PATCH v1 11/19] svcrdma: Allocate recv_ctxt's on CPU handling Receives Chuck Lever
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 29+ messages in thread
From: Chuck Lever @ 2018-05-07 19:27 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

The current Receive path uses an array of pages which are allocated
and DMA mapped when each Receive WR is posted, and then handed off
to the upper layer in rqstp::rq_arg. The page flip releases unused
pages in the rq_pages pagelist. This mechanism introduces a
significant amount of overhead.

So instead, kmalloc the Receive buffer, and leave it DMA-mapped
while the transport remains connected. This confers a number of
benefits:

* Each Receive WR requires only one receive SGE, no matter how large
  the inline threshold is. This helps the server-side NFS/RDMA
  transport operate on less capable RDMA devices.

* The Receive buffer is left allocated and mapped all the time. This
  relieves svc_rdma_post_recv from the overhead of allocating and
  DMA-mapping a fresh buffer.

* svc_rdma_wc_receive no longer has to DMA unmap the Receive buffer.
  It has to DMA sync only the number of bytes that were received.

* svc_rdma_build_arg_xdr no longer has to free a page in rq_pages
  for each page in the Receive buffer, making it a constant-time
  function.

* The Receive buffer is now plugged directly into rq_arg's
  head[0] iovec, and can be larger than a page without spilling
  over into rq_arg's page list. This enables simplification of
  the RDMA Read path in subsequent patches.
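
Taken together, the Receive buffer's new life cycle looks roughly
like this (condensed from the diff below; error handling omitted):

	/* Once, when a recv_ctxt is allocated: */
	buffer = kmalloc(rdma->sc_max_req_size, GFP_KERNEL);
	addr = ib_dma_map_single(rdma->sc_pd->device, buffer,
				 rdma->sc_max_req_size, DMA_FROM_DEVICE);
	ctxt->rc_recv_buf = buffer;
	ctxt->rc_recv_sge.addr = addr;
	ctxt->rc_recv_sge.length = rdma->sc_max_req_size;
	ctxt->rc_recv_wr.num_sge = 1;

	/* Per Receive completion: sync only the bytes that arrived */
	ib_dma_sync_single_for_cpu(rdma->sc_pd->device,
				   ctxt->rc_recv_sge.addr,
				   wc->byte_len, DMA_FROM_DEVICE);

	/* At transport teardown: unmap and free exactly once */
	ib_dma_unmap_single(rdma->sc_pd->device, ctxt->rc_recv_sge.addr,
			    ctxt->rc_recv_sge.length, DMA_FROM_DEVICE);
	kfree(ctxt->rc_recv_buf);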

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc_rdma.h          |    4 -
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |  168 +++++++++++-------------------
 net/sunrpc/xprtrdma/svc_rdma_rw.c        |   32 ++----
 net/sunrpc/xprtrdma/svc_rdma_sendto.c    |    5 -
 net/sunrpc/xprtrdma/svc_rdma_transport.c |    2 
 5 files changed, 75 insertions(+), 136 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index f0bd0b6d..01baabf 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -148,12 +148,12 @@ struct svc_rdma_recv_ctxt {
 	struct list_head	rc_list;
 	struct ib_recv_wr	rc_recv_wr;
 	struct ib_cqe		rc_cqe;
+	struct ib_sge		rc_recv_sge;
+	void			*rc_recv_buf;
 	struct xdr_buf		rc_arg;
 	u32			rc_byte_len;
 	unsigned int		rc_page_count;
 	unsigned int		rc_hdr_count;
-	struct ib_sge		rc_sges[1 +
-					RPCRDMA_MAX_INLINE_THRESH / PAGE_SIZE];
 	struct page		*rc_pages[RPCSVC_MAXPAGES];
 };
 
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index d9fef52..d4ccd1c 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -117,6 +117,43 @@
 					rc_list);
 }
 
+static struct svc_rdma_recv_ctxt *
+svc_rdma_recv_ctxt_alloc(struct svcxprt_rdma *rdma)
+{
+	struct svc_rdma_recv_ctxt *ctxt;
+	dma_addr_t addr;
+	void *buffer;
+
+	ctxt = kmalloc(sizeof(*ctxt), GFP_KERNEL);
+	if (!ctxt)
+		goto fail0;
+	buffer = kmalloc(rdma->sc_max_req_size, GFP_KERNEL);
+	if (!buffer)
+		goto fail1;
+	addr = ib_dma_map_single(rdma->sc_pd->device, buffer,
+				 rdma->sc_max_req_size, DMA_FROM_DEVICE);
+	if (ib_dma_mapping_error(rdma->sc_pd->device, addr))
+		goto fail2;
+
+	ctxt->rc_recv_wr.next = NULL;
+	ctxt->rc_recv_wr.wr_cqe = &ctxt->rc_cqe;
+	ctxt->rc_recv_wr.sg_list = &ctxt->rc_recv_sge;
+	ctxt->rc_recv_wr.num_sge = 1;
+	ctxt->rc_cqe.done = svc_rdma_wc_receive;
+	ctxt->rc_recv_sge.addr = addr;
+	ctxt->rc_recv_sge.length = rdma->sc_max_req_size;
+	ctxt->rc_recv_sge.lkey = rdma->sc_pd->local_dma_lkey;
+	ctxt->rc_recv_buf = buffer;
+	return ctxt;
+
+fail2:
+	kfree(buffer);
+fail1:
+	kfree(ctxt);
+fail0:
+	return NULL;
+}
+
 /**
  * svc_rdma_recv_ctxts_destroy - Release all recv_ctxt's for an xprt
  * @rdma: svcxprt_rdma being torn down
@@ -128,6 +165,11 @@ void svc_rdma_recv_ctxts_destroy(struct svcxprt_rdma *rdma)
 
 	while ((ctxt = svc_rdma_next_recv_ctxt(&rdma->sc_recv_ctxts))) {
 		list_del(&ctxt->rc_list);
+		ib_dma_unmap_single(rdma->sc_pd->device,
+				    ctxt->rc_recv_sge.addr,
+				    ctxt->rc_recv_sge.length,
+				    DMA_FROM_DEVICE);
+		kfree(ctxt->rc_recv_buf);
 		kfree(ctxt);
 	}
 }
@@ -145,32 +187,18 @@ void svc_rdma_recv_ctxts_destroy(struct svcxprt_rdma *rdma)
 	spin_unlock(&rdma->sc_recv_lock);
 
 out:
-	ctxt->rc_recv_wr.num_sge = 0;
 	ctxt->rc_page_count = 0;
 	return ctxt;
 
 out_empty:
 	spin_unlock(&rdma->sc_recv_lock);
 
-	ctxt = kmalloc(sizeof(*ctxt), GFP_KERNEL);
+	ctxt = svc_rdma_recv_ctxt_alloc(rdma);
 	if (!ctxt)
 		return NULL;
 	goto out;
 }
 
-static void svc_rdma_recv_ctxt_unmap(struct svcxprt_rdma *rdma,
-				     struct svc_rdma_recv_ctxt *ctxt)
-{
-	struct ib_device *device = rdma->sc_cm_id->device;
-	int i;
-
-	for (i = 0; i < ctxt->rc_recv_wr.num_sge; i++)
-		ib_dma_unmap_page(device,
-				  ctxt->rc_sges[i].addr,
-				  ctxt->rc_sges[i].length,
-				  DMA_FROM_DEVICE);
-}
-
 /**
  * svc_rdma_recv_ctxt_put - Return recv_ctxt to free list
  * @rdma: controlling svcxprt_rdma
@@ -191,46 +219,14 @@ void svc_rdma_recv_ctxt_put(struct svcxprt_rdma *rdma,
 
 static int svc_rdma_post_recv(struct svcxprt_rdma *rdma)
 {
-	struct ib_device *device = rdma->sc_cm_id->device;
 	struct svc_rdma_recv_ctxt *ctxt;
 	struct ib_recv_wr *bad_recv_wr;
-	int sge_no, buflen, ret;
-	struct page *page;
-	dma_addr_t pa;
+	int ret;
 
 	ctxt = svc_rdma_recv_ctxt_get(rdma);
 	if (!ctxt)
 		return -ENOMEM;
 
-	buflen = 0;
-	ctxt->rc_cqe.done = svc_rdma_wc_receive;
-	for (sge_no = 0; buflen < rdma->sc_max_req_size; sge_no++) {
-		if (sge_no >= rdma->sc_max_sge) {
-			pr_err("svcrdma: Too many sges (%d)\n", sge_no);
-			goto err_put_ctxt;
-		}
-
-		page = alloc_page(GFP_KERNEL);
-		if (!page)
-			goto err_put_ctxt;
-		ctxt->rc_pages[sge_no] = page;
-		ctxt->rc_page_count++;
-
-		pa = ib_dma_map_page(device, ctxt->rc_pages[sge_no],
-				     0, PAGE_SIZE, DMA_FROM_DEVICE);
-		if (ib_dma_mapping_error(device, pa))
-			goto err_put_ctxt;
-		ctxt->rc_sges[sge_no].addr = pa;
-		ctxt->rc_sges[sge_no].length = PAGE_SIZE;
-		ctxt->rc_sges[sge_no].lkey = rdma->sc_pd->local_dma_lkey;
-		ctxt->rc_recv_wr.num_sge++;
-
-		buflen += PAGE_SIZE;
-	}
-	ctxt->rc_recv_wr.next = NULL;
-	ctxt->rc_recv_wr.sg_list = &ctxt->rc_sges[0];
-	ctxt->rc_recv_wr.wr_cqe = &ctxt->rc_cqe;
-
 	svc_xprt_get(&rdma->sc_xprt);
 	ret = ib_post_recv(rdma->sc_qp, &ctxt->rc_recv_wr, &bad_recv_wr);
 	trace_svcrdma_post_recv(&ctxt->rc_recv_wr, ret);
@@ -238,12 +234,7 @@ static int svc_rdma_post_recv(struct svcxprt_rdma *rdma)
 		goto err_post;
 	return 0;
 
-err_put_ctxt:
-	svc_rdma_recv_ctxt_unmap(rdma, ctxt);
-	svc_rdma_recv_ctxt_put(rdma, ctxt);
-	return -ENOMEM;
 err_post:
-	svc_rdma_recv_ctxt_unmap(rdma, ctxt);
 	svc_rdma_recv_ctxt_put(rdma, ctxt);
 	svc_xprt_put(&rdma->sc_xprt);
 	return ret;
@@ -289,7 +280,6 @@ static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
 
 	/* WARNING: Only wc->wr_cqe and wc->status are reliable */
 	ctxt = container_of(cqe, struct svc_rdma_recv_ctxt, rc_cqe);
-	svc_rdma_recv_ctxt_unmap(rdma, ctxt);
 
 	if (wc->status != IB_WC_SUCCESS)
 		goto flushed;
@@ -299,6 +289,10 @@ static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
 
 	/* All wc fields are now known to be valid */
 	ctxt->rc_byte_len = wc->byte_len;
+	ib_dma_sync_single_for_cpu(rdma->sc_pd->device,
+				   ctxt->rc_recv_sge.addr,
+				   wc->byte_len, DMA_FROM_DEVICE);
+
 	spin_lock(&rdma->sc_rq_dto_lock);
 	list_add_tail(&ctxt->rc_list, &rdma->sc_rq_dto_q);
 	spin_unlock(&rdma->sc_rq_dto_lock);
@@ -339,64 +333,22 @@ void svc_rdma_flush_recv_queues(struct svcxprt_rdma *rdma)
 	}
 }
 
-/*
- * Replace the pages in the rq_argpages array with the pages from the SGE in
- * the RDMA_RECV completion. The SGL should contain full pages up until the
- * last one.
- */
 static void svc_rdma_build_arg_xdr(struct svc_rqst *rqstp,
 				   struct svc_rdma_recv_ctxt *ctxt)
 {
-	struct page *page;
-	int sge_no;
-	u32 len;
-
-	/* The reply path assumes the Call's transport header resides
-	 * in rqstp->rq_pages[0].
-	 */
-	page = ctxt->rc_pages[0];
-	put_page(rqstp->rq_pages[0]);
-	rqstp->rq_pages[0] = page;
-
-	/* Set up the XDR head */
-	rqstp->rq_arg.head[0].iov_base = page_address(page);
-	rqstp->rq_arg.head[0].iov_len =
-		min_t(size_t, ctxt->rc_byte_len, ctxt->rc_sges[0].length);
-	rqstp->rq_arg.len = ctxt->rc_byte_len;
-	rqstp->rq_arg.buflen = ctxt->rc_byte_len;
-
-	/* Compute bytes past head in the SGL */
-	len = ctxt->rc_byte_len - rqstp->rq_arg.head[0].iov_len;
-
-	/* If data remains, store it in the pagelist */
-	rqstp->rq_arg.page_len = len;
-	rqstp->rq_arg.page_base = 0;
-
-	sge_no = 1;
-	while (len && sge_no < ctxt->rc_recv_wr.num_sge) {
-		page = ctxt->rc_pages[sge_no];
-		put_page(rqstp->rq_pages[sge_no]);
-		rqstp->rq_pages[sge_no] = page;
-		len -= min_t(u32, len, ctxt->rc_sges[sge_no].length);
-		sge_no++;
-	}
-	ctxt->rc_hdr_count = sge_no;
-	rqstp->rq_respages = &rqstp->rq_pages[sge_no];
+	struct xdr_buf *arg = &rqstp->rq_arg;
+
+	arg->head[0].iov_base = ctxt->rc_recv_buf;
+	arg->head[0].iov_len = ctxt->rc_byte_len;
+	arg->tail[0].iov_base = NULL;
+	arg->tail[0].iov_len = 0;
+	arg->page_len = 0;
+	arg->page_base = 0;
+	arg->buflen = ctxt->rc_byte_len;
+	arg->len = ctxt->rc_byte_len;
+
+	rqstp->rq_respages = &rqstp->rq_pages[0];
 	rqstp->rq_next_page = rqstp->rq_respages + 1;
-
-	/* If not all pages were used from the SGL, free the remaining ones */
-	while (sge_no < ctxt->rc_recv_wr.num_sge) {
-		page = ctxt->rc_pages[sge_no++];
-		put_page(page);
-	}
-
-	/* @ctxt's pages have all been released or moved to @rqstp->rq_pages.
-	 */
-	ctxt->rc_page_count = 0;
-
-	/* Set up tail */
-	rqstp->rq_arg.tail[0].iov_base = NULL;
-	rqstp->rq_arg.tail[0].iov_len = 0;
 }
 
 /* This accommodates the largest possible Write chunk,
diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
index 8242aa3..ce3ea84 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
@@ -718,15 +718,14 @@ static int svc_rdma_build_normal_read_chunk(struct svc_rqst *rqstp,
 	struct svc_rdma_recv_ctxt *head = info->ri_readctxt;
 	int ret;
 
-	info->ri_pageno = head->rc_hdr_count;
-	info->ri_pageoff = 0;
-
 	ret = svc_rdma_build_read_chunk(rqstp, info, p);
 	if (ret < 0)
 		goto out;
 
 	trace_svcrdma_encode_read(info->ri_chunklen, info->ri_position);
 
+	head->rc_hdr_count = 0;
+
 	/* Split the Receive buffer between the head and tail
 	 * buffers at Read chunk's position. XDR roundup of the
 	 * chunk is not included in either the pagelist or in
@@ -775,9 +774,6 @@ static int svc_rdma_build_pz_read_chunk(struct svc_rqst *rqstp,
 	struct svc_rdma_recv_ctxt *head = info->ri_readctxt;
 	int ret;
 
-	info->ri_pageno = head->rc_hdr_count - 1;
-	info->ri_pageoff = offset_in_page(head->rc_byte_len);
-
 	ret = svc_rdma_build_read_chunk(rqstp, info, p);
 	if (ret < 0)
 		goto out;
@@ -787,20 +783,13 @@ static int svc_rdma_build_pz_read_chunk(struct svc_rqst *rqstp,
 	head->rc_arg.len += info->ri_chunklen;
 	head->rc_arg.buflen += info->ri_chunklen;
 
-	if (head->rc_arg.buflen <= head->rc_sges[0].length) {
-		/* Transport header and RPC message fit entirely
-		 * in page where head iovec resides.
-		 */
-		head->rc_arg.head[0].iov_len = info->ri_chunklen;
-	} else {
-		/* Transport header and part of RPC message reside
-		 * in the head iovec's page.
-		 */
-		head->rc_arg.head[0].iov_len =
-			head->rc_sges[0].length - head->rc_byte_len;
-		head->rc_arg.page_len =
-			info->ri_chunklen - head->rc_arg.head[0].iov_len;
-	}
+	head->rc_hdr_count = 1;
+	head->rc_arg.head[0].iov_base = page_address(head->rc_pages[0]);
+	head->rc_arg.head[0].iov_len = min_t(size_t, PAGE_SIZE,
+					     info->ri_chunklen);
+
+	head->rc_arg.page_len = info->ri_chunklen -
+				head->rc_arg.head[0].iov_len;
 
 out:
 	return ret;
@@ -834,7 +823,6 @@ int svc_rdma_recv_read_chunk(struct svcxprt_rdma *rdma, struct svc_rqst *rqstp,
 	 * head->rc_arg. Pages involved with RDMA Read I/O are
 	 * transferred there.
 	 */
-	head->rc_page_count = head->rc_hdr_count;
 	head->rc_arg.head[0] = rqstp->rq_arg.head[0];
 	head->rc_arg.tail[0] = rqstp->rq_arg.tail[0];
 	head->rc_arg.pages = head->rc_pages;
@@ -847,6 +835,8 @@ int svc_rdma_recv_read_chunk(struct svcxprt_rdma *rdma, struct svc_rqst *rqstp,
 	if (!info)
 		return -ENOMEM;
 	info->ri_readctxt = head;
+	info->ri_pageno = 0;
+	info->ri_pageoff = 0;
 
 	info->ri_position = be32_to_cpup(p + 1);
 	if (info->ri_position)
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index cbbde70..b27b597 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -629,10 +629,7 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
 	struct page *res_page;
 	int ret;
 
-	/* Find the call's chunk lists to decide how to send the reply.
-	 * Receive places the Call's xprt header at the start of page 0.
-	 */
-	rdma_argp = page_address(rqstp->rq_pages[0]);
+	rdma_argp = rctxt->rc_recv_buf;
 	svc_rdma_get_write_arrays(rdma_argp, &wr_lst, &rp_ch);
 
 	/* Create the RDMA response header. xprt->xpt_mutex,
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 20abd3a..333c432 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -670,7 +670,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	qp_attr.cap.max_send_wr = newxprt->sc_sq_depth - ctxts;
 	qp_attr.cap.max_recv_wr = rq_depth;
 	qp_attr.cap.max_send_sge = newxprt->sc_max_sge;
-	qp_attr.cap.max_recv_sge = newxprt->sc_max_sge;
+	qp_attr.cap.max_recv_sge = 1;
 	qp_attr.sq_sig_type = IB_SIGNAL_REQ_WR;
 	qp_attr.qp_type = IB_QPT_RC;
 	qp_attr.send_cq = newxprt->sc_sq_cq;


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v1 11/19] svcrdma: Allocate recv_ctxt's on CPU handling Receives
  2018-05-07 19:26 [PATCH v1 00/19] NFS/RDMA server for-next Chuck Lever
                   ` (9 preceding siblings ...)
  2018-05-07 19:27 ` [PATCH v1 10/19] svcrdma: Persistently allocate and DMA-map Receive buffers Chuck Lever
@ 2018-05-07 19:27 ` Chuck Lever
  2018-05-07 19:27 ` [PATCH v1 12/19] svcrdma: Refactor svc_rdma_dma_map_buf Chuck Lever
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Chuck Lever @ 2018-05-07 19:27 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

There is a significant latency penalty when processing an ingress
Receive if the Receive buffer resides in memory that is not on the
same NUMA node as the CPU handling completions for a CQ.

The system administrator and the device driver determine which CPU
handles completions. This CPU does not change during the life of the
CQ, and the Upper Layer has no visibility into which CPU that is.

Allocating Receive buffers in the Receive completion handler
guarantees that Receive buffers are allocated on the preferred NUMA
node for that CQ.
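
For illustration only (not part of this patch): kmalloc() called
without an explicit node hint prefers the NUMA node of the CPU that
is executing it. A minimal sketch of that equivalence, assuming a
process-context caller and using only the stock kmalloc_node() and
numa_node_id() helpers; "size" and "buf" are placeholders:

	/* Sketch: an allocation made while handling a Receive
	 * completion is effectively NUMA-local to the CQ's CPU.
	 */
	buf = kmalloc(size, GFP_KERNEL);

	/* ...which, in that context, behaves roughly like: */
	buf = kmalloc_node(size, GFP_KERNEL, numa_node_id());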

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc_rdma.h         |    1 +
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c |   52 +++++++++++++++++++++----------
 2 files changed, 37 insertions(+), 16 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 01baabf..27cf59c 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -151,6 +151,7 @@ struct svc_rdma_recv_ctxt {
 	struct ib_sge		rc_recv_sge;
 	void			*rc_recv_buf;
 	struct xdr_buf		rc_arg;
+	bool			rc_temp;
 	u32			rc_byte_len;
 	unsigned int		rc_page_count;
 	unsigned int		rc_hdr_count;
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index d4ccd1c..0445e75 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -144,6 +144,7 @@
 	ctxt->rc_recv_sge.length = rdma->sc_max_req_size;
 	ctxt->rc_recv_sge.lkey = rdma->sc_pd->local_dma_lkey;
 	ctxt->rc_recv_buf = buffer;
+	ctxt->rc_temp = false;
 	return ctxt;
 
 fail2:
@@ -154,6 +155,15 @@
 	return NULL;
 }
 
+static void svc_rdma_recv_ctxt_destroy(struct svcxprt_rdma *rdma,
+				       struct svc_rdma_recv_ctxt *ctxt)
+{
+	ib_dma_unmap_single(rdma->sc_pd->device, ctxt->rc_recv_sge.addr,
+			    ctxt->rc_recv_sge.length, DMA_FROM_DEVICE);
+	kfree(ctxt->rc_recv_buf);
+	kfree(ctxt);
+}
+
 /**
  * svc_rdma_recv_ctxts_destroy - Release all recv_ctxt's for an xprt
  * @rdma: svcxprt_rdma being torn down
@@ -165,12 +175,7 @@ void svc_rdma_recv_ctxts_destroy(struct svcxprt_rdma *rdma)
 
 	while ((ctxt = svc_rdma_next_recv_ctxt(&rdma->sc_recv_ctxts))) {
 		list_del(&ctxt->rc_list);
-		ib_dma_unmap_single(rdma->sc_pd->device,
-				    ctxt->rc_recv_sge.addr,
-				    ctxt->rc_recv_sge.length,
-				    DMA_FROM_DEVICE);
-		kfree(ctxt->rc_recv_buf);
-		kfree(ctxt);
+		svc_rdma_recv_ctxt_destroy(rdma, ctxt);
 	}
 }
 
@@ -212,21 +217,21 @@ void svc_rdma_recv_ctxt_put(struct svcxprt_rdma *rdma,
 
 	for (i = 0; i < ctxt->rc_page_count; i++)
 		put_page(ctxt->rc_pages[i]);
-	spin_lock(&rdma->sc_recv_lock);
-	list_add(&ctxt->rc_list, &rdma->sc_recv_ctxts);
-	spin_unlock(&rdma->sc_recv_lock);
+
+	if (!ctxt->rc_temp) {
+		spin_lock(&rdma->sc_recv_lock);
+		list_add(&ctxt->rc_list, &rdma->sc_recv_ctxts);
+		spin_unlock(&rdma->sc_recv_lock);
+	} else
+		svc_rdma_recv_ctxt_destroy(rdma, ctxt);
 }
 
-static int svc_rdma_post_recv(struct svcxprt_rdma *rdma)
+static int __svc_rdma_post_recv(struct svcxprt_rdma *rdma,
+				struct svc_rdma_recv_ctxt *ctxt)
 {
-	struct svc_rdma_recv_ctxt *ctxt;
 	struct ib_recv_wr *bad_recv_wr;
 	int ret;
 
-	ctxt = svc_rdma_recv_ctxt_get(rdma);
-	if (!ctxt)
-		return -ENOMEM;
-
 	svc_xprt_get(&rdma->sc_xprt);
 	ret = ib_post_recv(rdma->sc_qp, &ctxt->rc_recv_wr, &bad_recv_wr);
 	trace_svcrdma_post_recv(&ctxt->rc_recv_wr, ret);
@@ -240,6 +245,16 @@ static int svc_rdma_post_recv(struct svcxprt_rdma *rdma)
 	return ret;
 }
 
+static int svc_rdma_post_recv(struct svcxprt_rdma *rdma)
+{
+	struct svc_rdma_recv_ctxt *ctxt;
+
+	ctxt = svc_rdma_recv_ctxt_get(rdma);
+	if (!ctxt)
+		return -ENOMEM;
+	return __svc_rdma_post_recv(rdma, ctxt);
+}
+
 /**
  * svc_rdma_post_recvs - Post initial set of Recv WRs
  * @rdma: fresh svcxprt_rdma
@@ -248,11 +263,16 @@ static int svc_rdma_post_recv(struct svcxprt_rdma *rdma)
  */
 bool svc_rdma_post_recvs(struct svcxprt_rdma *rdma)
 {
+	struct svc_rdma_recv_ctxt *ctxt;
 	unsigned int i;
 	int ret;
 
 	for (i = 0; i < rdma->sc_max_requests; i++) {
-		ret = svc_rdma_post_recv(rdma);
+		ctxt = svc_rdma_recv_ctxt_get(rdma);
+		if (!ctxt)
+			return -ENOMEM;
+		ctxt->rc_temp = true;
+		ret = __svc_rdma_post_recv(rdma, ctxt);
 		if (ret) {
 			pr_err("svcrdma: failure posting recv buffers: %d\n",
 			       ret);


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v1 12/19] svcrdma: Refactor svc_rdma_dma_map_buf
  2018-05-07 19:26 [PATCH v1 00/19] NFS/RDMA server for-next Chuck Lever
                   ` (10 preceding siblings ...)
  2018-05-07 19:27 ` [PATCH v1 11/19] svcrdma: Allocate recv_ctxt's on CPU handling Receives Chuck Lever
@ 2018-05-07 19:27 ` Chuck Lever
  2018-05-07 19:27 ` [PATCH v1 13/19] svcrdma: Clean up Send SGE accounting Chuck Lever
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Chuck Lever @ 2018-05-07 19:27 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

Clean up: svc_rdma_dma_map_buf does mostly the same thing as
svc_rdma_dma_map_page, so let's fold these together.
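
For reviewer reference (not part of the patch): the fold relies on
the fact that a linear-map kernel virtual address can always be
decomposed into a page plus an offset. A minimal sketch using only
the stock virt_to_page() and offset_in_page() helpers; "base" is a
placeholder for the buffer address:

	/* Sketch: derive the (page, offset) pair that
	 * svc_rdma_dma_map_page() expects from a buffer address.
	 */
	struct page *pg = virt_to_page(base);
	unsigned long off = offset_in_page(base);  /* base & ~PAGE_MASK */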

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc_rdma.h       |    7 -----
 net/sunrpc/xprtrdma/svc_rdma_sendto.c |   50 +++++++++++----------------------
 2 files changed, 17 insertions(+), 40 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 27cf59c..95530bc 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -158,13 +158,6 @@ struct svc_rdma_recv_ctxt {
 	struct page		*rc_pages[RPCSVC_MAXPAGES];
 };
 
-/* Track DMA maps for this transport and context */
-static inline void svc_rdma_count_mappings(struct svcxprt_rdma *rdma,
-					   struct svc_rdma_op_ctxt *ctxt)
-{
-	ctxt->mapped_sges++;
-}
-
 /* svc_rdma_backchannel.c */
 extern int svc_rdma_handle_bc_reply(struct rpc_xprt *xprt,
 				    __be32 *rdma_resp,
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index b27b597..ee9ba07 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -302,41 +302,11 @@ static u32 svc_rdma_get_inv_rkey(__be32 *rdma_argp,
 	return be32_to_cpup(p);
 }
 
-/* ib_dma_map_page() is used here because svc_rdma_dma_unmap()
- * is used during completion to DMA-unmap this memory, and
- * it uses ib_dma_unmap_page() exclusively.
- */
-static int svc_rdma_dma_map_buf(struct svcxprt_rdma *rdma,
-				struct svc_rdma_op_ctxt *ctxt,
-				unsigned int sge_no,
-				unsigned char *base,
-				unsigned int len)
-{
-	unsigned long offset = (unsigned long)base & ~PAGE_MASK;
-	struct ib_device *dev = rdma->sc_cm_id->device;
-	dma_addr_t dma_addr;
-
-	dma_addr = ib_dma_map_page(dev, virt_to_page(base),
-				   offset, len, DMA_TO_DEVICE);
-	if (ib_dma_mapping_error(dev, dma_addr))
-		goto out_maperr;
-
-	ctxt->sge[sge_no].addr = dma_addr;
-	ctxt->sge[sge_no].length = len;
-	ctxt->sge[sge_no].lkey = rdma->sc_pd->local_dma_lkey;
-	svc_rdma_count_mappings(rdma, ctxt);
-	return 0;
-
-out_maperr:
-	pr_err("svcrdma: failed to map buffer\n");
-	return -EIO;
-}
-
 static int svc_rdma_dma_map_page(struct svcxprt_rdma *rdma,
 				 struct svc_rdma_op_ctxt *ctxt,
 				 unsigned int sge_no,
 				 struct page *page,
-				 unsigned int offset,
+				 unsigned long offset,
 				 unsigned int len)
 {
 	struct ib_device *dev = rdma->sc_cm_id->device;
@@ -349,7 +319,7 @@ static int svc_rdma_dma_map_page(struct svcxprt_rdma *rdma,
 	ctxt->sge[sge_no].addr = dma_addr;
 	ctxt->sge[sge_no].length = len;
 	ctxt->sge[sge_no].lkey = rdma->sc_pd->local_dma_lkey;
-	svc_rdma_count_mappings(rdma, ctxt);
+	ctxt->mapped_sges++;
 	return 0;
 
 out_maperr:
@@ -357,6 +327,19 @@ static int svc_rdma_dma_map_page(struct svcxprt_rdma *rdma,
 	return -EIO;
 }
 
+/* ib_dma_map_page() is used here because svc_rdma_dma_unmap()
+ * handles DMA-unmap and it uses ib_dma_unmap_page() exclusively.
+ */
+static int svc_rdma_dma_map_buf(struct svcxprt_rdma *rdma,
+				struct svc_rdma_op_ctxt *ctxt,
+				unsigned int sge_no,
+				unsigned char *base,
+				unsigned int len)
+{
+	return svc_rdma_dma_map_page(rdma, ctxt, sge_no, virt_to_page(base),
+				     offset_in_page(base), len);
+}
+
 /**
  * svc_rdma_map_reply_hdr - DMA map the transport header buffer
  * @rdma: controlling transport
@@ -389,7 +372,8 @@ static int svc_rdma_map_reply_msg(struct svcxprt_rdma *rdma,
 				  struct svc_rdma_op_ctxt *ctxt,
 				  struct xdr_buf *xdr, __be32 *wr_lst)
 {
-	unsigned int len, sge_no, remaining, page_off;
+	unsigned int len, sge_no, remaining;
+	unsigned long page_off;
 	struct page **ppages;
 	unsigned char *base;
 	u32 xdr_pad;


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v1 13/19] svcrdma: Clean up Send SGE accounting
  2018-05-07 19:26 [PATCH v1 00/19] NFS/RDMA server for-next Chuck Lever
                   ` (11 preceding siblings ...)
  2018-05-07 19:27 ` [PATCH v1 12/19] svcrdma: Refactor svc_rdma_dma_map_buf Chuck Lever
@ 2018-05-07 19:27 ` Chuck Lever
  2018-05-07 19:28 ` [PATCH v1 14/19] svcrdma: Introduce svc_rdma_send_ctxt Chuck Lever
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Chuck Lever @ 2018-05-07 19:27 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

Clean up: Since there's already a svc_rdma_op_ctxt being passed
around with the running count of mapped SGEs, drop unneeded
parameters to svc_rdma_post_send_wr().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc_rdma.h            |    2 +-
 net/sunrpc/xprtrdma/svc_rdma_backchannel.c |    2 +-
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c    |    2 +-
 net/sunrpc/xprtrdma/svc_rdma_sendto.c      |   17 ++++++++---------
 4 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 95530bc..8827b4e 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -188,7 +188,7 @@ extern int svc_rdma_map_reply_hdr(struct svcxprt_rdma *rdma,
 				  __be32 *rdma_resp, unsigned int len);
 extern int svc_rdma_post_send_wr(struct svcxprt_rdma *rdma,
 				 struct svc_rdma_op_ctxt *ctxt,
-				 int num_sge, u32 inv_rkey);
+				 u32 inv_rkey);
 extern int svc_rdma_sendto(struct svc_rqst *);
 
 /* svc_rdma_transport.c */
diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
index d501521..0b9ba9f 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
@@ -135,7 +135,7 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma,
 	 * the rq_buffer before all retransmits are complete.
 	 */
 	get_page(virt_to_page(rqst->rq_buffer));
-	ret = svc_rdma_post_send_wr(rdma, ctxt, 1, 0);
+	ret = svc_rdma_post_send_wr(rdma, ctxt, 0);
 	if (ret)
 		goto out_unmap;
 
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 0445e75..af6d2f3 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -639,7 +639,7 @@ static void svc_rdma_send_error(struct svcxprt_rdma *xprt,
 		return;
 	}
 
-	ret = svc_rdma_post_send_wr(xprt, ctxt, 1, 0);
+	ret = svc_rdma_post_send_wr(xprt, ctxt, 0);
 	if (ret) {
 		svc_rdma_unmap_dma(ctxt);
 		svc_rdma_put_context(ctxt, 1);
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index ee9ba07..4591017 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -365,8 +365,7 @@ int svc_rdma_map_reply_hdr(struct svcxprt_rdma *rdma,
 /* Load the xdr_buf into the ctxt's sge array, and DMA map each
  * element as it is added.
  *
- * Returns the number of sge elements loaded on success, or
- * a negative errno on failure.
+ * Returns zero on success, or a negative errno on failure.
  */
 static int svc_rdma_map_reply_msg(struct svcxprt_rdma *rdma,
 				  struct svc_rdma_op_ctxt *ctxt,
@@ -429,7 +428,7 @@ static int svc_rdma_map_reply_msg(struct svcxprt_rdma *rdma,
 			return ret;
 	}
 
-	return sge_no - 1;
+	return 0;
 }
 
 /* The svc_rqst and all resources it owns are released as soon as
@@ -453,7 +452,6 @@ static void svc_rdma_save_io_pages(struct svc_rqst *rqstp,
  * svc_rdma_post_send_wr - Set up and post one Send Work Request
  * @rdma: controlling transport
  * @ctxt: op_ctxt for transmitting the Send WR
- * @num_sge: number of SGEs to send
  * @inv_rkey: R_key argument to Send With Invalidate, or zero
  *
  * Returns:
@@ -463,18 +461,19 @@ static void svc_rdma_save_io_pages(struct svc_rqst *rqstp,
  *	%-ENOMEM if ib_post_send failed.
  */
 int svc_rdma_post_send_wr(struct svcxprt_rdma *rdma,
-			  struct svc_rdma_op_ctxt *ctxt, int num_sge,
+			  struct svc_rdma_op_ctxt *ctxt,
 			  u32 inv_rkey)
 {
 	struct ib_send_wr *send_wr = &ctxt->send_wr;
 
-	dprintk("svcrdma: posting Send WR with %u sge(s)\n", num_sge);
+	dprintk("svcrdma: posting Send WR with %u sge(s)\n",
+		ctxt->mapped_sges);
 
 	send_wr->next = NULL;
 	ctxt->cqe.done = svc_rdma_wc_send;
 	send_wr->wr_cqe = &ctxt->cqe;
 	send_wr->sg_list = ctxt->sge;
-	send_wr->num_sge = num_sge;
+	send_wr->num_sge = ctxt->mapped_sges;
 	send_wr->send_flags = IB_SEND_SIGNALED;
 	if (inv_rkey) {
 		send_wr->opcode = IB_WR_SEND_WITH_INV;
@@ -532,7 +531,7 @@ static int svc_rdma_send_reply_msg(struct svcxprt_rdma *rdma,
 	inv_rkey = 0;
 	if (rdma->sc_snd_w_inv)
 		inv_rkey = svc_rdma_get_inv_rkey(rdma_argp, wr_lst, rp_ch);
-	ret = svc_rdma_post_send_wr(rdma, ctxt, 1 + ret, inv_rkey);
+	ret = svc_rdma_post_send_wr(rdma, ctxt, inv_rkey);
 	if (ret)
 		goto err;
 
@@ -574,7 +573,7 @@ static int svc_rdma_send_error_msg(struct svcxprt_rdma *rdma,
 
 	svc_rdma_save_io_pages(rqstp, ctxt);
 
-	ret = svc_rdma_post_send_wr(rdma, ctxt, 1 + ret, 0);
+	ret = svc_rdma_post_send_wr(rdma, ctxt, 0);
 	if (ret)
 		goto err;
 


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v1 14/19] svcrdma: Introduce svc_rdma_send_ctxt
  2018-05-07 19:26 [PATCH v1 00/19] NFS/RDMA server for-next Chuck Lever
                   ` (12 preceding siblings ...)
  2018-05-07 19:27 ` [PATCH v1 13/19] svcrdma: Clean up Send SGE accounting Chuck Lever
@ 2018-05-07 19:28 ` Chuck Lever
  2018-05-07 19:28 ` [PATCH v1 15/19] svcrdma: Don't overrun the SGE array in svc_rdma_send_ctxt Chuck Lever
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Chuck Lever @ 2018-05-07 19:28 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

svc_rdma_op_ctxt's are pre-allocated and maintained on a per-xprt
free list. This eliminates the overhead of calling kmalloc / kfree,
both of which grab a globally shared lock that disables interrupts.
Introduce a replacement for svc_rdma_op_ctxt that is built
especially for the svcrdma Send path.

Subsequent patches will take advantage of this new structure by
allocating real resources which are then cached in these objects.
The allocations are freed when the transport is torn down.

I've renamed the structure so that static type checking can be used
to ensure that uses of op_ctxt and send_ctxt are not confused. As an
additional clean up, structure fields are renamed to conform with
kernel coding conventions.

Additional clean ups:
- Handle svc_rdma_send_ctxt_get allocation failure at each call
  site, rather than pre-allocating and hoping we guessed correctly
- All send_ctxt_put call-sites request page freeing, so remove
  the @free_pages argument
- All send_ctxt_put call-sites unmap SGEs, so fold that into
  svc_rdma_send_ctxt_put
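
For orientation while reading the diff below, here is the shape of
the free-list "get" path the new send_ctxt code follows (a simplified
sketch, not the patch code itself): take a context from a
per-transport list under a spinlock, and fall back to allocation only
when the list is empty.

	/* Sketch; names follow the fields this patch introduces
	 * (sc_send_lock, sc_send_ctxts, sc_list).
	 */
	spin_lock(&rdma->sc_send_lock);
	ctxt = list_first_entry_or_null(&rdma->sc_send_ctxts,
					struct svc_rdma_send_ctxt, sc_list);
	if (ctxt)
		list_del(&ctxt->sc_list);
	spin_unlock(&rdma->sc_send_lock);
	if (!ctxt)
		ctxt = svc_rdma_send_ctxt_alloc(rdma);	/* slow path */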

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc_rdma.h            |   35 +++-
 net/sunrpc/xprtrdma/svc_rdma_backchannel.c |   13 +
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c    |   13 +
 net/sunrpc/xprtrdma/svc_rdma_sendto.c      |  254 +++++++++++++++++++++++-----
 net/sunrpc/xprtrdma/svc_rdma_transport.c   |  205 -----------------------
 5 files changed, 254 insertions(+), 266 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 8827b4e..d3e2bb3 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -109,8 +109,8 @@ struct svcxprt_rdma {
 
 	struct ib_pd         *sc_pd;
 
-	spinlock_t	     sc_ctxt_lock;
-	struct list_head     sc_ctxts;
+	spinlock_t	     sc_send_lock;
+	struct list_head     sc_send_ctxts;
 	int		     sc_ctxt_used;
 	spinlock_t	     sc_rw_ctxt_lock;
 	struct list_head     sc_rw_ctxts;
@@ -158,6 +158,19 @@ struct svc_rdma_recv_ctxt {
 	struct page		*rc_pages[RPCSVC_MAXPAGES];
 };
 
+enum {
+	RPCRDMA_MAX_SGES	= 1 + (RPCRDMA_MAX_INLINE_THRESH / PAGE_SIZE),
+};
+
+struct svc_rdma_send_ctxt {
+	struct list_head	sc_list;
+	struct ib_send_wr	sc_send_wr;
+	struct ib_cqe		sc_cqe;
+	int			sc_page_count;
+	struct page		*sc_pages[RPCSVC_MAXPAGES];
+	struct ib_sge		sc_sges[RPCRDMA_MAX_SGES];
+};
+
 /* svc_rdma_backchannel.c */
 extern int svc_rdma_handle_bc_reply(struct rpc_xprt *xprt,
 				    __be32 *rdma_resp,
@@ -183,24 +196,22 @@ extern int svc_rdma_send_reply_chunk(struct svcxprt_rdma *rdma,
 				     struct xdr_buf *xdr);
 
 /* svc_rdma_sendto.c */
+extern void svc_rdma_send_ctxts_destroy(struct svcxprt_rdma *rdma);
+extern struct svc_rdma_send_ctxt *
+		svc_rdma_send_ctxt_get(struct svcxprt_rdma *rdma);
+extern void svc_rdma_send_ctxt_put(struct svcxprt_rdma *rdma,
+				   struct svc_rdma_send_ctxt *ctxt);
+extern int svc_rdma_send(struct svcxprt_rdma *rdma, struct ib_send_wr *wr);
 extern int svc_rdma_map_reply_hdr(struct svcxprt_rdma *rdma,
-				  struct svc_rdma_op_ctxt *ctxt,
+				  struct svc_rdma_send_ctxt *ctxt,
 				  __be32 *rdma_resp, unsigned int len);
 extern int svc_rdma_post_send_wr(struct svcxprt_rdma *rdma,
-				 struct svc_rdma_op_ctxt *ctxt,
+				 struct svc_rdma_send_ctxt *ctxt,
 				 u32 inv_rkey);
 extern int svc_rdma_sendto(struct svc_rqst *);
 
 /* svc_rdma_transport.c */
-extern void svc_rdma_wc_send(struct ib_cq *, struct ib_wc *);
-extern void svc_rdma_wc_reg(struct ib_cq *, struct ib_wc *);
-extern void svc_rdma_wc_read(struct ib_cq *, struct ib_wc *);
-extern void svc_rdma_wc_inv(struct ib_cq *, struct ib_wc *);
-extern int svc_rdma_send(struct svcxprt_rdma *, struct ib_send_wr *);
 extern int svc_rdma_create_listen(struct svc_serv *, int, struct sockaddr *);
-extern struct svc_rdma_op_ctxt *svc_rdma_get_context(struct svcxprt_rdma *);
-extern void svc_rdma_put_context(struct svc_rdma_op_ctxt *, int);
-extern void svc_rdma_unmap_dma(struct svc_rdma_op_ctxt *ctxt);
 extern void svc_sq_reap(struct svcxprt_rdma *);
 extern void svc_rq_reap(struct svcxprt_rdma *);
 extern void svc_rdma_prep_reply_hdr(struct svc_rqst *);
diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
index 0b9ba9f..95e3351 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
- * Copyright (c) 2015 Oracle.  All rights reserved.
+ * Copyright (c) 2015-2018 Oracle.  All rights reserved.
  *
  * Support for backward direction RPCs on RPC/RDMA (server-side).
  */
@@ -117,10 +117,14 @@ int svc_rdma_handle_bc_reply(struct rpc_xprt *xprt, __be32 *rdma_resp,
 static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma,
 			      struct rpc_rqst *rqst)
 {
-	struct svc_rdma_op_ctxt *ctxt;
+	struct svc_rdma_send_ctxt *ctxt;
 	int ret;
 
-	ctxt = svc_rdma_get_context(rdma);
+	ctxt = svc_rdma_send_ctxt_get(rdma);
+	if (!ctxt) {
+		ret = -ENOMEM;
+		goto out_err;
+	}
 
 	/* rpcrdma_bc_send_request builds the transport header and
 	 * the backchannel RPC message in the same buffer. Thus only
@@ -144,8 +148,7 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma,
 	return ret;
 
 out_unmap:
-	svc_rdma_unmap_dma(ctxt);
-	svc_rdma_put_context(ctxt, 1);
+	svc_rdma_send_ctxt_put(rdma, ctxt);
 	ret = -EIO;
 	goto out_err;
 }
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index af6d2f3..2d1e0db 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -601,7 +601,7 @@ static void rdma_read_complete(struct svc_rqst *rqstp,
 static void svc_rdma_send_error(struct svcxprt_rdma *xprt,
 				__be32 *rdma_argp, int status)
 {
-	struct svc_rdma_op_ctxt *ctxt;
+	struct svc_rdma_send_ctxt *ctxt;
 	__be32 *p, *err_msgp;
 	unsigned int length;
 	struct page *page;
@@ -631,7 +631,10 @@ static void svc_rdma_send_error(struct svcxprt_rdma *xprt,
 	length = (unsigned long)p - (unsigned long)err_msgp;
 
 	/* Map transport header; no RPC message payload */
-	ctxt = svc_rdma_get_context(xprt);
+	ctxt = svc_rdma_send_ctxt_get(xprt);
+	if (!ctxt)
+		return;
+
 	ret = svc_rdma_map_reply_hdr(xprt, ctxt, err_msgp, length);
 	if (ret) {
 		dprintk("svcrdma: Error %d mapping send for protocol error\n",
@@ -640,10 +643,8 @@ static void svc_rdma_send_error(struct svcxprt_rdma *xprt,
 	}
 
 	ret = svc_rdma_post_send_wr(xprt, ctxt, 0);
-	if (ret) {
-		svc_rdma_unmap_dma(ctxt);
-		svc_rdma_put_context(ctxt, 1);
-	}
+	if (ret)
+		svc_rdma_send_ctxt_put(xprt, ctxt);
 }
 
 /* By convention, backchannel calls arrive via rdma_msg type
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index 4591017..b286d6a 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -75,11 +75,11 @@
  * DMA-unmap the pages under I/O for that Write segment. The Write
  * completion handler does not release any pages.
  *
- * When the Send WR is constructed, it also gets its own svc_rdma_op_ctxt.
+ * When the Send WR is constructed, it also gets its own svc_rdma_send_ctxt.
  * The ownership of all of the Reply's pages are transferred into that
  * ctxt, the Send WR is posted, and sendto returns.
  *
- * The svc_rdma_op_ctxt is presented when the Send WR completes. The
+ * The svc_rdma_send_ctxt is presented when the Send WR completes. The
  * Send completion handler finally releases the Reply's pages.
  *
  * This mechanism also assumes that completions on the transport's Send
@@ -114,6 +114,184 @@
 
 #define RPCDBG_FACILITY	RPCDBG_SVCXPRT
 
+static void svc_rdma_wc_send(struct ib_cq *cq, struct ib_wc *wc);
+
+static inline struct svc_rdma_send_ctxt *
+svc_rdma_next_send_ctxt(struct list_head *list)
+{
+	return list_first_entry_or_null(list, struct svc_rdma_send_ctxt,
+					sc_list);
+}
+
+static struct svc_rdma_send_ctxt *
+svc_rdma_send_ctxt_alloc(struct svcxprt_rdma *rdma)
+{
+	struct svc_rdma_send_ctxt *ctxt;
+	int i;
+
+	ctxt = kmalloc(sizeof(*ctxt), GFP_KERNEL);
+	if (!ctxt)
+		return NULL;
+
+	ctxt->sc_cqe.done = svc_rdma_wc_send;
+	ctxt->sc_send_wr.next = NULL;
+	ctxt->sc_send_wr.wr_cqe = &ctxt->sc_cqe;
+	ctxt->sc_send_wr.sg_list = ctxt->sc_sges;
+	ctxt->sc_send_wr.send_flags = IB_SEND_SIGNALED;
+	for (i = 0; i < ARRAY_SIZE(ctxt->sc_sges); i++)
+		ctxt->sc_sges[i].lkey = rdma->sc_pd->local_dma_lkey;
+	return ctxt;
+}
+
+/**
+ * svc_rdma_send_ctxts_destroy - Release all send_ctxt's for an xprt
+ * @rdma: svcxprt_rdma being torn down
+ *
+ */
+void svc_rdma_send_ctxts_destroy(struct svcxprt_rdma *rdma)
+{
+	struct svc_rdma_send_ctxt *ctxt;
+
+	while ((ctxt = svc_rdma_next_send_ctxt(&rdma->sc_send_ctxts))) {
+		list_del(&ctxt->sc_list);
+		kfree(ctxt);
+	}
+}
+
+/**
+ * svc_rdma_send_ctxt_get - Get a free send_ctxt
+ * @rdma: controlling svcxprt_rdma
+ *
+ * Returns a ready-to-use send_ctxt, or NULL if none are
+ * available and a fresh one cannot be allocated.
+ */
+struct svc_rdma_send_ctxt *svc_rdma_send_ctxt_get(struct svcxprt_rdma *rdma)
+{
+	struct svc_rdma_send_ctxt *ctxt;
+
+	spin_lock(&rdma->sc_send_lock);
+	ctxt = svc_rdma_next_send_ctxt(&rdma->sc_send_ctxts);
+	if (!ctxt)
+		goto out_empty;
+	list_del(&ctxt->sc_list);
+	spin_unlock(&rdma->sc_send_lock);
+
+out:
+	ctxt->sc_send_wr.num_sge = 0;
+	ctxt->sc_page_count = 0;
+	return ctxt;
+
+out_empty:
+	spin_unlock(&rdma->sc_send_lock);
+	ctxt = svc_rdma_send_ctxt_alloc(rdma);
+	if (!ctxt)
+		return NULL;
+	goto out;
+}
+
+/**
+ * svc_rdma_send_ctxt_put - Return send_ctxt to free list
+ * @rdma: controlling svcxprt_rdma
+ * @ctxt: object to return to the free list
+ *
+ * Pages left in sc_pages are DMA unmapped and released.
+ */
+void svc_rdma_send_ctxt_put(struct svcxprt_rdma *rdma,
+			    struct svc_rdma_send_ctxt *ctxt)
+{
+	struct ib_device *device = rdma->sc_cm_id->device;
+	unsigned int i;
+
+	for (i = 0; i < ctxt->sc_send_wr.num_sge; i++)
+		ib_dma_unmap_page(device,
+				  ctxt->sc_sges[i].addr,
+				  ctxt->sc_sges[i].length,
+				  DMA_TO_DEVICE);
+
+	for (i = 0; i < ctxt->sc_page_count; ++i)
+		put_page(ctxt->sc_pages[i]);
+
+	spin_lock(&rdma->sc_send_lock);
+	list_add(&ctxt->sc_list, &rdma->sc_send_ctxts);
+	spin_unlock(&rdma->sc_send_lock);
+}
+
+/**
+ * svc_rdma_wc_send - Invoked by RDMA provider for each polled Send WC
+ * @cq: Completion Queue context
+ * @wc: Work Completion object
+ *
+ * NB: The svc_xprt/svcxprt_rdma is pinned whenever it's possible that
+ * the Send completion handler could be running.
+ */
+static void svc_rdma_wc_send(struct ib_cq *cq, struct ib_wc *wc)
+{
+	struct svcxprt_rdma *rdma = cq->cq_context;
+	struct ib_cqe *cqe = wc->wr_cqe;
+	struct svc_rdma_send_ctxt *ctxt;
+
+	trace_svcrdma_wc_send(wc);
+
+	atomic_inc(&rdma->sc_sq_avail);
+	wake_up(&rdma->sc_send_wait);
+
+	ctxt = container_of(cqe, struct svc_rdma_send_ctxt, sc_cqe);
+	svc_rdma_send_ctxt_put(rdma, ctxt);
+
+	if (unlikely(wc->status != IB_WC_SUCCESS)) {
+		set_bit(XPT_CLOSE, &rdma->sc_xprt.xpt_flags);
+		svc_xprt_enqueue(&rdma->sc_xprt);
+		if (wc->status != IB_WC_WR_FLUSH_ERR)
+			pr_err("svcrdma: Send: %s (%u/0x%x)\n",
+			       ib_wc_status_msg(wc->status),
+			       wc->status, wc->vendor_err);
+	}
+
+	svc_xprt_put(&rdma->sc_xprt);
+}
+
+int svc_rdma_send(struct svcxprt_rdma *rdma, struct ib_send_wr *wr)
+{
+	struct ib_send_wr *bad_wr, *n_wr;
+	int wr_count;
+	int i;
+	int ret;
+
+	wr_count = 1;
+	for (n_wr = wr->next; n_wr; n_wr = n_wr->next)
+		wr_count++;
+
+	/* If the SQ is full, wait until an SQ entry is available */
+	while (1) {
+		if ((atomic_sub_return(wr_count, &rdma->sc_sq_avail) < 0)) {
+			atomic_inc(&rdma_stat_sq_starve);
+			trace_svcrdma_sq_full(rdma);
+			atomic_add(wr_count, &rdma->sc_sq_avail);
+			wait_event(rdma->sc_send_wait,
+				   atomic_read(&rdma->sc_sq_avail) > wr_count);
+			if (test_bit(XPT_CLOSE, &rdma->sc_xprt.xpt_flags))
+				return -ENOTCONN;
+			trace_svcrdma_sq_retry(rdma);
+			continue;
+		}
+		/* Take a transport ref for each WR posted */
+		for (i = 0; i < wr_count; i++)
+			svc_xprt_get(&rdma->sc_xprt);
+
+		/* Bump used SQ WR count and post */
+		ret = ib_post_send(rdma->sc_qp, wr, &bad_wr);
+		trace_svcrdma_post_send(wr, ret);
+		if (ret) {
+			set_bit(XPT_CLOSE, &rdma->sc_xprt.xpt_flags);
+			for (i = 0; i < wr_count; i++)
+				svc_xprt_put(&rdma->sc_xprt);
+			wake_up(&rdma->sc_send_wait);
+		}
+		break;
+	}
+	return ret;
+}
+
 static u32 xdr_padsize(u32 len)
 {
 	return (len & 3) ? (4 - (len & 3)) : 0;
@@ -303,7 +481,7 @@ static u32 svc_rdma_get_inv_rkey(__be32 *rdma_argp,
 }
 
 static int svc_rdma_dma_map_page(struct svcxprt_rdma *rdma,
-				 struct svc_rdma_op_ctxt *ctxt,
+				 struct svc_rdma_send_ctxt *ctxt,
 				 unsigned int sge_no,
 				 struct page *page,
 				 unsigned long offset,
@@ -316,10 +494,9 @@ static int svc_rdma_dma_map_page(struct svcxprt_rdma *rdma,
 	if (ib_dma_mapping_error(dev, dma_addr))
 		goto out_maperr;
 
-	ctxt->sge[sge_no].addr = dma_addr;
-	ctxt->sge[sge_no].length = len;
-	ctxt->sge[sge_no].lkey = rdma->sc_pd->local_dma_lkey;
-	ctxt->mapped_sges++;
+	ctxt->sc_sges[sge_no].addr = dma_addr;
+	ctxt->sc_sges[sge_no].length = len;
+	ctxt->sc_send_wr.num_sge++;
 	return 0;
 
 out_maperr:
@@ -331,7 +508,7 @@ static int svc_rdma_dma_map_page(struct svcxprt_rdma *rdma,
  * handles DMA-unmap and it uses ib_dma_unmap_page() exclusively.
  */
 static int svc_rdma_dma_map_buf(struct svcxprt_rdma *rdma,
-				struct svc_rdma_op_ctxt *ctxt,
+				struct svc_rdma_send_ctxt *ctxt,
 				unsigned int sge_no,
 				unsigned char *base,
 				unsigned int len)
@@ -352,14 +529,13 @@ static int svc_rdma_dma_map_buf(struct svcxprt_rdma *rdma,
  *	%-EIO if DMA mapping failed.
  */
 int svc_rdma_map_reply_hdr(struct svcxprt_rdma *rdma,
-			   struct svc_rdma_op_ctxt *ctxt,
+			   struct svc_rdma_send_ctxt *ctxt,
 			   __be32 *rdma_resp,
 			   unsigned int len)
 {
-	ctxt->direction = DMA_TO_DEVICE;
-	ctxt->pages[0] = virt_to_page(rdma_resp);
-	ctxt->count = 1;
-	return svc_rdma_dma_map_page(rdma, ctxt, 0, ctxt->pages[0], 0, len);
+	ctxt->sc_pages[0] = virt_to_page(rdma_resp);
+	ctxt->sc_page_count++;
+	return svc_rdma_dma_map_page(rdma, ctxt, 0, ctxt->sc_pages[0], 0, len);
 }
 
 /* Load the xdr_buf into the ctxt's sge array, and DMA map each
@@ -368,7 +544,7 @@ int svc_rdma_map_reply_hdr(struct svcxprt_rdma *rdma,
  * Returns zero on success, or a negative errno on failure.
  */
 static int svc_rdma_map_reply_msg(struct svcxprt_rdma *rdma,
-				  struct svc_rdma_op_ctxt *ctxt,
+				  struct svc_rdma_send_ctxt *ctxt,
 				  struct xdr_buf *xdr, __be32 *wr_lst)
 {
 	unsigned int len, sge_no, remaining;
@@ -436,13 +612,13 @@ static int svc_rdma_map_reply_msg(struct svcxprt_rdma *rdma,
  * so they are released by the Send completion handler.
  */
 static void svc_rdma_save_io_pages(struct svc_rqst *rqstp,
-				   struct svc_rdma_op_ctxt *ctxt)
+				   struct svc_rdma_send_ctxt *ctxt)
 {
 	int i, pages = rqstp->rq_next_page - rqstp->rq_respages;
 
-	ctxt->count += pages;
+	ctxt->sc_page_count += pages;
 	for (i = 0; i < pages; i++) {
-		ctxt->pages[i + 1] = rqstp->rq_respages[i];
+		ctxt->sc_pages[i + 1] = rqstp->rq_respages[i];
 		rqstp->rq_respages[i] = NULL;
 	}
 	rqstp->rq_next_page = rqstp->rq_respages + 1;
@@ -461,37 +637,29 @@ static void svc_rdma_save_io_pages(struct svc_rqst *rqstp,
  *	%-ENOMEM if ib_post_send failed.
  */
 int svc_rdma_post_send_wr(struct svcxprt_rdma *rdma,
-			  struct svc_rdma_op_ctxt *ctxt,
+			  struct svc_rdma_send_ctxt *ctxt,
 			  u32 inv_rkey)
 {
-	struct ib_send_wr *send_wr = &ctxt->send_wr;
-
 	dprintk("svcrdma: posting Send WR with %u sge(s)\n",
-		ctxt->mapped_sges);
-
-	send_wr->next = NULL;
-	ctxt->cqe.done = svc_rdma_wc_send;
-	send_wr->wr_cqe = &ctxt->cqe;
-	send_wr->sg_list = ctxt->sge;
-	send_wr->num_sge = ctxt->mapped_sges;
-	send_wr->send_flags = IB_SEND_SIGNALED;
+		ctxt->sc_send_wr.num_sge);
+
 	if (inv_rkey) {
-		send_wr->opcode = IB_WR_SEND_WITH_INV;
-		send_wr->ex.invalidate_rkey = inv_rkey;
+		ctxt->sc_send_wr.opcode = IB_WR_SEND_WITH_INV;
+		ctxt->sc_send_wr.ex.invalidate_rkey = inv_rkey;
 	} else {
-		send_wr->opcode = IB_WR_SEND;
+		ctxt->sc_send_wr.opcode = IB_WR_SEND;
 	}
 
-	return svc_rdma_send(rdma, send_wr);
+	return svc_rdma_send(rdma, &ctxt->sc_send_wr);
 }
 
 /* Prepare the portion of the RPC Reply that will be transmitted
  * via RDMA Send. The RPC-over-RDMA transport header is prepared
- * in sge[0], and the RPC xdr_buf is prepared in following sges.
+ * in sc_sges[0], and the RPC xdr_buf is prepared in following sges.
  *
  * Depending on whether a Write list or Reply chunk is present,
  * the server may send all, a portion of, or none of the xdr_buf.
- * In the latter case, only the transport header (sge[0]) is
+ * In the latter case, only the transport header (sc_sges[0]) is
  * transmitted.
  *
  * RDMA Send is the last step of transmitting an RPC reply. Pages
@@ -508,11 +676,13 @@ static int svc_rdma_send_reply_msg(struct svcxprt_rdma *rdma,
 				   struct svc_rqst *rqstp,
 				   __be32 *wr_lst, __be32 *rp_ch)
 {
-	struct svc_rdma_op_ctxt *ctxt;
+	struct svc_rdma_send_ctxt *ctxt;
 	u32 inv_rkey;
 	int ret;
 
-	ctxt = svc_rdma_get_context(rdma);
+	ctxt = svc_rdma_send_ctxt_get(rdma);
+	if (!ctxt)
+		return -ENOMEM;
 
 	ret = svc_rdma_map_reply_hdr(rdma, ctxt, rdma_resp,
 				     svc_rdma_reply_hdr_len(rdma_resp));
@@ -538,8 +708,7 @@ static int svc_rdma_send_reply_msg(struct svcxprt_rdma *rdma,
 	return 0;
 
 err:
-	svc_rdma_unmap_dma(ctxt);
-	svc_rdma_put_context(ctxt, 1);
+	svc_rdma_send_ctxt_put(rdma, ctxt);
 	return ret;
 }
 
@@ -553,11 +722,13 @@ static int svc_rdma_send_reply_msg(struct svcxprt_rdma *rdma,
 static int svc_rdma_send_error_msg(struct svcxprt_rdma *rdma,
 				   __be32 *rdma_resp, struct svc_rqst *rqstp)
 {
-	struct svc_rdma_op_ctxt *ctxt;
+	struct svc_rdma_send_ctxt *ctxt;
 	__be32 *p;
 	int ret;
 
-	ctxt = svc_rdma_get_context(rdma);
+	ctxt = svc_rdma_send_ctxt_get(rdma);
+	if (!ctxt)
+		return -ENOMEM;
 
 	/* Replace the original transport header with an
 	 * RDMA_ERROR response. XID etc are preserved.
@@ -580,8 +751,7 @@ static int svc_rdma_send_error_msg(struct svcxprt_rdma *rdma,
 	return 0;
 
 err:
-	svc_rdma_unmap_dma(ctxt);
-	svc_rdma_put_context(ctxt, 1);
+	svc_rdma_send_ctxt_put(rdma, ctxt);
 	return ret;
 }
 
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 333c432..e31c164 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
 /*
+ * Copyright (c) 2015-2018 Oracle. All rights reserved.
  * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
  * Copyright (c) 2005-2007 Network Appliance, Inc. All rights reserved.
  *
@@ -157,114 +158,6 @@ static void svc_rdma_bc_free(struct svc_xprt *xprt)
 }
 #endif	/* CONFIG_SUNRPC_BACKCHANNEL */
 
-static struct svc_rdma_op_ctxt *alloc_ctxt(struct svcxprt_rdma *xprt,
-					   gfp_t flags)
-{
-	struct svc_rdma_op_ctxt *ctxt;
-
-	ctxt = kmalloc(sizeof(*ctxt), flags);
-	if (ctxt) {
-		ctxt->xprt = xprt;
-		INIT_LIST_HEAD(&ctxt->list);
-	}
-	return ctxt;
-}
-
-static bool svc_rdma_prealloc_ctxts(struct svcxprt_rdma *xprt)
-{
-	unsigned int i;
-
-	i = xprt->sc_sq_depth;
-	while (i--) {
-		struct svc_rdma_op_ctxt *ctxt;
-
-		ctxt = alloc_ctxt(xprt, GFP_KERNEL);
-		if (!ctxt) {
-			dprintk("svcrdma: No memory for RDMA ctxt\n");
-			return false;
-		}
-		list_add(&ctxt->list, &xprt->sc_ctxts);
-	}
-	return true;
-}
-
-struct svc_rdma_op_ctxt *svc_rdma_get_context(struct svcxprt_rdma *xprt)
-{
-	struct svc_rdma_op_ctxt *ctxt = NULL;
-
-	spin_lock(&xprt->sc_ctxt_lock);
-	xprt->sc_ctxt_used++;
-	if (list_empty(&xprt->sc_ctxts))
-		goto out_empty;
-
-	ctxt = list_first_entry(&xprt->sc_ctxts,
-				struct svc_rdma_op_ctxt, list);
-	list_del(&ctxt->list);
-	spin_unlock(&xprt->sc_ctxt_lock);
-
-out:
-	ctxt->count = 0;
-	ctxt->mapped_sges = 0;
-	return ctxt;
-
-out_empty:
-	/* Either pre-allocation missed the mark, or send
-	 * queue accounting is broken.
-	 */
-	spin_unlock(&xprt->sc_ctxt_lock);
-
-	ctxt = alloc_ctxt(xprt, GFP_NOIO);
-	if (ctxt)
-		goto out;
-
-	spin_lock(&xprt->sc_ctxt_lock);
-	xprt->sc_ctxt_used--;
-	spin_unlock(&xprt->sc_ctxt_lock);
-	WARN_ONCE(1, "svcrdma: empty RDMA ctxt list?\n");
-	return NULL;
-}
-
-void svc_rdma_unmap_dma(struct svc_rdma_op_ctxt *ctxt)
-{
-	struct svcxprt_rdma *xprt = ctxt->xprt;
-	struct ib_device *device = xprt->sc_cm_id->device;
-	unsigned int i;
-
-	for (i = 0; i < ctxt->mapped_sges; i++)
-		ib_dma_unmap_page(device,
-				  ctxt->sge[i].addr,
-				  ctxt->sge[i].length,
-				  ctxt->direction);
-	ctxt->mapped_sges = 0;
-}
-
-void svc_rdma_put_context(struct svc_rdma_op_ctxt *ctxt, int free_pages)
-{
-	struct svcxprt_rdma *xprt = ctxt->xprt;
-	int i;
-
-	if (free_pages)
-		for (i = 0; i < ctxt->count; i++)
-			put_page(ctxt->pages[i]);
-
-	spin_lock(&xprt->sc_ctxt_lock);
-	xprt->sc_ctxt_used--;
-	list_add(&ctxt->list, &xprt->sc_ctxts);
-	spin_unlock(&xprt->sc_ctxt_lock);
-}
-
-static void svc_rdma_destroy_ctxts(struct svcxprt_rdma *xprt)
-{
-	while (!list_empty(&xprt->sc_ctxts)) {
-		struct svc_rdma_op_ctxt *ctxt;
-
-		ctxt = list_first_entry(&xprt->sc_ctxts,
-					struct svc_rdma_op_ctxt, list);
-		list_del(&ctxt->list);
-		kfree(ctxt);
-	}
-}
-
 /* QP event handler */
 static void qp_event_handler(struct ib_event *event, void *context)
 {
@@ -292,39 +185,6 @@ static void qp_event_handler(struct ib_event *event, void *context)
 	}
 }
 
-/**
- * svc_rdma_wc_send - Invoked by RDMA provider for each polled Send WC
- * @cq:        completion queue
- * @wc:        completed WR
- *
- */
-void svc_rdma_wc_send(struct ib_cq *cq, struct ib_wc *wc)
-{
-	struct svcxprt_rdma *xprt = cq->cq_context;
-	struct ib_cqe *cqe = wc->wr_cqe;
-	struct svc_rdma_op_ctxt *ctxt;
-
-	trace_svcrdma_wc_send(wc);
-
-	atomic_inc(&xprt->sc_sq_avail);
-	wake_up(&xprt->sc_send_wait);
-
-	ctxt = container_of(cqe, struct svc_rdma_op_ctxt, cqe);
-	svc_rdma_unmap_dma(ctxt);
-	svc_rdma_put_context(ctxt, 1);
-
-	if (unlikely(wc->status != IB_WC_SUCCESS)) {
-		set_bit(XPT_CLOSE, &xprt->sc_xprt.xpt_flags);
-		svc_xprt_enqueue(&xprt->sc_xprt);
-		if (wc->status != IB_WC_WR_FLUSH_ERR)
-			pr_err("svcrdma: Send: %s (%u/0x%x)\n",
-			       ib_wc_status_msg(wc->status),
-			       wc->status, wc->vendor_err);
-	}
-
-	svc_xprt_put(&xprt->sc_xprt);
-}
-
 static struct svcxprt_rdma *svc_rdma_create_xprt(struct svc_serv *serv,
 						 struct net *net)
 {
@@ -338,14 +198,14 @@ static struct svcxprt_rdma *svc_rdma_create_xprt(struct svc_serv *serv,
 	INIT_LIST_HEAD(&cma_xprt->sc_accept_q);
 	INIT_LIST_HEAD(&cma_xprt->sc_rq_dto_q);
 	INIT_LIST_HEAD(&cma_xprt->sc_read_complete_q);
-	INIT_LIST_HEAD(&cma_xprt->sc_ctxts);
+	INIT_LIST_HEAD(&cma_xprt->sc_send_ctxts);
 	INIT_LIST_HEAD(&cma_xprt->sc_recv_ctxts);
 	INIT_LIST_HEAD(&cma_xprt->sc_rw_ctxts);
 	init_waitqueue_head(&cma_xprt->sc_send_wait);
 
 	spin_lock_init(&cma_xprt->sc_lock);
 	spin_lock_init(&cma_xprt->sc_rq_dto_lock);
-	spin_lock_init(&cma_xprt->sc_ctxt_lock);
+	spin_lock_init(&cma_xprt->sc_send_lock);
 	spin_lock_init(&cma_xprt->sc_recv_lock);
 	spin_lock_init(&cma_xprt->sc_rw_ctxt_lock);
 
@@ -641,9 +501,6 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	}
 	atomic_set(&newxprt->sc_sq_avail, newxprt->sc_sq_depth);
 
-	if (!svc_rdma_prealloc_ctxts(newxprt))
-		goto errout;
-
 	newxprt->sc_pd = ib_alloc_pd(dev, 0);
 	if (IS_ERR(newxprt->sc_pd)) {
 		dprintk("svcrdma: error creating PD for connect request\n");
@@ -795,11 +652,6 @@ static void __svc_rdma_free(struct work_struct *work)
 
 	svc_rdma_flush_recv_queues(rdma);
 
-	/* Warn if we leaked a resource or under-referenced */
-	if (rdma->sc_ctxt_used != 0)
-		pr_err("svcrdma: ctxt still in use? (%d)\n",
-		       rdma->sc_ctxt_used);
-
 	/* Final put of backchannel client transport */
 	if (xprt->xpt_bc_xprt) {
 		xprt_put(xprt->xpt_bc_xprt);
@@ -807,7 +659,7 @@ static void __svc_rdma_free(struct work_struct *work)
 	}
 
 	svc_rdma_destroy_rw_ctxts(rdma);
-	svc_rdma_destroy_ctxts(rdma);
+	svc_rdma_send_ctxts_destroy(rdma);
 	svc_rdma_recv_ctxts_destroy(rdma);
 
 	/* Destroy the QP if present (not a listener) */
@@ -861,52 +713,3 @@ static void svc_rdma_secure_port(struct svc_rqst *rqstp)
 static void svc_rdma_kill_temp_xprt(struct svc_xprt *xprt)
 {
 }
-
-int svc_rdma_send(struct svcxprt_rdma *xprt, struct ib_send_wr *wr)
-{
-	struct ib_send_wr *bad_wr, *n_wr;
-	int wr_count;
-	int i;
-	int ret;
-
-	if (test_bit(XPT_CLOSE, &xprt->sc_xprt.xpt_flags))
-		return -ENOTCONN;
-
-	wr_count = 1;
-	for (n_wr = wr->next; n_wr; n_wr = n_wr->next)
-		wr_count++;
-
-	/* If the SQ is full, wait until an SQ entry is available */
-	while (1) {
-		if ((atomic_sub_return(wr_count, &xprt->sc_sq_avail) < 0)) {
-			atomic_inc(&rdma_stat_sq_starve);
-			trace_svcrdma_sq_full(xprt);
-			atomic_add(wr_count, &xprt->sc_sq_avail);
-			wait_event(xprt->sc_send_wait,
-				   atomic_read(&xprt->sc_sq_avail) > wr_count);
-			if (test_bit(XPT_CLOSE, &xprt->sc_xprt.xpt_flags))
-				return -ENOTCONN;
-			trace_svcrdma_sq_retry(xprt);
-			continue;
-		}
-		/* Take a transport ref for each WR posted */
-		for (i = 0; i < wr_count; i++)
-			svc_xprt_get(&xprt->sc_xprt);
-
-		/* Bump used SQ WR count and post */
-		ret = ib_post_send(xprt->sc_qp, wr, &bad_wr);
-		trace_svcrdma_post_send(wr, ret);
-		if (ret) {
-			set_bit(XPT_CLOSE, &xprt->sc_xprt.xpt_flags);
-			for (i = 0; i < wr_count; i ++)
-				svc_xprt_put(&xprt->sc_xprt);
-			dprintk("svcrdma: failed to post SQ WR rc=%d\n", ret);
-			dprintk("    sc_sq_avail=%d, sc_sq_depth=%d\n",
-				atomic_read(&xprt->sc_sq_avail),
-				xprt->sc_sq_depth);
-			wake_up(&xprt->sc_send_wait);
-		}
-		break;
-	}
-	return ret;
-}


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v1 15/19] svcrdma: Don't overrun the SGE array in svc_rdma_send_ctxt
  2018-05-07 19:26 [PATCH v1 00/19] NFS/RDMA server for-next Chuck Lever
                   ` (13 preceding siblings ...)
  2018-05-07 19:28 ` [PATCH v1 14/19] svcrdma: Introduce svc_rdma_send_ctxt Chuck Lever
@ 2018-05-07 19:28 ` Chuck Lever
  2018-05-07 19:28 ` [PATCH v1 16/19] svcrdma: Remove post_send_wr Chuck Lever
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Chuck Lever @ 2018-05-07 19:28 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

Receive buffers are always the same size, but each Send WR has a
variable number of SGEs, based on the contents of the xdr_buf being
sent.

While assembling a Send WR, keep track of the number of SGEs so that
we don't exceed the device's maximum, or walk off the end of the
Send SGE array.

For now the Send path just fails if it exceeds the maximum.

The current logic in svc_rdma_accept bases the maximum number of
Send SGEs on the largest NFS request that can be sent or received.
In the transport layer, the limit is actually based on the
capabilities of the underlying device, not on properties of the
Upper Layer Protocol.
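
For illustration (not part of the patch): once the Send SGE limit is
taken from the device, the send_ctxt's SGE array can be sized at
allocation time instead of with a compile-time constant. A sketch of
that sizing, assuming sc_max_send_sges has already been set from
dev->attrs.max_sge:

	/* Sketch: one allocation covers the fixed part of the ctxt
	 * plus a device-sized trailing SGE array.
	 */
	size_t size = sizeof(struct svc_rdma_send_ctxt) +
		      rdma->sc_max_send_sges * sizeof(struct ib_sge);
	ctxt = kmalloc(size, GFP_KERNEL);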

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc_rdma.h          |    9 +++-----
 net/sunrpc/xprtrdma/svc_rdma_sendto.c    |   36 ++++++++++++++++++------------
 net/sunrpc/xprtrdma/svc_rdma_transport.c |   13 ++++++++---
 3 files changed, 33 insertions(+), 25 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index d3e2bb3..bfb8824 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -96,7 +96,7 @@ struct svcxprt_rdma {
 	struct rdma_cm_id    *sc_cm_id;		/* RDMA connection id */
 	struct list_head     sc_accept_q;	/* Conn. waiting accept */
 	int		     sc_ord;		/* RDMA read limit */
-	int                  sc_max_sge;
+	int                  sc_max_send_sges;
 	bool		     sc_snd_w_inv;	/* OK to use Send With Invalidate */
 
 	atomic_t             sc_sq_avail;	/* SQEs ready to be consumed */
@@ -158,17 +158,14 @@ struct svc_rdma_recv_ctxt {
 	struct page		*rc_pages[RPCSVC_MAXPAGES];
 };
 
-enum {
-	RPCRDMA_MAX_SGES	= 1 + (RPCRDMA_MAX_INLINE_THRESH / PAGE_SIZE),
-};
-
 struct svc_rdma_send_ctxt {
 	struct list_head	sc_list;
 	struct ib_send_wr	sc_send_wr;
 	struct ib_cqe		sc_cqe;
 	int			sc_page_count;
+	int			sc_cur_sge_no;
 	struct page		*sc_pages[RPCSVC_MAXPAGES];
-	struct ib_sge		sc_sges[RPCRDMA_MAX_SGES];
+	struct ib_sge		sc_sges[];
 };
 
 /* svc_rdma_backchannel.c */
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index b286d6a..53d8db6 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -127,9 +127,12 @@
 svc_rdma_send_ctxt_alloc(struct svcxprt_rdma *rdma)
 {
 	struct svc_rdma_send_ctxt *ctxt;
+	size_t size;
 	int i;
 
-	ctxt = kmalloc(sizeof(*ctxt), GFP_KERNEL);
+	size = sizeof(*ctxt);
+	size += rdma->sc_max_send_sges * sizeof(struct ib_sge);
+	ctxt = kmalloc(size, GFP_KERNEL);
 	if (!ctxt)
 		return NULL;
 
@@ -138,7 +141,7 @@
 	ctxt->sc_send_wr.wr_cqe = &ctxt->sc_cqe;
 	ctxt->sc_send_wr.sg_list = ctxt->sc_sges;
 	ctxt->sc_send_wr.send_flags = IB_SEND_SIGNALED;
-	for (i = 0; i < ARRAY_SIZE(ctxt->sc_sges); i++)
+	for (i = 0; i < rdma->sc_max_send_sges; i++)
 		ctxt->sc_sges[i].lkey = rdma->sc_pd->local_dma_lkey;
 	return ctxt;
 }
@@ -482,7 +485,6 @@ static u32 svc_rdma_get_inv_rkey(__be32 *rdma_argp,
 
 static int svc_rdma_dma_map_page(struct svcxprt_rdma *rdma,
 				 struct svc_rdma_send_ctxt *ctxt,
-				 unsigned int sge_no,
 				 struct page *page,
 				 unsigned long offset,
 				 unsigned int len)
@@ -494,8 +496,8 @@ static int svc_rdma_dma_map_page(struct svcxprt_rdma *rdma,
 	if (ib_dma_mapping_error(dev, dma_addr))
 		goto out_maperr;
 
-	ctxt->sc_sges[sge_no].addr = dma_addr;
-	ctxt->sc_sges[sge_no].length = len;
+	ctxt->sc_sges[ctxt->sc_cur_sge_no].addr = dma_addr;
+	ctxt->sc_sges[ctxt->sc_cur_sge_no].length = len;
 	ctxt->sc_send_wr.num_sge++;
 	return 0;
 
@@ -509,11 +511,10 @@ static int svc_rdma_dma_map_page(struct svcxprt_rdma *rdma,
  */
 static int svc_rdma_dma_map_buf(struct svcxprt_rdma *rdma,
 				struct svc_rdma_send_ctxt *ctxt,
-				unsigned int sge_no,
 				unsigned char *base,
 				unsigned int len)
 {
-	return svc_rdma_dma_map_page(rdma, ctxt, sge_no, virt_to_page(base),
+	return svc_rdma_dma_map_page(rdma, ctxt, virt_to_page(base),
 				     offset_in_page(base), len);
 }
 
@@ -535,7 +536,8 @@ int svc_rdma_map_reply_hdr(struct svcxprt_rdma *rdma,
 {
 	ctxt->sc_pages[0] = virt_to_page(rdma_resp);
 	ctxt->sc_page_count++;
-	return svc_rdma_dma_map_page(rdma, ctxt, 0, ctxt->sc_pages[0], 0, len);
+	ctxt->sc_cur_sge_no = 0;
+	return svc_rdma_dma_map_page(rdma, ctxt, ctxt->sc_pages[0], 0, len);
 }
 
 /* Load the xdr_buf into the ctxt's sge array, and DMA map each
@@ -547,16 +549,16 @@ static int svc_rdma_map_reply_msg(struct svcxprt_rdma *rdma,
 				  struct svc_rdma_send_ctxt *ctxt,
 				  struct xdr_buf *xdr, __be32 *wr_lst)
 {
-	unsigned int len, sge_no, remaining;
+	unsigned int len, remaining;
 	unsigned long page_off;
 	struct page **ppages;
 	unsigned char *base;
 	u32 xdr_pad;
 	int ret;
 
-	sge_no = 1;
-
-	ret = svc_rdma_dma_map_buf(rdma, ctxt, sge_no++,
+	if (++ctxt->sc_cur_sge_no >= rdma->sc_max_send_sges)
+		return -EIO;
+	ret = svc_rdma_dma_map_buf(rdma, ctxt,
 				   xdr->head[0].iov_base,
 				   xdr->head[0].iov_len);
 	if (ret < 0)
@@ -586,8 +588,10 @@ static int svc_rdma_map_reply_msg(struct svcxprt_rdma *rdma,
 	while (remaining) {
 		len = min_t(u32, PAGE_SIZE - page_off, remaining);
 
-		ret = svc_rdma_dma_map_page(rdma, ctxt, sge_no++,
-					    *ppages++, page_off, len);
+		if (++ctxt->sc_cur_sge_no >= rdma->sc_max_send_sges)
+			return -EIO;
+		ret = svc_rdma_dma_map_page(rdma, ctxt, *ppages++,
+					    page_off, len);
 		if (ret < 0)
 			return ret;
 
@@ -599,7 +603,9 @@ static int svc_rdma_map_reply_msg(struct svcxprt_rdma *rdma,
 	len = xdr->tail[0].iov_len;
 tail:
 	if (len) {
-		ret = svc_rdma_dma_map_buf(rdma, ctxt, sge_no++, base, len);
+		if (++ctxt->sc_cur_sge_no >= rdma->sc_max_send_sges)
+			return -EIO;
+		ret = svc_rdma_dma_map_buf(rdma, ctxt, base, len);
 		if (ret < 0)
 			return ret;
 	}
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index e31c164..78b554d 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -477,8 +477,13 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 
 	/* Qualify the transport resource defaults with the
 	 * capabilities of this particular device */
-	newxprt->sc_max_sge = min((size_t)dev->attrs.max_sge,
-				  (size_t)RPCSVC_MAXPAGES);
+	newxprt->sc_max_send_sges = dev->attrs.max_sge;
+	/* transport hdr, head iovec, one page list entry, tail iovec */
+	if (newxprt->sc_max_send_sges < 4) {
+		pr_err("svcrdma: too few Send SGEs available (%d)\n",
+		       newxprt->sc_max_send_sges);
+		goto errout;
+	}
 	newxprt->sc_max_req_size = svcrdma_max_req_size;
 	newxprt->sc_max_requests = svcrdma_max_requests;
 	newxprt->sc_max_bc_requests = svcrdma_max_bc_requests;
@@ -526,7 +531,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	qp_attr.cap.max_rdma_ctxs = ctxts;
 	qp_attr.cap.max_send_wr = newxprt->sc_sq_depth - ctxts;
 	qp_attr.cap.max_recv_wr = rq_depth;
-	qp_attr.cap.max_send_sge = newxprt->sc_max_sge;
+	qp_attr.cap.max_send_sge = newxprt->sc_max_send_sges;
 	qp_attr.cap.max_recv_sge = 1;
 	qp_attr.sq_sig_type = IB_SIGNAL_REQ_WR;
 	qp_attr.qp_type = IB_QPT_RC;
@@ -587,7 +592,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	dprintk("    local address   : %pIS:%u\n", sap, rpc_get_port(sap));
 	sap = (struct sockaddr *)&newxprt->sc_cm_id->route.addr.dst_addr;
 	dprintk("    remote address  : %pIS:%u\n", sap, rpc_get_port(sap));
-	dprintk("    max_sge         : %d\n", newxprt->sc_max_sge);
+	dprintk("    max_sge         : %d\n", newxprt->sc_max_send_sges);
 	dprintk("    sq_depth        : %d\n", newxprt->sc_sq_depth);
 	dprintk("    rdma_rw_ctxs    : %d\n", ctxts);
 	dprintk("    max_requests    : %d\n", newxprt->sc_max_requests);


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v1 16/19] svcrdma: Remove post_send_wr
  2018-05-07 19:26 [PATCH v1 00/19] NFS/RDMA server for-next Chuck Lever
                   ` (14 preceding siblings ...)
  2018-05-07 19:28 ` [PATCH v1 15/19] svcrdma: Don't overrun the SGE array in svc_rdma_send_ctxt Chuck Lever
@ 2018-05-07 19:28 ` Chuck Lever
  2018-05-07 19:28 ` [PATCH v1 17/19] svcrdma: Simplify svc_rdma_send() Chuck Lever
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 29+ messages in thread
From: Chuck Lever @ 2018-05-07 19:28 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

Clean up: Now that the send_wr is part of the svc_rdma_send_ctxt,
svc_rdma_post_send_wr is nearly empty.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc_rdma.h            |    3 --
 net/sunrpc/xprtrdma/svc_rdma_backchannel.c |    3 +-
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c    |    3 +-
 net/sunrpc/xprtrdma/svc_rdma_sendto.c      |   47 +++++++---------------------
 4 files changed, 16 insertions(+), 40 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index bfb8824..a8bfc21 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -202,9 +202,6 @@ extern void svc_rdma_send_ctxt_put(struct svcxprt_rdma *rdma,
 extern int svc_rdma_map_reply_hdr(struct svcxprt_rdma *rdma,
 				  struct svc_rdma_send_ctxt *ctxt,
 				  __be32 *rdma_resp, unsigned int len);
-extern int svc_rdma_post_send_wr(struct svcxprt_rdma *rdma,
-				 struct svc_rdma_send_ctxt *ctxt,
-				 u32 inv_rkey);
 extern int svc_rdma_sendto(struct svc_rqst *);
 
 /* svc_rdma_transport.c */
diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
index 95e3351..40f5e4a 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
@@ -139,7 +139,8 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma,
 	 * the rq_buffer before all retransmits are complete.
 	 */
 	get_page(virt_to_page(rqst->rq_buffer));
-	ret = svc_rdma_post_send_wr(rdma, ctxt, 0);
+	ctxt->sc_send_wr.opcode = IB_WR_SEND;
+	ret = svc_rdma_send(rdma, &ctxt->sc_send_wr);
 	if (ret)
 		goto out_unmap;
 
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 2d1e0db..68648e6 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -642,7 +642,8 @@ static void svc_rdma_send_error(struct svcxprt_rdma *xprt,
 		return;
 	}
 
-	ret = svc_rdma_post_send_wr(xprt, ctxt, 0);
+	ctxt->sc_send_wr.opcode = IB_WR_SEND;
+	ret = svc_rdma_send(xprt, &ctxt->sc_send_wr);
 	if (ret)
 		svc_rdma_send_ctxt_put(xprt, ctxt);
 }
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index 53d8db6..0ebdc0c 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -630,35 +630,6 @@ static void svc_rdma_save_io_pages(struct svc_rqst *rqstp,
 	rqstp->rq_next_page = rqstp->rq_respages + 1;
 }
 
-/**
- * svc_rdma_post_send_wr - Set up and post one Send Work Request
- * @rdma: controlling transport
- * @ctxt: op_ctxt for transmitting the Send WR
- * @inv_rkey: R_key argument to Send With Invalidate, or zero
- *
- * Returns:
- *	%0 if the Send* was posted successfully,
- *	%-ENOTCONN if the connection was lost or dropped,
- *	%-EINVAL if there was a problem with the Send we built,
- *	%-ENOMEM if ib_post_send failed.
- */
-int svc_rdma_post_send_wr(struct svcxprt_rdma *rdma,
-			  struct svc_rdma_send_ctxt *ctxt,
-			  u32 inv_rkey)
-{
-	dprintk("svcrdma: posting Send WR with %u sge(s)\n",
-		ctxt->sc_send_wr.num_sge);
-
-	if (inv_rkey) {
-		ctxt->sc_send_wr.opcode = IB_WR_SEND_WITH_INV;
-		ctxt->sc_send_wr.ex.invalidate_rkey = inv_rkey;
-	} else {
-		ctxt->sc_send_wr.opcode = IB_WR_SEND;
-	}
-
-	return svc_rdma_send(rdma, &ctxt->sc_send_wr);
-}
-
 /* Prepare the portion of the RPC Reply that will be transmitted
  * via RDMA Send. The RPC-over-RDMA transport header is prepared
  * in sc_sges[0], and the RPC xdr_buf is prepared in following sges.
@@ -683,7 +654,6 @@ static int svc_rdma_send_reply_msg(struct svcxprt_rdma *rdma,
 				   __be32 *wr_lst, __be32 *rp_ch)
 {
 	struct svc_rdma_send_ctxt *ctxt;
-	u32 inv_rkey;
 	int ret;
 
 	ctxt = svc_rdma_send_ctxt_get(rdma);
@@ -704,10 +674,16 @@ static int svc_rdma_send_reply_msg(struct svcxprt_rdma *rdma,
 
 	svc_rdma_save_io_pages(rqstp, ctxt);
 
-	inv_rkey = 0;
-	if (rdma->sc_snd_w_inv)
-		inv_rkey = svc_rdma_get_inv_rkey(rdma_argp, wr_lst, rp_ch);
-	ret = svc_rdma_post_send_wr(rdma, ctxt, inv_rkey);
+	ctxt->sc_send_wr.opcode = IB_WR_SEND;
+	if (rdma->sc_snd_w_inv) {
+		ctxt->sc_send_wr.ex.invalidate_rkey =
+			svc_rdma_get_inv_rkey(rdma_argp, wr_lst, rp_ch);
+		if (ctxt->sc_send_wr.ex.invalidate_rkey)
+			ctxt->sc_send_wr.opcode = IB_WR_SEND_WITH_INV;
+	}
+	dprintk("svcrdma: posting Send WR with %u sge(s)\n",
+		ctxt->sc_send_wr.num_sge);
+	ret = svc_rdma_send(rdma, &ctxt->sc_send_wr);
 	if (ret)
 		goto err;
 
@@ -750,7 +726,8 @@ static int svc_rdma_send_error_msg(struct svcxprt_rdma *rdma,
 
 	svc_rdma_save_io_pages(rqstp, ctxt);
 
-	ret = svc_rdma_post_send_wr(rdma, ctxt, 0);
+	ctxt->sc_send_wr.opcode = IB_WR_SEND;
+	ret = svc_rdma_send(rdma, &ctxt->sc_send_wr);
 	if (ret)
 		goto err;
 


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v1 17/19] svcrdma: Simplify svc_rdma_send()
  2018-05-07 19:26 [PATCH v1 00/19] NFS/RDMA server for-next Chuck Lever
                   ` (15 preceding siblings ...)
  2018-05-07 19:28 ` [PATCH v1 16/19] svcrdma: Remove post_send_wr Chuck Lever
@ 2018-05-07 19:28 ` Chuck Lever
  2018-05-07 19:28 ` [PATCH v1 18/19] svcrdma: Persistently allocate and DMA-map Send buffers Chuck Lever
  2018-05-07 19:28 ` [PATCH v1 19/19] svcrdma: Remove unused svc_rdma_op_ctxt Chuck Lever
  18 siblings, 0 replies; 29+ messages in thread
From: Chuck Lever @ 2018-05-07 19:28 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

Clean up: No current caller of svc_rdma_send passes in a chained
WR. The logic that counts the chain length can be replaced with a
constant (1).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
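With only single, unchained WRs ever posted, the Send Queue accounting
reduces to one credit per post. Roughly (a sketch only; the real hunk
is below):

	/* sketch: take one SQ credit, or wait until one is returned */
	if (atomic_dec_return(&rdma->sc_sq_avail) < 0) {
		atomic_inc(&rdma->sc_sq_avail);
		wait_event(rdma->sc_send_wait,
			   atomic_read(&rdma->sc_sq_avail) > 1);
		/* then retry */
	}
	ret = ib_post_send(rdma->sc_qp, wr, &bad_wr);

The added might_sleep() just documents that callers must already be in
a context that can sleep; the wait_event() was there before.
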
 net/sunrpc/xprtrdma/svc_rdma_sendto.c |   30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index 0ebdc0c..edfeca4 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -253,41 +253,41 @@ static void svc_rdma_wc_send(struct ib_cq *cq, struct ib_wc *wc)
 	svc_xprt_put(&rdma->sc_xprt);
 }
 
+/**
+ * svc_rdma_send - Post a single Send WR
+ * @rdma: transport on which to post the WR
+ * @wr: prepared Send WR to post
+ *
+ * Returns zero if the Send WR was posted successfully. Otherwise, a
+ * negative errno is returned.
+ */
 int svc_rdma_send(struct svcxprt_rdma *rdma, struct ib_send_wr *wr)
 {
-	struct ib_send_wr *bad_wr, *n_wr;
-	int wr_count;
-	int i;
+	struct ib_send_wr *bad_wr;
 	int ret;
 
-	wr_count = 1;
-	for (n_wr = wr->next; n_wr; n_wr = n_wr->next)
-		wr_count++;
+	might_sleep();
 
 	/* If the SQ is full, wait until an SQ entry is available */
 	while (1) {
-		if ((atomic_sub_return(wr_count, &rdma->sc_sq_avail) < 0)) {
+		if ((atomic_dec_return(&rdma->sc_sq_avail) < 0)) {
 			atomic_inc(&rdma_stat_sq_starve);
 			trace_svcrdma_sq_full(rdma);
-			atomic_add(wr_count, &rdma->sc_sq_avail);
+			atomic_inc(&rdma->sc_sq_avail);
 			wait_event(rdma->sc_send_wait,
-				   atomic_read(&rdma->sc_sq_avail) > wr_count);
+				   atomic_read(&rdma->sc_sq_avail) > 1);
 			if (test_bit(XPT_CLOSE, &rdma->sc_xprt.xpt_flags))
 				return -ENOTCONN;
 			trace_svcrdma_sq_retry(rdma);
 			continue;
 		}
-		/* Take a transport ref for each WR posted */
-		for (i = 0; i < wr_count; i++)
-			svc_xprt_get(&rdma->sc_xprt);
 
-		/* Bump used SQ WR count and post */
+		svc_xprt_get(&rdma->sc_xprt);
 		ret = ib_post_send(rdma->sc_qp, wr, &bad_wr);
 		trace_svcrdma_post_send(wr, ret);
 		if (ret) {
 			set_bit(XPT_CLOSE, &rdma->sc_xprt.xpt_flags);
-			for (i = 0; i < wr_count; i++)
-				svc_xprt_put(&rdma->sc_xprt);
+			svc_xprt_put(&rdma->sc_xprt);
 			wake_up(&rdma->sc_send_wait);
 		}
 		break;


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v1 18/19] svcrdma: Persistently allocate and DMA-map Send buffers
  2018-05-07 19:26 [PATCH v1 00/19] NFS/RDMA server for-next Chuck Lever
                   ` (16 preceding siblings ...)
  2018-05-07 19:28 ` [PATCH v1 17/19] svcrdma: Simplify svc_rdma_send() Chuck Lever
@ 2018-05-07 19:28 ` Chuck Lever
  2018-05-07 19:28 ` [PATCH v1 19/19] svcrdma: Remove unused svc_rdma_op_ctxt Chuck Lever
  18 siblings, 0 replies; 29+ messages in thread
From: Chuck Lever @ 2018-05-07 19:28 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

While sending each RPC Reply, svc_rdma_sendto allocates and DMA-
maps a separate buffer where the RPC/RDMA transport header is
constructed. The buffer is unmapped and released in the Send
completion handler. This is significant per-RPC overhead,
especially for small RPCs.

Instead, allocate and DMA-map a buffer, and cache it in each
svc_rdma_send_ctxt. This buffer and its mapping can be re-used
for each RPC, saving the cost of memory allocation and DMA
mapping.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
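The life cycle of the new per-ctxt transport header buffer, condensed
from the hunks below (a sketch, not a complete fragment):

	/* at send_ctxt allocation: allocate and map once, keep the mapping */
	buffer = kmalloc(rdma->sc_max_req_size, GFP_KERNEL);
	addr = ib_dma_map_single(rdma->sc_pd->device, buffer,
				 rdma->sc_max_req_size, DMA_TO_DEVICE);
	ctxt->sc_xprt_buf = buffer;
	ctxt->sc_sges[0].addr = addr;

	/* per Send: build the header in sc_xprt_buf, then just sync it */
	svc_rdma_sync_reply_hdr(rdma, ctxt, len);

	/* at transport teardown: unmap and free */
	ib_dma_unmap_single(rdma->sc_pd->device, ctxt->sc_sges[0].addr,
			    rdma->sc_max_req_size, DMA_TO_DEVICE);
	kfree(ctxt->sc_xprt_buf);

svc_rdma_send_ctxt_put() therefore skips sge[0] when unmapping, since
that mapping now lives as long as the ctxt does.
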
 include/linux/sunrpc/svc_rdma.h            |    8 +-
 net/sunrpc/xprtrdma/svc_rdma_backchannel.c |   51 +++-------
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c    |   25 +----
 net/sunrpc/xprtrdma/svc_rdma_sendto.c      |  149 ++++++++++++++--------------
 4 files changed, 105 insertions(+), 128 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index a8bfc21..96b14a7 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -162,6 +162,7 @@ struct svc_rdma_send_ctxt {
 	struct list_head	sc_list;
 	struct ib_send_wr	sc_send_wr;
 	struct ib_cqe		sc_cqe;
+	void			*sc_xprt_buf;
 	int			sc_page_count;
 	int			sc_cur_sge_no;
 	struct page		*sc_pages[RPCSVC_MAXPAGES];
@@ -199,9 +200,12 @@ extern int svc_rdma_send_reply_chunk(struct svcxprt_rdma *rdma,
 extern void svc_rdma_send_ctxt_put(struct svcxprt_rdma *rdma,
 				   struct svc_rdma_send_ctxt *ctxt);
 extern int svc_rdma_send(struct svcxprt_rdma *rdma, struct ib_send_wr *wr);
-extern int svc_rdma_map_reply_hdr(struct svcxprt_rdma *rdma,
+extern void svc_rdma_sync_reply_hdr(struct svcxprt_rdma *rdma,
+				    struct svc_rdma_send_ctxt *ctxt,
+				    unsigned int len);
+extern int svc_rdma_map_reply_msg(struct svcxprt_rdma *rdma,
 				  struct svc_rdma_send_ctxt *ctxt,
-				  __be32 *rdma_resp, unsigned int len);
+				  struct xdr_buf *xdr, __be32 *wr_lst);
 extern int svc_rdma_sendto(struct svc_rqst *);
 
 /* svc_rdma_transport.c */
diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
index 40f5e4a..343e7ad 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
@@ -115,43 +115,21 @@ int svc_rdma_handle_bc_reply(struct rpc_xprt *xprt, __be32 *rdma_resp,
  * the adapter has a small maximum SQ depth.
  */
 static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma,
-			      struct rpc_rqst *rqst)
+			      struct rpc_rqst *rqst,
+			      struct svc_rdma_send_ctxt *ctxt)
 {
-	struct svc_rdma_send_ctxt *ctxt;
 	int ret;
 
-	ctxt = svc_rdma_send_ctxt_get(rdma);
-	if (!ctxt) {
-		ret = -ENOMEM;
-		goto out_err;
-	}
-
-	/* rpcrdma_bc_send_request builds the transport header and
-	 * the backchannel RPC message in the same buffer. Thus only
-	 * one SGE is needed to send both.
-	 */
-	ret = svc_rdma_map_reply_hdr(rdma, ctxt, rqst->rq_buffer,
-				     rqst->rq_snd_buf.len);
+	ret = svc_rdma_map_reply_msg(rdma, ctxt, &rqst->rq_snd_buf, NULL);
 	if (ret < 0)
-		goto out_err;
+		return -EIO;
 
 	/* Bump page refcnt so Send completion doesn't release
 	 * the rq_buffer before all retransmits are complete.
 	 */
 	get_page(virt_to_page(rqst->rq_buffer));
 	ctxt->sc_send_wr.opcode = IB_WR_SEND;
-	ret = svc_rdma_send(rdma, &ctxt->sc_send_wr);
-	if (ret)
-		goto out_unmap;
-
-out_err:
-	dprintk("svcrdma: %s returns %d\n", __func__, ret);
-	return ret;
-
-out_unmap:
-	svc_rdma_send_ctxt_put(rdma, ctxt);
-	ret = -EIO;
-	goto out_err;
+	return svc_rdma_send(rdma, &ctxt->sc_send_wr);
 }
 
 /* Server-side transport endpoint wants a whole page for its send
@@ -198,13 +176,15 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma,
 {
 	struct rpc_xprt *xprt = rqst->rq_xprt;
 	struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
+	struct svc_rdma_send_ctxt *ctxt;
 	__be32 *p;
 	int rc;
 
-	/* Space in the send buffer for an RPC/RDMA header is reserved
-	 * via xprt->tsh_size.
-	 */
-	p = rqst->rq_buffer;
+	ctxt = svc_rdma_send_ctxt_get(rdma);
+	if (!ctxt)
+		goto drop_connection;
+
+	p = ctxt->sc_xprt_buf;
 	*p++ = rqst->rq_xid;
 	*p++ = rpcrdma_version;
 	*p++ = cpu_to_be32(r_xprt->rx_buf.rb_bc_max_requests);
@@ -212,14 +192,17 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma,
 	*p++ = xdr_zero;
 	*p++ = xdr_zero;
 	*p   = xdr_zero;
+	svc_rdma_sync_reply_hdr(rdma, ctxt, RPCRDMA_HDRLEN_MIN);
 
 #ifdef SVCRDMA_BACKCHANNEL_DEBUG
 	pr_info("%s: %*ph\n", __func__, 64, rqst->rq_buffer);
 #endif
 
-	rc = svc_rdma_bc_sendto(rdma, rqst);
-	if (rc)
+	rc = svc_rdma_bc_sendto(rdma, rqst, ctxt);
+	if (rc) {
+		svc_rdma_send_ctxt_put(rdma, ctxt);
 		goto drop_connection;
+	}
 	return rc;
 
 drop_connection:
@@ -327,7 +310,7 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma,
 	xprt->idle_timeout = RPCRDMA_IDLE_DISC_TO;
 
 	xprt->prot = XPRT_TRANSPORT_BC_RDMA;
-	xprt->tsh_size = RPCRDMA_HDRLEN_MIN / sizeof(__be32);
+	xprt->tsh_size = 0;
 	xprt->ops = &xprt_rdma_bc_procs;
 
 	memcpy(&xprt->addr, args->dstaddr, args->addrlen);
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 68648e6..09ce09b 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -602,17 +602,15 @@ static void svc_rdma_send_error(struct svcxprt_rdma *xprt,
 				__be32 *rdma_argp, int status)
 {
 	struct svc_rdma_send_ctxt *ctxt;
-	__be32 *p, *err_msgp;
 	unsigned int length;
-	struct page *page;
+	__be32 *p;
 	int ret;
 
-	page = alloc_page(GFP_KERNEL);
-	if (!page)
+	ctxt = svc_rdma_send_ctxt_get(xprt);
+	if (!ctxt)
 		return;
-	err_msgp = page_address(page);
 
-	p = err_msgp;
+	p = ctxt->sc_xprt_buf;
 	*p++ = *rdma_argp;
 	*p++ = *(rdma_argp + 1);
 	*p++ = xprt->sc_fc_credits;
@@ -628,19 +626,8 @@ static void svc_rdma_send_error(struct svcxprt_rdma *xprt,
 		*p++ = err_chunk;
 		trace_svcrdma_err_chunk(*rdma_argp);
 	}
-	length = (unsigned long)p - (unsigned long)err_msgp;
-
-	/* Map transport header; no RPC message payload */
-	ctxt = svc_rdma_send_ctxt_get(xprt);
-	if (!ctxt)
-		return;
-
-	ret = svc_rdma_map_reply_hdr(xprt, ctxt, err_msgp, length);
-	if (ret) {
-		dprintk("svcrdma: Error %d mapping send for protocol error\n",
-			ret);
-		return;
-	}
+	length = (unsigned long)p - (unsigned long)ctxt->sc_xprt_buf;
+	svc_rdma_sync_reply_hdr(xprt, ctxt, length);
 
 	ctxt->sc_send_wr.opcode = IB_WR_SEND;
 	ret = svc_rdma_send(xprt, &ctxt->sc_send_wr);
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index edfeca4..4a3efae 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -127,6 +127,8 @@
 svc_rdma_send_ctxt_alloc(struct svcxprt_rdma *rdma)
 {
 	struct svc_rdma_send_ctxt *ctxt;
+	dma_addr_t addr;
+	void *buffer;
 	size_t size;
 	int i;
 
@@ -134,16 +136,33 @@
 	size += rdma->sc_max_send_sges * sizeof(struct ib_sge);
 	ctxt = kmalloc(size, GFP_KERNEL);
 	if (!ctxt)
-		return NULL;
+		goto fail0;
+	buffer = kmalloc(rdma->sc_max_req_size, GFP_KERNEL);
+	if (!buffer)
+		goto fail1;
+	addr = ib_dma_map_single(rdma->sc_pd->device, buffer,
+				 rdma->sc_max_req_size, DMA_TO_DEVICE);
+	if (ib_dma_mapping_error(rdma->sc_pd->device, addr))
+		goto fail2;
 
-	ctxt->sc_cqe.done = svc_rdma_wc_send;
 	ctxt->sc_send_wr.next = NULL;
 	ctxt->sc_send_wr.wr_cqe = &ctxt->sc_cqe;
 	ctxt->sc_send_wr.sg_list = ctxt->sc_sges;
 	ctxt->sc_send_wr.send_flags = IB_SEND_SIGNALED;
+	ctxt->sc_cqe.done = svc_rdma_wc_send;
+	ctxt->sc_xprt_buf = buffer;
+	ctxt->sc_sges[0].addr = addr;
+
 	for (i = 0; i < rdma->sc_max_send_sges; i++)
 		ctxt->sc_sges[i].lkey = rdma->sc_pd->local_dma_lkey;
 	return ctxt;
+
+fail2:
+	kfree(buffer);
+fail1:
+	kfree(ctxt);
+fail0:
+	return NULL;
 }
 
 /**
@@ -157,6 +176,11 @@ void svc_rdma_send_ctxts_destroy(struct svcxprt_rdma *rdma)
 
 	while ((ctxt = svc_rdma_next_send_ctxt(&rdma->sc_send_ctxts))) {
 		list_del(&ctxt->sc_list);
+		ib_dma_unmap_single(rdma->sc_pd->device,
+				    ctxt->sc_sges[0].addr,
+				    rdma->sc_max_req_size,
+				    DMA_TO_DEVICE);
+		kfree(ctxt->sc_xprt_buf);
 		kfree(ctxt);
 	}
 }
@@ -181,6 +205,7 @@ struct svc_rdma_send_ctxt *svc_rdma_send_ctxt_get(struct svcxprt_rdma *rdma)
 
 out:
 	ctxt->sc_send_wr.num_sge = 0;
+	ctxt->sc_cur_sge_no = 0;
 	ctxt->sc_page_count = 0;
 	return ctxt;
 
@@ -205,7 +230,10 @@ void svc_rdma_send_ctxt_put(struct svcxprt_rdma *rdma,
 	struct ib_device *device = rdma->sc_cm_id->device;
 	unsigned int i;
 
-	for (i = 0; i < ctxt->sc_send_wr.num_sge; i++)
+	/* The first SGE contains the transport header, which
+	 * remains mapped until @ctxt is destroyed.
+	 */
+	for (i = 1; i < ctxt->sc_send_wr.num_sge; i++)
 		ib_dma_unmap_page(device,
 				  ctxt->sc_sges[i].addr,
 				  ctxt->sc_sges[i].length,
@@ -519,35 +547,37 @@ static int svc_rdma_dma_map_buf(struct svcxprt_rdma *rdma,
 }
 
 /**
- * svc_rdma_map_reply_hdr - DMA map the transport header buffer
+ * svc_rdma_sync_reply_hdr - DMA sync the transport header buffer
  * @rdma: controlling transport
- * @ctxt: op_ctxt for the Send WR
- * @rdma_resp: buffer containing transport header
+ * @ctxt: send_ctxt for the Send WR
  * @len: length of transport header
  *
- * Returns:
- *	%0 if the header is DMA mapped,
- *	%-EIO if DMA mapping failed.
  */
-int svc_rdma_map_reply_hdr(struct svcxprt_rdma *rdma,
-			   struct svc_rdma_send_ctxt *ctxt,
-			   __be32 *rdma_resp,
-			   unsigned int len)
+void svc_rdma_sync_reply_hdr(struct svcxprt_rdma *rdma,
+			     struct svc_rdma_send_ctxt *ctxt,
+			     unsigned int len)
 {
-	ctxt->sc_pages[0] = virt_to_page(rdma_resp);
-	ctxt->sc_page_count++;
-	ctxt->sc_cur_sge_no = 0;
-	return svc_rdma_dma_map_page(rdma, ctxt, ctxt->sc_pages[0], 0, len);
+	ctxt->sc_sges[0].length = len;
+	ctxt->sc_send_wr.num_sge++;
+	ib_dma_sync_single_for_device(rdma->sc_pd->device,
+				      ctxt->sc_sges[0].addr, len,
+				      DMA_TO_DEVICE);
 }
 
-/* Load the xdr_buf into the ctxt's sge array, and DMA map each
+/* svc_rdma_map_reply_msg - Map the buffer holding RPC message
+ * @rdma: controlling transport
+ * @ctxt: send_ctxt for the Send WR
+ * @xdr: prepared xdr_buf containing RPC message
+ * @wr_lst: pointer to Call header's Write list, or NULL
+ *
+ * Load the xdr_buf into the ctxt's sge array, and DMA map each
  * element as it is added.
  *
  * Returns zero on success, or a negative errno on failure.
  */
-static int svc_rdma_map_reply_msg(struct svcxprt_rdma *rdma,
-				  struct svc_rdma_send_ctxt *ctxt,
-				  struct xdr_buf *xdr, __be32 *wr_lst)
+int svc_rdma_map_reply_msg(struct svcxprt_rdma *rdma,
+			   struct svc_rdma_send_ctxt *ctxt,
+			   struct xdr_buf *xdr, __be32 *wr_lst)
 {
 	unsigned int len, remaining;
 	unsigned long page_off;
@@ -624,7 +654,7 @@ static void svc_rdma_save_io_pages(struct svc_rqst *rqstp,
 
 	ctxt->sc_page_count += pages;
 	for (i = 0; i < pages; i++) {
-		ctxt->sc_pages[i + 1] = rqstp->rq_respages[i];
+		ctxt->sc_pages[i] = rqstp->rq_respages[i];
 		rqstp->rq_respages[i] = NULL;
 	}
 	rqstp->rq_next_page = rqstp->rq_respages + 1;
@@ -649,27 +679,18 @@ static void svc_rdma_save_io_pages(struct svc_rqst *rqstp,
  * - The Reply's transport header will never be larger than a page.
  */
 static int svc_rdma_send_reply_msg(struct svcxprt_rdma *rdma,
-				   __be32 *rdma_argp, __be32 *rdma_resp,
+				   struct svc_rdma_send_ctxt *ctxt,
+				   __be32 *rdma_argp,
 				   struct svc_rqst *rqstp,
 				   __be32 *wr_lst, __be32 *rp_ch)
 {
-	struct svc_rdma_send_ctxt *ctxt;
 	int ret;
 
-	ctxt = svc_rdma_send_ctxt_get(rdma);
-	if (!ctxt)
-		return -ENOMEM;
-
-	ret = svc_rdma_map_reply_hdr(rdma, ctxt, rdma_resp,
-				     svc_rdma_reply_hdr_len(rdma_resp));
-	if (ret < 0)
-		goto err;
-
 	if (!rp_ch) {
 		ret = svc_rdma_map_reply_msg(rdma, ctxt,
 					     &rqstp->rq_res, wr_lst);
 		if (ret < 0)
-			goto err;
+			return ret;
 	}
 
 	svc_rdma_save_io_pages(rqstp, ctxt);
@@ -683,15 +704,7 @@ static int svc_rdma_send_reply_msg(struct svcxprt_rdma *rdma,
 	}
 	dprintk("svcrdma: posting Send WR with %u sge(s)\n",
 		ctxt->sc_send_wr.num_sge);
-	ret = svc_rdma_send(rdma, &ctxt->sc_send_wr);
-	if (ret)
-		goto err;
-
-	return 0;
-
-err:
-	svc_rdma_send_ctxt_put(rdma, ctxt);
-	return ret;
+	return svc_rdma_send(rdma, &ctxt->sc_send_wr);
 }
 
 /* Given the client-provided Write and Reply chunks, the server was not
@@ -702,40 +715,29 @@ static int svc_rdma_send_reply_msg(struct svcxprt_rdma *rdma,
  * Remote Invalidation is skipped for simplicity.
  */
 static int svc_rdma_send_error_msg(struct svcxprt_rdma *rdma,
-				   __be32 *rdma_resp, struct svc_rqst *rqstp)
+				   struct svc_rdma_send_ctxt *ctxt,
+				   struct svc_rqst *rqstp)
 {
-	struct svc_rdma_send_ctxt *ctxt;
 	__be32 *p;
 	int ret;
 
-	ctxt = svc_rdma_send_ctxt_get(rdma);
-	if (!ctxt)
-		return -ENOMEM;
-
-	/* Replace the original transport header with an
-	 * RDMA_ERROR response. XID etc are preserved.
-	 */
-	trace_svcrdma_err_chunk(*rdma_resp);
-	p = rdma_resp + 3;
+	p = ctxt->sc_xprt_buf;
+	trace_svcrdma_err_chunk(*p);
+	p += 3;
 	*p++ = rdma_error;
 	*p   = err_chunk;
-
-	ret = svc_rdma_map_reply_hdr(rdma, ctxt, rdma_resp, 20);
-	if (ret < 0)
-		goto err;
+	svc_rdma_sync_reply_hdr(rdma, ctxt, RPCRDMA_HDRLEN_ERR);
 
 	svc_rdma_save_io_pages(rqstp, ctxt);
 
 	ctxt->sc_send_wr.opcode = IB_WR_SEND;
 	ret = svc_rdma_send(rdma, &ctxt->sc_send_wr);
-	if (ret)
-		goto err;
+	if (ret) {
+		svc_rdma_send_ctxt_put(rdma, ctxt);
+		return ret;
+	}
 
 	return 0;
-
-err:
-	svc_rdma_send_ctxt_put(rdma, ctxt);
-	return ret;
 }
 
 void svc_rdma_prep_reply_hdr(struct svc_rqst *rqstp)
@@ -762,7 +764,7 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
 	struct svc_rdma_recv_ctxt *rctxt = rqstp->rq_xprt_ctxt;
 	__be32 *p, *rdma_argp, *rdma_resp, *wr_lst, *rp_ch;
 	struct xdr_buf *xdr = &rqstp->rq_res;
-	struct page *res_page;
+	struct svc_rdma_send_ctxt *sctxt;
 	int ret;
 
 	rdma_argp = rctxt->rc_recv_buf;
@@ -775,10 +777,10 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
 	 * critical section.
 	 */
 	ret = -ENOMEM;
-	res_page = alloc_page(GFP_KERNEL);
-	if (!res_page)
+	sctxt = svc_rdma_send_ctxt_get(rdma);
+	if (!sctxt)
 		goto err0;
-	rdma_resp = page_address(res_page);
+	rdma_resp = sctxt->sc_xprt_buf;
 
 	p = rdma_resp;
 	*p++ = *rdma_argp;
@@ -805,10 +807,11 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
 		svc_rdma_xdr_encode_reply_chunk(rdma_resp, rp_ch, ret);
 	}
 
-	ret = svc_rdma_send_reply_msg(rdma, rdma_argp, rdma_resp, rqstp,
+	svc_rdma_sync_reply_hdr(rdma, sctxt, svc_rdma_reply_hdr_len(rdma_resp));
+	ret = svc_rdma_send_reply_msg(rdma, sctxt, rdma_argp, rqstp,
 				      wr_lst, rp_ch);
 	if (ret < 0)
-		goto err0;
+		goto err1;
 	ret = 0;
 
 out:
@@ -820,14 +823,14 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
 	if (ret != -E2BIG && ret != -EINVAL)
 		goto err1;
 
-	ret = svc_rdma_send_error_msg(rdma, rdma_resp, rqstp);
+	ret = svc_rdma_send_error_msg(rdma, sctxt, rqstp);
 	if (ret < 0)
-		goto err0;
+		goto err1;
 	ret = 0;
 	goto out;
 
  err1:
-	put_page(res_page);
+	svc_rdma_send_ctxt_put(rdma, sctxt);
  err0:
 	trace_svcrdma_send_failed(rqstp, ret);
 	set_bit(XPT_CLOSE, &xprt->xpt_flags);


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v1 19/19] svcrdma: Remove unused svc_rdma_op_ctxt
  2018-05-07 19:26 [PATCH v1 00/19] NFS/RDMA server for-next Chuck Lever
                   ` (17 preceding siblings ...)
  2018-05-07 19:28 ` [PATCH v1 18/19] svcrdma: Persistently allocate and DMA-map Send buffers Chuck Lever
@ 2018-05-07 19:28 ` Chuck Lever
  18 siblings, 0 replies; 29+ messages in thread
From: Chuck Lever @ 2018-05-07 19:28 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

Clean up: Eliminate a structure that is no longer used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc_rdma.h |   21 ---------------------
 1 file changed, 21 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 96b14a7..fd78f78 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -71,26 +71,6 @@ enum {
 extern atomic_t rdma_stat_sq_poll;
 extern atomic_t rdma_stat_sq_prod;
 
-/*
- * Contexts are built when an RDMA request is created and are a
- * record of the resources that can be recovered when the request
- * completes.
- */
-struct svc_rdma_op_ctxt {
-	struct list_head list;
-	struct xdr_buf arg;
-	struct ib_cqe cqe;
-	u32 byte_len;
-	struct svcxprt_rdma *xprt;
-	enum dma_data_direction direction;
-	int count;
-	unsigned int mapped_sges;
-	int hdr_count;
-	struct ib_send_wr send_wr;
-	struct ib_sge sge[1 + RPCRDMA_MAX_INLINE_THRESH / PAGE_SIZE];
-	struct page *pages[RPCSVC_MAXPAGES];
-};
-
 struct svcxprt_rdma {
 	struct svc_xprt      sc_xprt;		/* SVC transport structure */
 	struct rdma_cm_id    *sc_cm_id;		/* RDMA connection id */
@@ -111,7 +91,6 @@ struct svcxprt_rdma {
 
 	spinlock_t	     sc_send_lock;
 	struct list_head     sc_send_ctxts;
-	int		     sc_ctxt_used;
 	spinlock_t	     sc_rw_ctxt_lock;
 	struct list_head     sc_rw_ctxts;
 


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v1 01/19] svcrdma: Add proper SPDX tags for NetApp-contributed source
  2018-05-07 19:26 ` [PATCH v1 01/19] svcrdma: Add proper SPDX tags for NetApp-contributed source Chuck Lever
@ 2018-05-09 20:23   ` J. Bruce Fields
  2018-05-09 20:42     ` Chuck Lever
  0 siblings, 1 reply; 29+ messages in thread
From: J. Bruce Fields @ 2018-05-09 20:23 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-rdma, linux-nfs

Looking at the git history, it looks like others are taking this as an
opportunity to replace the existing boilerplate.  Could we do this here?
Looks like "BSD-3-Clause" does in fact refer to a license that's
word-for-word the same as the one included here (except for the name of
the copyright holder), so I wonder if we need it written out here any
more.
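
For instance (just an illustration, not a patch), the top of each of
these files could in principle shrink to:

	/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
	/*
	 * Copyright (c) 2005-2006 Network Appliance, Inc. All rights reserved.
	 */

with the full license text itself dropped, since the tag names the
same license word for word.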

(Minor point, I'm applying this anyway and you can follow up with the
removal patch or not.)

--b.

On Mon, May 07, 2018 at 03:26:55PM -0400, Chuck Lever wrote:
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  include/linux/sunrpc/svc_rdma.h          |    1 +
>  net/sunrpc/xprtrdma/svc_rdma.c           |    1 +
>  net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |    1 +
>  net/sunrpc/xprtrdma/svc_rdma_sendto.c    |    1 +
>  net/sunrpc/xprtrdma/svc_rdma_transport.c |    1 +
>  5 files changed, 5 insertions(+)
> 
> diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
> index 7337e12..88da0c9 100644
> --- a/include/linux/sunrpc/svc_rdma.h
> +++ b/include/linux/sunrpc/svc_rdma.h
> @@ -1,3 +1,4 @@
> +/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
>  /*
>   * Copyright (c) 2005-2006 Network Appliance, Inc. All rights reserved.
>   *
> diff --git a/net/sunrpc/xprtrdma/svc_rdma.c b/net/sunrpc/xprtrdma/svc_rdma.c
> index dd8a431..a490532 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma.c
> @@ -1,3 +1,4 @@
> +// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
>  /*
>   * Copyright (c) 2005-2006 Network Appliance, Inc. All rights reserved.
>   *
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> index 3d45015..9eae95d 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> @@ -1,3 +1,4 @@
> +// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
>  /*
>   * Copyright (c) 2016, 2017 Oracle. All rights reserved.
>   * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> index 649441d..79bd3a3 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> @@ -1,3 +1,4 @@
> +// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
>  /*
>   * Copyright (c) 2016 Oracle. All rights reserved.
>   * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> index 96cc8f6..3633254 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> @@ -1,3 +1,4 @@
> +// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
>  /*
>   * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
>   * Copyright (c) 2005-2007 Network Appliance, Inc. All rights reserved.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v1 01/19] svcrdma: Add proper SPDX tags for NetApp-contributed source
  2018-05-09 20:23   ` J. Bruce Fields
@ 2018-05-09 20:42     ` Chuck Lever
  2018-05-15 14:52       ` Doug Ledford
  0 siblings, 1 reply; 29+ messages in thread
From: Chuck Lever @ 2018-05-09 20:42 UTC (permalink / raw)
  To: Bruce Fields; +Cc: linux-rdma, Linux NFS Mailing List



> On May 9, 2018, at 4:23 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> 
> Looking at the git history, it looks like others are taking this as an
> opportunity to replace the existing boilerplate.  Could we do this here?
> Looks like "BSD-3-Clause" does in fact refer to a license that's
> word-for-word the same as the one included here (except for the name of
> the copyright holder), so I wonder if we need it written out here any
> more.
> 
> (Minor point, I'm applying this anyway and you can follow up with the
> removal patch or not.)

Because the holder's name is part of the boilerplate, I feel
that it is up to the holder to submit a patch removing the
whole copyright notice if they are comfortable doing that.

But IANAL. Feel free to convince me that I'm being prudish.


> --b.
> 
> On Mon, May 07, 2018 at 03:26:55PM -0400, Chuck Lever wrote:
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>> include/linux/sunrpc/svc_rdma.h          |    1 +
>> net/sunrpc/xprtrdma/svc_rdma.c           |    1 +
>> net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |    1 +
>> net/sunrpc/xprtrdma/svc_rdma_sendto.c    |    1 +
>> net/sunrpc/xprtrdma/svc_rdma_transport.c |    1 +
>> 5 files changed, 5 insertions(+)
>> 
>> diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
>> index 7337e12..88da0c9 100644
>> --- a/include/linux/sunrpc/svc_rdma.h
>> +++ b/include/linux/sunrpc/svc_rdma.h
>> @@ -1,3 +1,4 @@
>> +/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
>> /*
>>  * Copyright (c) 2005-2006 Network Appliance, Inc. All rights reserved.
>>  *
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma.c b/net/sunrpc/xprtrdma/svc_rdma.c
>> index dd8a431..a490532 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma.c
>> @@ -1,3 +1,4 @@
>> +// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
>> /*
>>  * Copyright (c) 2005-2006 Network Appliance, Inc. All rights reserved.
>>  *
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> index 3d45015..9eae95d 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> @@ -1,3 +1,4 @@
>> +// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
>> /*
>>  * Copyright (c) 2016, 2017 Oracle. All rights reserved.
>>  * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
>> index 649441d..79bd3a3 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
>> @@ -1,3 +1,4 @@
>> +// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
>> /*
>>  * Copyright (c) 2016 Oracle. All rights reserved.
>>  * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> index 96cc8f6..3633254 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> @@ -1,3 +1,4 @@
>> +// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
>> /*
>>  * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
>>  * Copyright (c) 2005-2007 Network Appliance, Inc. All rights reserved.

--
Chuck Lever




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v1 06/19] svcrdma: Introduce svc_rdma_recv_ctxt
  2018-05-07 19:27 ` [PATCH v1 06/19] svcrdma: Introduce svc_rdma_recv_ctxt Chuck Lever
@ 2018-05-09 20:48   ` J. Bruce Fields
  2018-05-09 21:02     ` Chuck Lever
  0 siblings, 1 reply; 29+ messages in thread
From: J. Bruce Fields @ 2018-05-09 20:48 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-rdma, linux-nfs

On Mon, May 07, 2018 at 03:27:21PM -0400, Chuck Lever wrote:
> svc_rdma_op_ctxt's are pre-allocated and maintained on a per-xprt
> free list. This eliminates the overhead of calling kmalloc / kfree,
> both of which grab a globally shared lock that disables interrupts.
> To reduce contention further, separate the use of these objects in
> the Receive and Send paths in svcrdma.
> 
> Subsequent patches will take advantage of this separation by
> allocating real resources which are then cached in these objects.
> The allocations are freed when the transport is torn down.

Out of curiosity, about how much memory does that end up being per
svc_xprt?

--b.

> 
> I've renamed the structure so that static type checking can be used
> to ensure that uses of op_ctxt and recv_ctxt are not confused. As an
> additional clean up, structure fields are renamed to conform with
> kernel coding conventions.
> 
> As a final clean up, helpers related to recv_ctxt are moved closer
> to the functions that use them.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  include/linux/sunrpc/svc_rdma.h          |   24 ++
>  net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |  318 ++++++++++++++++++++++++++----
>  net/sunrpc/xprtrdma/svc_rdma_rw.c        |   84 ++++----
>  net/sunrpc/xprtrdma/svc_rdma_sendto.c    |    2 
>  net/sunrpc/xprtrdma/svc_rdma_transport.c |  142 +------------
>  5 files changed, 349 insertions(+), 221 deletions(-)
> 
> diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
> index 88da0c9..37f759d 100644
> --- a/include/linux/sunrpc/svc_rdma.h
> +++ b/include/linux/sunrpc/svc_rdma.h
> @@ -128,6 +128,9 @@ struct svcxprt_rdma {
>  	unsigned long	     sc_flags;
>  	struct list_head     sc_read_complete_q;
>  	struct work_struct   sc_work;
> +
> +	spinlock_t	     sc_recv_lock;
> +	struct list_head     sc_recv_ctxts;
>  };
>  /* sc_flags */
>  #define RDMAXPRT_CONN_PENDING	3
> @@ -142,6 +145,19 @@ struct svcxprt_rdma {
>  
>  #define RPCSVC_MAXPAYLOAD_RDMA	RPCSVC_MAXPAYLOAD
>  
> +struct svc_rdma_recv_ctxt {
> +	struct list_head	rc_list;
> +	struct ib_recv_wr	rc_recv_wr;
> +	struct ib_cqe		rc_cqe;
> +	struct xdr_buf		rc_arg;
> +	u32			rc_byte_len;
> +	unsigned int		rc_page_count;
> +	unsigned int		rc_hdr_count;
> +	struct ib_sge		rc_sges[1 +
> +					RPCRDMA_MAX_INLINE_THRESH / PAGE_SIZE];
> +	struct page		*rc_pages[RPCSVC_MAXPAGES];
> +};
> +
>  /* Track DMA maps for this transport and context */
>  static inline void svc_rdma_count_mappings(struct svcxprt_rdma *rdma,
>  					   struct svc_rdma_op_ctxt *ctxt)
> @@ -155,13 +171,19 @@ extern int svc_rdma_handle_bc_reply(struct rpc_xprt *xprt,
>  				    struct xdr_buf *rcvbuf);
>  
>  /* svc_rdma_recvfrom.c */
> +extern void svc_rdma_recv_ctxts_destroy(struct svcxprt_rdma *rdma);
> +extern bool svc_rdma_post_recvs(struct svcxprt_rdma *rdma);
> +extern void svc_rdma_recv_ctxt_put(struct svcxprt_rdma *rdma,
> +				   struct svc_rdma_recv_ctxt *ctxt,
> +				   int free_pages);
> +extern void svc_rdma_flush_recv_queues(struct svcxprt_rdma *rdma);
>  extern int svc_rdma_recvfrom(struct svc_rqst *);
>  
>  /* svc_rdma_rw.c */
>  extern void svc_rdma_destroy_rw_ctxts(struct svcxprt_rdma *rdma);
>  extern int svc_rdma_recv_read_chunk(struct svcxprt_rdma *rdma,
>  				    struct svc_rqst *rqstp,
> -				    struct svc_rdma_op_ctxt *head, __be32 *p);
> +				    struct svc_rdma_recv_ctxt *head, __be32 *p);
>  extern int svc_rdma_send_write_chunk(struct svcxprt_rdma *rdma,
>  				     __be32 *wr_ch, struct xdr_buf *xdr);
>  extern int svc_rdma_send_reply_chunk(struct svcxprt_rdma *rdma,
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> index 330d542..b7d9c55 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> @@ -1,6 +1,6 @@
>  // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
>  /*
> - * Copyright (c) 2016, 2017 Oracle. All rights reserved.
> + * Copyright (c) 2016-2018 Oracle. All rights reserved.
>   * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
>   * Copyright (c) 2005-2006 Network Appliance, Inc. All rights reserved.
>   *
> @@ -61,7 +61,7 @@
>   * svc_rdma_recvfrom must post RDMA Reads to pull the RPC Call's
>   * data payload from the client. svc_rdma_recvfrom sets up the
>   * RDMA Reads using pages in svc_rqst::rq_pages, which are
> - * transferred to an svc_rdma_op_ctxt for the duration of the
> + * transferred to an svc_rdma_recv_ctxt for the duration of the
>   * I/O. svc_rdma_recvfrom then returns zero, since the RPC message
>   * is still not yet ready.
>   *
> @@ -70,18 +70,18 @@
>   * svc_rdma_recvfrom again. This second call may use a different
>   * svc_rqst than the first one, thus any information that needs
>   * to be preserved across these two calls is kept in an
> - * svc_rdma_op_ctxt.
> + * svc_rdma_recv_ctxt.
>   *
>   * The second call to svc_rdma_recvfrom performs final assembly
>   * of the RPC Call message, using the RDMA Read sink pages kept in
> - * the svc_rdma_op_ctxt. The xdr_buf is copied from the
> - * svc_rdma_op_ctxt to the second svc_rqst. The second call returns
> + * the svc_rdma_recv_ctxt. The xdr_buf is copied from the
> + * svc_rdma_recv_ctxt to the second svc_rqst. The second call returns
>   * the length of the completed RPC Call message.
>   *
>   * Page Management
>   *
>   * Pages under I/O must be transferred from the first svc_rqst to an
> - * svc_rdma_op_ctxt before the first svc_rdma_recvfrom call returns.
> + * svc_rdma_recv_ctxt before the first svc_rdma_recvfrom call returns.
>   *
>   * The first svc_rqst supplies pages for RDMA Reads. These are moved
>   * from rqstp::rq_pages into ctxt::pages. The consumed elements of
> @@ -89,7 +89,7 @@
>   * svc_rdma_recvfrom call returns.
>   *
>   * During the second svc_rdma_recvfrom call, RDMA Read sink pages
> - * are transferred from the svc_rdma_op_ctxt to the second svc_rqst
> + * are transferred from the svc_rdma_recv_ctxt to the second svc_rqst
>   * (see rdma_read_complete() below).
>   */
>  
> @@ -108,13 +108,247 @@
>  
>  #define RPCDBG_FACILITY	RPCDBG_SVCXPRT
>  
> +static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc);
> +
> +static inline struct svc_rdma_recv_ctxt *
> +svc_rdma_next_recv_ctxt(struct list_head *list)
> +{
> +	return list_first_entry_or_null(list, struct svc_rdma_recv_ctxt,
> +					rc_list);
> +}
> +
> +/**
> + * svc_rdma_recv_ctxts_destroy - Release all recv_ctxt's for an xprt
> + * @rdma: svcxprt_rdma being torn down
> + *
> + */
> +void svc_rdma_recv_ctxts_destroy(struct svcxprt_rdma *rdma)
> +{
> +	struct svc_rdma_recv_ctxt *ctxt;
> +
> +	while ((ctxt = svc_rdma_next_recv_ctxt(&rdma->sc_recv_ctxts))) {
> +		list_del(&ctxt->rc_list);
> +		kfree(ctxt);
> +	}
> +}
> +
> +static struct svc_rdma_recv_ctxt *
> +svc_rdma_recv_ctxt_get(struct svcxprt_rdma *rdma)
> +{
> +	struct svc_rdma_recv_ctxt *ctxt;
> +
> +	spin_lock(&rdma->sc_recv_lock);
> +	ctxt = svc_rdma_next_recv_ctxt(&rdma->sc_recv_ctxts);
> +	if (!ctxt)
> +		goto out_empty;
> +	list_del(&ctxt->rc_list);
> +	spin_unlock(&rdma->sc_recv_lock);
> +
> +out:
> +	ctxt->rc_recv_wr.num_sge = 0;
> +	ctxt->rc_page_count = 0;
> +	return ctxt;
> +
> +out_empty:
> +	spin_unlock(&rdma->sc_recv_lock);
> +
> +	ctxt = kmalloc(sizeof(*ctxt), GFP_KERNEL);
> +	if (!ctxt)
> +		return NULL;
> +	goto out;
> +}
> +
> +static void svc_rdma_recv_ctxt_unmap(struct svcxprt_rdma *rdma,
> +				     struct svc_rdma_recv_ctxt *ctxt)
> +{
> +	struct ib_device *device = rdma->sc_cm_id->device;
> +	int i;
> +
> +	for (i = 0; i < ctxt->rc_recv_wr.num_sge; i++)
> +		ib_dma_unmap_page(device,
> +				  ctxt->rc_sges[i].addr,
> +				  ctxt->rc_sges[i].length,
> +				  DMA_FROM_DEVICE);
> +}
> +
> +/**
> + * svc_rdma_recv_ctxt_put - Return recv_ctxt to free list
> + * @rdma: controlling svcxprt_rdma
> + * @ctxt: object to return to the free list
> + * @free_pages: Non-zero if rc_pages should be freed
> + *
> + */
> +void svc_rdma_recv_ctxt_put(struct svcxprt_rdma *rdma,
> +			    struct svc_rdma_recv_ctxt *ctxt,
> +			    int free_pages)
> +{
> +	unsigned int i;
> +
> +	if (free_pages)
> +		for (i = 0; i < ctxt->rc_page_count; i++)
> +			put_page(ctxt->rc_pages[i]);
> +	spin_lock(&rdma->sc_recv_lock);
> +	list_add(&ctxt->rc_list, &rdma->sc_recv_ctxts);
> +	spin_unlock(&rdma->sc_recv_lock);
> +}
> +
> +static int svc_rdma_post_recv(struct svcxprt_rdma *rdma)
> +{
> +	struct ib_device *device = rdma->sc_cm_id->device;
> +	struct svc_rdma_recv_ctxt *ctxt;
> +	struct ib_recv_wr *bad_recv_wr;
> +	int sge_no, buflen, ret;
> +	struct page *page;
> +	dma_addr_t pa;
> +
> +	ctxt = svc_rdma_recv_ctxt_get(rdma);
> +	if (!ctxt)
> +		return -ENOMEM;
> +
> +	buflen = 0;
> +	ctxt->rc_cqe.done = svc_rdma_wc_receive;
> +	for (sge_no = 0; buflen < rdma->sc_max_req_size; sge_no++) {
> +		if (sge_no >= rdma->sc_max_sge) {
> +			pr_err("svcrdma: Too many sges (%d)\n", sge_no);
> +			goto err_put_ctxt;
> +		}
> +
> +		page = alloc_page(GFP_KERNEL);
> +		if (!page)
> +			goto err_put_ctxt;
> +		ctxt->rc_pages[sge_no] = page;
> +		ctxt->rc_page_count++;
> +
> +		pa = ib_dma_map_page(device, ctxt->rc_pages[sge_no],
> +				     0, PAGE_SIZE, DMA_FROM_DEVICE);
> +		if (ib_dma_mapping_error(device, pa))
> +			goto err_put_ctxt;
> +		ctxt->rc_sges[sge_no].addr = pa;
> +		ctxt->rc_sges[sge_no].length = PAGE_SIZE;
> +		ctxt->rc_sges[sge_no].lkey = rdma->sc_pd->local_dma_lkey;
> +		ctxt->rc_recv_wr.num_sge++;
> +
> +		buflen += PAGE_SIZE;
> +	}
> +	ctxt->rc_recv_wr.next = NULL;
> +	ctxt->rc_recv_wr.sg_list = &ctxt->rc_sges[0];
> +	ctxt->rc_recv_wr.wr_cqe = &ctxt->rc_cqe;
> +
> +	svc_xprt_get(&rdma->sc_xprt);
> +	ret = ib_post_recv(rdma->sc_qp, &ctxt->rc_recv_wr, &bad_recv_wr);
> +	trace_svcrdma_post_recv(&ctxt->rc_recv_wr, ret);
> +	if (ret)
> +		goto err_post;
> +	return 0;
> +
> +err_put_ctxt:
> +	svc_rdma_recv_ctxt_unmap(rdma, ctxt);
> +	svc_rdma_recv_ctxt_put(rdma, ctxt, 1);
> +	return -ENOMEM;
> +err_post:
> +	svc_rdma_recv_ctxt_unmap(rdma, ctxt);
> +	svc_rdma_recv_ctxt_put(rdma, ctxt, 1);
> +	svc_xprt_put(&rdma->sc_xprt);
> +	return ret;
> +}
> +
> +/**
> + * svc_rdma_post_recvs - Post initial set of Recv WRs
> + * @rdma: fresh svcxprt_rdma
> + *
> + * Returns true if successful, otherwise false.
> + */
> +bool svc_rdma_post_recvs(struct svcxprt_rdma *rdma)
> +{
> +	unsigned int i;
> +	int ret;
> +
> +	for (i = 0; i < rdma->sc_max_requests; i++) {
> +		ret = svc_rdma_post_recv(rdma);
> +		if (ret) {
> +			pr_err("svcrdma: failure posting recv buffers: %d\n",
> +			       ret);
> +			return false;
> +		}
> +	}
> +	return true;
> +}
> +
> +/**
> + * svc_rdma_wc_receive - Invoked by RDMA provider for each polled Receive WC
> + * @cq: Completion Queue context
> + * @wc: Work Completion object
> + *
> + * NB: The svc_xprt/svcxprt_rdma is pinned whenever it's possible that
> + * the Receive completion handler could be running.
> + */
> +static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
> +{
> +	struct svcxprt_rdma *rdma = cq->cq_context;
> +	struct ib_cqe *cqe = wc->wr_cqe;
> +	struct svc_rdma_recv_ctxt *ctxt;
> +
> +	trace_svcrdma_wc_receive(wc);
> +
> +	/* WARNING: Only wc->wr_cqe and wc->status are reliable */
> +	ctxt = container_of(cqe, struct svc_rdma_recv_ctxt, rc_cqe);
> +	svc_rdma_recv_ctxt_unmap(rdma, ctxt);
> +
> +	if (wc->status != IB_WC_SUCCESS)
> +		goto flushed;
> +
> +	if (svc_rdma_post_recv(rdma))
> +		goto post_err;
> +
> +	/* All wc fields are now known to be valid */
> +	ctxt->rc_byte_len = wc->byte_len;
> +	spin_lock(&rdma->sc_rq_dto_lock);
> +	list_add_tail(&ctxt->rc_list, &rdma->sc_rq_dto_q);
> +	spin_unlock(&rdma->sc_rq_dto_lock);
> +	set_bit(XPT_DATA, &rdma->sc_xprt.xpt_flags);
> +	if (!test_bit(RDMAXPRT_CONN_PENDING, &rdma->sc_flags))
> +		svc_xprt_enqueue(&rdma->sc_xprt);
> +	goto out;
> +
> +flushed:
> +	if (wc->status != IB_WC_WR_FLUSH_ERR)
> +		pr_err("svcrdma: Recv: %s (%u/0x%x)\n",
> +		       ib_wc_status_msg(wc->status),
> +		       wc->status, wc->vendor_err);
> +post_err:
> +	svc_rdma_recv_ctxt_put(rdma, ctxt, 1);
> +	set_bit(XPT_CLOSE, &rdma->sc_xprt.xpt_flags);
> +	svc_xprt_enqueue(&rdma->sc_xprt);
> +out:
> +	svc_xprt_put(&rdma->sc_xprt);
> +}
> +
> +/**
> + * svc_rdma_flush_recv_queues - Drain pending Receive work
> + * @rdma: svcxprt_rdma being shut down
> + *
> + */
> +void svc_rdma_flush_recv_queues(struct svcxprt_rdma *rdma)
> +{
> +	struct svc_rdma_recv_ctxt *ctxt;
> +
> +	while ((ctxt = svc_rdma_next_recv_ctxt(&rdma->sc_read_complete_q))) {
> +		list_del(&ctxt->rc_list);
> +		svc_rdma_recv_ctxt_put(rdma, ctxt, 1);
> +	}
> +	while ((ctxt = svc_rdma_next_recv_ctxt(&rdma->sc_rq_dto_q))) {
> +		list_del(&ctxt->rc_list);
> +		svc_rdma_recv_ctxt_put(rdma, ctxt, 1);
> +	}
> +}
> +
>  /*
>   * Replace the pages in the rq_argpages array with the pages from the SGE in
>   * the RDMA_RECV completion. The SGL should contain full pages up until the
>   * last one.
>   */
>  static void svc_rdma_build_arg_xdr(struct svc_rqst *rqstp,
> -				   struct svc_rdma_op_ctxt *ctxt)
> +				   struct svc_rdma_recv_ctxt *ctxt)
>  {
>  	struct page *page;
>  	int sge_no;
> @@ -123,30 +357,30 @@ static void svc_rdma_build_arg_xdr(struct svc_rqst *rqstp,
>  	/* The reply path assumes the Call's transport header resides
>  	 * in rqstp->rq_pages[0].
>  	 */
> -	page = ctxt->pages[0];
> +	page = ctxt->rc_pages[0];
>  	put_page(rqstp->rq_pages[0]);
>  	rqstp->rq_pages[0] = page;
>  
>  	/* Set up the XDR head */
>  	rqstp->rq_arg.head[0].iov_base = page_address(page);
>  	rqstp->rq_arg.head[0].iov_len =
> -		min_t(size_t, ctxt->byte_len, ctxt->sge[0].length);
> -	rqstp->rq_arg.len = ctxt->byte_len;
> -	rqstp->rq_arg.buflen = ctxt->byte_len;
> +		min_t(size_t, ctxt->rc_byte_len, ctxt->rc_sges[0].length);
> +	rqstp->rq_arg.len = ctxt->rc_byte_len;
> +	rqstp->rq_arg.buflen = ctxt->rc_byte_len;
>  
>  	/* Compute bytes past head in the SGL */
> -	len = ctxt->byte_len - rqstp->rq_arg.head[0].iov_len;
> +	len = ctxt->rc_byte_len - rqstp->rq_arg.head[0].iov_len;
>  
>  	/* If data remains, store it in the pagelist */
>  	rqstp->rq_arg.page_len = len;
>  	rqstp->rq_arg.page_base = 0;
>  
>  	sge_no = 1;
> -	while (len && sge_no < ctxt->count) {
> -		page = ctxt->pages[sge_no];
> +	while (len && sge_no < ctxt->rc_recv_wr.num_sge) {
> +		page = ctxt->rc_pages[sge_no];
>  		put_page(rqstp->rq_pages[sge_no]);
>  		rqstp->rq_pages[sge_no] = page;
> -		len -= min_t(u32, len, ctxt->sge[sge_no].length);
> +		len -= min_t(u32, len, ctxt->rc_sges[sge_no].length);
>  		sge_no++;
>  	}
>  	rqstp->rq_respages = &rqstp->rq_pages[sge_no];
> @@ -154,11 +388,11 @@ static void svc_rdma_build_arg_xdr(struct svc_rqst *rqstp,
>  
>  	/* If not all pages were used from the SGL, free the remaining ones */
>  	len = sge_no;
> -	while (sge_no < ctxt->count) {
> -		page = ctxt->pages[sge_no++];
> +	while (sge_no < ctxt->rc_recv_wr.num_sge) {
> +		page = ctxt->rc_pages[sge_no++];
>  		put_page(page);
>  	}
> -	ctxt->count = len;
> +	ctxt->rc_page_count = len;
>  
>  	/* Set up tail */
>  	rqstp->rq_arg.tail[0].iov_base = NULL;
> @@ -364,29 +598,29 @@ static int svc_rdma_xdr_decode_req(struct xdr_buf *rq_arg)
>  }
>  
>  static void rdma_read_complete(struct svc_rqst *rqstp,
> -			       struct svc_rdma_op_ctxt *head)
> +			       struct svc_rdma_recv_ctxt *head)
>  {
>  	int page_no;
>  
>  	/* Copy RPC pages */
> -	for (page_no = 0; page_no < head->count; page_no++) {
> +	for (page_no = 0; page_no < head->rc_page_count; page_no++) {
>  		put_page(rqstp->rq_pages[page_no]);
> -		rqstp->rq_pages[page_no] = head->pages[page_no];
> +		rqstp->rq_pages[page_no] = head->rc_pages[page_no];
>  	}
>  
>  	/* Point rq_arg.pages past header */
> -	rqstp->rq_arg.pages = &rqstp->rq_pages[head->hdr_count];
> -	rqstp->rq_arg.page_len = head->arg.page_len;
> +	rqstp->rq_arg.pages = &rqstp->rq_pages[head->rc_hdr_count];
> +	rqstp->rq_arg.page_len = head->rc_arg.page_len;
>  
>  	/* rq_respages starts after the last arg page */
>  	rqstp->rq_respages = &rqstp->rq_pages[page_no];
>  	rqstp->rq_next_page = rqstp->rq_respages + 1;
>  
>  	/* Rebuild rq_arg head and tail. */
> -	rqstp->rq_arg.head[0] = head->arg.head[0];
> -	rqstp->rq_arg.tail[0] = head->arg.tail[0];
> -	rqstp->rq_arg.len = head->arg.len;
> -	rqstp->rq_arg.buflen = head->arg.buflen;
> +	rqstp->rq_arg.head[0] = head->rc_arg.head[0];
> +	rqstp->rq_arg.tail[0] = head->rc_arg.tail[0];
> +	rqstp->rq_arg.len = head->rc_arg.len;
> +	rqstp->rq_arg.buflen = head->rc_arg.buflen;
>  }
>  
>  static void svc_rdma_send_error(struct svcxprt_rdma *xprt,
> @@ -506,28 +740,26 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
>  	struct svc_xprt *xprt = rqstp->rq_xprt;
>  	struct svcxprt_rdma *rdma_xprt =
>  		container_of(xprt, struct svcxprt_rdma, sc_xprt);
> -	struct svc_rdma_op_ctxt *ctxt;
> +	struct svc_rdma_recv_ctxt *ctxt;
>  	__be32 *p;
>  	int ret;
>  
>  	spin_lock(&rdma_xprt->sc_rq_dto_lock);
> -	if (!list_empty(&rdma_xprt->sc_read_complete_q)) {
> -		ctxt = list_first_entry(&rdma_xprt->sc_read_complete_q,
> -					struct svc_rdma_op_ctxt, list);
> -		list_del(&ctxt->list);
> +	ctxt = svc_rdma_next_recv_ctxt(&rdma_xprt->sc_read_complete_q);
> +	if (ctxt) {
> +		list_del(&ctxt->rc_list);
>  		spin_unlock(&rdma_xprt->sc_rq_dto_lock);
>  		rdma_read_complete(rqstp, ctxt);
>  		goto complete;
> -	} else if (!list_empty(&rdma_xprt->sc_rq_dto_q)) {
> -		ctxt = list_first_entry(&rdma_xprt->sc_rq_dto_q,
> -					struct svc_rdma_op_ctxt, list);
> -		list_del(&ctxt->list);
> -	} else {
> +	}
> +	ctxt = svc_rdma_next_recv_ctxt(&rdma_xprt->sc_rq_dto_q);
> +	if (!ctxt) {
>  		/* No new incoming requests, terminate the loop */
>  		clear_bit(XPT_DATA, &xprt->xpt_flags);
>  		spin_unlock(&rdma_xprt->sc_rq_dto_lock);
>  		return 0;
>  	}
> +	list_del(&ctxt->rc_list);
>  	spin_unlock(&rdma_xprt->sc_rq_dto_lock);
>  
>  	atomic_inc(&rdma_stat_recv);
> @@ -545,7 +777,7 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
>  	if (svc_rdma_is_backchannel_reply(xprt, p)) {
>  		ret = svc_rdma_handle_bc_reply(xprt->xpt_bc_xprt, p,
>  					       &rqstp->rq_arg);
> -		svc_rdma_put_context(ctxt, 0);
> +		svc_rdma_recv_ctxt_put(rdma_xprt, ctxt, 0);
>  		return ret;
>  	}
>  
> @@ -554,7 +786,7 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
>  		goto out_readchunk;
>  
>  complete:
> -	svc_rdma_put_context(ctxt, 0);
> +	svc_rdma_recv_ctxt_put(rdma_xprt, ctxt, 0);
>  	rqstp->rq_prot = IPPROTO_MAX;
>  	svc_xprt_copy_addrs(rqstp, xprt);
>  	return rqstp->rq_arg.len;
> @@ -567,16 +799,16 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
>  
>  out_err:
>  	svc_rdma_send_error(rdma_xprt, p, ret);
> -	svc_rdma_put_context(ctxt, 0);
> +	svc_rdma_recv_ctxt_put(rdma_xprt, ctxt, 0);
>  	return 0;
>  
>  out_postfail:
>  	if (ret == -EINVAL)
>  		svc_rdma_send_error(rdma_xprt, p, ret);
> -	svc_rdma_put_context(ctxt, 1);
> +	svc_rdma_recv_ctxt_put(rdma_xprt, ctxt, 1);
>  	return ret;
>  
>  out_drop:
> -	svc_rdma_put_context(ctxt, 1);
> +	svc_rdma_recv_ctxt_put(rdma_xprt, ctxt, 1);
>  	return 0;
>  }
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
> index 887ceef..c080ce2 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
> @@ -1,6 +1,6 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /*
> - * Copyright (c) 2016 Oracle.  All rights reserved.
> + * Copyright (c) 2016-2018 Oracle.  All rights reserved.
>   *
>   * Use the core R/W API to move RPC-over-RDMA Read and Write chunks.
>   */
> @@ -227,7 +227,7 @@ static void svc_rdma_write_done(struct ib_cq *cq, struct ib_wc *wc)
>  /* State for pulling a Read chunk.
>   */
>  struct svc_rdma_read_info {
> -	struct svc_rdma_op_ctxt		*ri_readctxt;
> +	struct svc_rdma_recv_ctxt	*ri_readctxt;
>  	unsigned int			ri_position;
>  	unsigned int			ri_pageno;
>  	unsigned int			ri_pageoff;
> @@ -282,10 +282,10 @@ static void svc_rdma_wc_read_done(struct ib_cq *cq, struct ib_wc *wc)
>  			pr_err("svcrdma: read ctx: %s (%u/0x%x)\n",
>  			       ib_wc_status_msg(wc->status),
>  			       wc->status, wc->vendor_err);
> -		svc_rdma_put_context(info->ri_readctxt, 1);
> +		svc_rdma_recv_ctxt_put(rdma, info->ri_readctxt, 1);
>  	} else {
>  		spin_lock(&rdma->sc_rq_dto_lock);
> -		list_add_tail(&info->ri_readctxt->list,
> +		list_add_tail(&info->ri_readctxt->rc_list,
>  			      &rdma->sc_read_complete_q);
>  		spin_unlock(&rdma->sc_rq_dto_lock);
>  
> @@ -607,7 +607,7 @@ static int svc_rdma_build_read_segment(struct svc_rdma_read_info *info,
>  				       struct svc_rqst *rqstp,
>  				       u32 rkey, u32 len, u64 offset)
>  {
> -	struct svc_rdma_op_ctxt *head = info->ri_readctxt;
> +	struct svc_rdma_recv_ctxt *head = info->ri_readctxt;
>  	struct svc_rdma_chunk_ctxt *cc = &info->ri_cc;
>  	struct svc_rdma_rw_ctxt *ctxt;
>  	unsigned int sge_no, seg_len;
> @@ -625,10 +625,10 @@ static int svc_rdma_build_read_segment(struct svc_rdma_read_info *info,
>  		seg_len = min_t(unsigned int, len,
>  				PAGE_SIZE - info->ri_pageoff);
>  
> -		head->arg.pages[info->ri_pageno] =
> +		head->rc_arg.pages[info->ri_pageno] =
>  			rqstp->rq_pages[info->ri_pageno];
>  		if (!info->ri_pageoff)
> -			head->count++;
> +			head->rc_page_count++;
>  
>  		sg_set_page(sg, rqstp->rq_pages[info->ri_pageno],
>  			    seg_len, info->ri_pageoff);
> @@ -705,9 +705,9 @@ static int svc_rdma_build_read_chunk(struct svc_rqst *rqstp,
>  }
>  
>  /* Construct RDMA Reads to pull over a normal Read chunk. The chunk
> - * data lands in the page list of head->arg.pages.
> + * data lands in the page list of head->rc_arg.pages.
>   *
> - * Currently NFSD does not look at the head->arg.tail[0] iovec.
> + * Currently NFSD does not look at the head->rc_arg.tail[0] iovec.
>   * Therefore, XDR round-up of the Read chunk and trailing
>   * inline content must both be added at the end of the pagelist.
>   */
> @@ -715,10 +715,10 @@ static int svc_rdma_build_normal_read_chunk(struct svc_rqst *rqstp,
>  					    struct svc_rdma_read_info *info,
>  					    __be32 *p)
>  {
> -	struct svc_rdma_op_ctxt *head = info->ri_readctxt;
> +	struct svc_rdma_recv_ctxt *head = info->ri_readctxt;
>  	int ret;
>  
> -	info->ri_pageno = head->hdr_count;
> +	info->ri_pageno = head->rc_hdr_count;
>  	info->ri_pageoff = 0;
>  
>  	ret = svc_rdma_build_read_chunk(rqstp, info, p);
> @@ -732,11 +732,11 @@ static int svc_rdma_build_normal_read_chunk(struct svc_rqst *rqstp,
>  	 * chunk is not included in either the pagelist or in
>  	 * the tail.
>  	 */
> -	head->arg.tail[0].iov_base =
> -		head->arg.head[0].iov_base + info->ri_position;
> -	head->arg.tail[0].iov_len =
> -		head->arg.head[0].iov_len - info->ri_position;
> -	head->arg.head[0].iov_len = info->ri_position;
> +	head->rc_arg.tail[0].iov_base =
> +		head->rc_arg.head[0].iov_base + info->ri_position;
> +	head->rc_arg.tail[0].iov_len =
> +		head->rc_arg.head[0].iov_len - info->ri_position;
> +	head->rc_arg.head[0].iov_len = info->ri_position;
>  
>  	/* Read chunk may need XDR roundup (see RFC 8166, s. 3.4.5.2).
>  	 *
> @@ -749,9 +749,9 @@ static int svc_rdma_build_normal_read_chunk(struct svc_rqst *rqstp,
>  	 */
>  	info->ri_chunklen = XDR_QUADLEN(info->ri_chunklen) << 2;
>  
> -	head->arg.page_len = info->ri_chunklen;
> -	head->arg.len += info->ri_chunklen;
> -	head->arg.buflen += info->ri_chunklen;
> +	head->rc_arg.page_len = info->ri_chunklen;
> +	head->rc_arg.len += info->ri_chunklen;
> +	head->rc_arg.buflen += info->ri_chunklen;
>  
>  out:
>  	return ret;
> @@ -760,7 +760,7 @@ static int svc_rdma_build_normal_read_chunk(struct svc_rqst *rqstp,
>  /* Construct RDMA Reads to pull over a Position Zero Read chunk.
>   * The start of the data lands in the first page just after
>   * the Transport header, and the rest lands in the page list of
> - * head->arg.pages.
> + * head->rc_arg.pages.
>   *
>   * Assumptions:
>   *	- A PZRC has an XDR-aligned length (no implicit round-up).
> @@ -772,11 +772,11 @@ static int svc_rdma_build_pz_read_chunk(struct svc_rqst *rqstp,
>  					struct svc_rdma_read_info *info,
>  					__be32 *p)
>  {
> -	struct svc_rdma_op_ctxt *head = info->ri_readctxt;
> +	struct svc_rdma_recv_ctxt *head = info->ri_readctxt;
>  	int ret;
>  
> -	info->ri_pageno = head->hdr_count - 1;
> -	info->ri_pageoff = offset_in_page(head->byte_len);
> +	info->ri_pageno = head->rc_hdr_count - 1;
> +	info->ri_pageoff = offset_in_page(head->rc_byte_len);
>  
>  	ret = svc_rdma_build_read_chunk(rqstp, info, p);
>  	if (ret < 0)
> @@ -784,22 +784,22 @@ static int svc_rdma_build_pz_read_chunk(struct svc_rqst *rqstp,
>  
>  	trace_svcrdma_encode_pzr(info->ri_chunklen);
>  
> -	head->arg.len += info->ri_chunklen;
> -	head->arg.buflen += info->ri_chunklen;
> +	head->rc_arg.len += info->ri_chunklen;
> +	head->rc_arg.buflen += info->ri_chunklen;
>  
> -	if (head->arg.buflen <= head->sge[0].length) {
> +	if (head->rc_arg.buflen <= head->rc_sges[0].length) {
>  		/* Transport header and RPC message fit entirely
>  		 * in page where head iovec resides.
>  		 */
> -		head->arg.head[0].iov_len = info->ri_chunklen;
> +		head->rc_arg.head[0].iov_len = info->ri_chunklen;
>  	} else {
>  		/* Transport header and part of RPC message reside
>  		 * in the head iovec's page.
>  		 */
> -		head->arg.head[0].iov_len =
> -				head->sge[0].length - head->byte_len;
> -		head->arg.page_len =
> -				info->ri_chunklen - head->arg.head[0].iov_len;
> +		head->rc_arg.head[0].iov_len =
> +			head->rc_sges[0].length - head->rc_byte_len;
> +		head->rc_arg.page_len =
> +			info->ri_chunklen - head->rc_arg.head[0].iov_len;
>  	}
>  
>  out:
> @@ -824,24 +824,24 @@ static int svc_rdma_build_pz_read_chunk(struct svc_rqst *rqstp,
>   * - All Read segments in @p have the same Position value.
>   */
>  int svc_rdma_recv_read_chunk(struct svcxprt_rdma *rdma, struct svc_rqst *rqstp,
> -			     struct svc_rdma_op_ctxt *head, __be32 *p)
> +			     struct svc_rdma_recv_ctxt *head, __be32 *p)
>  {
>  	struct svc_rdma_read_info *info;
>  	struct page **page;
>  	int ret;
>  
>  	/* The request (with page list) is constructed in
> -	 * head->arg. Pages involved with RDMA Read I/O are
> +	 * head->rc_arg. Pages involved with RDMA Read I/O are
>  	 * transferred there.
>  	 */
> -	head->hdr_count = head->count;
> -	head->arg.head[0] = rqstp->rq_arg.head[0];
> -	head->arg.tail[0] = rqstp->rq_arg.tail[0];
> -	head->arg.pages = head->pages;
> -	head->arg.page_base = 0;
> -	head->arg.page_len = 0;
> -	head->arg.len = rqstp->rq_arg.len;
> -	head->arg.buflen = rqstp->rq_arg.buflen;
> +	head->rc_hdr_count = head->rc_page_count;
> +	head->rc_arg.head[0] = rqstp->rq_arg.head[0];
> +	head->rc_arg.tail[0] = rqstp->rq_arg.tail[0];
> +	head->rc_arg.pages = head->rc_pages;
> +	head->rc_arg.page_base = 0;
> +	head->rc_arg.page_len = 0;
> +	head->rc_arg.len = rqstp->rq_arg.len;
> +	head->rc_arg.buflen = rqstp->rq_arg.buflen;
>  
>  	info = svc_rdma_read_info_alloc(rdma);
>  	if (!info)
> @@ -867,7 +867,7 @@ int svc_rdma_recv_read_chunk(struct svcxprt_rdma *rdma, struct svc_rqst *rqstp,
>  
>  out:
>  	/* Read sink pages have been moved from rqstp->rq_pages to
> -	 * head->arg.pages. Force svc_recv to refill those slots
> +	 * head->rc_arg.pages. Force svc_recv to refill those slots
>  	 * in rq_pages.
>  	 */
>  	for (page = rqstp->rq_pages; page < rqstp->rq_respages; page++)
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> index fed28de..a397d9a 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> @@ -1,6 +1,6 @@
>  // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
>  /*
> - * Copyright (c) 2016 Oracle. All rights reserved.
> + * Copyright (c) 2016-2018 Oracle. All rights reserved.
>   * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
>   * Copyright (c) 2005-2006 Network Appliance, Inc. All rights reserved.
>   *
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> index ca9001d..afd5e61 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> @@ -63,7 +63,6 @@
>  
>  #define RPCDBG_FACILITY	RPCDBG_SVCXPRT
>  
> -static int svc_rdma_post_recv(struct svcxprt_rdma *xprt);
>  static struct svcxprt_rdma *svc_rdma_create_xprt(struct svc_serv *serv,
>  						 struct net *net);
>  static struct svc_xprt *svc_rdma_create(struct svc_serv *serv,
> @@ -175,11 +174,7 @@ static bool svc_rdma_prealloc_ctxts(struct svcxprt_rdma *xprt)
>  {
>  	unsigned int i;
>  
> -	/* Each RPC/RDMA credit can consume one Receive and
> -	 * one Send WQE at the same time.
> -	 */
> -	i = xprt->sc_sq_depth + xprt->sc_rq_depth;
> -
> +	i = xprt->sc_sq_depth;
>  	while (i--) {
>  		struct svc_rdma_op_ctxt *ctxt;
>  
> @@ -298,54 +293,6 @@ static void qp_event_handler(struct ib_event *event, void *context)
>  }
>  
>  /**
> - * svc_rdma_wc_receive - Invoked by RDMA provider for each polled Receive WC
> - * @cq:        completion queue
> - * @wc:        completed WR
> - *
> - */
> -static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
> -{
> -	struct svcxprt_rdma *xprt = cq->cq_context;
> -	struct ib_cqe *cqe = wc->wr_cqe;
> -	struct svc_rdma_op_ctxt *ctxt;
> -
> -	trace_svcrdma_wc_receive(wc);
> -
> -	/* WARNING: Only wc->wr_cqe and wc->status are reliable */
> -	ctxt = container_of(cqe, struct svc_rdma_op_ctxt, cqe);
> -	svc_rdma_unmap_dma(ctxt);
> -
> -	if (wc->status != IB_WC_SUCCESS)
> -		goto flushed;
> -
> -	/* All wc fields are now known to be valid */
> -	ctxt->byte_len = wc->byte_len;
> -	spin_lock(&xprt->sc_rq_dto_lock);
> -	list_add_tail(&ctxt->list, &xprt->sc_rq_dto_q);
> -	spin_unlock(&xprt->sc_rq_dto_lock);
> -
> -	svc_rdma_post_recv(xprt);
> -
> -	set_bit(XPT_DATA, &xprt->sc_xprt.xpt_flags);
> -	if (test_bit(RDMAXPRT_CONN_PENDING, &xprt->sc_flags))
> -		goto out;
> -	goto out_enqueue;
> -
> -flushed:
> -	if (wc->status != IB_WC_WR_FLUSH_ERR)
> -		pr_err("svcrdma: Recv: %s (%u/0x%x)\n",
> -		       ib_wc_status_msg(wc->status),
> -		       wc->status, wc->vendor_err);
> -	set_bit(XPT_CLOSE, &xprt->sc_xprt.xpt_flags);
> -	svc_rdma_put_context(ctxt, 1);
> -
> -out_enqueue:
> -	svc_xprt_enqueue(&xprt->sc_xprt);
> -out:
> -	svc_xprt_put(&xprt->sc_xprt);
> -}
> -
> -/**
>   * svc_rdma_wc_send - Invoked by RDMA provider for each polled Send WC
>   * @cq:        completion queue
>   * @wc:        completed WR
> @@ -392,12 +339,14 @@ static struct svcxprt_rdma *svc_rdma_create_xprt(struct svc_serv *serv,
>  	INIT_LIST_HEAD(&cma_xprt->sc_rq_dto_q);
>  	INIT_LIST_HEAD(&cma_xprt->sc_read_complete_q);
>  	INIT_LIST_HEAD(&cma_xprt->sc_ctxts);
> +	INIT_LIST_HEAD(&cma_xprt->sc_recv_ctxts);
>  	INIT_LIST_HEAD(&cma_xprt->sc_rw_ctxts);
>  	init_waitqueue_head(&cma_xprt->sc_send_wait);
>  
>  	spin_lock_init(&cma_xprt->sc_lock);
>  	spin_lock_init(&cma_xprt->sc_rq_dto_lock);
>  	spin_lock_init(&cma_xprt->sc_ctxt_lock);
> +	spin_lock_init(&cma_xprt->sc_recv_lock);
>  	spin_lock_init(&cma_xprt->sc_rw_ctxt_lock);
>  
>  	/*
> @@ -411,63 +360,6 @@ static struct svcxprt_rdma *svc_rdma_create_xprt(struct svc_serv *serv,
>  	return cma_xprt;
>  }
>  
> -static int
> -svc_rdma_post_recv(struct svcxprt_rdma *xprt)
> -{
> -	struct ib_recv_wr recv_wr, *bad_recv_wr;
> -	struct svc_rdma_op_ctxt *ctxt;
> -	struct page *page;
> -	dma_addr_t pa;
> -	int sge_no;
> -	int buflen;
> -	int ret;
> -
> -	ctxt = svc_rdma_get_context(xprt);
> -	buflen = 0;
> -	ctxt->direction = DMA_FROM_DEVICE;
> -	ctxt->cqe.done = svc_rdma_wc_receive;
> -	for (sge_no = 0; buflen < xprt->sc_max_req_size; sge_no++) {
> -		if (sge_no >= xprt->sc_max_sge) {
> -			pr_err("svcrdma: Too many sges (%d)\n", sge_no);
> -			goto err_put_ctxt;
> -		}
> -		page = alloc_page(GFP_KERNEL);
> -		if (!page)
> -			goto err_put_ctxt;
> -		ctxt->pages[sge_no] = page;
> -		pa = ib_dma_map_page(xprt->sc_cm_id->device,
> -				     page, 0, PAGE_SIZE,
> -				     DMA_FROM_DEVICE);
> -		if (ib_dma_mapping_error(xprt->sc_cm_id->device, pa))
> -			goto err_put_ctxt;
> -		svc_rdma_count_mappings(xprt, ctxt);
> -		ctxt->sge[sge_no].addr = pa;
> -		ctxt->sge[sge_no].length = PAGE_SIZE;
> -		ctxt->sge[sge_no].lkey = xprt->sc_pd->local_dma_lkey;
> -		ctxt->count = sge_no + 1;
> -		buflen += PAGE_SIZE;
> -	}
> -	recv_wr.next = NULL;
> -	recv_wr.sg_list = &ctxt->sge[0];
> -	recv_wr.num_sge = ctxt->count;
> -	recv_wr.wr_cqe = &ctxt->cqe;
> -
> -	svc_xprt_get(&xprt->sc_xprt);
> -	ret = ib_post_recv(xprt->sc_qp, &recv_wr, &bad_recv_wr);
> -	trace_svcrdma_post_recv(&recv_wr, ret);
> -	if (ret) {
> -		svc_rdma_unmap_dma(ctxt);
> -		svc_rdma_put_context(ctxt, 1);
> -		svc_xprt_put(&xprt->sc_xprt);
> -	}
> -	return ret;
> -
> - err_put_ctxt:
> -	svc_rdma_unmap_dma(ctxt);
> -	svc_rdma_put_context(ctxt, 1);
> -	return -ENOMEM;
> -}
> -
>  static void
>  svc_rdma_parse_connect_private(struct svcxprt_rdma *newxprt,
>  			       struct rdma_conn_param *param)
> @@ -699,7 +591,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
>  	struct ib_qp_init_attr qp_attr;
>  	struct ib_device *dev;
>  	struct sockaddr *sap;
> -	unsigned int i, ctxts;
> +	unsigned int ctxts;
>  	int ret = 0;
>  
>  	listen_rdma = container_of(xprt, struct svcxprt_rdma, sc_xprt);
> @@ -804,14 +696,8 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
>  	    !rdma_ib_or_roce(dev, newxprt->sc_port_num))
>  		goto errout;
>  
> -	/* Post receive buffers */
> -	for (i = 0; i < newxprt->sc_max_requests; i++) {
> -		ret = svc_rdma_post_recv(newxprt);
> -		if (ret) {
> -			dprintk("svcrdma: failure posting receive buffers\n");
> -			goto errout;
> -		}
> -	}
> +	if (!svc_rdma_post_recvs(newxprt))
> +		goto errout;
>  
>  	/* Swap out the handler */
>  	newxprt->sc_cm_id->event_handler = rdma_cma_handler;
> @@ -908,20 +794,7 @@ static void __svc_rdma_free(struct work_struct *work)
>  		pr_err("svcrdma: sc_xprt still in use? (%d)\n",
>  		       kref_read(&xprt->xpt_ref));
>  
> -	while (!list_empty(&rdma->sc_read_complete_q)) {
> -		struct svc_rdma_op_ctxt *ctxt;
> -		ctxt = list_first_entry(&rdma->sc_read_complete_q,
> -					struct svc_rdma_op_ctxt, list);
> -		list_del(&ctxt->list);
> -		svc_rdma_put_context(ctxt, 1);
> -	}
> -	while (!list_empty(&rdma->sc_rq_dto_q)) {
> -		struct svc_rdma_op_ctxt *ctxt;
> -		ctxt = list_first_entry(&rdma->sc_rq_dto_q,
> -					struct svc_rdma_op_ctxt, list);
> -		list_del(&ctxt->list);
> -		svc_rdma_put_context(ctxt, 1);
> -	}
> +	svc_rdma_flush_recv_queues(rdma);
>  
>  	/* Warn if we leaked a resource or under-referenced */
>  	if (rdma->sc_ctxt_used != 0)
> @@ -936,6 +809,7 @@ static void __svc_rdma_free(struct work_struct *work)
>  
>  	svc_rdma_destroy_rw_ctxts(rdma);
>  	svc_rdma_destroy_ctxts(rdma);
> +	svc_rdma_recv_ctxts_destroy(rdma);
>  
>  	/* Destroy the QP if present (not a listener) */
>  	if (rdma->sc_qp && !IS_ERR(rdma->sc_qp))

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v1 06/19] svcrdma: Introduce svc_rdma_recv_ctxt
  2018-05-09 20:48   ` J. Bruce Fields
@ 2018-05-09 21:02     ` Chuck Lever
  0 siblings, 0 replies; 29+ messages in thread
From: Chuck Lever @ 2018-05-09 21:02 UTC (permalink / raw)
  To: Bruce Fields; +Cc: linux-rdma, Linux NFS Mailing List



> On May 9, 2018, at 4:48 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Mon, May 07, 2018 at 03:27:21PM -0400, Chuck Lever wrote:
>> svc_rdma_op_ctxt's are pre-allocated and maintained on a per-xprt
>> free list. This eliminates the overhead of calling kmalloc / kfree,
>> both of which grab a globally shared lock that disables interrupts.
>> To reduce contention further, separate the use of these objects in
>> the Receive and Send paths in svcrdma.
>>
>> Subsequent patches will take advantage of this separation by
>> allocating real resources which are then cached in these objects.
>> The allocations are freed when the transport is torn down.
>
> Out of curiosity, about how much memory does that end up being per
> svc_xprt?

On the Receive side, the server keeps 32 Recv WRs posted by
default. This work does not change that.

Currently each Receive amounts to a svc_rdma_op_ctxt and a page.

After this patch series, it is a svc_rdma_recv_ctxt (about the
same size) and a kmalloc'd buffer (by default, 4096 bytes).

Assuming 64-bit x86, in each svc_rdma_recv_ctxt:
- The page array is 258 * 8 bytes = 2064 bytes
- The sge array is 17 * 16 bytes = 272 bytes

The rest of the structure is around 128 bytes. The sge array
goes away in this series. However, the allocator is going to
round up to the next power of two, or 4096 bytes.

32 * 2 pages = a quarter megabyte per xprt, with default settings.

I suppose some of this (like the page array) could be moved to
svc_rdma_read_info, which is kmalloc'd on demand (for NFS WRITEs).


> --b.
>=20
>>=20
>> I've renamed the structure so that static type checking can be used
>> to ensure that uses of op_ctxt and recv_ctxt are not confused. As an
>> additional clean up, structure fields are renamed to conform with
>> kernel coding conventions.
>>=20
>> As a final clean up, helpers related to recv_ctxt are moved closer
>> to the functions that use them.
>>=20
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>> include/linux/sunrpc/svc_rdma.h          |   24 ++
>> net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |  318 =
++++++++++++++++++++++++++----
>> net/sunrpc/xprtrdma/svc_rdma_rw.c        |   84 ++++----
>> net/sunrpc/xprtrdma/svc_rdma_sendto.c    |    2=20
>> net/sunrpc/xprtrdma/svc_rdma_transport.c |  142 +------------
>> 5 files changed, 349 insertions(+), 221 deletions(-)
>>=20
>> diff --git a/include/linux/sunrpc/svc_rdma.h =
b/include/linux/sunrpc/svc_rdma.h
>> index 88da0c9..37f759d 100644
>> --- a/include/linux/sunrpc/svc_rdma.h
>> +++ b/include/linux/sunrpc/svc_rdma.h
>> @@ -128,6 +128,9 @@ struct svcxprt_rdma {
>> 	unsigned long	     sc_flags;
>> 	struct list_head     sc_read_complete_q;
>> 	struct work_struct   sc_work;
>> +
>> +	spinlock_t	     sc_recv_lock;
>> +	struct list_head     sc_recv_ctxts;
>> };
>> /* sc_flags */
>> #define RDMAXPRT_CONN_PENDING	3
>> @@ -142,6 +145,19 @@ struct svcxprt_rdma {
>>=20
>> #define RPCSVC_MAXPAYLOAD_RDMA	RPCSVC_MAXPAYLOAD
>>=20
>> +struct svc_rdma_recv_ctxt {
>> +	struct list_head	rc_list;
>> +	struct ib_recv_wr	rc_recv_wr;
>> +	struct ib_cqe		rc_cqe;
>> +	struct xdr_buf		rc_arg;
>> +	u32			rc_byte_len;
>> +	unsigned int		rc_page_count;
>> +	unsigned int		rc_hdr_count;
>> +	struct ib_sge		rc_sges[1 +
>> +					RPCRDMA_MAX_INLINE_THRESH / =
PAGE_SIZE];
>> +	struct page		*rc_pages[RPCSVC_MAXPAGES];
>> +};
>> +
>> /* Track DMA maps for this transport and context */
>> static inline void svc_rdma_count_mappings(struct svcxprt_rdma *rdma,
>> 					   struct svc_rdma_op_ctxt =
*ctxt)
>> @@ -155,13 +171,19 @@ extern int svc_rdma_handle_bc_reply(struct =
rpc_xprt *xprt,
>> 				    struct xdr_buf *rcvbuf);
>>=20
>> /* svc_rdma_recvfrom.c */
>> +extern void svc_rdma_recv_ctxts_destroy(struct svcxprt_rdma *rdma);
>> +extern bool svc_rdma_post_recvs(struct svcxprt_rdma *rdma);
>> +extern void svc_rdma_recv_ctxt_put(struct svcxprt_rdma *rdma,
>> +				   struct svc_rdma_recv_ctxt *ctxt,
>> +				   int free_pages);
>> +extern void svc_rdma_flush_recv_queues(struct svcxprt_rdma *rdma);
>> extern int svc_rdma_recvfrom(struct svc_rqst *);
>>=20
>> /* svc_rdma_rw.c */
>> extern void svc_rdma_destroy_rw_ctxts(struct svcxprt_rdma *rdma);
>> extern int svc_rdma_recv_read_chunk(struct svcxprt_rdma *rdma,
>> 				    struct svc_rqst *rqstp,
>> -				    struct svc_rdma_op_ctxt *head, =
__be32 *p);
>> +				    struct svc_rdma_recv_ctxt *head, =
__be32 *p);
>> extern int svc_rdma_send_write_chunk(struct svcxprt_rdma *rdma,
>> 				     __be32 *wr_ch, struct xdr_buf =
*xdr);
>> extern int svc_rdma_send_reply_chunk(struct svcxprt_rdma *rdma,
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c =
b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> index 330d542..b7d9c55 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> @@ -1,6 +1,6 @@
>> // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
>> /*
>> - * Copyright (c) 2016, 2017 Oracle. All rights reserved.
>> + * Copyright (c) 2016-2018 Oracle. All rights reserved.
>>  * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
>>  * Copyright (c) 2005-2006 Network Appliance, Inc. All rights =
reserved.
>>  *
>> @@ -61,7 +61,7 @@
>>  * svc_rdma_recvfrom must post RDMA Reads to pull the RPC Call's
>>  * data payload from the client. svc_rdma_recvfrom sets up the
>>  * RDMA Reads using pages in svc_rqst::rq_pages, which are
>> - * transferred to an svc_rdma_op_ctxt for the duration of the
>> + * transferred to an svc_rdma_recv_ctxt for the duration of the
>>  * I/O. svc_rdma_recvfrom then returns zero, since the RPC message
>>  * is still not yet ready.
>>  *
>> @@ -70,18 +70,18 @@
>>  * svc_rdma_recvfrom again. This second call may use a different
>>  * svc_rqst than the first one, thus any information that needs
>>  * to be preserved across these two calls is kept in an
>> - * svc_rdma_op_ctxt.
>> + * svc_rdma_recv_ctxt.
>>  *
>>  * The second call to svc_rdma_recvfrom performs final assembly
>>  * of the RPC Call message, using the RDMA Read sink pages kept in
>> - * the svc_rdma_op_ctxt. The xdr_buf is copied from the
>> - * svc_rdma_op_ctxt to the second svc_rqst. The second call returns
>> + * the svc_rdma_recv_ctxt. The xdr_buf is copied from the
>> + * svc_rdma_recv_ctxt to the second svc_rqst. The second call =
returns
>>  * the length of the completed RPC Call message.
>>  *
>>  * Page Management
>>  *
>>  * Pages under I/O must be transferred from the first svc_rqst to an
>> - * svc_rdma_op_ctxt before the first svc_rdma_recvfrom call returns.
>> + * svc_rdma_recv_ctxt before the first svc_rdma_recvfrom call =
returns.
>>  *
>>  * The first svc_rqst supplies pages for RDMA Reads. These are moved
>>  * from rqstp::rq_pages into ctxt::pages. The consumed elements of
>> @@ -89,7 +89,7 @@
>>  * svc_rdma_recvfrom call returns.
>>  *
>>  * During the second svc_rdma_recvfrom call, RDMA Read sink pages
>> - * are transferred from the svc_rdma_op_ctxt to the second svc_rqst
>> + * are transferred from the svc_rdma_recv_ctxt to the second =
svc_rqst
>>  * (see rdma_read_complete() below).
>>  */
>>=20
>> @@ -108,13 +108,247 @@
>>=20
>> #define RPCDBG_FACILITY	RPCDBG_SVCXPRT
>>=20
>> +static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc);
>> +
>> +static inline struct svc_rdma_recv_ctxt *
>> +svc_rdma_next_recv_ctxt(struct list_head *list)
>> +{
>> +	return list_first_entry_or_null(list, struct svc_rdma_recv_ctxt,
>> +					rc_list);
>> +}
>> +
>> +/**
>> + * svc_rdma_recv_ctxts_destroy - Release all recv_ctxt's for an xprt
>> + * @rdma: svcxprt_rdma being torn down
>> + *
>> + */
>> +void svc_rdma_recv_ctxts_destroy(struct svcxprt_rdma *rdma)
>> +{
>> +	struct svc_rdma_recv_ctxt *ctxt;
>> +
>> +	while ((ctxt =3D svc_rdma_next_recv_ctxt(&rdma->sc_recv_ctxts))) =
{
>> +		list_del(&ctxt->rc_list);
>> +		kfree(ctxt);
>> +	}
>> +}
>> +
>> +static struct svc_rdma_recv_ctxt *
>> +svc_rdma_recv_ctxt_get(struct svcxprt_rdma *rdma)
>> +{
>> +	struct svc_rdma_recv_ctxt *ctxt;
>> +
>> +	spin_lock(&rdma->sc_recv_lock);
>> +	ctxt =3D svc_rdma_next_recv_ctxt(&rdma->sc_recv_ctxts);
>> +	if (!ctxt)
>> +		goto out_empty;
>> +	list_del(&ctxt->rc_list);
>> +	spin_unlock(&rdma->sc_recv_lock);
>> +
>> +out:
>> +	ctxt->rc_recv_wr.num_sge =3D 0;
>> +	ctxt->rc_page_count =3D 0;
>> +	return ctxt;
>> +
>> +out_empty:
>> +	spin_unlock(&rdma->sc_recv_lock);
>> +
>> +	ctxt =3D kmalloc(sizeof(*ctxt), GFP_KERNEL);
>> +	if (!ctxt)
>> +		return NULL;
>> +	goto out;
>> +}
>> +
>> +static void svc_rdma_recv_ctxt_unmap(struct svcxprt_rdma *rdma,
>> +				     struct svc_rdma_recv_ctxt *ctxt)
>> +{
>> +	struct ib_device *device =3D rdma->sc_cm_id->device;
>> +	int i;
>> +
>> +	for (i =3D 0; i < ctxt->rc_recv_wr.num_sge; i++)
>> +		ib_dma_unmap_page(device,
>> +				  ctxt->rc_sges[i].addr,
>> +				  ctxt->rc_sges[i].length,
>> +				  DMA_FROM_DEVICE);
>> +}
>> +
>> +/**
>> + * svc_rdma_recv_ctxt_put - Return recv_ctxt to free list
>> + * @rdma: controlling svcxprt_rdma
>> + * @ctxt: object to return to the free list
>> + * @free_pages: Non-zero if rc_pages should be freed
>> + *
>> + */
>> +void svc_rdma_recv_ctxt_put(struct svcxprt_rdma *rdma,
>> +			    struct svc_rdma_recv_ctxt *ctxt,
>> +			    int free_pages)
>> +{
>> +	unsigned int i;
>> +
>> +	if (free_pages)
>> +		for (i =3D 0; i < ctxt->rc_page_count; i++)
>> +			put_page(ctxt->rc_pages[i]);
>> +	spin_lock(&rdma->sc_recv_lock);
>> +	list_add(&ctxt->rc_list, &rdma->sc_recv_ctxts);
>> +	spin_unlock(&rdma->sc_recv_lock);
>> +}
>> +
>> +static int svc_rdma_post_recv(struct svcxprt_rdma *rdma)
>> +{
>> +	struct ib_device *device =3D rdma->sc_cm_id->device;
>> +	struct svc_rdma_recv_ctxt *ctxt;
>> +	struct ib_recv_wr *bad_recv_wr;
>> +	int sge_no, buflen, ret;
>> +	struct page *page;
>> +	dma_addr_t pa;
>> +
>> +	ctxt =3D svc_rdma_recv_ctxt_get(rdma);
>> +	if (!ctxt)
>> +		return -ENOMEM;
>> +
>> +	buflen =3D 0;
>> +	ctxt->rc_cqe.done =3D svc_rdma_wc_receive;
>> +	for (sge_no =3D 0; buflen < rdma->sc_max_req_size; sge_no++) {
>> +		if (sge_no >=3D rdma->sc_max_sge) {
>> +			pr_err("svcrdma: Too many sges (%d)\n", sge_no);
>> +			goto err_put_ctxt;
>> +		}
>> +
>> +		page =3D alloc_page(GFP_KERNEL);
>> +		if (!page)
>> +			goto err_put_ctxt;
>> +		ctxt->rc_pages[sge_no] =3D page;
>> +		ctxt->rc_page_count++;
>> +
>> +		pa =3D ib_dma_map_page(device, ctxt->rc_pages[sge_no],
>> +				     0, PAGE_SIZE, DMA_FROM_DEVICE);
>> +		if (ib_dma_mapping_error(device, pa))
>> +			goto err_put_ctxt;
>> +		ctxt->rc_sges[sge_no].addr =3D pa;
>> +		ctxt->rc_sges[sge_no].length =3D PAGE_SIZE;
>> +		ctxt->rc_sges[sge_no].lkey =3D =
rdma->sc_pd->local_dma_lkey;
>> +		ctxt->rc_recv_wr.num_sge++;
>> +
>> +		buflen +=3D PAGE_SIZE;
>> +	}
>> +	ctxt->rc_recv_wr.next =3D NULL;
>> +	ctxt->rc_recv_wr.sg_list =3D &ctxt->rc_sges[0];
>> +	ctxt->rc_recv_wr.wr_cqe =3D &ctxt->rc_cqe;
>> +
>> +	svc_xprt_get(&rdma->sc_xprt);
>> +	ret =3D ib_post_recv(rdma->sc_qp, &ctxt->rc_recv_wr, =
&bad_recv_wr);
>> +	trace_svcrdma_post_recv(&ctxt->rc_recv_wr, ret);
>> +	if (ret)
>> +		goto err_post;
>> +	return 0;
>> +
>> +err_put_ctxt:
>> +	svc_rdma_recv_ctxt_unmap(rdma, ctxt);
>> +	svc_rdma_recv_ctxt_put(rdma, ctxt, 1);
>> +	return -ENOMEM;
>> +err_post:
>> +	svc_rdma_recv_ctxt_unmap(rdma, ctxt);
>> +	svc_rdma_recv_ctxt_put(rdma, ctxt, 1);
>> +	svc_xprt_put(&rdma->sc_xprt);
>> +	return ret;
>> +}
>> +
>> +/**
>> + * svc_rdma_post_recvs - Post initial set of Recv WRs
>> + * @rdma: fresh svcxprt_rdma
>> + *
>> + * Returns true if successful, otherwise false.
>> + */
>> +bool svc_rdma_post_recvs(struct svcxprt_rdma *rdma)
>> +{
>> +	unsigned int i;
>> +	int ret;
>> +
>> +	for (i =3D 0; i < rdma->sc_max_requests; i++) {
>> +		ret =3D svc_rdma_post_recv(rdma);
>> +		if (ret) {
>> +			pr_err("svcrdma: failure posting recv buffers: =
%d\n",
>> +			       ret);
>> +			return false;
>> +		}
>> +	}
>> +	return true;
>> +}
>> +
>> +/**
>> + * svc_rdma_wc_receive - Invoked by RDMA provider for each polled =
Receive WC
>> + * @cq: Completion Queue context
>> + * @wc: Work Completion object
>> + *
>> + * NB: The svc_xprt/svcxprt_rdma is pinned whenever it's possible =
that
>> + * the Receive completion handler could be running.
>> + */
>> +static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
>> +{
>> +	struct svcxprt_rdma *rdma =3D cq->cq_context;
>> +	struct ib_cqe *cqe =3D wc->wr_cqe;
>> +	struct svc_rdma_recv_ctxt *ctxt;
>> +
>> +	trace_svcrdma_wc_receive(wc);
>> +
>> +	/* WARNING: Only wc->wr_cqe and wc->status are reliable */
>> +	ctxt =3D container_of(cqe, struct svc_rdma_recv_ctxt, rc_cqe);
>> +	svc_rdma_recv_ctxt_unmap(rdma, ctxt);
>> +
>> +	if (wc->status !=3D IB_WC_SUCCESS)
>> +		goto flushed;
>> +
>> +	if (svc_rdma_post_recv(rdma))
>> +		goto post_err;
>> +
>> +	/* All wc fields are now known to be valid */
>> +	ctxt->rc_byte_len =3D wc->byte_len;
>> +	spin_lock(&rdma->sc_rq_dto_lock);
>> +	list_add_tail(&ctxt->rc_list, &rdma->sc_rq_dto_q);
>> +	spin_unlock(&rdma->sc_rq_dto_lock);
>> +	set_bit(XPT_DATA, &rdma->sc_xprt.xpt_flags);
>> +	if (!test_bit(RDMAXPRT_CONN_PENDING, &rdma->sc_flags))
>> +		svc_xprt_enqueue(&rdma->sc_xprt);
>> +	goto out;
>> +
>> +flushed:
>> +	if (wc->status !=3D IB_WC_WR_FLUSH_ERR)
>> +		pr_err("svcrdma: Recv: %s (%u/0x%x)\n",
>> +		       ib_wc_status_msg(wc->status),
>> +		       wc->status, wc->vendor_err);
>> +post_err:
>> +	svc_rdma_recv_ctxt_put(rdma, ctxt, 1);
>> +	set_bit(XPT_CLOSE, &rdma->sc_xprt.xpt_flags);
>> +	svc_xprt_enqueue(&rdma->sc_xprt);
>> +out:
>> +	svc_xprt_put(&rdma->sc_xprt);
>> +}
>> +
>> +/**
>> + * svc_rdma_flush_recv_queues - Drain pending Receive work
>> + * @rdma: svcxprt_rdma being shut down
>> + *
>> + */
>> +void svc_rdma_flush_recv_queues(struct svcxprt_rdma *rdma)
>> +{
>> +	struct svc_rdma_recv_ctxt *ctxt;
>> +
>> +	while ((ctxt =3D =
svc_rdma_next_recv_ctxt(&rdma->sc_read_complete_q))) {
>> +		list_del(&ctxt->rc_list);
>> +		svc_rdma_recv_ctxt_put(rdma, ctxt, 1);
>> +	}
>> +	while ((ctxt =3D svc_rdma_next_recv_ctxt(&rdma->sc_rq_dto_q))) {
>> +		list_del(&ctxt->rc_list);
>> +		svc_rdma_recv_ctxt_put(rdma, ctxt, 1);
>> +	}
>> +}
>> +
>> /*
>>  * Replace the pages in the rq_argpages array with the pages from the =
SGE in
>>  * the RDMA_RECV completion. The SGL should contain full pages up =
until the
>>  * last one.
>>  */
>> static void svc_rdma_build_arg_xdr(struct svc_rqst *rqstp,
>> -				   struct svc_rdma_op_ctxt *ctxt)
>> +				   struct svc_rdma_recv_ctxt *ctxt)
>> {
>> 	struct page *page;
>> 	int sge_no;
>> @@ -123,30 +357,30 @@ static void svc_rdma_build_arg_xdr(struct =
svc_rqst *rqstp,
>> 	/* The reply path assumes the Call's transport header resides
>> 	 * in rqstp->rq_pages[0].
>> 	 */
>> -	page =3D ctxt->pages[0];
>> +	page =3D ctxt->rc_pages[0];
>> 	put_page(rqstp->rq_pages[0]);
>> 	rqstp->rq_pages[0] =3D page;
>>=20
>> 	/* Set up the XDR head */
>> 	rqstp->rq_arg.head[0].iov_base =3D page_address(page);
>> 	rqstp->rq_arg.head[0].iov_len =3D
>> -		min_t(size_t, ctxt->byte_len, ctxt->sge[0].length);
>> -	rqstp->rq_arg.len =3D ctxt->byte_len;
>> -	rqstp->rq_arg.buflen =3D ctxt->byte_len;
>> +		min_t(size_t, ctxt->rc_byte_len, =
ctxt->rc_sges[0].length);
>> +	rqstp->rq_arg.len =3D ctxt->rc_byte_len;
>> +	rqstp->rq_arg.buflen =3D ctxt->rc_byte_len;
>>=20
>> 	/* Compute bytes past head in the SGL */
>> -	len =3D ctxt->byte_len - rqstp->rq_arg.head[0].iov_len;
>> +	len =3D ctxt->rc_byte_len - rqstp->rq_arg.head[0].iov_len;
>>=20
>> 	/* If data remains, store it in the pagelist */
>> 	rqstp->rq_arg.page_len =3D len;
>> 	rqstp->rq_arg.page_base =3D 0;
>>=20
>> 	sge_no =3D 1;
>> -	while (len && sge_no < ctxt->count) {
>> -		page =3D ctxt->pages[sge_no];
>> +	while (len && sge_no < ctxt->rc_recv_wr.num_sge) {
>> +		page =3D ctxt->rc_pages[sge_no];
>> 		put_page(rqstp->rq_pages[sge_no]);
>> 		rqstp->rq_pages[sge_no] =3D page;
>> -		len -=3D min_t(u32, len, ctxt->sge[sge_no].length);
>> +		len -=3D min_t(u32, len, ctxt->rc_sges[sge_no].length);
>> 		sge_no++;
>> 	}
>> 	rqstp->rq_respages =3D &rqstp->rq_pages[sge_no];
>> @@ -154,11 +388,11 @@ static void svc_rdma_build_arg_xdr(struct =
svc_rqst *rqstp,
>>=20
>> 	/* If not all pages were used from the SGL, free the remaining =
ones */
>> 	len =3D sge_no;
>> -	while (sge_no < ctxt->count) {
>> -		page =3D ctxt->pages[sge_no++];
>> +	while (sge_no < ctxt->rc_recv_wr.num_sge) {
>> +		page =3D ctxt->rc_pages[sge_no++];
>> 		put_page(page);
>> 	}
>> -	ctxt->count =3D len;
>> +	ctxt->rc_page_count =3D len;
>>=20
>> 	/* Set up tail */
>> 	rqstp->rq_arg.tail[0].iov_base =3D NULL;
>> @@ -364,29 +598,29 @@ static int svc_rdma_xdr_decode_req(struct =
xdr_buf *rq_arg)
>> }
>>=20
>> static void rdma_read_complete(struct svc_rqst *rqstp,
>> -			       struct svc_rdma_op_ctxt *head)
>> +			       struct svc_rdma_recv_ctxt *head)
>> {
>> 	int page_no;
>>=20
>> 	/* Copy RPC pages */
>> -	for (page_no =3D 0; page_no < head->count; page_no++) {
>> +	for (page_no =3D 0; page_no < head->rc_page_count; page_no++) {
>> 		put_page(rqstp->rq_pages[page_no]);
>> -		rqstp->rq_pages[page_no] =3D head->pages[page_no];
>> +		rqstp->rq_pages[page_no] =3D head->rc_pages[page_no];
>> 	}
>>=20
>> 	/* Point rq_arg.pages past header */
>> -	rqstp->rq_arg.pages =3D &rqstp->rq_pages[head->hdr_count];
>> -	rqstp->rq_arg.page_len =3D head->arg.page_len;
>> +	rqstp->rq_arg.pages =3D &rqstp->rq_pages[head->rc_hdr_count];
>> +	rqstp->rq_arg.page_len =3D head->rc_arg.page_len;
>>=20
>> 	/* rq_respages starts after the last arg page */
>> 	rqstp->rq_respages =3D &rqstp->rq_pages[page_no];
>> 	rqstp->rq_next_page =3D rqstp->rq_respages + 1;
>>=20
>> 	/* Rebuild rq_arg head and tail. */
>> -	rqstp->rq_arg.head[0] =3D head->arg.head[0];
>> -	rqstp->rq_arg.tail[0] =3D head->arg.tail[0];
>> -	rqstp->rq_arg.len =3D head->arg.len;
>> -	rqstp->rq_arg.buflen =3D head->arg.buflen;
>> +	rqstp->rq_arg.head[0] =3D head->rc_arg.head[0];
>> +	rqstp->rq_arg.tail[0] =3D head->rc_arg.tail[0];
>> +	rqstp->rq_arg.len =3D head->rc_arg.len;
>> +	rqstp->rq_arg.buflen =3D head->rc_arg.buflen;
>> }
>>=20
>> static void svc_rdma_send_error(struct svcxprt_rdma *xprt,
>> @@ -506,28 +740,26 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
>> 	struct svc_xprt *xprt =3D rqstp->rq_xprt;
>> 	struct svcxprt_rdma *rdma_xprt =3D
>> 		container_of(xprt, struct svcxprt_rdma, sc_xprt);
>> -	struct svc_rdma_op_ctxt *ctxt;
>> +	struct svc_rdma_recv_ctxt *ctxt;
>> 	__be32 *p;
>> 	int ret;
>>=20
>> 	spin_lock(&rdma_xprt->sc_rq_dto_lock);
>> -	if (!list_empty(&rdma_xprt->sc_read_complete_q)) {
>> -		ctxt =3D =
list_first_entry(&rdma_xprt->sc_read_complete_q,
>> -					struct svc_rdma_op_ctxt, list);
>> -		list_del(&ctxt->list);
>> +	ctxt =3D =
svc_rdma_next_recv_ctxt(&rdma_xprt->sc_read_complete_q);
>> +	if (ctxt) {
>> +		list_del(&ctxt->rc_list);
>> 		spin_unlock(&rdma_xprt->sc_rq_dto_lock);
>> 		rdma_read_complete(rqstp, ctxt);
>> 		goto complete;
>> -	} else if (!list_empty(&rdma_xprt->sc_rq_dto_q)) {
>> -		ctxt =3D list_first_entry(&rdma_xprt->sc_rq_dto_q,
>> -					struct svc_rdma_op_ctxt, list);
>> -		list_del(&ctxt->list);
>> -	} else {
>> +	}
>> +	ctxt =3D svc_rdma_next_recv_ctxt(&rdma_xprt->sc_rq_dto_q);
>> +	if (!ctxt) {
>> 		/* No new incoming requests, terminate the loop */
>> 		clear_bit(XPT_DATA, &xprt->xpt_flags);
>> 		spin_unlock(&rdma_xprt->sc_rq_dto_lock);
>> 		return 0;
>> 	}
>> +	list_del(&ctxt->rc_list);
>> 	spin_unlock(&rdma_xprt->sc_rq_dto_lock);
>>=20
>> 	atomic_inc(&rdma_stat_recv);
>> @@ -545,7 +777,7 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
>> 	if (svc_rdma_is_backchannel_reply(xprt, p)) {
>> 		ret =3D svc_rdma_handle_bc_reply(xprt->xpt_bc_xprt, p,
>> 					       &rqstp->rq_arg);
>> -		svc_rdma_put_context(ctxt, 0);
>> +		svc_rdma_recv_ctxt_put(rdma_xprt, ctxt, 0);
>> 		return ret;
>> 	}
>>=20
>> @@ -554,7 +786,7 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
>> 		goto out_readchunk;
>>=20
>> complete:
>> -	svc_rdma_put_context(ctxt, 0);
>> +	svc_rdma_recv_ctxt_put(rdma_xprt, ctxt, 0);
>> 	rqstp->rq_prot =3D IPPROTO_MAX;
>> 	svc_xprt_copy_addrs(rqstp, xprt);
>> 	return rqstp->rq_arg.len;
>> @@ -567,16 +799,16 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
>>=20
>> out_err:
>> 	svc_rdma_send_error(rdma_xprt, p, ret);
>> -	svc_rdma_put_context(ctxt, 0);
>> +	svc_rdma_recv_ctxt_put(rdma_xprt, ctxt, 0);
>> 	return 0;
>>=20
>> out_postfail:
>> 	if (ret =3D=3D -EINVAL)
>> 		svc_rdma_send_error(rdma_xprt, p, ret);
>> -	svc_rdma_put_context(ctxt, 1);
>> +	svc_rdma_recv_ctxt_put(rdma_xprt, ctxt, 1);
>> 	return ret;
>>=20
>> out_drop:
>> -	svc_rdma_put_context(ctxt, 1);
>> +	svc_rdma_recv_ctxt_put(rdma_xprt, ctxt, 1);
>> 	return 0;
>> }
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c =
b/net/sunrpc/xprtrdma/svc_rdma_rw.c
>> index 887ceef..c080ce2 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
>> @@ -1,6 +1,6 @@
>> // SPDX-License-Identifier: GPL-2.0
>> /*
>> - * Copyright (c) 2016 Oracle.  All rights reserved.
>> + * Copyright (c) 2016-2018 Oracle.  All rights reserved.
>>  *
>>  * Use the core R/W API to move RPC-over-RDMA Read and Write chunks.
>>  */
>> @@ -227,7 +227,7 @@ static void svc_rdma_write_done(struct ib_cq *cq, =
struct ib_wc *wc)
>> /* State for pulling a Read chunk.
>>  */
>> struct svc_rdma_read_info {
>> -	struct svc_rdma_op_ctxt		*ri_readctxt;
>> +	struct svc_rdma_recv_ctxt	*ri_readctxt;
>> 	unsigned int			ri_position;
>> 	unsigned int			ri_pageno;
>> 	unsigned int			ri_pageoff;
>> @@ -282,10 +282,10 @@ static void svc_rdma_wc_read_done(struct ib_cq =
*cq, struct ib_wc *wc)
>> 			pr_err("svcrdma: read ctx: %s (%u/0x%x)\n",
>> 			       ib_wc_status_msg(wc->status),
>> 			       wc->status, wc->vendor_err);
>> -		svc_rdma_put_context(info->ri_readctxt, 1);
>> +		svc_rdma_recv_ctxt_put(rdma, info->ri_readctxt, 1);
>> 	} else {
>> 		spin_lock(&rdma->sc_rq_dto_lock);
>> -		list_add_tail(&info->ri_readctxt->list,
>> +		list_add_tail(&info->ri_readctxt->rc_list,
>> 			      &rdma->sc_read_complete_q);
>> 		spin_unlock(&rdma->sc_rq_dto_lock);
>>=20
>> @@ -607,7 +607,7 @@ static int svc_rdma_build_read_segment(struct =
svc_rdma_read_info *info,
>> 				       struct svc_rqst *rqstp,
>> 				       u32 rkey, u32 len, u64 offset)
>> {
>> -	struct svc_rdma_op_ctxt *head =3D info->ri_readctxt;
>> +	struct svc_rdma_recv_ctxt *head =3D info->ri_readctxt;
>> 	struct svc_rdma_chunk_ctxt *cc =3D &info->ri_cc;
>> 	struct svc_rdma_rw_ctxt *ctxt;
>> 	unsigned int sge_no, seg_len;
>> @@ -625,10 +625,10 @@ static int svc_rdma_build_read_segment(struct =
svc_rdma_read_info *info,
>> 		seg_len =3D min_t(unsigned int, len,
>> 				PAGE_SIZE - info->ri_pageoff);
>>=20
>> -		head->arg.pages[info->ri_pageno] =3D
>> +		head->rc_arg.pages[info->ri_pageno] =3D
>> 			rqstp->rq_pages[info->ri_pageno];
>> 		if (!info->ri_pageoff)
>> -			head->count++;
>> +			head->rc_page_count++;
>>=20
>> 		sg_set_page(sg, rqstp->rq_pages[info->ri_pageno],
>> 			    seg_len, info->ri_pageoff);
>> @@ -705,9 +705,9 @@ static int svc_rdma_build_read_chunk(struct =
svc_rqst *rqstp,
>> }
>>=20
>> /* Construct RDMA Reads to pull over a normal Read chunk. The chunk
>> - * data lands in the page list of head->arg.pages.
>> + * data lands in the page list of head->rc_arg.pages.
>>  *
>> - * Currently NFSD does not look at the head->arg.tail[0] iovec.
>> + * Currently NFSD does not look at the head->rc_arg.tail[0] iovec.
>>  * Therefore, XDR round-up of the Read chunk and trailing
>>  * inline content must both be added at the end of the pagelist.
>>  */
>> @@ -715,10 +715,10 @@ static int =
svc_rdma_build_normal_read_chunk(struct svc_rqst *rqstp,
>> 					    struct svc_rdma_read_info =
*info,
>> 					    __be32 *p)
>> {
>> -	struct svc_rdma_op_ctxt *head =3D info->ri_readctxt;
>> +	struct svc_rdma_recv_ctxt *head =3D info->ri_readctxt;
>> 	int ret;
>>=20
>> -	info->ri_pageno =3D head->hdr_count;
>> +	info->ri_pageno =3D head->rc_hdr_count;
>> 	info->ri_pageoff =3D 0;
>>=20
>> 	ret =3D svc_rdma_build_read_chunk(rqstp, info, p);
>> @@ -732,11 +732,11 @@ static int =
svc_rdma_build_normal_read_chunk(struct svc_rqst *rqstp,
>> 	 * chunk is not included in either the pagelist or in
>> 	 * the tail.
>> 	 */
>> -	head->arg.tail[0].iov_base =3D
>> -		head->arg.head[0].iov_base + info->ri_position;
>> -	head->arg.tail[0].iov_len =3D
>> -		head->arg.head[0].iov_len - info->ri_position;
>> -	head->arg.head[0].iov_len =3D info->ri_position;
>> +	head->rc_arg.tail[0].iov_base =3D
>> +		head->rc_arg.head[0].iov_base + info->ri_position;
>> +	head->rc_arg.tail[0].iov_len =3D
>> +		head->rc_arg.head[0].iov_len - info->ri_position;
>> +	head->rc_arg.head[0].iov_len =3D info->ri_position;
>>=20
>> 	/* Read chunk may need XDR roundup (see RFC 8166, s. 3.4.5.2).
>> 	 *
>> @@ -749,9 +749,9 @@ static int =
svc_rdma_build_normal_read_chunk(struct svc_rqst *rqstp,
>> 	 */
>> 	info->ri_chunklen =3D XDR_QUADLEN(info->ri_chunklen) << 2;
>>=20
>> -	head->arg.page_len =3D info->ri_chunklen;
>> -	head->arg.len +=3D info->ri_chunklen;
>> -	head->arg.buflen +=3D info->ri_chunklen;
>> +	head->rc_arg.page_len =3D info->ri_chunklen;
>> +	head->rc_arg.len +=3D info->ri_chunklen;
>> +	head->rc_arg.buflen +=3D info->ri_chunklen;
>>=20
>> out:
>> 	return ret;
>> @@ -760,7 +760,7 @@ static int =
svc_rdma_build_normal_read_chunk(struct svc_rqst *rqstp,
>> /* Construct RDMA Reads to pull over a Position Zero Read chunk.
>>  * The start of the data lands in the first page just after
>>  * the Transport header, and the rest lands in the page list of
>> - * head->arg.pages.
>> + * head->rc_arg.pages.
>>  *
>>  * Assumptions:
>>  *	- A PZRC has an XDR-aligned length (no implicit round-up).
>> @@ -772,11 +772,11 @@ static int svc_rdma_build_pz_read_chunk(struct =
svc_rqst *rqstp,
>> 					struct svc_rdma_read_info *info,
>> 					__be32 *p)
>> {
>> -	struct svc_rdma_op_ctxt *head =3D info->ri_readctxt;
>> +	struct svc_rdma_recv_ctxt *head =3D info->ri_readctxt;
>> 	int ret;
>>=20
>> -	info->ri_pageno =3D head->hdr_count - 1;
>> -	info->ri_pageoff =3D offset_in_page(head->byte_len);
>> +	info->ri_pageno =3D head->rc_hdr_count - 1;
>> +	info->ri_pageoff =3D offset_in_page(head->rc_byte_len);
>>=20
>> 	ret =3D svc_rdma_build_read_chunk(rqstp, info, p);
>> 	if (ret < 0)
>> @@ -784,22 +784,22 @@ static int svc_rdma_build_pz_read_chunk(struct =
svc_rqst *rqstp,
>>=20
>> 	trace_svcrdma_encode_pzr(info->ri_chunklen);
>>=20
>> -	head->arg.len +=3D info->ri_chunklen;
>> -	head->arg.buflen +=3D info->ri_chunklen;
>> +	head->rc_arg.len +=3D info->ri_chunklen;
>> +	head->rc_arg.buflen +=3D info->ri_chunklen;
>>=20
>> -	if (head->arg.buflen <=3D head->sge[0].length) {
>> +	if (head->rc_arg.buflen <=3D head->rc_sges[0].length) {
>> 		/* Transport header and RPC message fit entirely
>> 		 * in page where head iovec resides.
>> 		 */
>> -		head->arg.head[0].iov_len =3D info->ri_chunklen;
>> +		head->rc_arg.head[0].iov_len =3D info->ri_chunklen;
>> 	} else {
>> 		/* Transport header and part of RPC message reside
>> 		 * in the head iovec's page.
>> 		 */
>> -		head->arg.head[0].iov_len =3D
>> -				head->sge[0].length - head->byte_len;
>> -		head->arg.page_len =3D
>> -				info->ri_chunklen - =
head->arg.head[0].iov_len;
>> +		head->rc_arg.head[0].iov_len =3D
>> +			head->rc_sges[0].length - head->rc_byte_len;
>> +		head->rc_arg.page_len =3D
>> +			info->ri_chunklen - =
head->rc_arg.head[0].iov_len;
>> 	}
>>=20
>> out:
>> @@ -824,24 +824,24 @@ static int svc_rdma_build_pz_read_chunk(struct =
svc_rqst *rqstp,
>>  * - All Read segments in @p have the same Position value.
>>  */
>> int svc_rdma_recv_read_chunk(struct svcxprt_rdma *rdma, struct =
svc_rqst *rqstp,
>> -			     struct svc_rdma_op_ctxt *head, __be32 *p)
>> +			     struct svc_rdma_recv_ctxt *head, __be32 *p)
>> {
>> 	struct svc_rdma_read_info *info;
>> 	struct page **page;
>> 	int ret;
>>=20
>> 	/* The request (with page list) is constructed in
>> -	 * head->arg. Pages involved with RDMA Read I/O are
>> +	 * head->rc_arg. Pages involved with RDMA Read I/O are
>> 	 * transferred there.
>> 	 */
>> -	head->hdr_count =3D head->count;
>> -	head->arg.head[0] =3D rqstp->rq_arg.head[0];
>> -	head->arg.tail[0] =3D rqstp->rq_arg.tail[0];
>> -	head->arg.pages =3D head->pages;
>> -	head->arg.page_base =3D 0;
>> -	head->arg.page_len =3D 0;
>> -	head->arg.len =3D rqstp->rq_arg.len;
>> -	head->arg.buflen =3D rqstp->rq_arg.buflen;
>> +	head->rc_hdr_count =3D head->rc_page_count;
>> +	head->rc_arg.head[0] =3D rqstp->rq_arg.head[0];
>> +	head->rc_arg.tail[0] =3D rqstp->rq_arg.tail[0];
>> +	head->rc_arg.pages =3D head->rc_pages;
>> +	head->rc_arg.page_base =3D 0;
>> +	head->rc_arg.page_len =3D 0;
>> +	head->rc_arg.len =3D rqstp->rq_arg.len;
>> +	head->rc_arg.buflen =3D rqstp->rq_arg.buflen;
>>=20
>> 	info =3D svc_rdma_read_info_alloc(rdma);
>> 	if (!info)
>> @@ -867,7 +867,7 @@ int svc_rdma_recv_read_chunk(struct svcxprt_rdma =
*rdma, struct svc_rqst *rqstp,
>>=20
>> out:
>> 	/* Read sink pages have been moved from rqstp->rq_pages to
>> -	 * head->arg.pages. Force svc_recv to refill those slots
>> +	 * head->rc_arg.pages. Force svc_recv to refill those slots
>> 	 * in rq_pages.
>> 	 */
>> 	for (page =3D rqstp->rq_pages; page < rqstp->rq_respages; =
page++)
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c =
b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
>> index fed28de..a397d9a 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
>> @@ -1,6 +1,6 @@
>> // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
>> /*
>> - * Copyright (c) 2016 Oracle. All rights reserved.
>> + * Copyright (c) 2016-2018 Oracle. All rights reserved.
>>  * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
>>  * Copyright (c) 2005-2006 Network Appliance, Inc. All rights =
reserved.
>>  *
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c =
b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> index ca9001d..afd5e61 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> @@ -63,7 +63,6 @@
>>=20
>> #define RPCDBG_FACILITY	RPCDBG_SVCXPRT
>>=20
>> -static int svc_rdma_post_recv(struct svcxprt_rdma *xprt);
>> static struct svcxprt_rdma *svc_rdma_create_xprt(struct svc_serv =
*serv,
>> 						 struct net *net);
>> static struct svc_xprt *svc_rdma_create(struct svc_serv *serv,
>> @@ -175,11 +174,7 @@ static bool svc_rdma_prealloc_ctxts(struct =
svcxprt_rdma *xprt)
>> {
>> 	unsigned int i;
>>=20
>> -	/* Each RPC/RDMA credit can consume one Receive and
>> -	 * one Send WQE at the same time.
>> -	 */
>> -	i =3D xprt->sc_sq_depth + xprt->sc_rq_depth;
>> -
>> +	i =3D xprt->sc_sq_depth;
>> 	while (i--) {
>> 		struct svc_rdma_op_ctxt *ctxt;
>>=20
>> @@ -298,54 +293,6 @@ static void qp_event_handler(struct ib_event =
*event, void *context)
>> }
>>=20
>> /**
>> - * svc_rdma_wc_receive - Invoked by RDMA provider for each polled =
Receive WC
>> - * @cq:        completion queue
>> - * @wc:        completed WR
>> - *
>> - */
>> -static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
>> -{
>> -	struct svcxprt_rdma *xprt =3D cq->cq_context;
>> -	struct ib_cqe *cqe =3D wc->wr_cqe;
>> -	struct svc_rdma_op_ctxt *ctxt;
>> -
>> -	trace_svcrdma_wc_receive(wc);
>> -
>> -	/* WARNING: Only wc->wr_cqe and wc->status are reliable */
>> -	ctxt =3D container_of(cqe, struct svc_rdma_op_ctxt, cqe);
>> -	svc_rdma_unmap_dma(ctxt);
>> -
>> -	if (wc->status !=3D IB_WC_SUCCESS)
>> -		goto flushed;
>> -
>> -	/* All wc fields are now known to be valid */
>> -	ctxt->byte_len =3D wc->byte_len;
>> -	spin_lock(&xprt->sc_rq_dto_lock);
>> -	list_add_tail(&ctxt->list, &xprt->sc_rq_dto_q);
>> -	spin_unlock(&xprt->sc_rq_dto_lock);
>> -
>> -	svc_rdma_post_recv(xprt);
>> -
>> -	set_bit(XPT_DATA, &xprt->sc_xprt.xpt_flags);
>> -	if (test_bit(RDMAXPRT_CONN_PENDING, &xprt->sc_flags))
>> -		goto out;
>> -	goto out_enqueue;
>> -
>> -flushed:
>> -	if (wc->status !=3D IB_WC_WR_FLUSH_ERR)
>> -		pr_err("svcrdma: Recv: %s (%u/0x%x)\n",
>> -		       ib_wc_status_msg(wc->status),
>> -		       wc->status, wc->vendor_err);
>> -	set_bit(XPT_CLOSE, &xprt->sc_xprt.xpt_flags);
>> -	svc_rdma_put_context(ctxt, 1);
>> -
>> -out_enqueue:
>> -	svc_xprt_enqueue(&xprt->sc_xprt);
>> -out:
>> -	svc_xprt_put(&xprt->sc_xprt);
>> -}
>> -
>> -/**
>>  * svc_rdma_wc_send - Invoked by RDMA provider for each polled Send =
WC
>>  * @cq:        completion queue
>>  * @wc:        completed WR
>> @@ -392,12 +339,14 @@ static struct svcxprt_rdma =
*svc_rdma_create_xprt(struct svc_serv *serv,
>> 	INIT_LIST_HEAD(&cma_xprt->sc_rq_dto_q);
>> 	INIT_LIST_HEAD(&cma_xprt->sc_read_complete_q);
>> 	INIT_LIST_HEAD(&cma_xprt->sc_ctxts);
>> +	INIT_LIST_HEAD(&cma_xprt->sc_recv_ctxts);
>> 	INIT_LIST_HEAD(&cma_xprt->sc_rw_ctxts);
>> 	init_waitqueue_head(&cma_xprt->sc_send_wait);
>>=20
>> 	spin_lock_init(&cma_xprt->sc_lock);
>> 	spin_lock_init(&cma_xprt->sc_rq_dto_lock);
>> 	spin_lock_init(&cma_xprt->sc_ctxt_lock);
>> +	spin_lock_init(&cma_xprt->sc_recv_lock);
>> 	spin_lock_init(&cma_xprt->sc_rw_ctxt_lock);
>>=20
>> 	/*
>> @@ -411,63 +360,6 @@ static struct svcxprt_rdma =
*svc_rdma_create_xprt(struct svc_serv *serv,
>> 	return cma_xprt;
>> }
>>=20
>> -static int
>> -svc_rdma_post_recv(struct svcxprt_rdma *xprt)
>> -{
>> -	struct ib_recv_wr recv_wr, *bad_recv_wr;
>> -	struct svc_rdma_op_ctxt *ctxt;
>> -	struct page *page;
>> -	dma_addr_t pa;
>> -	int sge_no;
>> -	int buflen;
>> -	int ret;
>> -
>> -	ctxt =3D svc_rdma_get_context(xprt);
>> -	buflen =3D 0;
>> -	ctxt->direction =3D DMA_FROM_DEVICE;
>> -	ctxt->cqe.done =3D svc_rdma_wc_receive;
>> -	for (sge_no =3D 0; buflen < xprt->sc_max_req_size; sge_no++) {
>> -		if (sge_no >=3D xprt->sc_max_sge) {
>> -			pr_err("svcrdma: Too many sges (%d)\n", sge_no);
>> -			goto err_put_ctxt;
>> -		}
>> -		page =3D alloc_page(GFP_KERNEL);
>> -		if (!page)
>> -			goto err_put_ctxt;
>> -		ctxt->pages[sge_no] =3D page;
>> -		pa =3D ib_dma_map_page(xprt->sc_cm_id->device,
>> -				     page, 0, PAGE_SIZE,
>> -				     DMA_FROM_DEVICE);
>> -		if (ib_dma_mapping_error(xprt->sc_cm_id->device, pa))
>> -			goto err_put_ctxt;
>> -		svc_rdma_count_mappings(xprt, ctxt);
>> -		ctxt->sge[sge_no].addr =3D pa;
>> -		ctxt->sge[sge_no].length =3D PAGE_SIZE;
>> -		ctxt->sge[sge_no].lkey =3D xprt->sc_pd->local_dma_lkey;
>> -		ctxt->count =3D sge_no + 1;
>> -		buflen +=3D PAGE_SIZE;
>> -	}
>> -	recv_wr.next =3D NULL;
>> -	recv_wr.sg_list =3D &ctxt->sge[0];
>> -	recv_wr.num_sge =3D ctxt->count;
>> -	recv_wr.wr_cqe =3D &ctxt->cqe;
>> -
>> -	svc_xprt_get(&xprt->sc_xprt);
>> -	ret =3D ib_post_recv(xprt->sc_qp, &recv_wr, &bad_recv_wr);
>> -	trace_svcrdma_post_recv(&recv_wr, ret);
>> -	if (ret) {
>> -		svc_rdma_unmap_dma(ctxt);
>> -		svc_rdma_put_context(ctxt, 1);
>> -		svc_xprt_put(&xprt->sc_xprt);
>> -	}
>> -	return ret;
>> -
>> - err_put_ctxt:
>> -	svc_rdma_unmap_dma(ctxt);
>> -	svc_rdma_put_context(ctxt, 1);
>> -	return -ENOMEM;
>> -}
>> -
>> static void
>> svc_rdma_parse_connect_private(struct svcxprt_rdma *newxprt,
>> 			       struct rdma_conn_param *param)
>> @@ -699,7 +591,7 @@ static struct svc_xprt *svc_rdma_accept(struct =
svc_xprt *xprt)
>> 	struct ib_qp_init_attr qp_attr;
>> 	struct ib_device *dev;
>> 	struct sockaddr *sap;
>> -	unsigned int i, ctxts;
>> +	unsigned int ctxts;
>> 	int ret =3D 0;
>>=20
>> 	listen_rdma =3D container_of(xprt, struct svcxprt_rdma, =
sc_xprt);
>> @@ -804,14 +696,8 @@ static struct svc_xprt *svc_rdma_accept(struct =
svc_xprt *xprt)
>> 	    !rdma_ib_or_roce(dev, newxprt->sc_port_num))
>> 		goto errout;
>>=20
>> -	/* Post receive buffers */
>> -	for (i =3D 0; i < newxprt->sc_max_requests; i++) {
>> -		ret =3D svc_rdma_post_recv(newxprt);
>> -		if (ret) {
>> -			dprintk("svcrdma: failure posting receive =
buffers\n");
>> -			goto errout;
>> -		}
>> -	}
>> +	if (!svc_rdma_post_recvs(newxprt))
>> +		goto errout;
>>=20
>> 	/* Swap out the handler */
>> 	newxprt->sc_cm_id->event_handler =3D rdma_cma_handler;
>> @@ -908,20 +794,7 @@ static void __svc_rdma_free(struct work_struct =
*work)
>> 		pr_err("svcrdma: sc_xprt still in use? (%d)\n",
>> 		       kref_read(&xprt->xpt_ref));
>>=20
>> -	while (!list_empty(&rdma->sc_read_complete_q)) {
>> -		struct svc_rdma_op_ctxt *ctxt;
>> -		ctxt =3D list_first_entry(&rdma->sc_read_complete_q,
>> -					struct svc_rdma_op_ctxt, list);
>> -		list_del(&ctxt->list);
>> -		svc_rdma_put_context(ctxt, 1);
>> -	}
>> -	while (!list_empty(&rdma->sc_rq_dto_q)) {
>> -		struct svc_rdma_op_ctxt *ctxt;
>> -		ctxt =3D list_first_entry(&rdma->sc_rq_dto_q,
>> -					struct svc_rdma_op_ctxt, list);
>> -		list_del(&ctxt->list);
>> -		svc_rdma_put_context(ctxt, 1);
>> -	}
>> +	svc_rdma_flush_recv_queues(rdma);
>>=20
>> 	/* Warn if we leaked a resource or under-referenced */
>> 	if (rdma->sc_ctxt_used !=3D 0)
>> @@ -936,6 +809,7 @@ static void __svc_rdma_free(struct work_struct =
*work)
>>=20
>> 	svc_rdma_destroy_rw_ctxts(rdma);
>> 	svc_rdma_destroy_ctxts(rdma);
>> +	svc_rdma_recv_ctxts_destroy(rdma);
>>=20
>> 	/* Destroy the QP if present (not a listener) */
>> 	if (rdma->sc_qp && !IS_ERR(rdma->sc_qp))
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" =
in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v1 09/19] svcrdma: Preserve Receive buffer until svc_rdma_sendto
  2018-05-07 19:27 ` [PATCH v1 09/19] svcrdma: Preserve Receive buffer until svc_rdma_sendto Chuck Lever
@ 2018-05-09 21:03   ` J. Bruce Fields
  0 siblings, 0 replies; 29+ messages in thread
From: J. Bruce Fields @ 2018-05-09 21:03 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-rdma, linux-nfs

On Mon, May 07, 2018 at 03:27:37PM -0400, Chuck Lever wrote:
> Rather than releasing the incoming svc_rdma_recv_ctxt at the end of
> svc_rdma_recvfrom, hold onto it until svc_rdma_sendto.
> 
> This permits the contents of the Receive buffer to be preserved
> through svc_process and then referenced directly in sendto as it
> constructs Write and Reply chunks to return to the client.
> 
> The real changes will come in subsequent patches.
> 
> Note: I cannot use ->xpo_release_rqst for this purpose because that
> is called _before_ ->xpo_sendto. svc_rdma_sendto uses information in
> the received Call transport header to construct the Reply transport
> header, which is preserved in the RPC's Receive buffer.
> 
> The historical comment in svc_send() isn't helpful: it is already
> obvious that ->xpo_release_rqst is being called before ->xpo_sendto,
> but there is no explanation for this ordering going back to the
> beginning of the git era.

Yeah.  I'm fine with deleting that comment at least.  (Or maybe moving
the call if it makes sense, I haven't thought about it.)

--b.
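
To make that ordering concrete, here is a stand-alone sketch of the
lifecycle the patch sets up. It uses stub types and hypothetical
helper names rather than the real sunrpc structures and API; it only
illustrates why the context parked in rq_xprt_ctxt must outlive the
release callback.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sketch only: rdma_recvfrom() parks the Receive context,
 * release_rqst() runs first and must not free it, and
 * rdma_sendto() consumes and releases it while building the Reply.
 */
struct recv_ctxt {
	char call_hdr[64];		/* stands in for the RPC/RDMA Call header */
};

struct rqst {
	struct recv_ctxt *rq_xprt_ctxt;
};

static int rdma_recvfrom(struct rqst *rqstp)
{
	struct recv_ctxt *ctxt = malloc(sizeof(*ctxt));

	if (!ctxt)
		return -1;
	strcpy(ctxt->call_hdr, "RPC/RDMA Call header");
	rqstp->rq_xprt_ctxt = ctxt;	/* held across svc_process */
	return 0;
}

static void release_rqst(struct rqst *rqstp)
{
	/* Runs before sendto; must leave rq_xprt_ctxt alone. */
	(void)rqstp;
}

static void rdma_sendto(struct rqst *rqstp)
{
	struct recv_ctxt *rctxt = rqstp->rq_xprt_ctxt;

	printf("building Reply from: %s\n", rctxt->call_hdr);
	rqstp->rq_xprt_ctxt = NULL;
	free(rctxt);			/* the recv_ctxt_put equivalent */
}

int main(void)
{
	struct rqst rqstp = { NULL };

	if (rdma_recvfrom(&rqstp))
		return 1;
	release_rqst(&rqstp);	/* the release callback fires first ... */
	rdma_sendto(&rqstp);	/* ... and only then is the reply sent */
	return 0;
}

The only point being made is the ordering: anything sendto still
needs has to ride along in rq_xprt_ctxt rather than being torn down
in the release callback.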

> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  net/sunrpc/xprtrdma/svc_rdma_recvfrom.c |    2 +-
>  net/sunrpc/xprtrdma/svc_rdma_sendto.c   |   14 +++++++++++---
>  2 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> index ecfe7c9..d9fef52 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> @@ -789,7 +789,7 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
>  		goto out_readchunk;
>  
>  complete:
> -	svc_rdma_recv_ctxt_put(rdma_xprt, ctxt);
> +	rqstp->rq_xprt_ctxt = ctxt;
>  	rqstp->rq_prot = IPPROTO_MAX;
>  	svc_xprt_copy_addrs(rqstp, xprt);
>  	return rqstp->rq_arg.len;
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> index a397d9a..cbbde70 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> @@ -623,6 +623,7 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
>  	struct svc_xprt *xprt = rqstp->rq_xprt;
>  	struct svcxprt_rdma *rdma =
>  		container_of(xprt, struct svcxprt_rdma, sc_xprt);
> +	struct svc_rdma_recv_ctxt *rctxt = rqstp->rq_xprt_ctxt;
>  	__be32 *p, *rdma_argp, *rdma_resp, *wr_lst, *rp_ch;
>  	struct xdr_buf *xdr = &rqstp->rq_res;
>  	struct page *res_page;
> @@ -675,7 +676,12 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
>  				      wr_lst, rp_ch);
>  	if (ret < 0)
>  		goto err0;
> -	return 0;
> +	ret = 0;
> +
> +out:
> +	rqstp->rq_xprt_ctxt = NULL;
> +	svc_rdma_recv_ctxt_put(rdma, rctxt);
> +	return ret;
>  
>   err2:
>  	if (ret != -E2BIG && ret != -EINVAL)
> @@ -684,12 +690,14 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
>  	ret = svc_rdma_send_error_msg(rdma, rdma_resp, rqstp);
>  	if (ret < 0)
>  		goto err0;
> -	return 0;
> +	ret = 0;
> +	goto out;
>  
>   err1:
>  	put_page(res_page);
>   err0:
>  	trace_svcrdma_send_failed(rqstp, ret);
>  	set_bit(XPT_CLOSE, &xprt->xpt_flags);
> -	return -ENOTCONN;
> +	ret = -ENOTCONN;
> +	goto out;
>  }

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v1 10/19] svcrdma: Persistently allocate and DMA-map Receive buffers
  2018-05-07 19:27 ` [PATCH v1 10/19] svcrdma: Persistently allocate and DMA-map Receive buffers Chuck Lever
@ 2018-05-09 21:18   ` J. Bruce Fields
  2018-05-09 21:31     ` Chuck Lever
  0 siblings, 1 reply; 29+ messages in thread
From: J. Bruce Fields @ 2018-05-09 21:18 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-rdma, linux-nfs

On Mon, May 07, 2018 at 03:27:43PM -0400, Chuck Lever wrote:
> The current Receive path uses an array of pages which are allocated
> and DMA mapped when each Receive WR is posted, and then handed off
> to the upper layer in rqstp::rq_arg. The page flip releases unused
> pages in the rq_pages pagelist. This mechanism introduces a
> significant amount of overhead.
> 
> So instead, kmalloc the Receive buffer, and leave it DMA-mapped
> while the transport remains connected. This confers a number of
> benefits:
> 
> * Each Receive WR requires only one receive SGE, no matter how large
>   the inline threshold is. This helps the server-side NFS/RDMA
>   transport operate on less capable RDMA devices.
> 
> * The Receive buffer is left allocated and mapped all the time. This
>   relieves svc_rdma_post_recv from the overhead of allocating and
>   DMA-mapping a fresh buffer.

Dumb question: does that mean the buffer could still change if the
client does something weird?  (So could the xdr decoding code see data
change out from under it?)

--b.

> 
> * svc_rdma_wc_receive no longer has to DMA unmap the Receive buffer.
>   It has to DMA sync only the number of bytes that were received.
> 
> * svc_rdma_build_arg_xdr no longer has to free a page in rq_pages
>   for each page in the Receive buffer, making it a constant-time
>   function.
> 
> * The Receive buffer is now plugged directly into the rq_arg's
>   head[0].iov_vec, and can be larger than a page without spilling
>   over into rq_arg's page list. This enables simplification of
>   the RDMA Read path in subsequent patches.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  include/linux/sunrpc/svc_rdma.h          |    4 -
>  net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |  168 +++++++++++-------------------
>  net/sunrpc/xprtrdma/svc_rdma_rw.c        |   32 ++----
>  net/sunrpc/xprtrdma/svc_rdma_sendto.c    |    5 -
>  net/sunrpc/xprtrdma/svc_rdma_transport.c |    2 
>  5 files changed, 75 insertions(+), 136 deletions(-)
> 
> diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
> index f0bd0b6d..01baabf 100644
> --- a/include/linux/sunrpc/svc_rdma.h
> +++ b/include/linux/sunrpc/svc_rdma.h
> @@ -148,12 +148,12 @@ struct svc_rdma_recv_ctxt {
>  	struct list_head	rc_list;
>  	struct ib_recv_wr	rc_recv_wr;
>  	struct ib_cqe		rc_cqe;
> +	struct ib_sge		rc_recv_sge;
> +	void			*rc_recv_buf;
>  	struct xdr_buf		rc_arg;
>  	u32			rc_byte_len;
>  	unsigned int		rc_page_count;
>  	unsigned int		rc_hdr_count;
> -	struct ib_sge		rc_sges[1 +
> -					RPCRDMA_MAX_INLINE_THRESH / PAGE_SIZE];
>  	struct page		*rc_pages[RPCSVC_MAXPAGES];
>  };
>  
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> index d9fef52..d4ccd1c 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> @@ -117,6 +117,43 @@
>  					rc_list);
>  }
>  
> +static struct svc_rdma_recv_ctxt *
> +svc_rdma_recv_ctxt_alloc(struct svcxprt_rdma *rdma)
> +{
> +	struct svc_rdma_recv_ctxt *ctxt;
> +	dma_addr_t addr;
> +	void *buffer;
> +
> +	ctxt = kmalloc(sizeof(*ctxt), GFP_KERNEL);
> +	if (!ctxt)
> +		goto fail0;
> +	buffer = kmalloc(rdma->sc_max_req_size, GFP_KERNEL);
> +	if (!buffer)
> +		goto fail1;
> +	addr = ib_dma_map_single(rdma->sc_pd->device, buffer,
> +				 rdma->sc_max_req_size, DMA_FROM_DEVICE);
> +	if (ib_dma_mapping_error(rdma->sc_pd->device, addr))
> +		goto fail2;
> +
> +	ctxt->rc_recv_wr.next = NULL;
> +	ctxt->rc_recv_wr.wr_cqe = &ctxt->rc_cqe;
> +	ctxt->rc_recv_wr.sg_list = &ctxt->rc_recv_sge;
> +	ctxt->rc_recv_wr.num_sge = 1;
> +	ctxt->rc_cqe.done = svc_rdma_wc_receive;
> +	ctxt->rc_recv_sge.addr = addr;
> +	ctxt->rc_recv_sge.length = rdma->sc_max_req_size;
> +	ctxt->rc_recv_sge.lkey = rdma->sc_pd->local_dma_lkey;
> +	ctxt->rc_recv_buf = buffer;
> +	return ctxt;
> +
> +fail2:
> +	kfree(buffer);
> +fail1:
> +	kfree(ctxt);
> +fail0:
> +	return NULL;
> +}
> +
>  /**
>   * svc_rdma_recv_ctxts_destroy - Release all recv_ctxt's for an xprt
>   * @rdma: svcxprt_rdma being torn down
> @@ -128,6 +165,11 @@ void svc_rdma_recv_ctxts_destroy(struct svcxprt_rdma *rdma)
>  
>  	while ((ctxt = svc_rdma_next_recv_ctxt(&rdma->sc_recv_ctxts))) {
>  		list_del(&ctxt->rc_list);
> +		ib_dma_unmap_single(rdma->sc_pd->device,
> +				    ctxt->rc_recv_sge.addr,
> +				    ctxt->rc_recv_sge.length,
> +				    DMA_FROM_DEVICE);
> +		kfree(ctxt->rc_recv_buf);
>  		kfree(ctxt);
>  	}
>  }
> @@ -145,32 +187,18 @@ void svc_rdma_recv_ctxts_destroy(struct svcxprt_rdma *rdma)
>  	spin_unlock(&rdma->sc_recv_lock);
>  
>  out:
> -	ctxt->rc_recv_wr.num_sge = 0;
>  	ctxt->rc_page_count = 0;
>  	return ctxt;
>  
>  out_empty:
>  	spin_unlock(&rdma->sc_recv_lock);
>  
> -	ctxt = kmalloc(sizeof(*ctxt), GFP_KERNEL);
> +	ctxt = svc_rdma_recv_ctxt_alloc(rdma);
>  	if (!ctxt)
>  		return NULL;
>  	goto out;
>  }
>  
> -static void svc_rdma_recv_ctxt_unmap(struct svcxprt_rdma *rdma,
> -				     struct svc_rdma_recv_ctxt *ctxt)
> -{
> -	struct ib_device *device = rdma->sc_cm_id->device;
> -	int i;
> -
> -	for (i = 0; i < ctxt->rc_recv_wr.num_sge; i++)
> -		ib_dma_unmap_page(device,
> -				  ctxt->rc_sges[i].addr,
> -				  ctxt->rc_sges[i].length,
> -				  DMA_FROM_DEVICE);
> -}
> -
>  /**
>   * svc_rdma_recv_ctxt_put - Return recv_ctxt to free list
>   * @rdma: controlling svcxprt_rdma
> @@ -191,46 +219,14 @@ void svc_rdma_recv_ctxt_put(struct svcxprt_rdma *rdma,
>  
>  static int svc_rdma_post_recv(struct svcxprt_rdma *rdma)
>  {
> -	struct ib_device *device = rdma->sc_cm_id->device;
>  	struct svc_rdma_recv_ctxt *ctxt;
>  	struct ib_recv_wr *bad_recv_wr;
> -	int sge_no, buflen, ret;
> -	struct page *page;
> -	dma_addr_t pa;
> +	int ret;
>  
>  	ctxt = svc_rdma_recv_ctxt_get(rdma);
>  	if (!ctxt)
>  		return -ENOMEM;
>  
> -	buflen = 0;
> -	ctxt->rc_cqe.done = svc_rdma_wc_receive;
> -	for (sge_no = 0; buflen < rdma->sc_max_req_size; sge_no++) {
> -		if (sge_no >= rdma->sc_max_sge) {
> -			pr_err("svcrdma: Too many sges (%d)\n", sge_no);
> -			goto err_put_ctxt;
> -		}
> -
> -		page = alloc_page(GFP_KERNEL);
> -		if (!page)
> -			goto err_put_ctxt;
> -		ctxt->rc_pages[sge_no] = page;
> -		ctxt->rc_page_count++;
> -
> -		pa = ib_dma_map_page(device, ctxt->rc_pages[sge_no],
> -				     0, PAGE_SIZE, DMA_FROM_DEVICE);
> -		if (ib_dma_mapping_error(device, pa))
> -			goto err_put_ctxt;
> -		ctxt->rc_sges[sge_no].addr = pa;
> -		ctxt->rc_sges[sge_no].length = PAGE_SIZE;
> -		ctxt->rc_sges[sge_no].lkey = rdma->sc_pd->local_dma_lkey;
> -		ctxt->rc_recv_wr.num_sge++;
> -
> -		buflen += PAGE_SIZE;
> -	}
> -	ctxt->rc_recv_wr.next = NULL;
> -	ctxt->rc_recv_wr.sg_list = &ctxt->rc_sges[0];
> -	ctxt->rc_recv_wr.wr_cqe = &ctxt->rc_cqe;
> -
>  	svc_xprt_get(&rdma->sc_xprt);
>  	ret = ib_post_recv(rdma->sc_qp, &ctxt->rc_recv_wr, &bad_recv_wr);
>  	trace_svcrdma_post_recv(&ctxt->rc_recv_wr, ret);
> @@ -238,12 +234,7 @@ static int svc_rdma_post_recv(struct svcxprt_rdma *rdma)
>  		goto err_post;
>  	return 0;
>  
> -err_put_ctxt:
> -	svc_rdma_recv_ctxt_unmap(rdma, ctxt);
> -	svc_rdma_recv_ctxt_put(rdma, ctxt);
> -	return -ENOMEM;
>  err_post:
> -	svc_rdma_recv_ctxt_unmap(rdma, ctxt);
>  	svc_rdma_recv_ctxt_put(rdma, ctxt);
>  	svc_xprt_put(&rdma->sc_xprt);
>  	return ret;
> @@ -289,7 +280,6 @@ static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
>  
>  	/* WARNING: Only wc->wr_cqe and wc->status are reliable */
>  	ctxt = container_of(cqe, struct svc_rdma_recv_ctxt, rc_cqe);
> -	svc_rdma_recv_ctxt_unmap(rdma, ctxt);
>  
>  	if (wc->status != IB_WC_SUCCESS)
>  		goto flushed;
> @@ -299,6 +289,10 @@ static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
>  
>  	/* All wc fields are now known to be valid */
>  	ctxt->rc_byte_len = wc->byte_len;
> +	ib_dma_sync_single_for_cpu(rdma->sc_pd->device,
> +				   ctxt->rc_recv_sge.addr,
> +				   wc->byte_len, DMA_FROM_DEVICE);
> +
>  	spin_lock(&rdma->sc_rq_dto_lock);
>  	list_add_tail(&ctxt->rc_list, &rdma->sc_rq_dto_q);
>  	spin_unlock(&rdma->sc_rq_dto_lock);
> @@ -339,64 +333,22 @@ void svc_rdma_flush_recv_queues(struct svcxprt_rdma *rdma)
>  	}
>  }
>  
> -/*
> - * Replace the pages in the rq_argpages array with the pages from the SGE in
> - * the RDMA_RECV completion. The SGL should contain full pages up until the
> - * last one.
> - */
>  static void svc_rdma_build_arg_xdr(struct svc_rqst *rqstp,
>  				   struct svc_rdma_recv_ctxt *ctxt)
>  {
> -	struct page *page;
> -	int sge_no;
> -	u32 len;
> -
> -	/* The reply path assumes the Call's transport header resides
> -	 * in rqstp->rq_pages[0].
> -	 */
> -	page = ctxt->rc_pages[0];
> -	put_page(rqstp->rq_pages[0]);
> -	rqstp->rq_pages[0] = page;
> -
> -	/* Set up the XDR head */
> -	rqstp->rq_arg.head[0].iov_base = page_address(page);
> -	rqstp->rq_arg.head[0].iov_len =
> -		min_t(size_t, ctxt->rc_byte_len, ctxt->rc_sges[0].length);
> -	rqstp->rq_arg.len = ctxt->rc_byte_len;
> -	rqstp->rq_arg.buflen = ctxt->rc_byte_len;
> -
> -	/* Compute bytes past head in the SGL */
> -	len = ctxt->rc_byte_len - rqstp->rq_arg.head[0].iov_len;
> -
> -	/* If data remains, store it in the pagelist */
> -	rqstp->rq_arg.page_len = len;
> -	rqstp->rq_arg.page_base = 0;
> -
> -	sge_no = 1;
> -	while (len && sge_no < ctxt->rc_recv_wr.num_sge) {
> -		page = ctxt->rc_pages[sge_no];
> -		put_page(rqstp->rq_pages[sge_no]);
> -		rqstp->rq_pages[sge_no] = page;
> -		len -= min_t(u32, len, ctxt->rc_sges[sge_no].length);
> -		sge_no++;
> -	}
> -	ctxt->rc_hdr_count = sge_no;
> -	rqstp->rq_respages = &rqstp->rq_pages[sge_no];
> +	struct xdr_buf *arg = &rqstp->rq_arg;
> +
> +	arg->head[0].iov_base = ctxt->rc_recv_buf;
> +	arg->head[0].iov_len = ctxt->rc_byte_len;
> +	arg->tail[0].iov_base = NULL;
> +	arg->tail[0].iov_len = 0;
> +	arg->page_len = 0;
> +	arg->page_base = 0;
> +	arg->buflen = ctxt->rc_byte_len;
> +	arg->len = ctxt->rc_byte_len;
> +
> +	rqstp->rq_respages = &rqstp->rq_pages[0];
>  	rqstp->rq_next_page = rqstp->rq_respages + 1;
> -
> -	/* If not all pages were used from the SGL, free the remaining ones */
> -	while (sge_no < ctxt->rc_recv_wr.num_sge) {
> -		page = ctxt->rc_pages[sge_no++];
> -		put_page(page);
> -	}
> -
> -	/* @ctxt's pages have all been released or moved to @rqstp->rq_pages.
> -	 */
> -	ctxt->rc_page_count = 0;
> -
> -	/* Set up tail */
> -	rqstp->rq_arg.tail[0].iov_base = NULL;
> -	rqstp->rq_arg.tail[0].iov_len = 0;
>  }
>  
>  /* This accommodates the largest possible Write chunk,
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
> index 8242aa3..ce3ea84 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
> @@ -718,15 +718,14 @@ static int svc_rdma_build_normal_read_chunk(struct svc_rqst *rqstp,
>  	struct svc_rdma_recv_ctxt *head = info->ri_readctxt;
>  	int ret;
>  
> -	info->ri_pageno = head->rc_hdr_count;
> -	info->ri_pageoff = 0;
> -
>  	ret = svc_rdma_build_read_chunk(rqstp, info, p);
>  	if (ret < 0)
>  		goto out;
>  
>  	trace_svcrdma_encode_read(info->ri_chunklen, info->ri_position);
>  
> +	head->rc_hdr_count = 0;
> +
>  	/* Split the Receive buffer between the head and tail
>  	 * buffers at Read chunk's position. XDR roundup of the
>  	 * chunk is not included in either the pagelist or in
> @@ -775,9 +774,6 @@ static int svc_rdma_build_pz_read_chunk(struct svc_rqst *rqstp,
>  	struct svc_rdma_recv_ctxt *head = info->ri_readctxt;
>  	int ret;
>  
> -	info->ri_pageno = head->rc_hdr_count - 1;
> -	info->ri_pageoff = offset_in_page(head->rc_byte_len);
> -
>  	ret = svc_rdma_build_read_chunk(rqstp, info, p);
>  	if (ret < 0)
>  		goto out;
> @@ -787,20 +783,13 @@ static int svc_rdma_build_pz_read_chunk(struct svc_rqst *rqstp,
>  	head->rc_arg.len += info->ri_chunklen;
>  	head->rc_arg.buflen += info->ri_chunklen;
>  
> -	if (head->rc_arg.buflen <= head->rc_sges[0].length) {
> -		/* Transport header and RPC message fit entirely
> -		 * in page where head iovec resides.
> -		 */
> -		head->rc_arg.head[0].iov_len = info->ri_chunklen;
> -	} else {
> -		/* Transport header and part of RPC message reside
> -		 * in the head iovec's page.
> -		 */
> -		head->rc_arg.head[0].iov_len =
> -			head->rc_sges[0].length - head->rc_byte_len;
> -		head->rc_arg.page_len =
> -			info->ri_chunklen - head->rc_arg.head[0].iov_len;
> -	}
> +	head->rc_hdr_count = 1;
> +	head->rc_arg.head[0].iov_base = page_address(head->rc_pages[0]);
> +	head->rc_arg.head[0].iov_len = min_t(size_t, PAGE_SIZE,
> +					     info->ri_chunklen);
> +
> +	head->rc_arg.page_len = info->ri_chunklen -
> +				head->rc_arg.head[0].iov_len;
>  
>  out:
>  	return ret;
> @@ -834,7 +823,6 @@ int svc_rdma_recv_read_chunk(struct svcxprt_rdma *rdma, struct svc_rqst *rqstp,
>  	 * head->rc_arg. Pages involved with RDMA Read I/O are
>  	 * transferred there.
>  	 */
> -	head->rc_page_count = head->rc_hdr_count;
>  	head->rc_arg.head[0] = rqstp->rq_arg.head[0];
>  	head->rc_arg.tail[0] = rqstp->rq_arg.tail[0];
>  	head->rc_arg.pages = head->rc_pages;
> @@ -847,6 +835,8 @@ int svc_rdma_recv_read_chunk(struct svcxprt_rdma *rdma, struct svc_rqst *rqstp,
>  	if (!info)
>  		return -ENOMEM;
>  	info->ri_readctxt = head;
> +	info->ri_pageno = 0;
> +	info->ri_pageoff = 0;
>  
>  	info->ri_position = be32_to_cpup(p + 1);
>  	if (info->ri_position)
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> index cbbde70..b27b597 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> @@ -629,10 +629,7 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
>  	struct page *res_page;
>  	int ret;
>  
> -	/* Find the call's chunk lists to decide how to send the reply.
> -	 * Receive places the Call's xprt header at the start of page 0.
> -	 */
> -	rdma_argp = page_address(rqstp->rq_pages[0]);
> +	rdma_argp = rctxt->rc_recv_buf;
>  	svc_rdma_get_write_arrays(rdma_argp, &wr_lst, &rp_ch);
>  
>  	/* Create the RDMA response header. xprt->xpt_mutex,
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> index 20abd3a..333c432 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> @@ -670,7 +670,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
>  	qp_attr.cap.max_send_wr = newxprt->sc_sq_depth - ctxts;
>  	qp_attr.cap.max_recv_wr = rq_depth;
>  	qp_attr.cap.max_send_sge = newxprt->sc_max_sge;
> -	qp_attr.cap.max_recv_sge = newxprt->sc_max_sge;
> +	qp_attr.cap.max_recv_sge = 1;
>  	qp_attr.sq_sig_type = IB_SIGNAL_REQ_WR;
>  	qp_attr.qp_type = IB_QPT_RC;
>  	qp_attr.send_cq = newxprt->sc_sq_cq;

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v1 10/19] svcrdma: Persistently allocate and DMA-map Receive buffers
  2018-05-09 21:18   ` J. Bruce Fields
@ 2018-05-09 21:31     ` Chuck Lever
  2018-05-09 21:37       ` Bruce Fields
  0 siblings, 1 reply; 29+ messages in thread
From: Chuck Lever @ 2018-05-09 21:31 UTC (permalink / raw)
  To: Bruce Fields; +Cc: linux-rdma, Linux NFS Mailing List



> On May 9, 2018, at 5:18 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> 
> On Mon, May 07, 2018 at 03:27:43PM -0400, Chuck Lever wrote:
>> The current Receive path uses an array of pages which are allocated
>> and DMA mapped when each Receive WR is posted, and then handed off
>> to the upper layer in rqstp::rq_arg. The page flip releases unused
>> pages in the rq_pages pagelist. This mechanism introduces a
>> significant amount of overhead.
>>
>> So instead, kmalloc the Receive buffer, and leave it DMA-mapped
>> while the transport remains connected. This confers a number of
>> benefits:
>>
>> * Each Receive WR requires only one receive SGE, no matter how large
>>  the inline threshold is. This helps the server-side NFS/RDMA
>>  transport operate on less capable RDMA devices.
>>
>> * The Receive buffer is left allocated and mapped all the time. This
>>  relieves svc_rdma_post_recv from the overhead of allocating and
>>  DMA-mapping a fresh buffer.
>
> Dumb question: does that mean the buffer could still change if the
> client does something weird?  (So could the xdr decoding code see data
> change out from under it?)

No, once the Receive completes, the HCA no longer has access
to the buffer. Leaving it DMA-mapped is not the same as leaving
it posted or leaving it registered; it just means that the I/O
subsystem has the buffer prepared for more I/O.

The current scheme works like this:

repeat:
Allocate a page
DMA-map the page
Post a Recv WR for that page
Recv WR completes
DMA-unmap the page
XDR decoding and stuff
Free the page
goto repeat

And I want it to go faster:

kmalloc a buffer
DMA-map the buffer
repeat:
Post a Recv for that buffer
Recv WR completes
DMA-sync the buffer
XDR decoding and stuff
goto repeat

On disconnect, the Recv flushes. The transport tear-down logic
takes care of DMA-unmapping and freeing the buffer.



--
Chuck Lever




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v1 10/19] svcrdma: Persistently allocate and DMA-map Receive buffers
  2018-05-09 21:31     ` Chuck Lever
@ 2018-05-09 21:37       ` Bruce Fields
  0 siblings, 0 replies; 29+ messages in thread
From: Bruce Fields @ 2018-05-09 21:37 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-rdma, Linux NFS Mailing List

On Wed, May 09, 2018 at 05:31:17PM -0400, Chuck Lever wrote:
> 
> 
> > On May 9, 2018, at 5:18 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > 
> > On Mon, May 07, 2018 at 03:27:43PM -0400, Chuck Lever wrote:
> >> The current Receive path uses an array of pages which are allocated
> >> and DMA mapped when each Receive WR is posted, and then handed off
> >> to the upper layer in rqstp::rq_arg. The page flip releases unused
> >> pages in the rq_pages pagelist. This mechanism introduces a
> >> significant amount of overhead.
> >> 
> >> So instead, kmalloc the Receive buffer, and leave it DMA-mapped
> >> while the transport remains connected. This confers a number of
> >> benefits:
> >> 
> >> * Each Receive WR requires only one receive SGE, no matter how large
> >>  the inline threshold is. This helps the server-side NFS/RDMA
> >>  transport operate on less capable RDMA devices.
> >> 
> >> * The Receive buffer is left allocated and mapped all the time. This
> >>  relieves svc_rdma_post_recv from the overhead of allocating and
> >>  DMA-mapping a fresh buffer.
> > 
> > Dumb question: does that mean the buffer could still change if the
> > client does something weird?  (So could the xdr decoding code see data
> > change out from under it?)
> 
> No, once the Receive completes, the HCA no longer has access
> to the buffer. Leaving it DMA-mapped is not the same as leaving
> it posted or leaving it registered; it just means that the I/O
> subsystem has the buffer prepared for more I/O.
> 
> The current scheme works like this:
> 
> repeat:
> Allocate a page
> DMA-map the page
> Post a Recv WR for that page
> Recv WR completes
> DMA-unmap the page
> XDR decoding and stuff
> Free the page
> goto repeat
> 
> And I want it to go faster:
> 
> kmalloc a buffer
> DMA-map the buffer
> repeat:
> Post a Recv for that buffer
> Recv WR completes
> DMA-sync the buffer
> XDR decoding and stuff
> goto repeat
> 
> On disconnect, the Recv flushes. The transport tear-down logic
> takes care of DMA-unmapping and freeing the buffer.

Got it, thanks!


--b.

> 
> 
> > --b.
> > 
> >> 
> >> * svc_rdma_wc_receive no longer has to DMA unmap the Receive buffer.
> >>  It has to DMA sync only the number of bytes that were received.
> >> 
> >> * svc_rdma_build_arg_xdr no longer has to free a page in rq_pages
> >>  for each page in the Receive buffer, making it a constant-time
> >>  function.
> >> 
> >> * The Receive buffer is now plugged directly into the rq_arg's
> >>  head[0].iov_vec, and can be larger than a page without spilling
> >>  over into rq_arg's page list. This enables simplification of
> >>  the RDMA Read path in subsequent patches.
> >> 
> >> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> >> ---
> >> include/linux/sunrpc/svc_rdma.h          |    4 -
> >> net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |  168 +++++++++++-------------------
> >> net/sunrpc/xprtrdma/svc_rdma_rw.c        |   32 ++----
> >> net/sunrpc/xprtrdma/svc_rdma_sendto.c    |    5 -
> >> net/sunrpc/xprtrdma/svc_rdma_transport.c |    2 
> >> 5 files changed, 75 insertions(+), 136 deletions(-)
> >> 
> >> diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
> >> index f0bd0b6d..01baabf 100644
> >> --- a/include/linux/sunrpc/svc_rdma.h
> >> +++ b/include/linux/sunrpc/svc_rdma.h
> >> @@ -148,12 +148,12 @@ struct svc_rdma_recv_ctxt {
> >> 	struct list_head	rc_list;
> >> 	struct ib_recv_wr	rc_recv_wr;
> >> 	struct ib_cqe		rc_cqe;
> >> +	struct ib_sge		rc_recv_sge;
> >> +	void			*rc_recv_buf;
> >> 	struct xdr_buf		rc_arg;
> >> 	u32			rc_byte_len;
> >> 	unsigned int		rc_page_count;
> >> 	unsigned int		rc_hdr_count;
> >> -	struct ib_sge		rc_sges[1 +
> >> -					RPCRDMA_MAX_INLINE_THRESH / PAGE_SIZE];
> >> 	struct page		*rc_pages[RPCSVC_MAXPAGES];
> >> };
> >> 
> >> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> >> index d9fef52..d4ccd1c 100644
> >> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> >> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> >> @@ -117,6 +117,43 @@
> >> 					rc_list);
> >> }
> >> 
> >> +static struct svc_rdma_recv_ctxt *
> >> +svc_rdma_recv_ctxt_alloc(struct svcxprt_rdma *rdma)
> >> +{
> >> +	struct svc_rdma_recv_ctxt *ctxt;
> >> +	dma_addr_t addr;
> >> +	void *buffer;
> >> +
> >> +	ctxt = kmalloc(sizeof(*ctxt), GFP_KERNEL);
> >> +	if (!ctxt)
> >> +		goto fail0;
> >> +	buffer = kmalloc(rdma->sc_max_req_size, GFP_KERNEL);
> >> +	if (!buffer)
> >> +		goto fail1;
> >> +	addr = ib_dma_map_single(rdma->sc_pd->device, buffer,
> >> +				 rdma->sc_max_req_size, DMA_FROM_DEVICE);
> >> +	if (ib_dma_mapping_error(rdma->sc_pd->device, addr))
> >> +		goto fail2;
> >> +
> >> +	ctxt->rc_recv_wr.next = NULL;
> >> +	ctxt->rc_recv_wr.wr_cqe = &ctxt->rc_cqe;
> >> +	ctxt->rc_recv_wr.sg_list = &ctxt->rc_recv_sge;
> >> +	ctxt->rc_recv_wr.num_sge = 1;
> >> +	ctxt->rc_cqe.done = svc_rdma_wc_receive;
> >> +	ctxt->rc_recv_sge.addr = addr;
> >> +	ctxt->rc_recv_sge.length = rdma->sc_max_req_size;
> >> +	ctxt->rc_recv_sge.lkey = rdma->sc_pd->local_dma_lkey;
> >> +	ctxt->rc_recv_buf = buffer;
> >> +	return ctxt;
> >> +
> >> +fail2:
> >> +	kfree(buffer);
> >> +fail1:
> >> +	kfree(ctxt);
> >> +fail0:
> >> +	return NULL;
> >> +}
> >> +
> >> /**
> >>  * svc_rdma_recv_ctxts_destroy - Release all recv_ctxt's for an xprt
> >>  * @rdma: svcxprt_rdma being torn down
> >> @@ -128,6 +165,11 @@ void svc_rdma_recv_ctxts_destroy(struct svcxprt_rdma *rdma)
> >> 
> >> 	while ((ctxt = svc_rdma_next_recv_ctxt(&rdma->sc_recv_ctxts))) {
> >> 		list_del(&ctxt->rc_list);
> >> +		ib_dma_unmap_single(rdma->sc_pd->device,
> >> +				    ctxt->rc_recv_sge.addr,
> >> +				    ctxt->rc_recv_sge.length,
> >> +				    DMA_FROM_DEVICE);
> >> +		kfree(ctxt->rc_recv_buf);
> >> 		kfree(ctxt);
> >> 	}
> >> }
> >> @@ -145,32 +187,18 @@ void svc_rdma_recv_ctxts_destroy(struct svcxprt_rdma *rdma)
> >> 	spin_unlock(&rdma->sc_recv_lock);
> >> 
> >> out:
> >> -	ctxt->rc_recv_wr.num_sge = 0;
> >> 	ctxt->rc_page_count = 0;
> >> 	return ctxt;
> >> 
> >> out_empty:
> >> 	spin_unlock(&rdma->sc_recv_lock);
> >> 
> >> -	ctxt = kmalloc(sizeof(*ctxt), GFP_KERNEL);
> >> +	ctxt = svc_rdma_recv_ctxt_alloc(rdma);
> >> 	if (!ctxt)
> >> 		return NULL;
> >> 	goto out;
> >> }
> >> 
> >> -static void svc_rdma_recv_ctxt_unmap(struct svcxprt_rdma *rdma,
> >> -				     struct svc_rdma_recv_ctxt *ctxt)
> >> -{
> >> -	struct ib_device *device = rdma->sc_cm_id->device;
> >> -	int i;
> >> -
> >> -	for (i = 0; i < ctxt->rc_recv_wr.num_sge; i++)
> >> -		ib_dma_unmap_page(device,
> >> -				  ctxt->rc_sges[i].addr,
> >> -				  ctxt->rc_sges[i].length,
> >> -				  DMA_FROM_DEVICE);
> >> -}
> >> -
> >> /**
> >>  * svc_rdma_recv_ctxt_put - Return recv_ctxt to free list
> >>  * @rdma: controlling svcxprt_rdma
> >> @@ -191,46 +219,14 @@ void svc_rdma_recv_ctxt_put(struct svcxprt_rdma *rdma,
> >> 
> >> static int svc_rdma_post_recv(struct svcxprt_rdma *rdma)
> >> {
> >> -	struct ib_device *device = rdma->sc_cm_id->device;
> >> 	struct svc_rdma_recv_ctxt *ctxt;
> >> 	struct ib_recv_wr *bad_recv_wr;
> >> -	int sge_no, buflen, ret;
> >> -	struct page *page;
> >> -	dma_addr_t pa;
> >> +	int ret;
> >> 
> >> 	ctxt = svc_rdma_recv_ctxt_get(rdma);
> >> 	if (!ctxt)
> >> 		return -ENOMEM;
> >> 
> >> -	buflen = 0;
> >> -	ctxt->rc_cqe.done = svc_rdma_wc_receive;
> >> -	for (sge_no = 0; buflen < rdma->sc_max_req_size; sge_no++) {
> >> -		if (sge_no >= rdma->sc_max_sge) {
> >> -			pr_err("svcrdma: Too many sges (%d)\n", sge_no);
> >> -			goto err_put_ctxt;
> >> -		}
> >> -
> >> -		page = alloc_page(GFP_KERNEL);
> >> -		if (!page)
> >> -			goto err_put_ctxt;
> >> -		ctxt->rc_pages[sge_no] = page;
> >> -		ctxt->rc_page_count++;
> >> -
> >> -		pa = ib_dma_map_page(device, ctxt->rc_pages[sge_no],
> >> -				     0, PAGE_SIZE, DMA_FROM_DEVICE);
> >> -		if (ib_dma_mapping_error(device, pa))
> >> -			goto err_put_ctxt;
> >> -		ctxt->rc_sges[sge_no].addr = pa;
> >> -		ctxt->rc_sges[sge_no].length = PAGE_SIZE;
> >> -		ctxt->rc_sges[sge_no].lkey = rdma->sc_pd->local_dma_lkey;
> >> -		ctxt->rc_recv_wr.num_sge++;
> >> -
> >> -		buflen += PAGE_SIZE;
> >> -	}
> >> -	ctxt->rc_recv_wr.next = NULL;
> >> -	ctxt->rc_recv_wr.sg_list = &ctxt->rc_sges[0];
> >> -	ctxt->rc_recv_wr.wr_cqe = &ctxt->rc_cqe;
> >> -
> >> 	svc_xprt_get(&rdma->sc_xprt);
> >> 	ret = ib_post_recv(rdma->sc_qp, &ctxt->rc_recv_wr, &bad_recv_wr);
> >> 	trace_svcrdma_post_recv(&ctxt->rc_recv_wr, ret);
> >> @@ -238,12 +234,7 @@ static int svc_rdma_post_recv(struct svcxprt_rdma *rdma)
> >> 		goto err_post;
> >> 	return 0;
> >> 
> >> -err_put_ctxt:
> >> -	svc_rdma_recv_ctxt_unmap(rdma, ctxt);
> >> -	svc_rdma_recv_ctxt_put(rdma, ctxt);
> >> -	return -ENOMEM;
> >> err_post:
> >> -	svc_rdma_recv_ctxt_unmap(rdma, ctxt);
> >> 	svc_rdma_recv_ctxt_put(rdma, ctxt);
> >> 	svc_xprt_put(&rdma->sc_xprt);
> >> 	return ret;
> >> @@ -289,7 +280,6 @@ static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
> >> 
> >> 	/* WARNING: Only wc->wr_cqe and wc->status are reliable */
> >> 	ctxt = container_of(cqe, struct svc_rdma_recv_ctxt, rc_cqe);
> >> -	svc_rdma_recv_ctxt_unmap(rdma, ctxt);
> >> 
> >> 	if (wc->status != IB_WC_SUCCESS)
> >> 		goto flushed;
> >> @@ -299,6 +289,10 @@ static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
> >> 
> >> 	/* All wc fields are now known to be valid */
> >> 	ctxt->rc_byte_len = wc->byte_len;
> >> +	ib_dma_sync_single_for_cpu(rdma->sc_pd->device,
> >> +				   ctxt->rc_recv_sge.addr,
> >> +				   wc->byte_len, DMA_FROM_DEVICE);
> >> +
> >> 	spin_lock(&rdma->sc_rq_dto_lock);
> >> 	list_add_tail(&ctxt->rc_list, &rdma->sc_rq_dto_q);
> >> 	spin_unlock(&rdma->sc_rq_dto_lock);
> >> @@ -339,64 +333,22 @@ void svc_rdma_flush_recv_queues(struct svcxprt_rdma *rdma)
> >> 	}
> >> }
> >> 
> >> -/*
> >> - * Replace the pages in the rq_argpages array with the pages from the SGE in
> >> - * the RDMA_RECV completion. The SGL should contain full pages up until the
> >> - * last one.
> >> - */
> >> static void svc_rdma_build_arg_xdr(struct svc_rqst *rqstp,
> >> 				   struct svc_rdma_recv_ctxt *ctxt)
> >> {
> >> -	struct page *page;
> >> -	int sge_no;
> >> -	u32 len;
> >> -
> >> -	/* The reply path assumes the Call's transport header resides
> >> -	 * in rqstp->rq_pages[0].
> >> -	 */
> >> -	page = ctxt->rc_pages[0];
> >> -	put_page(rqstp->rq_pages[0]);
> >> -	rqstp->rq_pages[0] = page;
> >> -
> >> -	/* Set up the XDR head */
> >> -	rqstp->rq_arg.head[0].iov_base = page_address(page);
> >> -	rqstp->rq_arg.head[0].iov_len =
> >> -		min_t(size_t, ctxt->rc_byte_len, ctxt->rc_sges[0].length);
> >> -	rqstp->rq_arg.len = ctxt->rc_byte_len;
> >> -	rqstp->rq_arg.buflen = ctxt->rc_byte_len;
> >> -
> >> -	/* Compute bytes past head in the SGL */
> >> -	len = ctxt->rc_byte_len - rqstp->rq_arg.head[0].iov_len;
> >> -
> >> -	/* If data remains, store it in the pagelist */
> >> -	rqstp->rq_arg.page_len = len;
> >> -	rqstp->rq_arg.page_base = 0;
> >> -
> >> -	sge_no = 1;
> >> -	while (len && sge_no < ctxt->rc_recv_wr.num_sge) {
> >> -		page = ctxt->rc_pages[sge_no];
> >> -		put_page(rqstp->rq_pages[sge_no]);
> >> -		rqstp->rq_pages[sge_no] = page;
> >> -		len -= min_t(u32, len, ctxt->rc_sges[sge_no].length);
> >> -		sge_no++;
> >> -	}
> >> -	ctxt->rc_hdr_count = sge_no;
> >> -	rqstp->rq_respages = &rqstp->rq_pages[sge_no];
> >> +	struct xdr_buf *arg = &rqstp->rq_arg;
> >> +
> >> +	arg->head[0].iov_base = ctxt->rc_recv_buf;
> >> +	arg->head[0].iov_len = ctxt->rc_byte_len;
> >> +	arg->tail[0].iov_base = NULL;
> >> +	arg->tail[0].iov_len = 0;
> >> +	arg->page_len = 0;
> >> +	arg->page_base = 0;
> >> +	arg->buflen = ctxt->rc_byte_len;
> >> +	arg->len = ctxt->rc_byte_len;
> >> +
> >> +	rqstp->rq_respages = &rqstp->rq_pages[0];
> >> 	rqstp->rq_next_page = rqstp->rq_respages + 1;
> >> -
> >> -	/* If not all pages were used from the SGL, free the remaining ones */
> >> -	while (sge_no < ctxt->rc_recv_wr.num_sge) {
> >> -		page = ctxt->rc_pages[sge_no++];
> >> -		put_page(page);
> >> -	}
> >> -
> >> -	/* @ctxt's pages have all been released or moved to @rqstp->rq_pages.
> >> -	 */
> >> -	ctxt->rc_page_count = 0;
> >> -
> >> -	/* Set up tail */
> >> -	rqstp->rq_arg.tail[0].iov_base = NULL;
> >> -	rqstp->rq_arg.tail[0].iov_len = 0;
> >> }
> >> 
> >> /* This accommodates the largest possible Write chunk,
> >> diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
> >> index 8242aa3..ce3ea84 100644
> >> --- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
> >> +++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
> >> @@ -718,15 +718,14 @@ static int svc_rdma_build_normal_read_chunk(struct svc_rqst *rqstp,
> >> 	struct svc_rdma_recv_ctxt *head = info->ri_readctxt;
> >> 	int ret;
> >> 
> >> -	info->ri_pageno = head->rc_hdr_count;
> >> -	info->ri_pageoff = 0;
> >> -
> >> 	ret = svc_rdma_build_read_chunk(rqstp, info, p);
> >> 	if (ret < 0)
> >> 		goto out;
> >> 
> >> 	trace_svcrdma_encode_read(info->ri_chunklen, info->ri_position);
> >> 
> >> +	head->rc_hdr_count = 0;
> >> +
> >> 	/* Split the Receive buffer between the head and tail
> >> 	 * buffers at Read chunk's position. XDR roundup of the
> >> 	 * chunk is not included in either the pagelist or in
> >> @@ -775,9 +774,6 @@ static int svc_rdma_build_pz_read_chunk(struct svc_rqst *rqstp,
> >> 	struct svc_rdma_recv_ctxt *head = info->ri_readctxt;
> >> 	int ret;
> >> 
> >> -	info->ri_pageno = head->rc_hdr_count - 1;
> >> -	info->ri_pageoff = offset_in_page(head->rc_byte_len);
> >> -
> >> 	ret = svc_rdma_build_read_chunk(rqstp, info, p);
> >> 	if (ret < 0)
> >> 		goto out;
> >> @@ -787,20 +783,13 @@ static int svc_rdma_build_pz_read_chunk(struct svc_rqst *rqstp,
> >> 	head->rc_arg.len += info->ri_chunklen;
> >> 	head->rc_arg.buflen += info->ri_chunklen;
> >> 
> >> -	if (head->rc_arg.buflen <= head->rc_sges[0].length) {
> >> -		/* Transport header and RPC message fit entirely
> >> -		 * in page where head iovec resides.
> >> -		 */
> >> -		head->rc_arg.head[0].iov_len = info->ri_chunklen;
> >> -	} else {
> >> -		/* Transport header and part of RPC message reside
> >> -		 * in the head iovec's page.
> >> -		 */
> >> -		head->rc_arg.head[0].iov_len =
> >> -			head->rc_sges[0].length - head->rc_byte_len;
> >> -		head->rc_arg.page_len =
> >> -			info->ri_chunklen - head->rc_arg.head[0].iov_len;
> >> -	}
> >> +	head->rc_hdr_count = 1;
> >> +	head->rc_arg.head[0].iov_base = page_address(head->rc_pages[0]);
> >> +	head->rc_arg.head[0].iov_len = min_t(size_t, PAGE_SIZE,
> >> +					     info->ri_chunklen);
> >> +
> >> +	head->rc_arg.page_len = info->ri_chunklen -
> >> +				head->rc_arg.head[0].iov_len;
> >> 
> >> out:
> >> 	return ret;
> >> @@ -834,7 +823,6 @@ int svc_rdma_recv_read_chunk(struct svcxprt_rdma *rdma, struct svc_rqst *rqstp,
> >> 	 * head->rc_arg. Pages involved with RDMA Read I/O are
> >> 	 * transferred there.
> >> 	 */
> >> -	head->rc_page_count = head->rc_hdr_count;
> >> 	head->rc_arg.head[0] = rqstp->rq_arg.head[0];
> >> 	head->rc_arg.tail[0] = rqstp->rq_arg.tail[0];
> >> 	head->rc_arg.pages = head->rc_pages;
> >> @@ -847,6 +835,8 @@ int svc_rdma_recv_read_chunk(struct svcxprt_rdma *rdma, struct svc_rqst *rqstp,
> >> 	if (!info)
> >> 		return -ENOMEM;
> >> 	info->ri_readctxt = head;
> >> +	info->ri_pageno = 0;
> >> +	info->ri_pageoff = 0;
> >> 
> >> 	info->ri_position = be32_to_cpup(p + 1);
> >> 	if (info->ri_position)
> >> diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> >> index cbbde70..b27b597 100644
> >> --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> >> +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> >> @@ -629,10 +629,7 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
> >> 	struct page *res_page;
> >> 	int ret;
> >> 
> >> -	/* Find the call's chunk lists to decide how to send the reply.
> >> -	 * Receive places the Call's xprt header at the start of page 0.
> >> -	 */
> >> -	rdma_argp = page_address(rqstp->rq_pages[0]);
> >> +	rdma_argp = rctxt->rc_recv_buf;
> >> 	svc_rdma_get_write_arrays(rdma_argp, &wr_lst, &rp_ch);
> >> 
> >> 	/* Create the RDMA response header. xprt->xpt_mutex,
> >> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> >> index 20abd3a..333c432 100644
> >> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> >> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> >> @@ -670,7 +670,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
> >> 	qp_attr.cap.max_send_wr = newxprt->sc_sq_depth - ctxts;
> >> 	qp_attr.cap.max_recv_wr = rq_depth;
> >> 	qp_attr.cap.max_send_sge = newxprt->sc_max_sge;
> >> -	qp_attr.cap.max_recv_sge = newxprt->sc_max_sge;
> >> +	qp_attr.cap.max_recv_sge = 1;
> >> 	qp_attr.sq_sig_type = IB_SIGNAL_REQ_WR;
> >> 	qp_attr.qp_type = IB_QPT_RC;
> >> 	qp_attr.send_cq = newxprt->sc_sq_cq;
> 
> --
> Chuck Lever
> 
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v1 01/19] svcrdma: Add proper SPDX tags for NetApp-contributed source
  2018-05-09 20:42     ` Chuck Lever
@ 2018-05-15 14:52       ` Doug Ledford
  0 siblings, 0 replies; 29+ messages in thread
From: Doug Ledford @ 2018-05-15 14:52 UTC (permalink / raw)
  To: Chuck Lever, Bruce Fields; +Cc: linux-rdma, Linux NFS Mailing List


On Wed, 2018-05-09 at 16:42 -0400, Chuck Lever wrote:
> > On May 9, 2018, at 4:23 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > 
> > Looking at the git history, it looks like others are taking this as an
> > opportunity to replace the existing boilerplate.  Could we do this here?
> > Looks like "BSD-3-Clause" does in fact refer to a license that's
> > word-for-word the same as the one included here (except for the name of
> > the copyright holder), so I wonder if we need it written out here any
> > more.
> > 
> > (Minor point, I'm applying this anyway and you can follow up with the
> > removal patch or not.)
> 
> Because the holder's name is part of the boilerplate, I feel
> that it is up to the holder to submit a patch removing the
> whole copyright notice if they are comfortable doing that.
> 
> But IANAL. Feel free to convince me that I'm being prudish.

As long as you leave their ownership statement, the SPDX tag is a
direct replacement for the license text.  Just because the two are in
the same comment block (the ownership statement and date, and the text
of the actual license) does not mean they are a single item.  The
license is a license, the other is an ownership statement, and
replacing the long-form license text with an accepted short-form
identifier that is regarded as identical to it is perfectly acceptable.
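
For example (illustrative only, not an actual patch), a NetApp-contributed
.c file with the long-form text dropped but the notice kept would reduce
to something like:

	// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
	/*
	 * Copyright (c) 2005-2006 Network Appliance, Inc. All rights reserved.
	 */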

> 
> > --b.
> > 
> > On Mon, May 07, 2018 at 03:26:55PM -0400, Chuck Lever wrote:
> > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > > ---
> > > include/linux/sunrpc/svc_rdma.h          |    1 +
> > > net/sunrpc/xprtrdma/svc_rdma.c           |    1 +
> > > net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |    1 +
> > > net/sunrpc/xprtrdma/svc_rdma_sendto.c    |    1 +
> > > net/sunrpc/xprtrdma/svc_rdma_transport.c |    1 +
> > > 5 files changed, 5 insertions(+)
> > > 
> > > diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
> > > index 7337e12..88da0c9 100644
> > > --- a/include/linux/sunrpc/svc_rdma.h
> > > +++ b/include/linux/sunrpc/svc_rdma.h
> > > @@ -1,3 +1,4 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
> > > /*
> > >  * Copyright (c) 2005-2006 Network Appliance, Inc. All rights reserved.
> > >  *
> > > diff --git a/net/sunrpc/xprtrdma/svc_rdma.c b/net/sunrpc/xprtrdma/svc_rdma.c
> > > index dd8a431..a490532 100644
> > > --- a/net/sunrpc/xprtrdma/svc_rdma.c
> > > +++ b/net/sunrpc/xprtrdma/svc_rdma.c
> > > @@ -1,3 +1,4 @@
> > > +// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
> > > /*
> > >  * Copyright (c) 2005-2006 Network Appliance, Inc. All rights reserved.
> > >  *
> > > diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > > index 3d45015..9eae95d 100644
> > > --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > > +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > > @@ -1,3 +1,4 @@
> > > +// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
> > > /*
> > >  * Copyright (c) 2016, 2017 Oracle. All rights reserved.
> > >  * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
> > > diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> > > index 649441d..79bd3a3 100644
> > > --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> > > +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> > > @@ -1,3 +1,4 @@
> > > +// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
> > > /*
> > >  * Copyright (c) 2016 Oracle. All rights reserved.
> > >  * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
> > > diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> > > index 96cc8f6..3633254 100644
> > > --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> > > +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> > > @@ -1,3 +1,4 @@
> > > +// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
> > > /*
> > >  * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
> > >  * Copyright (c) 2005-2007 Network Appliance, Inc. All rights reserved.
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> --
> Chuck Lever
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Doug Ledford <dledford@redhat.com>
    GPG KeyID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2018-05-15 14:52 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-07 19:26 [PATCH v1 00/19] NFS/RDMA server for-next Chuck Lever
2018-05-07 19:26 ` [PATCH v1 01/19] svcrdma: Add proper SPDX tags for NetApp-contributed source Chuck Lever
2018-05-09 20:23   ` J. Bruce Fields
2018-05-09 20:42     ` Chuck Lever
2018-05-15 14:52       ` Doug Ledford
2018-05-07 19:27 ` [PATCH v1 02/19] svcrdma: Use passed-in net namespace when creating RDMA listener Chuck Lever
2018-05-07 19:27 ` [PATCH v1 03/19] xprtrdma: Prepare RPC/RDMA includes for server-side trace points Chuck Lever
2018-05-07 19:27 ` [PATCH v1 04/19] svcrdma: Trace key RPC/RDMA protocol events Chuck Lever
2018-05-07 19:27 ` [PATCH v1 05/19] svcrdma: Trace key RDMA API events Chuck Lever
2018-05-07 19:27 ` [PATCH v1 06/19] svcrdma: Introduce svc_rdma_recv_ctxt Chuck Lever
2018-05-09 20:48   ` J. Bruce Fields
2018-05-09 21:02     ` Chuck Lever
2018-05-07 19:27 ` [PATCH v1 07/19] svcrdma: Remove sc_rq_depth Chuck Lever
2018-05-07 19:27 ` [PATCH v1 08/19] svcrdma: Simplify svc_rdma_recv_ctxt_put Chuck Lever
2018-05-07 19:27 ` [PATCH v1 09/19] svcrdma: Preserve Receive buffer until svc_rdma_sendto Chuck Lever
2018-05-09 21:03   ` J. Bruce Fields
2018-05-07 19:27 ` [PATCH v1 10/19] svcrdma: Persistently allocate and DMA-map Receive buffers Chuck Lever
2018-05-09 21:18   ` J. Bruce Fields
2018-05-09 21:31     ` Chuck Lever
2018-05-09 21:37       ` Bruce Fields
2018-05-07 19:27 ` [PATCH v1 11/19] svcrdma: Allocate recv_ctxt's on CPU handling Receives Chuck Lever
2018-05-07 19:27 ` [PATCH v1 12/19] svcrdma: Refactor svc_rdma_dma_map_buf Chuck Lever
2018-05-07 19:27 ` [PATCH v1 13/19] svcrdma: Clean up Send SGE accounting Chuck Lever
2018-05-07 19:28 ` [PATCH v1 14/19] svcrdma: Introduce svc_rdma_send_ctxt Chuck Lever
2018-05-07 19:28 ` [PATCH v1 15/19] svcrdma: Don't overrun the SGE array in svc_rdma_send_ctxt Chuck Lever
2018-05-07 19:28 ` [PATCH v1 16/19] svcrdma: Remove post_send_wr Chuck Lever
2018-05-07 19:28 ` [PATCH v1 17/19] svcrdma: Simplify svc_rdma_send() Chuck Lever
2018-05-07 19:28 ` [PATCH v1 18/19] svcrdma: Persistently allocate and DMA-map Send buffers Chuck Lever
2018-05-07 19:28 ` [PATCH v1 19/19] svcrdma: Remove unused svc_rdma_op_ctxt Chuck Lever
