* [PATCH v4 00/11] NFS/RDMA server patches for v4.5
@ 2015-12-14 21:30 ` Chuck Lever
  0 siblings, 0 replies; 40+ messages in thread
From: Chuck Lever @ 2015-12-14 21:30 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

Here are patches to support server-side bi-directional RPC/RDMA
operation (to enable NFSv4.1 on RPC/RDMA transports). Thanks to
all who reviewed v1, v2, and v3. This version has some significant
changes since the previous one.

In preparation for Doug's final topic branch, Bruce, I've rebased
these on Christoph's ib_device_attr branch. There were some merge
conflicts, which I've fixed and tested. These are ready for your
review.

Also available in the "nfsd-rdma-for-4.5" topic branch of this git repo:

git://git.linux-nfs.org/projects/cel/cel-2.6.git

Or for browsing:

http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=log;h=refs/heads/nfsd-rdma-for-4.5


Changes since v3:
- Rebased on Christoph's ib_device_attr branch
- Backchannel patches have been squashed together
- Memory allocation overhaul to prevent blocking allocation
  when sending backchannel calls


Changes since v2:
- Rebased on v4.4-rc4
- Backchannel code in new source file to address dprintk issues
- svc_rdma_get_context() now uses a pre-allocated cache
- Dropped svc_rdma_send clean up


Changes since v1:

- Rebased on v4.4-rc3
- Removed the use of CONFIG_SUNRPC_BACKCHANNEL
- Fixed computation of forward and backward max_requests
- Updated some comments and patch descriptions
- pr_err and pr_info converted to dprintk
- Simplified svc_rdma_get_context()
- Dropped patch removing access_flags field
- NFSv4.1 callbacks tested with for-4.5 client

---

Chuck Lever (11):
      svcrdma: Do not send XDR roundup bytes for a write chunk
      svcrdma: Clean up rdma_create_xprt()
      svcrdma: Clean up process_context()
      svcrdma: Improve allocation of struct svc_rdma_op_ctxt
      svcrdma: Improve allocation of struct svc_rdma_req_map
      svcrdma: Remove unused req_map and ctxt kmem_caches
      svcrdma: Add gfp flags to svc_rdma_post_recv()
      svcrdma: Remove last two __GFP_NOFAIL call sites
      svcrdma: Make map_xdr non-static
      svcrdma: Define maximum number of backchannel requests
      svcrdma: Add class for RDMA backwards direction transport


 include/linux/sunrpc/svc_rdma.h            |   37 ++-
 net/sunrpc/xprt.c                          |    1 
 net/sunrpc/xprtrdma/Makefile               |    2 
 net/sunrpc/xprtrdma/svc_rdma.c             |   41 ---
 net/sunrpc/xprtrdma/svc_rdma_backchannel.c |  371 ++++++++++++++++++++++++++++
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c    |   52 ++++
 net/sunrpc/xprtrdma/svc_rdma_sendto.c      |   34 ++-
 net/sunrpc/xprtrdma/svc_rdma_transport.c   |  284 ++++++++++++++++-----
 net/sunrpc/xprtrdma/transport.c            |   30 +-
 net/sunrpc/xprtrdma/xprt_rdma.h            |   20 +-
 10 files changed, 730 insertions(+), 142 deletions(-)
 create mode 100644 net/sunrpc/xprtrdma/svc_rdma_backchannel.c

--
Signature

* [PATCH v4 01/11] svcrdma: Do not send XDR roundup bytes for a write chunk
  2015-12-14 21:30 ` Chuck Lever
@ 2015-12-14 21:30     ` Chuck Lever
  0 siblings, 0 replies; 40+ messages in thread
From: Chuck Lever @ 2015-12-14 21:30 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

Minor optimization: when dealing with write chunk XDR roundup, do
not post a Write WR for the zero bytes in the pad. Simply update
the write segment in the RPC-over-RDMA header to reflect the extra
pad bytes.

The Reply chunk is also a write chunk, but the server does not use
send_write_chunks() to send the Reply chunk. That's OK in this case:
the server Upper Layer typically marshals the Reply chunk contents
in a single contiguous buffer, without a separate tail for the XDR
pad.

The comments and the variable naming refer to "chunks," but what is
really meant is "segments." The existing code sends only one
xdr_write_chunk per RPC reply.

The fix assumes this as well. When the XDR pad in the first write
chunk is reached, the assumption is that the Write list is complete
and send_write_chunks() returns.

That will remain a valid assumption until the server Upper Layer can
support multiple bulk payload results per RPC.
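
For background, XDR (RFC 4506) rounds every variable-length data item
up to a multiple of four bytes, so a write segment can end with one to
three zero pad bytes; that is what the new "write_len < 4" test below
keys on. A standalone sketch of the arithmetic (the helper name is
illustrative, not part of this series):

#include <stdio.h>

/* Illustrative only: number of zero pad bytes XDR adds to reach
 * the next 4-byte boundary.
 */
static unsigned int xdr_pad_bytes(unsigned int len)
{
	return (4 - (len & 3)) & 3;
}

int main(void)
{
	printf("%u\n", xdr_pad_bytes(13));	/* 3: rounds up to 16 */
	printf("%u\n", xdr_pad_bytes(16));	/* 0: already aligned */
	return 0;
}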

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/svc_rdma_sendto.c |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index 969a1ab..bad5eaa 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -342,6 +342,13 @@ static int send_write_chunks(struct svcxprt_rdma *xprt,
 						arg_ch->rs_handle,
 						arg_ch->rs_offset,
 						write_len);
+
+		/* Do not send XDR pad bytes */
+		if (chunk_no && write_len < 4) {
+			chunk_no++;
+			break;
+		}
+
 		chunk_off = 0;
 		while (write_len) {
 			ret = send_write(xprt, rqstp,


* [PATCH v4 02/11] svcrdma: Clean up rdma_create_xprt()
  2015-12-14 21:30 ` Chuck Lever
@ 2015-12-14 21:30     ` Chuck Lever
  0 siblings, 0 replies; 40+ messages in thread
From: Chuck Lever @ 2015-12-14 21:30 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

kzalloc is used here, so setting the atomic fields to zero is
unnecessary. sc_ord is set again in handle_connect_req. The other
fields are re-initialized in svc_rdma_accept().
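
As a reminder of why the deleted lines were redundant, a minimal
sketch (the struct and its fields are illustrative, not from the
code): kzalloc() hands back zero-filled memory, and all-zeroes is
already the correct initial state for plain ints and for atomic_t.

/* Sketch only: kzalloc() zero-fills, so explicit clearing is redundant. */
struct example {
	atomic_t used;
	int	 depth;
};

static void example_alloc(void)
{
	struct example *ex = kzalloc(sizeof(*ex), GFP_KERNEL);

	if (ex) {
		WARN_ON(atomic_read(&ex->used) != 0);	/* already zero */
		WARN_ON(ex->depth != 0);		/* already zero */
		kfree(ex);
	}
}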

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/svc_rdma_transport.c |    9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 9f3eb89..27f338a 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -529,14 +529,6 @@ static struct svcxprt_rdma *rdma_create_xprt(struct svc_serv *serv,
 	spin_lock_init(&cma_xprt->sc_rq_dto_lock);
 	spin_lock_init(&cma_xprt->sc_frmr_q_lock);
 
-	cma_xprt->sc_ord = svcrdma_ord;
-
-	cma_xprt->sc_max_req_size = svcrdma_max_req_size;
-	cma_xprt->sc_max_requests = svcrdma_max_requests;
-	cma_xprt->sc_sq_depth = svcrdma_max_requests * RPCRDMA_SQ_DEPTH_MULT;
-	atomic_set(&cma_xprt->sc_sq_count, 0);
-	atomic_set(&cma_xprt->sc_ctxt_used, 0);
-
 	if (listener)
 		set_bit(XPT_LISTENER, &cma_xprt->sc_xprt.xpt_flags);
 
@@ -918,6 +910,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 				  (size_t)RPCSVC_MAXPAGES);
 	newxprt->sc_max_sge_rd = min_t(size_t, dev->max_sge_rd,
 				       RPCSVC_MAXPAGES);
+	newxprt->sc_max_req_size = svcrdma_max_req_size;
 	newxprt->sc_max_requests = min((size_t)dev->max_qp_wr,
 				   (size_t)svcrdma_max_requests);
 	newxprt->sc_sq_depth = RPCRDMA_SQ_DEPTH_MULT * newxprt->sc_max_requests;


* [PATCH v4 03/11] svcrdma: Clean up process_context()
  2015-12-14 21:30 ` Chuck Lever
@ 2015-12-14 21:30     ` Chuck Lever
  0 siblings, 0 replies; 40+ messages in thread
From: Chuck Lever @ 2015-12-14 21:30 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

Be sure the completed ctxt is put in every path.

The xprt enqueue can take a while, so put the completed ctxt back
in circulation _before_ enqueuing the xprt.

Remove/disable debugging.
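
For context, the new "free_pages" flag maps onto existing behavior of
the put path: svc_rdma_put_context() drops the ctxt's page references
only when its second argument is non-zero, which is why just the
IB_WR_SEND case passes 1. From the put helper (visible in patch 04's
diff):

	if (free_pages)
		for (i = 0; i < ctxt->count; i++)
			put_page(ctxt->pages[i]);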

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/svc_rdma_transport.c |   44 ++++++++++++++----------------
 1 file changed, 21 insertions(+), 23 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 27f338a..0783f6e 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -386,46 +386,44 @@ static void rq_cq_reap(struct svcxprt_rdma *xprt)
 static void process_context(struct svcxprt_rdma *xprt,
 			    struct svc_rdma_op_ctxt *ctxt)
 {
+	struct svc_rdma_op_ctxt *read_hdr;
+	int free_pages = 0;
+
 	svc_rdma_unmap_dma(ctxt);
 
 	switch (ctxt->wr_op) {
 	case IB_WR_SEND:
-		if (ctxt->frmr)
-			pr_err("svcrdma: SEND: ctxt->frmr != NULL\n");
-		svc_rdma_put_context(ctxt, 1);
+		free_pages = 1;
 		break;
 
 	case IB_WR_RDMA_WRITE:
-		if (ctxt->frmr)
-			pr_err("svcrdma: WRITE: ctxt->frmr != NULL\n");
-		svc_rdma_put_context(ctxt, 0);
 		break;
 
 	case IB_WR_RDMA_READ:
 	case IB_WR_RDMA_READ_WITH_INV:
 		svc_rdma_put_frmr(xprt, ctxt->frmr);
-		if (test_bit(RDMACTXT_F_LAST_CTXT, &ctxt->flags)) {
-			struct svc_rdma_op_ctxt *read_hdr = ctxt->read_hdr;
-			if (read_hdr) {
-				spin_lock_bh(&xprt->sc_rq_dto_lock);
-				set_bit(XPT_DATA, &xprt->sc_xprt.xpt_flags);
-				list_add_tail(&read_hdr->dto_q,
-					      &xprt->sc_read_complete_q);
-				spin_unlock_bh(&xprt->sc_rq_dto_lock);
-			} else {
-				pr_err("svcrdma: ctxt->read_hdr == NULL\n");
-			}
-			svc_xprt_enqueue(&xprt->sc_xprt);
-		}
+
+		if (!test_bit(RDMACTXT_F_LAST_CTXT, &ctxt->flags))
+			break;
+
+		read_hdr = ctxt->read_hdr;
 		svc_rdma_put_context(ctxt, 0);
-		break;
+
+		spin_lock_bh(&xprt->sc_rq_dto_lock);
+		set_bit(XPT_DATA, &xprt->sc_xprt.xpt_flags);
+		list_add_tail(&read_hdr->dto_q,
+			      &xprt->sc_read_complete_q);
+		spin_unlock_bh(&xprt->sc_rq_dto_lock);
+		svc_xprt_enqueue(&xprt->sc_xprt);
+		return;
 
 	default:
-		printk(KERN_ERR "svcrdma: unexpected completion type, "
-		       "opcode=%d\n",
-		       ctxt->wr_op);
+		dprintk("svcrdma: unexpected completion opcode=%d\n",
+			ctxt->wr_op);
 		break;
 	}
+
+	svc_rdma_put_context(ctxt, free_pages);
 }
 
 /*


* [PATCH v4 04/11] svcrdma: Improve allocation of struct svc_rdma_op_ctxt
  2015-12-14 21:30 ` Chuck Lever
@ 2015-12-14 21:30     ` Chuck Lever
  0 siblings, 0 replies; 40+ messages in thread
From: Chuck Lever @ 2015-12-14 21:30 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

When the maximum payload size of NFS READ and WRITE was increased
by commit cc9a903d915c ("svcrdma: Change maximum server payload back
to RPCSVC_MAXPAYLOAD"), the size of struct svc_rdma_op_ctxt
increased to over 6KB (on x86_64). That makes allocating one of
these from a kmem_cache more likely to fail when system memory is
exhausted.

Since I'm about to add a caller where this allocation must always
work _and_ it cannot sleep, pre-allocate ctxts for each connection.

Another motivation for this change is that NFSv4.x servers are
required by specification not to drop NFS requests. Pre-allocating
memory resources reduces the likelihood of a drop.
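
A hedged caller's-eye sketch of the contract after this patch (the
caller below is hypothetical): the common case hands out a
pre-allocated ctxt without sleeping, a GFP_NOIO allocation is the
fallback when the pool runs dry, and NULL is possible only when both
fail.

/* Hypothetical caller showing the get/put pairing. */
static int example_post(struct svcxprt_rdma *xprt)
{
	struct svc_rdma_op_ctxt *ctxt;

	ctxt = svc_rdma_get_context(xprt);
	if (!ctxt)		/* pool and GFP_NOIO fallback both empty */
		return -ENOMEM;

	/* ... build and post a WR described by ctxt ... */

	svc_rdma_put_context(ctxt, 0);	/* back onto xprt->sc_ctxts */
	return 0;
}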

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc_rdma.h          |    6 +-
 net/sunrpc/xprtrdma/svc_rdma_transport.c |  102 ++++++++++++++++++++++++++----
 2 files changed, 94 insertions(+), 14 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index f869807..be2804b 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -69,6 +69,7 @@ extern atomic_t rdma_stat_sq_prod;
  * completes.
  */
 struct svc_rdma_op_ctxt {
+	struct list_head free;
 	struct svc_rdma_op_ctxt *read_hdr;
 	struct svc_rdma_fastreg_mr *frmr;
 	int hdr_count;
@@ -141,7 +142,10 @@ struct svcxprt_rdma {
 	struct ib_pd         *sc_pd;
 
 	atomic_t	     sc_dma_used;
-	atomic_t	     sc_ctxt_used;
+	spinlock_t	     sc_ctxt_lock;
+	struct list_head     sc_ctxts;
+	int		     sc_ctxt_used;
+
 	struct list_head     sc_rq_dto_q;
 	spinlock_t	     sc_rq_dto_lock;
 	struct ib_qp         *sc_qp;
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 0783f6e..58ed9f2 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -153,18 +153,76 @@ static void svc_rdma_bc_free(struct svc_xprt *xprt)
 }
 #endif	/* CONFIG_SUNRPC_BACKCHANNEL */
 
-struct svc_rdma_op_ctxt *svc_rdma_get_context(struct svcxprt_rdma *xprt)
+static struct svc_rdma_op_ctxt *alloc_ctxt(struct svcxprt_rdma *xprt,
+					   gfp_t flags)
 {
 	struct svc_rdma_op_ctxt *ctxt;
 
-	ctxt = kmem_cache_alloc(svc_rdma_ctxt_cachep,
-				GFP_KERNEL | __GFP_NOFAIL);
-	ctxt->xprt = xprt;
-	INIT_LIST_HEAD(&ctxt->dto_q);
+	ctxt = kmalloc(sizeof(*ctxt), flags);
+	if (ctxt) {
+		ctxt->xprt = xprt;
+		INIT_LIST_HEAD(&ctxt->free);
+		INIT_LIST_HEAD(&ctxt->dto_q);
+	}
+	return ctxt;
+}
+
+static bool svc_rdma_prealloc_ctxts(struct svcxprt_rdma *xprt)
+{
+	int i;
+
+	/* Each RPC/RDMA credit can consume a number of send
+	 * and receive WQEs. One ctxt is allocated for each.
+	 */
+	i = xprt->sc_sq_depth + xprt->sc_max_requests;
+
+	while (i--) {
+		struct svc_rdma_op_ctxt *ctxt;
+
+		ctxt = alloc_ctxt(xprt, GFP_KERNEL);
+		if (!ctxt) {
+			dprintk("svcrdma: No memory for RDMA ctxt\n");
+			return false;
+		}
+		list_add(&ctxt->free, &xprt->sc_ctxts);
+	}
+	return true;
+}
+
+struct svc_rdma_op_ctxt *svc_rdma_get_context(struct svcxprt_rdma *xprt)
+{
+	struct svc_rdma_op_ctxt *ctxt = NULL;
+
+	spin_lock_bh(&xprt->sc_ctxt_lock);
+	xprt->sc_ctxt_used++;
+	if (list_empty(&xprt->sc_ctxts))
+		goto out_empty;
+
+	ctxt = list_first_entry(&xprt->sc_ctxts,
+				struct svc_rdma_op_ctxt, free);
+	list_del_init(&ctxt->free);
+	spin_unlock_bh(&xprt->sc_ctxt_lock);
+
+out:
 	ctxt->count = 0;
 	ctxt->frmr = NULL;
-	atomic_inc(&xprt->sc_ctxt_used);
 	return ctxt;
+
+out_empty:
+	/* Either pre-allocation missed the mark, or send
+	 * queue accounting is broken.
+	 */
+	spin_unlock_bh(&xprt->sc_ctxt_lock);
+
+	ctxt = alloc_ctxt(xprt, GFP_NOIO);
+	if (ctxt)
+		goto out;
+
+	spin_lock_bh(&xprt->sc_ctxt_lock);
+	xprt->sc_ctxt_used--;
+	spin_unlock_bh(&xprt->sc_ctxt_lock);
+	WARN_ON_ONCE("svcrdma: empty RDMA ctxt list?\n");
+	return NULL;
 }
 
 void svc_rdma_unmap_dma(struct svc_rdma_op_ctxt *ctxt)
@@ -190,16 +248,29 @@ void svc_rdma_unmap_dma(struct svc_rdma_op_ctxt *ctxt)
 
 void svc_rdma_put_context(struct svc_rdma_op_ctxt *ctxt, int free_pages)
 {
-	struct svcxprt_rdma *xprt;
+	struct svcxprt_rdma *xprt = ctxt->xprt;
 	int i;
 
-	xprt = ctxt->xprt;
 	if (free_pages)
 		for (i = 0; i < ctxt->count; i++)
 			put_page(ctxt->pages[i]);
 
-	kmem_cache_free(svc_rdma_ctxt_cachep, ctxt);
-	atomic_dec(&xprt->sc_ctxt_used);
+	spin_lock_bh(&xprt->sc_ctxt_lock);
+	xprt->sc_ctxt_used--;
+	list_add(&ctxt->free, &xprt->sc_ctxts);
+	spin_unlock_bh(&xprt->sc_ctxt_lock);
+}
+
+static void svc_rdma_destroy_ctxts(struct svcxprt_rdma *xprt)
+{
+	while (!list_empty(&xprt->sc_ctxts)) {
+		struct svc_rdma_op_ctxt *ctxt;
+
+		ctxt = list_first_entry(&xprt->sc_ctxts,
+					struct svc_rdma_op_ctxt, free);
+		list_del(&ctxt->free);
+		kfree(ctxt);
+	}
 }
 
 /*
@@ -521,11 +592,13 @@ static struct svcxprt_rdma *rdma_create_xprt(struct svc_serv *serv,
 	INIT_LIST_HEAD(&cma_xprt->sc_rq_dto_q);
 	INIT_LIST_HEAD(&cma_xprt->sc_read_complete_q);
 	INIT_LIST_HEAD(&cma_xprt->sc_frmr_q);
+	INIT_LIST_HEAD(&cma_xprt->sc_ctxts);
 	init_waitqueue_head(&cma_xprt->sc_send_wait);
 
 	spin_lock_init(&cma_xprt->sc_lock);
 	spin_lock_init(&cma_xprt->sc_rq_dto_lock);
 	spin_lock_init(&cma_xprt->sc_frmr_q_lock);
+	spin_lock_init(&cma_xprt->sc_ctxt_lock);
 
 	if (listener)
 		set_bit(XPT_LISTENER, &cma_xprt->sc_xprt.xpt_flags);
@@ -913,6 +986,9 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 				   (size_t)svcrdma_max_requests);
 	newxprt->sc_sq_depth = RPCRDMA_SQ_DEPTH_MULT * newxprt->sc_max_requests;
 
+	if (!svc_rdma_prealloc_ctxts(newxprt))
+		goto errout;
+
 	/*
 	 * Limit ORD based on client limit, local device limit, and
 	 * configured svcrdma limit.
@@ -1174,15 +1250,15 @@ static void __svc_rdma_free(struct work_struct *work)
 	}
 
 	/* Warn if we leaked a resource or under-referenced */
-	if (atomic_read(&rdma->sc_ctxt_used) != 0)
+	if (rdma->sc_ctxt_used != 0)
 		pr_err("svcrdma: ctxt still in use? (%d)\n",
-		       atomic_read(&rdma->sc_ctxt_used));
+		       rdma->sc_ctxt_used);
 	if (atomic_read(&rdma->sc_dma_used) != 0)
 		pr_err("svcrdma: dma still in use? (%d)\n",
 		       atomic_read(&rdma->sc_dma_used));
 
-	/* De-allocate fastreg mr */
 	rdma_dealloc_frmr_q(rdma);
+	svc_rdma_destroy_ctxts(rdma);
 
 	/* Destroy the QP if present (not a listener) */
 	if (rdma->sc_qp && !IS_ERR(rdma->sc_qp))


* [PATCH v4 05/11] svcrdma: Improve allocation of struct svc_rdma_req_map
  2015-12-14 21:30 ` Chuck Lever
@ 2015-12-14 21:30     ` Chuck Lever
  0 siblings, 0 replies; 40+ messages in thread
From: Chuck Lever @ 2015-12-14 21:30 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

To ensure this allocation cannot fail and will not sleep,
pre-allocate the req_map structures per-connection.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc_rdma.h          |    8 ++-
 net/sunrpc/xprtrdma/svc_rdma_sendto.c    |    6 +-
 net/sunrpc/xprtrdma/svc_rdma_transport.c |   85 ++++++++++++++++++++++++++----
 3 files changed, 84 insertions(+), 15 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index be2804b..05bf4fe 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -113,6 +113,7 @@ struct svc_rdma_fastreg_mr {
 	struct list_head frmr_list;
 };
 struct svc_rdma_req_map {
+	struct list_head free;
 	unsigned long count;
 	union {
 		struct kvec sge[RPCSVC_MAXPAGES];
@@ -145,6 +146,8 @@ struct svcxprt_rdma {
 	spinlock_t	     sc_ctxt_lock;
 	struct list_head     sc_ctxts;
 	int		     sc_ctxt_used;
+	spinlock_t	     sc_map_lock;
+	struct list_head     sc_maps;
 
 	struct list_head     sc_rq_dto_q;
 	spinlock_t	     sc_rq_dto_lock;
@@ -223,8 +226,9 @@ extern int svc_rdma_create_listen(struct svc_serv *, int, struct sockaddr *);
 extern struct svc_rdma_op_ctxt *svc_rdma_get_context(struct svcxprt_rdma *);
 extern void svc_rdma_put_context(struct svc_rdma_op_ctxt *, int);
 extern void svc_rdma_unmap_dma(struct svc_rdma_op_ctxt *ctxt);
-extern struct svc_rdma_req_map *svc_rdma_get_req_map(void);
-extern void svc_rdma_put_req_map(struct svc_rdma_req_map *);
+extern struct svc_rdma_req_map *svc_rdma_get_req_map(struct svcxprt_rdma *);
+extern void svc_rdma_put_req_map(struct svcxprt_rdma *,
+				 struct svc_rdma_req_map *);
 extern struct svc_rdma_fastreg_mr *svc_rdma_get_frmr(struct svcxprt_rdma *);
 extern void svc_rdma_put_frmr(struct svcxprt_rdma *,
 			      struct svc_rdma_fastreg_mr *);
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index bad5eaa..b75566c 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -598,7 +598,7 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
 	/* Build an req vec for the XDR */
 	ctxt = svc_rdma_get_context(rdma);
 	ctxt->direction = DMA_TO_DEVICE;
-	vec = svc_rdma_get_req_map();
+	vec = svc_rdma_get_req_map(rdma);
 	ret = map_xdr(rdma, &rqstp->rq_res, vec);
 	if (ret)
 		goto err0;
@@ -637,14 +637,14 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
 
 	ret = send_reply(rdma, rqstp, res_page, rdma_resp, ctxt, vec,
 			 inline_bytes);
-	svc_rdma_put_req_map(vec);
+	svc_rdma_put_req_map(rdma, vec);
 	dprintk("svcrdma: send_reply returns %d\n", ret);
 	return ret;
 
  err1:
 	put_page(res_page);
  err0:
-	svc_rdma_put_req_map(vec);
+	svc_rdma_put_req_map(rdma, vec);
 	svc_rdma_put_context(ctxt, 0);
 	return ret;
 }
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 58ed9f2..ec10ae3 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -273,23 +273,83 @@ static void svc_rdma_destroy_ctxts(struct svcxprt_rdma *xprt)
 	}
 }
 
-/*
- * Temporary NFS req mappings are shared across all transport
- * instances. These are short lived and should be bounded by the number
- * of concurrent server threads * depth of the SQ.
- */
-struct svc_rdma_req_map *svc_rdma_get_req_map(void)
+static struct svc_rdma_req_map *alloc_req_map(gfp_t flags)
 {
 	struct svc_rdma_req_map *map;
-	map = kmem_cache_alloc(svc_rdma_map_cachep,
-			       GFP_KERNEL | __GFP_NOFAIL);
+
+	map = kmalloc(sizeof(*map), flags);
+	if (map)
+		INIT_LIST_HEAD(&map->free);
+	return map;
+}
+
+static bool svc_rdma_prealloc_maps(struct svcxprt_rdma *xprt)
+{
+	int i;
+
+	/* One for each receive buffer on this connection. */
+	i = xprt->sc_max_requests;
+
+	while (i--) {
+		struct svc_rdma_req_map *map;
+
+		map = alloc_req_map(GFP_KERNEL);
+		if (!map) {
+			dprintk("svcrdma: No memory for request map\n");
+			return false;
+		}
+		list_add(&map->free, &xprt->sc_maps);
+	}
+	return true;
+}
+
+struct svc_rdma_req_map *svc_rdma_get_req_map(struct svcxprt_rdma *xprt)
+{
+	struct svc_rdma_req_map *map = NULL;
+
+	spin_lock(&xprt->sc_map_lock);
+	if (list_empty(&xprt->sc_maps))
+		goto out_empty;
+
+	map = list_first_entry(&xprt->sc_maps,
+			       struct svc_rdma_req_map, free);
+	list_del_init(&map->free);
+	spin_unlock(&xprt->sc_map_lock);
+
+out:
 	map->count = 0;
 	return map;
+
+out_empty:
+	spin_unlock(&xprt->sc_map_lock);
+
+	/* Pre-allocation amount was incorrect */
+	map = alloc_req_map(GFP_NOIO);
+	if (map)
+		goto out;
+
+	WARN_ON_ONCE("svcrdma: empty request map list?\n");
+	return NULL;
 }
 
-void svc_rdma_put_req_map(struct svc_rdma_req_map *map)
+void svc_rdma_put_req_map(struct svcxprt_rdma *xprt,
+			  struct svc_rdma_req_map *map)
 {
-	kmem_cache_free(svc_rdma_map_cachep, map);
+	spin_lock(&xprt->sc_map_lock);
+	list_add(&map->free, &xprt->sc_maps);
+	spin_unlock(&xprt->sc_map_lock);
+}
+
+static void svc_rdma_destroy_maps(struct svcxprt_rdma *xprt)
+{
+	while (!list_empty(&xprt->sc_maps)) {
+		struct svc_rdma_req_map *map;
+
+		map = list_first_entry(&xprt->sc_maps,
+				       struct svc_rdma_req_map, free);
+		list_del(&map->free);
+		kfree(map);
+	}
 }
 
 /* ib_cq event handler */
@@ -593,12 +653,14 @@ static struct svcxprt_rdma *rdma_create_xprt(struct svc_serv *serv,
 	INIT_LIST_HEAD(&cma_xprt->sc_read_complete_q);
 	INIT_LIST_HEAD(&cma_xprt->sc_frmr_q);
 	INIT_LIST_HEAD(&cma_xprt->sc_ctxts);
+	INIT_LIST_HEAD(&cma_xprt->sc_maps);
 	init_waitqueue_head(&cma_xprt->sc_send_wait);
 
 	spin_lock_init(&cma_xprt->sc_lock);
 	spin_lock_init(&cma_xprt->sc_rq_dto_lock);
 	spin_lock_init(&cma_xprt->sc_frmr_q_lock);
 	spin_lock_init(&cma_xprt->sc_ctxt_lock);
+	spin_lock_init(&cma_xprt->sc_map_lock);
 
 	if (listener)
 		set_bit(XPT_LISTENER, &cma_xprt->sc_xprt.xpt_flags);
@@ -988,6 +1050,8 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 
 	if (!svc_rdma_prealloc_ctxts(newxprt))
 		goto errout;
+	if (!svc_rdma_prealloc_maps(newxprt))
+		goto errout;
 
 	/*
 	 * Limit ORD based on client limit, local device limit, and
@@ -1259,6 +1323,7 @@ static void __svc_rdma_free(struct work_struct *work)
 
 	rdma_dealloc_frmr_q(rdma);
 	svc_rdma_destroy_ctxts(rdma);
+	svc_rdma_destroy_maps(rdma);
 
 	/* Destroy the QP if present (not a listener) */
 	if (rdma->sc_qp && !IS_ERR(rdma->sc_qp))


* [PATCH v4 06/11] svcrdma: Remove unused req_map and ctxt kmem_caches
  2015-12-14 21:30 ` Chuck Lever
@ 2015-12-14 21:30     ` Chuck Lever
  0 siblings, 0 replies; 40+ messages in thread
From: Chuck Lever @ 2015-12-14 21:30 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

Clean up.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc_rdma.h |    1 +
 net/sunrpc/xprtrdma/svc_rdma.c  |   35 -----------------------------------
 net/sunrpc/xprtrdma/xprt_rdma.h |    7 -------
 3 files changed, 1 insertion(+), 42 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 05bf4fe..141edbb 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -242,6 +242,7 @@ extern struct svc_xprt_class svc_rdma_bc_class;
 #endif
 
 /* svc_rdma.c */
+extern struct workqueue_struct *svc_rdma_wq;
 extern int svc_rdma_init(void);
 extern void svc_rdma_cleanup(void);
 
diff --git a/net/sunrpc/xprtrdma/svc_rdma.c b/net/sunrpc/xprtrdma/svc_rdma.c
index 1b7051b..e894e06 100644
--- a/net/sunrpc/xprtrdma/svc_rdma.c
+++ b/net/sunrpc/xprtrdma/svc_rdma.c
@@ -71,10 +71,6 @@ atomic_t rdma_stat_rq_prod;
 atomic_t rdma_stat_sq_poll;
 atomic_t rdma_stat_sq_prod;
 
-/* Temporary NFS request map and context caches */
-struct kmem_cache *svc_rdma_map_cachep;
-struct kmem_cache *svc_rdma_ctxt_cachep;
-
 struct workqueue_struct *svc_rdma_wq;
 
 /*
@@ -243,8 +239,6 @@ void svc_rdma_cleanup(void)
 	svc_unreg_xprt_class(&svc_rdma_bc_class);
 #endif
 	svc_unreg_xprt_class(&svc_rdma_class);
-	kmem_cache_destroy(svc_rdma_map_cachep);
-	kmem_cache_destroy(svc_rdma_ctxt_cachep);
 }
 
 int svc_rdma_init(void)
@@ -264,39 +258,10 @@ int svc_rdma_init(void)
 		svcrdma_table_header =
 			register_sysctl_table(svcrdma_root_table);
 
-	/* Create the temporary map cache */
-	svc_rdma_map_cachep = kmem_cache_create("svc_rdma_map_cache",
-						sizeof(struct svc_rdma_req_map),
-						0,
-						SLAB_HWCACHE_ALIGN,
-						NULL);
-	if (!svc_rdma_map_cachep) {
-		printk(KERN_INFO "Could not allocate map cache.\n");
-		goto err0;
-	}
-
-	/* Create the temporary context cache */
-	svc_rdma_ctxt_cachep =
-		kmem_cache_create("svc_rdma_ctxt_cache",
-				  sizeof(struct svc_rdma_op_ctxt),
-				  0,
-				  SLAB_HWCACHE_ALIGN,
-				  NULL);
-	if (!svc_rdma_ctxt_cachep) {
-		printk(KERN_INFO "Could not allocate WR ctxt cache.\n");
-		goto err1;
-	}
-
 	/* Register RDMA with the SVC transport switch */
 	svc_reg_xprt_class(&svc_rdma_class);
 #if defined(CONFIG_SUNRPC_BACKCHANNEL)
 	svc_reg_xprt_class(&svc_rdma_bc_class);
 #endif
 	return 0;
- err1:
-	kmem_cache_destroy(svc_rdma_map_cachep);
- err0:
-	unregister_sysctl_table(svcrdma_table_header);
-	destroy_workqueue(svc_rdma_wq);
-	return -ENOMEM;
 }
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 4197191..72276c7 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -528,11 +528,4 @@ void xprt_rdma_bc_free_rqst(struct rpc_rqst *);
 void xprt_rdma_bc_destroy(struct rpc_xprt *, unsigned int);
 #endif	/* CONFIG_SUNRPC_BACKCHANNEL */
 
-/* Temporary NFS request map cache. Created in svc_rdma.c  */
-extern struct kmem_cache *svc_rdma_map_cachep;
-/* WR context cache. Created in svc_rdma.c  */
-extern struct kmem_cache *svc_rdma_ctxt_cachep;
-/* Workqueue created in svc_rdma.c */
-extern struct workqueue_struct *svc_rdma_wq;
-
 #endif				/* _LINUX_SUNRPC_XPRT_RDMA_H */


* [PATCH v4 07/11] svcrdma: Add gfp flags to svc_rdma_post_recv()
  2015-12-14 21:30 ` Chuck Lever
@ 2015-12-14 21:30     ` Chuck Lever
  0 siblings, 0 replies; 40+ messages in thread
From: Chuck Lever @ 2015-12-14 21:30 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

svc_rdma_post_recv() allocates pages for receive buffers on-demand.
It uses GFP_KERNEL, so the allocator tries hard and may sleep. But
I'm about to add a call to svc_rdma_post_recv() from a function
that may not sleep.

Since all svc_rdma_post_recv() call sites can tolerate its failure,
allow it to fail if the page allocator returns nothing. Longer term,
receive buffers, being a finite resource per-connection, should be
pre-allocated and re-used.
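
A hedged sketch of the intended usage (the wrapper and its
non-sleeping branch are illustrative; the real non-sleeping caller
arrives with the backchannel patches later in this series):

/* Illustrative only: pick a GFP mask that matches the caller's context. */
static int example_replenish(struct svcxprt_rdma *rdma, bool may_sleep)
{
	gfp_t flags = may_sleep ? GFP_KERNEL : GFP_NOWAIT;

	/* Every call site tolerates failure, per the description above. */
	return svc_rdma_post_recv(rdma, flags);
}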

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc_rdma.h          |    2 +-
 net/sunrpc/xprtrdma/svc_rdma_sendto.c    |    2 +-
 net/sunrpc/xprtrdma/svc_rdma_transport.c |    8 +++++---
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 141edbb..729ff35 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -221,7 +221,7 @@ extern struct rpcrdma_read_chunk *
 extern int svc_rdma_send(struct svcxprt_rdma *, struct ib_send_wr *);
 extern void svc_rdma_send_error(struct svcxprt_rdma *, struct rpcrdma_msg *,
 				enum rpcrdma_errcode);
-extern int svc_rdma_post_recv(struct svcxprt_rdma *);
+extern int svc_rdma_post_recv(struct svcxprt_rdma *, gfp_t);
 extern int svc_rdma_create_listen(struct svc_serv *, int, struct sockaddr *);
 extern struct svc_rdma_op_ctxt *svc_rdma_get_context(struct svcxprt_rdma *);
 extern void svc_rdma_put_context(struct svc_rdma_op_ctxt *, int);
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index b75566c..2d3d7a4 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -472,7 +472,7 @@ static int send_reply(struct svcxprt_rdma *rdma,
 	int ret;
 
 	/* Post a recv buffer to handle another request. */
-	ret = svc_rdma_post_recv(rdma);
+	ret = svc_rdma_post_recv(rdma, GFP_KERNEL);
 	if (ret) {
 		printk(KERN_INFO
 		       "svcrdma: could not post a receive buffer, err=%d."
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index ec10ae3..14b692d 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -668,7 +668,7 @@ static struct svcxprt_rdma *rdma_create_xprt(struct svc_serv *serv,
 	return cma_xprt;
 }
 
-int svc_rdma_post_recv(struct svcxprt_rdma *xprt)
+int svc_rdma_post_recv(struct svcxprt_rdma *xprt, gfp_t flags)
 {
 	struct ib_recv_wr recv_wr, *bad_recv_wr;
 	struct svc_rdma_op_ctxt *ctxt;
@@ -686,7 +686,9 @@ int svc_rdma_post_recv(struct svcxprt_rdma *xprt)
 			pr_err("svcrdma: Too many sges (%d)\n", sge_no);
 			goto err_put_ctxt;
 		}
-		page = alloc_page(GFP_KERNEL | __GFP_NOFAIL);
+		page = alloc_page(flags);
+		if (!page)
+			goto err_put_ctxt;
 		ctxt->pages[sge_no] = page;
 		pa = ib_dma_map_page(xprt->sc_cm_id->device,
 				     page, 0, PAGE_SIZE,
@@ -1182,7 +1184,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 
 	/* Post receive buffers */
 	for (i = 0; i < newxprt->sc_max_requests; i++) {
-		ret = svc_rdma_post_recv(newxprt);
+		ret = svc_rdma_post_recv(newxprt, GFP_KERNEL);
 		if (ret) {
 			dprintk("svcrdma: failure posting receive buffers\n");
 			goto errout;


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v4 08/11] svcrdma: Remove last two __GFP_NOFAIL call sites
  2015-12-14 21:30 ` Chuck Lever
@ 2015-12-14 21:31     ` Chuck Lever
  0 siblings, 0 replies; 40+ messages in thread
From: Chuck Lever @ 2015-12-14 21:31 UTC (permalink / raw)
  To: bfields
  Cc: linux-rdma, linux-nfs

Clean up.

These functions can otherwise fail, so check for page allocation
failures too.
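
The conversion follows the same pattern at both sites (sketch for
illustration only):

	/* Before: __GFP_NOFAIL makes the allocator retry indefinitely,
	 * which can stall this thread under memory pressure.
	 */
	p = alloc_page(GFP_KERNEL | __GFP_NOFAIL);

	/* After: fail fast and take the function's existing error path. */
	p = alloc_page(GFP_KERNEL);
	if (!p)
		return;		/* or: ret = -ENOMEM; goto err0; */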

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/svc_rdma_sendto.c    |    5 ++++-
 net/sunrpc/xprtrdma/svc_rdma_transport.c |    4 +++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index 2d3d7a4..9221086 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -605,7 +605,10 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
 	inline_bytes = rqstp->rq_res.len;
 
 	/* Create the RDMA response header */
-	res_page = alloc_page(GFP_KERNEL | __GFP_NOFAIL);
+	ret = -ENOMEM;
+	res_page = alloc_page(GFP_KERNEL);
+	if (!res_page)
+		goto err0;
 	rdma_resp = page_address(res_page);
 	reply_ary = svc_rdma_get_reply_array(rdma_argp);
 	if (reply_ary)
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 14b692d..694ade4 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -1445,7 +1445,9 @@ void svc_rdma_send_error(struct svcxprt_rdma *xprt, struct rpcrdma_msg *rmsgp,
 	int length;
 	int ret;
 
-	p = alloc_page(GFP_KERNEL | __GFP_NOFAIL);
+	p = alloc_page(GFP_KERNEL);
+	if (!p)
+		return;
 	va = page_address(p);
 
 	/* XDR encode error */


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v4 09/11] svcrdma: Make map_xdr non-static
  2015-12-14 21:30 ` Chuck Lever
@ 2015-12-14 21:31     ` Chuck Lever
  0 siblings, 0 replies; 40+ messages in thread
From: Chuck Lever @ 2015-12-14 21:31 UTC (permalink / raw)
  To: bfields
  Cc: linux-rdma, linux-nfs

Prerequisite for using map_xdr in the backchannel code: rename it
svc_rdma_map_xdr and make it non-static.
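
For reference, the backchannel send path added later in this series
invokes the renamed helper like this (excerpted from patch 11, where
sndbuf is &rqst->rq_snd_buf):

	vec = svc_rdma_get_req_map(rdma);
	ret = svc_rdma_map_xdr(rdma, sndbuf, vec);
	if (ret)
		goto out_err;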

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc_rdma.h       |    2 ++
 net/sunrpc/xprtrdma/svc_rdma_sendto.c |   14 +++++++-------
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 729ff35..aeffa30 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -213,6 +213,8 @@ extern int rdma_read_chunk_frmr(struct svcxprt_rdma *, struct svc_rqst *,
 				u32, u32, u64, bool);
 
 /* svc_rdma_sendto.c */
+extern int svc_rdma_map_xdr(struct svcxprt_rdma *, struct xdr_buf *,
+			    struct svc_rdma_req_map *);
 extern int svc_rdma_sendto(struct svc_rqst *);
 extern struct rpcrdma_read_chunk *
 	svc_rdma_get_read_chunk(struct rpcrdma_msg *);
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index 9221086..ced3151 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -50,9 +50,9 @@
 
 #define RPCDBG_FACILITY	RPCDBG_SVCXPRT
 
-static int map_xdr(struct svcxprt_rdma *xprt,
-		   struct xdr_buf *xdr,
-		   struct svc_rdma_req_map *vec)
+int svc_rdma_map_xdr(struct svcxprt_rdma *xprt,
+		     struct xdr_buf *xdr,
+		     struct svc_rdma_req_map *vec)
 {
 	int sge_no;
 	u32 sge_bytes;
@@ -62,7 +62,7 @@ static int map_xdr(struct svcxprt_rdma *xprt,
 
 	if (xdr->len !=
 	    (xdr->head[0].iov_len + xdr->page_len + xdr->tail[0].iov_len)) {
-		pr_err("svcrdma: map_xdr: XDR buffer length error\n");
+		pr_err("svcrdma: %s: XDR buffer length error\n", __func__);
 		return -EIO;
 	}
 
@@ -97,9 +97,9 @@ static int map_xdr(struct svcxprt_rdma *xprt,
 		sge_no++;
 	}
 
-	dprintk("svcrdma: map_xdr: sge_no %d page_no %d "
+	dprintk("svcrdma: %s: sge_no %d page_no %d "
 		"page_base %u page_len %u head_len %zu tail_len %zu\n",
-		sge_no, page_no, xdr->page_base, xdr->page_len,
+		__func__, sge_no, page_no, xdr->page_base, xdr->page_len,
 		xdr->head[0].iov_len, xdr->tail[0].iov_len);
 
 	vec->count = sge_no;
@@ -599,7 +599,7 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
 	ctxt = svc_rdma_get_context(rdma);
 	ctxt->direction = DMA_TO_DEVICE;
 	vec = svc_rdma_get_req_map(rdma);
-	ret = map_xdr(rdma, &rqstp->rq_res, vec);
+	ret = svc_rdma_map_xdr(rdma, &rqstp->rq_res, vec);
 	if (ret)
 		goto err0;
 	inline_bytes = rqstp->rq_res.len;


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v4 10/11] svcrdma: Define maximum number of backchannel requests
  2015-12-14 21:30 ` Chuck Lever
@ 2015-12-14 21:31     ` Chuck Lever
  0 siblings, 0 replies; 40+ messages in thread
From: Chuck Lever @ 2015-12-14 21:31 UTC (permalink / raw)
  To: bfields
  Cc: linux-rdma, linux-nfs

Extra resources for handling backchannel requests have to be
pre-allocated when a transport instance is created. Set up
additional fields in svcxprt_rdma to track these resources.

The max_requests fields are elements of the RPC-over-RDMA
protocol, so they should be u32. To ensure that unsigned
arithmetic is used everywhere, some other fields in the
svcxprt_rdma struct are updated.
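
With the default module parameters, and an adapter whose max_qp_wr is
not the limiting factor, the accept path below works out to (a worked
example using the constants defined in this patch):

	sc_max_requests    = 32;	/* RPCRDMA_MAX_REQUESTS */
	sc_max_bc_requests =  2;	/* RPCRDMA_MAX_BC_REQUESTS */
	sc_rq_depth        = 32 + 2;	/* receive WRs to post */
	sc_sq_depth        = RPCRDMA_SQ_DEPTH_MULT * 34;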

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc_rdma.h          |   13 ++++++++++---
 net/sunrpc/xprtrdma/svc_rdma.c           |    6 ++++--
 net/sunrpc/xprtrdma/svc_rdma_transport.c |   24 ++++++++++++++----------
 3 files changed, 28 insertions(+), 15 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index aeffa30..9a2c418 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -51,6 +51,7 @@
 /* RPC/RDMA parameters and stats */
 extern unsigned int svcrdma_ord;
 extern unsigned int svcrdma_max_requests;
+extern unsigned int svcrdma_max_bc_requests;
 extern unsigned int svcrdma_max_req_size;
 
 extern atomic_t rdma_stat_recv;
@@ -134,10 +135,11 @@ struct svcxprt_rdma {
 	int                  sc_max_sge;
 	int                  sc_max_sge_rd;	/* max sge for read target */
 
-	int                  sc_sq_depth;	/* Depth of SQ */
 	atomic_t             sc_sq_count;	/* Number of SQ WR on queue */
-
-	int                  sc_max_requests;	/* Depth of RQ */
+	unsigned int	     sc_sq_depth;	/* Depth of SQ */
+	unsigned int	     sc_rq_depth;	/* Depth of RQ */
+	u32		     sc_max_requests;	/* Forward credits */
+	u32		     sc_max_bc_requests;/* Backward credits */
 	int                  sc_max_req_size;	/* Size of each RQ WR buf */
 
 	struct ib_pd         *sc_pd;
@@ -186,6 +188,11 @@ struct svcxprt_rdma {
 #define RPCRDMA_MAX_REQUESTS    32
 #define RPCRDMA_MAX_REQ_SIZE    4096
 
+/* Typical ULP usage of BC requests is NFSv4.1 backchannel. Our
+ * current NFSv4.1 implementation supports one backchannel slot.
+ */
+#define RPCRDMA_MAX_BC_REQUESTS	2
+
 #define RPCSVC_MAXPAYLOAD_RDMA	RPCSVC_MAXPAYLOAD
 
 /* svc_rdma_marshal.c */
diff --git a/net/sunrpc/xprtrdma/svc_rdma.c b/net/sunrpc/xprtrdma/svc_rdma.c
index e894e06..c846ca9 100644
--- a/net/sunrpc/xprtrdma/svc_rdma.c
+++ b/net/sunrpc/xprtrdma/svc_rdma.c
@@ -55,6 +55,7 @@ unsigned int svcrdma_ord = RPCRDMA_ORD;
 static unsigned int min_ord = 1;
 static unsigned int max_ord = 4096;
 unsigned int svcrdma_max_requests = RPCRDMA_MAX_REQUESTS;
+unsigned int svcrdma_max_bc_requests = RPCRDMA_MAX_BC_REQUESTS;
 static unsigned int min_max_requests = 4;
 static unsigned int max_max_requests = 16384;
 unsigned int svcrdma_max_req_size = RPCRDMA_MAX_REQ_SIZE;
@@ -245,9 +246,10 @@ int svc_rdma_init(void)
 {
 	dprintk("SVCRDMA Module Init, register RPC RDMA transport\n");
 	dprintk("\tsvcrdma_ord      : %d\n", svcrdma_ord);
-	dprintk("\tmax_requests     : %d\n", svcrdma_max_requests);
-	dprintk("\tsq_depth         : %d\n",
+	dprintk("\tmax_requests     : %u\n", svcrdma_max_requests);
+	dprintk("\tsq_depth         : %u\n",
 		svcrdma_max_requests * RPCRDMA_SQ_DEPTH_MULT);
+	dprintk("\tmax_bc_requests  : %u\n", svcrdma_max_bc_requests);
 	dprintk("\tmax_inline       : %d\n", svcrdma_max_req_size);
 
 	svc_rdma_wq = alloc_workqueue("svc_rdma", 0, 0);
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 694ade4..35326a3 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -169,12 +169,12 @@ static struct svc_rdma_op_ctxt *alloc_ctxt(struct svcxprt_rdma *xprt,
 
 static bool svc_rdma_prealloc_ctxts(struct svcxprt_rdma *xprt)
 {
-	int i;
+	unsigned int i;
 
 	/* Each RPC/RDMA credit can consume a number of send
 	 * and receive WQEs. One ctxt is allocated for each.
 	 */
-	i = xprt->sc_sq_depth + xprt->sc_max_requests;
+	i = xprt->sc_sq_depth + xprt->sc_rq_depth;
 
 	while (i--) {
 		struct svc_rdma_op_ctxt *ctxt;
@@ -285,7 +285,7 @@ static struct svc_rdma_req_map *alloc_req_map(gfp_t flags)
 
 static bool svc_rdma_prealloc_maps(struct svcxprt_rdma *xprt)
 {
-	int i;
+	unsigned int i;
 
 	/* One for each receive buffer on this connection. */
 	i = xprt->sc_max_requests;
@@ -1016,8 +1016,8 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	struct ib_device *dev;
 	int uninitialized_var(dma_mr_acc);
 	int need_dma_mr = 0;
+	unsigned int i;
 	int ret = 0;
-	int i;
 
 	listen_rdma = container_of(xprt, struct svcxprt_rdma, sc_xprt);
 	clear_bit(XPT_CONN, &xprt->xpt_flags);
@@ -1046,9 +1046,13 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	newxprt->sc_max_sge_rd = min_t(size_t, dev->max_sge_rd,
 				       RPCSVC_MAXPAGES);
 	newxprt->sc_max_req_size = svcrdma_max_req_size;
-	newxprt->sc_max_requests = min((size_t)dev->max_qp_wr,
-				   (size_t)svcrdma_max_requests);
-	newxprt->sc_sq_depth = RPCRDMA_SQ_DEPTH_MULT * newxprt->sc_max_requests;
+	newxprt->sc_max_requests = min_t(u32, dev->max_qp_wr,
+					 svcrdma_max_requests);
+	newxprt->sc_max_bc_requests = min_t(u32, dev->max_qp_wr,
+					    svcrdma_max_bc_requests);
+	newxprt->sc_rq_depth = newxprt->sc_max_requests +
+			       newxprt->sc_max_bc_requests;
+	newxprt->sc_sq_depth = RPCRDMA_SQ_DEPTH_MULT * newxprt->sc_rq_depth;
 
 	if (!svc_rdma_prealloc_ctxts(newxprt))
 		goto errout;
@@ -1077,7 +1081,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 		dprintk("svcrdma: error creating SQ CQ for connect request\n");
 		goto errout;
 	}
-	cq_attr.cqe = newxprt->sc_max_requests;
+	cq_attr.cqe = newxprt->sc_rq_depth;
 	newxprt->sc_rq_cq = ib_create_cq(dev,
 					 rq_comp_handler,
 					 cq_event_handler,
@@ -1092,7 +1096,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	qp_attr.event_handler = qp_event_handler;
 	qp_attr.qp_context = &newxprt->sc_xprt;
 	qp_attr.cap.max_send_wr = newxprt->sc_sq_depth;
-	qp_attr.cap.max_recv_wr = newxprt->sc_max_requests;
+	qp_attr.cap.max_recv_wr = newxprt->sc_rq_depth;
 	qp_attr.cap.max_send_sge = newxprt->sc_max_sge;
 	qp_attr.cap.max_recv_sge = newxprt->sc_max_sge;
 	qp_attr.sq_sig_type = IB_SIGNAL_REQ_WR;
@@ -1183,7 +1187,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 		newxprt->sc_dma_lkey = dev->local_dma_lkey;
 
 	/* Post receive buffers */
-	for (i = 0; i < newxprt->sc_max_requests; i++) {
+	for (i = 0; i < newxprt->sc_rq_depth; i++) {
 		ret = svc_rdma_post_recv(newxprt, GFP_KERNEL);
 		if (ret) {
 			dprintk("svcrdma: failure posting receive buffers\n");


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v4 11/11] svcrdma: Add class for RDMA backwards direction transport
  2015-12-14 21:30 ` Chuck Lever
@ 2015-12-14 21:31     ` Chuck Lever
  0 siblings, 0 replies; 40+ messages in thread
From: Chuck Lever @ 2015-12-14 21:31 UTC (permalink / raw)
  To: bfields
  Cc: linux-rdma, linux-nfs

To support the server-side of an NFSv4.1 backchannel on RDMA
connections, add a transport class that enables backward
direction messages on an existing forward channel connection.
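
The class is reached through the generic transport creation path. A
hedged sketch of how an NFSv4.1 callback client might select it (not
taken from this patch; "forward_xprt" is a hypothetical svc_xprt for
the established forward channel, and a real caller would also fill in
dstaddr and addrlen):

	struct xprt_create args = {
		.ident   = XPRT_TRANSPORT_BC_RDMA,
		.net     = forward_xprt->xpt_net,
		.bc_xprt = forward_xprt,
	};
	struct rpc_xprt *bc_xprt = xprt_create_transport(&args);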

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc_rdma.h            |    5 
 net/sunrpc/xprt.c                          |    1 
 net/sunrpc/xprtrdma/Makefile               |    2 
 net/sunrpc/xprtrdma/svc_rdma_backchannel.c |  371 ++++++++++++++++++++++++++++
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c    |   52 ++++
 net/sunrpc/xprtrdma/svc_rdma_transport.c   |   14 +
 net/sunrpc/xprtrdma/transport.c            |   30 +-
 net/sunrpc/xprtrdma/xprt_rdma.h            |   15 +
 8 files changed, 475 insertions(+), 15 deletions(-)
 create mode 100644 net/sunrpc/xprtrdma/svc_rdma_backchannel.c

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 9a2c418..b13513a 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -195,6 +195,11 @@ struct svcxprt_rdma {
 
 #define RPCSVC_MAXPAYLOAD_RDMA	RPCSVC_MAXPAYLOAD
 
+/* svc_rdma_backchannel.c */
+extern int svc_rdma_handle_bc_reply(struct rpc_xprt *xprt,
+				    struct rpcrdma_msg *rmsgp,
+				    struct xdr_buf *rcvbuf);
+
 /* svc_rdma_marshal.c */
 extern int svc_rdma_xdr_decode_req(struct rpcrdma_msg **, struct svc_rqst *);
 extern int svc_rdma_xdr_encode_error(struct svcxprt_rdma *,
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 2e98f4a..37edea6 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1425,3 +1425,4 @@ void xprt_put(struct rpc_xprt *xprt)
 	if (atomic_dec_and_test(&xprt->count))
 		xprt_destroy(xprt);
 }
+EXPORT_SYMBOL_GPL(xprt_put);
diff --git a/net/sunrpc/xprtrdma/Makefile b/net/sunrpc/xprtrdma/Makefile
index 33f99d3..dc9f3b5 100644
--- a/net/sunrpc/xprtrdma/Makefile
+++ b/net/sunrpc/xprtrdma/Makefile
@@ -2,7 +2,7 @@ obj-$(CONFIG_SUNRPC_XPRT_RDMA) += rpcrdma.o
 
 rpcrdma-y := transport.o rpc_rdma.o verbs.o \
 	fmr_ops.o frwr_ops.o physical_ops.o \
-	svc_rdma.o svc_rdma_transport.o \
+	svc_rdma.o svc_rdma_backchannel.o svc_rdma_transport.o \
 	svc_rdma_marshal.o svc_rdma_sendto.o svc_rdma_recvfrom.o \
 	module.o
 rpcrdma-$(CONFIG_SUNRPC_BACKCHANNEL) += backchannel.o
diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
new file mode 100644
index 0000000..417cec1
--- /dev/null
+++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
@@ -0,0 +1,371 @@
+/*
+ * Copyright (c) 2015 Oracle.  All rights reserved.
+ *
+ * Support for backward direction RPCs on RPC/RDMA (server-side).
+ */
+
+#include <linux/sunrpc/svc_rdma.h>
+#include "xprt_rdma.h"
+
+#define RPCDBG_FACILITY	RPCDBG_SVCXPRT
+
+#undef SVCRDMA_BACKCHANNEL_DEBUG
+
+int svc_rdma_handle_bc_reply(struct rpc_xprt *xprt, struct rpcrdma_msg *rmsgp,
+			     struct xdr_buf *rcvbuf)
+{
+	struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
+	struct kvec *dst, *src = &rcvbuf->head[0];
+	struct rpc_rqst *req;
+	unsigned long cwnd;
+	u32 credits;
+	size_t len;
+	__be32 xid;
+	__be32 *p;
+	int ret;
+
+	p = (__be32 *)src->iov_base;
+	len = src->iov_len;
+	xid = rmsgp->rm_xid;
+
+#ifdef SVCRDMA_BACKCHANNEL_DEBUG
+	pr_info("%s: xid=%08x, length=%zu\n",
+		__func__, be32_to_cpu(xid), len);
+	pr_info("%s: RPC/RDMA: %*ph\n",
+		__func__, (int)RPCRDMA_HDRLEN_MIN, rmsgp);
+	pr_info("%s:      RPC: %*ph\n",
+		__func__, (int)len, p);
+#endif
+
+	ret = -EAGAIN;
+	if (src->iov_len < 24)
+		goto out_shortreply;
+
+	spin_lock_bh(&xprt->transport_lock);
+	req = xprt_lookup_rqst(xprt, xid);
+	if (!req)
+		goto out_notfound;
+
+	dst = &req->rq_private_buf.head[0];
+	memcpy(&req->rq_private_buf, &req->rq_rcv_buf, sizeof(struct xdr_buf));
+	if (dst->iov_len < len)
+		goto out_unlock;
+	memcpy(dst->iov_base, p, len);
+
+	credits = be32_to_cpu(rmsgp->rm_credit);
+	if (credits == 0)
+		credits = 1;	/* don't deadlock */
+	else if (credits > r_xprt->rx_buf.rb_bc_max_requests)
+		credits = r_xprt->rx_buf.rb_bc_max_requests;
+
+	cwnd = xprt->cwnd;
+	xprt->cwnd = credits << RPC_CWNDSHIFT;
+	if (xprt->cwnd > cwnd)
+		xprt_release_rqst_cong(req->rq_task);
+
+	ret = 0;
+	xprt_complete_rqst(req->rq_task, rcvbuf->len);
+	rcvbuf->len = 0;
+
+out_unlock:
+	spin_unlock_bh(&xprt->transport_lock);
+out:
+	return ret;
+
+out_shortreply:
+	dprintk("svcrdma: short bc reply: xprt=%p, len=%zu\n",
+		xprt, src->iov_len);
+	goto out;
+
+out_notfound:
+	dprintk("svcrdma: unrecognized bc reply: xprt=%p, xid=%08x\n",
+		xprt, be32_to_cpu(xid));
+
+	goto out_unlock;
+}
+
+/* Send a backwards direction RPC call.
+ *
+ * Caller holds the connection's mutex and has already marshaled
+ * the RPC/RDMA request.
+ *
+ * This is similar to svc_rdma_reply, but takes an rpc_rqst
+ * instead, does not support chunks, and avoids blocking memory
+ * allocation.
+ *
+ * XXX: There is still an opportunity to block in svc_rdma_send()
+ * if there are no SQ entries to post the Send. This may occur if
+ * the adapter has a small maximum SQ depth.
+ */
+static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma,
+			      struct rpc_rqst *rqst)
+{
+	struct xdr_buf *sndbuf = &rqst->rq_snd_buf;
+	struct svc_rdma_op_ctxt *ctxt;
+	struct svc_rdma_req_map *vec;
+	struct ib_send_wr send_wr;
+	int ret;
+
+	vec = svc_rdma_get_req_map(rdma);
+	ret = svc_rdma_map_xdr(rdma, sndbuf, vec);
+	if (ret)
+		goto out_err;
+
+	/* Post a recv buffer to handle the reply for this request. */
+	ret = svc_rdma_post_recv(rdma, GFP_NOIO);
+	if (ret) {
+		pr_err("svcrdma: Failed to post bc receive buffer, err=%d.\n",
+		       ret);
+		pr_err("svcrdma: closing transport %p.\n", rdma);
+		set_bit(XPT_CLOSE, &rdma->sc_xprt.xpt_flags);
+		ret = -ENOTCONN;
+		goto out_err;
+	}
+
+	ctxt = svc_rdma_get_context(rdma);
+	ctxt->pages[0] = virt_to_page(rqst->rq_buffer);
+	ctxt->count = 1;
+
+	ctxt->wr_op = IB_WR_SEND;
+	ctxt->direction = DMA_TO_DEVICE;
+	ctxt->sge[0].lkey = rdma->sc_dma_lkey;
+	ctxt->sge[0].length = sndbuf->len;
+	ctxt->sge[0].addr =
+	    ib_dma_map_page(rdma->sc_cm_id->device, ctxt->pages[0], 0,
+			    sndbuf->len, DMA_TO_DEVICE);
+	if (ib_dma_mapping_error(rdma->sc_cm_id->device, ctxt->sge[0].addr)) {
+		ret = -EIO;
+		goto out_unmap;
+	}
+	atomic_inc(&rdma->sc_dma_used);
+
+	memset(&send_wr, 0, sizeof(send_wr));
+	send_wr.wr_id = (unsigned long)ctxt;
+	send_wr.sg_list = ctxt->sge;
+	send_wr.num_sge = 1;
+	send_wr.opcode = IB_WR_SEND;
+	send_wr.send_flags = IB_SEND_SIGNALED;
+
+	ret = svc_rdma_send(rdma, &send_wr);
+	if (ret) {
+		ret = -EIO;
+		goto out_unmap;
+	}
+
+out_err:
+	svc_rdma_put_req_map(rdma, vec);
+	dprintk("svcrdma: %s returns %d\n", __func__, ret);
+	return ret;
+
+out_unmap:
+	svc_rdma_unmap_dma(ctxt);
+	svc_rdma_put_context(ctxt, 1);
+	goto out_err;
+}
+
+/* Server-side transport endpoint wants a whole page for its send
+ * buffer. The client RPC code constructs the RPC header in this
+ * buffer before it invokes ->send_request.
+ *
+ * Returns NULL if there was a temporary allocation failure.
+ */
+static void *
+xprt_rdma_bc_allocate(struct rpc_task *task, size_t size)
+{
+	struct rpc_rqst *rqst = task->tk_rqstp;
+	struct svc_xprt *sxprt = rqst->rq_xprt->bc_xprt;
+	struct svcxprt_rdma *rdma;
+	struct page *page;
+
+	rdma = container_of(sxprt, struct svcxprt_rdma, sc_xprt);
+
+	/* Prevent an infinite loop: try to make this case work */
+	if (size > PAGE_SIZE)
+		WARN_ONCE(1, "svcrdma: large bc buffer request (size %zu)\n",
+			  size);
+
+	page = alloc_page(RPCRDMA_DEF_GFP);
+	if (!page)
+		return NULL;
+
+	return page_address(page);
+}
+
+static void
+xprt_rdma_bc_free(void *buffer)
+{
+	/* No-op: ctxt and page have already been freed. */
+}
+
+static int
+rpcrdma_bc_send_request(struct svcxprt_rdma *rdma, struct rpc_rqst *rqst)
+{
+	struct rpc_xprt *xprt = rqst->rq_xprt;
+	struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
+	struct rpcrdma_msg *headerp = (struct rpcrdma_msg *)rqst->rq_buffer;
+	int rc;
+
+	/* Space in the send buffer for an RPC/RDMA header is reserved
+	 * via xprt->tsh_size.
+	 */
+	headerp->rm_xid = rqst->rq_xid;
+	headerp->rm_vers = rpcrdma_version;
+	headerp->rm_credit = cpu_to_be32(r_xprt->rx_buf.rb_bc_max_requests);
+	headerp->rm_type = rdma_msg;
+	headerp->rm_body.rm_chunks[0] = xdr_zero;
+	headerp->rm_body.rm_chunks[1] = xdr_zero;
+	headerp->rm_body.rm_chunks[2] = xdr_zero;
+
+#ifdef SVCRDMA_BACKCHANNEL_DEBUG
+	pr_info("%s: %*ph\n", __func__, 64, rqst->rq_buffer);
+#endif
+
+	rc = svc_rdma_bc_sendto(rdma, rqst);
+	if (rc)
+		goto drop_connection;
+	return rc;
+
+drop_connection:
+	dprintk("svcrdma: failed to send bc call\n");
+	xprt_disconnect_done(xprt);
+	return -ENOTCONN;
+}
+
+/* Send an RPC call on the passive end of a transport
+ * connection.
+ */
+static int
+xprt_rdma_bc_send_request(struct rpc_task *task)
+{
+	struct rpc_rqst *rqst = task->tk_rqstp;
+	struct svc_xprt *sxprt = rqst->rq_xprt->bc_xprt;
+	struct svcxprt_rdma *rdma;
+	int len;
+
+	dprintk("svcrdma: sending bc call with xid: %08x\n",
+		be32_to_cpu(rqst->rq_xid));
+
+	if (!mutex_trylock(&sxprt->xpt_mutex)) {
+		rpc_sleep_on(&sxprt->xpt_bc_pending, task, NULL);
+		if (!mutex_trylock(&sxprt->xpt_mutex))
+			return -EAGAIN;
+		rpc_wake_up_queued_task(&sxprt->xpt_bc_pending, task);
+	}
+
+	len = -ENOTCONN;
+	rdma = container_of(sxprt, struct svcxprt_rdma, sc_xprt);
+	if (!test_bit(XPT_DEAD, &sxprt->xpt_flags))
+		len = rpcrdma_bc_send_request(rdma, rqst);
+
+	mutex_unlock(&sxprt->xpt_mutex);
+
+	if (len < 0)
+		return len;
+	return 0;
+}
+
+static void
+xprt_rdma_bc_close(struct rpc_xprt *xprt)
+{
+	dprintk("svcrdma: %s: xprt %p\n", __func__, xprt);
+}
+
+static void
+xprt_rdma_bc_put(struct rpc_xprt *xprt)
+{
+	dprintk("svcrdma: %s: xprt %p\n", __func__, xprt);
+
+	xprt_free(xprt);
+	module_put(THIS_MODULE);
+}
+
+static struct rpc_xprt_ops xprt_rdma_bc_procs = {
+	.reserve_xprt		= xprt_reserve_xprt_cong,
+	.release_xprt		= xprt_release_xprt_cong,
+	.alloc_slot		= xprt_alloc_slot,
+	.release_request	= xprt_release_rqst_cong,
+	.buf_alloc		= xprt_rdma_bc_allocate,
+	.buf_free		= xprt_rdma_bc_free,
+	.send_request		= xprt_rdma_bc_send_request,
+	.set_retrans_timeout	= xprt_set_retrans_timeout_def,
+	.close			= xprt_rdma_bc_close,
+	.destroy		= xprt_rdma_bc_put,
+	.print_stats		= xprt_rdma_print_stats
+};
+
+static const struct rpc_timeout xprt_rdma_bc_timeout = {
+	.to_initval = 60 * HZ,
+	.to_maxval = 60 * HZ,
+};
+
+/* It shouldn't matter if the number of backchannel session slots
+ * doesn't match the number of RPC/RDMA credits. That just means
+ * one or the other will have extra slots that aren't used.
+ */
+static struct rpc_xprt *
+xprt_setup_rdma_bc(struct xprt_create *args)
+{
+	struct rpc_xprt *xprt;
+	struct rpcrdma_xprt *new_xprt;
+
+	if (args->addrlen > sizeof(xprt->addr)) {
+		dprintk("RPC:       %s: address too large\n", __func__);
+		return ERR_PTR(-EBADF);
+	}
+
+	xprt = xprt_alloc(args->net, sizeof(*new_xprt),
+			  RPCRDMA_MAX_BC_REQUESTS,
+			  RPCRDMA_MAX_BC_REQUESTS);
+	if (!xprt) {
+		dprintk("RPC:       %s: couldn't allocate rpc_xprt\n",
+			__func__);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	xprt->timeout = &xprt_rdma_bc_timeout;
+	xprt_set_bound(xprt);
+	xprt_set_connected(xprt);
+	xprt->bind_timeout = RPCRDMA_BIND_TO;
+	xprt->reestablish_timeout = RPCRDMA_INIT_REEST_TO;
+	xprt->idle_timeout = RPCRDMA_IDLE_DISC_TO;
+
+	xprt->prot = XPRT_TRANSPORT_BC_RDMA;
+	xprt->tsh_size = RPCRDMA_HDRLEN_MIN / sizeof(__be32);
+	xprt->ops = &xprt_rdma_bc_procs;
+
+	memcpy(&xprt->addr, args->dstaddr, args->addrlen);
+	xprt->addrlen = args->addrlen;
+	xprt_rdma_format_addresses(xprt, (struct sockaddr *)&xprt->addr);
+	xprt->resvport = 0;
+
+	xprt->max_payload = xprt_rdma_max_inline_read;
+
+	new_xprt = rpcx_to_rdmax(xprt);
+	new_xprt->rx_buf.rb_bc_max_requests = xprt->max_reqs;
+
+	xprt_get(xprt);
+	args->bc_xprt->xpt_bc_xprt = xprt;
+	xprt->bc_xprt = args->bc_xprt;
+
+	if (!try_module_get(THIS_MODULE))
+		goto out_fail;
+
+	/* Final put for backchannel xprt is in __svc_rdma_free */
+	xprt_get(xprt);
+	return xprt;
+
+out_fail:
+	xprt_rdma_free_addresses(xprt);
+	args->bc_xprt->xpt_bc_xprt = NULL;
+	xprt_put(xprt);
+	xprt_free(xprt);
+	return ERR_PTR(-EINVAL);
+}
+
+struct xprt_class xprt_rdma_bc = {
+	.list			= LIST_HEAD_INIT(xprt_rdma_bc.list),
+	.name			= "rdma backchannel",
+	.owner			= THIS_MODULE,
+	.ident			= XPRT_TRANSPORT_BC_RDMA,
+	.setup			= xprt_setup_rdma_bc,
+};
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index ff4f01e..3dfe464 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -567,6 +567,38 @@ static int rdma_read_complete(struct svc_rqst *rqstp,
 	return ret;
 }
 
+/* By convention, backchannel calls arrive via rdma_msg type
+ * messages, and never populate the chunk lists. This makes
+ * the RPC/RDMA header small and fixed in size, so it is
+ * straightforward to check the RPC header's direction field.
+ */
+static bool
+svc_rdma_is_backchannel_reply(struct svc_xprt *xprt, struct rpcrdma_msg *rmsgp)
+{
+	__be32 *p = (__be32 *)rmsgp;
+
+	if (!xprt->xpt_bc_xprt)
+		return false;
+
+	if (rmsgp->rm_type != rdma_msg)
+		return false;
+	if (rmsgp->rm_body.rm_chunks[0] != xdr_zero)
+		return false;
+	if (rmsgp->rm_body.rm_chunks[1] != xdr_zero)
+		return false;
+	if (rmsgp->rm_body.rm_chunks[2] != xdr_zero)
+		return false;
+
+	/* sanity */
+	if (p[7] != rmsgp->rm_xid)
+		return false;
+	/* call direction */
+	if (p[8] == cpu_to_be32(RPC_CALL))
+		return false;
+
+	return true;
+}
+
 /*
  * Set up the rqstp thread context to point to the RQ buffer. If
  * necessary, pull additional data from the client with an RDMA_READ
@@ -632,6 +664,15 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
 		goto close_out;
 	}
 
+	if (svc_rdma_is_backchannel_reply(xprt, rmsgp)) {
+		ret = svc_rdma_handle_bc_reply(xprt->xpt_bc_xprt, rmsgp,
+					       &rqstp->rq_arg);
+		svc_rdma_put_context(ctxt, 0);
+		if (ret)
+			goto repost;
+		return ret;
+	}
+
 	/* Read read-list data. */
 	ret = rdma_read_chunks(rdma_xprt, rmsgp, rqstp, ctxt);
 	if (ret > 0) {
@@ -668,4 +709,15 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
 	set_bit(XPT_CLOSE, &xprt->xpt_flags);
 defer:
 	return 0;
+
+repost:
+	ret = svc_rdma_post_recv(rdma_xprt, GFP_KERNEL);
+	if (ret) {
+		pr_err("svcrdma: could not post a receive buffer, err=%d.\n",
+		       ret);
+		pr_err("svcrdma: closing transport %p.\n", rdma_xprt);
+		set_bit(XPT_CLOSE, &rdma_xprt->sc_xprt.xpt_flags);
+		ret = -ENOTCONN;
+	}
+	return ret;
 }
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 35326a3..abfbd02 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -1287,12 +1287,14 @@ static void __svc_rdma_free(struct work_struct *work)
 {
 	struct svcxprt_rdma *rdma =
 		container_of(work, struct svcxprt_rdma, sc_work);
-	dprintk("svcrdma: svc_rdma_free(%p)\n", rdma);
+	struct svc_xprt *xprt = &rdma->sc_xprt;
+
+	dprintk("svcrdma: %s(%p)\n", __func__, rdma);
 
 	/* We should only be called from kref_put */
-	if (atomic_read(&rdma->sc_xprt.xpt_ref.refcount) != 0)
+	if (atomic_read(&xprt->xpt_ref.refcount) != 0)
 		pr_err("svcrdma: sc_xprt still in use? (%d)\n",
-		       atomic_read(&rdma->sc_xprt.xpt_ref.refcount));
+		       atomic_read(&xprt->xpt_ref.refcount));
 
 	/*
 	 * Destroy queued, but not processed read completions. Note
@@ -1327,6 +1329,12 @@ static void __svc_rdma_free(struct work_struct *work)
 		pr_err("svcrdma: dma still in use? (%d)\n",
 		       atomic_read(&rdma->sc_dma_used));
 
+	/* Final put of backchannel client transport */
+	if (xprt->xpt_bc_xprt) {
+		xprt_put(xprt->xpt_bc_xprt);
+		xprt->xpt_bc_xprt = NULL;
+	}
+
 	rdma_dealloc_frmr_q(rdma);
 	svc_rdma_destroy_ctxts(rdma);
 	svc_rdma_destroy_maps(rdma);
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 8c545f7..5c7d235 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -63,7 +63,7 @@
  */
 
 static unsigned int xprt_rdma_slot_table_entries = RPCRDMA_DEF_SLOT_TABLE;
-static unsigned int xprt_rdma_max_inline_read = RPCRDMA_DEF_INLINE;
+unsigned int xprt_rdma_max_inline_read = RPCRDMA_DEF_INLINE;
 static unsigned int xprt_rdma_max_inline_write = RPCRDMA_DEF_INLINE;
 static unsigned int xprt_rdma_inline_write_padding;
 static unsigned int xprt_rdma_memreg_strategy = RPCRDMA_FRMR;
@@ -143,12 +143,7 @@ static struct ctl_table sunrpc_table[] = {
 
 #endif
 
-#define RPCRDMA_BIND_TO		(60U * HZ)
-#define RPCRDMA_INIT_REEST_TO	(5U * HZ)
-#define RPCRDMA_MAX_REEST_TO	(30U * HZ)
-#define RPCRDMA_IDLE_DISC_TO	(5U * 60 * HZ)
-
-static struct rpc_xprt_ops xprt_rdma_procs;	/* forward reference */
+static struct rpc_xprt_ops xprt_rdma_procs;	/*forward reference */
 
 static void
 xprt_rdma_format_addresses4(struct rpc_xprt *xprt, struct sockaddr *sap)
@@ -174,7 +169,7 @@ xprt_rdma_format_addresses6(struct rpc_xprt *xprt, struct sockaddr *sap)
 	xprt->address_strings[RPC_DISPLAY_NETID] = RPCBIND_NETID_RDMA6;
 }
 
-static void
+void
 xprt_rdma_format_addresses(struct rpc_xprt *xprt, struct sockaddr *sap)
 {
 	char buf[128];
@@ -203,7 +198,7 @@ xprt_rdma_format_addresses(struct rpc_xprt *xprt, struct sockaddr *sap)
 	xprt->address_strings[RPC_DISPLAY_PROTO] = "rdma";
 }
 
-static void
+void
 xprt_rdma_free_addresses(struct rpc_xprt *xprt)
 {
 	unsigned int i;
@@ -499,7 +494,7 @@ xprt_rdma_allocate(struct rpc_task *task, size_t size)
 	if (req == NULL)
 		return NULL;
 
-	flags = GFP_NOIO | __GFP_NOWARN;
+	flags = RPCRDMA_DEF_GFP;
 	if (RPC_IS_SWAPPER(task))
 		flags = __GFP_MEMALLOC | GFP_NOWAIT | __GFP_NOWARN;
 
@@ -639,7 +634,7 @@ drop_connection:
 	return -ENOTCONN;	/* implies disconnect */
 }
 
-static void xprt_rdma_print_stats(struct rpc_xprt *xprt, struct seq_file *seq)
+void xprt_rdma_print_stats(struct rpc_xprt *xprt, struct seq_file *seq)
 {
 	struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
 	long idle_time = 0;
@@ -740,6 +735,11 @@ void xprt_rdma_cleanup(void)
 
 	rpcrdma_destroy_wq();
 	frwr_destroy_recovery_wq();
+
+	rc = xprt_unregister_transport(&xprt_rdma_bc);
+	if (rc)
+		dprintk("RPC:       %s: xprt_unregister(bc) returned %i\n",
+			__func__, rc);
 }
 
 int xprt_rdma_init(void)
@@ -763,6 +763,14 @@ int xprt_rdma_init(void)
 		return rc;
 	}
 
+	rc = xprt_register_transport(&xprt_rdma_bc);
+	if (rc) {
+		xprt_unregister_transport(&xprt_rdma);
+		rpcrdma_destroy_wq();
+		frwr_destroy_recovery_wq();
+		return rc;
+	}
+
 	dprintk("RPCRDMA Module Init, register RPC RDMA transport\n");
 
 	dprintk("Defaults:\n");
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 72276c7..5a38236 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -55,6 +55,11 @@
 #define RDMA_RESOLVE_TIMEOUT	(5000)	/* 5 seconds */
 #define RDMA_CONNECT_RETRY_MAX	(2)	/* retries if no listener backlog */
 
+#define RPCRDMA_BIND_TO		(60U * HZ)
+#define RPCRDMA_INIT_REEST_TO	(5U * HZ)
+#define RPCRDMA_MAX_REEST_TO	(30U * HZ)
+#define RPCRDMA_IDLE_DISC_TO	(5U * 60 * HZ)
+
 /*
  * Interface Adapter -- one per transport instance
  */
@@ -147,6 +152,8 @@ rdmab_to_msg(struct rpcrdma_regbuf *rb)
 	return (struct rpcrdma_msg *)rb->rg_base;
 }
 
+#define RPCRDMA_DEF_GFP		(GFP_NOIO | __GFP_NOWARN)
+
 /*
  * struct rpcrdma_rep -- this structure encapsulates state required to recv
  * and complete a reply, asychronously. It needs several pieces of
@@ -308,6 +315,8 @@ struct rpcrdma_buffer {
 	u32			rb_bc_srv_max_requests;
 	spinlock_t		rb_reqslock;	/* protect rb_allreqs */
 	struct list_head	rb_allreqs;
+
+	u32			rb_bc_max_requests;
 };
 #define rdmab_to_ia(b) (&container_of((b), struct rpcrdma_xprt, rx_buf)->rx_ia)
 
@@ -513,6 +522,10 @@ int rpcrdma_marshal_req(struct rpc_rqst *);
 
 /* RPC/RDMA module init - xprtrdma/transport.c
  */
+extern unsigned int xprt_rdma_max_inline_read;
+void xprt_rdma_format_addresses(struct rpc_xprt *xprt, struct sockaddr *sap);
+void xprt_rdma_free_addresses(struct rpc_xprt *xprt);
+void xprt_rdma_print_stats(struct rpc_xprt *xprt, struct seq_file *seq);
 int xprt_rdma_init(void);
 void xprt_rdma_cleanup(void);
 
@@ -528,4 +541,6 @@ void xprt_rdma_bc_free_rqst(struct rpc_rqst *);
 void xprt_rdma_bc_destroy(struct rpc_xprt *, unsigned int);
 #endif	/* CONFIG_SUNRPC_BACKCHANNEL */
 
+extern struct xprt_class xprt_rdma_bc;
+
 #endif				/* _LINUX_SUNRPC_XPRT_RDMA_H */


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v4 11/11] svcrdma: Add class for RDMA backwards direction transport
@ 2015-12-14 21:31     ` Chuck Lever
  0 siblings, 0 replies; 40+ messages in thread
From: Chuck Lever @ 2015-12-14 21:31 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

To support the server-side of an NFSv4.1 backchannel on RDMA
connections, add a transport class that enables backward
direction messages on an existing forward channel connection.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc_rdma.h            |    5 
 net/sunrpc/xprt.c                          |    1 
 net/sunrpc/xprtrdma/Makefile               |    2 
 net/sunrpc/xprtrdma/svc_rdma_backchannel.c |  371 ++++++++++++++++++++++++++++
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c    |   52 ++++
 net/sunrpc/xprtrdma/svc_rdma_transport.c   |   14 +
 net/sunrpc/xprtrdma/transport.c            |   30 +-
 net/sunrpc/xprtrdma/xprt_rdma.h            |   15 +
 8 files changed, 475 insertions(+), 15 deletions(-)
 create mode 100644 net/sunrpc/xprtrdma/svc_rdma_backchannel.c

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 9a2c418..b13513a 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -195,6 +195,11 @@ struct svcxprt_rdma {
 
 #define RPCSVC_MAXPAYLOAD_RDMA	RPCSVC_MAXPAYLOAD
 
+/* svc_rdma_backchannel.c */
+extern int svc_rdma_handle_bc_reply(struct rpc_xprt *xprt,
+				    struct rpcrdma_msg *rmsgp,
+				    struct xdr_buf *rcvbuf);
+
 /* svc_rdma_marshal.c */
 extern int svc_rdma_xdr_decode_req(struct rpcrdma_msg **, struct svc_rqst *);
 extern int svc_rdma_xdr_encode_error(struct svcxprt_rdma *,
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 2e98f4a..37edea6 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1425,3 +1425,4 @@ void xprt_put(struct rpc_xprt *xprt)
 	if (atomic_dec_and_test(&xprt->count))
 		xprt_destroy(xprt);
 }
+EXPORT_SYMBOL_GPL(xprt_put);
diff --git a/net/sunrpc/xprtrdma/Makefile b/net/sunrpc/xprtrdma/Makefile
index 33f99d3..dc9f3b5 100644
--- a/net/sunrpc/xprtrdma/Makefile
+++ b/net/sunrpc/xprtrdma/Makefile
@@ -2,7 +2,7 @@ obj-$(CONFIG_SUNRPC_XPRT_RDMA) += rpcrdma.o
 
 rpcrdma-y := transport.o rpc_rdma.o verbs.o \
 	fmr_ops.o frwr_ops.o physical_ops.o \
-	svc_rdma.o svc_rdma_transport.o \
+	svc_rdma.o svc_rdma_backchannel.o svc_rdma_transport.o \
 	svc_rdma_marshal.o svc_rdma_sendto.o svc_rdma_recvfrom.o \
 	module.o
 rpcrdma-$(CONFIG_SUNRPC_BACKCHANNEL) += backchannel.o
diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
new file mode 100644
index 0000000..417cec1
--- /dev/null
+++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
@@ -0,0 +1,371 @@
+/*
+ * Copyright (c) 2015 Oracle.  All rights reserved.
+ *
+ * Support for backward direction RPCs on RPC/RDMA (server-side).
+ */
+
+#include <linux/sunrpc/svc_rdma.h>
+#include "xprt_rdma.h"
+
+#define RPCDBG_FACILITY	RPCDBG_SVCXPRT
+
+#undef SVCRDMA_BACKCHANNEL_DEBUG
+
+int svc_rdma_handle_bc_reply(struct rpc_xprt *xprt, struct rpcrdma_msg *rmsgp,
+			     struct xdr_buf *rcvbuf)
+{
+	struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
+	struct kvec *dst, *src = &rcvbuf->head[0];
+	struct rpc_rqst *req;
+	unsigned long cwnd;
+	u32 credits;
+	size_t len;
+	__be32 xid;
+	__be32 *p;
+	int ret;
+
+	p = (__be32 *)src->iov_base;
+	len = src->iov_len;
+	xid = rmsgp->rm_xid;
+
+#ifdef SVCRDMA_BACKCHANNEL_DEBUG
+	pr_info("%s: xid=%08x, length=%zu\n",
+		__func__, be32_to_cpu(xid), len);
+	pr_info("%s: RPC/RDMA: %*ph\n",
+		__func__, (int)RPCRDMA_HDRLEN_MIN, rmsgp);
+	pr_info("%s:      RPC: %*ph\n",
+		__func__, (int)len, p);
+#endif
+
+	ret = -EAGAIN;
+	if (src->iov_len < 24)
+		goto out_shortreply;
+
+	spin_lock_bh(&xprt->transport_lock);
+	req = xprt_lookup_rqst(xprt, xid);
+	if (!req)
+		goto out_notfound;
+
+	dst = &req->rq_private_buf.head[0];
+	memcpy(&req->rq_private_buf, &req->rq_rcv_buf, sizeof(struct xdr_buf));
+	if (dst->iov_len < len)
+		goto out_unlock;
+	memcpy(dst->iov_base, p, len);
+
+	credits = be32_to_cpu(rmsgp->rm_credit);
+	if (credits == 0)
+		credits = 1;	/* don't deadlock */
+	else if (credits > r_xprt->rx_buf.rb_bc_max_requests)
+		credits = r_xprt->rx_buf.rb_bc_max_requests;
+
+	cwnd = xprt->cwnd;
+	xprt->cwnd = credits << RPC_CWNDSHIFT;
+	if (xprt->cwnd > cwnd)
+		xprt_release_rqst_cong(req->rq_task);
+
+	ret = 0;
+	xprt_complete_rqst(req->rq_task, rcvbuf->len);
+	rcvbuf->len = 0;
+
+out_unlock:
+	spin_unlock_bh(&xprt->transport_lock);
+out:
+	return ret;
+
+out_shortreply:
+	dprintk("svcrdma: short bc reply: xprt=%p, len=%zu\n",
+		xprt, src->iov_len);
+	goto out;
+
+out_notfound:
+	dprintk("svcrdma: unrecognized bc reply: xprt=%p, xid=%08x\n",
+		xprt, be32_to_cpu(xid));
+
+	goto out_unlock;
+}
+
+/* Send a backwards direction RPC call.
+ *
+ * Caller holds the connection's mutex and has already marshaled
+ * the RPC/RDMA request.
+ *
+ * This is similar to svc_rdma_reply, but takes an rpc_rqst
+ * instead, does not support chunks, and avoids blocking memory
+ * allocation.
+ *
+ * XXX: There is still an opportunity to block in svc_rdma_send()
+ * if there are no SQ entries to post the Send. This may occur if
+ * the adapter has a small maximum SQ depth.
+ */
+static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma,
+			      struct rpc_rqst *rqst)
+{
+	struct xdr_buf *sndbuf = &rqst->rq_snd_buf;
+	struct svc_rdma_op_ctxt *ctxt;
+	struct svc_rdma_req_map *vec;
+	struct ib_send_wr send_wr;
+	int ret;
+
+	vec = svc_rdma_get_req_map(rdma);
+	ret = svc_rdma_map_xdr(rdma, sndbuf, vec);
+	if (ret)
+		goto out_err;
+
+	/* Post a recv buffer to handle the reply for this request. */
+	ret = svc_rdma_post_recv(rdma, GFP_NOIO);
+	if (ret) {
+		pr_err("svcrdma: Failed to post bc receive buffer, err=%d.\n",
+		       ret);
+		pr_err("svcrdma: closing transport %p.\n", rdma);
+		set_bit(XPT_CLOSE, &rdma->sc_xprt.xpt_flags);
+		ret = -ENOTCONN;
+		goto out_err;
+	}
+
+	ctxt = svc_rdma_get_context(rdma);
+	ctxt->pages[0] = virt_to_page(rqst->rq_buffer);
+	ctxt->count = 1;
+
+	ctxt->wr_op = IB_WR_SEND;
+	ctxt->direction = DMA_TO_DEVICE;
+	ctxt->sge[0].lkey = rdma->sc_dma_lkey;
+	ctxt->sge[0].length = sndbuf->len;
+	ctxt->sge[0].addr =
+	    ib_dma_map_page(rdma->sc_cm_id->device, ctxt->pages[0], 0,
+			    sndbuf->len, DMA_TO_DEVICE);
+	if (ib_dma_mapping_error(rdma->sc_cm_id->device, ctxt->sge[0].addr)) {
+		ret = -EIO;
+		goto out_unmap;
+	}
+	atomic_inc(&rdma->sc_dma_used);
+
+	memset(&send_wr, 0, sizeof(send_wr));
+	send_wr.wr_id = (unsigned long)ctxt;
+	send_wr.sg_list = ctxt->sge;
+	send_wr.num_sge = 1;
+	send_wr.opcode = IB_WR_SEND;
+	send_wr.send_flags = IB_SEND_SIGNALED;
+
+	ret = svc_rdma_send(rdma, &send_wr);
+	if (ret) {
+		ret = -EIO;
+		goto out_unmap;
+	}
+
+out_err:
+	svc_rdma_put_req_map(rdma, vec);
+	dprintk("svcrdma: %s returns %d\n", __func__, ret);
+	return ret;
+
+out_unmap:
+	svc_rdma_unmap_dma(ctxt);
+	svc_rdma_put_context(ctxt, 1);
+	goto out_err;
+}
+
+/* Server-side transport endpoint wants a whole page for its send
+ * buffer. The client RPC code constructs the RPC header in this
+ * buffer before it invokes ->send_request.
+ *
+ * Returns NULL if there was a temporary allocation failure.
+ */
+static void *
+xprt_rdma_bc_allocate(struct rpc_task *task, size_t size)
+{
+	struct rpc_rqst *rqst = task->tk_rqstp;
+	struct svc_xprt *sxprt = rqst->rq_xprt->bc_xprt;
+	struct svcxprt_rdma *rdma;
+	struct page *page;
+
+	rdma = container_of(sxprt, struct svcxprt_rdma, sc_xprt);
+
+	/* Prevent an infinite loop: try to make this case work */
+	if (size > PAGE_SIZE)
+		WARN_ONCE(1, "svcrdma: large bc buffer request (size %zu)\n",
+			  size);
+
+	page = alloc_page(RPCRDMA_DEF_GFP);
+	if (!page)
+		return NULL;
+
+	return page_address(page);
+}
+
+static void
+xprt_rdma_bc_free(void *buffer)
+{
+	/* No-op: ctxt and page have already been freed. */
+}
+
+static int
+rpcrdma_bc_send_request(struct svcxprt_rdma *rdma, struct rpc_rqst *rqst)
+{
+	struct rpc_xprt *xprt = rqst->rq_xprt;
+	struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
+	struct rpcrdma_msg *headerp = (struct rpcrdma_msg *)rqst->rq_buffer;
+	int rc;
+
+	/* Space in the send buffer for an RPC/RDMA header is reserved
+	 * via xprt->tsh_size.
+	 */
+	headerp->rm_xid = rqst->rq_xid;
+	headerp->rm_vers = rpcrdma_version;
+	headerp->rm_credit = cpu_to_be32(r_xprt->rx_buf.rb_bc_max_requests);
+	headerp->rm_type = rdma_msg;
+	headerp->rm_body.rm_chunks[0] = xdr_zero;
+	headerp->rm_body.rm_chunks[1] = xdr_zero;
+	headerp->rm_body.rm_chunks[2] = xdr_zero;
+
+#ifdef SVCRDMA_BACKCHANNEL_DEBUG
+	pr_info("%s: %*ph\n", __func__, 64, rqst->rq_buffer);
+#endif
+
+	rc = svc_rdma_bc_sendto(rdma, rqst);
+	if (rc)
+		goto drop_connection;
+	return rc;
+
+drop_connection:
+	dprintk("svcrdma: failed to send bc call\n");
+	xprt_disconnect_done(xprt);
+	return -ENOTCONN;
+}
+
+/* Send an RPC call on the passive end of a transport
+ * connection.
+ */
+static int
+xprt_rdma_bc_send_request(struct rpc_task *task)
+{
+	struct rpc_rqst *rqst = task->tk_rqstp;
+	struct svc_xprt *sxprt = rqst->rq_xprt->bc_xprt;
+	struct svcxprt_rdma *rdma;
+	int len;
+
+	dprintk("svcrdma: sending bc call with xid: %08x\n",
+		be32_to_cpu(rqst->rq_xid));
+
+	if (!mutex_trylock(&sxprt->xpt_mutex)) {
+		rpc_sleep_on(&sxprt->xpt_bc_pending, task, NULL);
+		if (!mutex_trylock(&sxprt->xpt_mutex))
+			return -EAGAIN;
+		rpc_wake_up_queued_task(&sxprt->xpt_bc_pending, task);
+	}
+
+	len = -ENOTCONN;
+	rdma = container_of(sxprt, struct svcxprt_rdma, sc_xprt);
+	if (!test_bit(XPT_DEAD, &sxprt->xpt_flags))
+		len = rpcrdma_bc_send_request(rdma, rqst);
+
+	mutex_unlock(&sxprt->xpt_mutex);
+
+	if (len < 0)
+		return len;
+	return 0;
+}
+
+static void
+xprt_rdma_bc_close(struct rpc_xprt *xprt)
+{
+	dprintk("svcrdma: %s: xprt %p\n", __func__, xprt);
+}
+
+static void
+xprt_rdma_bc_put(struct rpc_xprt *xprt)
+{
+	dprintk("svcrdma: %s: xprt %p\n", __func__, xprt);
+
+	xprt_free(xprt);
+	module_put(THIS_MODULE);
+}
+
+static struct rpc_xprt_ops xprt_rdma_bc_procs = {
+	.reserve_xprt		= xprt_reserve_xprt_cong,
+	.release_xprt		= xprt_release_xprt_cong,
+	.alloc_slot		= xprt_alloc_slot,
+	.release_request	= xprt_release_rqst_cong,
+	.buf_alloc		= xprt_rdma_bc_allocate,
+	.buf_free		= xprt_rdma_bc_free,
+	.send_request		= xprt_rdma_bc_send_request,
+	.set_retrans_timeout	= xprt_set_retrans_timeout_def,
+	.close			= xprt_rdma_bc_close,
+	.destroy		= xprt_rdma_bc_put,
+	.print_stats		= xprt_rdma_print_stats
+};
+
+static const struct rpc_timeout xprt_rdma_bc_timeout = {
+	.to_initval = 60 * HZ,
+	.to_maxval = 60 * HZ,
+};
+
+/* It shouldn't matter if the number of backchannel session slots
+ * doesn't match the number of RPC/RDMA credits. That just means
+ * one or the other will have extra slots that aren't used.
+ */
+static struct rpc_xprt *
+xprt_setup_rdma_bc(struct xprt_create *args)
+{
+	struct rpc_xprt *xprt;
+	struct rpcrdma_xprt *new_xprt;
+
+	if (args->addrlen > sizeof(xprt->addr)) {
+		dprintk("RPC:       %s: address too large\n", __func__);
+		return ERR_PTR(-EBADF);
+	}
+
+	xprt = xprt_alloc(args->net, sizeof(*new_xprt),
+			  RPCRDMA_MAX_BC_REQUESTS,
+			  RPCRDMA_MAX_BC_REQUESTS);
+	if (!xprt) {
+		dprintk("RPC:       %s: couldn't allocate rpc_xprt\n",
+			__func__);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	xprt->timeout = &xprt_rdma_bc_timeout;
+	xprt_set_bound(xprt);
+	xprt_set_connected(xprt);
+	xprt->bind_timeout = RPCRDMA_BIND_TO;
+	xprt->reestablish_timeout = RPCRDMA_INIT_REEST_TO;
+	xprt->idle_timeout = RPCRDMA_IDLE_DISC_TO;
+
+	xprt->prot = XPRT_TRANSPORT_BC_RDMA;
+	xprt->tsh_size = RPCRDMA_HDRLEN_MIN / sizeof(__be32);
+	xprt->ops = &xprt_rdma_bc_procs;
+
+	memcpy(&xprt->addr, args->dstaddr, args->addrlen);
+	xprt->addrlen = args->addrlen;
+	xprt_rdma_format_addresses(xprt, (struct sockaddr *)&xprt->addr);
+	xprt->resvport = 0;
+
+	xprt->max_payload = xprt_rdma_max_inline_read;
+
+	new_xprt = rpcx_to_rdmax(xprt);
+	new_xprt->rx_buf.rb_bc_max_requests = xprt->max_reqs;
+
+	xprt_get(xprt);
+	args->bc_xprt->xpt_bc_xprt = xprt;
+	xprt->bc_xprt = args->bc_xprt;
+
+	if (!try_module_get(THIS_MODULE))
+		goto out_fail;
+
+	/* Final put for backchannel xprt is in __svc_rdma_free */
+	xprt_get(xprt);
+	return xprt;
+
+out_fail:
+	xprt_rdma_free_addresses(xprt);
+	args->bc_xprt->xpt_bc_xprt = NULL;
+	xprt_put(xprt);
+	xprt_free(xprt);
+	return ERR_PTR(-EINVAL);
+}
+
+struct xprt_class xprt_rdma_bc = {
+	.list			= LIST_HEAD_INIT(xprt_rdma_bc.list),
+	.name			= "rdma backchannel",
+	.owner			= THIS_MODULE,
+	.ident			= XPRT_TRANSPORT_BC_RDMA,
+	.setup			= xprt_setup_rdma_bc,
+};
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index ff4f01e..3dfe464 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -567,6 +567,38 @@ static int rdma_read_complete(struct svc_rqst *rqstp,
 	return ret;
 }
 
+/* By convention, backchannel calls arrive via rdma_msg type
+ * messages, and never populate the chunk lists. This makes
+ * the RPC/RDMA header small and fixed in size, so it is
+ * straightforward to check the RPC header's direction field.
+ */
+static bool
+svc_rdma_is_backchannel_reply(struct svc_xprt *xprt, struct rpcrdma_msg *rmsgp)
+{
+	__be32 *p = (__be32 *)rmsgp;
+
+	if (!xprt->xpt_bc_xprt)
+		return false;
+
+	if (rmsgp->rm_type != rdma_msg)
+		return false;
+	if (rmsgp->rm_body.rm_chunks[0] != xdr_zero)
+		return false;
+	if (rmsgp->rm_body.rm_chunks[1] != xdr_zero)
+		return false;
+	if (rmsgp->rm_body.rm_chunks[2] != xdr_zero)
+		return false;
+
+	/* sanity: RPC header XID must match the RPC/RDMA XID */
+	if (p[7] != rmsgp->rm_xid)
+		return false;
+	/* call direction */
+	if (p[8] == cpu_to_be32(RPC_CALL))
+		return false;
+
+	return true;
+}
+
 /*
  * Set up the rqstp thread context to point to the RQ buffer. If
  * necessary, pull additional data from the client with an RDMA_READ
@@ -632,6 +664,15 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
 		goto close_out;
 	}
 
+	if (svc_rdma_is_backchannel_reply(xprt, rmsgp)) {
+		ret = svc_rdma_handle_bc_reply(xprt->xpt_bc_xprt, rmsgp,
+					       &rqstp->rq_arg);
+		svc_rdma_put_context(ctxt, 0);
+		if (ret)
+			goto repost;
+		return ret;
+	}
+
 	/* Read read-list data. */
 	ret = rdma_read_chunks(rdma_xprt, rmsgp, rqstp, ctxt);
 	if (ret > 0) {
@@ -668,4 +709,15 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
 	set_bit(XPT_CLOSE, &xprt->xpt_flags);
 defer:
 	return 0;
+
+repost:
+	ret = svc_rdma_post_recv(rdma_xprt, GFP_KERNEL);
+	if (ret) {
+		pr_err("svcrdma: could not post a receive buffer, err=%d.\n",
+		       ret);
+		pr_err("svcrdma: closing transport %p.\n", rdma_xprt);
+		set_bit(XPT_CLOSE, &rdma_xprt->sc_xprt.xpt_flags);
+		ret = -ENOTCONN;
+	}
+	return ret;
 }
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 35326a3..abfbd02 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -1287,12 +1287,14 @@ static void __svc_rdma_free(struct work_struct *work)
 {
 	struct svcxprt_rdma *rdma =
 		container_of(work, struct svcxprt_rdma, sc_work);
-	dprintk("svcrdma: svc_rdma_free(%p)\n", rdma);
+	struct svc_xprt *xprt = &rdma->sc_xprt;
+
+	dprintk("svcrdma: %s(%p)\n", __func__, rdma);
 
 	/* We should only be called from kref_put */
-	if (atomic_read(&rdma->sc_xprt.xpt_ref.refcount) != 0)
+	if (atomic_read(&xprt->xpt_ref.refcount) != 0)
 		pr_err("svcrdma: sc_xprt still in use? (%d)\n",
-		       atomic_read(&rdma->sc_xprt.xpt_ref.refcount));
+		       atomic_read(&xprt->xpt_ref.refcount));
 
 	/*
 	 * Destroy queued, but not processed read completions. Note
@@ -1327,6 +1329,12 @@ static void __svc_rdma_free(struct work_struct *work)
 		pr_err("svcrdma: dma still in use? (%d)\n",
 		       atomic_read(&rdma->sc_dma_used));
 
+	/* Final put of backchannel client transport */
+	if (xprt->xpt_bc_xprt) {
+		xprt_put(xprt->xpt_bc_xprt);
+		xprt->xpt_bc_xprt = NULL;
+	}
+
 	rdma_dealloc_frmr_q(rdma);
 	svc_rdma_destroy_ctxts(rdma);
 	svc_rdma_destroy_maps(rdma);
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 8c545f7..5c7d235 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -63,7 +63,7 @@
  */
 
 static unsigned int xprt_rdma_slot_table_entries = RPCRDMA_DEF_SLOT_TABLE;
-static unsigned int xprt_rdma_max_inline_read = RPCRDMA_DEF_INLINE;
+unsigned int xprt_rdma_max_inline_read = RPCRDMA_DEF_INLINE;
 static unsigned int xprt_rdma_max_inline_write = RPCRDMA_DEF_INLINE;
 static unsigned int xprt_rdma_inline_write_padding;
 static unsigned int xprt_rdma_memreg_strategy = RPCRDMA_FRMR;
@@ -143,12 +143,7 @@ static struct ctl_table sunrpc_table[] = {
 
 #endif
 
-#define RPCRDMA_BIND_TO		(60U * HZ)
-#define RPCRDMA_INIT_REEST_TO	(5U * HZ)
-#define RPCRDMA_MAX_REEST_TO	(30U * HZ)
-#define RPCRDMA_IDLE_DISC_TO	(5U * 60 * HZ)
-
 static struct rpc_xprt_ops xprt_rdma_procs;	/* forward reference */
 
 static void
 xprt_rdma_format_addresses4(struct rpc_xprt *xprt, struct sockaddr *sap)
@@ -174,7 +169,7 @@ xprt_rdma_format_addresses6(struct rpc_xprt *xprt, struct sockaddr *sap)
 	xprt->address_strings[RPC_DISPLAY_NETID] = RPCBIND_NETID_RDMA6;
 }
 
-static void
+void
 xprt_rdma_format_addresses(struct rpc_xprt *xprt, struct sockaddr *sap)
 {
 	char buf[128];
@@ -203,7 +198,7 @@ xprt_rdma_format_addresses(struct rpc_xprt *xprt, struct sockaddr *sap)
 	xprt->address_strings[RPC_DISPLAY_PROTO] = "rdma";
 }
 
-static void
+void
 xprt_rdma_free_addresses(struct rpc_xprt *xprt)
 {
 	unsigned int i;
@@ -499,7 +494,7 @@ xprt_rdma_allocate(struct rpc_task *task, size_t size)
 	if (req == NULL)
 		return NULL;
 
-	flags = GFP_NOIO | __GFP_NOWARN;
+	flags = RPCRDMA_DEF_GFP;
 	if (RPC_IS_SWAPPER(task))
 		flags = __GFP_MEMALLOC | GFP_NOWAIT | __GFP_NOWARN;
 
@@ -639,7 +634,7 @@ drop_connection:
 	return -ENOTCONN;	/* implies disconnect */
 }
 
-static void xprt_rdma_print_stats(struct rpc_xprt *xprt, struct seq_file *seq)
+void xprt_rdma_print_stats(struct rpc_xprt *xprt, struct seq_file *seq)
 {
 	struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
 	long idle_time = 0;
@@ -740,6 +735,11 @@ void xprt_rdma_cleanup(void)
 
 	rpcrdma_destroy_wq();
 	frwr_destroy_recovery_wq();
+
+	rc = xprt_unregister_transport(&xprt_rdma_bc);
+	if (rc)
+		dprintk("RPC:       %s: xprt_unregister(bc) returned %i\n",
+			__func__, rc);
 }
 
 int xprt_rdma_init(void)
@@ -763,6 +763,14 @@ int xprt_rdma_init(void)
 		return rc;
 	}
 
+	rc = xprt_register_transport(&xprt_rdma_bc);
+	if (rc) {
+		xprt_unregister_transport(&xprt_rdma);
+		rpcrdma_destroy_wq();
+		frwr_destroy_recovery_wq();
+		return rc;
+	}
+
 	dprintk("RPCRDMA Module Init, register RPC RDMA transport\n");
 
 	dprintk("Defaults:\n");
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 72276c7..5a38236 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -55,6 +55,11 @@
 #define RDMA_RESOLVE_TIMEOUT	(5000)	/* 5 seconds */
 #define RDMA_CONNECT_RETRY_MAX	(2)	/* retries if no listener backlog */
 
+#define RPCRDMA_BIND_TO		(60U * HZ)
+#define RPCRDMA_INIT_REEST_TO	(5U * HZ)
+#define RPCRDMA_MAX_REEST_TO	(30U * HZ)
+#define RPCRDMA_IDLE_DISC_TO	(5U * 60 * HZ)
+
 /*
  * Interface Adapter -- one per transport instance
  */
@@ -147,6 +152,8 @@ rdmab_to_msg(struct rpcrdma_regbuf *rb)
 	return (struct rpcrdma_msg *)rb->rg_base;
 }
 
+#define RPCRDMA_DEF_GFP		(GFP_NOIO | __GFP_NOWARN)
+
 /*
  * struct rpcrdma_rep -- this structure encapsulates state required to recv
  * and complete a reply, asychronously. It needs several pieces of
@@ -308,6 +315,8 @@ struct rpcrdma_buffer {
 	u32			rb_bc_srv_max_requests;
 	spinlock_t		rb_reqslock;	/* protect rb_allreqs */
 	struct list_head	rb_allreqs;
+
+	u32			rb_bc_max_requests;
 };
 #define rdmab_to_ia(b) (&container_of((b), struct rpcrdma_xprt, rx_buf)->rx_ia)
 
@@ -513,6 +522,10 @@ int rpcrdma_marshal_req(struct rpc_rqst *);
 
 /* RPC/RDMA module init - xprtrdma/transport.c
  */
+extern unsigned int xprt_rdma_max_inline_read;
+void xprt_rdma_format_addresses(struct rpc_xprt *xprt, struct sockaddr *sap);
+void xprt_rdma_free_addresses(struct rpc_xprt *xprt);
+void xprt_rdma_print_stats(struct rpc_xprt *xprt, struct seq_file *seq);
 int xprt_rdma_init(void);
 void xprt_rdma_cleanup(void);
 
@@ -528,4 +541,6 @@ void xprt_rdma_bc_free_rqst(struct rpc_rqst *);
 void xprt_rdma_bc_destroy(struct rpc_xprt *, unsigned int);
 #endif	/* CONFIG_SUNRPC_BACKCHANNEL */
 
+extern struct xprt_class xprt_rdma_bc;
+
 #endif				/* _LINUX_SUNRPC_XPRT_RDMA_H */


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 00/11] NFS/RDMA server patches for v4.5
@ 2015-12-16 12:10     ` Devesh Sharma
  0 siblings, 0 replies; 40+ messages in thread
From: Devesh Sharma @ 2015-12-16 12:10 UTC (permalink / raw)
  To: Chuck Lever; +Cc: bfields, linux-rdma, Linux NFS Mailing List

iozone passed on an ocrdma device. Link bounce fails to recover iozone
traffic; however, that failure is not related to this patch series. I am
in the process of finding the patch that broke it.

Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>

On Tue, Dec 15, 2015 at 3:00 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> Here are patches to support server-side bi-directional RPC/RDMA
> operation (to enable NFSv4.1 on RPC/RDMA transports). Thanks to
> all who reviewed v1, v2, and v3. This version has some significant
> changes since the previous one.
>
> In preparation for Doug's final topic branch, Bruce, I've rebased
> these on Christoph's ib_device_attr branch. There were some merge
> conflicts which I've fixed and tested. These are ready for your
> review.
>
> Also available in the "nfsd-rdma-for-4.5" topic branch of this git repo:
>
> git://git.linux-nfs.org/projects/cel/cel-2.6.git
>
> Or for browsing:
>
> http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=log;h=refs/heads/nfsd-rdma-for-4.5
>
>
> Changes since v3:
> - Rebased on Christoph's ib_device_attr branch
> - Backchannel patches have been squashed together
> - Memory allocation overhaul to prevent blocking allocation
>   when sending backchannel calls
>
>
> Changes since v2:
> - Rebased on v4.4-rc4
> - Backchannel code in new source file to address dprintk issues
> - svc_rdma_get_context() now uses a pre-allocated cache
> - Dropped svc_rdma_send clean up
>
>
> Changes since v1:
>
> - Rebased on v4.4-rc3
> - Removed the use of CONFIG_SUNRPC_BACKCHANNEL
> - Fixed computation of forward and backward max_requests
> - Updated some comments and patch descriptions
> - pr_err and pr_info converted to dprintk
> - Simplified svc_rdma_get_context()
> - Dropped patch removing access_flags field
> - NFSv4.1 callbacks tested with for-4.5 client
>
> ---
>
> Chuck Lever (11):
>       svcrdma: Do not send XDR roundup bytes for a write chunk
>       svcrdma: Clean up rdma_create_xprt()
>       svcrdma: Clean up process_context()
>       svcrdma: Improve allocation of struct svc_rdma_op_ctxt
>       svcrdma: Improve allocation of struct svc_rdma_req_map
>       svcrdma: Remove unused req_map and ctxt kmem_caches
>       svcrdma: Add gfp flags to svc_rdma_post_recv()
>       svcrdma: Remove last two __GFP_NOFAIL call sites
>       svcrdma: Make map_xdr non-static
>       svcrdma: Define maximum number of backchannel requests
>       svcrdma: Add class for RDMA backwards direction transport
>
>
>  include/linux/sunrpc/svc_rdma.h            |   37 ++-
>  net/sunrpc/xprt.c                          |    1
>  net/sunrpc/xprtrdma/Makefile               |    2
>  net/sunrpc/xprtrdma/svc_rdma.c             |   41 ---
>  net/sunrpc/xprtrdma/svc_rdma_backchannel.c |  371 ++++++++++++++++++++++++++++
>  net/sunrpc/xprtrdma/svc_rdma_recvfrom.c    |   52 ++++
>  net/sunrpc/xprtrdma/svc_rdma_sendto.c      |   34 ++-
>  net/sunrpc/xprtrdma/svc_rdma_transport.c   |  284 ++++++++++++++++-----
>  net/sunrpc/xprtrdma/transport.c            |   30 +-
>  net/sunrpc/xprtrdma/xprt_rdma.h            |   20 +-
>  10 files changed, 730 insertions(+), 142 deletions(-)
>  create mode 100644 net/sunrpc/xprtrdma/svc_rdma_backchannel.c
>
> --
> Signature

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 01/11] svcrdma: Do not send XDR roundup bytes for a write chunk
@ 2015-12-21 21:07         ` J. Bruce Fields
  0 siblings, 0 replies; 40+ messages in thread
From: J. Bruce Fields @ 2015-12-21 21:07 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-rdma, linux-nfs

On Mon, Dec 14, 2015 at 04:30:09PM -0500, Chuck Lever wrote:
> Minor optimization: when dealing with write chunk XDR roundup, do
> not post a Write WR for the zero bytes in the pad. Simply update
> the write segment in the RPC-over-RDMA header to reflect the extra
> pad bytes.
> 
> The Reply chunk is also a write chunk, but the server does not use
> send_write_chunks() to send the Reply chunk. That's OK in this case:
> the server Upper Layer typically marshals the Reply chunk contents
> in a single contiguous buffer, without a separate tail for the XDR
> pad.
> 
> The comments and the variable naming refer to "chunks" but what is
> really meant is "segments." The existing code sends only one
> xdr_write_chunk per RPC reply.
> 
> The fix assumes this as well. When the XDR pad in the first write
> chunk is reached, the assumption is the Write list is complete and
> send_write_chunks() returns.
> 
> That will remain a valid assumption until the server Upper Layer can
> support multiple bulk payload results per RPC.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  net/sunrpc/xprtrdma/svc_rdma_sendto.c |    7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> index 969a1ab..bad5eaa 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> @@ -342,6 +342,13 @@ static int send_write_chunks(struct svcxprt_rdma *xprt,
>  						arg_ch->rs_handle,
>  						arg_ch->rs_offset,
>  						write_len);
> +
> +		/* Do not send XDR pad bytes */
> +		if (chunk_no && write_len < 4) {
> +			chunk_no++;
> +			break;

I'm pretty lost in this code.  Why does (chunk_no && write_len < 4) mean
this is xdr padding?

> +		}
> +
>  		chunk_off = 0;
>  		while (write_len) {
>  			ret = send_write(xprt, rqstp,

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 01/11] svcrdma: Do not send XDR roundup bytes for a write chunk
@ 2015-12-21 21:15             ` Chuck Lever
  0 siblings, 0 replies; 40+ messages in thread
From: Chuck Lever @ 2015-12-21 21:15 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Linux RDMA Mailing List, Linux NFS Mailing List


> On Dec 21, 2015, at 4:07 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> 
> On Mon, Dec 14, 2015 at 04:30:09PM -0500, Chuck Lever wrote:
>> Minor optimization: when dealing with write chunk XDR roundup, do
>> not post a Write WR for the zero bytes in the pad. Simply update
>> the write segment in the RPC-over-RDMA header to reflect the extra
>> pad bytes.
>> 
>> The Reply chunk is also a write chunk, but the server does not use
>> send_write_chunks() to send the Reply chunk. That's OK in this case:
>> the server Upper Layer typically marshals the Reply chunk contents
>> in a single contiguous buffer, without a separate tail for the XDR
>> pad.
>> 
>> The comments and the variable naming refer to "chunks" but what is
>> really meant is "segments." The existing code sends only one
>> xdr_write_chunk per RPC reply.
>> 
>> The fix assumes this as well. When the XDR pad in the first write
>> chunk is reached, the assumption is the Write list is complete and
>> send_write_chunks() returns.
>> 
>> That will remain a valid assumption until the server Upper Layer can
>> support multiple bulk payload results per RPC.
>> 
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>> net/sunrpc/xprtrdma/svc_rdma_sendto.c |    7 +++++++
>> 1 file changed, 7 insertions(+)
>> 
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
>> index 969a1ab..bad5eaa 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
>> @@ -342,6 +342,13 @@ static int send_write_chunks(struct svcxprt_rdma *xprt,
>> 						arg_ch->rs_handle,
>> 						arg_ch->rs_offset,
>> 						write_len);
>> +
>> +		/* Do not send XDR pad bytes */
>> +		if (chunk_no && write_len < 4) {
>> +			chunk_no++;
>> +			break;
> 
> I'm pretty lost in this code.  Why does (chunk_no && write_len < 4) mean
> this is xdr padding?

Chunk zero is always data. Padding is always going to be
after the first chunk. Any chunk after chunk zero that is
shorter than XDR quad alignment is going to be a pad.

Probably too clever. Is there a better way to detect
the XDR pad?
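
Spelled out, the test amounts to something like this sketch
(illustrative only; the names mirror send_write_chunks() above):

	/* XDR rounds every opaque result up to a 4-byte boundary,
	 * so a roundup pad is 1-3 bytes long, and it can appear
	 * only after the data carried in chunk zero.
	 */
	static bool looks_like_xdr_pad(int chunk_no, u32 write_len)
	{
		return chunk_no != 0 && write_len < 4;
	}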


>> +		}
>> +
>> 		chunk_off = 0;
>> 		while (write_len) {
>> 			ret = send_write(xprt, rqstp,

--
Chuck Lever





^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 01/11] svcrdma: Do not send XDR roundup bytes for a write chunk
@ 2015-12-21 21:29                 ` J. Bruce Fields
  0 siblings, 0 replies; 40+ messages in thread
From: J. Bruce Fields @ 2015-12-21 21:29 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Linux RDMA Mailing List, Linux NFS Mailing List

On Mon, Dec 21, 2015 at 04:15:23PM -0500, Chuck Lever wrote:
> 
> > On Dec 21, 2015, at 4:07 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > 
> > On Mon, Dec 14, 2015 at 04:30:09PM -0500, Chuck Lever wrote:
> >> Minor optimization: when dealing with write chunk XDR roundup, do
> >> not post a Write WR for the zero bytes in the pad. Simply update
> >> the write segment in the RPC-over-RDMA header to reflect the extra
> >> pad bytes.
> >> 
> >> The Reply chunk is also a write chunk, but the server does not use
> >> send_write_chunks() to send the Reply chunk. That's OK in this case:
> >> the server Upper Layer typically marshals the Reply chunk contents
> >> in a single contiguous buffer, without a separate tail for the XDR
> >> pad.
> >> 
> >> The comments and the variable naming refer to "chunks" but what is
> >> really meant is "segments." The existing code sends only one
> >> xdr_write_chunk per RPC reply.
> >> 
> >> The fix assumes this as well. When the XDR pad in the first write
> >> chunk is reached, the assumption is the Write list is complete and
> >> send_write_chunks() returns.
> >> 
> >> That will remain a valid assumption until the server Upper Layer can
> >> support multiple bulk payload results per RPC.
> >> 
> >> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> >> ---
> >> net/sunrpc/xprtrdma/svc_rdma_sendto.c |    7 +++++++
> >> 1 file changed, 7 insertions(+)
> >> 
> >> diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> >> index 969a1ab..bad5eaa 100644
> >> --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> >> +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> >> @@ -342,6 +342,13 @@ static int send_write_chunks(struct svcxprt_rdma *xprt,
> >> 						arg_ch->rs_handle,
> >> 						arg_ch->rs_offset,
> >> 						write_len);
> >> +
> >> +		/* Do not send XDR pad bytes */
> >> +		if (chunk_no && write_len < 4) {
> >> +			chunk_no++;
> >> +			break;
> > 
> > I'm pretty lost in this code.  Why does (chunk_no && write_len < 4) mean
> > this is xdr padding?
> 
> Chunk zero is always data. Padding is always going to be
> after the first chunk. Any chunk after chunk zero that is
> shorter than XDR quad alignment is going to be a pad.

I don't really know what a chunk is....  Looking at the code:

	write_len = min(xfer_len, be32_to_cpu(arg_ch->rs_length));

so I guess the assumption is just that those rs_length's are always a
multiple of four?
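
(For reference, XDR roundup means an n-byte opaque occupies
((n + 3) & ~3) bytes on the wire, so a pad, when present, is 1-3
bytes -- e.g. a 13-byte result is followed by a 3-byte pad.)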

--b.

> 
> Probably too clever. Is there a better way to detect
> the XDR pad?
> 
> 
> >> +		}
> >> +
> >> 		chunk_off = 0;
> >> 		while (write_len) {
> >> 			ret = send_write(xprt, rqstp,
> 
> --
> Chuck Lever
> 
> 
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 01/11] svcrdma: Do not send XDR roundup bytes for a write chunk
@ 2015-12-21 22:11                     ` Chuck Lever
  0 siblings, 0 replies; 40+ messages in thread
From: Chuck Lever @ 2015-12-21 22:11 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Linux RDMA Mailing List, Linux NFS Mailing List


> On Dec 21, 2015, at 4:29 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> 
> On Mon, Dec 21, 2015 at 04:15:23PM -0500, Chuck Lever wrote:
>> 
>>> On Dec 21, 2015, at 4:07 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>>> 
>>> On Mon, Dec 14, 2015 at 04:30:09PM -0500, Chuck Lever wrote:
>>>> Minor optimization: when dealing with write chunk XDR roundup, do
>>>> not post a Write WR for the zero bytes in the pad. Simply update
>>>> the write segment in the RPC-over-RDMA header to reflect the extra
>>>> pad bytes.
>>>> 
>>>> The Reply chunk is also a write chunk, but the server does not use
>>>> send_write_chunks() to send the Reply chunk. That's OK in this case:
>>>> the server Upper Layer typically marshals the Reply chunk contents
>>>> in a single contiguous buffer, without a separate tail for the XDR
>>>> pad.
>>>> 
>>>> The comments and the variable naming refer to "chunks" but what is
>>>> really meant is "segments." The existing code sends only one
>>>> xdr_write_chunk per RPC reply.
>>>> 
>>>> The fix assumes this as well. When the XDR pad in the first write
>>>> chunk is reached, the assumption is the Write list is complete and
>>>> send_write_chunks() returns.
>>>> 
>>>> That will remain a valid assumption until the server Upper Layer can
>>>> support multiple bulk payload results per RPC.
>>>> 
>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>> ---
>>>> net/sunrpc/xprtrdma/svc_rdma_sendto.c |    7 +++++++
>>>> 1 file changed, 7 insertions(+)
>>>> 
>>>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
>>>> index 969a1ab..bad5eaa 100644
>>>> --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
>>>> +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
>>>> @@ -342,6 +342,13 @@ static int send_write_chunks(struct svcxprt_rdma *xprt,
>>>> 						arg_ch->rs_handle,
>>>> 						arg_ch->rs_offset,
>>>> 						write_len);
>>>> +
>>>> +		/* Do not send XDR pad bytes */
>>>> +		if (chunk_no && write_len < 4) {
>>>> +			chunk_no++;
>>>> +			break;
>>> 
>>> I'm pretty lost in this code.  Why does (chunk_no && write_len < 4) mean
>>> this is xdr padding?
>> 
>> Chunk zero is always data. Padding is always going to be
>> after the first chunk. Any chunk after chunk zero that is
>> shorter than XDR quad alignment is going to be a pad.
> 
> I don't really know what a chunk is....  Looking at the code:
> 
> 	write_len = min(xfer_len, be32_to_cpu(arg_ch->rs_length));
> 
> so I guess the assumption is just that those rs_length's are always a
> multiple of four?

The example you recently gave was a two-byte NFS READ
that crosses a page boundary.

In that case, the NFSD would pass down an xdr_buf that
has one byte in a page, one byte in another page, and
a two-byte XDR pad. The logic introduced by this
optimization would be fooled, and neither the second
byte nor the XDR pad would be written to the client.
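
Sketching that failure (layout taken from the example above):

	xdr_buf for the 2-byte READ reply:
	  1 byte at the end of page N     -> segment 0, write_len == 1
	  1 byte at the start of page N+1 -> segment 1, write_len == 1
	  2-byte XDR pad                  -> segment 2, never reached

	Segment 1 satisfies "chunk_no && write_len < 4", so the loop
	breaks before writing the second data byte or the pad.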

Unless you can think of a way to recognize an XDR pad
in the xdr_buf 100% of the time, you should drop this
patch.

As far as I know, none of the other patches in this
series depend on this optimization, so please merge
them if you can.


> --b.
> 
>> 
>> Probably too clever. Is there a better way to detect
>> the XDR pad?
>> 
>> 
>>>> +		}
>>>> +
>>>> 		chunk_off = 0;
>>>> 		while (write_len) {
>>>> 			ret = send_write(xprt, rqstp,
>> 
>> --
>> Chuck Lever
>> 
>> 
>> 

--
Chuck Lever





^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 01/11] svcrdma: Do not send XDR roundup bytes for a write chunk
@ 2015-12-23 19:59                         ` J. Bruce Fields
  0 siblings, 0 replies; 40+ messages in thread
From: J. Bruce Fields @ 2015-12-23 19:59 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Linux RDMA Mailing List, Linux NFS Mailing List

On Mon, Dec 21, 2015 at 05:11:56PM -0500, Chuck Lever wrote:
> 
> > On Dec 21, 2015, at 4:29 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > 
> > On Mon, Dec 21, 2015 at 04:15:23PM -0500, Chuck Lever wrote:
> >> 
> >>> On Dec 21, 2015, at 4:07 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> >>> 
> >>> On Mon, Dec 14, 2015 at 04:30:09PM -0500, Chuck Lever wrote:
> >>>> Minor optimization: when dealing with write chunk XDR roundup, do
> >>>> not post a Write WR for the zero bytes in the pad. Simply update
> >>>> the write segment in the RPC-over-RDMA header to reflect the extra
> >>>> pad bytes.
> >>>> 
> >>>> The Reply chunk is also a write chunk, but the server does not use
> >>>> send_write_chunks() to send the Reply chunk. That's OK in this case:
> >>>> the server Upper Layer typically marshals the Reply chunk contents
> >>>> in a single contiguous buffer, without a separate tail for the XDR
> >>>> pad.
> >>>> 
> >>>> The comments and the variable naming refer to "chunks" but what is
> >>>> really meant is "segments." The existing code sends only one
> >>>> xdr_write_chunk per RPC reply.
> >>>> 
> >>>> The fix assumes this as well. When the XDR pad in the first write
> >>>> chunk is reached, the assumption is the Write list is complete and
> >>>> send_write_chunks() returns.
> >>>> 
> >>>> That will remain a valid assumption until the server Upper Layer can
> >>>> support multiple bulk payload results per RPC.
> >>>> 
> >>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> >>>> ---
> >>>> net/sunrpc/xprtrdma/svc_rdma_sendto.c |    7 +++++++
> >>>> 1 file changed, 7 insertions(+)
> >>>> 
> >>>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> >>>> index 969a1ab..bad5eaa 100644
> >>>> --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> >>>> +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> >>>> @@ -342,6 +342,13 @@ static int send_write_chunks(struct svcxprt_rdma *xprt,
> >>>> 						arg_ch->rs_handle,
> >>>> 						arg_ch->rs_offset,
> >>>> 						write_len);
> >>>> +
> >>>> +		/* Do not send XDR pad bytes */
> >>>> +		if (chunk_no && write_len < 4) {
> >>>> +			chunk_no++;
> >>>> +			break;
> >>> 
> >>> I'm pretty lost in this code.  Why does (chunk_no && write_len < 4) mean
> >>> this is xdr padding?
> >> 
> >> Chunk zero is always data. Padding is always going to be
> >> after the first chunk. Any chunk after chunk zero that is
> >> shorter than XDR quad alignment is going to be a pad.
> > 
> > I don't really know what a chunk is....  Looking at the code:
> > 
> > 	write_len = min(xfer_len, be32_to_cpu(arg_ch->rs_length));
> > 
> > so I guess the assumption is just that those rs_length's are always a
> > multiple of four?
> 
> The example you recently gave was a two-byte NFS READ
> that crosses a page boundary.
> 
> In that case, the NFSD would pass down an xdr_buf that
> has one byte in a page, one byte in another page, and
> a two-byte XDR pad. The logic introduced by this
> optimization would be fooled, and neither the second
> byte nor the XDR pad would be written to the client.
> 
> Unless you can think of a way to recognize an XDR pad
> in the xdr_buf 100% of the time, you should drop this
> patch.

It might be best to make this explicit and mark bytes as padding somehow
while encoding.
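
For example (a hypothetical sketch only -- struct xdr_buf has no
such field today):

	/* Set by the XDR encoder when tail[0] carries nothing but
	 * roundup zeroes, so a transport can skip the pad without
	 * guessing from segment lengths.
	 */
	struct xdr_pad_hint {
		bool	tail_is_xdr_pad;
	};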

> As far as I know, none of the other patches in this
> series depend on this optimization, so please merge
> them if you can.

OK, I'll take a look at the rest, thanks.

--b.

> >> Probably too clever. Is there a better way to detect
> >> the XDR pad?
> >> 
> >> 
> >>>> +		}
> >>>> +
> >>>> 		chunk_off = 0;
> >>>> 		while (write_len) {
> >>>> 			ret = send_write(xprt, rqstp,
> >> 
> >> --
> >> Chuck Lever
> >> 
> >> 
> >> 
> 
> --
> Chuck Lever
> 
> 
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 00/11] NFS/RDMA server patches for v4.5
@ 2015-12-23 21:00         ` J. Bruce Fields
  0 siblings, 0 replies; 40+ messages in thread
From: J. Bruce Fields @ 2015-12-23 21:00 UTC (permalink / raw)
  To: Devesh Sharma; +Cc: Chuck Lever, linux-rdma, Linux NFS Mailing List

On Wed, Dec 16, 2015 at 05:40:09PM +0530, Devesh Sharma wrote:
> iozone passed on an ocrdma device.

What other testing has there been of this patchset?

Connectathon, xfstests, and pynfs make more of an effort to test corner
cases; iozone isn't much of a test of correctness.

--b.

> A link bounce fails to recover iozone
> traffic; however, that failure is not related to this patch series. I am
> in the process of finding the patch that broke it.
> 
> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 00/11] NFS/RDMA server patches for v4.5
@ 2015-12-24  9:57             ` Chuck Lever
  0 siblings, 0 replies; 40+ messages in thread
From: Chuck Lever @ 2015-12-24  9:57 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Devesh Sharma, linux-rdma, Linux NFS Mailing List

My functional test suite includes Cthon, iozone, dbench, fio,
multi-threaded builds of git and the Linux kernel, and xfstests.

This patch series passes with NFSv3, NFSv4.0, and now NFSv4.1.

--
Chuck Lever

> On Dec 23, 2015, at 21:00, J. Bruce Fields <bfields@fieldses.org> wrote:
> 
>> On Wed, Dec 16, 2015 at 05:40:09PM +0530, Devesh Sharma wrote:
>> iozone passed on an ocrdma device.
> 
> What other testing has there been of this patchset?
> 
> Connectathon, xfstests, and pynfs make more of an effort to test corner
> cases; iozone isn't much of a test of correctness.
> 
> --b.
> 
>> A link bounce fails to recover iozone
>> traffic; however, that failure is not related to this patch series. I am
>> in the process of finding the patch that broke it.
>> 
>> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, newest: 2015-12-24  9:57 UTC

Thread overview:
2015-12-14 21:30 [PATCH v4 00/11] NFS/RDMA server patches for v4.5 Chuck Lever
2015-12-14 21:30 ` [PATCH v4 01/11] svcrdma: Do not send XDR roundup bytes for a write chunk Chuck Lever
2015-12-21 21:07   ` J. Bruce Fields
2015-12-21 21:15     ` Chuck Lever
2015-12-21 21:29       ` J. Bruce Fields
2015-12-21 22:11         ` Chuck Lever
2015-12-23 19:59           ` J. Bruce Fields
2015-12-14 21:30 ` [PATCH v4 02/11] svcrdma: Clean up rdma_create_xprt() Chuck Lever
2015-12-14 21:30 ` [PATCH v4 03/11] svcrdma: Clean up process_context() Chuck Lever
2015-12-14 21:30 ` [PATCH v4 04/11] svcrdma: Improve allocation of struct svc_rdma_op_ctxt Chuck Lever
2015-12-14 21:30 ` [PATCH v4 05/11] svcrdma: Improve allocation of struct svc_rdma_req_map Chuck Lever
2015-12-14 21:30 ` [PATCH v4 06/11] svcrdma: Remove unused req_map and ctxt kmem_caches Chuck Lever
2015-12-14 21:30 ` [PATCH v4 07/11] svcrdma: Add gfp flags to svc_rdma_post_recv() Chuck Lever
2015-12-14 21:31 ` [PATCH v4 08/11] svcrdma: Remove last two __GFP_NOFAIL call sites Chuck Lever
2015-12-14 21:31 ` [PATCH v4 09/11] svcrdma: Make map_xdr non-static Chuck Lever
2015-12-14 21:31 ` [PATCH v4 10/11] svcrdma: Define maximum number of backchannel requests Chuck Lever
2015-12-14 21:31 ` [PATCH v4 11/11] svcrdma: Add class for RDMA backwards direction transport Chuck Lever
2015-12-16 12:10 ` [PATCH v4 00/11] NFS/RDMA server patches for v4.5 Devesh Sharma
2015-12-23 21:00   ` J. Bruce Fields
2015-12-24  9:57     ` Chuck Lever