* [PATCH V3 00/17] NFS/RDMA client-side patches
From: Chuck Lever @ 2014-04-30 19:29 UTC
  To: linux-nfs, linux-rdma; +Cc: Anna.Schumaker

Changes since V2:

 - Rebased on v3.15-rc3

 - "enable pad optimization" dropped. Testing showed Linux NFS/RDMA
   server does not support pad optimization yet.

 - "ALLPHYSICAL CONFIG" dropped. There is a lack of consensus on
   this one. Christoph would like ALLPHYSICAL removed, but the HPC
   community prefers keeping a performance-at-all-costs option. And,
   with most other registration modes now removed, ALLPHYSICAL is
   the mode of last resort if an adapter does not support FRMR or
   MTHCAFMR, since ALLPHYSICAL is universally supported (see the
   sketch after this list). We will
   very likely revisit this later. I'm erring on the side of less
   churn and dropping this until the community agrees on how to
   move forward.

 - Added a patch to ensure there is always a valid ->qp if RPCs
   might awaken while the transport is disconnected.

 - Added a patch to clean up an MTU settings hack for a very old
   adapter model.

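For context, here is a minimal sketch of the fallback order described
in the ALLPHYSICAL item above. The helper and names are illustrative
only, not the actual xprtrdma code:

    enum memreg_mode { FRMR, MTHCAFMR, ALLPHYSICAL };

    /* Prefer FRMR; fall back to MTHCAFMR; use ALLPHYSICAL only as
     * a last resort, since it is universally supported.
     */
    static enum memreg_mode pick_memreg(int has_frmr, int has_fmr)
    {
            if (has_frmr)
                    return FRMR;
            if (has_fmr)
                    return MTHCAFMR;
            return ALLPHYSICAL;
    }
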
Test and review the "nfs-rdma-client" branch:

 git://git.linux-nfs.org/projects/cel/cel-2.6.git

Thanks!

---

Allen Andrews (1):
      nfs-rdma: Fix for FMR leaks

Chuck Lever (15):
      xprtrdma: Remove Tavor MTU setting
      xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting
      xprtrdma: Reduce the number of hardway buffer allocations
      xprtrdma: Limit work done by completion handler
      xprtrdma: Reduce calls to ib_poll_cq() in completion handlers
      xprtrdma: Reduce lock contention in completion handlers
      xprtrdma: Split the completion queue
      xprtrdma: Make rpcrdma_ep_destroy() return void
      xprtrdma: Simplify rpcrdma_deregister_external() synopsis
      xprtrdma: mount reports "Invalid mount option" if memreg mode not supported
      xprtrdma: Fall back to MTHCAFMR when FRMR is not supported
      xprtrdma: Remove REGISTER memory registration mode
      xprtrdma: Remove MEMWINDOWS registration modes
      xprtrdma: Remove BOUNCEBUFFERS memory registration mode
      xprtrdma: RPC/RDMA must invoke xprt_wake_pending_tasks() in process context

Steve Wise (1):
      xprtrdma: mind the device's max fast register page list depth


 net/sunrpc/xprtrdma/rpc_rdma.c  |   63 +--
 net/sunrpc/xprtrdma/transport.c |   32 --
 net/sunrpc/xprtrdma/verbs.c     |  735 +++++++++++++++------------------------
 net/sunrpc/xprtrdma/xprt_rdma.h |   16 +
 4 files changed, 320 insertions(+), 526 deletions(-)

-- 
Chuck Lever

* [PATCH V3 01/17] xprtrdma: mind the device's max fast register page list depth
From: Chuck Lever @ 2014-04-30 19:29 UTC
  To: linux-nfs, linux-rdma; +Cc: Anna.Schumaker

From: Steve Wise <swise@opengridcomputing.com>

Some RDMA devices don't support a fast register page list depth of
at least RPCRDMA_MAX_DATA_SEGS.  So xprtrdma needs to chunk its fast
register regions according to the minimum of the device's maximum
supported depth and RPCRDMA_MAX_DATA_SEGS.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
---

 net/sunrpc/xprtrdma/rpc_rdma.c  |    4 ---
 net/sunrpc/xprtrdma/verbs.c     |   47 +++++++++++++++++++++++++++++----------
 net/sunrpc/xprtrdma/xprt_rdma.h |    1 +
 3 files changed, 36 insertions(+), 16 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 96ead52..400aa1b 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -248,10 +248,6 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, struct xdr_buf *target,
 	/* success. all failures return above */
 	req->rl_nchunks = nchunks;
 
-	BUG_ON(nchunks == 0);
-	BUG_ON((r_xprt->rx_ia.ri_memreg_strategy == RPCRDMA_FRMR)
-	       && (nchunks > 3));
-
 	/*
 	 * finish off header. If write, marshal discrim and nchunks.
 	 */
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 9372656..55fb09a 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -539,6 +539,11 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
 				__func__);
 			memreg = RPCRDMA_REGISTER;
 #endif
+		} else {
+			/* Mind the ia limit on FRMR page list depth */
+			ia->ri_max_frmr_depth = min_t(unsigned int,
+				RPCRDMA_MAX_DATA_SEGS,
+				devattr.max_fast_reg_page_list_len);
 		}
 		break;
 	}
@@ -659,24 +664,42 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
 	ep->rep_attr.srq = NULL;
 	ep->rep_attr.cap.max_send_wr = cdata->max_requests;
 	switch (ia->ri_memreg_strategy) {
-	case RPCRDMA_FRMR:
+	case RPCRDMA_FRMR: {
+		int depth = 7;
+
 		/* Add room for frmr register and invalidate WRs.
 		 * 1. FRMR reg WR for head
 		 * 2. FRMR invalidate WR for head
-		 * 3. FRMR reg WR for pagelist
-		 * 4. FRMR invalidate WR for pagelist
+		 * 3. N FRMR reg WRs for pagelist
+		 * 4. N FRMR invalidate WRs for pagelist
 		 * 5. FRMR reg WR for tail
 		 * 6. FRMR invalidate WR for tail
 		 * 7. The RDMA_SEND WR
 		 */
-		ep->rep_attr.cap.max_send_wr *= 7;
+
+		/* Calculate N if the device max FRMR depth is smaller than
+		 * RPCRDMA_MAX_DATA_SEGS.
+		 */
+		if (ia->ri_max_frmr_depth < RPCRDMA_MAX_DATA_SEGS) {
+			int delta = RPCRDMA_MAX_DATA_SEGS -
+				    ia->ri_max_frmr_depth;
+
+			do {
+				depth += 2; /* FRMR reg + invalidate */
+				delta -= ia->ri_max_frmr_depth;
+			} while (delta > 0);
+
+		}
+		ep->rep_attr.cap.max_send_wr *= depth;
 		if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr) {
-			cdata->max_requests = devattr.max_qp_wr / 7;
+			cdata->max_requests = devattr.max_qp_wr / depth;
 			if (!cdata->max_requests)
 				return -EINVAL;
-			ep->rep_attr.cap.max_send_wr = cdata->max_requests * 7;
+			ep->rep_attr.cap.max_send_wr = cdata->max_requests *
+						       depth;
 		}
 		break;
+	}
 	case RPCRDMA_MEMWINDOWS_ASYNC:
 	case RPCRDMA_MEMWINDOWS:
 		/* Add room for mw_binds+unbinds - overkill! */
@@ -1043,16 +1066,16 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
 	case RPCRDMA_FRMR:
 		for (i = buf->rb_max_requests * RPCRDMA_MAX_SEGS; i; i--) {
 			r->r.frmr.fr_mr = ib_alloc_fast_reg_mr(ia->ri_pd,
-							 RPCRDMA_MAX_SEGS);
+						ia->ri_max_frmr_depth);
 			if (IS_ERR(r->r.frmr.fr_mr)) {
 				rc = PTR_ERR(r->r.frmr.fr_mr);
 				dprintk("RPC:       %s: ib_alloc_fast_reg_mr"
 					" failed %i\n", __func__, rc);
 				goto out;
 			}
-			r->r.frmr.fr_pgl =
-				ib_alloc_fast_reg_page_list(ia->ri_id->device,
-							    RPCRDMA_MAX_SEGS);
+			r->r.frmr.fr_pgl = ib_alloc_fast_reg_page_list(
+						ia->ri_id->device,
+						ia->ri_max_frmr_depth);
 			if (IS_ERR(r->r.frmr.fr_pgl)) {
 				rc = PTR_ERR(r->r.frmr.fr_pgl);
 				dprintk("RPC:       %s: "
@@ -1498,8 +1521,8 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
 	seg1->mr_offset -= pageoff;	/* start of page */
 	seg1->mr_len += pageoff;
 	len = -pageoff;
-	if (*nsegs > RPCRDMA_MAX_DATA_SEGS)
-		*nsegs = RPCRDMA_MAX_DATA_SEGS;
+	if (*nsegs > ia->ri_max_frmr_depth)
+		*nsegs = ia->ri_max_frmr_depth;
 	for (page_no = i = 0; i < *nsegs;) {
 		rpcrdma_map_one(ia, seg, writing);
 		pa = seg->mr_dma;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index cc1445d..98340a3 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -66,6 +66,7 @@ struct rpcrdma_ia {
 	struct completion	ri_done;
 	int			ri_async_rc;
 	enum rpcrdma_memreg	ri_memreg_strategy;
+	unsigned int		ri_max_frmr_depth;
 };
 
 /*

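A standalone sketch of the send-queue sizing in rpcrdma_ep_create()
above. The helper below is illustrative, not part of the patch: with
S = RPCRDMA_MAX_DATA_SEGS and a device limit of d page-list entries
per FRMR, the pagelist needs ceil(S / d) registrations, and every
registration beyond the first adds a reg + invalidate WR pair to the
base depth of 7.

    #include <stdio.h>

    static int frmr_send_wr_depth(int max_data_segs, int max_frmr_depth)
    {
            int depth = 7;  /* head reg/inv, pagelist reg/inv,
                             * tail reg/inv, plus the RDMA_SEND */

            if (max_frmr_depth < max_data_segs) {
                    int delta = max_data_segs - max_frmr_depth;

                    do {
                            depth += 2;  /* extra pagelist reg + invalidate */
                            delta -= max_frmr_depth;
                    } while (delta > 0);
            }
            return depth;
    }

    int main(void)
    {
            /* e.g. 64 segments on a device limited to 16 per FRMR:
             * ceil(64 / 16) = 4 pagelist registrations, so
             * depth = 7 + 2 * (4 - 1) = 13 WRs per request
             */
            printf("depth = %d\n", frmr_send_wr_depth(64, 16));
            return 0;
    }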

* [PATCH V3 02/17] nfs-rdma: Fix for FMR leaks
From: Chuck Lever @ 2014-04-30 19:29 UTC
  To: linux-nfs, linux-rdma; +Cc: Anna.Schumaker

From: Allen Andrews <allen.andrews@emulex.com>

Two memory region leaks were found during testing:

1. rpcrdma_buffer_create: While allocating RPCRDMA_FRMRs,
ib_alloc_fast_reg_mr is called, followed by
ib_alloc_fast_reg_page_list.  If ib_alloc_fast_reg_page_list returns
an error, the routine bails out and drops the FRMR it had just
allocated with ib_alloc_fast_reg_mr, leaking that memory region.
Added code to deregister the last FRMR when
ib_alloc_fast_reg_page_list fails.

2. rpcrdma_buffer_destroy: While cleaning up, the routine frees the
MRs on the rb_mws list only if rb_send_bufs is present.  However, if
one of the MR allocations in rpcrdma_buffer_create fails after some
MRs have already been added to rb_mws, the routine never creates any
rb_send_bufs; it jumps straight to rpcrdma_buffer_destroy, which then
never frees the MRs on the rb_mws list because rb_send_bufs was never
created.  This leaks every MR placed on the rb_mws list before the
failed allocation.

Issue (2) was seen during testing.  Our adapter had a finite number
of MRs available, and we created enough connections that the Nth NFS
connection request hit an MR allocation failure.  After the kernel
cleaned up the resources it had allocated for the Nth connection, we
noticed that FMRs had been leaked due to the coding error described
above.

Issue (1) was found during a code review while debugging issue (2).

Signed-off-by: Allen Andrews <allen.andrews@emulex.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
---

 net/sunrpc/xprtrdma/verbs.c |   73 ++++++++++++++++++++++---------------------
 1 files changed, 38 insertions(+), 35 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 55fb09a..8f9704e 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1081,6 +1081,8 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
 				dprintk("RPC:       %s: "
 					"ib_alloc_fast_reg_page_list "
 					"failed %i\n", __func__, rc);
+
+				ib_dereg_mr(r->r.frmr.fr_mr);
 				goto out;
 			}
 			list_add(&r->mw_list, &buf->rb_mws);
@@ -1217,41 +1219,6 @@ rpcrdma_buffer_destroy(struct rpcrdma_buffer *buf)
 			kfree(buf->rb_recv_bufs[i]);
 		}
 		if (buf->rb_send_bufs && buf->rb_send_bufs[i]) {
-			while (!list_empty(&buf->rb_mws)) {
-				r = list_entry(buf->rb_mws.next,
-					struct rpcrdma_mw, mw_list);
-				list_del(&r->mw_list);
-				switch (ia->ri_memreg_strategy) {
-				case RPCRDMA_FRMR:
-					rc = ib_dereg_mr(r->r.frmr.fr_mr);
-					if (rc)
-						dprintk("RPC:       %s:"
-							" ib_dereg_mr"
-							" failed %i\n",
-							__func__, rc);
-					ib_free_fast_reg_page_list(r->r.frmr.fr_pgl);
-					break;
-				case RPCRDMA_MTHCAFMR:
-					rc = ib_dealloc_fmr(r->r.fmr);
-					if (rc)
-						dprintk("RPC:       %s:"
-							" ib_dealloc_fmr"
-							" failed %i\n",
-							__func__, rc);
-					break;
-				case RPCRDMA_MEMWINDOWS_ASYNC:
-				case RPCRDMA_MEMWINDOWS:
-					rc = ib_dealloc_mw(r->r.mw);
-					if (rc)
-						dprintk("RPC:       %s:"
-							" ib_dealloc_mw"
-							" failed %i\n",
-							__func__, rc);
-					break;
-				default:
-					break;
-				}
-			}
 			rpcrdma_deregister_internal(ia,
 					buf->rb_send_bufs[i]->rl_handle,
 					&buf->rb_send_bufs[i]->rl_iov);
@@ -1259,6 +1226,42 @@ rpcrdma_buffer_destroy(struct rpcrdma_buffer *buf)
 		}
 	}
 
+	while (!list_empty(&buf->rb_mws)) {
+		r = list_entry(buf->rb_mws.next,
+			struct rpcrdma_mw, mw_list);
+		list_del(&r->mw_list);
+		switch (ia->ri_memreg_strategy) {
+		case RPCRDMA_FRMR:
+			rc = ib_dereg_mr(r->r.frmr.fr_mr);
+			if (rc)
+				dprintk("RPC:       %s:"
+					" ib_dereg_mr"
+					" failed %i\n",
+					__func__, rc);
+			ib_free_fast_reg_page_list(r->r.frmr.fr_pgl);
+			break;
+		case RPCRDMA_MTHCAFMR:
+			rc = ib_dealloc_fmr(r->r.fmr);
+			if (rc)
+				dprintk("RPC:       %s:"
+					" ib_dealloc_fmr"
+					" failed %i\n",
+					__func__, rc);
+			break;
+		case RPCRDMA_MEMWINDOWS_ASYNC:
+		case RPCRDMA_MEMWINDOWS:
+			rc = ib_dealloc_mw(r->r.mw);
+			if (rc)
+				dprintk("RPC:       %s:"
+					" ib_dealloc_mw"
+					" failed %i\n",
+					__func__, rc);
+			break;
+		default:
+			break;
+		}
+	}
+
 	kfree(buf->rb_pool);
 }
 

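A compact userspace analogue of the destroy-path fix above; the
structures and names are hypothetical stand-ins, not the xprtrdma
types. The point is that the MW list must be drained unconditionally,
because buffer creation can fail after MWs are queued but before any
send buffers exist:

    #include <stdlib.h>

    struct mw {                     /* stand-in for struct rpcrdma_mw */
            struct mw *next;
    };

    struct buffer {
            struct mw *mws;         /* built first, like rb_mws */
            void **send_bufs;       /* built second; NULL if create
                                     * failed early */
    };

    static void buffer_destroy(struct buffer *buf)
    {
            if (buf->send_bufs) {
                    /* free per-request send buffers here ... */
            }

            /* Drain the MW list unconditionally.  Before the fix,
             * this loop ran only inside the send_bufs branch, so a
             * create failure with send_bufs == NULL leaked every
             * queued MW.
             */
            while (buf->mws) {
                    struct mw *r = buf->mws;

                    buf->mws = r->next;
                    free(r);        /* stand-in for ib_dereg_mr() etc. */
            }
    }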

* [PATCH V3 03/17] xprtrdma: RPC/RDMA must invoke xprt_wake_pending_tasks() in process context
From: Chuck Lever @ 2014-04-30 19:29 UTC
  To: linux-nfs, linux-rdma; +Cc: Anna.Schumaker

An IB provider can invoke rpcrdma_conn_func() in an IRQ context, so
rpcrdma_conn_func() cannot be allowed to invoke generic RPC
functions like xprt_wake_pending_tasks() directly.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
---

 net/sunrpc/xprtrdma/rpc_rdma.c  |   22 +++++++++++++++-------
 net/sunrpc/xprtrdma/verbs.c     |    3 +++
 net/sunrpc/xprtrdma/xprt_rdma.h |    3 +++
 3 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 400aa1b..c296468 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -676,15 +676,11 @@ rpcrdma_inline_fixup(struct rpc_rqst *rqst, char *srcp, int copy_len, int pad)
 	rqst->rq_private_buf = rqst->rq_rcv_buf;
 }
 
-/*
- * This function is called when an async event is posted to
- * the connection which changes the connection state. All it
- * does at this point is mark the connection up/down, the rpc
- * timers do the rest.
- */
 void
-rpcrdma_conn_func(struct rpcrdma_ep *ep)
+rpcrdma_connect_worker(struct work_struct *work)
 {
+	struct rpcrdma_ep *ep =
+		container_of(work, struct rpcrdma_ep, rep_connect_worker.work);
 	struct rpc_xprt *xprt = ep->rep_xprt;
 
 	spin_lock_bh(&xprt->transport_lock);
@@ -701,6 +697,18 @@ rpcrdma_conn_func(struct rpcrdma_ep *ep)
 }
 
 /*
+ * This function is called when an async event is posted to
+ * the connection which changes the connection state. All it
+ * does at this point is mark the connection up/down, the rpc
+ * timers do the rest.
+ */
+void
+rpcrdma_conn_func(struct rpcrdma_ep *ep)
+{
+	schedule_delayed_work(&ep->rep_connect_worker, 0);
+}
+
+/*
  * This function is called when memory window unbind which we are waiting
  * for completes. Just use rr_func (zeroed by upcall) to signal completion.
  */
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 8f9704e..9cb88f3 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -742,6 +742,7 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
 	INIT_CQCOUNT(ep);
 	ep->rep_ia = ia;
 	init_waitqueue_head(&ep->rep_connect_wait);
+	INIT_DELAYED_WORK(&ep->rep_connect_worker, rpcrdma_connect_worker);
 
 	/*
 	 * Create a single cq for receive dto and mw_bind (only ever
@@ -817,6 +818,8 @@ rpcrdma_ep_destroy(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
 	dprintk("RPC:       %s: entering, connected is %d\n",
 		__func__, ep->rep_connected);
 
+	cancel_delayed_work_sync(&ep->rep_connect_worker);
+
 	if (ia->ri_id->qp) {
 		rc = rpcrdma_ep_disconnect(ep, ia);
 		if (rc)
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 98340a3..c620d13 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -43,6 +43,7 @@
 #include <linux/wait.h> 		/* wait_queue_head_t, etc */
 #include <linux/spinlock.h> 		/* spinlock_t, etc */
 #include <linux/atomic.h>			/* atomic_t, etc */
+#include <linux/workqueue.h>		/* struct work_struct */
 
 #include <rdma/rdma_cm.h>		/* RDMA connection api */
 #include <rdma/ib_verbs.h>		/* RDMA verbs api */
@@ -87,6 +88,7 @@ struct rpcrdma_ep {
 	struct rpc_xprt		*rep_xprt;	/* for rep_func */
 	struct rdma_conn_param	rep_remote_cma;
 	struct sockaddr_storage	rep_remote_addr;
+	struct delayed_work	rep_connect_worker;
 };
 
 #define INIT_CQCOUNT(ep) atomic_set(&(ep)->rep_cqcount, (ep)->rep_cqinit)
@@ -336,6 +338,7 @@ int rpcrdma_deregister_external(struct rpcrdma_mr_seg *,
 /*
  * RPC/RDMA connection management calls - xprtrdma/rpc_rdma.c
  */
+void rpcrdma_connect_worker(struct work_struct *);
 void rpcrdma_conn_func(struct rpcrdma_ep *);
 void rpcrdma_reply_handler(struct rpcrdma_rep *);
 

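A minimal sketch of the deferral pattern this patch introduces,
assuming a standalone kernel-module context; the demo_* names are
illustrative, not the actual xprtrdma symbols:

    #include <linux/kernel.h>
    #include <linux/workqueue.h>

    struct demo_ep {
            struct delayed_work connect_worker;
    };

    /* Runs in process context, where waking RPC tasks and taking
     * BH locks (spin_lock_bh) are safe.
     */
    static void demo_connect_worker(struct work_struct *work)
    {
            struct demo_ep *ep = container_of(work, struct demo_ep,
                                              connect_worker.work);

            /* ... mark the connection up/down, wake pending tasks ... */
            (void)ep;
    }

    static void demo_ep_init(struct demo_ep *ep)
    {
            INIT_DELAYED_WORK(&ep->connect_worker, demo_connect_worker);
    }

    /* Safe to call from the provider's IRQ-context upcall: it only
     * schedules the worker and returns.
     */
    static void demo_conn_func(struct demo_ep *ep)
    {
            schedule_delayed_work(&ep->connect_worker, 0);
    }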

* [PATCH V3 04/17] xprtrdma: Remove BOUNCEBUFFERS memory registration mode
From: Chuck Lever @ 2014-04-30 19:30 UTC
  To: linux-nfs, linux-rdma; +Cc: Anna.Schumaker

Clean up: This memory registration mode is slow and was never
meant for use in production environments. Remove it to reduce
implementation complexity.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
---

 net/sunrpc/xprtrdma/rpc_rdma.c  |    8 --------
 net/sunrpc/xprtrdma/transport.c |   13 -------------
 net/sunrpc/xprtrdma/verbs.c     |    5 +----
 3 files changed, 1 insertions(+), 25 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index c296468..b963e50 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -439,14 +439,6 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
 		wtype = rpcrdma_noch;
 	BUG_ON(rtype != rpcrdma_noch && wtype != rpcrdma_noch);
 
-	if (r_xprt->rx_ia.ri_memreg_strategy == RPCRDMA_BOUNCEBUFFERS &&
-	    (rtype != rpcrdma_noch || wtype != rpcrdma_noch)) {
-		/* forced to "pure inline"? */
-		dprintk("RPC:       %s: too much data (%d/%d) for inline\n",
-			__func__, rqst->rq_rcv_buf.len, rqst->rq_snd_buf.len);
-		return -1;
-	}
-
 	hdrlen = 28; /*sizeof *headerp;*/
 	padlen = 0;
 
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 1eb9c46..8c5035a 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -503,18 +503,6 @@ xprt_rdma_allocate(struct rpc_task *task, size_t size)
 		 * If the allocation or registration fails, the RPC framework
 		 * will (doggedly) retry.
 		 */
-		if (rpcx_to_rdmax(xprt)->rx_ia.ri_memreg_strategy ==
-				RPCRDMA_BOUNCEBUFFERS) {
-			/* forced to "pure inline" */
-			dprintk("RPC:       %s: too much data (%zd) for inline "
-					"(r/w max %d/%d)\n", __func__, size,
-					rpcx_to_rdmad(xprt).inline_rsize,
-					rpcx_to_rdmad(xprt).inline_wsize);
-			size = req->rl_size;
-			rpc_exit(task, -EIO);		/* fail the operation */
-			rpcx_to_rdmax(xprt)->rx_stats.failed_marshal_count++;
-			goto out;
-		}
 		if (task->tk_flags & RPC_TASK_SWAPPER)
 			nreq = kmalloc(sizeof *req + size, GFP_ATOMIC);
 		else
@@ -543,7 +531,6 @@ xprt_rdma_allocate(struct rpc_task *task, size_t size)
 		req = nreq;
 	}
 	dprintk("RPC:       %s: size %zd, request 0x%p\n", __func__, size, req);
-out:
 	req->rl_connect_cookie = 0;	/* our reserved value */
 	return req->rl_xdr_buf;
 
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 9cb88f3..4a4e4ea 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -557,7 +557,6 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
 	 * adapter.
 	 */
 	switch (memreg) {
-	case RPCRDMA_BOUNCEBUFFERS:
 	case RPCRDMA_REGISTER:
 	case RPCRDMA_FRMR:
 		break;
@@ -778,9 +777,7 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
 
 	/* Client offers RDMA Read but does not initiate */
 	ep->rep_remote_cma.initiator_depth = 0;
-	if (ia->ri_memreg_strategy == RPCRDMA_BOUNCEBUFFERS)
-		ep->rep_remote_cma.responder_resources = 0;
-	else if (devattr.max_qp_rd_atom > 32)	/* arbitrary but <= 255 */
+	if (devattr.max_qp_rd_atom > 32)	/* arbitrary but <= 255 */
 		ep->rep_remote_cma.responder_resources = 32;
 	else
 		ep->rep_remote_cma.responder_resources = devattr.max_qp_rd_atom;



* [PATCH V3 05/17] xprtrdma: Remove MEMWINDOWS registration modes
From: Chuck Lever @ 2014-04-30 19:30 UTC
  To: linux-nfs, linux-rdma; +Cc: Anna.Schumaker

The MEMWINDOWS and MEMWINDOWS_ASYNC memory registration modes were
intended as stop-gap modes before the introduction of FRMR. They
are now considered obsolete.

MEMWINDOWS_ASYNC is also considered unsafe because it can leave
client memory registered and exposed for an indeterminate time after
each I/O.

At this point, the MEMWINDOWS modes add needless complexity, so
remove them.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
---

 net/sunrpc/xprtrdma/rpc_rdma.c  |   34 --------
 net/sunrpc/xprtrdma/transport.c |    9 --
 net/sunrpc/xprtrdma/verbs.c     |  165 +--------------------------------------
 net/sunrpc/xprtrdma/xprt_rdma.h |    2 
 4 files changed, 7 insertions(+), 203 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index b963e50..e4af26a 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -202,7 +202,6 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, struct xdr_buf *target,
 		return 0;
 
 	do {
-		/* bind/register the memory, then build chunk from result. */
 		int n = rpcrdma_register_external(seg, nsegs,
 						cur_wchunk != NULL, r_xprt);
 		if (n <= 0)
@@ -701,16 +700,6 @@ rpcrdma_conn_func(struct rpcrdma_ep *ep)
 }
 
 /*
- * This function is called when memory window unbind which we are waiting
- * for completes. Just use rr_func (zeroed by upcall) to signal completion.
- */
-static void
-rpcrdma_unbind_func(struct rpcrdma_rep *rep)
-{
-	wake_up(&rep->rr_unbind);
-}
-
-/*
  * Called as a tasklet to do req/reply match and complete a request
  * Errors must result in the RPC task either being awakened, or
  * allowed to timeout, to discover the errors at that time.
@@ -724,7 +713,7 @@ rpcrdma_reply_handler(struct rpcrdma_rep *rep)
 	struct rpc_xprt *xprt = rep->rr_xprt;
 	struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
 	__be32 *iptr;
-	int i, rdmalen, status;
+	int rdmalen, status;
 
 	/* Check status. If bad, signal disconnect and return rep to pool */
 	if (rep->rr_len == ~0U) {
@@ -853,27 +842,6 @@ badheader:
 		break;
 	}
 
-	/* If using mw bind, start the deregister process now. */
-	/* (Note: if mr_free(), cannot perform it here, in tasklet context) */
-	if (req->rl_nchunks) switch (r_xprt->rx_ia.ri_memreg_strategy) {
-	case RPCRDMA_MEMWINDOWS:
-		for (i = 0; req->rl_nchunks-- > 1;)
-			i += rpcrdma_deregister_external(
-				&req->rl_segments[i], r_xprt, NULL);
-		/* Optionally wait (not here) for unbinds to complete */
-		rep->rr_func = rpcrdma_unbind_func;
-		(void) rpcrdma_deregister_external(&req->rl_segments[i],
-						   r_xprt, rep);
-		break;
-	case RPCRDMA_MEMWINDOWS_ASYNC:
-		for (i = 0; req->rl_nchunks--;)
-			i += rpcrdma_deregister_external(&req->rl_segments[i],
-							 r_xprt, NULL);
-		break;
-	default:
-		break;
-	}
-
 	dprintk("RPC:       %s: xprt_complete_rqst(0x%p, 0x%p, %d)\n",
 			__func__, xprt, rqst, status);
 	xprt_complete_rqst(rqst->rq_task, status);
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 8c5035a..c23b0c1 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -566,9 +566,7 @@ xprt_rdma_free(void *buffer)
 		__func__, rep, (rep && rep->rr_func) ? " (with waiter)" : "");
 
 	/*
-	 * Finish the deregistration. When using mw bind, this was
-	 * begun in rpcrdma_reply_handler(). In all other modes, we
-	 * do it here, in thread context. The process is considered
+	 * Finish the deregistration.  The process is considered
 	 * complete when the rr_func vector becomes NULL - this
 	 * was put in place during rpcrdma_reply_handler() - the wait
 	 * call below will not block if the dereg is "done". If
@@ -580,11 +578,6 @@ xprt_rdma_free(void *buffer)
 			&req->rl_segments[i], r_xprt, NULL);
 	}
 
-	if (rep && wait_event_interruptible(rep->rr_unbind, !rep->rr_func)) {
-		rep->rr_func = NULL;	/* abandon the callback */
-		req->rl_reply = NULL;
-	}
-
 	if (req->rl_iov.length == 0) {	/* see allocate above */
 		struct rpcrdma_req *oreq = (struct rpcrdma_req *)req->rl_buffer;
 		oreq->rl_reply = req->rl_reply;
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 4a4e4ea..304c7ad 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -152,7 +152,7 @@ void rpcrdma_event_process(struct ib_wc *wc)
 	dprintk("RPC:       %s: event rep %p status %X opcode %X length %u\n",
 		__func__, rep, wc->status, wc->opcode, wc->byte_len);
 
-	if (!rep) /* send or bind completion that we don't care about */
+	if (!rep) /* send completion that we don't care about */
 		return;
 
 	if (IB_WC_SUCCESS != wc->status) {
@@ -197,8 +197,6 @@ void rpcrdma_event_process(struct ib_wc *wc)
 			}
 			atomic_set(&rep->rr_buffer->rb_credits, credits);
 		}
-		/* fall through */
-	case IB_WC_BIND_MW:
 		rpcrdma_schedule_tasklet(rep);
 		break;
 	default:
@@ -233,7 +231,7 @@ rpcrdma_cq_poll(struct ib_cq *cq)
 /*
  * rpcrdma_cq_event_upcall
  *
- * This upcall handles recv, send, bind and unbind events.
+ * This upcall handles recv and send events.
  * It is reentrant but processes single events in order to maintain
  * ordering of receives to keep server credits.
  *
@@ -494,16 +492,6 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
 	}
 
 	switch (memreg) {
-	case RPCRDMA_MEMWINDOWS:
-	case RPCRDMA_MEMWINDOWS_ASYNC:
-		if (!(devattr.device_cap_flags & IB_DEVICE_MEM_WINDOW)) {
-			dprintk("RPC:       %s: MEMWINDOWS registration "
-				"specified but not supported by adapter, "
-				"using slower RPCRDMA_REGISTER\n",
-				__func__);
-			memreg = RPCRDMA_REGISTER;
-		}
-		break;
 	case RPCRDMA_MTHCAFMR:
 		if (!ia->ri_id->device->alloc_fmr) {
 #if RPCRDMA_PERSISTENT_REGISTRATION
@@ -567,16 +555,13 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
 				IB_ACCESS_REMOTE_READ;
 		goto register_setup;
 #endif
-	case RPCRDMA_MEMWINDOWS_ASYNC:
-	case RPCRDMA_MEMWINDOWS:
-		mem_priv = IB_ACCESS_LOCAL_WRITE |
-				IB_ACCESS_MW_BIND;
-		goto register_setup;
 	case RPCRDMA_MTHCAFMR:
 		if (ia->ri_have_dma_lkey)
 			break;
 		mem_priv = IB_ACCESS_LOCAL_WRITE;
+#if RPCRDMA_PERSISTENT_REGISTRATION
 	register_setup:
+#endif
 		ia->ri_bind_mem = ib_get_dma_mr(ia->ri_pd, mem_priv);
 		if (IS_ERR(ia->ri_bind_mem)) {
 			printk(KERN_ALERT "%s: ib_get_dma_mr for "
@@ -699,14 +684,6 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
 		}
 		break;
 	}
-	case RPCRDMA_MEMWINDOWS_ASYNC:
-	case RPCRDMA_MEMWINDOWS:
-		/* Add room for mw_binds+unbinds - overkill! */
-		ep->rep_attr.cap.max_send_wr++;
-		ep->rep_attr.cap.max_send_wr *= (2 * RPCRDMA_MAX_SEGS);
-		if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr)
-			return -EINVAL;
-		break;
 	default:
 		break;
 	}
@@ -728,14 +705,6 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
 
 	/* set trigger for requesting send completion */
 	ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 /*  - 1*/;
-	switch (ia->ri_memreg_strategy) {
-	case RPCRDMA_MEMWINDOWS_ASYNC:
-	case RPCRDMA_MEMWINDOWS:
-		ep->rep_cqinit -= RPCRDMA_MAX_SEGS;
-		break;
-	default:
-		break;
-	}
 	if (ep->rep_cqinit <= 2)
 		ep->rep_cqinit = 0;
 	INIT_CQCOUNT(ep);
@@ -743,11 +712,6 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
 	init_waitqueue_head(&ep->rep_connect_wait);
 	INIT_DELAYED_WORK(&ep->rep_connect_worker, rpcrdma_connect_worker);
 
-	/*
-	 * Create a single cq for receive dto and mw_bind (only ever
-	 * care about unbind, really). Send completions are suppressed.
-	 * Use single threaded tasklet upcalls to maintain ordering.
-	 */
 	ep->rep_cq = ib_create_cq(ia->ri_id->device, rpcrdma_cq_event_upcall,
 				  rpcrdma_cq_async_error_upcall, NULL,
 				  ep->rep_attr.cap.max_recv_wr +
@@ -1020,11 +984,6 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
 		len += (buf->rb_max_requests + 1) * RPCRDMA_MAX_SEGS *
 				sizeof(struct rpcrdma_mw);
 		break;
-	case RPCRDMA_MEMWINDOWS_ASYNC:
-	case RPCRDMA_MEMWINDOWS:
-		len += (buf->rb_max_requests + 1) * RPCRDMA_MAX_SEGS *
-				sizeof(struct rpcrdma_mw);
-		break;
 	default:
 		break;
 	}
@@ -1055,11 +1014,6 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
 	}
 	p += cdata->padding;
 
-	/*
-	 * Allocate the fmr's, or mw's for mw_bind chunk registration.
-	 * We "cycle" the mw's in order to minimize rkey reuse,
-	 * and also reduce unbind-to-bind collision.
-	 */
 	INIT_LIST_HEAD(&buf->rb_mws);
 	r = (struct rpcrdma_mw *)p;
 	switch (ia->ri_memreg_strategy) {
@@ -1107,21 +1061,6 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
 			++r;
 		}
 		break;
-	case RPCRDMA_MEMWINDOWS_ASYNC:
-	case RPCRDMA_MEMWINDOWS:
-		/* Allocate one extra request's worth, for full cycling */
-		for (i = (buf->rb_max_requests+1) * RPCRDMA_MAX_SEGS; i; i--) {
-			r->r.mw = ib_alloc_mw(ia->ri_pd, IB_MW_TYPE_1);
-			if (IS_ERR(r->r.mw)) {
-				rc = PTR_ERR(r->r.mw);
-				dprintk("RPC:       %s: ib_alloc_mw"
-					" failed %i\n", __func__, rc);
-				goto out;
-			}
-			list_add(&r->mw_list, &buf->rb_mws);
-			++r;
-		}
-		break;
 	default:
 		break;
 	}
@@ -1170,7 +1109,6 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
 		memset(rep, 0, sizeof(struct rpcrdma_rep));
 		buf->rb_recv_bufs[i] = rep;
 		buf->rb_recv_bufs[i]->rr_buffer = buf;
-		init_waitqueue_head(&rep->rr_unbind);
 
 		rc = rpcrdma_register_internal(ia, rep->rr_base,
 				len - offsetof(struct rpcrdma_rep, rr_base),
@@ -1204,7 +1142,6 @@ rpcrdma_buffer_destroy(struct rpcrdma_buffer *buf)
 
 	/* clean up in reverse order from create
 	 *   1.  recv mr memory (mr free, then kfree)
-	 *   1a. bind mw memory
 	 *   2.  send mr memory (mr free, then kfree)
 	 *   3.  padding (if any) [moved to rpcrdma_ep_destroy]
 	 *   4.  arrays
@@ -1248,15 +1185,6 @@ rpcrdma_buffer_destroy(struct rpcrdma_buffer *buf)
 					" failed %i\n",
 					__func__, rc);
 			break;
-		case RPCRDMA_MEMWINDOWS_ASYNC:
-		case RPCRDMA_MEMWINDOWS:
-			rc = ib_dealloc_mw(r->r.mw);
-			if (rc)
-				dprintk("RPC:       %s:"
-					" ib_dealloc_mw"
-					" failed %i\n",
-					__func__, rc);
-			break;
 		default:
 			break;
 		}
@@ -1331,15 +1259,12 @@ rpcrdma_buffer_put(struct rpcrdma_req *req)
 	req->rl_niovs = 0;
 	if (req->rl_reply) {
 		buffers->rb_recv_bufs[--buffers->rb_recv_index] = req->rl_reply;
-		init_waitqueue_head(&req->rl_reply->rr_unbind);
 		req->rl_reply->rr_func = NULL;
 		req->rl_reply = NULL;
 	}
 	switch (ia->ri_memreg_strategy) {
 	case RPCRDMA_FRMR:
 	case RPCRDMA_MTHCAFMR:
-	case RPCRDMA_MEMWINDOWS_ASYNC:
-	case RPCRDMA_MEMWINDOWS:
 		/*
 		 * Cycle mw's back in reverse order, and "spin" them.
 		 * This delays and scrambles reuse as much as possible.
@@ -1384,8 +1309,7 @@ rpcrdma_recv_buffer_get(struct rpcrdma_req *req)
 
 /*
  * Put reply buffers back into pool when not attached to
- * request. This happens in error conditions, and when
- * aborting unbinds. Pre-decrement counter/array index.
+ * request. This happens in error conditions.
  */
 void
 rpcrdma_recv_buffer_put(struct rpcrdma_rep *rep)
@@ -1688,74 +1612,6 @@ rpcrdma_deregister_fmr_external(struct rpcrdma_mr_seg *seg,
 }
 
 static int
-rpcrdma_register_memwin_external(struct rpcrdma_mr_seg *seg,
-			int *nsegs, int writing, struct rpcrdma_ia *ia,
-			struct rpcrdma_xprt *r_xprt)
-{
-	int mem_priv = (writing ? IB_ACCESS_REMOTE_WRITE :
-				  IB_ACCESS_REMOTE_READ);
-	struct ib_mw_bind param;
-	int rc;
-
-	*nsegs = 1;
-	rpcrdma_map_one(ia, seg, writing);
-	param.bind_info.mr = ia->ri_bind_mem;
-	param.wr_id = 0ULL;	/* no send cookie */
-	param.bind_info.addr = seg->mr_dma;
-	param.bind_info.length = seg->mr_len;
-	param.send_flags = 0;
-	param.bind_info.mw_access_flags = mem_priv;
-
-	DECR_CQCOUNT(&r_xprt->rx_ep);
-	rc = ib_bind_mw(ia->ri_id->qp, seg->mr_chunk.rl_mw->r.mw, &param);
-	if (rc) {
-		dprintk("RPC:       %s: failed ib_bind_mw "
-			"%u@0x%llx status %i\n",
-			__func__, seg->mr_len,
-			(unsigned long long)seg->mr_dma, rc);
-		rpcrdma_unmap_one(ia, seg);
-	} else {
-		seg->mr_rkey = seg->mr_chunk.rl_mw->r.mw->rkey;
-		seg->mr_base = param.bind_info.addr;
-		seg->mr_nsegs = 1;
-	}
-	return rc;
-}
-
-static int
-rpcrdma_deregister_memwin_external(struct rpcrdma_mr_seg *seg,
-			struct rpcrdma_ia *ia,
-			struct rpcrdma_xprt *r_xprt, void **r)
-{
-	struct ib_mw_bind param;
-	LIST_HEAD(l);
-	int rc;
-
-	BUG_ON(seg->mr_nsegs != 1);
-	param.bind_info.mr = ia->ri_bind_mem;
-	param.bind_info.addr = 0ULL;	/* unbind */
-	param.bind_info.length = 0;
-	param.bind_info.mw_access_flags = 0;
-	if (*r) {
-		param.wr_id = (u64) (unsigned long) *r;
-		param.send_flags = IB_SEND_SIGNALED;
-		INIT_CQCOUNT(&r_xprt->rx_ep);
-	} else {
-		param.wr_id = 0ULL;
-		param.send_flags = 0;
-		DECR_CQCOUNT(&r_xprt->rx_ep);
-	}
-	rc = ib_bind_mw(ia->ri_id->qp, seg->mr_chunk.rl_mw->r.mw, &param);
-	rpcrdma_unmap_one(ia, seg);
-	if (rc)
-		dprintk("RPC:       %s: failed ib_(un)bind_mw,"
-			" status %i\n", __func__, rc);
-	else
-		*r = NULL;	/* will upcall on completion */
-	return rc;
-}
-
-static int
 rpcrdma_register_default_external(struct rpcrdma_mr_seg *seg,
 			int *nsegs, int writing, struct rpcrdma_ia *ia)
 {
@@ -1845,12 +1701,6 @@ rpcrdma_register_external(struct rpcrdma_mr_seg *seg,
 		rc = rpcrdma_register_fmr_external(seg, &nsegs, writing, ia);
 		break;
 
-	/* Registration using memory windows */
-	case RPCRDMA_MEMWINDOWS_ASYNC:
-	case RPCRDMA_MEMWINDOWS:
-		rc = rpcrdma_register_memwin_external(seg, &nsegs, writing, ia, r_xprt);
-		break;
-
 	/* Default registration each time */
 	default:
 		rc = rpcrdma_register_default_external(seg, &nsegs, writing, ia);
@@ -1887,11 +1737,6 @@ rpcrdma_deregister_external(struct rpcrdma_mr_seg *seg,
 		rc = rpcrdma_deregister_fmr_external(seg, ia);
 		break;
 
-	case RPCRDMA_MEMWINDOWS_ASYNC:
-	case RPCRDMA_MEMWINDOWS:
-		rc = rpcrdma_deregister_memwin_external(seg, ia, r_xprt, &r);
-		break;
-
 	default:
 		rc = rpcrdma_deregister_default_external(seg, ia);
 		break;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index c620d13..bf08ee0 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -127,7 +127,6 @@ struct rpcrdma_rep {
 	struct rpc_xprt	*rr_xprt;	/* needed for request/reply matching */
 	void (*rr_func)(struct rpcrdma_rep *);/* called by tasklet in softint */
 	struct list_head rr_list;	/* tasklet list */
-	wait_queue_head_t rr_unbind;	/* optional unbind wait */
 	struct ib_sge	rr_iov;		/* for posting */
 	struct ib_mr	*rr_handle;	/* handle for mem in rr_iov */
 	char	rr_base[MAX_RPCRDMAHDR]; /* minimal inline receive buffer */
@@ -162,7 +161,6 @@ struct rpcrdma_mr_seg {		/* chunk descriptors */
 		struct ib_mr	*rl_mr;		/* if registered directly */
 		struct rpcrdma_mw {		/* if registered from region */
 			union {
-				struct ib_mw	*mw;
 				struct ib_fmr	*fmr;
 				struct {
 					struct ib_fast_reg_page_list *fr_pgl;
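
With the ib_mw pointer gone, the rl_mw union carries state for only
the two remaining strategies. A rough sketch of the resulting
structure, reconstructed from the hunks in this series (not the
verbatim header; the rest of the FRMR state, such as the ib_mr and
the valid/invalid flag, is abbreviated):

	struct rpcrdma_mw {			/* if registered from region */
		union {
			struct ib_fmr	*fmr;	/* MTHCAFMR */
			struct {
				struct ib_fast_reg_page_list *fr_pgl;
				/* remaining FRMR fields omitted */
			} frmr;			/* FRMR */
		} r;
		struct list_head mw_list;	/* chained on buf->rb_mws */
	};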

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH V3 06/17] xprtrdma: Remove REGISTER memory registration mode
  2014-04-30 19:29 ` Chuck Lever
@ 2014-04-30 19:30     ` Chuck Lever
  -1 siblings, 0 replies; 60+ messages in thread
From: Chuck Lever @ 2014-04-30 19:30 UTC (permalink / raw)
  To: linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Anna.Schumaker-HgOvQuBEEgTQT0dZR+AlfA

All kernel RDMA providers except amso1100 support either MTHCAFMR
or FRMR, both of which are faster than REGISTER.  amso1100 can
continue to use ALLPHYSICAL.

The only other ULP consumer in the kernel that uses the reg_phys_mr
verb is Lustre.
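
With the catch-all removed, an unrecognized registration strategy now
fails outright instead of silently falling back to ib_reg_phys_mr().
A condensed sketch (not the verbatim function) of what the dispatch
in rpcrdma_register_external() reduces to:

	int rc = 0;

	switch (ia->ri_memreg_strategy) {
	#if RPCRDMA_PERSISTENT_REGISTRATION
	case RPCRDMA_ALLPHYSICAL:
		rpcrdma_map_one(ia, seg, writing);	/* persistent MR */
		break;
	#endif
	case RPCRDMA_FRMR:
		/* FRMR path, unchanged by this patch */
		break;
	case RPCRDMA_MTHCAFMR:
		rc = rpcrdma_register_fmr_external(seg, &nsegs, writing, ia);
		break;
	default:
		return -1;	/* was rpcrdma_register_default_external() */
	}
	if (rc)
		return -1;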

Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---

 net/sunrpc/xprtrdma/rpc_rdma.c |    3 -
 net/sunrpc/xprtrdma/verbs.c    |   90 ++--------------------------------------
 2 files changed, 5 insertions(+), 88 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index e4af26a..a38efda 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -479,8 +479,7 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
 			 * on receive. Therefore, we request a reply chunk
 			 * for non-writes wherever feasible and efficient.
 			 */
-			if (wtype == rpcrdma_noch &&
-			    r_xprt->rx_ia.ri_memreg_strategy > RPCRDMA_REGISTER)
+			if (wtype == rpcrdma_noch)
 				wtype = rpcrdma_replych;
 		}
 	}
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 304c7ad..6bb9a07 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -494,19 +494,11 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
 	switch (memreg) {
 	case RPCRDMA_MTHCAFMR:
 		if (!ia->ri_id->device->alloc_fmr) {
-#if RPCRDMA_PERSISTENT_REGISTRATION
 			dprintk("RPC:       %s: MTHCAFMR registration "
 				"specified but not supported by adapter, "
 				"using riskier RPCRDMA_ALLPHYSICAL\n",
 				__func__);
 			memreg = RPCRDMA_ALLPHYSICAL;
-#else
-			dprintk("RPC:       %s: MTHCAFMR registration "
-				"specified but not supported by adapter, "
-				"using slower RPCRDMA_REGISTER\n",
-				__func__);
-			memreg = RPCRDMA_REGISTER;
-#endif
 		}
 		break;
 	case RPCRDMA_FRMR:
@@ -514,19 +506,11 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
 		if ((devattr.device_cap_flags &
 		     (IB_DEVICE_MEM_MGT_EXTENSIONS|IB_DEVICE_LOCAL_DMA_LKEY)) !=
 		    (IB_DEVICE_MEM_MGT_EXTENSIONS|IB_DEVICE_LOCAL_DMA_LKEY)) {
-#if RPCRDMA_PERSISTENT_REGISTRATION
 			dprintk("RPC:       %s: FRMR registration "
 				"specified but not supported by adapter, "
 				"using riskier RPCRDMA_ALLPHYSICAL\n",
 				__func__);
 			memreg = RPCRDMA_ALLPHYSICAL;
-#else
-			dprintk("RPC:       %s: FRMR registration "
-				"specified but not supported by adapter, "
-				"using slower RPCRDMA_REGISTER\n",
-				__func__);
-			memreg = RPCRDMA_REGISTER;
-#endif
 		} else {
 			/* Mind the ia limit on FRMR page list depth */
 			ia->ri_max_frmr_depth = min_t(unsigned int,
@@ -545,7 +529,6 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
 	 * adapter.
 	 */
 	switch (memreg) {
-	case RPCRDMA_REGISTER:
 	case RPCRDMA_FRMR:
 		break;
 #if RPCRDMA_PERSISTENT_REGISTRATION
@@ -565,11 +548,10 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
 		ia->ri_bind_mem = ib_get_dma_mr(ia->ri_pd, mem_priv);
 		if (IS_ERR(ia->ri_bind_mem)) {
 			printk(KERN_ALERT "%s: ib_get_dma_mr for "
-				"phys register failed with %lX\n\t"
-				"Will continue with degraded performance\n",
+				"phys register failed with %lX\n",
 				__func__, PTR_ERR(ia->ri_bind_mem));
-			memreg = RPCRDMA_REGISTER;
-			ia->ri_bind_mem = NULL;
+			rc = -ENOMEM;
+			goto out2;
 		}
 		break;
 	default:
@@ -1611,67 +1593,6 @@ rpcrdma_deregister_fmr_external(struct rpcrdma_mr_seg *seg,
 	return rc;
 }
 
-static int
-rpcrdma_register_default_external(struct rpcrdma_mr_seg *seg,
-			int *nsegs, int writing, struct rpcrdma_ia *ia)
-{
-	int mem_priv = (writing ? IB_ACCESS_REMOTE_WRITE :
-				  IB_ACCESS_REMOTE_READ);
-	struct rpcrdma_mr_seg *seg1 = seg;
-	struct ib_phys_buf ipb[RPCRDMA_MAX_DATA_SEGS];
-	int len, i, rc = 0;
-
-	if (*nsegs > RPCRDMA_MAX_DATA_SEGS)
-		*nsegs = RPCRDMA_MAX_DATA_SEGS;
-	for (len = 0, i = 0; i < *nsegs;) {
-		rpcrdma_map_one(ia, seg, writing);
-		ipb[i].addr = seg->mr_dma;
-		ipb[i].size = seg->mr_len;
-		len += seg->mr_len;
-		++seg;
-		++i;
-		/* Check for holes */
-		if ((i < *nsegs && offset_in_page(seg->mr_offset)) ||
-		    offset_in_page((seg-1)->mr_offset+(seg-1)->mr_len))
-			break;
-	}
-	seg1->mr_base = seg1->mr_dma;
-	seg1->mr_chunk.rl_mr = ib_reg_phys_mr(ia->ri_pd,
-				ipb, i, mem_priv, &seg1->mr_base);
-	if (IS_ERR(seg1->mr_chunk.rl_mr)) {
-		rc = PTR_ERR(seg1->mr_chunk.rl_mr);
-		dprintk("RPC:       %s: failed ib_reg_phys_mr "
-			"%u@0x%llx (%d)... status %i\n",
-			__func__, len,
-			(unsigned long long)seg1->mr_dma, i, rc);
-		while (i--)
-			rpcrdma_unmap_one(ia, --seg);
-	} else {
-		seg1->mr_rkey = seg1->mr_chunk.rl_mr->rkey;
-		seg1->mr_nsegs = i;
-		seg1->mr_len = len;
-	}
-	*nsegs = i;
-	return rc;
-}
-
-static int
-rpcrdma_deregister_default_external(struct rpcrdma_mr_seg *seg,
-			struct rpcrdma_ia *ia)
-{
-	struct rpcrdma_mr_seg *seg1 = seg;
-	int rc;
-
-	rc = ib_dereg_mr(seg1->mr_chunk.rl_mr);
-	seg1->mr_chunk.rl_mr = NULL;
-	while (seg1->mr_nsegs--)
-		rpcrdma_unmap_one(ia, seg++);
-	if (rc)
-		dprintk("RPC:       %s: failed ib_dereg_mr,"
-			" status %i\n", __func__, rc);
-	return rc;
-}
-
 int
 rpcrdma_register_external(struct rpcrdma_mr_seg *seg,
 			int nsegs, int writing, struct rpcrdma_xprt *r_xprt)
@@ -1701,10 +1622,8 @@ rpcrdma_register_external(struct rpcrdma_mr_seg *seg,
 		rc = rpcrdma_register_fmr_external(seg, &nsegs, writing, ia);
 		break;
 
-	/* Default registration each time */
 	default:
-		rc = rpcrdma_register_default_external(seg, &nsegs, writing, ia);
-		break;
+		return -1;
 	}
 	if (rc)
 		return -1;
@@ -1738,7 +1657,6 @@ rpcrdma_deregister_external(struct rpcrdma_mr_seg *seg,
 		break;
 
 	default:
-		rc = rpcrdma_deregister_default_external(seg, ia);
 		break;
 	}
 	if (r) {

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH V3 07/17] xprtrdma: Fall back to MTHCAFMR when FRMR is not supported
  2014-04-30 19:29 ` Chuck Lever
@ 2014-04-30 19:30     ` Chuck Lever
  -1 siblings, 0 replies; 60+ messages in thread
From: Chuck Lever @ 2014-04-30 19:30 UTC (permalink / raw)
  To: linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Anna.Schumaker-HgOvQuBEEgTQT0dZR+AlfA

An audit of in-kernel RDMA providers that do not support FRMR
memory registration shows that several of them do support MTHCAFMR.
Prefer MTHCAFMR when FRMR is not supported.

If MTHCAFMR is not supported, only then choose ALLPHYSICAL.
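
Viewed end to end, mode selection now degrades in a fixed order:
FRMR, then MTHCAFMR, then ALLPHYSICAL (the last only when the kernel
is built with RPCRDMA_PERSISTENT_REGISTRATION). A condensed sketch
of the cascade in rpcrdma_ia_open(), where hca_supports_frmr is a
stand-in for the IB_DEVICE_MEM_MGT_EXTENSIONS and
IB_DEVICE_LOCAL_DMA_LKEY capability test shown in the diff below:

	if (memreg == RPCRDMA_FRMR && !hca_supports_frmr)
		memreg = RPCRDMA_MTHCAFMR;
	if (memreg == RPCRDMA_MTHCAFMR && !ia->ri_id->device->alloc_fmr) {
	#if RPCRDMA_PERSISTENT_REGISTRATION
		memreg = RPCRDMA_ALLPHYSICAL;
	#else
		rc = -EINVAL;		/* no usable registration mode */
		goto out2;
	#endif
	}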

Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---

 net/sunrpc/xprtrdma/verbs.c |   31 +++++++++++++++----------------
 1 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 6bb9a07..a352798 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -491,33 +491,32 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
 		ia->ri_dma_lkey = ia->ri_id->device->local_dma_lkey;
 	}
 
-	switch (memreg) {
-	case RPCRDMA_MTHCAFMR:
-		if (!ia->ri_id->device->alloc_fmr) {
-			dprintk("RPC:       %s: MTHCAFMR registration "
-				"specified but not supported by adapter, "
-				"using riskier RPCRDMA_ALLPHYSICAL\n",
-				__func__);
-			memreg = RPCRDMA_ALLPHYSICAL;
-		}
-		break;
-	case RPCRDMA_FRMR:
+	if (memreg == RPCRDMA_FRMR) {
 		/* Requires both frmr reg and local dma lkey */
 		if ((devattr.device_cap_flags &
 		     (IB_DEVICE_MEM_MGT_EXTENSIONS|IB_DEVICE_LOCAL_DMA_LKEY)) !=
 		    (IB_DEVICE_MEM_MGT_EXTENSIONS|IB_DEVICE_LOCAL_DMA_LKEY)) {
 			dprintk("RPC:       %s: FRMR registration "
-				"specified but not supported by adapter, "
-				"using riskier RPCRDMA_ALLPHYSICAL\n",
-				__func__);
-			memreg = RPCRDMA_ALLPHYSICAL;
+				"not supported by HCA\n", __func__);
+			memreg = RPCRDMA_MTHCAFMR;
 		} else {
 			/* Mind the ia limit on FRMR page list depth */
 			ia->ri_max_frmr_depth = min_t(unsigned int,
 				RPCRDMA_MAX_DATA_SEGS,
 				devattr.max_fast_reg_page_list_len);
 		}
-		break;
+	}
+	if (memreg == RPCRDMA_MTHCAFMR) {
+		if (!ia->ri_id->device->alloc_fmr) {
+			dprintk("RPC:       %s: MTHCAFMR registration "
+				"not supported by HCA\n", __func__);
+#if RPCRDMA_PERSISTENT_REGISTRATION
+			memreg = RPCRDMA_ALLPHYSICAL;
+#else
+			rc = -EINVAL;
+			goto out2;
+#endif
+		}
 	}
 
 	/*

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH V3 08/17] xprtrdma: mount reports "Invalid mount option" if memreg mode not supported
  2014-04-30 19:29 ` Chuck Lever
@ 2014-04-30 19:30     ` Chuck Lever
  -1 siblings, 0 replies; 60+ messages in thread
From: Chuck Lever @ 2014-04-30 19:30 UTC (permalink / raw)
  To: linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Anna.Schumaker-HgOvQuBEEgTQT0dZR+AlfA

If the selected memory registration mode is not supported by the
underlying provider/HCA, the NFS mount command reports an invalid
mount option and fails. This is misleading.

Reporting a problem allocating memory is a lot closer to the truth.
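
The only visible difference is the errno that reaches user space. A
small user-space illustration (not part of the patch) of the strings
behind the two return codes:

	#include <errno.h>
	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		/* old return code: mount reports a bad mount option */
		printf("EINVAL: %s\n", strerror(EINVAL));
		/* new return code: "Cannot allocate memory" */
		printf("ENOMEM: %s\n", strerror(ENOMEM));
		return 0;
	}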

Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---

 net/sunrpc/xprtrdma/verbs.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index a352798..7c7e9b4 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -513,7 +513,7 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
 #if RPCRDMA_PERSISTENT_REGISTRATION
 			memreg = RPCRDMA_ALLPHYSICAL;
 #else
-			rc = -EINVAL;
+			rc = -ENOMEM;
 			goto out2;
 #endif
 		}
@@ -554,9 +554,9 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
 		}
 		break;
 	default:
-		printk(KERN_ERR "%s: invalid memory registration mode %d\n",
-				__func__, memreg);
-		rc = -EINVAL;
+		printk(KERN_ERR "RPC: Unsupported memory "
+				"registration mode: %d\n", memreg);
+		rc = -ENOMEM;
 		goto out2;
 	}
 	dprintk("RPC:       %s: memory registration strategy is %d\n",

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH V3 09/17] xprtrdma: Simplify rpcrdma_deregister_external() synopsis
  2014-04-30 19:29 ` Chuck Lever
@ 2014-04-30 19:30     ` Chuck Lever
  -1 siblings, 0 replies; 60+ messages in thread
From: Chuck Lever @ 2014-04-30 19:30 UTC (permalink / raw)
  To: linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Anna.Schumaker-HgOvQuBEEgTQT0dZR+AlfA

Clean up: All remaining callers of rpcrdma_deregister_external()
pass NULL as the last argument, so remove that argument.
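
For reference, a representative call site (taken from the diff
below) before and after the change:

	/* before */
	pos += rpcrdma_deregister_external(&req->rl_segments[pos],
					   r_xprt, NULL);
	/* after */
	pos += rpcrdma_deregister_external(&req->rl_segments[pos], r_xprt);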

Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Tested-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
---

 net/sunrpc/xprtrdma/rpc_rdma.c  |    2 +-
 net/sunrpc/xprtrdma/transport.c |    2 +-
 net/sunrpc/xprtrdma/verbs.c     |    8 +-------
 net/sunrpc/xprtrdma/xprt_rdma.h |    2 +-
 4 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index a38efda..315417d 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -273,7 +273,7 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, struct xdr_buf *target,
 out:
 	for (pos = 0; nchunks--;)
 		pos += rpcrdma_deregister_external(
-				&req->rl_segments[pos], r_xprt, NULL);
+				&req->rl_segments[pos], r_xprt);
 	return 0;
 }
 
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index c23b0c1..430cabb 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -575,7 +575,7 @@ xprt_rdma_free(void *buffer)
 	for (i = 0; req->rl_nchunks;) {
 		--req->rl_nchunks;
 		i += rpcrdma_deregister_external(
-			&req->rl_segments[i], r_xprt, NULL);
+			&req->rl_segments[i], r_xprt);
 	}
 
 	if (req->rl_iov.length == 0) {	/* see allocate above */
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 7c7e9b4..0cbc83c 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1632,7 +1632,7 @@ rpcrdma_register_external(struct rpcrdma_mr_seg *seg,
 
 int
 rpcrdma_deregister_external(struct rpcrdma_mr_seg *seg,
-		struct rpcrdma_xprt *r_xprt, void *r)
+		struct rpcrdma_xprt *r_xprt)
 {
 	struct rpcrdma_ia *ia = &r_xprt->rx_ia;
 	int nsegs = seg->mr_nsegs, rc;
@@ -1658,12 +1658,6 @@ rpcrdma_deregister_external(struct rpcrdma_mr_seg *seg,
 	default:
 		break;
 	}
-	if (r) {
-		struct rpcrdma_rep *rep = r;
-		void (*func)(struct rpcrdma_rep *) = rep->rr_func;
-		rep->rr_func = NULL;
-		func(rep);	/* dereg done, callback now */
-	}
 	return nsegs;
 }
 
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index bf08ee0..3f44d6a 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -331,7 +331,7 @@ int rpcrdma_deregister_internal(struct rpcrdma_ia *,
 int rpcrdma_register_external(struct rpcrdma_mr_seg *,
 				int, int, struct rpcrdma_xprt *);
 int rpcrdma_deregister_external(struct rpcrdma_mr_seg *,
-				struct rpcrdma_xprt *, void *);
+				struct rpcrdma_xprt *);
 
 /*
  * RPC/RDMA connection management calls - xprtrdma/rpc_rdma.c

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH V3 10/17] xprtrdma: Make rpcrdma_ep_destroy() return void
  2014-04-30 19:29 ` Chuck Lever
@ 2014-04-30 19:30     ` Chuck Lever
  -1 siblings, 0 replies; 60+ messages in thread
From: Chuck Lever @ 2014-04-30 19:30 UTC (permalink / raw)
  To: linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Anna.Schumaker-HgOvQuBEEgTQT0dZR+AlfA

Clean up: rpcrdma_ep_destroy() returns a value that is used only
to print a debugging message, and it already prints debugging
messages in all error cases.

Make rpcrdma_ep_destroy() return void instead.
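
The effect on callers shows in xprt_rdma_destroy() below: teardown
becomes a straight sequence with no return code to check.

	rpcrdma_buffer_destroy(&r_xprt->rx_buf);
	rpcrdma_ep_destroy(&r_xprt->rx_ep, &r_xprt->rx_ia);	/* void now */
	rpcrdma_ia_close(&r_xprt->rx_ia);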

Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Tested-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
---

 net/sunrpc/xprtrdma/transport.c |    8 ++------
 net/sunrpc/xprtrdma/verbs.c     |    7 +------
 net/sunrpc/xprtrdma/xprt_rdma.h |    2 +-
 3 files changed, 4 insertions(+), 13 deletions(-)

diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 430cabb..d18b2a3 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -229,7 +229,6 @@ static void
 xprt_rdma_destroy(struct rpc_xprt *xprt)
 {
 	struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
-	int rc;
 
 	dprintk("RPC:       %s: called\n", __func__);
 
@@ -238,10 +237,7 @@ xprt_rdma_destroy(struct rpc_xprt *xprt)
 	xprt_clear_connected(xprt);
 
 	rpcrdma_buffer_destroy(&r_xprt->rx_buf);
-	rc = rpcrdma_ep_destroy(&r_xprt->rx_ep, &r_xprt->rx_ia);
-	if (rc)
-		dprintk("RPC:       %s: rpcrdma_ep_destroy returned %i\n",
-			__func__, rc);
+	rpcrdma_ep_destroy(&r_xprt->rx_ep, &r_xprt->rx_ia);
 	rpcrdma_ia_close(&r_xprt->rx_ia);
 
 	xprt_rdma_free_addresses(xprt);
@@ -391,7 +387,7 @@ out4:
 	xprt_rdma_free_addresses(xprt);
 	rc = -EINVAL;
 out3:
-	(void) rpcrdma_ep_destroy(new_ep, &new_xprt->rx_ia);
+	rpcrdma_ep_destroy(new_ep, &new_xprt->rx_ia);
 out2:
 	rpcrdma_ia_close(&new_xprt->rx_ia);
 out1:
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 0cbc83c..edc951e 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -748,11 +748,8 @@ out1:
  * Disconnect and destroy endpoint. After this, the only
  * valid operations on the ep are to free it (if dynamically
  * allocated) or re-create it.
- *
- * The caller's error handling must be sure to not leak the endpoint
- * if this function fails.
  */
-int
+void
 rpcrdma_ep_destroy(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
 {
 	int rc;
@@ -782,8 +779,6 @@ rpcrdma_ep_destroy(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
 	if (rc)
 		dprintk("RPC:       %s: ib_destroy_cq returned %i\n",
 			__func__, rc);
-
-	return rc;
 }
 
 /*
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 3f44d6a..362a19d 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -301,7 +301,7 @@ void rpcrdma_ia_close(struct rpcrdma_ia *);
  */
 int rpcrdma_ep_create(struct rpcrdma_ep *, struct rpcrdma_ia *,
 				struct rpcrdma_create_data_internal *);
-int rpcrdma_ep_destroy(struct rpcrdma_ep *, struct rpcrdma_ia *);
+void rpcrdma_ep_destroy(struct rpcrdma_ep *, struct rpcrdma_ia *);
 int rpcrdma_ep_connect(struct rpcrdma_ep *, struct rpcrdma_ia *);
 int rpcrdma_ep_disconnect(struct rpcrdma_ep *, struct rpcrdma_ia *);
 

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH V3 11/17] xprtrdma: Split the completion queue
  2014-04-30 19:29 ` Chuck Lever
@ 2014-04-30 19:31     ` Chuck Lever
  -1 siblings, 0 replies; 60+ messages in thread
From: Chuck Lever @ 2014-04-30 19:31 UTC (permalink / raw)
  To: linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Anna.Schumaker-HgOvQuBEEgTQT0dZR+AlfA

The current CQ handler uses the ib_wc.opcode field to distinguish
between event types. However, the contents of that field are not
reliable if the completion status is not IB_WC_SUCCESS.

When an error completion occurs on a send event, the CQ handler
schedules a tasklet with something that is not a struct rpcrdma_rep.
This is never correct behavior, and sometimes it results in a panic.

To resolve this issue, split the completion queue into a send CQ and
a receive CQ. The send CQ handler now handles only struct rpcrdma_mw
wr_id's, and the receive CQ handler now handles only struct
rpcrdma_rep wr_id's.
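
Concretely, the endpoint moves from one shared CQ to a pair, each
with its own upcall handler. A condensed sketch of the new setup,
matching the hunks below (error handling omitted; the receive-side
sizing is assumed symmetric to the send side):

	sendcq = ib_create_cq(ia->ri_id->device, rpcrdma_sendcq_upcall,
			      rpcrdma_cq_async_error_upcall, NULL,
			      ep->rep_attr.cap.max_send_wr + 1, 0);
	recvcq = ib_create_cq(ia->ri_id->device, rpcrdma_recvcq_upcall,
			      rpcrdma_cq_async_error_upcall, NULL,
			      ep->rep_attr.cap.max_recv_wr + 1, 0);
	ep->rep_attr.send_cq = sendcq;
	ep->rep_attr.recv_cq = recvcq;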

Fix suggested by Shirley Ma <shirley.ma-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

Reported-by: Rafael Reiter <rafael.reiter-cv18SyjCLaheoWH0uzbU5w@public.gmane.org>
Fixes: 5c635e09cec0feeeb310968e51dad01040244851
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=73211
Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Tested-by: Klemens Senn <klemens.senn-cv18SyjCLaheoWH0uzbU5w@public.gmane.org>
Tested-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
---

 net/sunrpc/xprtrdma/verbs.c     |  228 +++++++++++++++++++++++----------------
 net/sunrpc/xprtrdma/xprt_rdma.h |    1 
 2 files changed, 137 insertions(+), 92 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index edc951e..af2d097 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -142,96 +142,115 @@ rpcrdma_cq_async_error_upcall(struct ib_event *event, void *context)
 	}
 }
 
-static inline
-void rpcrdma_event_process(struct ib_wc *wc)
+static void
+rpcrdma_sendcq_process_wc(struct ib_wc *wc)
 {
-	struct rpcrdma_mw *frmr;
-	struct rpcrdma_rep *rep =
-			(struct rpcrdma_rep *)(unsigned long) wc->wr_id;
+	struct rpcrdma_mw *frmr = (struct rpcrdma_mw *)(unsigned long)wc->wr_id;
 
-	dprintk("RPC:       %s: event rep %p status %X opcode %X length %u\n",
-		__func__, rep, wc->status, wc->opcode, wc->byte_len);
+	dprintk("RPC:       %s: frmr %p status %X opcode %d\n",
+		__func__, frmr, wc->status, wc->opcode);
 
-	if (!rep) /* send completion that we don't care about */
+	if (wc->wr_id == 0ULL)
 		return;
-
-	if (IB_WC_SUCCESS != wc->status) {
-		dprintk("RPC:       %s: WC opcode %d status %X, connection lost\n",
-			__func__, wc->opcode, wc->status);
-		rep->rr_len = ~0U;
-		if (wc->opcode != IB_WC_FAST_REG_MR && wc->opcode != IB_WC_LOCAL_INV)
-			rpcrdma_schedule_tasklet(rep);
+	if (wc->status != IB_WC_SUCCESS)
 		return;
-	}
 
-	switch (wc->opcode) {
-	case IB_WC_FAST_REG_MR:
-		frmr = (struct rpcrdma_mw *)(unsigned long)wc->wr_id;
+	if (wc->opcode == IB_WC_FAST_REG_MR)
 		frmr->r.frmr.state = FRMR_IS_VALID;
-		break;
-	case IB_WC_LOCAL_INV:
-		frmr = (struct rpcrdma_mw *)(unsigned long)wc->wr_id;
+	else if (wc->opcode == IB_WC_LOCAL_INV)
 		frmr->r.frmr.state = FRMR_IS_INVALID;
-		break;
-	case IB_WC_RECV:
-		rep->rr_len = wc->byte_len;
-		ib_dma_sync_single_for_cpu(
-			rdmab_to_ia(rep->rr_buffer)->ri_id->device,
-			rep->rr_iov.addr, rep->rr_len, DMA_FROM_DEVICE);
-		/* Keep (only) the most recent credits, after check validity */
-		if (rep->rr_len >= 16) {
-			struct rpcrdma_msg *p =
-					(struct rpcrdma_msg *) rep->rr_base;
-			unsigned int credits = ntohl(p->rm_credit);
-			if (credits == 0) {
-				dprintk("RPC:       %s: server"
-					" dropped credits to 0!\n", __func__);
-				/* don't deadlock */
-				credits = 1;
-			} else if (credits > rep->rr_buffer->rb_max_requests) {
-				dprintk("RPC:       %s: server"
-					" over-crediting: %d (%d)\n",
-					__func__, credits,
-					rep->rr_buffer->rb_max_requests);
-				credits = rep->rr_buffer->rb_max_requests;
-			}
-			atomic_set(&rep->rr_buffer->rb_credits, credits);
-		}
-		rpcrdma_schedule_tasklet(rep);
-		break;
-	default:
-		dprintk("RPC:       %s: unexpected WC event %X\n",
-			__func__, wc->opcode);
-		break;
-	}
 }
 
-static inline int
-rpcrdma_cq_poll(struct ib_cq *cq)
+static int
+rpcrdma_sendcq_poll(struct ib_cq *cq)
 {
 	struct ib_wc wc;
 	int rc;
 
-	for (;;) {
-		rc = ib_poll_cq(cq, 1, &wc);
-		if (rc < 0) {
-			dprintk("RPC:       %s: ib_poll_cq failed %i\n",
-				__func__, rc);
-			return rc;
-		}
-		if (rc == 0)
-			break;
+	while ((rc = ib_poll_cq(cq, 1, &wc)) == 1)
+		rpcrdma_sendcq_process_wc(&wc);
+	return rc;
+}
 
-		rpcrdma_event_process(&wc);
+/*
+ * Handle send, fast_reg_mr, and local_inv completions.
+ *
+ * Send events are typically suppressed and thus do not result
+ * in an upcall. Occasionally one is signaled, however. This
+ * prevents the provider's completion queue from wrapping and
+ * losing a completion.
+ */
+static void
+rpcrdma_sendcq_upcall(struct ib_cq *cq, void *cq_context)
+{
+	int rc;
+
+	rc = rpcrdma_sendcq_poll(cq);
+	if (rc) {
+		dprintk("RPC:       %s: ib_poll_cq failed: %i\n",
+			__func__, rc);
+		return;
 	}
 
-	return 0;
+	rc = ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
+	if (rc) {
+		dprintk("RPC:       %s: ib_req_notify_cq failed: %i\n",
+			__func__, rc);
+		return;
+	}
+
+	rpcrdma_sendcq_poll(cq);
+}
+
+static void
+rpcrdma_recvcq_process_wc(struct ib_wc *wc)
+{
+	struct rpcrdma_rep *rep =
+			(struct rpcrdma_rep *)(unsigned long)wc->wr_id;
+
+	dprintk("RPC:       %s: rep %p status %X opcode %X length %u\n",
+		__func__, rep, wc->status, wc->opcode, wc->byte_len);
+
+	if (wc->status != IB_WC_SUCCESS) {
+		rep->rr_len = ~0U;
+		goto out_schedule;
+	}
+	if (wc->opcode != IB_WC_RECV)
+		return;
+
+	rep->rr_len = wc->byte_len;
+	ib_dma_sync_single_for_cpu(rdmab_to_ia(rep->rr_buffer)->ri_id->device,
+			rep->rr_iov.addr, rep->rr_len, DMA_FROM_DEVICE);
+
+	if (rep->rr_len >= 16) {
+		struct rpcrdma_msg *p = (struct rpcrdma_msg *)rep->rr_base;
+		unsigned int credits = ntohl(p->rm_credit);
+
+		if (credits == 0)
+			credits = 1;	/* don't deadlock */
+		else if (credits > rep->rr_buffer->rb_max_requests)
+			credits = rep->rr_buffer->rb_max_requests;
+		atomic_set(&rep->rr_buffer->rb_credits, credits);
+	}
+
+out_schedule:
+	rpcrdma_schedule_tasklet(rep);
+}
+
+static int
+rpcrdma_recvcq_poll(struct ib_cq *cq)
+{
+	struct ib_wc wc;
+	int rc;
+
+	while ((rc = ib_poll_cq(cq, 1, &wc)) == 1)
+		rpcrdma_recvcq_process_wc(&wc);
+	return rc;
 }
 
 /*
- * rpcrdma_cq_event_upcall
+ * Handle receive completions.
  *
- * This upcall handles recv and send events.
  * It is reentrant but processes single events in order to maintain
  * ordering of receives to keep server credits.
  *
@@ -240,26 +259,27 @@ rpcrdma_cq_poll(struct ib_cq *cq)
  * connection shutdown. That is, the structures required for
  * the completion of the reply handler must remain intact until
  * all memory has been reclaimed.
- *
- * Note that send events are suppressed and do not result in an upcall.
  */
 static void
-rpcrdma_cq_event_upcall(struct ib_cq *cq, void *context)
+rpcrdma_recvcq_upcall(struct ib_cq *cq, void *cq_context)
 {
 	int rc;
 
-	rc = rpcrdma_cq_poll(cq);
-	if (rc)
+	rc = rpcrdma_recvcq_poll(cq);
+	if (rc) {
+		dprintk("RPC:       %s: ib_poll_cq failed: %i\n",
+			__func__, rc);
 		return;
+	}
 
 	rc = ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
 	if (rc) {
-		dprintk("RPC:       %s: ib_req_notify_cq failed %i\n",
+		dprintk("RPC:       %s: ib_req_notify_cq failed: %i\n",
 			__func__, rc);
 		return;
 	}
 
-	rpcrdma_cq_poll(cq);
+	rpcrdma_recvcq_poll(cq);
 }
 
 #ifdef RPC_DEBUG
@@ -610,6 +630,7 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
 				struct rpcrdma_create_data_internal *cdata)
 {
 	struct ib_device_attr devattr;
+	struct ib_cq *sendcq, *recvcq;
 	int rc, err;
 
 	rc = ib_query_device(ia->ri_id->device, &devattr);
@@ -685,7 +706,7 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
 		ep->rep_attr.cap.max_recv_sge);
 
 	/* set trigger for requesting send completion */
-	ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 /*  - 1*/;
+	ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 - 1;
 	if (ep->rep_cqinit <= 2)
 		ep->rep_cqinit = 0;
 	INIT_CQCOUNT(ep);
@@ -693,26 +714,43 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
 	init_waitqueue_head(&ep->rep_connect_wait);
 	INIT_DELAYED_WORK(&ep->rep_connect_worker, rpcrdma_connect_worker);
 
-	ep->rep_cq = ib_create_cq(ia->ri_id->device, rpcrdma_cq_event_upcall,
+	sendcq = ib_create_cq(ia->ri_id->device, rpcrdma_sendcq_upcall,
 				  rpcrdma_cq_async_error_upcall, NULL,
-				  ep->rep_attr.cap.max_recv_wr +
 				  ep->rep_attr.cap.max_send_wr + 1, 0);
-	if (IS_ERR(ep->rep_cq)) {
-		rc = PTR_ERR(ep->rep_cq);
-		dprintk("RPC:       %s: ib_create_cq failed: %i\n",
+	if (IS_ERR(sendcq)) {
+		rc = PTR_ERR(sendcq);
+		dprintk("RPC:       %s: failed to create send CQ: %i\n",
 			__func__, rc);
 		goto out1;
 	}
 
-	rc = ib_req_notify_cq(ep->rep_cq, IB_CQ_NEXT_COMP);
+	rc = ib_req_notify_cq(sendcq, IB_CQ_NEXT_COMP);
 	if (rc) {
 		dprintk("RPC:       %s: ib_req_notify_cq failed: %i\n",
 			__func__, rc);
 		goto out2;
 	}
 
-	ep->rep_attr.send_cq = ep->rep_cq;
-	ep->rep_attr.recv_cq = ep->rep_cq;
+	recvcq = ib_create_cq(ia->ri_id->device, rpcrdma_recvcq_upcall,
+				  rpcrdma_cq_async_error_upcall, NULL,
+				  ep->rep_attr.cap.max_recv_wr + 1, 0);
+	if (IS_ERR(recvcq)) {
+		rc = PTR_ERR(recvcq);
+		dprintk("RPC:       %s: failed to create recv CQ: %i\n",
+			__func__, rc);
+		goto out2;
+	}
+
+	rc = ib_req_notify_cq(recvcq, IB_CQ_NEXT_COMP);
+	if (rc) {
+		dprintk("RPC:       %s: ib_req_notify_cq failed: %i\n",
+			__func__, rc);
+		ib_destroy_cq(recvcq);
+		goto out2;
+	}
+
+	ep->rep_attr.send_cq = sendcq;
+	ep->rep_attr.recv_cq = recvcq;
 
 	/* Initialize cma parameters */
 
@@ -734,7 +772,7 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
 	return 0;
 
 out2:
-	err = ib_destroy_cq(ep->rep_cq);
+	err = ib_destroy_cq(sendcq);
 	if (err)
 		dprintk("RPC:       %s: ib_destroy_cq returned %i\n",
 			__func__, err);
@@ -774,8 +812,14 @@ rpcrdma_ep_destroy(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
 		ep->rep_pad_mr = NULL;
 	}
 
-	rpcrdma_clean_cq(ep->rep_cq);
-	rc = ib_destroy_cq(ep->rep_cq);
+	rpcrdma_clean_cq(ep->rep_attr.recv_cq);
+	rc = ib_destroy_cq(ep->rep_attr.recv_cq);
+	if (rc)
+		dprintk("RPC:       %s: ib_destroy_cq returned %i\n",
+			__func__, rc);
+
+	rpcrdma_clean_cq(ep->rep_attr.send_cq);
+	rc = ib_destroy_cq(ep->rep_attr.send_cq);
 	if (rc)
 		dprintk("RPC:       %s: ib_destroy_cq returned %i\n",
 			__func__, rc);
@@ -798,7 +842,9 @@ retry:
 		if (rc && rc != -ENOTCONN)
 			dprintk("RPC:       %s: rpcrdma_ep_disconnect"
 				" status %i\n", __func__, rc);
-		rpcrdma_clean_cq(ep->rep_cq);
+
+		rpcrdma_clean_cq(ep->rep_attr.recv_cq);
+		rpcrdma_clean_cq(ep->rep_attr.send_cq);
 
 		xprt = container_of(ia, struct rpcrdma_xprt, rx_ia);
 		id = rpcrdma_create_id(xprt, ia,
@@ -907,7 +953,8 @@ rpcrdma_ep_disconnect(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
 {
 	int rc;
 
-	rpcrdma_clean_cq(ep->rep_cq);
+	rpcrdma_clean_cq(ep->rep_attr.recv_cq);
+	rpcrdma_clean_cq(ep->rep_attr.send_cq);
 	rc = rdma_disconnect(ia->ri_id);
 	if (!rc) {
 		/* returns without wait if not connected */
@@ -1727,7 +1774,6 @@ rpcrdma_ep_post_recv(struct rpcrdma_ia *ia,
 	ib_dma_sync_single_for_cpu(ia->ri_id->device,
 		rep->rr_iov.addr, rep->rr_iov.length, DMA_BIDIRECTIONAL);
 
-	DECR_CQCOUNT(ep);
 	rc = ib_post_recv(ia->ri_id->qp, &recv_wr, &recv_wr_fail);
 
 	if (rc)
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 362a19d..334ab6e 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -79,7 +79,6 @@ struct rpcrdma_ep {
 	int			rep_cqinit;
 	int			rep_connected;
 	struct rpcrdma_ia	*rep_ia;
-	struct ib_cq		*rep_cq;
 	struct ib_qp_init_attr	rep_attr;
 	wait_queue_head_t 	rep_connect_wait;
 	struct ib_sge		rep_pad;	/* holds zeroed pad */


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH V3 11/17] xprtrdma: Split the completion queue
@ 2014-04-30 19:31     ` Chuck Lever
  0 siblings, 0 replies; 60+ messages in thread
From: Chuck Lever @ 2014-04-30 19:31 UTC (permalink / raw)
  To: linux-nfs, linux-rdma; +Cc: Anna.Schumaker

The current CQ handler uses the ib_wc.opcode field to distinguish
between event types. However, the contents of that field are not
reliable if the completion status is not IB_WC_SUCCESS.

When an error completion occurs on a send event, the CQ handler
schedules a tasklet with something that is not a struct rpcrdma_rep.
This is never correct behavior, and sometimes it results in a panic.

To resolve this issue, split the completion queue into a send CQ and
a receive CQ. The send CQ handler now handles only struct rpcrdma_mw
wr_id's, and the receive CQ handler now handles only struct
rpcrdma_rep wr_id's.
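
The key point is that each handler can now trust the type behind
wr_id without consulting wc->opcode, which is unreliable for flushed
completions. In rough outline (a sketch of the shape only; the real
code is in the diff below):

	/* send CQ: wr_id is 0, or points to a struct rpcrdma_mw */
	struct rpcrdma_mw *frmr =
		(struct rpcrdma_mw *)(unsigned long)wc->wr_id;

	/* recv CQ: wr_id always points to a struct rpcrdma_rep */
	struct rpcrdma_rep *rep =
		(struct rpcrdma_rep *)(unsigned long)wc->wr_id;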

Fix suggested by Shirley Ma <shirley.ma@oracle.com>

Reported-by: Rafael Reiter <rafael.reiter@ims.co.at>
Fixes: 5c635e09cec0feeeb310968e51dad01040244851
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=73211
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Klemens Senn <klemens.senn@ims.co.at>
Tested-by: Steve Wise <swise@opengridcomputing.com>
---

 net/sunrpc/xprtrdma/verbs.c     |  228 +++++++++++++++++++++++----------------
 net/sunrpc/xprtrdma/xprt_rdma.h |    1 
 2 files changed, 137 insertions(+), 92 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index edc951e..af2d097 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -142,96 +142,115 @@ rpcrdma_cq_async_error_upcall(struct ib_event *event, void *context)
 	}
 }
 
-static inline
-void rpcrdma_event_process(struct ib_wc *wc)
+static void
+rpcrdma_sendcq_process_wc(struct ib_wc *wc)
 {
-	struct rpcrdma_mw *frmr;
-	struct rpcrdma_rep *rep =
-			(struct rpcrdma_rep *)(unsigned long) wc->wr_id;
+	struct rpcrdma_mw *frmr = (struct rpcrdma_mw *)(unsigned long)wc->wr_id;
 
-	dprintk("RPC:       %s: event rep %p status %X opcode %X length %u\n",
-		__func__, rep, wc->status, wc->opcode, wc->byte_len);
+	dprintk("RPC:       %s: frmr %p status %X opcode %d\n",
+		__func__, frmr, wc->status, wc->opcode);
 
-	if (!rep) /* send completion that we don't care about */
+	if (wc->wr_id == 0ULL)
 		return;
-
-	if (IB_WC_SUCCESS != wc->status) {
-		dprintk("RPC:       %s: WC opcode %d status %X, connection lost\n",
-			__func__, wc->opcode, wc->status);
-		rep->rr_len = ~0U;
-		if (wc->opcode != IB_WC_FAST_REG_MR && wc->opcode != IB_WC_LOCAL_INV)
-			rpcrdma_schedule_tasklet(rep);
+	if (wc->status != IB_WC_SUCCESS)
 		return;
-	}
 
-	switch (wc->opcode) {
-	case IB_WC_FAST_REG_MR:
-		frmr = (struct rpcrdma_mw *)(unsigned long)wc->wr_id;
+	if (wc->opcode == IB_WC_FAST_REG_MR)
 		frmr->r.frmr.state = FRMR_IS_VALID;
-		break;
-	case IB_WC_LOCAL_INV:
-		frmr = (struct rpcrdma_mw *)(unsigned long)wc->wr_id;
+	else if (wc->opcode == IB_WC_LOCAL_INV)
 		frmr->r.frmr.state = FRMR_IS_INVALID;
-		break;
-	case IB_WC_RECV:
-		rep->rr_len = wc->byte_len;
-		ib_dma_sync_single_for_cpu(
-			rdmab_to_ia(rep->rr_buffer)->ri_id->device,
-			rep->rr_iov.addr, rep->rr_len, DMA_FROM_DEVICE);
-		/* Keep (only) the most recent credits, after check validity */
-		if (rep->rr_len >= 16) {
-			struct rpcrdma_msg *p =
-					(struct rpcrdma_msg *) rep->rr_base;
-			unsigned int credits = ntohl(p->rm_credit);
-			if (credits == 0) {
-				dprintk("RPC:       %s: server"
-					" dropped credits to 0!\n", __func__);
-				/* don't deadlock */
-				credits = 1;
-			} else if (credits > rep->rr_buffer->rb_max_requests) {
-				dprintk("RPC:       %s: server"
-					" over-crediting: %d (%d)\n",
-					__func__, credits,
-					rep->rr_buffer->rb_max_requests);
-				credits = rep->rr_buffer->rb_max_requests;
-			}
-			atomic_set(&rep->rr_buffer->rb_credits, credits);
-		}
-		rpcrdma_schedule_tasklet(rep);
-		break;
-	default:
-		dprintk("RPC:       %s: unexpected WC event %X\n",
-			__func__, wc->opcode);
-		break;
-	}
 }
 
-static inline int
-rpcrdma_cq_poll(struct ib_cq *cq)
+static int
+rpcrdma_sendcq_poll(struct ib_cq *cq)
 {
 	struct ib_wc wc;
 	int rc;
 
-	for (;;) {
-		rc = ib_poll_cq(cq, 1, &wc);
-		if (rc < 0) {
-			dprintk("RPC:       %s: ib_poll_cq failed %i\n",
-				__func__, rc);
-			return rc;
-		}
-		if (rc == 0)
-			break;
+	while ((rc = ib_poll_cq(cq, 1, &wc)) == 1)
+		rpcrdma_sendcq_process_wc(&wc);
+	return rc;
+}
 
-		rpcrdma_event_process(&wc);
+/*
+ * Handle send, fast_reg_mr, and local_inv completions.
+ *
+ * Send events are typically suppressed and thus do not result
+ * in an upcall. Occasionally one is signaled, however. This
+ * prevents the provider's completion queue from wrapping and
+ * losing a completion.
+ */
+static void
+rpcrdma_sendcq_upcall(struct ib_cq *cq, void *cq_context)
+{
+	int rc;
+
+	rc = rpcrdma_sendcq_poll(cq);
+	if (rc) {
+		dprintk("RPC:       %s: ib_poll_cq failed: %i\n",
+			__func__, rc);
+		return;
 	}
 
-	return 0;
+	rc = ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
+	if (rc) {
+		dprintk("RPC:       %s: ib_req_notify_cq failed: %i\n",
+			__func__, rc);
+		return;
+	}
+
+	rpcrdma_sendcq_poll(cq);
+}
+
+static void
+rpcrdma_recvcq_process_wc(struct ib_wc *wc)
+{
+	struct rpcrdma_rep *rep =
+			(struct rpcrdma_rep *)(unsigned long)wc->wr_id;
+
+	dprintk("RPC:       %s: rep %p status %X opcode %X length %u\n",
+		__func__, rep, wc->status, wc->opcode, wc->byte_len);
+
+	if (wc->status != IB_WC_SUCCESS) {
+		rep->rr_len = ~0U;
+		goto out_schedule;
+	}
+	if (wc->opcode != IB_WC_RECV)
+		return;
+
+	rep->rr_len = wc->byte_len;
+	ib_dma_sync_single_for_cpu(rdmab_to_ia(rep->rr_buffer)->ri_id->device,
+			rep->rr_iov.addr, rep->rr_len, DMA_FROM_DEVICE);
+
+	if (rep->rr_len >= 16) {
+		struct rpcrdma_msg *p = (struct rpcrdma_msg *)rep->rr_base;
+		unsigned int credits = ntohl(p->rm_credit);
+
+		if (credits == 0)
+			credits = 1;	/* don't deadlock */
+		else if (credits > rep->rr_buffer->rb_max_requests)
+			credits = rep->rr_buffer->rb_max_requests;
+		atomic_set(&rep->rr_buffer->rb_credits, credits);
+	}
+
+out_schedule:
+	rpcrdma_schedule_tasklet(rep);
+}
+
+static int
+rpcrdma_recvcq_poll(struct ib_cq *cq)
+{
+	struct ib_wc wc;
+	int rc;
+
+	while ((rc = ib_poll_cq(cq, 1, &wc)) == 1)
+		rpcrdma_recvcq_process_wc(&wc);
+	return rc;
 }
 
 /*
- * rpcrdma_cq_event_upcall
+ * Handle receive completions.
  *
- * This upcall handles recv and send events.
  * It is reentrant but processes single events in order to maintain
  * ordering of receives to keep server credits.
  *
@@ -240,26 +259,27 @@ rpcrdma_cq_poll(struct ib_cq *cq)
  * connection shutdown. That is, the structures required for
  * the completion of the reply handler must remain intact until
  * all memory has been reclaimed.
- *
- * Note that send events are suppressed and do not result in an upcall.
  */
 static void
-rpcrdma_cq_event_upcall(struct ib_cq *cq, void *context)
+rpcrdma_recvcq_upcall(struct ib_cq *cq, void *cq_context)
 {
 	int rc;
 
-	rc = rpcrdma_cq_poll(cq);
-	if (rc)
+	rc = rpcrdma_recvcq_poll(cq);
+	if (rc) {
+		dprintk("RPC:       %s: ib_poll_cq failed: %i\n",
+			__func__, rc);
 		return;
+	}
 
 	rc = ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
 	if (rc) {
-		dprintk("RPC:       %s: ib_req_notify_cq failed %i\n",
+		dprintk("RPC:       %s: ib_req_notify_cq failed: %i\n",
 			__func__, rc);
 		return;
 	}
 
-	rpcrdma_cq_poll(cq);
+	rpcrdma_recvcq_poll(cq);
 }
 
 #ifdef RPC_DEBUG
@@ -610,6 +630,7 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
 				struct rpcrdma_create_data_internal *cdata)
 {
 	struct ib_device_attr devattr;
+	struct ib_cq *sendcq, *recvcq;
 	int rc, err;
 
 	rc = ib_query_device(ia->ri_id->device, &devattr);
@@ -685,7 +706,7 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
 		ep->rep_attr.cap.max_recv_sge);
 
 	/* set trigger for requesting send completion */
-	ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 /*  - 1*/;
+	ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 - 1;
 	if (ep->rep_cqinit <= 2)
 		ep->rep_cqinit = 0;
 	INIT_CQCOUNT(ep);
@@ -693,26 +714,43 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
 	init_waitqueue_head(&ep->rep_connect_wait);
 	INIT_DELAYED_WORK(&ep->rep_connect_worker, rpcrdma_connect_worker);
 
-	ep->rep_cq = ib_create_cq(ia->ri_id->device, rpcrdma_cq_event_upcall,
+	sendcq = ib_create_cq(ia->ri_id->device, rpcrdma_sendcq_upcall,
 				  rpcrdma_cq_async_error_upcall, NULL,
-				  ep->rep_attr.cap.max_recv_wr +
 				  ep->rep_attr.cap.max_send_wr + 1, 0);
-	if (IS_ERR(ep->rep_cq)) {
-		rc = PTR_ERR(ep->rep_cq);
-		dprintk("RPC:       %s: ib_create_cq failed: %i\n",
+	if (IS_ERR(sendcq)) {
+		rc = PTR_ERR(sendcq);
+		dprintk("RPC:       %s: failed to create send CQ: %i\n",
 			__func__, rc);
 		goto out1;
 	}
 
-	rc = ib_req_notify_cq(ep->rep_cq, IB_CQ_NEXT_COMP);
+	rc = ib_req_notify_cq(sendcq, IB_CQ_NEXT_COMP);
 	if (rc) {
 		dprintk("RPC:       %s: ib_req_notify_cq failed: %i\n",
 			__func__, rc);
 		goto out2;
 	}
 
-	ep->rep_attr.send_cq = ep->rep_cq;
-	ep->rep_attr.recv_cq = ep->rep_cq;
+	recvcq = ib_create_cq(ia->ri_id->device, rpcrdma_recvcq_upcall,
+				  rpcrdma_cq_async_error_upcall, NULL,
+				  ep->rep_attr.cap.max_recv_wr + 1, 0);
+	if (IS_ERR(recvcq)) {
+		rc = PTR_ERR(recvcq);
+		dprintk("RPC:       %s: failed to create recv CQ: %i\n",
+			__func__, rc);
+		goto out2;
+	}
+
+	rc = ib_req_notify_cq(recvcq, IB_CQ_NEXT_COMP);
+	if (rc) {
+		dprintk("RPC:       %s: ib_req_notify_cq failed: %i\n",
+			__func__, rc);
+		ib_destroy_cq(recvcq);
+		goto out2;
+	}
+
+	ep->rep_attr.send_cq = sendcq;
+	ep->rep_attr.recv_cq = recvcq;
 
 	/* Initialize cma parameters */
 
@@ -734,7 +772,7 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
 	return 0;
 
 out2:
-	err = ib_destroy_cq(ep->rep_cq);
+	err = ib_destroy_cq(sendcq);
 	if (err)
 		dprintk("RPC:       %s: ib_destroy_cq returned %i\n",
 			__func__, err);
@@ -774,8 +812,14 @@ rpcrdma_ep_destroy(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
 		ep->rep_pad_mr = NULL;
 	}
 
-	rpcrdma_clean_cq(ep->rep_cq);
-	rc = ib_destroy_cq(ep->rep_cq);
+	rpcrdma_clean_cq(ep->rep_attr.recv_cq);
+	rc = ib_destroy_cq(ep->rep_attr.recv_cq);
+	if (rc)
+		dprintk("RPC:       %s: ib_destroy_cq returned %i\n",
+			__func__, rc);
+
+	rpcrdma_clean_cq(ep->rep_attr.send_cq);
+	rc = ib_destroy_cq(ep->rep_attr.send_cq);
 	if (rc)
 		dprintk("RPC:       %s: ib_destroy_cq returned %i\n",
 			__func__, rc);
@@ -798,7 +842,9 @@ retry:
 		if (rc && rc != -ENOTCONN)
 			dprintk("RPC:       %s: rpcrdma_ep_disconnect"
 				" status %i\n", __func__, rc);
-		rpcrdma_clean_cq(ep->rep_cq);
+
+		rpcrdma_clean_cq(ep->rep_attr.recv_cq);
+		rpcrdma_clean_cq(ep->rep_attr.send_cq);
 
 		xprt = container_of(ia, struct rpcrdma_xprt, rx_ia);
 		id = rpcrdma_create_id(xprt, ia,
@@ -907,7 +953,8 @@ rpcrdma_ep_disconnect(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
 {
 	int rc;
 
-	rpcrdma_clean_cq(ep->rep_cq);
+	rpcrdma_clean_cq(ep->rep_attr.recv_cq);
+	rpcrdma_clean_cq(ep->rep_attr.send_cq);
 	rc = rdma_disconnect(ia->ri_id);
 	if (!rc) {
 		/* returns without wait if not connected */
@@ -1727,7 +1774,6 @@ rpcrdma_ep_post_recv(struct rpcrdma_ia *ia,
 	ib_dma_sync_single_for_cpu(ia->ri_id->device,
 		rep->rr_iov.addr, rep->rr_iov.length, DMA_BIDIRECTIONAL);
 
-	DECR_CQCOUNT(ep);
 	rc = ib_post_recv(ia->ri_id->qp, &recv_wr, &recv_wr_fail);
 
 	if (rc)
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 362a19d..334ab6e 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -79,7 +79,6 @@ struct rpcrdma_ep {
 	int			rep_cqinit;
 	int			rep_connected;
 	struct rpcrdma_ia	*rep_ia;
-	struct ib_cq		*rep_cq;
 	struct ib_qp_init_attr	rep_attr;
 	wait_queue_head_t 	rep_connect_wait;
 	struct ib_sge		rep_pad;	/* holds zeroed pad */


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH V3 12/17] xprtrdma: Reduce lock contention in completion handlers
  2014-04-30 19:29 ` Chuck Lever
@ 2014-04-30 19:31     ` Chuck Lever
  -1 siblings, 0 replies; 60+ messages in thread
From: Chuck Lever @ 2014-04-30 19:31 UTC (permalink / raw)
  To: linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Anna.Schumaker-HgOvQuBEEgTQT0dZR+AlfA

Skip the ib_poll_cq() after re-arming if the provider indicates that
no further completions are waiting. (See commit ed23a727 for
details.)

Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---

 net/sunrpc/xprtrdma/verbs.c |   14 ++++++++++----
 1 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index af2d097..c7d5281 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -192,8 +192,11 @@ rpcrdma_sendcq_upcall(struct ib_cq *cq, void *cq_context)
 		return;
 	}
 
-	rc = ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
-	if (rc) {
+	rc = ib_req_notify_cq(cq,
+			IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS);
+	if (rc == 0)
+		return;
+	if (rc < 0) {
 		dprintk("RPC:       %s: ib_req_notify_cq failed: %i\n",
 			__func__, rc);
 		return;
@@ -272,8 +275,11 @@ rpcrdma_recvcq_upcall(struct ib_cq *cq, void *cq_context)
 		return;
 	}
 
-	rc = ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
-	if (rc) {
+	rc = ib_req_notify_cq(cq,
+			IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS);
+	if (rc == 0)
+		return;
+	if (rc < 0) {
 		dprintk("RPC:       %s: ib_req_notify_cq failed: %i\n",
 			__func__, rc);
 		return;


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH V3 12/17] xprtrdma: Reduce lock contention in completion handlers
@ 2014-04-30 19:31     ` Chuck Lever
  0 siblings, 0 replies; 60+ messages in thread
From: Chuck Lever @ 2014-04-30 19:31 UTC (permalink / raw)
  To: linux-nfs, linux-rdma; +Cc: Anna.Schumaker

Skip the ib_poll_cq() after re-arming if the provider indicates that
no further completions are waiting. (See commit ed23a727 for
details.)
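
With IB_CQ_REPORT_MISSED_EVENTS, ib_req_notify_cq() returns a
positive value if completions arrived between the final poll and
re-arming, zero if the CQ was re-armed empty, and a negative errno
on failure. The handler shape becomes, roughly (a sketch; the exact
change is in the diff below):

	rpcrdma_sendcq_poll(cq);	/* drain */
	rc = ib_req_notify_cq(cq,
			IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS);
	if (rc > 0)			/* completions were missed */
		rpcrdma_sendcq_poll(cq);	/* drain again */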

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---

 net/sunrpc/xprtrdma/verbs.c |   14 ++++++++++----
 1 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index af2d097..c7d5281 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -192,8 +192,11 @@ rpcrdma_sendcq_upcall(struct ib_cq *cq, void *cq_context)
 		return;
 	}
 
-	rc = ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
-	if (rc) {
+	rc = ib_req_notify_cq(cq,
+			IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS);
+	if (rc == 0)
+		return;
+	if (rc < 0) {
 		dprintk("RPC:       %s: ib_req_notify_cq failed: %i\n",
 			__func__, rc);
 		return;
@@ -272,8 +275,11 @@ rpcrdma_recvcq_upcall(struct ib_cq *cq, void *cq_context)
 		return;
 	}
 
-	rc = ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
-	if (rc) {
+	rc = ib_req_notify_cq(cq,
+			IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS);
+	if (rc == 0)
+		return;
+	if (rc < 0) {
 		dprintk("RPC:       %s: ib_req_notify_cq failed: %i\n",
 			__func__, rc);
 		return;


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH V3 13/17] xprtrdma: Reduce calls to ib_poll_cq() in completion handlers
  2014-04-30 19:29 ` Chuck Lever
@ 2014-04-30 19:31     ` Chuck Lever
  -1 siblings, 0 replies; 60+ messages in thread
From: Chuck Lever @ 2014-04-30 19:31 UTC (permalink / raw)
  To: linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Anna.Schumaker-HgOvQuBEEgTQT0dZR+AlfA

Change the completion handlers to grab up to 16 items per
ib_poll_cq() call. No extra ib_poll_cq() is needed if fewer than 16
items are returned.

Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---

 net/sunrpc/xprtrdma/verbs.c     |   56 ++++++++++++++++++++++++++-------------
 net/sunrpc/xprtrdma/xprt_rdma.h |    4 +++
 2 files changed, 42 insertions(+), 18 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index c7d5281..b8caee9 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -162,14 +162,23 @@ rpcrdma_sendcq_process_wc(struct ib_wc *wc)
 }
 
 static int
-rpcrdma_sendcq_poll(struct ib_cq *cq)
+rpcrdma_sendcq_poll(struct ib_cq *cq, struct rpcrdma_ep *ep)
 {
-	struct ib_wc wc;
-	int rc;
+	struct ib_wc *wcs;
+	int count, rc;
 
-	while ((rc = ib_poll_cq(cq, 1, &wc)) == 1)
-		rpcrdma_sendcq_process_wc(&wc);
-	return rc;
+	do {
+		wcs = ep->rep_send_wcs;
+
+		rc = ib_poll_cq(cq, RPCRDMA_POLLSIZE, wcs);
+		if (rc <= 0)
+			return rc;
+
+		count = rc;
+		while (count-- > 0)
+			rpcrdma_sendcq_process_wc(wcs++);
+	} while (rc == RPCRDMA_POLLSIZE);
+	return 0;
 }
 
 /*
@@ -183,9 +192,10 @@ rpcrdma_sendcq_poll(struct ib_cq *cq)
 static void
 rpcrdma_sendcq_upcall(struct ib_cq *cq, void *cq_context)
 {
+	struct rpcrdma_ep *ep = (struct rpcrdma_ep *)cq_context;
 	int rc;
 
-	rc = rpcrdma_sendcq_poll(cq);
+	rc = rpcrdma_sendcq_poll(cq, ep);
 	if (rc) {
 		dprintk("RPC:       %s: ib_poll_cq failed: %i\n",
 			__func__, rc);
@@ -202,7 +212,7 @@ rpcrdma_sendcq_upcall(struct ib_cq *cq, void *cq_context)
 		return;
 	}
 
-	rpcrdma_sendcq_poll(cq);
+	rpcrdma_sendcq_poll(cq, ep);
 }
 
 static void
@@ -241,14 +251,23 @@ out_schedule:
 }
 
 static int
-rpcrdma_recvcq_poll(struct ib_cq *cq)
+rpcrdma_recvcq_poll(struct ib_cq *cq, struct rpcrdma_ep *ep)
 {
-	struct ib_wc wc;
-	int rc;
+	struct ib_wc *wcs;
+	int count, rc;
 
-	while ((rc = ib_poll_cq(cq, 1, &wc)) == 1)
-		rpcrdma_recvcq_process_wc(&wc);
-	return rc;
+	do {
+		wcs = ep->rep_recv_wcs;
+
+		rc = ib_poll_cq(cq, RPCRDMA_POLLSIZE, wcs);
+		if (rc <= 0)
+			return rc;
+
+		count = rc;
+		while (count-- > 0)
+			rpcrdma_recvcq_process_wc(wcs++);
+	} while (rc == RPCRDMA_POLLSIZE);
+	return 0;
 }
 
 /*
@@ -266,9 +285,10 @@ rpcrdma_recvcq_poll(struct ib_cq *cq)
 static void
 rpcrdma_recvcq_upcall(struct ib_cq *cq, void *cq_context)
 {
+	struct rpcrdma_ep *ep = (struct rpcrdma_ep *)cq_context;
 	int rc;
 
-	rc = rpcrdma_recvcq_poll(cq);
+	rc = rpcrdma_recvcq_poll(cq, ep);
 	if (rc) {
 		dprintk("RPC:       %s: ib_poll_cq failed: %i\n",
 			__func__, rc);
@@ -285,7 +305,7 @@ rpcrdma_recvcq_upcall(struct ib_cq *cq, void *cq_context)
 		return;
 	}
 
-	rpcrdma_recvcq_poll(cq);
+	rpcrdma_recvcq_poll(cq, ep);
 }
 
 #ifdef RPC_DEBUG
@@ -721,7 +741,7 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
 	INIT_DELAYED_WORK(&ep->rep_connect_worker, rpcrdma_connect_worker);
 
 	sendcq = ib_create_cq(ia->ri_id->device, rpcrdma_sendcq_upcall,
-				  rpcrdma_cq_async_error_upcall, NULL,
+				  rpcrdma_cq_async_error_upcall, ep,
 				  ep->rep_attr.cap.max_send_wr + 1, 0);
 	if (IS_ERR(sendcq)) {
 		rc = PTR_ERR(sendcq);
@@ -738,7 +758,7 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
 	}
 
 	recvcq = ib_create_cq(ia->ri_id->device, rpcrdma_recvcq_upcall,
-				  rpcrdma_cq_async_error_upcall, NULL,
+				  rpcrdma_cq_async_error_upcall, ep,
 				  ep->rep_attr.cap.max_recv_wr + 1, 0);
 	if (IS_ERR(recvcq)) {
 		rc = PTR_ERR(recvcq);
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 334ab6e..cb4c882 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -74,6 +74,8 @@ struct rpcrdma_ia {
  * RDMA Endpoint -- one per transport instance
  */
 
+#define RPCRDMA_POLLSIZE	(16)
+
 struct rpcrdma_ep {
 	atomic_t		rep_cqcount;
 	int			rep_cqinit;
@@ -88,6 +90,8 @@ struct rpcrdma_ep {
 	struct rdma_conn_param	rep_remote_cma;
 	struct sockaddr_storage	rep_remote_addr;
 	struct delayed_work	rep_connect_worker;
+	struct ib_wc		rep_send_wcs[RPCRDMA_POLLSIZE];
+	struct ib_wc		rep_recv_wcs[RPCRDMA_POLLSIZE];
 };
 
 #define INIT_CQCOUNT(ep) atomic_set(&(ep)->rep_cqcount, (ep)->rep_cqinit)


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH V3 13/17] xprtrdma: Reduce calls to ib_poll_cq() in completion handlers
@ 2014-04-30 19:31     ` Chuck Lever
  0 siblings, 0 replies; 60+ messages in thread
From: Chuck Lever @ 2014-04-30 19:31 UTC (permalink / raw)
  To: linux-nfs, linux-rdma; +Cc: Anna.Schumaker

Change the completion handlers to grab up to 16 items per
ib_poll_cq() call. No extra ib_poll_cq() is needed if fewer than 16
items are returned.
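
Since ib_poll_cq() returns the number of completions actually
reaped, a return value smaller than the array size proves the CQ
was drained, and no extra call is needed. A sketch of the loop
shape (process_wc() stands in for the per-completion handler; the
patch keeps the ib_wc array in struct rpcrdma_ep rather than on
the stack):

	struct ib_wc wcs[RPCRDMA_POLLSIZE];
	int i, rc;

	do {
		rc = ib_poll_cq(cq, RPCRDMA_POLLSIZE, wcs);
		if (rc <= 0)
			return rc;
		for (i = 0; i < rc; i++)
			process_wc(&wcs[i]);
	} while (rc == RPCRDMA_POLLSIZE);
	return 0;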

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---

 net/sunrpc/xprtrdma/verbs.c     |   56 ++++++++++++++++++++++++++-------------
 net/sunrpc/xprtrdma/xprt_rdma.h |    4 +++
 2 files changed, 42 insertions(+), 18 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index c7d5281..b8caee9 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -162,14 +162,23 @@ rpcrdma_sendcq_process_wc(struct ib_wc *wc)
 }
 
 static int
-rpcrdma_sendcq_poll(struct ib_cq *cq)
+rpcrdma_sendcq_poll(struct ib_cq *cq, struct rpcrdma_ep *ep)
 {
-	struct ib_wc wc;
-	int rc;
+	struct ib_wc *wcs;
+	int count, rc;
 
-	while ((rc = ib_poll_cq(cq, 1, &wc)) == 1)
-		rpcrdma_sendcq_process_wc(&wc);
-	return rc;
+	do {
+		wcs = ep->rep_send_wcs;
+
+		rc = ib_poll_cq(cq, RPCRDMA_POLLSIZE, wcs);
+		if (rc <= 0)
+			return rc;
+
+		count = rc;
+		while (count-- > 0)
+			rpcrdma_sendcq_process_wc(wcs++);
+	} while (rc == RPCRDMA_POLLSIZE);
+	return 0;
 }
 
 /*
@@ -183,9 +192,10 @@ rpcrdma_sendcq_poll(struct ib_cq *cq)
 static void
 rpcrdma_sendcq_upcall(struct ib_cq *cq, void *cq_context)
 {
+	struct rpcrdma_ep *ep = (struct rpcrdma_ep *)cq_context;
 	int rc;
 
-	rc = rpcrdma_sendcq_poll(cq);
+	rc = rpcrdma_sendcq_poll(cq, ep);
 	if (rc) {
 		dprintk("RPC:       %s: ib_poll_cq failed: %i\n",
 			__func__, rc);
@@ -202,7 +212,7 @@ rpcrdma_sendcq_upcall(struct ib_cq *cq, void *cq_context)
 		return;
 	}
 
-	rpcrdma_sendcq_poll(cq);
+	rpcrdma_sendcq_poll(cq, ep);
 }
 
 static void
@@ -241,14 +251,23 @@ out_schedule:
 }
 
 static int
-rpcrdma_recvcq_poll(struct ib_cq *cq)
+rpcrdma_recvcq_poll(struct ib_cq *cq, struct rpcrdma_ep *ep)
 {
-	struct ib_wc wc;
-	int rc;
+	struct ib_wc *wcs;
+	int count, rc;
 
-	while ((rc = ib_poll_cq(cq, 1, &wc)) == 1)
-		rpcrdma_recvcq_process_wc(&wc);
-	return rc;
+	do {
+		wcs = ep->rep_recv_wcs;
+
+		rc = ib_poll_cq(cq, RPCRDMA_POLLSIZE, wcs);
+		if (rc <= 0)
+			return rc;
+
+		count = rc;
+		while (count-- > 0)
+			rpcrdma_recvcq_process_wc(wcs++);
+	} while (rc == RPCRDMA_POLLSIZE);
+	return 0;
 }
 
 /*
@@ -266,9 +285,10 @@ rpcrdma_recvcq_poll(struct ib_cq *cq)
 static void
 rpcrdma_recvcq_upcall(struct ib_cq *cq, void *cq_context)
 {
+	struct rpcrdma_ep *ep = (struct rpcrdma_ep *)cq_context;
 	int rc;
 
-	rc = rpcrdma_recvcq_poll(cq);
+	rc = rpcrdma_recvcq_poll(cq, ep);
 	if (rc) {
 		dprintk("RPC:       %s: ib_poll_cq failed: %i\n",
 			__func__, rc);
@@ -285,7 +305,7 @@ rpcrdma_recvcq_upcall(struct ib_cq *cq, void *cq_context)
 		return;
 	}
 
-	rpcrdma_recvcq_poll(cq);
+	rpcrdma_recvcq_poll(cq, ep);
 }
 
 #ifdef RPC_DEBUG
@@ -721,7 +741,7 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
 	INIT_DELAYED_WORK(&ep->rep_connect_worker, rpcrdma_connect_worker);
 
 	sendcq = ib_create_cq(ia->ri_id->device, rpcrdma_sendcq_upcall,
-				  rpcrdma_cq_async_error_upcall, NULL,
+				  rpcrdma_cq_async_error_upcall, ep,
 				  ep->rep_attr.cap.max_send_wr + 1, 0);
 	if (IS_ERR(sendcq)) {
 		rc = PTR_ERR(sendcq);
@@ -738,7 +758,7 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
 	}
 
 	recvcq = ib_create_cq(ia->ri_id->device, rpcrdma_recvcq_upcall,
-				  rpcrdma_cq_async_error_upcall, NULL,
+				  rpcrdma_cq_async_error_upcall, ep,
 				  ep->rep_attr.cap.max_recv_wr + 1, 0);
 	if (IS_ERR(recvcq)) {
 		rc = PTR_ERR(recvcq);
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 334ab6e..cb4c882 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -74,6 +74,8 @@ struct rpcrdma_ia {
  * RDMA Endpoint -- one per transport instance
  */
 
+#define RPCRDMA_POLLSIZE	(16)
+
 struct rpcrdma_ep {
 	atomic_t		rep_cqcount;
 	int			rep_cqinit;
@@ -88,6 +90,8 @@ struct rpcrdma_ep {
 	struct rdma_conn_param	rep_remote_cma;
 	struct sockaddr_storage	rep_remote_addr;
 	struct delayed_work	rep_connect_worker;
+	struct ib_wc		rep_send_wcs[RPCRDMA_POLLSIZE];
+	struct ib_wc		rep_recv_wcs[RPCRDMA_POLLSIZE];
 };
 
 #define INIT_CQCOUNT(ep) atomic_set(&(ep)->rep_cqcount, (ep)->rep_cqinit)


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH V3 14/17] xprtrdma: Limit work done by completion handler
  2014-04-30 19:29 ` Chuck Lever
@ 2014-04-30 19:31     ` Chuck Lever
  -1 siblings, 0 replies; 60+ messages in thread
From: Chuck Lever @ 2014-04-30 19:31 UTC (permalink / raw)
  To: linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Anna.Schumaker-HgOvQuBEEgTQT0dZR+AlfA

Sagi Grimberg <sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> points out that a steady
stream of CQ events could starve other work because of the
boundless polling loop in rpcrdma_{send,recv}_poll().

Instead of a (potentially infinite) while loop, return after
collecting a budgeted number of completions.

Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Acked-by: Sagi Grimberg <sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
---

 net/sunrpc/xprtrdma/verbs.c     |   10 ++++++----
 net/sunrpc/xprtrdma/xprt_rdma.h |    1 +
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index b8caee9..1d08366 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -165,8 +165,9 @@ static int
 rpcrdma_sendcq_poll(struct ib_cq *cq, struct rpcrdma_ep *ep)
 {
 	struct ib_wc *wcs;
-	int count, rc;
+	int budget, count, rc;
 
+	budget = RPCRDMA_WC_BUDGET / RPCRDMA_POLLSIZE;
 	do {
 		wcs = ep->rep_send_wcs;
 
@@ -177,7 +178,7 @@ rpcrdma_sendcq_poll(struct ib_cq *cq, struct rpcrdma_ep *ep)
 		count = rc;
 		while (count-- > 0)
 			rpcrdma_sendcq_process_wc(wcs++);
-	} while (rc == RPCRDMA_POLLSIZE);
+	} while (rc == RPCRDMA_POLLSIZE && --budget);
 	return 0;
 }
 
@@ -254,8 +255,9 @@ static int
 rpcrdma_recvcq_poll(struct ib_cq *cq, struct rpcrdma_ep *ep)
 {
 	struct ib_wc *wcs;
-	int count, rc;
+	int budget, count, rc;
 
+	budget = RPCRDMA_WC_BUDGET / RPCRDMA_POLLSIZE;
 	do {
 		wcs = ep->rep_recv_wcs;
 
@@ -266,7 +268,7 @@ rpcrdma_recvcq_poll(struct ib_cq *cq, struct rpcrdma_ep *ep)
 		count = rc;
 		while (count-- > 0)
 			rpcrdma_recvcq_process_wc(wcs++);
-	} while (rc == RPCRDMA_POLLSIZE);
+	} while (rc == RPCRDMA_POLLSIZE && --budget);
 	return 0;
 }
 
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index cb4c882..0c3b88e 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -74,6 +74,7 @@ struct rpcrdma_ia {
  * RDMA Endpoint -- one per transport instance
  */
 
+#define RPCRDMA_WC_BUDGET	(128)
 #define RPCRDMA_POLLSIZE	(16)
 
 struct rpcrdma_ep {


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH V3 14/17] xprtrdma: Limit work done by completion handler
@ 2014-04-30 19:31     ` Chuck Lever
  0 siblings, 0 replies; 60+ messages in thread
From: Chuck Lever @ 2014-04-30 19:31 UTC (permalink / raw)
  To: linux-nfs, linux-rdma; +Cc: Anna.Schumaker

Sagi Grimberg <sagig@dev.mellanox.co.il> points out that a steady
stream of CQ events could starve other work because of the
boundless polling loop in rpcrdma_{send,recv}_poll().

Instead of a (potentially infinite) while loop, return after
collecting a budgeted number of completions.
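
For concreteness: with the constants introduced below, each upcall
makes at most RPCRDMA_WC_BUDGET / RPCRDMA_POLLSIZE = 128 / 16 = 8
passes through the polling loop, handling no more than 128
completions before returning.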

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
---

 net/sunrpc/xprtrdma/verbs.c     |   10 ++++++----
 net/sunrpc/xprtrdma/xprt_rdma.h |    1 +
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index b8caee9..1d08366 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -165,8 +165,9 @@ static int
 rpcrdma_sendcq_poll(struct ib_cq *cq, struct rpcrdma_ep *ep)
 {
 	struct ib_wc *wcs;
-	int count, rc;
+	int budget, count, rc;
 
+	budget = RPCRDMA_WC_BUDGET / RPCRDMA_POLLSIZE;
 	do {
 		wcs = ep->rep_send_wcs;
 
@@ -177,7 +178,7 @@ rpcrdma_sendcq_poll(struct ib_cq *cq, struct rpcrdma_ep *ep)
 		count = rc;
 		while (count-- > 0)
 			rpcrdma_sendcq_process_wc(wcs++);
-	} while (rc == RPCRDMA_POLLSIZE);
+	} while (rc == RPCRDMA_POLLSIZE && --budget);
 	return 0;
 }
 
@@ -254,8 +255,9 @@ static int
 rpcrdma_recvcq_poll(struct ib_cq *cq, struct rpcrdma_ep *ep)
 {
 	struct ib_wc *wcs;
-	int count, rc;
+	int budget, count, rc;
 
+	budget = RPCRDMA_WC_BUDGET / RPCRDMA_POLLSIZE;
 	do {
 		wcs = ep->rep_recv_wcs;
 
@@ -266,7 +268,7 @@ rpcrdma_recvcq_poll(struct ib_cq *cq, struct rpcrdma_ep *ep)
 		count = rc;
 		while (count-- > 0)
 			rpcrdma_recvcq_process_wc(wcs++);
-	} while (rc == RPCRDMA_POLLSIZE);
+	} while (rc == RPCRDMA_POLLSIZE && --budget);
 	return 0;
 }
 
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index cb4c882..0c3b88e 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -74,6 +74,7 @@ struct rpcrdma_ia {
  * RDMA Endpoint -- one per transport instance
  */
 
+#define RPCRDMA_WC_BUDGET	(128)
 #define RPCRDMA_POLLSIZE	(16)
 
 struct rpcrdma_ep {


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH V3 15/17] xprtrdma: Reduce the number of hardway buffer allocations
  2014-04-30 19:29 ` Chuck Lever
@ 2014-04-30 19:31     ` Chuck Lever
  -1 siblings, 0 replies; 60+ messages in thread
From: Chuck Lever @ 2014-04-30 19:31 UTC (permalink / raw)
  To: linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Anna.Schumaker-HgOvQuBEEgTQT0dZR+AlfA

While marshaling an RPC/RDMA request, the inline_{rsize,wsize}
settings determine whether an inline request is used, or whether
read or write chunk lists are built. The current default value of
these settings is 1024. Any RPC request smaller than 1024 bytes is
sent to the NFS server completely inline.

rpcrdma_buffer_create() allocates and pre-registers a set of RPC
buffers for each transport instance, also based on the inline rsize
and wsize settings.

RPC/RDMA requests and replies are built in these buffers. However,
if an RPC/RDMA request is expected to be larger than 1024, a buffer
has to be allocated and registered for that RPC, and deregistered
and released when the RPC is complete. This is known as a
"hardway allocation."

Since the introduction of NFSv4, the size of RPC requests has become
larger, and hardway allocations are thus more frequent. Hardway
allocations are significant overhead, and they waste the existing
RPC buffers pre-allocated by rpcrdma_buffer_create().

We'd like fewer hardway allocations.

Increasing the size of the pre-registered buffers is the most direct
way to do this. However, a blanket increase of the inline thresholds
has interoperability consequences.

On my 64-bit system, rpcrdma_buffer_create() requests roughly 7000
bytes for each RPC request buffer via kmalloc(). kmalloc() satisfies
such a request with an 8192-byte piece of memory, so nearly 1200
bytes per buffer are lost to internal fragmentation: allocated, but
never used.

So let's round up the size of the pre-allocated buffers, and make
use of the unused space in the kmalloc'd memory.

This change reduces the amount of hardway allocated memory for an
NFSv4 general connectathon run from 1322092 bytes to 9472 bytes, a
99% reduction.

Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Tested-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
---

 net/sunrpc/xprtrdma/verbs.c |   25 +++++++++++++------------
 1 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 1d08366..c80995a 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -50,6 +50,7 @@
 #include <linux/interrupt.h>
 #include <linux/pci.h>	/* for Tavor hack below */
 #include <linux/slab.h>
+#include <asm/bitops.h>
 
 #include "xprt_rdma.h"
 
@@ -1005,7 +1006,7 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
 	struct rpcrdma_ia *ia, struct rpcrdma_create_data_internal *cdata)
 {
 	char *p;
-	size_t len;
+	size_t len, rlen, wlen;
 	int i, rc;
 	struct rpcrdma_mw *r;
 
@@ -1120,16 +1121,16 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
 	 * Allocate/init the request/reply buffers. Doing this
 	 * using kmalloc for now -- one for each buf.
 	 */
+	wlen = 1 << fls(cdata->inline_wsize + sizeof(struct rpcrdma_req));
+	rlen = 1 << fls(cdata->inline_rsize + sizeof(struct rpcrdma_rep));
+	dprintk("RPC:       %s: wlen = %zu, rlen = %zu\n",
+		__func__, wlen, rlen);
+
 	for (i = 0; i < buf->rb_max_requests; i++) {
 		struct rpcrdma_req *req;
 		struct rpcrdma_rep *rep;
 
-		len = cdata->inline_wsize + sizeof(struct rpcrdma_req);
-		/* RPC layer requests *double* size + 1K RPC_SLACK_SPACE! */
-		/* Typical ~2400b, so rounding up saves work later */
-		if (len < 4096)
-			len = 4096;
-		req = kmalloc(len, GFP_KERNEL);
+		req = kmalloc(wlen, GFP_KERNEL);
 		if (req == NULL) {
 			dprintk("RPC:       %s: request buffer %d alloc"
 				" failed\n", __func__, i);
@@ -1141,16 +1142,16 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
 		buf->rb_send_bufs[i]->rl_buffer = buf;
 
 		rc = rpcrdma_register_internal(ia, req->rl_base,
-				len - offsetof(struct rpcrdma_req, rl_base),
+				wlen - offsetof(struct rpcrdma_req, rl_base),
 				&buf->rb_send_bufs[i]->rl_handle,
 				&buf->rb_send_bufs[i]->rl_iov);
 		if (rc)
 			goto out;
 
-		buf->rb_send_bufs[i]->rl_size = len-sizeof(struct rpcrdma_req);
+		buf->rb_send_bufs[i]->rl_size = wlen -
+						sizeof(struct rpcrdma_req);
 
-		len = cdata->inline_rsize + sizeof(struct rpcrdma_rep);
-		rep = kmalloc(len, GFP_KERNEL);
+		rep = kmalloc(rlen, GFP_KERNEL);
 		if (rep == NULL) {
 			dprintk("RPC:       %s: reply buffer %d alloc failed\n",
 				__func__, i);
@@ -1162,7 +1163,7 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
 		buf->rb_recv_bufs[i]->rr_buffer = buf;
 
 		rc = rpcrdma_register_internal(ia, rep->rr_base,
-				len - offsetof(struct rpcrdma_rep, rr_base),
+				rlen - offsetof(struct rpcrdma_rep, rr_base),
 				&buf->rb_recv_bufs[i]->rr_handle,
 				&buf->rb_recv_bufs[i]->rr_iov);
 		if (rc)


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH V3 15/17] xprtrdma: Reduce the number of hardway buffer allocations
@ 2014-04-30 19:31     ` Chuck Lever
  0 siblings, 0 replies; 60+ messages in thread
From: Chuck Lever @ 2014-04-30 19:31 UTC (permalink / raw)
  To: linux-nfs, linux-rdma; +Cc: Anna.Schumaker

While marshaling an RPC/RDMA request, the inline_{rsize,wsize}
settings determine whether an inline request is used, or whether
read or write chunk lists are built. The current default value of
these settings is 1024. Any RPC request smaller than 1024 bytes is
sent to the NFS server completely inline.

rpcrdma_buffer_create() allocates and pre-registers a set of RPC
buffers for each transport instance, also based on the inline rsize
and wsize settings.

RPC/RDMA requests and replies are built in these buffers. However,
if an RPC/RDMA request is expected to be larger than 1024, a buffer
has to be allocated and registered for that RPC, and deregistered
and released when the RPC is complete. This is known as a
"hardway allocation."

Since the introduction of NFSv4, the size of RPC requests has become
larger, and hardway allocations are thus more frequent. Hardway
allocations are significant overhead, and they waste the existing
RPC buffers pre-allocated by rpcrdma_buffer_create().

We'd like fewer hardway allocations.

Increasing the size of the pre-registered buffers is the most direct
way to do this. However, a blanket increase of the inline thresholds
has interoperability consequences.

On my 64-bit system, rpcrdma_buffer_create() requests roughly 7000
bytes for each RPC request buffer via kmalloc(). kmalloc() satisfies
such a request with an 8192-byte piece of memory, so nearly 1200
bytes per buffer are lost to internal fragmentation: allocated, but
never used.

So let's round up the size of the pre-allocated buffers, and make
use of the unused space in the kmalloc'd memory.

This change reduces the amount of hardway allocated memory for an
NFSv4 general connectathon run from 1322092 bytes to 9472 bytes, a
99% reduction.
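
As a worked example of the rounding: fls() returns the index of the
highest set bit, so 1 << fls(x) is the next power of two above x.
For a roughly 7000-byte request buffer, fls(7000) is 13, and
1 << 13 is 8192 -- the same size class kmalloc() hands back anyway,
so the former slack becomes usable buffer space.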

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
---

 net/sunrpc/xprtrdma/verbs.c |   25 +++++++++++++------------
 1 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 1d08366..c80995a 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -50,6 +50,7 @@
 #include <linux/interrupt.h>
 #include <linux/pci.h>	/* for Tavor hack below */
 #include <linux/slab.h>
+#include <asm/bitops.h>
 
 #include "xprt_rdma.h"
 
@@ -1005,7 +1006,7 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
 	struct rpcrdma_ia *ia, struct rpcrdma_create_data_internal *cdata)
 {
 	char *p;
-	size_t len;
+	size_t len, rlen, wlen;
 	int i, rc;
 	struct rpcrdma_mw *r;
 
@@ -1120,16 +1121,16 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
 	 * Allocate/init the request/reply buffers. Doing this
 	 * using kmalloc for now -- one for each buf.
 	 */
+	wlen = 1 << fls(cdata->inline_wsize + sizeof(struct rpcrdma_req));
+	rlen = 1 << fls(cdata->inline_rsize + sizeof(struct rpcrdma_rep));
+	dprintk("RPC:       %s: wlen = %zu, rlen = %zu\n",
+		__func__, wlen, rlen);
+
 	for (i = 0; i < buf->rb_max_requests; i++) {
 		struct rpcrdma_req *req;
 		struct rpcrdma_rep *rep;
 
-		len = cdata->inline_wsize + sizeof(struct rpcrdma_req);
-		/* RPC layer requests *double* size + 1K RPC_SLACK_SPACE! */
-		/* Typical ~2400b, so rounding up saves work later */
-		if (len < 4096)
-			len = 4096;
-		req = kmalloc(len, GFP_KERNEL);
+		req = kmalloc(wlen, GFP_KERNEL);
 		if (req == NULL) {
 			dprintk("RPC:       %s: request buffer %d alloc"
 				" failed\n", __func__, i);
@@ -1141,16 +1142,16 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
 		buf->rb_send_bufs[i]->rl_buffer = buf;
 
 		rc = rpcrdma_register_internal(ia, req->rl_base,
-				len - offsetof(struct rpcrdma_req, rl_base),
+				wlen - offsetof(struct rpcrdma_req, rl_base),
 				&buf->rb_send_bufs[i]->rl_handle,
 				&buf->rb_send_bufs[i]->rl_iov);
 		if (rc)
 			goto out;
 
-		buf->rb_send_bufs[i]->rl_size = len-sizeof(struct rpcrdma_req);
+		buf->rb_send_bufs[i]->rl_size = wlen -
+						sizeof(struct rpcrdma_req);
 
-		len = cdata->inline_rsize + sizeof(struct rpcrdma_rep);
-		rep = kmalloc(len, GFP_KERNEL);
+		rep = kmalloc(rlen, GFP_KERNEL);
 		if (rep == NULL) {
 			dprintk("RPC:       %s: reply buffer %d alloc failed\n",
 				__func__, i);
@@ -1162,7 +1163,7 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
 		buf->rb_recv_bufs[i]->rr_buffer = buf;
 
 		rc = rpcrdma_register_internal(ia, rep->rr_base,
-				len - offsetof(struct rpcrdma_rep, rr_base),
+				rlen - offsetof(struct rpcrdma_rep, rr_base),
 				&buf->rb_recv_bufs[i]->rr_handle,
 				&buf->rb_recv_bufs[i]->rr_iov);
 		if (rc)


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH V3 16/17] xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting
  2014-04-30 19:29 ` Chuck Lever
@ 2014-04-30 19:31     ` Chuck Lever
  -1 siblings, 0 replies; 60+ messages in thread
From: Chuck Lever @ 2014-04-30 19:31 UTC (permalink / raw)
  To: linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Anna.Schumaker-HgOvQuBEEgTQT0dZR+AlfA

Devesh Sharma <Devesh.Sharma-iH1Dq9VlAzfQT0dZR+AlfA@public.gmane.org> reports that after a
disconnect, his HCA is failing to create a fresh QP, leaving
ia->ri_id->qp set to NULL. But xprtrdma still allows RPCs to
wake up and post LOCAL_INV as they exit, causing an oops.

rpcrdma_ep_connect() is allowing the wake-up by leaking the QP
creation error code (-EPERM in this case) to the RPC client's
generic layer. xprt_connect_status() does not recognize -EPERM, so
it kills pending RPC tasks immediately rather than retrying the
connect.

Re-arrange the QP creation logic so that when it fails on reconnect,
it leaves ->qp with the old QP rather than NULL.  If pending RPC
tasks wake and exit, LOCAL_INV work requests will flush rather than
oops.

On initial connect, leaving ->qp == NULL is OK, since there are no
pending RPCs that might use ->qp. But be sure not to try to destroy
a NULL QP when rpcrdma_ep_connect() is retried.

Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---

 net/sunrpc/xprtrdma/verbs.c |   29 ++++++++++++++++++++---------
 1 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index c80995a..54edf2a 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -867,6 +867,7 @@ rpcrdma_ep_connect(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
 	if (ep->rep_connected != 0) {
 		struct rpcrdma_xprt *xprt;
 retry:
+		dprintk("RPC:       %s: reconnecting...\n", __func__);
 		rc = rpcrdma_ep_disconnect(ep, ia);
 		if (rc && rc != -ENOTCONN)
 			dprintk("RPC:       %s: rpcrdma_ep_disconnect"
@@ -879,7 +880,7 @@ retry:
 		id = rpcrdma_create_id(xprt, ia,
 				(struct sockaddr *)&xprt->rx_data.addr);
 		if (IS_ERR(id)) {
-			rc = PTR_ERR(id);
+			rc = -EHOSTUNREACH;
 			goto out;
 		}
 		/* TEMP TEMP TEMP - fail if new device:
@@ -893,20 +894,30 @@ retry:
 			printk("RPC:       %s: can't reconnect on "
 				"different device!\n", __func__);
 			rdma_destroy_id(id);
-			rc = -ENETDOWN;
+			rc = -ENETUNREACH;
 			goto out;
 		}
 		/* END TEMP */
+		rc = rdma_create_qp(id, ia->ri_pd, &ep->rep_attr);
+		if (rc) {
+			dprintk("RPC:       %s: rdma_create_qp failed %i\n",
+				__func__, rc);
+			rdma_destroy_id(id);
+			rc = -ENETUNREACH;
+			goto out;
+		}
 		rdma_destroy_qp(ia->ri_id);
 		rdma_destroy_id(ia->ri_id);
 		ia->ri_id = id;
-	}
-
-	rc = rdma_create_qp(ia->ri_id, ia->ri_pd, &ep->rep_attr);
-	if (rc) {
-		dprintk("RPC:       %s: rdma_create_qp failed %i\n",
-			__func__, rc);
-		goto out;
+	} else {
+		dprintk("RPC:       %s: connecting...\n", __func__);
+		rc = rdma_create_qp(ia->ri_id, ia->ri_pd, &ep->rep_attr);
+		if (rc) {
+			dprintk("RPC:       %s: rdma_create_qp failed %i\n",
+				__func__, rc);
+			/* do not update ep->rep_connected */
+			return -ENETUNREACH;
+		}
 	}
 
 /* XXX Tavor device performs badly with 2K MTU! */


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH V3 16/17] xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting
@ 2014-04-30 19:31     ` Chuck Lever
  0 siblings, 0 replies; 60+ messages in thread
From: Chuck Lever @ 2014-04-30 19:31 UTC (permalink / raw)
  To: linux-nfs, linux-rdma; +Cc: Anna.Schumaker

Devesh Sharma <Devesh.Sharma@Emulex.Com> reports that after a
disconnect, his HCA is failing to create a fresh QP, leaving
ia->ri_id->qp set to NULL. But xprtrdma still allows RPCs to
wake up and post LOCAL_INV as they exit, causing an oops.

rpcrdma_ep_connect() is allowing the wake-up by leaking the QP
creation error code (-EPERM in this case) to the RPC client's
generic layer. xprt_connect_status() does not recognize -EPERM, so
it kills pending RPC tasks immediately rather than retrying the
connect.

Re-arrange the QP creation logic so that when it fails on reconnect,
it leaves ->qp with the old QP rather than NULL.  If pending RPC
tasks wake and exit, LOCAL_INV work requests will flush rather than
oops.

On initial connect, leaving ->qp == NULL is OK, since there are no
pending RPCs that might use ->qp. But be sure not to try to destroy
a NULL QP when rpcrdma_ep_connect() is retried.
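
The resulting ordering on the reconnect path, in outline (a sketch
of the logic only; the full change is below):

	rc = rdma_create_qp(id, ia->ri_pd, &ep->rep_attr);
	if (rc) {
		rdma_destroy_id(id);	/* old ->qp stays intact */
		goto out;
	}
	rdma_destroy_qp(ia->ri_id);	/* retire the old QP only now */
	rdma_destroy_id(ia->ri_id);
	ia->ri_id = id;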

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---

 net/sunrpc/xprtrdma/verbs.c |   29 ++++++++++++++++++++---------
 1 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index c80995a..54edf2a 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -867,6 +867,7 @@ rpcrdma_ep_connect(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
 	if (ep->rep_connected != 0) {
 		struct rpcrdma_xprt *xprt;
 retry:
+		dprintk("RPC:       %s: reconnecting...\n", __func__);
 		rc = rpcrdma_ep_disconnect(ep, ia);
 		if (rc && rc != -ENOTCONN)
 			dprintk("RPC:       %s: rpcrdma_ep_disconnect"
@@ -879,7 +880,7 @@ retry:
 		id = rpcrdma_create_id(xprt, ia,
 				(struct sockaddr *)&xprt->rx_data.addr);
 		if (IS_ERR(id)) {
-			rc = PTR_ERR(id);
+			rc = -EHOSTUNREACH;
 			goto out;
 		}
 		/* TEMP TEMP TEMP - fail if new device:
@@ -893,20 +894,30 @@ retry:
 			printk("RPC:       %s: can't reconnect on "
 				"different device!\n", __func__);
 			rdma_destroy_id(id);
-			rc = -ENETDOWN;
+			rc = -ENETUNREACH;
 			goto out;
 		}
 		/* END TEMP */
+		rc = rdma_create_qp(id, ia->ri_pd, &ep->rep_attr);
+		if (rc) {
+			dprintk("RPC:       %s: rdma_create_qp failed %i\n",
+				__func__, rc);
+			rdma_destroy_id(id);
+			rc = -ENETUNREACH;
+			goto out;
+		}
 		rdma_destroy_qp(ia->ri_id);
 		rdma_destroy_id(ia->ri_id);
 		ia->ri_id = id;
-	}
-
-	rc = rdma_create_qp(ia->ri_id, ia->ri_pd, &ep->rep_attr);
-	if (rc) {
-		dprintk("RPC:       %s: rdma_create_qp failed %i\n",
-			__func__, rc);
-		goto out;
+	} else {
+		dprintk("RPC:       %s: connecting...\n", __func__);
+		rc = rdma_create_qp(ia->ri_id, ia->ri_pd, &ep->rep_attr);
+		if (rc) {
+			dprintk("RPC:       %s: rdma_create_qp failed %i\n",
+				__func__, rc);
+			/* do not update ep->rep_connected */
+			return -ENETUNREACH;
+		}
 	}
 
 /* XXX Tavor device performs badly with 2K MTU! */


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH V3 17/17] xprtrdma: Remove Tavor MTU setting
  2014-04-30 19:29 ` Chuck Lever
@ 2014-04-30 19:31     ` Chuck Lever
  -1 siblings, 0 replies; 60+ messages in thread
From: Chuck Lever @ 2014-04-30 19:31 UTC (permalink / raw)
  To: linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Anna.Schumaker-HgOvQuBEEgTQT0dZR+AlfA

Clean up.  Remove HCA-specific clutter in xprtrdma, which is
supposed to be device-independent.

Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---

 net/sunrpc/xprtrdma/verbs.c |   14 --------------
 1 files changed, 0 insertions(+), 14 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 54edf2a..515dfc1 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -48,7 +48,6 @@
  */
 
 #include <linux/interrupt.h>
-#include <linux/pci.h>	/* for Tavor hack below */
 #include <linux/slab.h>
 #include <asm/bitops.h>
 
@@ -920,19 +919,6 @@ retry:
 		}
 	}
 
-/* XXX Tavor device performs badly with 2K MTU! */
-if (strnicmp(ia->ri_id->device->dma_device->bus->name, "pci", 3) == 0) {
-	struct pci_dev *pcid = to_pci_dev(ia->ri_id->device->dma_device);
-	if (pcid->device == PCI_DEVICE_ID_MELLANOX_TAVOR &&
-	    (pcid->vendor == PCI_VENDOR_ID_MELLANOX ||
-	     pcid->vendor == PCI_VENDOR_ID_TOPSPIN)) {
-		struct ib_qp_attr attr = {
-			.path_mtu = IB_MTU_1024
-		};
-		rc = ib_modify_qp(ia->ri_id->qp, &attr, IB_QP_PATH_MTU);
-	}
-}
-
 	ep->rep_connected = 0;
 
 	rc = rdma_connect(ia->ri_id, &ep->rep_remote_cma);


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH V3 17/17] xprtrdma: Remove Tavor MTU setting
@ 2014-04-30 19:31     ` Chuck Lever
  0 siblings, 0 replies; 60+ messages in thread
From: Chuck Lever @ 2014-04-30 19:31 UTC (permalink / raw)
  To: linux-nfs, linux-rdma; +Cc: Anna.Schumaker

Clean up.  Remove HCA-specific clutter in xprtrdma, which is
supposed to be device-independent.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---

 net/sunrpc/xprtrdma/verbs.c |   14 --------------
 1 files changed, 0 insertions(+), 14 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 54edf2a..515dfc1 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -48,7 +48,6 @@
  */
 
 #include <linux/interrupt.h>
-#include <linux/pci.h>	/* for Tavor hack below */
 #include <linux/slab.h>
 #include <asm/bitops.h>
 
@@ -920,19 +919,6 @@ retry:
 		}
 	}
 
-/* XXX Tavor device performs badly with 2K MTU! */
-if (strnicmp(ia->ri_id->device->dma_device->bus->name, "pci", 3) == 0) {
-	struct pci_dev *pcid = to_pci_dev(ia->ri_id->device->dma_device);
-	if (pcid->device == PCI_DEVICE_ID_MELLANOX_TAVOR &&
-	    (pcid->vendor == PCI_VENDOR_ID_MELLANOX ||
-	     pcid->vendor == PCI_VENDOR_ID_TOPSPIN)) {
-		struct ib_qp_attr attr = {
-			.path_mtu = IB_MTU_1024
-		};
-		rc = ib_modify_qp(ia->ri_id->qp, &attr, IB_QP_PATH_MTU);
-	}
-}
-
 	ep->rep_connected = 0;
 
 	rc = rdma_connect(ia->ri_id, &ep->rep_remote_cma);


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH V3 17/17] xprtrdma: Remove Tavor MTU setting
@ 2014-05-01  7:36         ` Hal Rosenstock
  0 siblings, 0 replies; 60+ messages in thread
From: Hal Rosenstock @ 2014-05-01  7:36 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs, linux-rdma, Anna.Schumaker

On 4/30/2014 3:31 PM, Chuck Lever wrote:
> Clean up.  Remove HCA-specific clutter in xprtrdma, which is
> supposed to be device-independent.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> 
>  net/sunrpc/xprtrdma/verbs.c |   14 --------------
>  1 files changed, 0 insertions(+), 14 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 54edf2a..515dfc1 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -48,7 +48,6 @@
>   */
>  
>  #include <linux/interrupt.h>
> -#include <linux/pci.h>	/* for Tavor hack below */
>  #include <linux/slab.h>
>  #include <asm/bitops.h>
>  
> @@ -920,19 +919,6 @@ retry:
>  		}
>  	}
>  
> -/* XXX Tavor device performs badly with 2K MTU! */
> -if (strnicmp(ia->ri_id->device->dma_device->bus->name, "pci", 3) == 0) {
> -	struct pci_dev *pcid = to_pci_dev(ia->ri_id->device->dma_device);
> -	if (pcid->device == PCI_DEVICE_ID_MELLANOX_TAVOR &&
> -	    (pcid->vendor == PCI_VENDOR_ID_MELLANOX ||
> -	     pcid->vendor == PCI_VENDOR_ID_TOPSPIN)) {
> -		struct ib_qp_attr attr = {
> -			.path_mtu = IB_MTU_1024
> -		};
> -		rc = ib_modify_qp(ia->ri_id->qp, &attr, IB_QP_PATH_MTU);

Note that there is an OpenSM option (enable_quirks) to return 1K MTU in SA
PathRecord responses for Tavor, so that can be used for this. The default
setting for enable_quirks is FALSE, so it would need changing.
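
For anyone trying that route, a minimal sketch of the setting; the
config file path is an assumption (it varies by distribution):

    # /etc/opensm/opensm.conf (path is an assumption)
    # Report 1K MTU in SA PathRecord responses for old Tavor HCAs
    enable_quirks TRUE

opensm needs to re-read its configuration for the change to take
effect.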

-- Hal

> -	}
> -}
> -
>  	ep->rep_connected = 0;
>  
>  	rc = rdma_connect(ia->ri_id, &ep->rep_remote_cma);
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH V3 00/17] NFS/RDMA client-side patches
@ 2014-05-02 19:27     ` Doug Ledford
  0 siblings, 0 replies; 60+ messages in thread
From: Doug Ledford @ 2014-05-02 19:27 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Anna Schumaker, linux-nfs, linux-rdma, Roland Dreier, Allen Andrews

----- Original Message -----
> Changes since V2:
> 
>  - Rebased on v3.15-rc3
> 
>  - "enable pad optimization" dropped. Testing showed Linux NFS/RDMA
>    server does not support pad optimization yet.
> 
>  - "ALLPHYSICAL CONFIG" dropped. There is a lack of consensus on
>    this one. Christoph would like ALLPHYSICAL removed, but the HPC
>    community prefers keeping a performance-at-all-costs option. And,
>    with most other registration modes now removed, ALLPHYSICAL is
>    the mode of last resort if an adapter does not support FRMR or
>    MTHCAFMR, since ALLPHYSICAL is universally supported. We will
>    very likely revisit this later. I'm erring on the side of less
>    churn and dropping this until the community agrees on how to
>    move forward.
> 
>  - Added a patch to ensure there is always a valid ->qp if RPCs
>    might awaken while the transport is disconnected.
> 
>  - Added a patch to clean up an MTU settings hack for a very old
>    adapter model.
> 
> Test and review the "nfs-rdma-client" branch:
> 
>  git://git.linux-nfs.org/projects/cel/cel-2.6.git
> 
> Thanks!

Hi Chuck,

I've installed this in my cluster and ran a number of simple tests
over a variety of hardware.  For the most part, it's looking much
better than NFSoRDMA looked a kernel or two back, but I can still
trip it up.  All tests were run with rhel7 + current upstream
kernel.

My server was using mlx4 hardware in both IB and RoCE modes.

I tested from mlx4 client in both IB and RoCE modes -> not DOA
I tested from mlx5 client in IB mode -> not DOA
I tested from mthca client in IB mode -> not DOA
I tested from qib client in IB mode -> not DOA
I tested from ocrdma client in RoCE mode -> DOA (cpu soft lockup
  on mount on the client)

I tested nfsv3 -> not DOA
I tested nfsv4 + rdma -> still DOA, but I think this is expected
  as last I knew someone needs to write code for nfsv4 mountd
  over rdma before this will work (as nfsv3 uses a tcp connection
  to do mounting, and then switches to rdma for data transfers
  and nfsv4 doesn't support that or something like that...this
  is what I recall Jeff Layton telling me anyway)

I tested nfsv3 in both IB and RoCE modes with rsize=32768 and
wsize=32768 -> not DOA, reliable, did data verification and passed

I tested nfsv3 in both IB and RoCE modes with rsize=65536 and
wsize=65536 -> not DOA, but not reliable either, data transfers
will stop after a certain amount has been transferred and the
mount will have a soft hang

My data verification was simple (but generally effective in
lots of scenarios):

I had a full linux kernel git repo, with a complete build in it
(totaling a little over 9GB of disk space used) and I would run
tar -cf - linus | tar -xvf - -C <tmpdir> to copy the tree
around (I did copies both on the same mount and on a different
mount that was also NFSoRDMA, including copying from an IB
NFSoRDMA mount to a RoCE NFSoRDMA mount on different mlx4 ports),
and then diff -uprN on the various tree locations to check for
any data differences.
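
As a runnable sketch of that verification loop (both mount points are
placeholders, and <tmpdir> is deliberately left unspecified above):

    cd /mnt/nfsordma            # placeholder NFSoRDMA mount
    TMP=/mnt/nfsordma2/tmpdir   # placeholder destination mount
    mkdir -p "$TMP"
    tar -cf - linus | tar -xvf - -C "$TMP"
    diff -uprN linus "$TMP/linus" && echo "no data differences"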

So there's your testing report.  As I said in the beginning, it's
definitely better than it was since it used to oops the server and
I didn't encounter any server side problems this time, only client
side problems.

ToDo items that I see:

Write NFSv4 rdma protocol mount support
Fix client soft mount hangs when rsize/wsize > 32768
Fix DOA of ocrdma driver

Tested-by: Doug Ledford <dledford@redhat.com>


-- 
Doug Ledford <dledford@redhat.com>
              GPG KeyID: 0E572FDD
	      http://people.redhat.com/dledford


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH V3 00/17] NFS/RDMA client-side patches
@ 2014-05-02 20:20       ` Chuck Lever
  0 siblings, 0 replies; 60+ messages in thread
From: Chuck Lever @ 2014-05-02 20:20 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Anna Schumaker, Linux NFS Mailing List, linux-rdma,
	Roland Dreier, Allen Andrews


On May 2, 2014, at 3:27 PM, Doug Ledford <dledford@redhat.com> wrote:

> ----- Original Message -----
>> Changes since V2:
>> 
>> - Rebased on v3.15-rc3
>> 
>> - "enable pad optimization" dropped. Testing showed Linux NFS/RDMA
>>   server does not support pad optimization yet.
>> 
>> - "ALLPHYSICAL CONFIG" dropped. There is a lack of consensus on
>>   this one. Christoph would like ALLPHYSICAL removed, but the HPC
>>   community prefers keeping a performance-at-all-costs option. And,
>>   with most other registration modes now removed, ALLPHYSICAL is
>>   the mode of last resort if an adapter does not support FRMR or
>>   MTHCAFMR, since ALLPHYSICAL is universally supported. We will
>>   very likely revisit this later. I'm erring on the side of less
>>   churn and dropping this until the community agrees on how to
>>   move forward.
>> 
>> - Added a patch to ensure there is always a valid ->qp if RPCs
>>   might awaken while the transport is disconnected.
>> 
>> - Added a patch to clean up an MTU settings hack for a very old
>>   adapter model.
>> 
>> Test and review the "nfs-rdma-client" branch:
>> 
>> git://git.linux-nfs.org/projects/cel/cel-2.6.git
>> 
>> Thanks!
> 
> Hi Chuck,
> 
> I've installed this in my cluster and ran a number of simple tests
> over a variety of hardware.  For the most part, it's looking much
> better than NFSoRDMA looked a kernel or two back, but I can still
> trip it up.  All tests were run with rhel7 + current upstream
> kernel.
> 
> My server was using mlx4 hardware in both IB and RoCE modes.
> 
> I tested from mlx4 client in both IB and RoCE modes -> not DOA
> I tested from mlx5 client in IB mode -> not DOA
> I tested from mthca client in IB mode -> not DOA
> I tested from qib client in IB mode -> not DOA
> I tested from ocrdma client in RoCE mode -> DOA (cpu soft lockup
>  on mount on the client)
> 
> I tested nfsv3 -> not DOA
> I tested nfsv4 + rdma -> still DOA, but I think this is expected
>  as last I knew someone needs to write code for nfsv4 mountd
>  over rdma before this will work (as nfsv3 uses a tcp connection
>  to do mounting, and then switches to rdma for data transfers
>  and nfsv4 doesn't support that or something like that...this
>  is what I recall Jeff Layton telling me anyway)
> 
> I tested nfsv3 in both IB and RoCE modes with rsize=32768 and
> wsize=32768 -> not DOA, reliable, did data verification and passed
> 
> I tested nfsv3 in both IB and RoCE modes with rsize=65536 and
> wsize=65536 -> not DOA, but not reliable either, data transfers
> will stop after a certain amount has been transferred and the
> mount will have a soft hang

Can you clarify what you mean by “soft hang?” Are you seeing a
problem when mounting with the “soft” mount option, or does this
mean “CPU soft lockup?” (INFO: task hung for 120 seconds)

> My data verification was simple (but generally effective in
> lots of scenarios):
> 
> I had a full linux kernel git repo, with a complete build in it
> (totaling a little over 9GB of disk space used) and I would run
> tar -cf - linus | tar -xvf - -C <tmpdir> to copy the tree
> around (I did copies both on the same mount and on a different
> mount that was also NFSoRDMA, including copying from an IB
> NFSoRDMA mount to a RoCE NFSoRDMA mount on different mlx4 ports),
> and then diff -uprN on the various tree locations to check for
> any data differences.
> 
> So there's your testing report.  As I said in the beginning, it's
> definitely better than it was since it used to oops the server and
> I didn't encounter any server side problems this time, only client
> side problems.

Thanks for testing!

> ToDo items that I see:
> 
> Write NFSv4 rdma protocol mount support

NFSv4 does not use the MNT protocol. If NFSv4 is not working for you,
there’s something else going on. For me NFSv4 works as well as NFSv3.
Let me know if you need help troubleshooting.
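
For comparison, the two mounts look something like this (20049 is the
IANA-assigned NFS/RDMA port; server and export names are placeholders):

    # NFSv3 over RDMA: MNT protocol over a side channel, data over RDMA
    mount -t nfs -o vers=3,proto=rdma,port=20049 server:/export /mnt
    # NFSv4 over RDMA: no separate MNT protocol step
    mount -t nfs -o vers=4,proto=rdma,port=20049 server:/export /mnt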

> Fix client soft mount hangs when rsize/wsize > 32768

Does that problem occur with unpatched v3.15-rc3 on the client?

HCAs/RNICs that support MTHCAFMR and FRMR should be working up to the
largest rsize and wsize supported by the client and server.

When I use ALLPHYSICAL with large wsize, typically the server starts
dropping NFS WRITE requests. The client retries them forever, and that
looks like a mount point hang.

Something like https://bugzilla.linux-nfs.org/show_bug.cgi?id=248

> Fix DOA of ocrdma driver

Does that problem occur with unpatched v3.15-rc3 on the client?

Emulex has reported some problems when reconnecting, but
I haven’t heard of issues that occur right at mount time.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH V3 00/17] NFS/RDMA client-side patches
  2014-05-02 20:20       ` Chuck Lever
@ 2014-05-02 22:34       ` Doug Ledford
  -1 siblings, 0 replies; 60+ messages in thread
From: Doug Ledford @ 2014-05-02 22:34 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Anna Schumaker, Linux NFS Mailing List, linux-rdma,
	Roland Dreier, Allen Andrews

----- Original Message -----
> 
> On May 2, 2014, at 3:27 PM, Doug Ledford <dledford@redhat.com> wrote:
> 
> > I tested nfsv3 in both IB and RoCE modes with rsize=32768 and
> > wsize=32768 -> not DOA, reliable, did data verification and passed
> > 
> > I tested nfsv3 in both IB and RoCE modes with rsize=65536 and
> > wsize=65536 -> not DOA, but not reliable either, data transfers
> > will stop after a certain amount has been transferred and the
> > mount will have a soft hang
> 
> Can you clarify what you mean by “soft hang?” Are you seeing a
> problem when mounting with the “soft” mount option, or does this
> mean “CPU soft lockup?” (INFO: task hung for 120 seconds)

Neither of those options actually.  I'm using hard,intr on the mount
flags, and by soft hang I mean that the application copying data
will come to a stop and never make any progress again.  When that
happens, you can usually interrupt the process and get back to the
command line, but it doesn't clean up internally in the kernel
because from that point on, attempts to unmount the nfs filesystem
return EBUSY.


> > ToDo items that I see:
> > 
> > Write NFSv4 rdma protocol mount support
> 
> NFSv4 does not use the MNT protocol. If NFSv4 is not working for you,
> there’s something else going on. For me NFSv4 works as well as NFSv3.
> Let me know if you need help troubleshooting.

OK, I'll see if I'm doing something wrong.  I can do nfs4 tcp mounts
just fine, but trying to do nfs4 rdma mounts results in operation not
permitted returns on the client.  And nfs3 mounts using rdma work as
expected.  This is all with the same server, same client, same mount
point, etc.

> > Fix client soft mount hangs when rsize/wsize > 32768
> 
> Does that problem occur with unpatched v3.15-rc3 on the client?

Probably.  I've been able to reproduce this for a while.  I originally
thought it was a problem between Mellanox <-> QLogic/Intel operation
because it reproduces faster in that environment, but I can get it to
reproduce in Mellanox <-> Mellanox situations too.

> HCAs/RNICs that support MTHCAFMR and FRMR should be working up to the
> largest rsize and wsize supported by the client and server.
> 
> When I use ALLPHYSICAL with large wsize, typically the server starts
> dropping NFS WRITE requests. The client retries them forever, and
> that
> looks like a mount point hang.
> 
> Something like https://bugzilla.linux-nfs.org/show_bug.cgi?id=248

This sounds like what I'm seeing here too.

> > Fix DOA of ocrdma driver
> 
> Does that problem occur with unpatched v3.15-rc3 on the client?

Haven't tried.  I'll queue that up for next week.

> Emulex has reported some problems when reconnecting, but
> I haven’t heard of issues that occur right at mount time.
> 
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
Doug Ledford <dledford@redhat.com>
              GPG KeyID: 0E572FDD
	      http://people.redhat.com/dledford


^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: [PATCH V3 01/17] xprtrdma: mind the device's max fast register page list depth
  2014-04-30 19:29     ` Chuck Lever
@ 2014-05-16  7:08         ` Devesh Sharma
  -1 siblings, 0 replies; 60+ messages in thread
From: Devesh Sharma @ 2014-05-16  7:08 UTC (permalink / raw)
  To: Chuck Lever, linux-nfs, linux-rdma
  Cc: Anna.Schumaker

Chuck

This patch is causing a CPU soft-lockup if the underlying vendor reports
devattr.max_fast_reg_page_list_len = 0 and ia->ri_memreg_strategy = FRMR
(the default option). I think there is a need to refer to the device
capability flags. If strategy = FRMR is forced and
devattr.max_fast_reg_page_list_len = 0, then flag an error and fail the
RPC with -EIO.
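
A minimal sketch of that kind of guard (its placement inside
rpcrdma_ia_open() and the exact error path are assumptions, not part
of the posted patch):

    /* Sketch only: refuse FRMR when the device cannot support it,
     * instead of discovering that later as a soft lockup.
     */
    if (memreg == RPCRDMA_FRMR &&
        (!(devattr.device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS) ||
         devattr.max_fast_reg_page_list_len == 0)) {
            dprintk("RPC:       %s: FRMR registration not supported "
                    "by device\n", __func__);
            return -EIO;  /* illustrative; falling back to another
                             registration mode is another option */
    }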

See inline:

> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> owner@vger.kernel.org] On Behalf Of Chuck Lever
> Sent: Thursday, May 01, 2014 1:00 AM
> To: linux-nfs@vger.kernel.org; linux-rdma@vger.kernel.org
> Cc: Anna.Schumaker@netapp.com
> Subject: [PATCH V3 01/17] xprtrdma: mind the device's max fast register
> page list depth
> 
> From: Steve Wise <swise@opengridcomputing.com>
> 
> Some rdma devices don't support a fast register page list depth of at least
> RPCRDMA_MAX_DATA_SEGS.  So xprtrdma needs to chunk its fast register
> regions according to the minimum of the device max supported depth or
> RPCRDMA_MAX_DATA_SEGS.
> 
> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> 
>  net/sunrpc/xprtrdma/rpc_rdma.c  |    4 ---
>  net/sunrpc/xprtrdma/verbs.c     |   47 +++++++++++++++++++++++++++++-
> ---------
>  net/sunrpc/xprtrdma/xprt_rdma.h |    1 +
>  3 files changed, 36 insertions(+), 16 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c
> b/net/sunrpc/xprtrdma/rpc_rdma.c index 96ead52..400aa1b 100644
> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
> @@ -248,10 +248,6 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, struct
> xdr_buf *target,
>  	/* success. all failures return above */
>  	req->rl_nchunks = nchunks;
> 
> -	BUG_ON(nchunks == 0);
> -	BUG_ON((r_xprt->rx_ia.ri_memreg_strategy == RPCRDMA_FRMR)
> -	       && (nchunks > 3));
> -
>  	/*
>  	 * finish off header. If write, marshal discrim and nchunks.
>  	 */
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 9372656..55fb09a 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -539,6 +539,11 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct
> sockaddr *addr, int memreg)
>  				__func__);
>  			memreg = RPCRDMA_REGISTER;
>  #endif
> +		} else {
> +			/* Mind the ia limit on FRMR page list depth */
> +			ia->ri_max_frmr_depth = min_t(unsigned int,
> +				RPCRDMA_MAX_DATA_SEGS,
> +				devattr.max_fast_reg_page_list_len);
>  		}
>  		break;
>  	}
> @@ -659,24 +664,42 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct
> rpcrdma_ia *ia,
>  	ep->rep_attr.srq = NULL;
>  	ep->rep_attr.cap.max_send_wr = cdata->max_requests;
>  	switch (ia->ri_memreg_strategy) {
> -	case RPCRDMA_FRMR:
> +	case RPCRDMA_FRMR: {
> +		int depth = 7;
> +
>  		/* Add room for frmr register and invalidate WRs.
>  		 * 1. FRMR reg WR for head
>  		 * 2. FRMR invalidate WR for head
> -		 * 3. FRMR reg WR for pagelist
> -		 * 4. FRMR invalidate WR for pagelist
> +		 * 3. N FRMR reg WRs for pagelist
> +		 * 4. N FRMR invalidate WRs for pagelist
>  		 * 5. FRMR reg WR for tail
>  		 * 6. FRMR invalidate WR for tail
>  		 * 7. The RDMA_SEND WR
>  		 */
> -		ep->rep_attr.cap.max_send_wr *= 7;
> +
> +		/* Calculate N if the device max FRMR depth is smaller than
> +		 * RPCRDMA_MAX_DATA_SEGS.
> +		 */
> +		if (ia->ri_max_frmr_depth < RPCRDMA_MAX_DATA_SEGS) {
> +			int delta = RPCRDMA_MAX_DATA_SEGS -
> +				    ia->ri_max_frmr_depth;
> +
> +			do {
> +				depth += 2; /* FRMR reg + invalidate */
> +				delta -= ia->ri_max_frmr_depth;

If ia->ri_max_frmr_depth is 0, this loop becomes an infinite loop.

> +			} while (delta > 0);
> +
> +		}
> +		ep->rep_attr.cap.max_send_wr *= depth;
>  		if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr) {
> -			cdata->max_requests = devattr.max_qp_wr / 7;
> +			cdata->max_requests = devattr.max_qp_wr / depth;
>  			if (!cdata->max_requests)
>  				return -EINVAL;
> -			ep->rep_attr.cap.max_send_wr = cdata-
> >max_requests * 7;
> +			ep->rep_attr.cap.max_send_wr = cdata-
> >max_requests *
> +						       depth;
>  		}
>  		break;
> +	}
>  	case RPCRDMA_MEMWINDOWS_ASYNC:
>  	case RPCRDMA_MEMWINDOWS:
>  		/* Add room for mw_binds+unbinds - overkill! */ @@ -
> 1043,16 +1066,16 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf,
> struct rpcrdma_ep *ep,
>  	case RPCRDMA_FRMR:
>  		for (i = buf->rb_max_requests * RPCRDMA_MAX_SEGS; i; i--
> ) {
>  			r->r.frmr.fr_mr = ib_alloc_fast_reg_mr(ia->ri_pd,
> -
> RPCRDMA_MAX_SEGS);
> +						ia->ri_max_frmr_depth);
>  			if (IS_ERR(r->r.frmr.fr_mr)) {
>  				rc = PTR_ERR(r->r.frmr.fr_mr);
>  				dprintk("RPC:       %s: ib_alloc_fast_reg_mr"
>  					" failed %i\n", __func__, rc);
>  				goto out;
>  			}
> -			r->r.frmr.fr_pgl =
> -				ib_alloc_fast_reg_page_list(ia->ri_id-
> >device,
> -
> RPCRDMA_MAX_SEGS);
> +			r->r.frmr.fr_pgl = ib_alloc_fast_reg_page_list(
> +						ia->ri_id->device,
> +						ia->ri_max_frmr_depth);
>  			if (IS_ERR(r->r.frmr.fr_pgl)) {
>  				rc = PTR_ERR(r->r.frmr.fr_pgl);
>  				dprintk("RPC:       %s: "
> @@ -1498,8 +1521,8 @@ rpcrdma_register_frmr_external(struct
> rpcrdma_mr_seg *seg,
>  	seg1->mr_offset -= pageoff;	/* start of page */
>  	seg1->mr_len += pageoff;
>  	len = -pageoff;
> -	if (*nsegs > RPCRDMA_MAX_DATA_SEGS)
> -		*nsegs = RPCRDMA_MAX_DATA_SEGS;
> +	if (*nsegs > ia->ri_max_frmr_depth)
> +		*nsegs = ia->ri_max_frmr_depth;
>  	for (page_no = i = 0; i < *nsegs;) {
>  		rpcrdma_map_one(ia, seg, writing);
>  		pa = seg->mr_dma;
> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h
> b/net/sunrpc/xprtrdma/xprt_rdma.h index cc1445d..98340a3 100644
> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
> @@ -66,6 +66,7 @@ struct rpcrdma_ia {
>  	struct completion	ri_done;
>  	int			ri_async_rc;
>  	enum rpcrdma_memreg	ri_memreg_strategy;
> +	unsigned int		ri_max_frmr_depth;
>  };
> 
>  /*
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the
> body of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: [PATCH V3 01/17] xprtrdma: mind the device's max fast register page list depth
@ 2014-05-16  7:08         ` Devesh Sharma
  0 siblings, 0 replies; 60+ messages in thread
From: Devesh Sharma @ 2014-05-16  7:08 UTC (permalink / raw)
  To: Chuck Lever, linux-nfs, linux-rdma; +Cc: Anna.Schumaker

Chuck

This patch is causing a CPU soft lockup if the underlying vendor reports devattr.max_fast_reg_page_list_len = 0 and ia->ri_memreg_strategy = FRMR (the default option).
I think there is a need to check the device capability flags. If strategy = FRMR is forced and devattr.max_fast_reg_page_list_len = 0, then log an error and fail the RPC with -EIO.

See inline:

> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> owner@vger.kernel.org] On Behalf Of Chuck Lever
> Sent: Thursday, May 01, 2014 1:00 AM
> To: linux-nfs@vger.kernel.org; linux-rdma@vger.kernel.org
> Cc: Anna.Schumaker@netapp.com
> Subject: [PATCH V3 01/17] xprtrdma: mind the device's max fast register
> page list depth
> 
> From: Steve Wise <swise@opengridcomputing.com>
> 
> Some rdma devices don't support a fast register page list depth of at least
> RPCRDMA_MAX_DATA_SEGS.  So xprtrdma needs to chunk its fast register
> regions according to the minimum of the device max supported depth or
> RPCRDMA_MAX_DATA_SEGS.
> 
> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> 
>  net/sunrpc/xprtrdma/rpc_rdma.c  |    4 ---
>  net/sunrpc/xprtrdma/verbs.c     |   47 +++++++++++++++++++++++++++++-
> ---------
>  net/sunrpc/xprtrdma/xprt_rdma.h |    1 +
>  3 files changed, 36 insertions(+), 16 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c
> b/net/sunrpc/xprtrdma/rpc_rdma.c index 96ead52..400aa1b 100644
> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
> @@ -248,10 +248,6 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, struct
> xdr_buf *target,
>  	/* success. all failures return above */
>  	req->rl_nchunks = nchunks;
> 
> -	BUG_ON(nchunks == 0);
> -	BUG_ON((r_xprt->rx_ia.ri_memreg_strategy == RPCRDMA_FRMR)
> -	       && (nchunks > 3));
> -
>  	/*
>  	 * finish off header. If write, marshal discrim and nchunks.
>  	 */
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 9372656..55fb09a 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -539,6 +539,11 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct
> sockaddr *addr, int memreg)
>  				__func__);
>  			memreg = RPCRDMA_REGISTER;
>  #endif
> +		} else {
> +			/* Mind the ia limit on FRMR page list depth */
> +			ia->ri_max_frmr_depth = min_t(unsigned int,
> +				RPCRDMA_MAX_DATA_SEGS,
> +				devattr.max_fast_reg_page_list_len);
>  		}
>  		break;
>  	}
> @@ -659,24 +664,42 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct
> rpcrdma_ia *ia,
>  	ep->rep_attr.srq = NULL;
>  	ep->rep_attr.cap.max_send_wr = cdata->max_requests;
>  	switch (ia->ri_memreg_strategy) {
> -	case RPCRDMA_FRMR:
> +	case RPCRDMA_FRMR: {
> +		int depth = 7;
> +
>  		/* Add room for frmr register and invalidate WRs.
>  		 * 1. FRMR reg WR for head
>  		 * 2. FRMR invalidate WR for head
> -		 * 3. FRMR reg WR for pagelist
> -		 * 4. FRMR invalidate WR for pagelist
> +		 * 3. N FRMR reg WRs for pagelist
> +		 * 4. N FRMR invalidate WRs for pagelist
>  		 * 5. FRMR reg WR for tail
>  		 * 6. FRMR invalidate WR for tail
>  		 * 7. The RDMA_SEND WR
>  		 */
> -		ep->rep_attr.cap.max_send_wr *= 7;
> +
> +		/* Calculate N if the device max FRMR depth is smaller than
> +		 * RPCRDMA_MAX_DATA_SEGS.
> +		 */
> +		if (ia->ri_max_frmr_depth < RPCRDMA_MAX_DATA_SEGS) {
> +			int delta = RPCRDMA_MAX_DATA_SEGS -
> +				    ia->ri_max_frmr_depth;
> +
> +			do {
> +				depth += 2; /* FRMR reg + invalidate */
> +				delta -= ia->ri_max_frmr_depth;

If ia->ri_max_frmr_depth is 0, this loop becomes an infinite loop.

> +			} while (delta > 0);
> +
> +		}
> +		ep->rep_attr.cap.max_send_wr *= depth;
>  		if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr) {
> -			cdata->max_requests = devattr.max_qp_wr / 7;
> +			cdata->max_requests = devattr.max_qp_wr / depth;
>  			if (!cdata->max_requests)
>  				return -EINVAL;
> -			ep->rep_attr.cap.max_send_wr = cdata-
> >max_requests * 7;
> +			ep->rep_attr.cap.max_send_wr = cdata-
> >max_requests *
> +						       depth;
>  		}
>  		break;
> +	}
>  	case RPCRDMA_MEMWINDOWS_ASYNC:
>  	case RPCRDMA_MEMWINDOWS:
>  		/* Add room for mw_binds+unbinds - overkill! */ @@ -
> 1043,16 +1066,16 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf,
> struct rpcrdma_ep *ep,
>  	case RPCRDMA_FRMR:
>  		for (i = buf->rb_max_requests * RPCRDMA_MAX_SEGS; i; i--
> ) {
>  			r->r.frmr.fr_mr = ib_alloc_fast_reg_mr(ia->ri_pd,
> -
> RPCRDMA_MAX_SEGS);
> +						ia->ri_max_frmr_depth);
>  			if (IS_ERR(r->r.frmr.fr_mr)) {
>  				rc = PTR_ERR(r->r.frmr.fr_mr);
>  				dprintk("RPC:       %s: ib_alloc_fast_reg_mr"
>  					" failed %i\n", __func__, rc);
>  				goto out;
>  			}
> -			r->r.frmr.fr_pgl =
> -				ib_alloc_fast_reg_page_list(ia->ri_id-
> >device,
> -
> RPCRDMA_MAX_SEGS);
> +			r->r.frmr.fr_pgl = ib_alloc_fast_reg_page_list(
> +						ia->ri_id->device,
> +						ia->ri_max_frmr_depth);
>  			if (IS_ERR(r->r.frmr.fr_pgl)) {
>  				rc = PTR_ERR(r->r.frmr.fr_pgl);
>  				dprintk("RPC:       %s: "
> @@ -1498,8 +1521,8 @@ rpcrdma_register_frmr_external(struct
> rpcrdma_mr_seg *seg,
>  	seg1->mr_offset -= pageoff;	/* start of page */
>  	seg1->mr_len += pageoff;
>  	len = -pageoff;
> -	if (*nsegs > RPCRDMA_MAX_DATA_SEGS)
> -		*nsegs = RPCRDMA_MAX_DATA_SEGS;
> +	if (*nsegs > ia->ri_max_frmr_depth)
> +		*nsegs = ia->ri_max_frmr_depth;
>  	for (page_no = i = 0; i < *nsegs;) {
>  		rpcrdma_map_one(ia, seg, writing);
>  		pa = seg->mr_dma;
> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h
> b/net/sunrpc/xprtrdma/xprt_rdma.h index cc1445d..98340a3 100644
> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
> @@ -66,6 +66,7 @@ struct rpcrdma_ia {
>  	struct completion	ri_done;
>  	int			ri_async_rc;
>  	enum rpcrdma_memreg	ri_memreg_strategy;
> +	unsigned int		ri_max_frmr_depth;
>  };
> 
>  /*
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the
> body of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH V3 01/17] xprtrdma: mind the device's max fast register page list depth
  2014-05-16  7:08         ` Devesh Sharma
@ 2014-05-16 14:10             ` Steve Wise
  -1 siblings, 0 replies; 60+ messages in thread
From: Steve Wise @ 2014-05-16 14:10 UTC (permalink / raw)
  To: Devesh Sharma, Chuck Lever, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Anna.Schumaker-HgOvQuBEEgTQT0dZR+AlfA

I guess the client code doesn't verify that the device supports the 
chosen memreg mode.  That's not good.   Lemme fix this and respin this 
patch.
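
Roughly, I expect the respin to replace the open-coded do/while with
straight arithmetic once ri_max_frmr_depth is known to be non-zero.
An untested sketch of that direction (names may not match what I
actually post):

	case RPCRDMA_FRMR: {
		int depth = 7;

		if (ia->ri_max_frmr_depth < RPCRDMA_MAX_DATA_SEGS) {
			int delta = RPCRDMA_MAX_DATA_SEGS -
				    ia->ri_max_frmr_depth;

			/* one extra FRMR reg + invalidate WR pair per
			 * extra pagelist chunk; DIV_ROUND_UP removes
			 * the loop and cannot spin on a sane depth
			 */
			depth += 2 * DIV_ROUND_UP(delta,
						  ia->ri_max_frmr_depth);
		}
		ep->rep_attr.cap.max_send_wr *= depth;
		break;
	}

For example, if RPCRDMA_MAX_DATA_SEGS is 64 and the device depth is 32,
delta is 32, one extra reg+invalidate pair is added, and depth comes
out 9 -- the same answer the existing loop gives.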


On 5/16/2014 2:08 AM, Devesh Sharma wrote:
> Chuck
>
> This patch is causing a CPU soft lockup if the underlying vendor reports devattr.max_fast_reg_page_list_len = 0 and ia->ri_memreg_strategy = FRMR (the default option).
> I think there is a need to check the device capability flags. If strategy = FRMR is forced and devattr.max_fast_reg_page_list_len = 0, then log an error and fail the RPC with -EIO.
>
> See inline:
>
>> -----Original Message-----
>> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
>> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Chuck Lever
>> Sent: Thursday, May 01, 2014 1:00 AM
>> To: linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> Cc: Anna.Schumaker-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org
>> Subject: [PATCH V3 01/17] xprtrdma: mind the device's max fast register
>> page list depth
>>
>> From: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
>>
>> Some rdma devices don't support a fast register page list depth of at least
>> RPCRDMA_MAX_DATA_SEGS.  So xprtrdma needs to chunk its fast register
>> regions according to the minimum of the device max supported depth or
>> RPCRDMA_MAX_DATA_SEGS.
>>
>> Signed-off-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
>> Reviewed-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>> ---
>>
>>   net/sunrpc/xprtrdma/rpc_rdma.c  |    4 ---
>>   net/sunrpc/xprtrdma/verbs.c     |   47 +++++++++++++++++++++++++++++-
>> ---------
>>   net/sunrpc/xprtrdma/xprt_rdma.h |    1 +
>>   3 files changed, 36 insertions(+), 16 deletions(-)
>>
>> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c
>> b/net/sunrpc/xprtrdma/rpc_rdma.c index 96ead52..400aa1b 100644
>> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
>> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
>> @@ -248,10 +248,6 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, struct
>> xdr_buf *target,
>>   	/* success. all failures return above */
>>   	req->rl_nchunks = nchunks;
>>
>> -	BUG_ON(nchunks == 0);
>> -	BUG_ON((r_xprt->rx_ia.ri_memreg_strategy == RPCRDMA_FRMR)
>> -	       && (nchunks > 3));
>> -
>>   	/*
>>   	 * finish off header. If write, marshal discrim and nchunks.
>>   	 */
>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>> index 9372656..55fb09a 100644
>> --- a/net/sunrpc/xprtrdma/verbs.c
>> +++ b/net/sunrpc/xprtrdma/verbs.c
>> @@ -539,6 +539,11 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct
>> sockaddr *addr, int memreg)
>>   				__func__);
>>   			memreg = RPCRDMA_REGISTER;
>>   #endif
>> +		} else {
>> +			/* Mind the ia limit on FRMR page list depth */
>> +			ia->ri_max_frmr_depth = min_t(unsigned int,
>> +				RPCRDMA_MAX_DATA_SEGS,
>> +				devattr.max_fast_reg_page_list_len);
>>   		}
>>   		break;
>>   	}
>> @@ -659,24 +664,42 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct
>> rpcrdma_ia *ia,
>>   	ep->rep_attr.srq = NULL;
>>   	ep->rep_attr.cap.max_send_wr = cdata->max_requests;
>>   	switch (ia->ri_memreg_strategy) {
>> -	case RPCRDMA_FRMR:
>> +	case RPCRDMA_FRMR: {
>> +		int depth = 7;
>> +
>>   		/* Add room for frmr register and invalidate WRs.
>>   		 * 1. FRMR reg WR for head
>>   		 * 2. FRMR invalidate WR for head
>> -		 * 3. FRMR reg WR for pagelist
>> -		 * 4. FRMR invalidate WR for pagelist
>> +		 * 3. N FRMR reg WRs for pagelist
>> +		 * 4. N FRMR invalidate WRs for pagelist
>>   		 * 5. FRMR reg WR for tail
>>   		 * 6. FRMR invalidate WR for tail
>>   		 * 7. The RDMA_SEND WR
>>   		 */
>> -		ep->rep_attr.cap.max_send_wr *= 7;
>> +
>> +		/* Calculate N if the device max FRMR depth is smaller than
>> +		 * RPCRDMA_MAX_DATA_SEGS.
>> +		 */
>> +		if (ia->ri_max_frmr_depth < RPCRDMA_MAX_DATA_SEGS) {
>> +			int delta = RPCRDMA_MAX_DATA_SEGS -
>> +				    ia->ri_max_frmr_depth;
>> +
>> +			do {
>> +				depth += 2; /* FRMR reg + invalidate */
>> +				delta -= ia->ri_max_frmr_depth;
> If ia->ri_max_frmr_depth is 0, this loop becomes an infinite loop.
>
>> +			} while (delta > 0);
>> +
>> +		}
>> +		ep->rep_attr.cap.max_send_wr *= depth;
>>   		if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr) {
>> -			cdata->max_requests = devattr.max_qp_wr / 7;
>> +			cdata->max_requests = devattr.max_qp_wr / depth;
>>   			if (!cdata->max_requests)
>>   				return -EINVAL;
>> -			ep->rep_attr.cap.max_send_wr = cdata-
>>> max_requests * 7;
>> +			ep->rep_attr.cap.max_send_wr = cdata-
>>> max_requests *
>> +						       depth;
>>   		}
>>   		break;
>> +	}
>>   	case RPCRDMA_MEMWINDOWS_ASYNC:
>>   	case RPCRDMA_MEMWINDOWS:
>>   		/* Add room for mw_binds+unbinds - overkill! */ @@ -
>> 1043,16 +1066,16 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf,
>> struct rpcrdma_ep *ep,
>>   	case RPCRDMA_FRMR:
>>   		for (i = buf->rb_max_requests * RPCRDMA_MAX_SEGS; i; i--
>> ) {
>>   			r->r.frmr.fr_mr = ib_alloc_fast_reg_mr(ia->ri_pd,
>> -
>> RPCRDMA_MAX_SEGS);
>> +						ia->ri_max_frmr_depth);
>>   			if (IS_ERR(r->r.frmr.fr_mr)) {
>>   				rc = PTR_ERR(r->r.frmr.fr_mr);
>>   				dprintk("RPC:       %s: ib_alloc_fast_reg_mr"
>>   					" failed %i\n", __func__, rc);
>>   				goto out;
>>   			}
>> -			r->r.frmr.fr_pgl =
>> -				ib_alloc_fast_reg_page_list(ia->ri_id-
>>> device,
>> -
>> RPCRDMA_MAX_SEGS);
>> +			r->r.frmr.fr_pgl = ib_alloc_fast_reg_page_list(
>> +						ia->ri_id->device,
>> +						ia->ri_max_frmr_depth);
>>   			if (IS_ERR(r->r.frmr.fr_pgl)) {
>>   				rc = PTR_ERR(r->r.frmr.fr_pgl);
>>   				dprintk("RPC:       %s: "
>> @@ -1498,8 +1521,8 @@ rpcrdma_register_frmr_external(struct
>> rpcrdma_mr_seg *seg,
>>   	seg1->mr_offset -= pageoff;	/* start of page */
>>   	seg1->mr_len += pageoff;
>>   	len = -pageoff;
>> -	if (*nsegs > RPCRDMA_MAX_DATA_SEGS)
>> -		*nsegs = RPCRDMA_MAX_DATA_SEGS;
>> +	if (*nsegs > ia->ri_max_frmr_depth)
>> +		*nsegs = ia->ri_max_frmr_depth;
>>   	for (page_no = i = 0; i < *nsegs;) {
>>   		rpcrdma_map_one(ia, seg, writing);
>>   		pa = seg->mr_dma;
>> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h
>> b/net/sunrpc/xprtrdma/xprt_rdma.h index cc1445d..98340a3 100644
>> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
>> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
>> @@ -66,6 +66,7 @@ struct rpcrdma_ia {
>>   	struct completion	ri_done;
>>   	int			ri_async_rc;
>>   	enum rpcrdma_memreg	ri_memreg_strategy;
>> +	unsigned int		ri_max_frmr_depth;
>>   };
>>
>>   /*
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the
>> body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at
>> http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH V3 01/17] xprtrdma: mind the device's max fast register page list depth
@ 2014-05-16 14:10             ` Steve Wise
  0 siblings, 0 replies; 60+ messages in thread
From: Steve Wise @ 2014-05-16 14:10 UTC (permalink / raw)
  To: Devesh Sharma, Chuck Lever, linux-nfs, linux-rdma; +Cc: Anna.Schumaker

I guess the client code doesn't verify that the device supports the 
chosen memreg mode.  That's not good.   Lemme fix this and respin this 
patch.


On 5/16/2014 2:08 AM, Devesh Sharma wrote:
> Chuck
>
> This patch is causing a CPU soft lockup if the underlying vendor reports devattr.max_fast_reg_page_list_len = 0 and ia->ri_memreg_strategy = FRMR (the default option).
> I think there is a need to check the device capability flags. If strategy = FRMR is forced and devattr.max_fast_reg_page_list_len = 0, then log an error and fail the RPC with -EIO.
>
> See inline:
>
>> -----Original Message-----
>> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
>> owner@vger.kernel.org] On Behalf Of Chuck Lever
>> Sent: Thursday, May 01, 2014 1:00 AM
>> To: linux-nfs@vger.kernel.org; linux-rdma@vger.kernel.org
>> Cc: Anna.Schumaker@netapp.com
>> Subject: [PATCH V3 01/17] xprtrdma: mind the device's max fast register
>> page list depth
>>
>> From: Steve Wise <swise@opengridcomputing.com>
>>
>> Some rdma devices don't support a fast register page list depth of at least
>> RPCRDMA_MAX_DATA_SEGS.  So xprtrdma needs to chunk its fast register
>> regions according to the minimum of the device max supported depth or
>> RPCRDMA_MAX_DATA_SEGS.
>>
>> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
>> Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>>
>>   net/sunrpc/xprtrdma/rpc_rdma.c  |    4 ---
>>   net/sunrpc/xprtrdma/verbs.c     |   47 +++++++++++++++++++++++++++++-
>> ---------
>>   net/sunrpc/xprtrdma/xprt_rdma.h |    1 +
>>   3 files changed, 36 insertions(+), 16 deletions(-)
>>
>> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c
>> b/net/sunrpc/xprtrdma/rpc_rdma.c index 96ead52..400aa1b 100644
>> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
>> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
>> @@ -248,10 +248,6 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, struct
>> xdr_buf *target,
>>   	/* success. all failures return above */
>>   	req->rl_nchunks = nchunks;
>>
>> -	BUG_ON(nchunks == 0);
>> -	BUG_ON((r_xprt->rx_ia.ri_memreg_strategy == RPCRDMA_FRMR)
>> -	       && (nchunks > 3));
>> -
>>   	/*
>>   	 * finish off header. If write, marshal discrim and nchunks.
>>   	 */
>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>> index 9372656..55fb09a 100644
>> --- a/net/sunrpc/xprtrdma/verbs.c
>> +++ b/net/sunrpc/xprtrdma/verbs.c
>> @@ -539,6 +539,11 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct
>> sockaddr *addr, int memreg)
>>   				__func__);
>>   			memreg = RPCRDMA_REGISTER;
>>   #endif
>> +		} else {
>> +			/* Mind the ia limit on FRMR page list depth */
>> +			ia->ri_max_frmr_depth = min_t(unsigned int,
>> +				RPCRDMA_MAX_DATA_SEGS,
>> +				devattr.max_fast_reg_page_list_len);
>>   		}
>>   		break;
>>   	}
>> @@ -659,24 +664,42 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct
>> rpcrdma_ia *ia,
>>   	ep->rep_attr.srq = NULL;
>>   	ep->rep_attr.cap.max_send_wr = cdata->max_requests;
>>   	switch (ia->ri_memreg_strategy) {
>> -	case RPCRDMA_FRMR:
>> +	case RPCRDMA_FRMR: {
>> +		int depth = 7;
>> +
>>   		/* Add room for frmr register and invalidate WRs.
>>   		 * 1. FRMR reg WR for head
>>   		 * 2. FRMR invalidate WR for head
>> -		 * 3. FRMR reg WR for pagelist
>> -		 * 4. FRMR invalidate WR for pagelist
>> +		 * 3. N FRMR reg WRs for pagelist
>> +		 * 4. N FRMR invalidate WRs for pagelist
>>   		 * 5. FRMR reg WR for tail
>>   		 * 6. FRMR invalidate WR for tail
>>   		 * 7. The RDMA_SEND WR
>>   		 */
>> -		ep->rep_attr.cap.max_send_wr *= 7;
>> +
>> +		/* Calculate N if the device max FRMR depth is smaller than
>> +		 * RPCRDMA_MAX_DATA_SEGS.
>> +		 */
>> +		if (ia->ri_max_frmr_depth < RPCRDMA_MAX_DATA_SEGS) {
>> +			int delta = RPCRDMA_MAX_DATA_SEGS -
>> +				    ia->ri_max_frmr_depth;
>> +
>> +			do {
>> +				depth += 2; /* FRMR reg + invalidate */
>> +				delta -= ia->ri_max_frmr_depth;
> If ia->ri_max_frmr_depth is 0, this loop becomes an infinite loop.
>
>> +			} while (delta > 0);
>> +
>> +		}
>> +		ep->rep_attr.cap.max_send_wr *= depth;
>>   		if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr) {
>> -			cdata->max_requests = devattr.max_qp_wr / 7;
>> +			cdata->max_requests = devattr.max_qp_wr / depth;
>>   			if (!cdata->max_requests)
>>   				return -EINVAL;
>> -			ep->rep_attr.cap.max_send_wr = cdata-
>>> max_requests * 7;
>> +			ep->rep_attr.cap.max_send_wr = cdata-
>>> max_requests *
>> +						       depth;
>>   		}
>>   		break;
>> +	}
>>   	case RPCRDMA_MEMWINDOWS_ASYNC:
>>   	case RPCRDMA_MEMWINDOWS:
>>   		/* Add room for mw_binds+unbinds - overkill! */ @@ -
>> 1043,16 +1066,16 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf,
>> struct rpcrdma_ep *ep,
>>   	case RPCRDMA_FRMR:
>>   		for (i = buf->rb_max_requests * RPCRDMA_MAX_SEGS; i; i--
>> ) {
>>   			r->r.frmr.fr_mr = ib_alloc_fast_reg_mr(ia->ri_pd,
>> -
>> RPCRDMA_MAX_SEGS);
>> +						ia->ri_max_frmr_depth);
>>   			if (IS_ERR(r->r.frmr.fr_mr)) {
>>   				rc = PTR_ERR(r->r.frmr.fr_mr);
>>   				dprintk("RPC:       %s: ib_alloc_fast_reg_mr"
>>   					" failed %i\n", __func__, rc);
>>   				goto out;
>>   			}
>> -			r->r.frmr.fr_pgl =
>> -				ib_alloc_fast_reg_page_list(ia->ri_id-
>>> device,
>> -
>> RPCRDMA_MAX_SEGS);
>> +			r->r.frmr.fr_pgl = ib_alloc_fast_reg_page_list(
>> +						ia->ri_id->device,
>> +						ia->ri_max_frmr_depth);
>>   			if (IS_ERR(r->r.frmr.fr_pgl)) {
>>   				rc = PTR_ERR(r->r.frmr.fr_pgl);
>>   				dprintk("RPC:       %s: "
>> @@ -1498,8 +1521,8 @@ rpcrdma_register_frmr_external(struct
>> rpcrdma_mr_seg *seg,
>>   	seg1->mr_offset -= pageoff;	/* start of page */
>>   	seg1->mr_len += pageoff;
>>   	len = -pageoff;
>> -	if (*nsegs > RPCRDMA_MAX_DATA_SEGS)
>> -		*nsegs = RPCRDMA_MAX_DATA_SEGS;
>> +	if (*nsegs > ia->ri_max_frmr_depth)
>> +		*nsegs = ia->ri_max_frmr_depth;
>>   	for (page_no = i = 0; i < *nsegs;) {
>>   		rpcrdma_map_one(ia, seg, writing);
>>   		pa = seg->mr_dma;
>> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h
>> b/net/sunrpc/xprtrdma/xprt_rdma.h index cc1445d..98340a3 100644
>> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
>> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
>> @@ -66,6 +66,7 @@ struct rpcrdma_ia {
>>   	struct completion	ri_done;
>>   	int			ri_async_rc;
>>   	enum rpcrdma_memreg	ri_memreg_strategy;
>> +	unsigned int		ri_max_frmr_depth;
>>   };
>>
>>   /*
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the
>> body of a message to majordomo@vger.kernel.org More majordomo info at
>> http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH V3 01/17] xprtrdma: mind the device's max fast register page list depth
  2014-05-16 14:10             ` Steve Wise
@ 2014-05-16 14:14                 ` Steve Wise
  -1 siblings, 0 replies; 60+ messages in thread
From: Steve Wise @ 2014-05-16 14:14 UTC (permalink / raw)
  To: Devesh Sharma, Chuck Lever, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Anna.Schumaker-HgOvQuBEEgTQT0dZR+AlfA

By the way, Devesh:  Is the device advertising FRMR support, yet setting 
the max page list len to zero?  That's a driver bug...



On 5/16/2014 9:10 AM, Steve Wise wrote:
> I guess the client code doesn't verify that the device supports the 
> chosen memreg mode.  That's not good.   Lemme fix this and respin this 
> patch.
>
>
> On 5/16/2014 2:08 AM, Devesh Sharma wrote:
>> Chuck
>>
>> This patch is causing a CPU soft lockup if the underlying vendor reports 
>> devattr.max_fast_reg_page_list_len = 0 and ia->ri_memreg_strategy = 
>> FRMR (the default option).
>> I think there is a need to check the device capability flags. If 
>> strategy = FRMR is forced and devattr.max_fast_reg_page_list_len = 0, 
>> then log an error and fail the RPC with -EIO.
>>
>> See inline:
>>
>>> -----Original Message-----
>>> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
>>> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Chuck Lever
>>> Sent: Thursday, May 01, 2014 1:00 AM
>>> To: linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>> Cc: Anna.Schumaker-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org
>>> Subject: [PATCH V3 01/17] xprtrdma: mind the device's max fast register
>>> page list depth
>>>
>>> From: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
>>>
>>> Some rdma devices don't support a fast register page list depth of 
>>> at least
>>> RPCRDMA_MAX_DATA_SEGS.  So xprtrdma needs to chunk its fast register
>>> regions according to the minimum of the device max supported depth or
>>> RPCRDMA_MAX_DATA_SEGS.
>>>
>>> Signed-off-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
>>> Reviewed-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>>> ---
>>>
>>>   net/sunrpc/xprtrdma/rpc_rdma.c  |    4 ---
>>>   net/sunrpc/xprtrdma/verbs.c     |   47 +++++++++++++++++++++++++++++-
>>> ---------
>>>   net/sunrpc/xprtrdma/xprt_rdma.h |    1 +
>>>   3 files changed, 36 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c
>>> b/net/sunrpc/xprtrdma/rpc_rdma.c index 96ead52..400aa1b 100644
>>> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
>>> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
>>> @@ -248,10 +248,6 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, 
>>> struct
>>> xdr_buf *target,
>>>       /* success. all failures return above */
>>>       req->rl_nchunks = nchunks;
>>>
>>> -    BUG_ON(nchunks == 0);
>>> -    BUG_ON((r_xprt->rx_ia.ri_memreg_strategy == RPCRDMA_FRMR)
>>> -           && (nchunks > 3));
>>> -
>>>       /*
>>>        * finish off header. If write, marshal discrim and nchunks.
>>>        */
>>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>>> index 9372656..55fb09a 100644
>>> --- a/net/sunrpc/xprtrdma/verbs.c
>>> +++ b/net/sunrpc/xprtrdma/verbs.c
>>> @@ -539,6 +539,11 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct
>>> sockaddr *addr, int memreg)
>>>                   __func__);
>>>               memreg = RPCRDMA_REGISTER;
>>>   #endif
>>> +        } else {
>>> +            /* Mind the ia limit on FRMR page list depth */
>>> +            ia->ri_max_frmr_depth = min_t(unsigned int,
>>> +                RPCRDMA_MAX_DATA_SEGS,
>>> +                devattr.max_fast_reg_page_list_len);
>>>           }
>>>           break;
>>>       }
>>> @@ -659,24 +664,42 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct
>>> rpcrdma_ia *ia,
>>>       ep->rep_attr.srq = NULL;
>>>       ep->rep_attr.cap.max_send_wr = cdata->max_requests;
>>>       switch (ia->ri_memreg_strategy) {
>>> -    case RPCRDMA_FRMR:
>>> +    case RPCRDMA_FRMR: {
>>> +        int depth = 7;
>>> +
>>>           /* Add room for frmr register and invalidate WRs.
>>>            * 1. FRMR reg WR for head
>>>            * 2. FRMR invalidate WR for head
>>> -         * 3. FRMR reg WR for pagelist
>>> -         * 4. FRMR invalidate WR for pagelist
>>> +         * 3. N FRMR reg WRs for pagelist
>>> +         * 4. N FRMR invalidate WRs for pagelist
>>>            * 5. FRMR reg WR for tail
>>>            * 6. FRMR invalidate WR for tail
>>>            * 7. The RDMA_SEND WR
>>>            */
>>> -        ep->rep_attr.cap.max_send_wr *= 7;
>>> +
>>> +        /* Calculate N if the device max FRMR depth is smaller than
>>> +         * RPCRDMA_MAX_DATA_SEGS.
>>> +         */
>>> +        if (ia->ri_max_frmr_depth < RPCRDMA_MAX_DATA_SEGS) {
>>> +            int delta = RPCRDMA_MAX_DATA_SEGS -
>>> +                    ia->ri_max_frmr_depth;
>>> +
>>> +            do {
>>> +                depth += 2; /* FRMR reg + invalidate */
>>> +                delta -= ia->ri_max_frmr_depth;
>> If ia->ri_max_frmr_depth is 0, this loop becomes an infinite loop.
>>
>>> +            } while (delta > 0);
>>> +
>>> +        }
>>> +        ep->rep_attr.cap.max_send_wr *= depth;
>>>           if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr) {
>>> -            cdata->max_requests = devattr.max_qp_wr / 7;
>>> +            cdata->max_requests = devattr.max_qp_wr / depth;
>>>               if (!cdata->max_requests)
>>>                   return -EINVAL;
>>> -            ep->rep_attr.cap.max_send_wr = cdata-
>>>> max_requests * 7;
>>> +            ep->rep_attr.cap.max_send_wr = cdata-
>>>> max_requests *
>>> +                               depth;
>>>           }
>>>           break;
>>> +    }
>>>       case RPCRDMA_MEMWINDOWS_ASYNC:
>>>       case RPCRDMA_MEMWINDOWS:
>>>           /* Add room for mw_binds+unbinds - overkill! */ @@ -
>>> 1043,16 +1066,16 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf,
>>> struct rpcrdma_ep *ep,
>>>       case RPCRDMA_FRMR:
>>>           for (i = buf->rb_max_requests * RPCRDMA_MAX_SEGS; i; i--
>>> ) {
>>>               r->r.frmr.fr_mr = ib_alloc_fast_reg_mr(ia->ri_pd,
>>> -
>>> RPCRDMA_MAX_SEGS);
>>> +                        ia->ri_max_frmr_depth);
>>>               if (IS_ERR(r->r.frmr.fr_mr)) {
>>>                   rc = PTR_ERR(r->r.frmr.fr_mr);
>>>                   dprintk("RPC:       %s: ib_alloc_fast_reg_mr"
>>>                       " failed %i\n", __func__, rc);
>>>                   goto out;
>>>               }
>>> -            r->r.frmr.fr_pgl =
>>> -                ib_alloc_fast_reg_page_list(ia->ri_id-
>>>> device,
>>> -
>>> RPCRDMA_MAX_SEGS);
>>> +            r->r.frmr.fr_pgl = ib_alloc_fast_reg_page_list(
>>> +                        ia->ri_id->device,
>>> +                        ia->ri_max_frmr_depth);
>>>               if (IS_ERR(r->r.frmr.fr_pgl)) {
>>>                   rc = PTR_ERR(r->r.frmr.fr_pgl);
>>>                   dprintk("RPC:       %s: "
>>> @@ -1498,8 +1521,8 @@ rpcrdma_register_frmr_external(struct
>>> rpcrdma_mr_seg *seg,
>>>       seg1->mr_offset -= pageoff;    /* start of page */
>>>       seg1->mr_len += pageoff;
>>>       len = -pageoff;
>>> -    if (*nsegs > RPCRDMA_MAX_DATA_SEGS)
>>> -        *nsegs = RPCRDMA_MAX_DATA_SEGS;
>>> +    if (*nsegs > ia->ri_max_frmr_depth)
>>> +        *nsegs = ia->ri_max_frmr_depth;
>>>       for (page_no = i = 0; i < *nsegs;) {
>>>           rpcrdma_map_one(ia, seg, writing);
>>>           pa = seg->mr_dma;
>>> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h
>>> b/net/sunrpc/xprtrdma/xprt_rdma.h index cc1445d..98340a3 100644
>>> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
>>> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
>>> @@ -66,6 +66,7 @@ struct rpcrdma_ia {
>>>       struct completion    ri_done;
>>>       int            ri_async_rc;
>>>       enum rpcrdma_memreg    ri_memreg_strategy;
>>> +    unsigned int        ri_max_frmr_depth;
>>>   };
>>>
>>>   /*
>>>
>>> -- 
>>> To unsubscribe from this list: send the line "unsubscribe 
>>> linux-rdma" in the
>>> body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at
>>> http://vger.kernel.org/majordomo-info.html
>
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH V3 01/17] xprtrdma: mind the device's max fast register page list depth
@ 2014-05-16 14:14                 ` Steve Wise
  0 siblings, 0 replies; 60+ messages in thread
From: Steve Wise @ 2014-05-16 14:14 UTC (permalink / raw)
  To: Devesh Sharma, Chuck Lever, linux-nfs, linux-rdma; +Cc: Anna.Schumaker

By the way, Devesh:  Is the device advertising FRMR support, yet setting 
the max page list len to zero?  That's a driver bug...



On 5/16/2014 9:10 AM, Steve Wise wrote:
> I guess the client code doesn't verify that the device supports the 
> chosen memreg mode.  That's not good.   Lemme fix this and respin this 
> patch.
>
>
> On 5/16/2014 2:08 AM, Devesh Sharma wrote:
>> Chuck
>>
>> This patch is causing a CPU soft lockup if the underlying vendor reports 
>> devattr.max_fast_reg_page_list_len = 0 and ia->ri_memreg_strategy = 
>> FRMR (the default option).
>> I think there is a need to check the device capability flags. If 
>> strategy = FRMR is forced and devattr.max_fast_reg_page_list_len = 0, 
>> then log an error and fail the RPC with -EIO.
>>
>> See inline:
>>
>>> -----Original Message-----
>>> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
>>> owner@vger.kernel.org] On Behalf Of Chuck Lever
>>> Sent: Thursday, May 01, 2014 1:00 AM
>>> To: linux-nfs@vger.kernel.org; linux-rdma@vger.kernel.org
>>> Cc: Anna.Schumaker@netapp.com
>>> Subject: [PATCH V3 01/17] xprtrdma: mind the device's max fast register
>>> page list depth
>>>
>>> From: Steve Wise <swise@opengridcomputing.com>
>>>
>>> Some rdma devices don't support a fast register page list depth of 
>>> at least
>>> RPCRDMA_MAX_DATA_SEGS.  So xprtrdma needs to chunk its fast register
>>> regions according to the minimum of the device max supported depth or
>>> RPCRDMA_MAX_DATA_SEGS.
>>>
>>> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
>>> Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
>>> ---
>>>
>>>   net/sunrpc/xprtrdma/rpc_rdma.c  |    4 ---
>>>   net/sunrpc/xprtrdma/verbs.c     |   47 +++++++++++++++++++++++++++++-
>>> ---------
>>>   net/sunrpc/xprtrdma/xprt_rdma.h |    1 +
>>>   3 files changed, 36 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c
>>> b/net/sunrpc/xprtrdma/rpc_rdma.c index 96ead52..400aa1b 100644
>>> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
>>> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
>>> @@ -248,10 +248,6 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, 
>>> struct
>>> xdr_buf *target,
>>>       /* success. all failures return above */
>>>       req->rl_nchunks = nchunks;
>>>
>>> -    BUG_ON(nchunks == 0);
>>> -    BUG_ON((r_xprt->rx_ia.ri_memreg_strategy == RPCRDMA_FRMR)
>>> -           && (nchunks > 3));
>>> -
>>>       /*
>>>        * finish off header. If write, marshal discrim and nchunks.
>>>        */
>>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>>> index 9372656..55fb09a 100644
>>> --- a/net/sunrpc/xprtrdma/verbs.c
>>> +++ b/net/sunrpc/xprtrdma/verbs.c
>>> @@ -539,6 +539,11 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct
>>> sockaddr *addr, int memreg)
>>>                   __func__);
>>>               memreg = RPCRDMA_REGISTER;
>>>   #endif
>>> +        } else {
>>> +            /* Mind the ia limit on FRMR page list depth */
>>> +            ia->ri_max_frmr_depth = min_t(unsigned int,
>>> +                RPCRDMA_MAX_DATA_SEGS,
>>> +                devattr.max_fast_reg_page_list_len);
>>>           }
>>>           break;
>>>       }
>>> @@ -659,24 +664,42 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct
>>> rpcrdma_ia *ia,
>>>       ep->rep_attr.srq = NULL;
>>>       ep->rep_attr.cap.max_send_wr = cdata->max_requests;
>>>       switch (ia->ri_memreg_strategy) {
>>> -    case RPCRDMA_FRMR:
>>> +    case RPCRDMA_FRMR: {
>>> +        int depth = 7;
>>> +
>>>           /* Add room for frmr register and invalidate WRs.
>>>            * 1. FRMR reg WR for head
>>>            * 2. FRMR invalidate WR for head
>>> -         * 3. FRMR reg WR for pagelist
>>> -         * 4. FRMR invalidate WR for pagelist
>>> +         * 3. N FRMR reg WRs for pagelist
>>> +         * 4. N FRMR invalidate WRs for pagelist
>>>            * 5. FRMR reg WR for tail
>>>            * 6. FRMR invalidate WR for tail
>>>            * 7. The RDMA_SEND WR
>>>            */
>>> -        ep->rep_attr.cap.max_send_wr *= 7;
>>> +
>>> +        /* Calculate N if the device max FRMR depth is smaller than
>>> +         * RPCRDMA_MAX_DATA_SEGS.
>>> +         */
>>> +        if (ia->ri_max_frmr_depth < RPCRDMA_MAX_DATA_SEGS) {
>>> +            int delta = RPCRDMA_MAX_DATA_SEGS -
>>> +                    ia->ri_max_frmr_depth;
>>> +
>>> +            do {
>>> +                depth += 2; /* FRMR reg + invalidate */
>>> +                delta -= ia->ri_max_frmr_depth;
>> If ia->ri_max_frmr_depth is 0, this loop becomes an infinite loop.
>>
>>> +            } while (delta > 0);
>>> +
>>> +        }
>>> +        ep->rep_attr.cap.max_send_wr *= depth;
>>>           if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr) {
>>> -            cdata->max_requests = devattr.max_qp_wr / 7;
>>> +            cdata->max_requests = devattr.max_qp_wr / depth;
>>>               if (!cdata->max_requests)
>>>                   return -EINVAL;
>>> -            ep->rep_attr.cap.max_send_wr = cdata-
>>>> max_requests * 7;
>>> +            ep->rep_attr.cap.max_send_wr = cdata-
>>>> max_requests *
>>> +                               depth;
>>>           }
>>>           break;
>>> +    }
>>>       case RPCRDMA_MEMWINDOWS_ASYNC:
>>>       case RPCRDMA_MEMWINDOWS:
>>>           /* Add room for mw_binds+unbinds - overkill! */ @@ -
>>> 1043,16 +1066,16 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf,
>>> struct rpcrdma_ep *ep,
>>>       case RPCRDMA_FRMR:
>>>           for (i = buf->rb_max_requests * RPCRDMA_MAX_SEGS; i; i--
>>> ) {
>>>               r->r.frmr.fr_mr = ib_alloc_fast_reg_mr(ia->ri_pd,
>>> -
>>> RPCRDMA_MAX_SEGS);
>>> +                        ia->ri_max_frmr_depth);
>>>               if (IS_ERR(r->r.frmr.fr_mr)) {
>>>                   rc = PTR_ERR(r->r.frmr.fr_mr);
>>>                   dprintk("RPC:       %s: ib_alloc_fast_reg_mr"
>>>                       " failed %i\n", __func__, rc);
>>>                   goto out;
>>>               }
>>> -            r->r.frmr.fr_pgl =
>>> -                ib_alloc_fast_reg_page_list(ia->ri_id-
>>>> device,
>>> -
>>> RPCRDMA_MAX_SEGS);
>>> +            r->r.frmr.fr_pgl = ib_alloc_fast_reg_page_list(
>>> +                        ia->ri_id->device,
>>> +                        ia->ri_max_frmr_depth);
>>>               if (IS_ERR(r->r.frmr.fr_pgl)) {
>>>                   rc = PTR_ERR(r->r.frmr.fr_pgl);
>>>                   dprintk("RPC:       %s: "
>>> @@ -1498,8 +1521,8 @@ rpcrdma_register_frmr_external(struct
>>> rpcrdma_mr_seg *seg,
>>>       seg1->mr_offset -= pageoff;    /* start of page */
>>>       seg1->mr_len += pageoff;
>>>       len = -pageoff;
>>> -    if (*nsegs > RPCRDMA_MAX_DATA_SEGS)
>>> -        *nsegs = RPCRDMA_MAX_DATA_SEGS;
>>> +    if (*nsegs > ia->ri_max_frmr_depth)
>>> +        *nsegs = ia->ri_max_frmr_depth;
>>>       for (page_no = i = 0; i < *nsegs;) {
>>>           rpcrdma_map_one(ia, seg, writing);
>>>           pa = seg->mr_dma;
>>> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h
>>> b/net/sunrpc/xprtrdma/xprt_rdma.h index cc1445d..98340a3 100644
>>> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
>>> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
>>> @@ -66,6 +66,7 @@ struct rpcrdma_ia {
>>>       struct completion    ri_done;
>>>       int            ri_async_rc;
>>>       enum rpcrdma_memreg    ri_memreg_strategy;
>>> +    unsigned int        ri_max_frmr_depth;
>>>   };
>>>
>>>   /*
>>>
>>> -- 
>>> To unsubscribe from this list: send the line "unsubscribe 
>>> linux-rdma" in the
>>> body of a message to majordomo@vger.kernel.org More majordomo info at
>>> http://vger.kernel.org/majordomo-info.html
>
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH V3 01/17] xprtrdma: mind the device's max fast register page list depth
  2014-05-16 14:14                 ` Steve Wise
@ 2014-05-16 14:29                     ` Steve Wise
  -1 siblings, 0 replies; 60+ messages in thread
From: Steve Wise @ 2014-05-16 14:29 UTC (permalink / raw)
  To: Devesh Sharma, Chuck Lever, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Anna.Schumaker-HgOvQuBEEgTQT0dZR+AlfA

Looks like ocrdma does this.  See ocrdma_query_device().  It advertises 
IB_DEVICE_MEM_MGT_EXTENSIONS but sets max_fast_reg_page_list_len to 0.  
The Verbs spec says that if you advertise the mem extensions, then you need 
to support all of them.  Is this just a bug in the driver?  Or does it 
really not support FRMRs?
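
In the meantime, the client could also defend itself at mount time
instead of soft-locking. A rough, untested sketch against
rpcrdma_ia_open(); whether MTHCAFMR is the right fallback here is an
open question:

	if (memreg == RPCRDMA_FRMR) {
		/* Trust the FRMR capability flag only when the
		 * advertised page list limit is usable; otherwise
		 * fall back to FMRs.
		 */
		if (!(devattr.device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS) ||
		    devattr.max_fast_reg_page_list_len == 0) {
			dprintk("RPC:       %s: FRMR unusable, "
				"falling back to MTHCAFMR\n", __func__);
			memreg = RPCRDMA_MTHCAFMR;
		}
	}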

Steve.


On 5/16/2014 9:14 AM, Steve Wise wrote:
> By the way, Devesh:  Is the device advertising FRMR support, yet 
> setting the max page list len to zero?  That's a driver bug...
>
>
>
> On 5/16/2014 9:10 AM, Steve Wise wrote:
>> I guess the client code doesn't verify that the device supports the 
>> chosen memreg mode.  That's not good.   Lemme fix this and respin 
>> this patch.
>>
>>
>> On 5/16/2014 2:08 AM, Devesh Sharma wrote:
>>> Chuck
>>>
>>> This patch is causing a CPU soft lockup if the underlying vendor reports 
>>> devattr.max_fast_reg_page_list_len = 0 and ia->ri_memreg_strategy = 
>>> FRMR (the default option).
>>> I think there is a need to check the device capability flags. If 
>>> strategy = FRMR is forced and devattr.max_fast_reg_page_list_len = 0, 
>>> then log an error and fail the RPC with -EIO.
>>>
>>> See inline:
>>>
>>>> -----Original Message-----
>>>> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
>>>> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Chuck Lever
>>>> Sent: Thursday, May 01, 2014 1:00 AM
>>>> To: linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>>> Cc: Anna.Schumaker-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org
>>>> Subject: [PATCH V3 01/17] xprtrdma: mind the device's max fast 
>>>> register
>>>> page list depth
>>>>
>>>> From: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
>>>>
>>>> Some rdma devices don't support a fast register page list depth of 
>>>> at least
>>>> RPCRDMA_MAX_DATA_SEGS.  So xprtrdma needs to chunk its fast register
>>>> regions according to the minimum of the device max supported depth or
>>>> RPCRDMA_MAX_DATA_SEGS.
>>>>
>>>> Signed-off-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
>>>> Reviewed-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>>>> ---
>>>>
>>>>   net/sunrpc/xprtrdma/rpc_rdma.c  |    4 ---
>>>>   net/sunrpc/xprtrdma/verbs.c     |   47 
>>>> +++++++++++++++++++++++++++++-
>>>> ---------
>>>>   net/sunrpc/xprtrdma/xprt_rdma.h |    1 +
>>>>   3 files changed, 36 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c
>>>> b/net/sunrpc/xprtrdma/rpc_rdma.c index 96ead52..400aa1b 100644
>>>> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
>>>> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
>>>> @@ -248,10 +248,6 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, 
>>>> struct
>>>> xdr_buf *target,
>>>>       /* success. all failures return above */
>>>>       req->rl_nchunks = nchunks;
>>>>
>>>> -    BUG_ON(nchunks == 0);
>>>> -    BUG_ON((r_xprt->rx_ia.ri_memreg_strategy == RPCRDMA_FRMR)
>>>> -           && (nchunks > 3));
>>>> -
>>>>       /*
>>>>        * finish off header. If write, marshal discrim and nchunks.
>>>>        */
>>>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>>>> index 9372656..55fb09a 100644
>>>> --- a/net/sunrpc/xprtrdma/verbs.c
>>>> +++ b/net/sunrpc/xprtrdma/verbs.c
>>>> @@ -539,6 +539,11 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct
>>>> sockaddr *addr, int memreg)
>>>>                   __func__);
>>>>               memreg = RPCRDMA_REGISTER;
>>>>   #endif
>>>> +        } else {
>>>> +            /* Mind the ia limit on FRMR page list depth */
>>>> +            ia->ri_max_frmr_depth = min_t(unsigned int,
>>>> +                RPCRDMA_MAX_DATA_SEGS,
>>>> +                devattr.max_fast_reg_page_list_len);
>>>>           }
>>>>           break;
>>>>       }
>>>> @@ -659,24 +664,42 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct
>>>> rpcrdma_ia *ia,
>>>>       ep->rep_attr.srq = NULL;
>>>>       ep->rep_attr.cap.max_send_wr = cdata->max_requests;
>>>>       switch (ia->ri_memreg_strategy) {
>>>> -    case RPCRDMA_FRMR:
>>>> +    case RPCRDMA_FRMR: {
>>>> +        int depth = 7;
>>>> +
>>>>           /* Add room for frmr register and invalidate WRs.
>>>>            * 1. FRMR reg WR for head
>>>>            * 2. FRMR invalidate WR for head
>>>> -         * 3. FRMR reg WR for pagelist
>>>> -         * 4. FRMR invalidate WR for pagelist
>>>> +         * 3. N FRMR reg WRs for pagelist
>>>> +         * 4. N FRMR invalidate WRs for pagelist
>>>>            * 5. FRMR reg WR for tail
>>>>            * 6. FRMR invalidate WR for tail
>>>>            * 7. The RDMA_SEND WR
>>>>            */
>>>> -        ep->rep_attr.cap.max_send_wr *= 7;
>>>> +
>>>> +        /* Calculate N if the device max FRMR depth is smaller than
>>>> +         * RPCRDMA_MAX_DATA_SEGS.
>>>> +         */
>>>> +        if (ia->ri_max_frmr_depth < RPCRDMA_MAX_DATA_SEGS) {
>>>> +            int delta = RPCRDMA_MAX_DATA_SEGS -
>>>> +                    ia->ri_max_frmr_depth;
>>>> +
>>>> +            do {
>>>> +                depth += 2; /* FRMR reg + invalidate */
>>>> +                delta -= ia->ri_max_frmr_depth;
>>> If ia->ri_max_frmr_depth is 0, this loop becomes an infinite loop.
>>>
>>>> +            } while (delta > 0);
>>>> +
>>>> +        }
>>>> +        ep->rep_attr.cap.max_send_wr *= depth;
>>>>           if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr) {
>>>> -            cdata->max_requests = devattr.max_qp_wr / 7;
>>>> +            cdata->max_requests = devattr.max_qp_wr / depth;
>>>>               if (!cdata->max_requests)
>>>>                   return -EINVAL;
>>>> -            ep->rep_attr.cap.max_send_wr = cdata-
>>>>> max_requests * 7;
>>>> +            ep->rep_attr.cap.max_send_wr = cdata-
>>>>> max_requests *
>>>> +                               depth;
>>>>           }
>>>>           break;
>>>> +    }
>>>>       case RPCRDMA_MEMWINDOWS_ASYNC:
>>>>       case RPCRDMA_MEMWINDOWS:
>>>>           /* Add room for mw_binds+unbinds - overkill! */ @@ -
>>>> 1043,16 +1066,16 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf,
>>>> struct rpcrdma_ep *ep,
>>>>       case RPCRDMA_FRMR:
>>>>           for (i = buf->rb_max_requests * RPCRDMA_MAX_SEGS; i; i--
>>>> ) {
>>>>               r->r.frmr.fr_mr = ib_alloc_fast_reg_mr(ia->ri_pd,
>>>> -
>>>> RPCRDMA_MAX_SEGS);
>>>> +                        ia->ri_max_frmr_depth);
>>>>               if (IS_ERR(r->r.frmr.fr_mr)) {
>>>>                   rc = PTR_ERR(r->r.frmr.fr_mr);
>>>>                   dprintk("RPC:       %s: ib_alloc_fast_reg_mr"
>>>>                       " failed %i\n", __func__, rc);
>>>>                   goto out;
>>>>               }
>>>> -            r->r.frmr.fr_pgl =
>>>> -                ib_alloc_fast_reg_page_list(ia->ri_id-
>>>>> device,
>>>> -
>>>> RPCRDMA_MAX_SEGS);
>>>> +            r->r.frmr.fr_pgl = ib_alloc_fast_reg_page_list(
>>>> +                        ia->ri_id->device,
>>>> +                        ia->ri_max_frmr_depth);
>>>>               if (IS_ERR(r->r.frmr.fr_pgl)) {
>>>>                   rc = PTR_ERR(r->r.frmr.fr_pgl);
>>>>                   dprintk("RPC:       %s: "
>>>> @@ -1498,8 +1521,8 @@ rpcrdma_register_frmr_external(struct
>>>> rpcrdma_mr_seg *seg,
>>>>       seg1->mr_offset -= pageoff;    /* start of page */
>>>>       seg1->mr_len += pageoff;
>>>>       len = -pageoff;
>>>> -    if (*nsegs > RPCRDMA_MAX_DATA_SEGS)
>>>> -        *nsegs = RPCRDMA_MAX_DATA_SEGS;
>>>> +    if (*nsegs > ia->ri_max_frmr_depth)
>>>> +        *nsegs = ia->ri_max_frmr_depth;
>>>>       for (page_no = i = 0; i < *nsegs;) {
>>>>           rpcrdma_map_one(ia, seg, writing);
>>>>           pa = seg->mr_dma;
>>>> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h
>>>> b/net/sunrpc/xprtrdma/xprt_rdma.h index cc1445d..98340a3 100644
>>>> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
>>>> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
>>>> @@ -66,6 +66,7 @@ struct rpcrdma_ia {
>>>>       struct completion    ri_done;
>>>>       int            ri_async_rc;
>>>>       enum rpcrdma_memreg    ri_memreg_strategy;
>>>> +    unsigned int        ri_max_frmr_depth;
>>>>   };
>>>>
>>>>   /*
>>>>
>>>> -- 
>>>> To unsubscribe from this list: send the line "unsubscribe 
>>>> linux-rdma" in the
>>>> body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at
>>>> http://vger.kernel.org/majordomo-info.html
>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH V3 01/17] xprtrdma: mind the device's max fast register page list depth
@ 2014-05-16 14:29                     ` Steve Wise
  0 siblings, 0 replies; 60+ messages in thread
From: Steve Wise @ 2014-05-16 14:29 UTC (permalink / raw)
  To: Devesh Sharma, Chuck Lever, linux-nfs, linux-rdma; +Cc: Anna.Schumaker

Looks like ocrdma does this.  See ocrdma_query_device().  It advertises 
IB_DEVICE_MEM_MGT_EXTENSIONS but sets max_fast_reg_page_list_len to 0.  
The Verbs spec says that if you advertise the mem extensions, then you need 
to support all of them.  Is this just a bug in the driver?  Or does it 
really not support FRMRs?

Steve.


On 5/16/2014 9:14 AM, Steve Wise wrote:
> By the way, Devesh:  Is the device advertising FRMR support, yet 
> setting the max page list len to zero?  That's a driver bug...
>
>
>
> On 5/16/2014 9:10 AM, Steve Wise wrote:
>> I guess the client code doesn't verify that the device supports the 
>> chosen memreg mode.  That's not good.   Lemme fix this and respin 
>> this patch.
>>
>>
>> On 5/16/2014 2:08 AM, Devesh Sharma wrote:
>>> Chuck
>>>
>>> This patch is causing a CPU soft lockup if the underlying vendor reports 
>>> devattr.max_fast_reg_page_list_len = 0 and ia->ri_memreg_strategy = 
>>> FRMR (the default option).
>>> I think there is a need to check the device capability flags. If 
>>> strategy = FRMR is forced and devattr.max_fast_reg_page_list_len = 0, 
>>> then log an error and fail the RPC with -EIO.
>>>
>>> See inline:
>>>
>>>> -----Original Message-----
>>>> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
>>>> owner@vger.kernel.org] On Behalf Of Chuck Lever
>>>> Sent: Thursday, May 01, 2014 1:00 AM
>>>> To: linux-nfs@vger.kernel.org; linux-rdma@vger.kernel.org
>>>> Cc: Anna.Schumaker@netapp.com
>>>> Subject: [PATCH V3 01/17] xprtrdma: mind the device's max fast 
>>>> register
>>>> page list depth
>>>>
>>>> From: Steve Wise <swise@opengridcomputing.com>
>>>>
>>>> Some rdma devices don't support a fast register page list depth of 
>>>> at least
>>>> RPCRDMA_MAX_DATA_SEGS.  So xprtrdma needs to chunk its fast register
>>>> regions according to the minimum of the device max supported depth or
>>>> RPCRDMA_MAX_DATA_SEGS.
>>>>
>>>> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
>>>> Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
>>>> ---
>>>>
>>>>   net/sunrpc/xprtrdma/rpc_rdma.c  |    4 ---
>>>>   net/sunrpc/xprtrdma/verbs.c     |   47 +++++++++++++++++++++++++++++----------
>>>>   net/sunrpc/xprtrdma/xprt_rdma.h |    1 +
>>>>   3 files changed, 36 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
>>>> index 96ead52..400aa1b 100644
>>>> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
>>>> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
>>>> @@ -248,10 +248,6 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, struct xdr_buf *target,
>>>>       /* success. all failures return above */
>>>>       req->rl_nchunks = nchunks;
>>>>
>>>> -    BUG_ON(nchunks == 0);
>>>> -    BUG_ON((r_xprt->rx_ia.ri_memreg_strategy == RPCRDMA_FRMR)
>>>> -           && (nchunks > 3));
>>>> -
>>>>       /*
>>>>        * finish off header. If write, marshal discrim and nchunks.
>>>>        */
>>>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>>>> index 9372656..55fb09a 100644
>>>> --- a/net/sunrpc/xprtrdma/verbs.c
>>>> +++ b/net/sunrpc/xprtrdma/verbs.c
>>>> @@ -539,6 +539,11 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
>>>>                   __func__);
>>>>               memreg = RPCRDMA_REGISTER;
>>>>   #endif
>>>> +        } else {
>>>> +            /* Mind the ia limit on FRMR page list depth */
>>>> +            ia->ri_max_frmr_depth = min_t(unsigned int,
>>>> +                RPCRDMA_MAX_DATA_SEGS,
>>>> +                devattr.max_fast_reg_page_list_len);
>>>>           }
>>>>           break;
>>>>       }
>>>> @@ -659,24 +664,42 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
>>>>       ep->rep_attr.srq = NULL;
>>>>       ep->rep_attr.cap.max_send_wr = cdata->max_requests;
>>>>       switch (ia->ri_memreg_strategy) {
>>>> -    case RPCRDMA_FRMR:
>>>> +    case RPCRDMA_FRMR: {
>>>> +        int depth = 7;
>>>> +
>>>>           /* Add room for frmr register and invalidate WRs.
>>>>            * 1. FRMR reg WR for head
>>>>            * 2. FRMR invalidate WR for head
>>>> -         * 3. FRMR reg WR for pagelist
>>>> -         * 4. FRMR invalidate WR for pagelist
>>>> +         * 3. N FRMR reg WRs for pagelist
>>>> +         * 4. N FRMR invalidate WRs for pagelist
>>>>            * 5. FRMR reg WR for tail
>>>>            * 6. FRMR invalidate WR for tail
>>>>            * 7. The RDMA_SEND WR
>>>>            */
>>>> -        ep->rep_attr.cap.max_send_wr *= 7;
>>>> +
>>>> +        /* Calculate N if the device max FRMR depth is smaller than
>>>> +         * RPCRDMA_MAX_DATA_SEGS.
>>>> +         */
>>>> +        if (ia->ri_max_frmr_depth < RPCRDMA_MAX_DATA_SEGS) {
>>>> +            int delta = RPCRDMA_MAX_DATA_SEGS -
>>>> +                    ia->ri_max_frmr_depth;
>>>> +
>>>> +            do {
>>>> +                depth += 2; /* FRMR reg + invalidate */
>>>> +                delta -= ia->ri_max_frmr_depth;
>>> If ia->ri_max_frmr_depth is 0, this loop becomes an infinite loop.
>>>
>>>> +            } while (delta > 0);
>>>> +
>>>> +        }
>>>> +        ep->rep_attr.cap.max_send_wr *= depth;
>>>>           if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr) {
>>>> -            cdata->max_requests = devattr.max_qp_wr / 7;
>>>> +            cdata->max_requests = devattr.max_qp_wr / depth;
>>>>               if (!cdata->max_requests)
>>>>                   return -EINVAL;
>>>> -            ep->rep_attr.cap.max_send_wr = cdata->max_requests * 7;
>>>> +            ep->rep_attr.cap.max_send_wr = cdata->max_requests *
>>>> +                               depth;
>>>>           }
>>>>           break;
>>>> +    }
>>>>       case RPCRDMA_MEMWINDOWS_ASYNC:
>>>>       case RPCRDMA_MEMWINDOWS:
>>>>           /* Add room for mw_binds+unbinds - overkill! */
>>>> @@ -1043,16 +1066,16 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
>>>>       case RPCRDMA_FRMR:
>>>>           for (i = buf->rb_max_requests * RPCRDMA_MAX_SEGS; i; i--) {
>>>>               r->r.frmr.fr_mr = ib_alloc_fast_reg_mr(ia->ri_pd,
>>>> -                        RPCRDMA_MAX_SEGS);
>>>> +                        ia->ri_max_frmr_depth);
>>>>               if (IS_ERR(r->r.frmr.fr_mr)) {
>>>>                   rc = PTR_ERR(r->r.frmr.fr_mr);
>>>>                   dprintk("RPC:       %s: ib_alloc_fast_reg_mr"
>>>>                       " failed %i\n", __func__, rc);
>>>>                   goto out;
>>>>               }
>>>> -            r->r.frmr.fr_pgl =
>>>> -                ib_alloc_fast_reg_page_list(ia->ri_id->device,
>>>> -                        RPCRDMA_MAX_SEGS);
>>>> +            r->r.frmr.fr_pgl = ib_alloc_fast_reg_page_list(
>>>> +                        ia->ri_id->device,
>>>> +                        ia->ri_max_frmr_depth);
>>>>               if (IS_ERR(r->r.frmr.fr_pgl)) {
>>>>                   rc = PTR_ERR(r->r.frmr.fr_pgl);
>>>>                   dprintk("RPC:       %s: "
>>>> @@ -1498,8 +1521,8 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
>>>>       seg1->mr_offset -= pageoff;    /* start of page */
>>>>       seg1->mr_len += pageoff;
>>>>       len = -pageoff;
>>>> -    if (*nsegs > RPCRDMA_MAX_DATA_SEGS)
>>>> -        *nsegs = RPCRDMA_MAX_DATA_SEGS;
>>>> +    if (*nsegs > ia->ri_max_frmr_depth)
>>>> +        *nsegs = ia->ri_max_frmr_depth;
>>>>       for (page_no = i = 0; i < *nsegs;) {
>>>>           rpcrdma_map_one(ia, seg, writing);
>>>>           pa = seg->mr_dma;
>>>> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
>>>> index cc1445d..98340a3 100644
>>>> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
>>>> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
>>>> @@ -66,6 +66,7 @@ struct rpcrdma_ia {
>>>>       struct completion    ri_done;
>>>>       int            ri_async_rc;
>>>>       enum rpcrdma_memreg    ri_memreg_strategy;
>>>> +    unsigned int        ri_max_frmr_depth;
>>>>   };
>>>>
>>>>   /*
>>>>
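
For reference, a minimal sketch of a sizing helper that cannot spin 
when the advertised depth is zero. This is an illustration of the 
guard Devesh's inline comment asks for, not the actual respin, and 
the helper name is hypothetical:

	/* Hedged sketch: reject a zero FRMR depth up front so the N
	 * calculation below always terminates.  Worked example with
	 * assumed numbers: RPCRDMA_MAX_DATA_SEGS = 64 and a device
	 * depth of 27 give delta = 37; two passes add 2 WRs each and
	 * subtract 27, so the result is depth = 11.
	 */
	static int rpcrdma_frmr_send_wr_depth(struct rpcrdma_ia *ia)
	{
		int depth = 7;

		if (ia->ri_max_frmr_depth == 0)
			return -EIO;	/* device cannot do FRMR at all */

		if (ia->ri_max_frmr_depth < RPCRDMA_MAX_DATA_SEGS) {
			int delta = RPCRDMA_MAX_DATA_SEGS -
					ia->ri_max_frmr_depth;

			do {
				depth += 2;	/* FRMR reg + invalidate */
				delta -= ia->ri_max_frmr_depth;	/* now always decreases */
			} while (delta > 0);
		}
		return depth;
	}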


^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: [PATCH V3 01/17] xprtrdma: mind the device's max fast register page list depth
  2014-05-16 14:29                     ` Steve Wise
@ 2014-05-17  8:23                         ` Devesh Sharma
  -1 siblings, 0 replies; 60+ messages in thread
From: Devesh Sharma @ 2014-05-17  8:23 UTC (permalink / raw)
  To: Steve Wise, Chuck Lever, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Anna.Schumaker-HgOvQuBEEgTQT0dZR+AlfA

Yes Steve,

This is a bug in ocrdma. We will be supplying a fix for it in the near future. It is because of this bug that I was able to figure out that the infinite loop causes a soft lockup.

Thanks for pointing it out; you will see a fix for this bug soon.
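
For context, here is a hedged sketch of the shape a driver-side fix 
could take. The helper name and the ocrdma attribute field are 
assumptions for illustration, not the actual patch:

	/* Illustrative only: a provider that sets
	 * IB_DEVICE_MEM_MGT_EXTENSIONS in its ->query_device() results
	 * must also report a non-zero max_fast_reg_page_list_len.  The
	 * device attribute feeding it here is assumed.
	 */
	static void ocrdma_report_frmr_caps(struct ocrdma_dev *dev,
					    struct ib_device_attr *attr)
	{
		attr->device_cap_flags |= IB_DEVICE_MEM_MGT_EXTENSIONS;
		attr->max_fast_reg_page_list_len =
					dev->attr.max_pages_per_frmr;
	}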

Regards
Devesh


^ permalink raw reply	[flat|nested] 60+ messages in thread


end of thread, other threads:[~2014-05-17  8:23 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-30 19:29 [PATCH V3 00/17] NFS/RDMA client-side patches Chuck Lever
2014-04-30 19:29 ` Chuck Lever
     [not found] ` <20140430191433.5663.16217.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
2014-04-30 19:29   ` [PATCH V3 01/17] xprtrdma: mind the device's max fast register page list depth Chuck Lever
2014-04-30 19:29     ` Chuck Lever
     [not found]     ` <20140430192936.5663.66537.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
2014-05-16  7:08       ` Devesh Sharma
2014-05-16  7:08         ` Devesh Sharma
     [not found]         ` <EE7902D3F51F404C82415C4803930ACD3FDFBDA9-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
2014-05-16 14:10           ` Steve Wise
2014-05-16 14:10             ` Steve Wise
     [not found]             ` <53761C63.4050908-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2014-05-16 14:14               ` Steve Wise
2014-05-16 14:14                 ` Steve Wise
     [not found]                 ` <53761D28.3070704-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2014-05-16 14:29                   ` Steve Wise
2014-05-16 14:29                     ` Steve Wise
     [not found]                     ` <537620AF.3010307-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2014-05-17  8:23                       ` Devesh Sharma
2014-05-17  8:23                         ` Devesh Sharma
2014-04-30 19:29   ` [PATCH V3 02/17] nfs-rdma: Fix for FMR leaks Chuck Lever
2014-04-30 19:29     ` Chuck Lever
2014-04-30 19:29   ` [PATCH V3 03/17] xprtrdma: RPC/RDMA must invoke xprt_wake_pending_tasks() in process context Chuck Lever
2014-04-30 19:29     ` Chuck Lever
2014-04-30 19:30   ` [PATCH V3 04/17] xprtrdma: Remove BOUNCEBUFFERS memory registration mode Chuck Lever
2014-04-30 19:30     ` Chuck Lever
2014-04-30 19:30   ` [PATCH V3 05/17] xprtrdma: Remove MEMWINDOWS registration modes Chuck Lever
2014-04-30 19:30     ` Chuck Lever
2014-04-30 19:30   ` [PATCH V3 06/17] xprtrdma: Remove REGISTER memory registration mode Chuck Lever
2014-04-30 19:30     ` Chuck Lever
2014-04-30 19:30   ` [PATCH V3 07/17] xprtrdma: Fall back to MTHCAFMR when FRMR is not supported Chuck Lever
2014-04-30 19:30     ` Chuck Lever
2014-04-30 19:30   ` [PATCH V3 08/17] xprtrdma: mount reports "Invalid mount option" if memreg mode " Chuck Lever
2014-04-30 19:30     ` Chuck Lever
2014-04-30 19:30   ` [PATCH V3 09/17] xprtrdma: Simplify rpcrdma_deregister_external() synopsis Chuck Lever
2014-04-30 19:30     ` Chuck Lever
2014-04-30 19:30   ` [PATCH V3 10/17] xprtrdma: Make rpcrdma_ep_destroy() return void Chuck Lever
2014-04-30 19:30     ` Chuck Lever
2014-04-30 19:31   ` [PATCH V3 11/17] xprtrdma: Split the completion queue Chuck Lever
2014-04-30 19:31     ` Chuck Lever
2014-04-30 19:31   ` [PATCH V3 12/17] xprtrmda: Reduce lock contention in completion handlers Chuck Lever
2014-04-30 19:31     ` Chuck Lever
2014-04-30 19:31   ` [PATCH V3 13/17] xprtrmda: Reduce calls to ib_poll_cq() " Chuck Lever
2014-04-30 19:31     ` Chuck Lever
2014-04-30 19:31   ` [PATCH V3 14/17] xprtrdma: Limit work done by completion handler Chuck Lever
2014-04-30 19:31     ` Chuck Lever
2014-04-30 19:31   ` [PATCH V3 15/17] xprtrdma: Reduce the number of hardway buffer allocations Chuck Lever
2014-04-30 19:31     ` Chuck Lever
2014-04-30 19:31   ` [PATCH V3 16/17] xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting Chuck Lever
2014-04-30 19:31     ` Chuck Lever
2014-04-30 19:31   ` [PATCH V3 17/17] xprtrdma: Remove Tavor MTU setting Chuck Lever
2014-04-30 19:31     ` Chuck Lever
     [not found]     ` <20140430193155.5663.86148.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
2014-05-01  7:36       ` Hal Rosenstock
2014-05-01  7:36         ` Hal Rosenstock
2014-05-02 19:27   ` [PATCH V3 00/17] NFS/RDMA client-side patches Doug Ledford
2014-05-02 19:27     ` Doug Ledford
2014-05-02 19:27   ` Doug Ledford
2014-05-02 19:27     ` Doug Ledford
2014-05-02 19:27 ` Doug Ledford
     [not found] ` <5363f223.e39f420a.4af6.6fc9SMTPIN_ADDED_BROKEN@mx.google.com>
     [not found]   ` <5363f223.e39f420a.4af6.6fc9SMTPIN_ADDED_BROKEN-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
2014-05-02 20:20     ` Chuck Lever
2014-05-02 20:20       ` Chuck Lever
     [not found]       ` <45067B04-660C-4971-B12F-AEC9F7D32785-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2014-05-02 22:34         ` Doug Ledford
2014-05-02 22:34           ` Doug Ledford
2014-05-02 22:34         ` Doug Ledford
2014-05-02 22:34           ` Doug Ledford
2014-05-02 22:34       ` Doug Ledford
