linux-nfs.vger.kernel.org archive mirror
* [PATCH RFC 0/2] NFS/RDMA changes
From: Chuck Lever @ 2019-01-16 18:22 UTC
  To: linux-rdma, linux-nfs

Here are two NFS/RDMA-related patches that I'd like to see merged
for kernel v5.1. Thanks in advance for review and comments.

---

Chuck Lever (2):
      xprtrdma: Check inline size before providing a Write chunk
      xprtrdma: Reduce the doorbell rate (Receive)


 net/sunrpc/xprtrdma/rpc_rdma.c  |   18 +++++++++++++++++-
 net/sunrpc/xprtrdma/verbs.c     |    2 ++
 net/sunrpc/xprtrdma/xprt_rdma.h |   11 +++++++++++
 3 files changed, 30 insertions(+), 1 deletion(-)

--
Chuck Lever


* [PATCH RFC 1/2] xprtrdma: Check inline size before providing a Write chunk
From: Chuck Lever @ 2019-01-16 18:22 UTC
  To: linux-rdma, linux-nfs

In very rare cases, an NFS READ operation might predict that the
non-payload part of the RPC Reply is large. For instance, an
NFSv4 COMPOUND with a large GETATTR result, in combination with a
large Kerberos credential, could push the non-payload part to be
several kilobytes.

If the non-payload part is larger than the connection's inline
threshold, the client is required to provision a Reply chunk. The
current Linux client does not check for this case. There are two
obvious ways to handle it:

a. Provision a Write chunk for the payload and a Reply chunk for
   the non-payload part

b. Provision a Reply chunk for the whole RPC Reply

Some testing at a recent NFS bake-a-thon showed that servers can
mostly handle option a, but some corner cases do not work yet.
Option b already works (it has to, to handle krb5i/krb5p), but it
can be somewhat less efficient. However, I expect this scenario to
be very rare: no one has reported a problem yet.

So this patch implements option b. Later I will provide patches
that make option b a little more efficient by choosing the Reply
chunk's segment sizes more carefully, so that the payload is
optimally aligned.
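To make the decision concrete, here is a minimal userspace sketch of how the Reply-side chunk type is chosen once this patch is applied. It is not kernel code: struct reply_buf, choose_wtype() and the other names are hypothetical stand-ins for the xprtrdma structures and for the logic in the diff, and the inline threshold is passed in as a plain number.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for rq_rcv_buf. Field names
 * loosely mirror the patch; this is NOT the kernel's struct xdr_buf. */
struct reply_buf {
	size_t head_len;   /* like head[0].iov_len: non-payload before data */
	size_t tail_len;   /* like tail[0].iov_len: non-payload after data */
	size_t buflen;     /* like buflen: maximum size of the whole Reply */
	bool   is_read;    /* like flags & XDRBUF_READ */
};

enum chunk_type { NO_CHUNK, WRITE_CHUNK, REPLY_CHUNK };

/* Whole Reply fits inline: mirrors rpcrdma_results_inline(). */
static bool results_inline(const struct reply_buf *buf, size_t max_inline_read)
{
	return buf->buflen <= max_inline_read;
}

/* Non-payload part fits inline: mirrors rpcrdma_nonpayload_inline(),
 * including its strict "<" comparison. */
static bool nonpayload_inline(const struct reply_buf *buf, size_t max_inline_read)
{
	return buf->head_len + buf->tail_len < max_inline_read;
}

/* Choose the Reply-side chunk type the way the patched code does. */
static enum chunk_type choose_wtype(const struct reply_buf *buf,
				    bool ddp_allowed, size_t max_inline_read)
{
	if (results_inline(buf, max_inline_read))
		return NO_CHUNK;
	if (ddp_allowed && buf->is_read &&
	    nonpayload_inline(buf, max_inline_read))
		return WRITE_CHUNK;
	/* Option b: a single Reply chunk covers the whole RPC Reply. */
	return REPLY_CHUNK;
}
```

For example, with a hypothetical 4096-byte threshold, a 1 MB READ whose head and tail total 300 bytes gets a Write chunk, while the same READ with a 5000-byte head falls back to a single Reply chunk (option b).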

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/rpc_rdma.c |   18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index d18614e..7774aee 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -164,6 +164,21 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
 	return rqst->rq_rcv_buf.buflen <= ia->ri_max_inline_read;
 }
 
+/* The client is required to provide a Reply chunk if the maximum
+ * size of the non-payload part of the RPC Reply is larger than
+ * the inline threshold.
+ */
+static bool
+rpcrdma_nonpayload_inline(const struct rpcrdma_xprt *r_xprt,
+			  const struct rpc_rqst *rqst)
+{
+	const struct xdr_buf *buf = &rqst->rq_rcv_buf;
+	const struct rpcrdma_ia *ia = &r_xprt->rx_ia;
+
+	return buf->head[0].iov_len + buf->tail[0].iov_len <
+		ia->ri_max_inline_read;
+}
+
 /* Split @vec on page boundaries into SGEs. FMR registers pages, not
  * a byte range. Other modes coalesce these SGEs into a single MR
  * when they can.
@@ -762,7 +777,8 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
 	 */
 	if (rpcrdma_results_inline(r_xprt, rqst))
 		wtype = rpcrdma_noch;
-	else if (ddp_allowed && rqst->rq_rcv_buf.flags & XDRBUF_READ)
+	else if ((ddp_allowed && rqst->rq_rcv_buf.flags & XDRBUF_READ) &&
+		 rpcrdma_nonpayload_inline(r_xprt, rqst))
 		wtype = rpcrdma_writech;
 	else
 		wtype = rpcrdma_replych;



* [PATCH RFC 2/2] xprtrdma: Reduce the doorbell rate (Receive)
From: Chuck Lever @ 2019-01-16 18:22 UTC
  To: linux-rdma, linux-nfs

Post Receive WRs in batches to reduce the hardware doorbell rate
per transport. This helps the RPC-over-RDMA client scale better as
the number of transports grows.
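A rough userspace model of why batching lowers the doorbell rate, assuming that each refill chains all of its WRs into a single post call (one doorbell) and that each arriving Reply consumes exactly one posted Receive. struct recv_model, post_recvs() and count_doorbells() are hypothetical names; "credits" stands in for the number of Receives the transport wants outstanding, and temp is modeled only as the flag that suppresses the extra batch, as in the diff.

```c
#include <stdbool.h>

/* Hypothetical model of one transport's Receive queue. These are
 * illustrative fields, not the kernel's struct rpcrdma_ep. */
struct recv_model {
	int credits;        /* Receives the transport wants posted */
	int batch;          /* like RPCRDMA_MAX_RECV_BATCH; 0 disables batching */
	int receive_count;  /* like ep->rep_receive_count: Receives still posted */
	int doorbells;      /* number of post calls, i.e. doorbell rings */
};

/* Mirrors the refill logic in the diff: do nothing while more than
 * "needed" Receives are outstanding; otherwise top up, plus a batch. */
static void post_recvs(struct recv_model *m, bool temp)
{
	int needed = m->credits;

	if (m->receive_count > needed)
		return;
	needed -= m->receive_count;
	if (!temp)
		needed += m->batch;
	if (needed == 0)
		return;
	/* All "needed" WRs go to the HCA as one chain in one post
	 * call, so the model counts a single doorbell. */
	m->receive_count += needed;
	m->doorbells++;
}

/* Simulate @replies Receive completions, refilling after each one. */
static int count_doorbells(int credits, int batch, int replies)
{
	struct recv_model m = { credits, batch, 0, 0 };
	int i;

	post_recvs(&m, false);          /* initial fill */
	for (i = 0; i < replies; i++) {
		m.receive_count--;      /* one Receive consumed by a Reply */
		post_recvs(&m, false);
	}
	return m.doorbells;
}
```

With 128 credits and 70 Replies, a batch size of 0 rings the doorbell 71 times (the initial fill plus one per Reply), while a batch size of 7 rings it 11 times: the refill fires only when the outstanding count drops back to the credit limit, which happens every seventh Reply.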

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/verbs.c     |    2 ++
 net/sunrpc/xprtrdma/xprt_rdma.h |   11 +++++++++++
 2 files changed, 13 insertions(+)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 7749a2b..0d3ec6f 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1482,6 +1482,8 @@ struct rpcrdma_regbuf *
 	if (ep->rep_receive_count > needed)
 		goto out;
 	needed -= ep->rep_receive_count;
+	if (!temp)
+		needed += RPCRDMA_MAX_RECV_BATCH;
 
 	count = 0;
 	wr = NULL;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 5a18472..47ded5c 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -205,6 +205,17 @@ struct rpcrdma_rep {
 	struct ib_recv_wr	rr_recv_wr;
 };
 
+/* To reduce the rate at which a transport invokes ib_post_recv
+ * (and thus the hardware doorbell rate), xprtrdma posts Receive
+ * WRs in batches.
+ *
+ * Setting this to zero disables Receive post batching.
+ */
+enum {
+	RPCRDMA_MAX_RECV_BATCH = 7,
+};
+
+
 /* struct rpcrdma_sendctx - DMA mapped SGEs to unmap after Send completes
  */
 struct rpcrdma_req;


