* [PATCH RFC 0/2] NFS/RDMA changes
@ 2019-01-16 18:22 Chuck Lever
  2019-01-16 18:22 ` [PATCH RFC 1/2] xprtrdma: Check inline size before providing a Write chunk Chuck Lever
  2019-01-16 18:22 ` [PATCH RFC 2/2] xprtrdma: Reduce the doorbell rate (Receive) Chuck Lever
  0 siblings, 2 replies; 3+ messages in thread
From: Chuck Lever @ 2019-01-16 18:22 UTC
  To: linux-rdma, linux-nfs

Here are two NFS-RDMA-related patches I'd like to see in kernel
v5.1. Thanks in advance for review and comments.

---

Chuck Lever (2):
      xprtrdma: Check inline size before providing a Write chunk
      xprtrdma: Reduce the doorbell rate (Receive)


 net/sunrpc/xprtrdma/rpc_rdma.c  |   18 +++++++++++++++++-
 net/sunrpc/xprtrdma/verbs.c     |    2 ++
 net/sunrpc/xprtrdma/xprt_rdma.h |   11 +++++++++++
 3 files changed, 30 insertions(+), 1 deletion(-)

--
Chuck Lever


* [PATCH RFC 1/2] xprtrdma: Check inline size before providing a Write chunk
  2019-01-16 18:22 [PATCH RFC 0/2] NFS/RDMA changes Chuck Lever
@ 2019-01-16 18:22 ` Chuck Lever
  2019-01-16 18:22 ` [PATCH RFC 2/2] xprtrdma: Reduce the doorbell rate (Receive) Chuck Lever
  1 sibling, 0 replies; 3+ messages in thread
From: Chuck Lever @ 2019-01-16 18:22 UTC
  To: linux-rdma, linux-nfs

In very rare cases, the client might predict that the non-payload
part of the RPC Reply to an NFS READ operation is large. For
instance, an NFSv4 COMPOUND with a large GETATTR result, combined
with a large Kerberos credential, could push the non-payload part
to several kilobytes.

If the non-payload part is larger than the connection's inline
threshold, the client is required to provision a Reply chunk. The
current Linux client does not check for this case. There are two
obvious ways to handle it:

a. Provision a Write chunk for the payload and a Reply chunk for
   the non-payload part

b. Provision a Reply chunk for the whole RPC Reply

Some testing at a recent NFS bake-a-thon showed that servers can
mostly handle a. but there are some corner cases that do not work
yet. b. already works (it has to, to handle krb5i/p), but could be
somewhat less efficient. However, I expect this scenario to be very
rare -- no-one has reported a problem yet.

So I'm going to implement b. Sometime later I will provide some
patches to help make b. a little more efficient by more carefully
choosing the Reply chunk's segment sizes to ensure the payload is
optimally aligned.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/rpc_rdma.c |   18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index d18614e..7774aee 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -164,6 +164,21 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
 	return rqst->rq_rcv_buf.buflen <= ia->ri_max_inline_read;
 }
 
+/* The client is required to provide a Reply chunk if the maximum
+ * size of the non-payload part of the RPC Reply is larger than
+ * the inline threshold.
+ */
+static bool
+rpcrdma_nonpayload_inline(const struct rpcrdma_xprt *r_xprt,
+			  const struct rpc_rqst *rqst)
+{
+	const struct xdr_buf *buf = &rqst->rq_rcv_buf;
+	const struct rpcrdma_ia *ia = &r_xprt->rx_ia;
+
+	return buf->head[0].iov_len + buf->tail[0].iov_len <
+		ia->ri_max_inline_read;
+}
+
 /* Split @vec on page boundaries into SGEs. FMR registers pages, not
  * a byte range. Other modes coalesce these SGEs into a single MR
  * when they can.
@@ -762,7 +777,8 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
 	 */
 	if (rpcrdma_results_inline(r_xprt, rqst))
 		wtype = rpcrdma_noch;
-	else if (ddp_allowed && rqst->rq_rcv_buf.flags & XDRBUF_READ)
+	else if ((ddp_allowed && rqst->rq_rcv_buf.flags & XDRBUF_READ) &&
+		 rpcrdma_nonpayload_inline(r_xprt, rqst))
 		wtype = rpcrdma_writech;
 	else
 		wtype = rpcrdma_replych;
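
The selection logic this patch ends up with can be sketched outside the
kernel as follows. This is only an illustrative model, not the xprtrdma
code: `struct recv_buf`, `enum chunk_type`, `choose_reply_chunk()` and
`chunk_example()` are hypothetical names, and the 4096-byte inline
threshold is an assumed value chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for rpcrdma_noch, rpcrdma_writech, rpcrdma_replych. */
enum chunk_type { NOCH, WRITECH, REPLYCH };

/* Simplified model of the receive buffer; not the kernel's struct xdr_buf. */
struct recv_buf {
	size_t head_len;	/* non-payload bytes ahead of the payload */
	size_t tail_len;	/* non-payload bytes after the payload */
	size_t buflen;		/* maximum size of the whole RPC Reply */
	bool is_read;		/* models the XDRBUF_READ flag */
};

/* Mirror the three-way decision in the patched hunk above. */
static enum chunk_type
choose_reply_chunk(const struct recv_buf *buf, size_t inline_threshold,
		   bool ddp_allowed)
{
	/* The whole Reply fits inline: no chunk is needed. */
	if (buf->buflen <= inline_threshold)
		return NOCH;

	/* A Write chunk is offered only if the non-payload part fits inline. */
	if (ddp_allowed && buf->is_read &&
	    buf->head_len + buf->tail_len < inline_threshold)
		return WRITECH;

	/* Otherwise fall back to a Reply chunk for the whole Reply (option b). */
	return REPLYCH;
}

/* Usage example with an assumed inline threshold of 4096 bytes. */
static void chunk_example(void)
{
	struct recv_buf small = { 200, 0, 512, false };
	struct recv_buf read_ok = { 200, 0, 200 + 65536, true };
	struct recv_buf read_big_head = { 5000, 0, 5000 + 65536, true };

	assert(choose_reply_chunk(&small, 4096, true) == NOCH);
	assert(choose_reply_chunk(&read_ok, 4096, true) == WRITECH);
	assert(choose_reply_chunk(&read_big_head, 4096, true) == REPLYCH);
}
```

The third case is the one the patch addresses: a READ whose head alone
exceeds the inline threshold now gets a Reply chunk instead of a Write
chunk.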



* [PATCH RFC 2/2] xprtrdma: Reduce the doorbell rate (Receive)
  2019-01-16 18:22 [PATCH RFC 0/2] NFS/RDMA changes Chuck Lever
  2019-01-16 18:22 ` [PATCH RFC 1/2] xprtrdma: Check inline size before providing a Write chunk Chuck Lever
@ 2019-01-16 18:22 ` Chuck Lever
  1 sibling, 0 replies; 3+ messages in thread
From: Chuck Lever @ 2019-01-16 18:22 UTC
  To: linux-rdma, linux-nfs

Post RECV WRs in batches to reduce the hardware doorbell rate per
transport. This helps the RPC-over-RDMA client scale better as the
number of transports grows.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/verbs.c     |    2 ++
 net/sunrpc/xprtrdma/xprt_rdma.h |   11 +++++++++++
 2 files changed, 13 insertions(+)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 7749a2b..0d3ec6f 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1482,6 +1482,8 @@ struct rpcrdma_regbuf *
 	if (ep->rep_receive_count > needed)
 		goto out;
 	needed -= ep->rep_receive_count;
+	if (!temp)
+		needed += RPCRDMA_MAX_RECV_BATCH;
 
 	count = 0;
 	wr = NULL;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 5a18472..47ded5c 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -205,6 +205,17 @@ struct rpcrdma_rep {
 	struct ib_recv_wr	rr_recv_wr;
 };
 
+/* To reduce the rate at which a transport invokes ib_post_recv
+ * (and thus the hardware doorbell rate), xprtrdma posts Receive
+ * WRs in batches.
+ *
+ * Setting this to zero disables Receive post batching.
+ */
+enum {
+	RPCRDMA_MAX_RECV_BATCH = 7,
+};
+
+
 /* struct rpcrdma_sendctx - DMA mapped SGEs to unmap after Send completes
  */
 struct rpcrdma_req;
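
To see why topping up by a fixed batch lowers the doorbell rate, the
accounting in the verbs.c hunk can be modeled in user space. This is a
rough sketch under stated assumptions, not the kernel code:
`struct recv_state`, `post_recvs()`, `simulate()` and `batch_example()`
are hypothetical names, each simulated post stands in for one
ib_post_recv call, the `temp` case is left out, and the target of 32
posted Receives is an arbitrary example value.

```c
#include <assert.h>

enum { MAX_RECV_BATCH = 7 };	/* mirrors RPCRDMA_MAX_RECV_BATCH */

struct recv_state {
	unsigned int posted;	/* models ep->rep_receive_count */
	unsigned int doorbells;	/* simulated ib_post_recv invocations */
};

/* Top up the Receive queue; returns how many Receives were posted. */
static unsigned int
post_recvs(struct recv_state *st, unsigned int target, unsigned int batch)
{
	unsigned int needed = target;

	/* Enough Receives are already posted: skip the doorbell. */
	if (st->posted > needed)
		return 0;
	needed -= st->posted;
	needed += batch;	/* overshoot so the next few calls can skip */
	if (needed == 0)
		return 0;

	st->posted += needed;
	st->doorbells++;	/* one post call covers the whole batch */
	return needed;
}

/* Count doorbells for an initial fill plus @completions Receive completions. */
static unsigned int
simulate(unsigned int target, unsigned int batch, unsigned int completions)
{
	struct recv_state st = { 0, 0 };
	unsigned int i;

	post_recvs(&st, target, batch);
	for (i = 0; i < completions; i++) {
		st.posted--;	/* one Receive completes */
		post_recvs(&st, target, batch);
	}
	return st.doorbells;
}

/* Usage example: 70 completions against an assumed target of 32. */
static void batch_example(void)
{
	assert(simulate(32, MAX_RECV_BATCH, 70) == 11);	/* 1 fill + 70/7 */
	assert(simulate(32, 0, 70) == 71);		/* 1 fill + 1 per completion */
}
```

In this model a batch of 7 means one post for every seven completions,
while a batch of zero degenerates to one post per completion, which
matches the comment that zero disables Receive post batching.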

