* [PATCH RFC 0/2] NFS/RDMA changes
@ 2019-01-16 18:22 Chuck Lever
2019-01-16 18:22 ` [PATCH RFC 1/2] xprtrdma: Check inline size before providing a Write chunk Chuck Lever
2019-01-16 18:22 ` [PATCH RFC 2/2] xprtrdma: Reduce the doorbell rate (Receive) Chuck Lever
0 siblings, 2 replies; 3+ messages in thread
From: Chuck Lever @ 2019-01-16 18:22 UTC (permalink / raw)
To: linux-rdma, linux-nfs
Here are two NFS-RDMA-related patches I'd like to see in kernel
v5.1. Thanks in advance for review and comments.
---
Chuck Lever (2):
xprtrdma: Check inline size before providing a Write chunk
xprtrdma: Reduce the doorbell rate (Receive)
net/sunrpc/xprtrdma/rpc_rdma.c | 18 +++++++++++++++++-
net/sunrpc/xprtrdma/verbs.c | 2 ++
net/sunrpc/xprtrdma/xprt_rdma.h | 11 +++++++++++
3 files changed, 30 insertions(+), 1 deletion(-)
--
Chuck Lever
* [PATCH RFC 1/2] xprtrdma: Check inline size before providing a Write chunk
From: Chuck Lever @ 2019-01-16 18:22 UTC (permalink / raw)
To: linux-rdma, linux-nfs
In very rare cases, an NFS READ operation might predict that the
non-payload part of the RPC Reply is large. For instance, an
NFSv4 COMPOUND with a large GETATTR result, in combination with a
large Kerberos credential, could push the non-payload part to be
several kilobytes.
If the non-payload part is larger than the connection's inline
threshold, the client is required to provision a Reply chunk. The
current Linux client does not check for this case. There are two
obvious ways to handle it:
a. Provision a Write chunk for the payload and a Reply chunk for
the non-payload part
b. Provision a Reply chunk for the whole RPC Reply
Some testing at a recent NFS bake-a-thon showed that servers can
mostly handle a. but there are some corner cases that do not work
yet. b. already works (it has to, to handle krb5i/p), but could be
somewhat less efficient. However, I expect this scenario to be very
rare -- no one has reported a problem yet.
So I'm going to implement b. Sometime later I will provide some
patches to help make b. a little more efficient by more carefully
choosing the Reply chunk's segment sizes to ensure the payload is
optimally aligned.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/xprtrdma/rpc_rdma.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index d18614e..7774aee 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -164,6 +164,21 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
return rqst->rq_rcv_buf.buflen <= ia->ri_max_inline_read;
}
+/* The client is required to provide a Reply chunk if the maximum
+ * size of the non-payload part of the RPC Reply is larger than
+ * the inline threshold.
+ */
+static bool
+rpcrdma_nonpayload_inline(const struct rpcrdma_xprt *r_xprt,
+ const struct rpc_rqst *rqst)
+{
+ const struct xdr_buf *buf = &rqst->rq_rcv_buf;
+ const struct rpcrdma_ia *ia = &r_xprt->rx_ia;
+
+ return buf->head[0].iov_len + buf->tail[0].iov_len <
+ ia->ri_max_inline_read;
+}
+
/* Split @vec on page boundaries into SGEs. FMR registers pages, not
* a byte range. Other modes coalesce these SGEs into a single MR
* when they can.
@@ -762,7 +777,8 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
*/
if (rpcrdma_results_inline(r_xprt, rqst))
wtype = rpcrdma_noch;
- else if (ddp_allowed && rqst->rq_rcv_buf.flags & XDRBUF_READ)
+ else if ((ddp_allowed && rqst->rq_rcv_buf.flags & XDRBUF_READ) &&
+ rpcrdma_nonpayload_inline(r_xprt, rqst))
wtype = rpcrdma_writech;
else
wtype = rpcrdma_replych;
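
For readers who want to see the resulting selection logic outside the
kernel, here is a minimal user-space sketch of the three-way choice the
hunk above produces. It is an illustration only, not the xprtrdma code:
struct reply_estimate, enum chunk_type and choose_reply_chunk() are
hypothetical names, and the single inline_threshold parameter stands in
for ia->ri_max_inline_read. The strict "<" on the non-payload check
mirrors the helper added by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's types. */
enum chunk_type { CHUNK_NONE, CHUNK_WRITE, CHUNK_REPLY };

struct reply_estimate {
	size_t head_len;    /* max bytes of the Reply before the payload */
	size_t payload_len; /* max bytes of READ payload */
	size_t tail_len;    /* max bytes of the Reply after the payload */
	bool   is_read;     /* payload is eligible for direct placement */
};

static enum chunk_type
choose_reply_chunk(const struct reply_estimate *est,
		   size_t inline_threshold, bool ddp_allowed)
{
	size_t nonpayload, total;

	assert(est != NULL);
	nonpayload = est->head_len + est->tail_len;
	total = nonpayload + est->payload_len;

	/* The whole Reply fits inline: no chunk is needed. */
	if (total <= inline_threshold)
		return CHUNK_NONE;

	/* A Write chunk carries only the payload, so the remainder
	 * of the Reply must still fit inline.
	 */
	if (ddp_allowed && est->is_read && nonpayload < inline_threshold)
		return CHUNK_WRITE;

	/* Otherwise fall back to option b: a Reply chunk for the
	 * whole RPC Reply.
	 */
	return CHUNK_REPLY;
}
```

With a 4096-byte threshold, a READ with a 100-byte head and a 4096-byte
payload gets a Write chunk, while the same READ with a 5000-byte head
falls through to a Reply chunk.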
* [PATCH RFC 2/2] xprtrdma: Reduce the doorbell rate (Receive)
From: Chuck Lever @ 2019-01-16 18:22 UTC (permalink / raw)
To: linux-rdma, linux-nfs
Post RECV WRs in batches to reduce the hardware doorbell rate per
transport. This helps the RPC-over-RDMA client scale better as the
number of transports grows.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/xprtrdma/verbs.c | 2 ++
net/sunrpc/xprtrdma/xprt_rdma.h | 11 +++++++++++
2 files changed, 13 insertions(+)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 7749a2b..0d3ec6f 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1482,6 +1482,8 @@ struct rpcrdma_regbuf *
if (ep->rep_receive_count > needed)
goto out;
needed -= ep->rep_receive_count;
+ if (!temp)
+ needed += RPCRDMA_MAX_RECV_BATCH;
count = 0;
wr = NULL;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 5a18472..47ded5c 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -205,6 +205,17 @@ struct rpcrdma_rep {
struct ib_recv_wr rr_recv_wr;
};
+/* To reduce the rate at which a transport invokes ib_post_recv
+ * (and thus the hardware doorbell rate), xprtrdma posts Receive
+ * WRs in batches.
+ *
+ * Setting this to zero disables Receive post batching.
+ */
+enum {
+ RPCRDMA_MAX_RECV_BATCH = 7,
+};
+
+
/* struct rpcrdma_sendctx - DMA mapped SGEs to unmap after Send completes
*/
struct rpcrdma_req;
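
To make the effect of the batch constant concrete, here is a small
user-space model of the arithmetic in the verbs.c hunk. It is a sketch
under assumptions, not the kernel function: recvs_to_post() and
count_doorbells() are hypothetical names, "target" stands in for however
the caller computes the number of Receives it wants posted (that
computation is not shown in this patch), and the batch size is passed as
a parameter so that the zero case can be compared.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the "needed" arithmetic in the hunk above.
 * @target: Receives the transport wants outstanding (assumed input)
 * @posted: Receives currently outstanding
 * @temp:   mirrors the "temp" flag; no extra batch when true
 * @batch:  stands in for RPCRDMA_MAX_RECV_BATCH
 */
static unsigned int
recvs_to_post(unsigned int target, unsigned int posted, bool temp,
	      unsigned int batch)
{
	unsigned int needed = target;

	if (posted > needed)
		return 0;	/* enough outstanding: skip the post */
	needed -= posted;
	if (!temp)
		needed += batch; /* over-provision, so later calls skip */
	return needed;
}

/* Count how many times a post would be issued (one doorbell each)
 * while @completions Receives complete one at a time.
 */
static unsigned int
count_doorbells(unsigned int target, unsigned int completions,
		unsigned int batch)
{
	unsigned int posted = 0, doorbells = 0, i, n;

	n = recvs_to_post(target, posted, false, batch);
	if (n) {
		posted += n;
		doorbells++;
	}
	for (i = 0; i < completions; i++) {
		assert(posted > 0);
		posted--;	/* one Receive completed */
		n = recvs_to_post(target, posted, false, batch);
		if (n) {
			posted += n;
			doorbells++;
		}
	}
	return doorbells;
}
```

In this model, with a target of 32 and 70 completions, a batch of 7
gives 11 posts (the initial one plus one per seven completions), while
a batch of 0 gives 71 (the initial one plus one per completion).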