* [PATCH RFC 0/2] NFS/RDMA changes
From: Chuck Lever @ 2019-01-16 18:22 UTC
To: linux-rdma, linux-nfs

Here are two NFS/RDMA-related patches I'd like to see in kernel v5.1. Thanks in advance for review and comments.

---

Chuck Lever (2):
      xprtrdma: Check inline size before providing a Write chunk
      xprtrdma: Reduce the doorbell rate (Receive)

 net/sunrpc/xprtrdma/rpc_rdma.c  | 18 +++++++++++++++++-
 net/sunrpc/xprtrdma/verbs.c     |  2 ++
 net/sunrpc/xprtrdma/xprt_rdma.h | 11 +++++++++++
 3 files changed, 30 insertions(+), 1 deletion(-)

--
Chuck Lever
* [PATCH RFC 1/2] xprtrdma: Check inline size before providing a Write chunk
From: Chuck Lever @ 2019-01-16 18:22 UTC
To: linux-rdma, linux-nfs

In very rare cases, an NFS READ operation might predict that the non-payload part of the RPC Reply is large. For instance, an NFSv4 COMPOUND with a large GETATTR result, in combination with a large Kerberos credential, could push the non-payload part to be several kilobytes.

If the non-payload part is larger than the connection's inline threshold, the client is required to provision a Reply chunk. The current Linux client does not check for this case. There are two obvious ways to handle it:

 a. Provision a Write chunk for the payload and a Reply chunk for the non-payload part
 b. Provision a Reply chunk for the whole RPC Reply

Some testing at a recent NFS bake-a-thon showed that servers can mostly handle a., but there are some corner cases that do not work yet. b. already works (it has to, to handle krb5i/p), but could be somewhat less efficient. However, I expect this scenario to be very rare; no one has reported a problem yet. So I'm going to implement b.

Sometime later I will provide some patches to help make b. a little more efficient by more carefully choosing the Reply chunk's segment sizes to ensure the payload is optimally aligned.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/rpc_rdma.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index d18614e..7774aee 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -164,6 +164,21 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
 	return rqst->rq_rcv_buf.buflen <= ia->ri_max_inline_read;
 }
 
+/* The client is required to provide a Reply chunk if the maximum
+ * size of the non-payload part of the RPC Reply is larger than
+ * the inline threshold.
+ */
+static bool
+rpcrdma_nonpayload_inline(const struct rpcrdma_xprt *r_xprt,
+			  const struct rpc_rqst *rqst)
+{
+	const struct xdr_buf *buf = &rqst->rq_rcv_buf;
+	const struct rpcrdma_ia *ia = &r_xprt->rx_ia;
+
+	return buf->head[0].iov_len + buf->tail[0].iov_len <
+		ia->ri_max_inline_read;
+}
+
 /* Split @vec on page boundaries into SGEs. FMR registers pages, not
  * a byte range. Other modes coalesce these SGEs into a single MR
  * when they can.
@@ -762,7 +777,8 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
 	 */
 	if (rpcrdma_results_inline(r_xprt, rqst))
 		wtype = rpcrdma_noch;
-	else if (ddp_allowed && rqst->rq_rcv_buf.flags & XDRBUF_READ)
+	else if ((ddp_allowed && rqst->rq_rcv_buf.flags & XDRBUF_READ) &&
+		 rpcrdma_nonpayload_inline(r_xprt, rqst))
 		wtype = rpcrdma_writech;
 	else
 		wtype = rpcrdma_replych;
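To make the new decision concrete, here is a small user-space C sketch of the chunk-type selection as changed by the second hunk above. It is not kernel code: enum chunk_type, struct reply_shape, results_inline(), nonpayload_inline() and choose_reply_chunk() are hypothetical stand-ins for rpcrdma_noch/writech/replych, the receive xdr_buf, and the wtype selection. It assumes the Reply's maximum size and its head/tail (non-payload) lengths are already known, and keeps the patch's strict "<" comparison.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for rpcrdma_noch / rpcrdma_writech / rpcrdma_replych */
enum chunk_type {
	CHUNK_NONE,	/* whole Reply fits inline */
	CHUNK_WRITE,	/* payload via Write chunk, non-payload inline */
	CHUNK_REPLY,	/* whole Reply via Reply chunk (option b.) */
};

/* Hypothetical, simplified view of the receive xdr_buf */
struct reply_shape {
	size_t buflen;		/* maximum size of the whole Reply */
	size_t head_len;	/* non-payload bytes before the payload */
	size_t tail_len;	/* non-payload bytes after the payload */
	bool read_payload;	/* models the XDRBUF_READ flag */
};

/* Whole Reply fits under the inline threshold */
static bool results_inline(const struct reply_shape *r, size_t max_inline_read)
{
	return r->buflen <= max_inline_read;
}

/* Non-payload part fits inline; same strict comparison as the patch */
static bool nonpayload_inline(const struct reply_shape *r, size_t max_inline_read)
{
	return r->head_len + r->tail_len < max_inline_read;
}

/* A Write chunk is offered only when DDP is allowed, the Reply carries a
 * READ payload, AND the non-payload part fits inline; otherwise fall back
 * to a Reply chunk for the whole Reply. */
enum chunk_type choose_reply_chunk(const struct reply_shape *r,
				   size_t max_inline_read, bool ddp_allowed)
{
	if (results_inline(r, max_inline_read))
		return CHUNK_NONE;
	if (ddp_allowed && r->read_payload &&
	    nonpayload_inline(r, max_inline_read))
		return CHUNK_WRITE;
	return CHUNK_REPLY;
}
```

With a 4096-byte threshold, a 64KB READ whose head and tail total 208 bytes still gets a Write chunk, but the same READ with 6100 bytes of non-payload falls back to a Reply chunk, which is option b. above.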
* [PATCH RFC 2/2] xprtrdma: Reduce the doorbell rate (Receive)
From: Chuck Lever @ 2019-01-16 18:22 UTC
To: linux-rdma, linux-nfs

Post RECV WRs in batches to reduce the hardware doorbell rate per transport. This helps the RPC-over-RDMA client scale better as the number of transports grows.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/verbs.c     |  2 ++
 net/sunrpc/xprtrdma/xprt_rdma.h | 11 +++++++++++
 2 files changed, 13 insertions(+)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 7749a2b..0d3ec6f 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1482,6 +1482,8 @@ struct rpcrdma_regbuf *
 	if (ep->rep_receive_count > needed)
 		goto out;
 	needed -= ep->rep_receive_count;
+	if (!temp)
+		needed += RPCRDMA_MAX_RECV_BATCH;
 
 	count = 0;
 	wr = NULL;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 5a18472..47ded5c 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -205,6 +205,17 @@ struct rpcrdma_rep {
 	struct ib_recv_wr	rr_recv_wr;
 };
 
+/* To reduce the rate at which a transport invokes ib_post_recv
+ * (and thus the hardware doorbell rate), xprtrdma posts Receive
+ * WRs in batches.
+ *
+ * Setting this to zero disables Receive post batching.
+ */
+enum {
+	RPCRDMA_MAX_RECV_BATCH = 7,
+};
+
+
 /* struct rpcrdma_sendctx - DMA mapped SGEs to unmap after Send completes
  */
 struct rpcrdma_req;
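A toy model may help show why topping up the Receive queue with extra WRs lowers the doorbell rate. The sketch below is a user-space simulation, not kernel code: struct recv_model, post_recvs() and simulate() are hypothetical names. It assumes a fixed credit count, that each incoming Reply consumes exactly one posted Receive before the queue is replenished, and that each simulated post (one ib_post_recv call) rings one doorbell. post_recvs() mirrors the verbs.c hunk: skip when more than @needed Receives are posted, otherwise post the shortfall plus @batch extra WRs in one call.

```c
/* Hypothetical model of one transport's Receive queue */
struct recv_model {
	unsigned int posted;	/* Receive WRs currently posted */
	unsigned int doorbells;	/* simulated ib_post_recv calls */
};

/* Modeled on the verbs.c hunk: if more than @needed Receives are already
 * posted, do nothing; otherwise post the shortfall plus @batch extra WRs
 * with a single (simulated) doorbell. */
static void post_recvs(struct recv_model *m, unsigned int needed,
		       unsigned int batch)
{
	unsigned int count;

	if (m->posted > needed)
		return;
	count = needed - m->posted + batch;
	if (count == 0)
		return;
	m->posted += count;
	m->doorbells++;
}

/* Simulate @replies incoming Replies, each consuming one posted Receive,
 * followed by a replenish attempt. Returns the number of doorbells rung,
 * including the initial fill. Assumes needed >= 1. */
unsigned int simulate(unsigned int replies, unsigned int needed,
		      unsigned int batch)
{
	struct recv_model m = { 0, 0 };
	unsigned int i;

	post_recvs(&m, needed, batch);		/* initial fill */
	for (i = 0; i < replies && m.posted > 0; i++) {
		m.posted--;			/* a Reply arrives */
		post_recvs(&m, needed, batch);
	}
	return m.doorbells;
}
```

Under these assumptions, 700 Replies with 32 credits ring 701 doorbells with batching disabled (batch of 0) and 101 with a batch of 7: roughly one doorbell per RPCRDMA_MAX_RECV_BATCH Replies in steady state, at the cost of keeping up to 7 extra Receive buffers posted per transport.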