From: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: [PATCH v1 08/12] xprtrdma: Fix XDR tail buffer marshalling
Date: Thu, 09 Jul 2015 16:42:56 -0400
Message-ID: <20150709204256.26247.14336.stgit@manet.1015granger.net>
In-Reply-To: <20150709203242.26247.4848.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>

Currently xprtrdma appends an extra chunk element to the RPC/RDMA
read chunk list of each NFSv4 WRITE compound. The extra element
contains the final GETATTR operation in the compound.

The result is an extra RDMA READ operation to transfer a very short
piece of each NFS WRITE compound (typically 16 bytes). This is
inefficient.

It is also incorrect. Although RFC 5667 is not precise about when
using a read list with NFSv4 COMPOUND is allowed, the intent is that
only data arguments not touched by the NFS client are to be sent
using RDMA READ or WRITE. The NFS client constructs GETATTR arguments
itself, and therefore is required to send the trailing GETATTR
operation as additional inline content, not as a data payload.

NB: This change is not backwards compatible. Some older servers do
not accept inline content following the read list. The Linux NFS
server should handle this content correctly as of commit
a97c331f9aa9 ("svcrdma: Handle additional inline content").

Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 net/sunrpc/xprtrdma/rpc_rdma.c |   43 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 8ac1448c..cb05233 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -96,6 +96,42 @@ static bool rpcrdma_results_inline(struct rpc_rqst *rqst)
 	return repsize <= RPCRDMA_INLINE_READ_THRESHOLD(rqst);
 }
 
+static int
+rpcrdma_tail_pullup(struct xdr_buf *buf)
+{
+	size_t tlen = buf->tail[0].iov_len;
+	size_t skip = tlen & 3;
+
+	/* Do not include the tail if it is only an XDR pad */
+	if (tlen < 4)
+		return 0;
+
+	/* xdr_write_pages() adds a pad at the beginning of the tail
+	 * if the content in "buf->pages" is unaligned. Force the
+	 * tail's actual content to land at the next XDR position
+	 * after the head instead.
+	 */
+	if (skip) {
+		unsigned char *src, *dst;
+		unsigned int count;
+
+		src = buf->tail[0].iov_base;
+		dst = buf->head[0].iov_base;
+		dst += buf->head[0].iov_len;
+
+		src += skip;
+		tlen -= skip;
+
+		dprintk("RPC: %s: skip=%zu, memmove(%p, %p, %zu)\n",
+			__func__, skip, dst, src, tlen);
+
+		for (count = tlen; count; count--)
+			*dst++ = *src++;
+	}
+
+	return tlen;
+}
+
 /*
  * Chunk assembly from upper layer xdr_buf.
  *
@@ -147,6 +183,10 @@ rpcrdma_convert_iovs(struct xdr_buf *xdrbuf, unsigned int pos,
 	if (len && n == nsegs)
 		return -EIO;
 
+	/* When encoding the read list, the tail is always sent inline */
+	if (type == rpcrdma_readch)
+		return n;
+
 	if (xdrbuf->tail[0].iov_len) {
 		/* the rpcrdma protocol allows us to omit any trailing
 		 * xdr pad bytes, saving the server an RDMA operation. */
@@ -504,7 +544,8 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
 			/* new length after pullup */
 			rpclen = rqst->rq_svec[0].iov_len;
 		}
-	}
+	} else if (rtype == rpcrdma_readch)
+		rpclen += rpcrdma_tail_pullup(&rqst->rq_snd_buf);
 
 	if (rtype != rpcrdma_noch) {
 		hdrlen = rpcrdma_create_chunks(rqst, &rqst->rq_snd_buf,
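For readers who want to try the pad-skipping arithmetic outside the kernel, the following is a minimal user-space sketch of the logic in rpcrdma_tail_pullup(). It is not the kernel code. The names struct kvec_sketch, struct xdr_buf_sketch, tail_pullup_sketch() and demo_unaligned_tail() are hypothetical, memmove() stands in for the byte loop used in the patch, and the caller is assumed to have left room after the head for the copied bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-ins for the kernel's kvec and xdr_buf. */
struct kvec_sketch {
	void *iov_base;
	size_t iov_len;
};

struct xdr_buf_sketch {
	struct kvec_sketch head[1];
	struct kvec_sketch tail[1];
};

/* Returns the number of tail bytes to send inline.
 * A tail shorter than 4 bytes is treated as XDR pad only.
 * If the tail starts with pad bytes (length not a multiple of 4),
 * the content after the pad is copied to just past the head.
 * As in the patch, an already aligned tail is not copied.
 */
static size_t tail_pullup_sketch(struct xdr_buf_sketch *buf)
{
	size_t tlen = buf->tail[0].iov_len;
	size_t skip = tlen & 3;

	if (tlen < 4)
		return 0;

	if (skip) {
		unsigned char *src = (unsigned char *)buf->tail[0].iov_base + skip;
		unsigned char *dst = (unsigned char *)buf->head[0].iov_base +
				     buf->head[0].iov_len;

		tlen -= skip;
		/* Assumption: the head buffer has at least tlen spare bytes. */
		memmove(dst, src, tlen);
	}

	return tlen;
}

/* Builds an assumed 18-byte tail (2 pad bytes + 16 content bytes) and
 * returns the inline length reported by the sketch, or 0 if the content
 * did not land directly after the 8 head bytes.
 */
static size_t demo_unaligned_tail(void)
{
	unsigned char head[64] = { 0 };
	unsigned char tail[18] = { 0 };
	struct xdr_buf_sketch buf = {
		.head = { { head, 8 } },
		.tail = { { tail, sizeof(tail) } },
	};
	size_t n;

	memset(tail + 2, 0xAB, 16);
	n = tail_pullup_sketch(&buf);

	if (head[8] != 0xAB || head[8 + n - 1] != 0xAB || head[8 + n] != 0)
		return 0;
	return n;
}
```

With the assumed 18-byte tail, skip is 18 & 3 = 2, so the sketch reports 16 inline bytes and the content sits immediately after the head, where rpclen would then cover it.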