From: Chuck Lever <chuck.lever@oracle.com>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Linux RDMA Mailing List <linux-rdma@vger.kernel.org>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH v4 01/11] svcrdma: Do not send XDR roundup bytes for a write chunk
Date: Mon, 21 Dec 2015 17:11:56 -0500
Message-ID: <DF5B7D29-0C6C-47EF-8E3E-74BF137D7F95@oracle.com>
In-Reply-To: <20151221212959.GE7869@fieldses.org>


> On Dec 21, 2015, at 4:29 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> 
> On Mon, Dec 21, 2015 at 04:15:23PM -0500, Chuck Lever wrote:
>> 
>>> On Dec 21, 2015, at 4:07 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>>> 
>>> On Mon, Dec 14, 2015 at 04:30:09PM -0500, Chuck Lever wrote:
>>>> Minor optimization: when dealing with write chunk XDR roundup, do
>>>> not post a Write WR for the zero bytes in the pad. Simply update
>>>> the write segment in the RPC-over-RDMA header to reflect the extra
>>>> pad bytes.
>>>> 
>>>> The Reply chunk is also a write chunk, but the server does not use
>>>> send_write_chunks() to send the Reply chunk. That's OK in this case:
>>>> the server Upper Layer typically marshals the Reply chunk contents
>>>> in a single contiguous buffer, without a separate tail for the XDR
>>>> pad.
>>>> 
>>>> The comments and the variable naming refer to "chunks" but what is
>>>> really meant is "segments." The existing code sends only one
>>>> xdr_write_chunk per RPC reply.
>>>> 
>>>> The fix assumes this as well. When the XDR pad in the first write
>>>> chunk is reached, the assumption is the Write list is complete and
>>>> send_write_chunks() returns.
>>>> 
>>>> That will remain a valid assumption until the server Upper Layer can
>>>> support multiple bulk payload results per RPC.
>>>> 
>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>> ---
>>>> net/sunrpc/xprtrdma/svc_rdma_sendto.c |    7 +++++++
>>>> 1 file changed, 7 insertions(+)
>>>> 
>>>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
>>>> index 969a1ab..bad5eaa 100644
>>>> --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
>>>> +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
>>>> @@ -342,6 +342,13 @@ static int send_write_chunks(struct svcxprt_rdma *xprt,
>>>> 						arg_ch->rs_handle,
>>>> 						arg_ch->rs_offset,
>>>> 						write_len);
>>>> +
>>>> +		/* Do not send XDR pad bytes */
>>>> +		if (chunk_no && write_len < 4) {
>>>> +			chunk_no++;
>>>> +			break;
>>> 
>>> I'm pretty lost in this code.  Why does (chunk_no && write_len < 4) mean
>>> this is xdr padding?
>> 
>> Chunk zero is always data. Padding is always going to be
>> after the first chunk. Any chunk after chunk zero that is
>> shorter than XDR quad alignment is going to be a pad.
> 
> I don't really know what a chunk is....  Looking at the code:
> 
> 	write_len = min(xfer_len, be32_to_cpu(arg_ch->rs_length));
> 
> so I guess the assumption is just that those rs_length's are always a
> multiple of four?

The example you recently gave was a two-byte NFS READ
that crosses a page boundary.

In that case, the NFSD would pass down an xdr_buf that
has one byte in a page, one byte in another page, and
a two-byte XDR pad. The logic introduced by this
optimization would be fooled, and neither the second
byte nor the XDR pad would be written to the client.
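
To make that concrete, here is a stand-alone sketch (plain
user-space C; the segment list and names are mine, standing in
for the SGEs built from the xdr_buf, not the actual svcrdma
code):

#include <stdio.h>
#include <stddef.h>

/* Illustrative stand-in for the segments of a 2-byte NFS READ
 * result that crosses a page boundary, plus its XDR roundup pad.
 */
struct seg { size_t len; const char *what; };

int main(void)
{
	struct seg segs[] = {
		{ 1, "data: last byte of first page"   },
		{ 1, "data: first byte of second page" },
		{ 2, "XDR roundup pad in the tail"     },
	};
	unsigned int chunk_no;

	for (chunk_no = 0; chunk_no < 3; chunk_no++) {
		size_t write_len = segs[chunk_no].len;

		/* The patch's heuristic: any segment after the
		 * first that is shorter than 4 bytes is assumed
		 * to be the pad.
		 */
		if (chunk_no && write_len < 4) {
			printf("segment %u skipped (%s)\n",
			       chunk_no, segs[chunk_no].what);
			break;	/* segment 1 is real data: lost */
		}
		printf("segment %u written (%s)\n",
		       chunk_no, segs[chunk_no].what);
	}
	return 0;
}

Running it writes segment 0, then skips segment 1 and stops, so
both the second data byte and the pad are dropped.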

Unless you can think of a way to recognize an XDR pad
in the xdr_buf 100% of the time, you should drop this
patch.
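
For what it's worth, the size of the pad is never the problem:
XDR (RFC 4506) rounds variable-length data up to a four-byte
boundary, so the pad length follows directly from the data
length. A hypothetical helper (the name is mine, not a kernel
API) showing the arithmetic:

/* Number of XDR roundup bytes following "len" bytes of data:
 * (4 - len % 4) % 4, i.e. 0..3.  A 2-byte READ result is
 * therefore followed by 2 pad bytes.
 */
static inline size_t xdr_roundup_bytes(size_t len)
{
	return (4 - (len & 3)) & 3;
}

The hard part is positional: deciding whether a short trailing
segment of the xdr_buf is that pad or real data, which is
exactly what the two-byte example above defeats.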

As far as I know, none of the other patches in this
series depend on this optimization, so please merge
them if you can.


> --b.
> 
>> 
>> Probably too clever. Is there a better way to detect
>> the XDR pad?
>> 
>> 
>>>> +		}
>>>> +
>>>> 		chunk_off = 0;
>>>> 		while (write_len) {
>>>> 			ret = send_write(xprt, rqstp,
>> 
>> --
>> Chuck Lever

--
Chuck Lever

