All of lore.kernel.org
 help / color / mirror / Atom feed
From: Trond Myklebust <trondmy@hammerspace.com>
To: "olga.kornievskaia@gmail.com" <olga.kornievskaia@gmail.com>,
	"chuck.lever@oracle.com" <chuck.lever@oracle.com>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"anna.schumaker@netapp.com" <anna.schumaker@netapp.com>
Subject: Re: [RFC 1/2] xprtrdma: xdr pad optimization revisted again
Date: Mon, 30 Aug 2021 17:34:51 +0000	[thread overview]
Message-ID: <04f975f95126921f3d239a7a9d80ced2d88b05ff.camel@hammerspace.com> (raw)
In-Reply-To: <CAN-5tyEB9nv576uJijBtyhv2pqAHGNB4yeKo=F21btOkVQuaDQ@mail.gmail.com>

On Mon, 2021-08-30 at 13:24 -0400, Olga Kornievskaia wrote:
> On Mon, Aug 30, 2021 at 1:04 PM Chuck Lever III
> <chuck.lever@oracle.com> wrote:
> > 
> > Hi Olga-
> > 
> > > On Aug 30, 2021, at 12:53 PM, Olga Kornievskaia
> > > <olga.kornievskaia@gmail.com> wrote:
> > > 
> > > From: Olga Kornievskaia <kolga@netapp.com>
> > > 
> > > Given the patch "Always provide aligned buffers to the RPC read
> > > layers",
> > > RPC over RDMA doesn't need to look at the tail page and add that
> > > space
> > > to the write chunk.
> > > 
> > > For the RFC 8166 compliant server, it must not write an XDR
> > > padding
> > > into the write chunk (even if space was provided). Historically
> > > (before RFC 8166) Solaris RDMA server has been requiring the
> > > client
> > > to provide space for the XDR padding and thus this client code
> > > has
> > > existed.
> > 
> > I don't understand this change.
> > 
> > So, the upper layer doesn't provide XDR padding any more. That
> > doesn't
> > mean Solaris servers still aren't going to want to write into it.
> > The
> > client still has to provide this padding from somewhere.
> > 
> > This suggests that "Always provide aligned buffers to the RPC read
> > layers" breaks our interop with Solaris servers. Does it?
> 
> No, I don't believe "Always provide aligned buffers to the RPC read
> layers" breaks the interoperability. THIS patch would break the
> interop.
> 
> If we are not willing to break the interoperability and support only
> servers that comply with RFC 8166, this patch is not needed.

Why? The intention of the first patch is to ensure that we do not have
buffers that are not word aligned. If Solaris wants to write padding
after the end of the file, then there is space in the page buffer for
it to do so. There should be no need for an extra tail in which to
write the padding.

This means that the RDMA and TCP cases should end up doing the same
thing for the case of the Solaris server: the padding is written into
the page buffer. There is nothing written to the tail in either case.

> 
> > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > ---
> > > net/sunrpc/xprtrdma/rpc_rdma.c | 15 ---------------
> > > 1 file changed, 15 deletions(-)
> > > 
> > > diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c
> > > b/net/sunrpc/xprtrdma/rpc_rdma.c
> > > index c335c1361564..2c4146bcf2a8 100644
> > > --- a/net/sunrpc/xprtrdma/rpc_rdma.c
> > > +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
> > > @@ -255,21 +255,6 @@ rpcrdma_convert_iovs(struct rpcrdma_xprt
> > > *r_xprt, struct xdr_buf *xdrbuf,
> > >               page_base = 0;
> > >       }
> > > 
> > > -     if (type == rpcrdma_readch)
> > > -             goto out;
> > > -
> > > -     /* When encoding a Write chunk, some servers need to see an
> > > -      * extra segment for non-XDR-aligned Write chunks. The
> > > upper
> > > -      * layer provides space in the tail iovec that may be used
> > > -      * for this purpose.
> > > -      */
> > > -     if (type == rpcrdma_writech && r_xprt->rx_ep-
> > > >re_implicit_roundup)
> > > -             goto out;
> > > -
> > > -     if (xdrbuf->tail[0].iov_len)
> > 
> > Instead of checking for a tail, we could check
> > 
> >         if (xdr_pad_size(xdrbuf->page_len))
> > 
> > and provide some tail space in that case.
> 
> I don't believe this is any different than what we have now. If the
> page size is non-4byte aligned then, we would still allocate size for
> the padding which "SHOULD NOT" be there. But yes it is allowed to be
> there.
> 
> The problem, as you know from our offline discussion, is allocating
> the tail page and including it in the write chunk for the Nvidia
> environment where Nvidia doesn't support use of data (user) pages and
> nfs kernel allocated pages in the same segment.
> 
> Alternatively, my ask is then to change rpcrdma_convert_iovs() to
> return 2 segs instead of one: one for the pages and another for the
> tail.
> 
> > 
> > > -             rpcrdma_convert_kvec(&xdrbuf->tail[0], seg, &n);
> > > -
> > > -out:
> > >       if (unlikely(n > RPCRDMA_MAX_SEGS))
> > >               return -EIO;
> > >       return n;
> > > --
> > > 2.27.0
> > > 
> > 
> > --
> > Chuck Lever
> > 
> > 
> > 

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



  reply	other threads:[~2021-08-30 17:35 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-30 16:53 [RFC 0/2] revisit RMDA XDR padding management Olga Kornievskaia
2021-08-30 16:53 ` [RFC 1/2] xprtrdma: xdr pad optimization revisted again Olga Kornievskaia
2021-08-30 17:04   ` Chuck Lever III
2021-08-30 17:24     ` Olga Kornievskaia
2021-08-30 17:34       ` Trond Myklebust [this message]
2021-08-30 18:02         ` Chuck Lever III
2021-08-30 18:18           ` Trond Myklebust
2021-08-30 20:37             ` Chuck Lever III
2021-08-31 14:33               ` Chuck Lever III
2021-08-31 15:58                 ` Olga Kornievskaia
2021-08-31 16:11                   ` Chuck Lever III
2021-08-31 15:54               ` Olga Kornievskaia
2021-08-31 16:02                 ` Chuck Lever III
2021-08-30 17:35       ` Chuck Lever III
2021-08-30 16:53 ` [RFC 2/2] xprtrdma: remove re_implicit_roundup xprt_rdma_pad_optimize Olga Kornievskaia
2021-08-30 16:57   ` Chuck Lever III
2021-08-30 16:55 ` [RFC 0/2] revisit RMDA XDR padding management Olga Kornievskaia

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=04f975f95126921f3d239a7a9d80ced2d88b05ff.camel@hammerspace.com \
    --to=trondmy@hammerspace.com \
    --cc=anna.schumaker@netapp.com \
    --cc=chuck.lever@oracle.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=olga.kornievskaia@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.