From: Olga Kornievskaia <olga.kornievskaia@gmail.com>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: Frank van der Linden <fllinden@amazon.com>,
	Trond Myklebust <trond.myklebust@hammerspace.com>,
	Anna Schumaker <anna.schumaker@netapp.com>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH 1/1] NFSv4.2: fix LISTXATTR buffer receive size
Date: Mon, 23 Nov 2020 18:14:14 -0500
Message-ID: <CAN-5tyFe-FBb_UWUmWokotEzNiYj5zJaWiu1oK+54H-1HQRurw@mail.gmail.com>
In-Reply-To: <F85397C8-3FFD-4A7F-92E4-DB84D80F6387@oracle.com>

On Mon, Nov 23, 2020 at 1:09 PM Chuck Lever <chuck.lever@oracle.com> wrote:
>
>
>
> > On Nov 23, 2020, at 12:59 PM, Olga Kornievskaia <olga.kornievskaia@gmail.com> wrote:
> >
> > On Mon, Nov 23, 2020 at 12:37 PM Chuck Lever <chuck.lever@oracle.com> wrote:
> >>
> >>
> >>
> >>> On Nov 23, 2020, at 11:42 AM, Olga Kornievskaia <olga.kornievskaia@gmail.com> wrote:
> >>>
> >>> Hi Frank, Chuck,
> >>>
> >>> I would like your opinion on how LISTXATTR is supposed to work over
> >>> RDMA. Here's my current understanding of why listxattr is not
> >>> working over RDMA.
> >>>
> >>> This happens when listxattr is called with a very small buffer
> >>> size, for which RDMA wants to send an inline request. I really don't
> >>> understand why, Chuck, you are not seeing any problems with hardware;
> >>> as far as I can tell it would have the same problem, because the
> >>> inline threshold size would still make a request of this size inline.
> >>> rpcrdma_inline_fixup() is trying to write to pages that don't exist.
> >>>
> >>> When LISTXATTR sets the XDRBUF_SPARSE_PAGES flag, there is code in
> >>> xs_alloc_sparse_pages() that will allocate the pages, but this is
> >>> ONLY for TCP. RDMA doesn't have anything like that.
> >>>
> >>> Question: Should there be code added to RDMA that will do something
> >>> similar when it sees that flag set?
> >>
> >> Isn't the logic in rpcrdma_convert_iovs() allocating those pages?
> >
> > No, rpcrdma_convert_iovs() is only called when you have reply chunks,
> > lists, etc., but not for inline messages. What am I missing?
>
> So, then, rpcrdma_marshal_req() is deciding that the LISTXATTRS
> reply is supposed to fit inline. That means rqst->rq_rcv_buf.buflen
> is small.
>
> But if rpcrdma_inline_fixup() is trying to fill pages,
> rqst->rq_rcv_buf.page_len must not be zero? That sounds like the
> LISTXATTRS encoder is not setting up the receive buffer correctly.
>
> The receive buffer's buflen field is supposed to be set to a value
> that is at least as large as page_len, I would think.

Here's what the LISTXATTR code does, as far as I can see:
it allocates pointers to the pages (but no pages). It sets page_len
from hdr.replen, so yes, it's not zero (setting that value is, as far
as I know, correct). So for RDMA nothing allocates those pages,
because it's an inline request. The TCP code will allocate those
pages because that code was added. You keep saying that you don't
think this is it, but I don't know how to prove to you that KASAN's
"wild-memory access" report means the page wasn't allocated. It's a
bogus address. The page isn't there. Or at least that's how I read
the KASAN report; I don't know how else to interpret it (and the fact
that the code never allocates that memory is, I believe, a strong
argument).
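
For comparison, here is roughly what the TCP side does when it sees
the flag. This is a simplified sketch of xs_alloc_sparse_pages() in
net/sunrpc/xprtsock.c, from memory, not a verbatim copy:

/* Sketch: fill in any receive pages the caller left NULL when the
 * buffer is marked XDRBUF_SPARSE_PAGES. The real function also deals
 * with the bvec and the exact truncation length; details here are
 * approximate.
 */
static void xs_alloc_sparse_pages(struct xdr_buf *buf, gfp_t gfp)
{
	size_t i, n = xdr_buf_pagecount(buf);

	if (!(buf->flags & XDRBUF_SPARSE_PAGES))
		return;
	for (i = 0; i < n; i++) {
		if (buf->pages[i])
			continue;	/* caller supplied this page */
		buf->pages[i] = alloc_page(gfp);
		if (!buf->pages[i]) {
			/* truncate the receive at the first hole */
			buf->page_len = i << PAGE_SHIFT;
			break;
		}
	}
}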

The NFS code can't know that the request will go inline. It assumes
something will allocate that memory, but RDMA doesn't allocate memory
for inline messages.
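
If the RDMA transport were to honor the flag the same way, it would
presumably need a similar loop before it commits to an inline receive.
Something like this hypothetical helper (neither the function nor its
call site exists today; this only shows the shape of it):

/* Hypothetical RDMA-side equivalent: allocate any missing pages of a
 * sparse receive buffer before the transport decides the reply fits
 * inline. Purely illustrative.
 */
static int rpcrdma_alloc_sparse_pages(struct xdr_buf *buf)
{
	size_t i, n = xdr_buf_pagecount(buf);

	if (!(buf->flags & XDRBUF_SPARSE_PAGES))
		return 0;
	for (i = 0; i < n; i++) {
		if (buf->pages[i])
			continue;
		buf->pages[i] = alloc_page(GFP_NOWAIT | __GFP_NOWARN);
		if (!buf->pages[i])
			return -ENOMEM;
	}
	return 0;
}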

I'm not suggesting this is the correct fix, but here is a change that
removes the oops:

diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index 2b2211d1234e..faab6aedeb42 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -1258,6 +1258,15 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
                __free_page(res.scratch);
                return -ENOMEM;
        }
+       if (buflen < 1024) {
+               int i;
+               for (i = 0; i < np; i++) {
+                       pages[i] = alloc_page(GFP_NOWAIT | __GFP_NOWARN);
+                       if (!pages[i])
+                               return -ENOMEM;
+               }
+       }
+

        arg.xattr_pages = pages;
        arg.count = xdrlen;


Basically, since I know that with soft RoCE anything smaller than 1024
bytes will go inline, I need to allocate pages for those requests. This
doesn't interfere with TCP mounts, because the TCP code checks whether
the pages are already allocated and only allocates them if they are not.
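
One more reason it's only a band-aid: if alloc_page() fails partway
through, the loop above returns -ENOMEM and leaks the pages it already
allocated (and res.scratch). A complete version would have to unwind,
roughly like this, assuming the pages array itself is cleaned up by
the function's existing error path:

	for (i = 0; i < np; i++) {
		pages[i] = alloc_page(GFP_NOWAIT | __GFP_NOWARN);
		if (!pages[i]) {
			/* free whatever was allocated before the failure */
			while (i-- > 0)
				__free_page(pages[i]);
			__free_page(res.scratch);
			return -ENOMEM;
		}
	}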

But of course this is not a real solution, because the NFS layer has
no way of knowing what the RDMA inline threshold is.



> >>> Or should LISTXATTR be rewritten
> >>> to be like READDIR, which allocates pages before making the call?
> >>
> >> AIUI READDIR reads into the directory inode's page cache. I recall
> >> that Frank couldn't do that for LISTXATTR because there's no
> >> similar page cache associated with the xattr listing.
> >>
> >> That said, I would prefer that the *XATTR procedures directly
> >> allocate pages instead of relying on SPARSE_PAGES, which is a hack
> >> IMO. I think it would have to use alloc_page() for that, and then
> >> ensure those pages are released when the call has completed.
> >>
> >> I'm not convinced this is the cause of the problem you're seeing,
> >> though.
> >>
> >> --
> >> Chuck Lever
>
> --
> Chuck Lever
>
>
>

Thread overview: 20+ messages
2020-11-13 19:08 [PATCH 1/1] NFSv4.2: fix LISTXATTR buffer receive size Olga Kornievskaia
2020-11-13 20:34 ` Chuck Lever
2020-11-18 21:44   ` Olga Kornievskaia
2020-11-18 22:16     ` Trond Myklebust
2020-11-19 14:37     ` Chuck Lever
2020-11-19 15:09       ` Olga Kornievskaia
2020-11-19 16:19         ` Chuck Lever
2020-11-19 23:26           ` Frank van der Linden
2020-11-20 16:37             ` Olga Kornievskaia
2020-11-23 16:42               ` Olga Kornievskaia
2020-11-23 17:37                 ` Chuck Lever
2020-11-23 17:59                   ` Olga Kornievskaia
2020-11-23 18:09                     ` Chuck Lever
2020-11-23 23:14                       ` Olga Kornievskaia [this message]
2020-11-23 18:20                   ` Frank van der Linden
2020-11-23 17:38                 ` Frank van der Linden
2020-11-23 17:49                   ` Chuck Lever
2020-11-23 17:56                   ` Chuck Lever
2020-11-23 18:05                   ` Olga Kornievskaia
2020-11-23 19:24                   ` [UNVERIFIED SENDER] " Frank van der Linden
