All of lore.kernel.org
 help / color / mirror / Atom feed
From: Murphy Zhou <jencce.kernel@gmail.com>
To: NeilBrown <neilb@suse.com>
Cc: linux-nfs@vger.kernel.org,
	"J. Bruce Fields" <bfields@fieldses.org>,
	Jeff Layton <jlayton@kernel.org>
Subject: Re: [PATCH] nfsd/nfsd3_proc_readdir: check before encoding on temporary page
Date: Thu, 4 Apr 2019 10:55:45 +0800	[thread overview]
Message-ID: <CADJHv_tXK95rQyu7+S8hRDFXRM2+pP7VV586PWSy4V6T-FqoGg@mail.gmail.com> (raw)
In-Reply-To: <875zry34ir.fsf@notabene.neil.brown.name>

Hi,

Thanks for the review!

On Mon, Apr 1, 2019 at 9:20 AM NeilBrown <neilb@suse.com> wrote:
>
> On Fri, Mar 29 2019, Murphy Zhou wrote:
>
> > After this commit
> >   f875a79 nfsd: allow nfsv3 readdir request to be larger.
> > nfsv3 readdir request size can be larger than PAGE_SIZE. So if the
> > request and the directory are large enough, we can run out of pages
> > in rq_respages. Then the temporary page we are encoding on is a
> > random page, Oops happen.
> >
> > Listing a directory with 30000 files in it can trigger the panic.
> >
> > Fixing this by ensuring the temporary page resides in rq_respages.
> > And when counting how many bytes left currently from buffer to the
> > end of the page which buffer is pointing to, aka len1, do not
> > assume that curr_page_addr is at the beginning of the same page.
> > Also, update resp->count by going through all pages. Because the
> > pages may not be sequential, the old way is not safe.
> >
> > Fixes: f875a79 "nfsd: allow nfsv3 readdir request to be larger"
> > Signed-off-by: Murphy Zhou <jencce.kernel@gmail.com>
>
> Hi,
>  thanks for finding these problems and submitting a patch.  Clearly I
>  should have tested better :-(
>
> > ---
> >  fs/nfsd/nfs3proc.c | 17 +++++++++++++++--
> >  fs/nfsd/nfs3xdr.c  |  8 +++++---
> >  2 files changed, 20 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
> > index 8f933e84cec1..9bc32af4e2da 100644
> > --- a/fs/nfsd/nfs3proc.c
> > +++ b/fs/nfsd/nfs3proc.c
> > @@ -442,7 +442,9 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp)
> >       struct nfsd3_readdirargs *argp = rqstp->rq_argp;
> >       struct nfsd3_readdirres  *resp = rqstp->rq_resp;
> >       __be32          nfserr;
> > -     int             count;
> > +     int             count = 0;
> > +     struct page     **p;
> > +     caddr_t         page_addr = NULL;
> >
> >       dprintk("nfsd: READDIR(3)  %s %d bytes at %d\n",
> >                               SVCFH_fmt(&argp->fh),
> > @@ -462,7 +464,18 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp)
> >       nfserr = nfsd_readdir(rqstp, &resp->fh, (loff_t*) &argp->cookie,
> >                                       &resp->common, nfs3svc_encode_entry);
> >       memcpy(resp->verf, argp->verf, 8);
> > -     resp->count = resp->buffer - argp->buffer;
> > +     count = 0;
> > +     for (p = rqstp->rq_respages + 1; p < rqstp->rq_next_page; p++) {
> > +             page_addr = page_address(*p);
> > +
> > +             if (((caddr_t)resp->buffer >= page_addr) &&
> > +                 ((caddr_t)resp->buffer < page_addr + PAGE_SIZE)) {
> > +                     count += (caddr_t)resp->buffer - page_addr;
> > +                     break;
> > +             }
> > +             count += PAGE_SIZE;
> > +     }
> > +     resp->count = count >> 2;
> >       if (resp->offset) {
> >               loff_t offset = argp->cookie;
>
> This section makes perfect sense.  You have copied code from
> nfsd3_proc_readdirplus() int nfsd3_proc_readdir().  This is needed
> because readdir doesn't limit replies to PAGE_SIZE any more.
>
> >
> > diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
> > index 93fea246f676..1fabf1952bdb 100644
> > --- a/fs/nfsd/nfs3xdr.c
> > +++ b/fs/nfsd/nfs3xdr.c
> > @@ -953,6 +953,7 @@ encode_entry(struct readdir_cd *ccd, const char *name, int namlen,
> >                       break;
> >       }
> >
> > +
>
> Adding blank lines is best avoided.
>
> >       if ((caddr_t)(cd->buffer + elen) < (curr_page_addr + PAGE_SIZE)) {
> >               /* encode entry in current page */
> >
> > @@ -961,11 +962,12 @@ encode_entry(struct readdir_cd *ccd, const char *name, int namlen,
> >               if (plus)
> >                       p = encode_entryplus_baggage(cd, p, name, namlen, ino);
> >               num_entry_words = p - cd->buffer;
> > -     } else if (*(page+1) != NULL) {
> > +     } else if (*(page+1) != NULL && (page+1) < cd->rqstp->rq_next_page) {
>
> Adding the test against rq_next_page looks correct.
> nfs3svc_decode_readdirplusargs increments ->rq_next_page over enough
> pages to cover the request count.
> However nfs4svc_decode_readdirargs doesn't!  So it will not advance
> rq_next_page properly.  We need to fix that to make it like
> nfs3svc_decode_readdirplusargs().
>
> But I don't think this test should be needed.  svc_alloc_arg() NULL-
> terminates the ->rq_pages array which ->rq_next_page points in to.
>
> So I think this test is correct as it stands, but
> nfs4svc_decode_readdirargs() needs to be fixed.

Agreed. If we set the page pointers right. These checks wont be necessary.

>
> >               /* temporarily encode entry into next page, then move back to
> >                * current and next page in rq_respages[] */
> >               __be32 *p1, *tmp;
> >               int len1, len2;
> > +             caddr_t tmp_page_addr = NULL;
> >
> >               /* grab next page for temporary storage of entry */
> >               p1 = tmp = page_address(*(page+1));
> > @@ -977,7 +979,8 @@ encode_entry(struct readdir_cd *ccd, const char *name, int namlen,
> >
> >               /* determine entry word length and lengths to go in pages */
> >               num_entry_words = p1 - tmp;
> > -             len1 = curr_page_addr + PAGE_SIZE - (caddr_t)cd->buffer;
> > +             tmp_page_addr = (caddr_t)((unsigned long)cd->buffer & PAGE_MASK);
> > +             len1 = tmp_page_addr + PAGE_SIZE - (caddr_t)cd->buffer;
>
> I think tmp_page_addr will always be the same as curr_page_addr.
> At least, it will when nfs4svc_decode_readdirargs() is fixed.
>
> Could you please revert the changed you've made to nfs3xdr.c (keeping
> the change to nfsd3_proc_readdir()), and fix nfs4svc_decode_readdirargs
> to be more like nfs4svc_decode_readdirplusargs, and see if that fixed
> the problem?

Sure. A quick test on 3000 files passed. I'll post v2 if test on
30,000 files pass.

Thanks very much for reviewing!
M

>
> Thanks a lot,
> NeilBrown
>
>
> >               if ((num_entry_words << 2) < len1) {
> >                       /* the actual number of words in the entry is less
> >                        * than elen and can still fit in the current page
> > @@ -1026,7 +1029,6 @@ encode_entry(struct readdir_cd *ccd, const char *name, int namlen,
> >       cd->buffer = p;
> >       cd->common.err = nfs_ok;
> >       return 0;
> > -
> >  }
> >
> >  int
> > --
> > 2.21.0

      reply	other threads:[~2019-04-04  2:55 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-29 14:23 [PATCH] nfsd/nfsd3_proc_readdir: check before encoding on temporary page Murphy Zhou
2019-04-01  1:20 ` NeilBrown
2019-04-04  2:55   ` Murphy Zhou [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CADJHv_tXK95rQyu7+S8hRDFXRM2+pP7VV586PWSy4V6T-FqoGg@mail.gmail.com \
    --to=jencce.kernel@gmail.com \
    --cc=bfields@fieldses.org \
    --cc=jlayton@kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=neilb@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.