All of lore.kernel.org
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.com>
To: Murphy Zhou <jencce.kernel@gmail.com>, linux-nfs@vger.kernel.org
Cc: bfields@fieldses.org, jlayton@kernel.org,
	Murphy Zhou <jencce.kernel@gmail.com>
Subject: Re: [PATCH] nfsd/nfsd3_proc_readdir: check before encoding on temporary page
Date: Mon, 01 Apr 2019 12:20:28 +1100	[thread overview]
Message-ID: <875zry34ir.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <20190329142332.12010-1-jencce.kernel@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5435 bytes --]

On Fri, Mar 29 2019, Murphy Zhou wrote:

> After this commit
>   f875a79 nfsd: allow nfsv3 readdir request to be larger.
> nfsv3 readdir request size can be larger than PAGE_SIZE. So if the
> request and the directory are large enough, we can run out of pages
> in rq_respages. Then the temporary page we are encoding on is a
> random page, Oops happen.
>
> Listing a directory with 30000 files in it can trigger the panic.
>
> Fixing this by ensuring the temporary page resides in rq_respages.
> And when counting how many bytes left currently from buffer to the
> end of the page which buffer is pointing to, aka len1, do not
> assume that curr_page_addr is at the beginning of the same page.
> Also, update resp->count by going through all pages. Because the
> pages may not be sequential, the old way is not safe.
>
> Fixes: f875a79 "nfsd: allow nfsv3 readdir request to be larger"
> Signed-off-by: Murphy Zhou <jencce.kernel@gmail.com>

Hi,
 thanks for finding these problems and submitting a patch.  Clearly I
 should have tested better :-(

> ---
>  fs/nfsd/nfs3proc.c | 17 +++++++++++++++--
>  fs/nfsd/nfs3xdr.c  |  8 +++++---
>  2 files changed, 20 insertions(+), 5 deletions(-)
>
> diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
> index 8f933e84cec1..9bc32af4e2da 100644
> --- a/fs/nfsd/nfs3proc.c
> +++ b/fs/nfsd/nfs3proc.c
> @@ -442,7 +442,9 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp)
>  	struct nfsd3_readdirargs *argp = rqstp->rq_argp;
>  	struct nfsd3_readdirres  *resp = rqstp->rq_resp;
>  	__be32		nfserr;
> -	int		count;
> +	int		count = 0;
> +	struct page	**p;
> +	caddr_t		page_addr = NULL;
>  
>  	dprintk("nfsd: READDIR(3)  %s %d bytes at %d\n",
>  				SVCFH_fmt(&argp->fh),
> @@ -462,7 +464,18 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp)
>  	nfserr = nfsd_readdir(rqstp, &resp->fh, (loff_t*) &argp->cookie, 
>  					&resp->common, nfs3svc_encode_entry);
>  	memcpy(resp->verf, argp->verf, 8);
> -	resp->count = resp->buffer - argp->buffer;
> +	count = 0;
> +	for (p = rqstp->rq_respages + 1; p < rqstp->rq_next_page; p++) {
> +		page_addr = page_address(*p);
> +
> +		if (((caddr_t)resp->buffer >= page_addr) &&
> +		    ((caddr_t)resp->buffer < page_addr + PAGE_SIZE)) {
> +			count += (caddr_t)resp->buffer - page_addr;
> +			break;
> +		}
> +		count += PAGE_SIZE;
> +	}
> +	resp->count = count >> 2;
>  	if (resp->offset) {
>  		loff_t offset = argp->cookie;

This section makes perfect sense.  You have copied code from
nfsd3_proc_readdirplus() int nfsd3_proc_readdir().  This is needed
because readdir doesn't limit replies to PAGE_SIZE any more.

>  
> diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
> index 93fea246f676..1fabf1952bdb 100644
> --- a/fs/nfsd/nfs3xdr.c
> +++ b/fs/nfsd/nfs3xdr.c
> @@ -953,6 +953,7 @@ encode_entry(struct readdir_cd *ccd, const char *name, int namlen,
>  			break;
>  	}
>  
> +

Adding blank lines is best avoided.

>  	if ((caddr_t)(cd->buffer + elen) < (curr_page_addr + PAGE_SIZE)) {
>  		/* encode entry in current page */
>  
> @@ -961,11 +962,12 @@ encode_entry(struct readdir_cd *ccd, const char *name, int namlen,
>  		if (plus)
>  			p = encode_entryplus_baggage(cd, p, name, namlen, ino);
>  		num_entry_words = p - cd->buffer;
> -	} else if (*(page+1) != NULL) {
> +	} else if (*(page+1) != NULL && (page+1) < cd->rqstp->rq_next_page) {

Adding the test against rq_next_page looks correct.
nfs3svc_decode_readdirplusargs increments ->rq_next_page over enough
pages to cover the request count.
However nfs4svc_decode_readdirargs doesn't!  So it will not advance
rq_next_page properly.  We need to fix that to make it like
nfs3svc_decode_readdirplusargs().

But I don't think this test should be needed.  svc_alloc_arg() NULL-
terminates the ->rq_pages array which ->rq_next_page points in to.

So I think this test is correct as it stands, but
nfs4svc_decode_readdirargs() needs to be fixed.

>  		/* temporarily encode entry into next page, then move back to
>  		 * current and next page in rq_respages[] */
>  		__be32 *p1, *tmp;
>  		int len1, len2;
> +		caddr_t	tmp_page_addr = NULL;
>  
>  		/* grab next page for temporary storage of entry */
>  		p1 = tmp = page_address(*(page+1));
> @@ -977,7 +979,8 @@ encode_entry(struct readdir_cd *ccd, const char *name, int namlen,
>  
>  		/* determine entry word length and lengths to go in pages */
>  		num_entry_words = p1 - tmp;
> -		len1 = curr_page_addr + PAGE_SIZE - (caddr_t)cd->buffer;
> +		tmp_page_addr = (caddr_t)((unsigned long)cd->buffer & PAGE_MASK);
> +		len1 = tmp_page_addr + PAGE_SIZE - (caddr_t)cd->buffer;

I think tmp_page_addr will always be the same as curr_page_addr.
At least, it will when nfs4svc_decode_readdirargs() is fixed.

Could you please revert the changed you've made to nfs3xdr.c (keeping
the change to nfsd3_proc_readdir()), and fix nfs4svc_decode_readdirargs
to be more like nfs4svc_decode_readdirplusargs, and see if that fixed
the problem?

Thanks a lot,
NeilBrown


>  		if ((num_entry_words << 2) < len1) {
>  			/* the actual number of words in the entry is less
>  			 * than elen and can still fit in the current page
> @@ -1026,7 +1029,6 @@ encode_entry(struct readdir_cd *ccd, const char *name, int namlen,
>  	cd->buffer = p;
>  	cd->common.err = nfs_ok;
>  	return 0;
> -
>  }
>  
>  int
> -- 
> 2.21.0

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

  reply	other threads:[~2019-04-01  1:20 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-29 14:23 [PATCH] nfsd/nfsd3_proc_readdir: check before encoding on temporary page Murphy Zhou
2019-04-01  1:20 ` NeilBrown [this message]
2019-04-04  2:55   ` Murphy Zhou

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=875zry34ir.fsf@notabene.neil.brown.name \
    --to=neilb@suse.com \
    --cc=bfields@fieldses.org \
    --cc=jencce.kernel@gmail.com \
    --cc=jlayton@kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.