All of lore.kernel.org
 help / color / mirror / Atom feed
* nfsd: supports read buffer from multiples pages
@ 2016-01-31 12:50 Kinglong Mee
  2016-02-01 18:38 ` J. Bruce Fields
  0 siblings, 1 reply; 7+ messages in thread
From: Kinglong Mee @ 2016-01-31 12:50 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs, Trond Myklebust, kinglongmee

ltp fsync02 will cause nfs sending LAYOUTCOMMIT with length
larger than two pages. nfsd returns NFSERR_BAD_XDR right now.

This patch lets nfsd supports read buffer from multiples pages.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
---
 fs/nfsd/nfs4xdr.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index d6ef095..fcf399f 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -143,11 +143,11 @@ static __be32 *read_buf(struct nfsd4_compoundargs *argp, u32 nbytes)
 	 * Maybe we need a new page, maybe we have just run out
 	 */
 	unsigned int avail = (char *)argp->end - (char *)argp->p;
+	unsigned int copied = 0;
 	__be32 *p;
+
 	if (avail + argp->pagelen < nbytes)
 		return NULL;
-	if (avail + PAGE_SIZE < nbytes) /* need more than a page !! */
-		return NULL;
 	/* ok, we can do it with the current plus the next page */
 	if (nbytes <= sizeof(argp->tmp))
 		p = argp->tmp;
@@ -164,9 +164,19 @@ static __be32 *read_buf(struct nfsd4_compoundargs *argp, u32 nbytes)
 	 * guarantee p points to at least nbytes bytes.
 	 */
 	memcpy(p, argp->p, avail);
+	copied += avail;
+	nbytes -= avail;
+
+	while (nbytes > PAGE_SIZE) {
+		next_decode_page(argp);
+		memcpy(((char*)p) + copied, argp->p, PAGE_SIZE);
+		copied += PAGE_SIZE;
+		nbytes -= PAGE_SIZE;
+	}
+
 	next_decode_page(argp);
-	memcpy(((char*)p)+avail, argp->p, (nbytes - avail));
-	argp->p += XDR_QUADLEN(nbytes - avail);
+	memcpy(((char*)p) + copied, argp->p, nbytes);
+	argp->p += XDR_QUADLEN(nbytes);
 	return p;
 }
 
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: nfsd: supports read buffer from multiples pages
  2016-01-31 12:50 nfsd: supports read buffer from multiples pages Kinglong Mee
@ 2016-02-01 18:38 ` J. Bruce Fields
  2016-02-01 18:48   ` J. Bruce Fields
                     ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: J. Bruce Fields @ 2016-02-01 18:38 UTC (permalink / raw)
  To: Kinglong Mee
  Cc: J. Bruce Fields, linux-nfs, Trond Myklebust, Christoph Hellwig

On Sun, Jan 31, 2016 at 08:50:10PM +0800, Kinglong Mee wrote:
> ltp fsync02 will cause nfs sending LAYOUTCOMMIT with length
> larger than two pages. nfsd returns NFSERR_BAD_XDR right now.

This is with the xfs block layout?

Christoph, do we know anything about average or worst-case sizes for
that layout update field?

> This patch lets nfsd supports read buffer from multiples pages.

Hm.  We'll end up kmalloc()ing the passed-in field length:

		p = argp->tmpp = kmalloc(nbytes, GFP_KERNEL);

We still do still have that (avail + argp->pagelen) limit, so we're not
going to pass arbitrarily large nbytes straight from the network to
kmalloc.  But we do try to avoid depending on higher-order allocations.

--b.

> 
> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
> ---
>  fs/nfsd/nfs4xdr.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index d6ef095..fcf399f 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -143,11 +143,11 @@ static __be32 *read_buf(struct nfsd4_compoundargs *argp, u32 nbytes)
>  	 * Maybe we need a new page, maybe we have just run out
>  	 */
>  	unsigned int avail = (char *)argp->end - (char *)argp->p;
> +	unsigned int copied = 0;
>  	__be32 *p;
> +
>  	if (avail + argp->pagelen < nbytes)
>  		return NULL;
> -	if (avail + PAGE_SIZE < nbytes) /* need more than a page !! */
> -		return NULL;
>  	/* ok, we can do it with the current plus the next page */
>  	if (nbytes <= sizeof(argp->tmp))
>  		p = argp->tmp;
> @@ -164,9 +164,19 @@ static __be32 *read_buf(struct nfsd4_compoundargs *argp, u32 nbytes)
>  	 * guarantee p points to at least nbytes bytes.
>  	 */
>  	memcpy(p, argp->p, avail);
> +	copied += avail;
> +	nbytes -= avail;
> +
> +	while (nbytes > PAGE_SIZE) {
> +		next_decode_page(argp);
> +		memcpy(((char*)p) + copied, argp->p, PAGE_SIZE);
> +		copied += PAGE_SIZE;
> +		nbytes -= PAGE_SIZE;
> +	}
> +
>  	next_decode_page(argp);
> -	memcpy(((char*)p)+avail, argp->p, (nbytes - avail));
> -	argp->p += XDR_QUADLEN(nbytes - avail);
> +	memcpy(((char*)p) + copied, argp->p, nbytes);
> +	argp->p += XDR_QUADLEN(nbytes);
>  	return p;
>  }
>  
> -- 
> 2.5.0

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: nfsd: supports read buffer from multiples pages
  2016-02-01 18:38 ` J. Bruce Fields
@ 2016-02-01 18:48   ` J. Bruce Fields
  2016-02-02  0:54   ` Kinglong Mee
  2016-02-02  9:20   ` Christoph Hellwig
  2 siblings, 0 replies; 7+ messages in thread
From: J. Bruce Fields @ 2016-02-01 18:48 UTC (permalink / raw)
  To: Kinglong Mee
  Cc: J. Bruce Fields, linux-nfs, Trond Myklebust, Christoph Hellwig

On Mon, Feb 01, 2016 at 01:38:05PM -0500, bfields wrote:
> On Sun, Jan 31, 2016 at 08:50:10PM +0800, Kinglong Mee wrote:
> > ltp fsync02 will cause nfs sending LAYOUTCOMMIT with length
> > larger than two pages. nfsd returns NFSERR_BAD_XDR right now.
> 
> This is with the xfs block layout?
> 
> Christoph, do we know anything about average or worst-case sizes for
> that layout update field?
> 
> > This patch lets nfsd supports read buffer from multiples pages.
> 
> Hm.  We'll end up kmalloc()ing the passed-in field length:
> 
> 		p = argp->tmpp = kmalloc(nbytes, GFP_KERNEL);
> 
> We still do still have that (avail + argp->pagelen) limit, so we're not
> going to pass arbitrarily large nbytes straight from the network to
> kmalloc.  But we do try to avoid depending on higher-order allocations.

(Which it looks like we were allowing before, possibly by accident.  But
I doubt they were actually happening in practice, so that's not evidence
that we don't need to worry about allocations greater than a page.)

--b.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: nfsd: supports read buffer from multiples pages
  2016-02-01 18:38 ` J. Bruce Fields
  2016-02-01 18:48   ` J. Bruce Fields
@ 2016-02-02  0:54   ` Kinglong Mee
  2016-02-02  9:20   ` Christoph Hellwig
  2 siblings, 0 replies; 7+ messages in thread
From: Kinglong Mee @ 2016-02-02  0:54 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: J. Bruce Fields, linux-nfs, Trond Myklebust, Christoph Hellwig,
	kinglongmee

On 2/2/2016 02:38, J. Bruce Fields wrote:
> On Sun, Jan 31, 2016 at 08:50:10PM +0800, Kinglong Mee wrote:
>> ltp fsync02 will cause nfs sending LAYOUTCOMMIT with length
>> larger than two pages. nfsd returns NFSERR_BAD_XDR right now.
> 
> This is with the xfs block layout?

Yes, xfs block layout.
Tested by ltp's fsync02.

thanks,
Kinglong Mee
> 
> Christoph, do we know anything about average or worst-case sizes for
> that layout update field?
> 
>> This patch lets nfsd supports read buffer from multiples pages.
> 
> Hm.  We'll end up kmalloc()ing the passed-in field length:
> 
> 		p = argp->tmpp = kmalloc(nbytes, GFP_KERNEL);
> 
> We still do still have that (avail + argp->pagelen) limit, so we're not
> going to pass arbitrarily large nbytes straight from the network to
> kmalloc.  But we do try to avoid depending on higher-order allocations.
> 
> --b.
> 
>>
>> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
>> ---
>>  fs/nfsd/nfs4xdr.c | 18 ++++++++++++++----
>>  1 file changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
>> index d6ef095..fcf399f 100644
>> --- a/fs/nfsd/nfs4xdr.c
>> +++ b/fs/nfsd/nfs4xdr.c
>> @@ -143,11 +143,11 @@ static __be32 *read_buf(struct nfsd4_compoundargs *argp, u32 nbytes)
>>  	 * Maybe we need a new page, maybe we have just run out
>>  	 */
>>  	unsigned int avail = (char *)argp->end - (char *)argp->p;
>> +	unsigned int copied = 0;
>>  	__be32 *p;
>> +
>>  	if (avail + argp->pagelen < nbytes)
>>  		return NULL;
>> -	if (avail + PAGE_SIZE < nbytes) /* need more than a page !! */
>> -		return NULL;
>>  	/* ok, we can do it with the current plus the next page */
>>  	if (nbytes <= sizeof(argp->tmp))
>>  		p = argp->tmp;
>> @@ -164,9 +164,19 @@ static __be32 *read_buf(struct nfsd4_compoundargs *argp, u32 nbytes)
>>  	 * guarantee p points to at least nbytes bytes.
>>  	 */
>>  	memcpy(p, argp->p, avail);
>> +	copied += avail;
>> +	nbytes -= avail;
>> +
>> +	while (nbytes > PAGE_SIZE) {
>> +		next_decode_page(argp);
>> +		memcpy(((char*)p) + copied, argp->p, PAGE_SIZE);
>> +		copied += PAGE_SIZE;
>> +		nbytes -= PAGE_SIZE;
>> +	}
>> +
>>  	next_decode_page(argp);
>> -	memcpy(((char*)p)+avail, argp->p, (nbytes - avail));
>> -	argp->p += XDR_QUADLEN(nbytes - avail);
>> +	memcpy(((char*)p) + copied, argp->p, nbytes);
>> +	argp->p += XDR_QUADLEN(nbytes);
>>  	return p;
>>  }
>>  
>> -- 
>> 2.5.0
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: nfsd: supports read buffer from multiples pages
  2016-02-01 18:38 ` J. Bruce Fields
  2016-02-01 18:48   ` J. Bruce Fields
  2016-02-02  0:54   ` Kinglong Mee
@ 2016-02-02  9:20   ` Christoph Hellwig
  2016-02-02 14:29     ` J. Bruce Fields
  2 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2016-02-02  9:20 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Kinglong Mee, J. Bruce Fields, linux-nfs, Trond Myklebust,
	Christoph Hellwig

On Mon, Feb 01, 2016 at 01:38:05PM -0500, J. Bruce Fields wrote:
> This is with the xfs block layout?
> 
> Christoph, do we know anything about average or worst-case sizes for
> that layout update field?

The average is rather small and fits into a single page, the worst
case is basically unlimited:

	(file size / block size) * sizeof(pnfs_block_extent)

by the protocol, and about half that for a non-stupid client as
it would merge consecutive blocks and only trigger something close
to the worst case for a "block allocated, block hole, block allocated, ..."
pattern.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: nfsd: supports read buffer from multiples pages
  2016-02-02  9:20   ` Christoph Hellwig
@ 2016-02-02 14:29     ` J. Bruce Fields
  2016-02-29  9:49       ` Christoph Hellwig
  0 siblings, 1 reply; 7+ messages in thread
From: J. Bruce Fields @ 2016-02-02 14:29 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Kinglong Mee, J. Bruce Fields, linux-nfs, Trond Myklebust

On Tue, Feb 02, 2016 at 10:20:54AM +0100, Christoph Hellwig wrote:
> On Mon, Feb 01, 2016 at 01:38:05PM -0500, J. Bruce Fields wrote:
> > This is with the xfs block layout?
> > 
> > Christoph, do we know anything about average or worst-case sizes for
> > that layout update field?
> 
> The average is rather small and fits into a single page, the worst
> case is basically unlimited:
> 
> 	(file size / block size) * sizeof(pnfs_block_extent)
> 
> by the protocol, and about half that for a non-stupid client as
> it would merge consecutive blocks and only trigger something close
> to the worst case for a "block allocated, block hole, block allocated, ..."
> pattern.

OK.  And what's the failure mode if the layoutcommit fails?  I guess the
client returns the layout and redoes IO through the MDS.  For a rare
failure maybe that's not so terrible.

So I guess the right thing to do is take Kinglong's patch for now.

After that, it wouldn't be that hard for nfsd4_block_proc_layoutcommit
to decode from an array of pages.  But does the iomaps array end up
being just as big?

--b.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: nfsd: supports read buffer from multiples pages
  2016-02-02 14:29     ` J. Bruce Fields
@ 2016-02-29  9:49       ` Christoph Hellwig
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2016-02-29  9:49 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Christoph Hellwig, Kinglong Mee, J. Bruce Fields, linux-nfs,
	Trond Myklebust

On Tue, Feb 02, 2016 at 09:29:48AM -0500, J. Bruce Fields wrote:
> OK.  And what's the failure mode if the layoutcommit fails?  I guess the
> client returns the layout and redoes IO through the MDS.  For a rare
> failure maybe that's not so terrible.
> 
> So I guess the right thing to do is take Kinglong's patch for now.

Yes, it would be great to take it.

> After that, it wouldn't be that hard for nfsd4_block_proc_layoutcommit
> to decode from an array of pages.  But does the iomaps array end up
> being just as big?

The block layout extents are rather bloated, so it will be significantly
smaller.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-02-29  9:49 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-31 12:50 nfsd: supports read buffer from multiples pages Kinglong Mee
2016-02-01 18:38 ` J. Bruce Fields
2016-02-01 18:48   ` J. Bruce Fields
2016-02-02  0:54   ` Kinglong Mee
2016-02-02  9:20   ` Christoph Hellwig
2016-02-02 14:29     ` J. Bruce Fields
2016-02-29  9:49       ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.