* nfsd: supports read buffer from multiples pages
From: Kinglong Mee @ 2016-01-31 12:50 UTC
To: J. Bruce Fields; +Cc: linux-nfs, Trond Myklebust, kinglongmee
LTP's fsync02 test causes the NFS client to send a LAYOUTCOMMIT with a
length larger than two pages, and nfsd currently returns NFSERR_BAD_XDR.
This patch lets nfsd's read_buf() read a buffer that spans multiple pages.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
---
 fs/nfsd/nfs4xdr.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index d6ef095..fcf399f 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -143,11 +143,11 @@ static __be32 *read_buf(struct nfsd4_compoundargs *argp, u32 nbytes)
 	 * Maybe we need a new page, maybe we have just run out
 	 */
 	unsigned int avail = (char *)argp->end - (char *)argp->p;
+	unsigned int copied = 0;
 	__be32 *p;
+
 	if (avail + argp->pagelen < nbytes)
 		return NULL;
-	if (avail + PAGE_SIZE < nbytes) /* need more than a page !! */
-		return NULL;
 	/* ok, we can do it with the current plus the next page */
 	if (nbytes <= sizeof(argp->tmp))
 		p = argp->tmp;
@@ -164,9 +164,19 @@ static __be32 *read_buf(struct nfsd4_compoundargs *argp, u32 nbytes)
 	 * guarantee p points to at least nbytes bytes.
 	 */
 	memcpy(p, argp->p, avail);
+	copied += avail;
+	nbytes -= avail;
+
+	while (nbytes > PAGE_SIZE) {
+		next_decode_page(argp);
+		memcpy(((char*)p) + copied, argp->p, PAGE_SIZE);
+		copied += PAGE_SIZE;
+		nbytes -= PAGE_SIZE;
+	}
+
 	next_decode_page(argp);
-	memcpy(((char*)p)+avail, argp->p, (nbytes - avail));
-	argp->p += XDR_QUADLEN(nbytes - avail);
+	memcpy(((char*)p) + copied, argp->p, nbytes);
+	argp->p += XDR_QUADLEN(nbytes);
 	return p;
 }
--
2.5.0
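For readers who want to trace the copy loop outside the kernel, here is a minimal user-space sketch of the same gather logic. It is not the nfsd code: struct demo_args, demo_next_page(), demo_read_buf() and the tiny DEMO_PAGE_SIZE are hypothetical stand-ins, and malloc() replaces the tmp/tmpp buffer handling. It only aims to show the order of operations in the patch: copy the tail of the current page, copy whole intermediate pages, then copy the remainder from the last page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_PAGE_SIZE 16u /* tiny page size so the example is easy to trace */

/* Hypothetical stand-in for the decode state kept in nfsd4_compoundargs. */
struct demo_args {
	const unsigned char **pages; /* pages that follow the current one */
	size_t pagelen;              /* bytes of data left in those pages */
	const unsigned char *p;      /* read cursor in the current page */
	const unsigned char *end;    /* end of valid data in the current page */
};

/* Move the cursor to the start of the next page (models next_decode_page). */
static void demo_next_page(struct demo_args *a)
{
	a->p = *a->pages++;
	if (a->pagelen < DEMO_PAGE_SIZE) {
		a->end = a->p + a->pagelen;
		a->pagelen = 0;
	} else {
		a->end = a->p + DEMO_PAGE_SIZE;
		a->pagelen -= DEMO_PAGE_SIZE;
	}
}

/*
 * Gather nbytes that start in the current page and continue across any
 * number of following pages into one freshly allocated linear buffer.
 * Returns NULL if the request exceeds the data available.
 */
static unsigned char *demo_read_buf(struct demo_args *a, size_t nbytes)
{
	size_t avail = (size_t)(a->end - a->p);
	size_t copied = 0;
	unsigned char *out;

	assert(nbytes > avail); /* same precondition as the kernel helper */
	if (avail + a->pagelen < nbytes)
		return NULL;
	out = malloc(nbytes);
	if (!out)
		return NULL;

	/* 1. Tail of the current page. */
	memcpy(out, a->p, avail);
	copied += avail;
	nbytes -= avail;

	/* 2. Whole intermediate pages. */
	while (nbytes > DEMO_PAGE_SIZE) {
		demo_next_page(a);
		memcpy(out + copied, a->p, DEMO_PAGE_SIZE);
		copied += DEMO_PAGE_SIZE;
		nbytes -= DEMO_PAGE_SIZE;
	}

	/* 3. Remainder from the last page; leave the cursor just past it. */
	demo_next_page(a);
	memcpy(out + copied, a->p, nbytes);
	a->p += nbytes;
	return out;
}
```

The bounds check up front is what makes the loop safe: while more than one page is still needed, the remaining page data is at least that large, so every intermediate page is a full one.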
* Re: nfsd: supports read buffer from multiples pages
From: J. Bruce Fields @ 2016-02-01 18:38 UTC
To: Kinglong Mee
Cc: J. Bruce Fields, linux-nfs, Trond Myklebust, Christoph Hellwig
On Sun, Jan 31, 2016 at 08:50:10PM +0800, Kinglong Mee wrote:
> ltp fsync02 will cause nfs sending LAYOUTCOMMIT with length
> larger than two pages. nfsd returns NFSERR_BAD_XDR right now.
This is with the xfs block layout?
Christoph, do we know anything about average or worst-case sizes for
that layout update field?
> This patch lets nfsd supports read buffer from multiples pages.
Hm. We'll end up kmalloc()ing the passed-in field length:
	p = argp->tmpp = kmalloc(nbytes, GFP_KERNEL);
We do still have that (avail + argp->pagelen) limit, so we're not
going to pass arbitrarily large nbytes straight from the network to
kmalloc. But we do try to avoid depending on higher-order allocations.
--b.
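To make the higher-order concern concrete: kmalloc() returns physically contiguous memory, so any request larger than one page needs a contiguous run of pages. The sketch below is a plain user-space calculation, not kernel code; demo_get_order() is a hypothetical helper that computes the smallest order such that (page_size << order) covers the request.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Smallest order such that (page_size << order) >= size.
 * Hypothetical user-space helper for illustration only.
 */
static unsigned int demo_get_order(size_t size, size_t page_size)
{
	unsigned int order = 0;
	size_t span = page_size;

	assert(size > 0 && page_size > 0);
	while (span < size) {
		span <<= 1; /* each order doubles the contiguous run */
		order++;
	}
	return order;
}
```

With 4096-byte pages, a request of up to 4096 bytes is order 0, while a three-page (12288-byte) layout update already rounds up to order 2, that is, four contiguous pages.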
* Re: nfsd: supports read buffer from multiples pages
From: J. Bruce Fields @ 2016-02-01 18:48 UTC
To: Kinglong Mee
Cc: J. Bruce Fields, linux-nfs, Trond Myklebust, Christoph Hellwig
On Mon, Feb 01, 2016 at 01:38:05PM -0500, bfields wrote:
> On Sun, Jan 31, 2016 at 08:50:10PM +0800, Kinglong Mee wrote:
> > ltp fsync02 will cause nfs sending LAYOUTCOMMIT with length
> > larger than two pages. nfsd returns NFSERR_BAD_XDR right now.
>
> This is with the xfs block layout?
>
> Christoph, do we know anything about average or worst-case sizes for
> that layout update field?
>
> > This patch lets nfsd supports read buffer from multiples pages.
>
> Hm. We'll end up kmalloc()ing the passed-in field length:
>
> p = argp->tmpp = kmalloc(nbytes, GFP_KERNEL);
>
> We still do still have that (avail + argp->pagelen) limit, so we're not
> going to pass arbitrarily large nbytes straight from the network to
> kmalloc. But we do try to avoid depending on higher-order allocations.
(Which it looks like we were allowing before, possibly by accident. But
I doubt they were actually happening in practice, so that's not evidence
that we don't need to worry about allocations greater than a page.)
--b.
* Re: nfsd: supports read buffer from multiples pages
From: Kinglong Mee @ 2016-02-02 0:54 UTC
To: J. Bruce Fields
Cc: J. Bruce Fields, linux-nfs, Trond Myklebust, Christoph Hellwig,
kinglongmee
On 2/2/2016 02:38, J. Bruce Fields wrote:
> On Sun, Jan 31, 2016 at 08:50:10PM +0800, Kinglong Mee wrote:
>> ltp fsync02 will cause nfs sending LAYOUTCOMMIT with length
>> larger than two pages. nfsd returns NFSERR_BAD_XDR right now.
>
> This is with the xfs block layout?
Yes, xfs block layout.
Tested by ltp's fsync02.
thanks,
Kinglong Mee
* Re: nfsd: supports read buffer from multiples pages
From: Christoph Hellwig @ 2016-02-02 9:20 UTC
To: J. Bruce Fields
Cc: Kinglong Mee, J. Bruce Fields, linux-nfs, Trond Myklebust,
Christoph Hellwig
On Mon, Feb 01, 2016 at 01:38:05PM -0500, J. Bruce Fields wrote:
> This is with the xfs block layout?
>
> Christoph, do we know anything about average or worst-case sizes for
> that layout update field?
The average is rather small and fits into a single page. The worst
case is basically unlimited by the protocol:
	(file size / block size) * sizeof(pnfs_block_extent)
A non-stupid client merges consecutive blocks, so its worst case is
about half that, and it only gets close to it for a
"block allocated, block hole, block allocated, ..." pattern.
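As a rough worked example of the formula above, the sketch below plugs in numbers. The 44-byte extent size is an assumption taken from the XDR layout of pnfs_block_extent4 in RFC 5663 (16-byte device ID, three 8-byte fields, one 4-byte state). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed XDR size of one block layout extent: 16 + 8 + 8 + 8 + 4 bytes. */
#define DEMO_EXTENT_XDR_SIZE 44u

/* Protocol worst case: one extent per block of the file. */
static uint64_t demo_worst_case_bytes(uint64_t file_size, uint64_t block_size)
{
	assert(block_size > 0);
	return (file_size / block_size) * DEMO_EXTENT_XDR_SIZE;
}

/* Number of pages needed to hold that many bytes (rounded up). */
static uint64_t demo_pages_needed(uint64_t bytes, uint64_t page_size)
{
	assert(page_size > 0);
	return (bytes + page_size - 1) / page_size;
}
```

Under these assumptions, a 1 GiB file with 4 KiB blocks gives 262144 extents, or 11534336 bytes (2816 pages of 4 KiB), and about half of that for a client that merges consecutive blocks.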
* Re: nfsd: supports read buffer from multiples pages
From: J. Bruce Fields @ 2016-02-02 14:29 UTC
To: Christoph Hellwig
Cc: Kinglong Mee, J. Bruce Fields, linux-nfs, Trond Myklebust
On Tue, Feb 02, 2016 at 10:20:54AM +0100, Christoph Hellwig wrote:
> On Mon, Feb 01, 2016 at 01:38:05PM -0500, J. Bruce Fields wrote:
> > This is with the xfs block layout?
> >
> > Christoph, do we know anything about average or worst-case sizes for
> > that layout update field?
>
> The average is rather small and fits into a single page, the worst
> case is basically unlimited:
>
> (file size / block size) * sizeof(pnfs_block_extent)
>
> by the protocol, and about half that for a non-stupid client as
> it would merge consecutive blocks and only trigger something close
> to the worst case for a "block allocated, block hole, block allocated, ..."
> pattern.
OK. And what's the failure mode if the layoutcommit fails? I guess the
client returns the layout and redoes IO through the MDS. For a rare
failure maybe that's not so terrible.
So I guess the right thing to do is take Kinglong's patch for now.
After that, it wouldn't be that hard for nfsd4_block_proc_layoutcommit
to decode from an array of pages. But does the iomaps array end up
being just as big?
--b.
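One possible shape for decoding straight from an array of pages, without first copying into a linear buffer, is sketched below. This is only an illustration in user space: demo_byte_at() and demo_read_be64() are hypothetical helpers, DEMO_PAGE_SIZE is deliberately tiny, and nothing here reflects how nfsd4_block_proc_layoutcommit is actually structured. It reads a big-endian 64-bit field at an arbitrary logical offset, even when the field straddles a page boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 16u /* tiny page size so a straddling field is easy to build */

/* Fetch the byte at logical offset off within a page array. */
static unsigned char demo_byte_at(const unsigned char *const *pages, size_t off)
{
	return pages[off / DEMO_PAGE_SIZE][off % DEMO_PAGE_SIZE];
}

/* Read a big-endian 64-bit value that may cross a page boundary. */
static uint64_t demo_read_be64(const unsigned char *const *pages,
			       size_t npages, size_t off)
{
	uint64_t v = 0;

	assert(off + 8 <= npages * DEMO_PAGE_SIZE); /* stay inside the array */
	for (size_t i = 0; i < 8; i++)
		v = (v << 8) | demo_byte_at(pages, off + i);
	return v;
}
```

A per-byte lookup is the simplest way to show the idea. A real decoder would more likely copy each fixed-size record into a small on-stack buffer, which keeps the allocation at order 0 regardless of the total length.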
* Re: nfsd: supports read buffer from multiples pages
From: Christoph Hellwig @ 2016-02-29 9:49 UTC
To: J. Bruce Fields
Cc: Christoph Hellwig, Kinglong Mee, J. Bruce Fields, linux-nfs,
Trond Myklebust
On Tue, Feb 02, 2016 at 09:29:48AM -0500, J. Bruce Fields wrote:
> OK. And what's the failure mode if the layoutcommit fails? I guess the
> client returns the layout and redoes IO through the MDS. For a rare
> failure maybe that's not so terrible.
>
> So I guess the right thing to do is take Kinglong's patch for now.
Yes, it would be great to take it.
> After that, it wouldn't be that hard for nfsd4_block_proc_layoutcommit
> to decode from an array of pages. But does the iomaps array end up
> being just as big?
The block layout extents are rather bloated, so it will be significantly
smaller.
Thread overview: 7+ messages
2016-01-31 12:50 nfsd: supports read buffer from multiples pages Kinglong Mee
2016-02-01 18:38 ` J. Bruce Fields
2016-02-01 18:48 ` J. Bruce Fields
2016-02-02 0:54 ` Kinglong Mee
2016-02-02 9:20 ` Christoph Hellwig
2016-02-02 14:29 ` J. Bruce Fields
2016-02-29 9:49 ` Christoph Hellwig