* [PATCH] NFS: Always provide aligned buffers to the RPC read layers @ 2021-08-27 18:00 trondmy 2022-01-18 19:26 ` Dan Aloni 0 siblings, 1 reply; 21+ messages in thread From: trondmy @ 2021-08-27 18:00 UTC (permalink / raw) To: Anna Schumaker; +Cc: linux-nfs From: Trond Myklebust <trond.myklebust@hammerspace.com> Instead of messing around with XDR padding in the RDMA layer, we should just give the RPC layer an aligned buffer. Try to avoid creating extra RPC calls by aligning to the smaller value of ALIGN(len, rsize) and PAGE_SIZE. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- fs/nfs/read.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/fs/nfs/read.c b/fs/nfs/read.c index 9f39e0a1a38b..08d6cc57cbc3 100644 --- a/fs/nfs/read.c +++ b/fs/nfs/read.c @@ -293,15 +293,19 @@ static int readpage_async_filler(void *data, struct page *page) { struct nfs_readdesc *desc = data; + struct inode *inode = page_file_mapping(page)->host; + unsigned int rsize = NFS_SERVER(inode)->rsize; struct nfs_page *new; - unsigned int len; + unsigned int len, aligned_len; int error; len = nfs_page_length(page); if (len == 0) return nfs_return_empty_page(page); - new = nfs_create_request(desc->ctx, page, 0, len); + aligned_len = min_t(unsigned int, ALIGN(len, rsize), PAGE_SIZE); + + new = nfs_create_request(desc->ctx, page, 0, aligned_len); if (IS_ERR(new)) goto out_error; -- 2.31.1 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH] NFS: Always provide aligned buffers to the RPC read layers 2021-08-27 18:00 [PATCH] NFS: Always provide aligned buffers to the RPC read layers trondmy @ 2022-01-18 19:26 ` Dan Aloni 2022-01-18 19:33 ` [PATCH] NFS: fix an infinite request retry in an off-by-one last page read Dan Aloni 0 siblings, 1 reply; 21+ messages in thread From: Dan Aloni @ 2022-01-18 19:26 UTC (permalink / raw) To: trondmy; +Cc: Anna Schumaker, linux-nfs On Fri, Aug 27, 2021 at 02:00:56PM -0400, trondmy@kernel.org wrote: > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > Instead of messing around with XDR padding in the RDMA layer, we should > just give the RPC layer an aligned buffer. Try to avoid creating extra > RPC calls by aligning to the smaller value of ALIGN(len, rsize) and > PAGE_SIZE. I've bisected this change to be the cause of xfstests generic/525 getting stuck when tested against a standard Linux NFS server. It was stuck in a repeated request loop in kernel mode, and killable. I posted a patch with a full explanation in a reply. -- Dan Aloni ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH] NFS: fix an infinite request retry in an off-by-one last page read 2022-01-18 19:26 ` Dan Aloni @ 2022-01-18 19:33 ` Dan Aloni 2022-01-18 19:43 ` Trond Myklebust 0 siblings, 1 reply; 21+ messages in thread From: Dan Aloni @ 2022-01-18 19:33 UTC (permalink / raw) To: trondmy; +Cc: Anna Schumaker, linux-nfs Due to change 8cfb9015280d ("NFS: Always provide aligned buffers to the RPC read layers"), a read of 0xfff is aligned up to server rsize of 0x1000. As a result, in a test where the server has a file of size 0x7fffffffffffffff, and the client tries to read from the offset 0x7ffffffffffff000, the read causes loff_t overflow in the server and it returns an NFS code of EINVAL to the client. The client as a result indefinitely retries the request. This fixes the issue by cancelling the alignment for that case. Fixes: 8cfb9015280d ("NFS: Always provide aligned buffers to the RPC read layers") Signed-off-by: Dan Aloni <dan.aloni@vastdata.com> --- fs/nfs/read.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/fs/nfs/read.c b/fs/nfs/read.c index 08d6cc57cbc3..d6fac5e4d3f4 100644 --- a/fs/nfs/read.c +++ b/fs/nfs/read.c @@ -296,16 +296,19 @@ readpage_async_filler(void *data, struct page *page) struct inode *inode = page_file_mapping(page)->host; unsigned int rsize = NFS_SERVER(inode)->rsize; struct nfs_page *new; - unsigned int len, aligned_len; + unsigned int len, request_len; int error; len = nfs_page_length(page); if (len == 0) return nfs_return_empty_page(page); - aligned_len = min_t(unsigned int, ALIGN(len, rsize), PAGE_SIZE); + if (likely(page_index(page) != (LLONG_MAX >> PAGE_SHIFT))) + request_len = min_t(unsigned int, ALIGN(len, rsize), PAGE_SIZE); + else + request_len = len; - new = nfs_create_request(desc->ctx, page, 0, aligned_len); + new = nfs_create_request(desc->ctx, page, 0, request_len); if (IS_ERR(new)) goto out_error; -- 2.23.0 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH] NFS: fix an infinite request retry in an off-by-one last page read 2022-01-18 19:33 ` [PATCH] NFS: fix an infinite request retry in an off-by-one last page read Dan Aloni @ 2022-01-18 19:43 ` Trond Myklebust 2022-01-21 18:50 ` [PATCH] NFSD: trim reads past NFS_OFFSET_MAX Dan Aloni 0 siblings, 1 reply; 21+ messages in thread From: Trond Myklebust @ 2022-01-18 19:43 UTC (permalink / raw) To: dan.aloni, trondmy; +Cc: linux-nfs, Anna.Schumaker On Tue, 2022-01-18 at 21:33 +0200, Dan Aloni wrote: > Due to change 8cfb9015280d ("NFS: Always provide aligned buffers to > the > RPC read layers"), a read of 0xfff is aligned up to server rsize of > 0x1000. > > As a result, in a test where the server has a file of size > 0x7fffffffffffffff, and the client tries to read from the offset > 0x7ffffffffffff000, the read causes loff_t overflow in the server and > it > returns an NFS code of EINVAL to the client. The client as a result > indefinitely retries the request. > > This fixes the issue by cancelling the alignment for that case. > NACK. This would be a server bug, not a client bug. The NFS protocol has no notion of signed offsets, and doesn't use loff_t. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH] NFSD: trim reads past NFS_OFFSET_MAX 2022-01-18 19:43 ` Trond Myklebust @ 2022-01-21 18:50 ` Dan Aloni 2022-01-21 22:32 ` Chuck Lever III 0 siblings, 1 reply; 21+ messages in thread From: Dan Aloni @ 2022-01-21 18:50 UTC (permalink / raw) To: trondmy; +Cc: Anna Schumaker, linux-nfs Due to change 8cfb9015280d ("NFS: Always provide aligned buffers to the RPC read layers"), a read of 0xfff is aligned up to server rsize of 0x1000. As a result, in a test where the server has a file of size 0x7fffffffffffffff, and the client tries to read from the offset 0x7ffffffffffff000, the read causes loff_t overflow in the server and it returns an NFS code of EINVAL to the client. The client as a result indefinitely retries the request. This fixes the issue at server side by trimming reads past NFS_OFFSET_MAX. Fixes: 8cfb9015280d ("NFS: Always provide aligned buffers to the RPC read layers") Signed-off-by: Dan Aloni <dan.aloni@vastdata.com> --- fs/nfsd/vfs.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 738d564ca4ce..754f4e9ff4a2 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1046,6 +1046,10 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp, __be32 err; trace_nfsd_read_start(rqstp, fhp, offset, *count); + + if (unlikely(offset + *count > NFS_OFFSET_MAX)) + *count = NFS_OFFSET_MAX - offset; + err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf); if (err) return err; -- 2.23.0 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH] NFSD: trim reads past NFS_OFFSET_MAX 2022-01-21 18:50 ` [PATCH] NFSD: trim reads past NFS_OFFSET_MAX Dan Aloni @ 2022-01-21 22:32 ` Chuck Lever III 2022-01-22 12:47 ` Dan Aloni 2022-01-22 12:49 ` [PATCH v2] NFSD: trim reads past NFS_OFFSET_MAX and fix NFSv3 check Dan Aloni 0 siblings, 2 replies; 21+ messages in thread From: Chuck Lever III @ 2022-01-21 22:32 UTC (permalink / raw) To: Dan Aloni; +Cc: Anna Schumaker, trondmy, Linux NFS Mailing List Hi Dan! NFS server patches should be sent to me these days. $ scripts/get_maintainer.pl fs/nfsd Chuck Lever <chuck.lever@oracle.com> (supporter:KERNEL NFSD, SUNRPC, AND LOCKD SERVERS) linux-nfs@vger.kernel.org (open list:KERNEL NFSD, SUNRPC, AND LOCKD SERVERS) linux-kernel@vger.kernel.org (open list) > On Jan 21, 2022, at 1:50 PM, Dan Aloni <dan.aloni@vastdata.com> wrote: > > Due to change 8cfb9015280d ("NFS: Always provide aligned buffers to the > RPC read layers"), a read of 0xfff is aligned up to server rsize of > 0x1000. > > As a result, in a test where the server has a file of size > 0x7fffffffffffffff, and the client tries to read from the offset > 0x7ffffffffffff000, the read causes loff_t overflow in the server and it > returns an NFS code of EINVAL to the client. The client as a result > indefinitely retries the request. An infinite loop in this case is a client bug. Section 3.3.6 of RFC 1813 permits the NFSv3 READ procedure to return NFS3ERR_INVAL. The READ entry in Table 6 of RFC 5661 permits the NFSv4 READ operation to return NFS4ERR_INVAL. Was the client side fix for this issue rejected? > This fixes the issue at server side by trimming reads past NFS_OFFSET_MAX. It's OK for the server to return a short READ in this case, so I will indeed consider a change to make that happen. But see below for comments specific to this patch. > Fixes: 8cfb9015280d ("NFS: Always provide aligned buffers to the RPC read layers") > Signed-off-by: Dan Aloni <dan.aloni@vastdata.com> > --- > fs/nfsd/vfs.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c > index 738d564ca4ce..754f4e9ff4a2 100644 > --- a/fs/nfsd/vfs.c > +++ b/fs/nfsd/vfs.c > @@ -1046,6 +1046,10 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp, > __be32 err; > > trace_nfsd_read_start(rqstp, fhp, offset, *count); > + > + if (unlikely(offset + *count > NFS_OFFSET_MAX)) > + *count = NFS_OFFSET_MAX - offset; Can @offset ever be larger than NFS_OFFSET_MAX? Does this check have any effect on NFSv4 READ operations? > + > err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf); > if (err) > return err; > -- > 2.23.0 > -- Chuck Lever ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] NFSD: trim reads past NFS_OFFSET_MAX 2022-01-21 22:32 ` Chuck Lever III @ 2022-01-22 12:47 ` Dan Aloni 2022-01-22 17:05 ` Chuck Lever III 2022-01-22 12:49 ` [PATCH v2] NFSD: trim reads past NFS_OFFSET_MAX and fix NFSv3 check Dan Aloni 1 sibling, 1 reply; 21+ messages in thread From: Dan Aloni @ 2022-01-22 12:47 UTC (permalink / raw) To: Chuck Lever III Cc: Anna Schumaker, trondmy, Linux NFS Mailing List, Trond Myklebust On Fri, Jan 21, 2022 at 10:32:28PM +0000, Chuck Lever III wrote: > NFS server patches should be sent to me these days. Thanks, will remember this next time. > > On Jan 21, 2022, at 1:50 PM, Dan Aloni <dan.aloni@vastdata.com> wrote: > > > > Due to change 8cfb9015280d ("NFS: Always provide aligned buffers to the > > RPC read layers"), a read of 0xfff is aligned up to server rsize of > > 0x1000. > > > > As a result, in a test where the server has a file of size > > 0x7fffffffffffffff, and the client tries to read from the offset > > 0x7ffffffffffff000, the read causes loff_t overflow in the server and it > > returns an NFS code of EINVAL to the client. The client as a result > > indefinitely retries the request. > > An infinite loop in this case is a client bug. > > Section 3.3.6 of RFC 1813 permits the NFSv3 READ procedure > to return NFS3ERR_INVAL. The READ entry in Table 6 of RFC > 5661 permits the NFSv4 READ operation to return > NFS4ERR_INVAL. > > Was the client side fix for this issue rejected? Yeah, see Trond's response in https://lore.kernel.org/linux-nfs/fa9974724216c43f9bdb3fd39555d398fde11e59.camel@hammerspace.com/ So it is both a client and server bugs? > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c > > index 738d564ca4ce..754f4e9ff4a2 100644 > > --- a/fs/nfsd/vfs.c > > +++ b/fs/nfsd/vfs.c > > @@ -1046,6 +1046,10 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp, > > __be32 err; > > > > trace_nfsd_read_start(rqstp, fhp, offset, *count); > > + > > + if (unlikely(offset + *count > NFS_OFFSET_MAX)) > > + *count = NFS_OFFSET_MAX - offset; > > Can @offset ever be larger than NFS_OFFSET_MAX? We have this check in `nfsd4_read`, `(read->rd_offset >= OFFSET_MAX)`. (should it have been `>` rather?). Seems it is missing from NFSv3, should add. > Does this check have any effect on NFSv4 READ operations? Indeed it doesn't - my expanded testing shows it only fixed for NFSv3. Will send an updated patch. -- Dan Aloni ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] NFSD: trim reads past NFS_OFFSET_MAX 2022-01-22 12:47 ` Dan Aloni @ 2022-01-22 17:05 ` Chuck Lever III 2022-01-22 18:27 ` Trond Myklebust 2022-01-22 19:01 ` Dan Aloni 0 siblings, 2 replies; 21+ messages in thread From: Chuck Lever III @ 2022-01-22 17:05 UTC (permalink / raw) To: Dan Aloni Cc: Anna Schumaker, trondmy, Linux NFS Mailing List, Trond Myklebust > On Jan 22, 2022, at 7:47 AM, Dan Aloni <dan.aloni@vastdata.com> wrote: > > On Fri, Jan 21, 2022 at 10:32:28PM +0000, Chuck Lever III wrote: >>> On Jan 21, 2022, at 1:50 PM, Dan Aloni <dan.aloni@vastdata.com> wrote: >>> >>> Due to change 8cfb9015280d ("NFS: Always provide aligned buffers to the >>> RPC read layers"), a read of 0xfff is aligned up to server rsize of >>> 0x1000. >>> >>> As a result, in a test where the server has a file of size >>> 0x7fffffffffffffff, and the client tries to read from the offset >>> 0x7ffffffffffff000, the read causes loff_t overflow in the server and it >>> returns an NFS code of EINVAL to the client. The client as a result >>> indefinitely retries the request. >> >> An infinite loop in this case is a client bug. >> >> Section 3.3.6 of RFC 1813 permits the NFSv3 READ procedure >> to return NFS3ERR_INVAL. The READ entry in Table 6 of RFC >> 5661 permits the NFSv4 READ operation to return >> NFS4ERR_INVAL. >> >> Was the client side fix for this issue rejected? > > Yeah, see Trond's response in > > https://lore.kernel.org/linux-nfs/fa9974724216c43f9bdb3fd39555d398fde11e59.camel@hammerspace.com/ > > So it is both a client and server bugs? Splitting hairs, but yes there are issues on both sides IMO. Bad behavior due to bugs on both sides is actually not uncommon. Trond is correct that the server is not dealing totally correctly with the range of values in a READ request. However, as I pointed out, the specification permits NFS servers to return NFS[34]ERR_INVAL on READ. And in fact, there is already code in the NFSv4 READ path that returns INVAL, for example: 785 if (read->rd_offset >= OFFSET_MAX) 786 return nfserr_inval; I'm not sure the specifications describe precisely when the server /must/ return INVAL, but the client needs to be prepared to handle it reasonably. If INVAL results in an infinite loop, then that's a client bug. IMO changing the alignment for that case is a band-aid. The underlying looping behavior is what is the root problem. (So... I agree with Trond's NACK, but for different reasons). >>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c >>> index 738d564ca4ce..754f4e9ff4a2 100644 >>> --- a/fs/nfsd/vfs.c >>> +++ b/fs/nfsd/vfs.c >>> @@ -1046,6 +1046,10 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp, >>> __be32 err; >>> >>> trace_nfsd_read_start(rqstp, fhp, offset, *count); >>> + >>> + if (unlikely(offset + *count > NFS_OFFSET_MAX)) >>> + *count = NFS_OFFSET_MAX - offset; >> >> Can @offset ever be larger than NFS_OFFSET_MAX? > > We have this check in `nfsd4_read`, `(read->rd_offset >= OFFSET_MAX)`. > (should it have been `>` rather?). Don't think so, a zero-byte READ should be valid. However it's rather interesting that it does not use NFS_OFFSET_MAX here. Does anyone know why NFSv3 uses NFS_OFFSET_MAX but NFSv4 and NLM use OFFSET_MAX? -- Chuck Lever ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] NFSD: trim reads past NFS_OFFSET_MAX 2022-01-22 17:05 ` Chuck Lever III @ 2022-01-22 18:27 ` Trond Myklebust 2022-01-22 20:15 ` Chuck Lever III 2022-01-22 19:01 ` Dan Aloni 1 sibling, 1 reply; 21+ messages in thread From: Trond Myklebust @ 2022-01-22 18:27 UTC (permalink / raw) To: dan.aloni, chuck.lever; +Cc: linux-nfs, Anna.Schumaker, trondmy On Sat, 2022-01-22 at 17:05 +0000, Chuck Lever III wrote: > > > On Jan 22, 2022, at 7:47 AM, Dan Aloni <dan.aloni@vastdata.com> > > wrote: > > > > On Fri, Jan 21, 2022 at 10:32:28PM +0000, Chuck Lever III wrote: > > > > On Jan 21, 2022, at 1:50 PM, Dan Aloni <dan.aloni@vastdata.com> > > > > wrote: > > > > > > > > Due to change 8cfb9015280d ("NFS: Always provide aligned > > > > buffers to the > > > > RPC read layers"), a read of 0xfff is aligned up to server > > > > rsize of > > > > 0x1000. > > > > > > > > As a result, in a test where the server has a file of size > > > > 0x7fffffffffffffff, and the client tries to read from the > > > > offset > > > > 0x7ffffffffffff000, the read causes loff_t overflow in the > > > > server and it > > > > returns an NFS code of EINVAL to the client. The client as a > > > > result > > > > indefinitely retries the request. > > > > > > An infinite loop in this case is a client bug. > > > > > > Section 3.3.6 of RFC 1813 permits the NFSv3 READ procedure > > > to return NFS3ERR_INVAL. The READ entry in Table 6 of RFC > > > 5661 permits the NFSv4 READ operation to return > > > NFS4ERR_INVAL. > > > > > > Was the client side fix for this issue rejected? > > > > Yeah, see Trond's response in > > > > > > https://lore.kernel.org/linux-nfs/fa9974724216c43f9bdb3fd39555d398fde11e59.camel@hammerspace.com/ > > > > So it is both a client and server bugs? > > Splitting hairs, but yes there are issues on both sides > IMO. Bad behavior due to bugs on both sides is actually > not uncommon. > > Trond is correct that the server is not dealing totally > correctly with the range of values in a READ request. > > However, as I pointed out, the specification permits NFS > servers to return NFS[34]ERR_INVAL on READ. And in fact, > there is already code in the NFSv4 READ path that returns > INVAL, for example: > > 785 if (read->rd_offset >= OFFSET_MAX) > 786 return nfserr_inval; > > I'm not sure the specifications describe precisely when > the server /must/ return INVAL, but the client needs to > be prepared to handle it reasonably. If INVAL results in > an infinite loop, then that's a client bug. > > IMO changing the alignment for that case is a band-aid. > The underlying looping behavior is what is the root > problem. (So... I agree with Trond's NACK, but for > different reasons). If I'm reading Dan's test case correctly, the client is trying to read a full page of 0x1000 bytes starting at offset 0x7fffffffffffff000. That means the end offset for that read is 0x7fffffffffffff000 + 0x1000 - 1 = 0x7fffffffffffffff. IOW: as far as the server is concerned, there is no loff_t overflow on either the start or end offset and so there is no reason for it to return NFS4ERR_INVAL. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] NFSD: trim reads past NFS_OFFSET_MAX 2022-01-22 18:27 ` Trond Myklebust @ 2022-01-22 20:15 ` Chuck Lever III 2022-01-22 20:30 ` Trond Myklebust 0 siblings, 1 reply; 21+ messages in thread From: Chuck Lever III @ 2022-01-22 20:15 UTC (permalink / raw) To: Trond Myklebust Cc: dan.aloni, Linux NFS Mailing List, Anna Schumaker, trondmy > On Jan 22, 2022, at 1:27 PM, Trond Myklebust <trondmy@hammerspace.com> wrote: > > On Sat, 2022-01-22 at 17:05 +0000, Chuck Lever III wrote: >> >>> On Jan 22, 2022, at 7:47 AM, Dan Aloni <dan.aloni@vastdata.com> >>> wrote: >>> >>> On Fri, Jan 21, 2022 at 10:32:28PM +0000, Chuck Lever III wrote: >>>>> On Jan 21, 2022, at 1:50 PM, Dan Aloni <dan.aloni@vastdata.com> >>>>> wrote: >>>>> >>>>> Due to change 8cfb9015280d ("NFS: Always provide aligned >>>>> buffers to the >>>>> RPC read layers"), a read of 0xfff is aligned up to server >>>>> rsize of >>>>> 0x1000. >>>>> >>>>> As a result, in a test where the server has a file of size >>>>> 0x7fffffffffffffff, and the client tries to read from the >>>>> offset >>>>> 0x7ffffffffffff000, the read causes loff_t overflow in the >>>>> server and it >>>>> returns an NFS code of EINVAL to the client. The client as a >>>>> result >>>>> indefinitely retries the request. >>>> >>>> An infinite loop in this case is a client bug. >>>> >>>> Section 3.3.6 of RFC 1813 permits the NFSv3 READ procedure >>>> to return NFS3ERR_INVAL. The READ entry in Table 6 of RFC >>>> 5661 permits the NFSv4 READ operation to return >>>> NFS4ERR_INVAL. >>>> >>>> Was the client side fix for this issue rejected? >>> >>> Yeah, see Trond's response in >>> >>> >>> https://lore.kernel.org/linux-nfs/fa9974724216c43f9bdb3fd39555d398fde11e59.camel@hammerspace.com/ >>> >>> So it is both a client and server bugs? >> >> Splitting hairs, but yes there are issues on both sides >> IMO. Bad behavior due to bugs on both sides is actually >> not uncommon. >> >> Trond is correct that the server is not dealing totally >> correctly with the range of values in a READ request. >> >> However, as I pointed out, the specification permits NFS >> servers to return NFS[34]ERR_INVAL on READ. And in fact, >> there is already code in the NFSv4 READ path that returns >> INVAL, for example: >> >> 785 if (read->rd_offset >= OFFSET_MAX) >> 786 return nfserr_inval; >> >> I'm not sure the specifications describe precisely when >> the server /must/ return INVAL, but the client needs to >> be prepared to handle it reasonably. If INVAL results in >> an infinite loop, then that's a client bug. >> >> IMO changing the alignment for that case is a band-aid. >> The underlying looping behavior is what is the root >> problem. (So... I agree with Trond's NACK, but for >> different reasons). > > If I'm reading Dan's test case correctly, the client is trying to read > a full page of 0x1000 bytes starting at offset 0x7fffffffffffff000. > That means the end offset for that read is 0x7fffffffffffff000 + 0x1000 > - 1 = 0x7fffffffffffffff. > > IOW: as far as the server is concerned, there is no loff_t overflow on > either the start or end offset and so there is no reason for it to > return NFS4ERR_INVAL. Yep, I agree there's server misbehavior, and I think Dan's server fix is on point. I would like to know why the client is looping, though. INVAL is a valid response the Linux server already uses in other cases and by itself should not trigger a READ retry. After checking the relevant XDR definitions, an NFS READ error response doesn't include the EOF flag, so I'm a little mystified why the client would need to retry after receiving INVAL. -- Chuck Lever ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] NFSD: trim reads past NFS_OFFSET_MAX 2022-01-22 20:15 ` Chuck Lever III @ 2022-01-22 20:30 ` Trond Myklebust 2022-01-23 17:35 ` Chuck Lever III 0 siblings, 1 reply; 21+ messages in thread From: Trond Myklebust @ 2022-01-22 20:30 UTC (permalink / raw) To: chuck.lever; +Cc: dan.aloni, linux-nfs, Anna.Schumaker, trondmy On Sat, 2022-01-22 at 20:15 +0000, Chuck Lever III wrote: > > > > On Jan 22, 2022, at 1:27 PM, Trond Myklebust > > <trondmy@hammerspace.com> wrote: > > > > On Sat, 2022-01-22 at 17:05 +0000, Chuck Lever III wrote: > > > > > > > On Jan 22, 2022, at 7:47 AM, Dan Aloni <dan.aloni@vastdata.com> > > > > wrote: > > > > > > > > On Fri, Jan 21, 2022 at 10:32:28PM +0000, Chuck Lever III > > > > wrote: > > > > > > On Jan 21, 2022, at 1:50 PM, Dan Aloni > > > > > > <dan.aloni@vastdata.com> > > > > > > wrote: > > > > > > > > > > > > Due to change 8cfb9015280d ("NFS: Always provide aligned > > > > > > buffers to the > > > > > > RPC read layers"), a read of 0xfff is aligned up to server > > > > > > rsize of > > > > > > 0x1000. > > > > > > > > > > > > As a result, in a test where the server has a file of size > > > > > > 0x7fffffffffffffff, and the client tries to read from the > > > > > > offset > > > > > > 0x7ffffffffffff000, the read causes loff_t overflow in the > > > > > > server and it > > > > > > returns an NFS code of EINVAL to the client. The client as > > > > > > a > > > > > > result > > > > > > indefinitely retries the request. > > > > > > > > > > An infinite loop in this case is a client bug. > > > > > > > > > > Section 3.3.6 of RFC 1813 permits the NFSv3 READ procedure > > > > > to return NFS3ERR_INVAL. The READ entry in Table 6 of RFC > > > > > 5661 permits the NFSv4 READ operation to return > > > > > NFS4ERR_INVAL. > > > > > > > > > > Was the client side fix for this issue rejected? > > > > > > > > Yeah, see Trond's response in > > > > > > > > > > > > https://lore.kernel.org/linux-nfs/fa9974724216c43f9bdb3fd39555d398fde11e59.camel@hammerspace.com/ > > > > > > > > So it is both a client and server bugs? > > > > > > Splitting hairs, but yes there are issues on both sides > > > IMO. Bad behavior due to bugs on both sides is actually > > > not uncommon. > > > > > > Trond is correct that the server is not dealing totally > > > correctly with the range of values in a READ request. > > > > > > However, as I pointed out, the specification permits NFS > > > servers to return NFS[34]ERR_INVAL on READ. And in fact, > > > there is already code in the NFSv4 READ path that returns > > > INVAL, for example: > > > > > > 785 if (read->rd_offset >= OFFSET_MAX) > > > 786 return nfserr_inval; > > > > > > I'm not sure the specifications describe precisely when > > > the server /must/ return INVAL, but the client needs to > > > be prepared to handle it reasonably. If INVAL results in > > > an infinite loop, then that's a client bug. > > > > > > IMO changing the alignment for that case is a band-aid. > > > The underlying looping behavior is what is the root > > > problem. (So... I agree with Trond's NACK, but for > > > different reasons). > > > > If I'm reading Dan's test case correctly, the client is trying to > > read > > a full page of 0x1000 bytes starting at offset 0x7fffffffffffff000. > > That means the end offset for that read is 0x7fffffffffffff000 + > > 0x1000 > > - 1 = 0x7fffffffffffffff. > > > > IOW: as far as the server is concerned, there is no loff_t overflow > > on > > either the start or end offset and so there is no reason for it to > > return NFS4ERR_INVAL. > > Yep, I agree there's server misbehavior, and I think Dan's > server fix is on point. > > I would like to know why the client is looping, though. INVAL > is a valid response the Linux server already uses in other > cases and by itself should not trigger a READ retry. > > After checking the relevant XDR definitions, an NFS READ error > response doesn't include the EOF flag, so I'm a little mystified > why the client would need to retry after receiving INVAL. While we could certainly add that error to nfs_error_is_fatal(), the question is why the client should need to handle NFS4ERR_INVAL if it is sending valid arguments? 15.1.1.4. NFS4ERR_INVAL (Error Code 22) The arguments for this operation are not valid for some reason, even though they do match those specified in the XDR definition for the request. Sure... What does that mean, and what do I do? -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] NFSD: trim reads past NFS_OFFSET_MAX 2022-01-22 20:30 ` Trond Myklebust @ 2022-01-23 17:35 ` Chuck Lever III 0 siblings, 0 replies; 21+ messages in thread From: Chuck Lever III @ 2022-01-23 17:35 UTC (permalink / raw) To: Trond Myklebust Cc: dan.aloni, Linux NFS Mailing List, Anna Schumaker, trondmy > On Jan 22, 2022, at 3:30 PM, Trond Myklebust <trondmy@hammerspace.com> wrote: > > On Sat, 2022-01-22 at 20:15 +0000, Chuck Lever III wrote: >> >>> On Jan 22, 2022, at 1:27 PM, Trond Myklebust >>> <trondmy@hammerspace.com> wrote: >>> >>> On Sat, 2022-01-22 at 17:05 +0000, Chuck Lever III wrote: >>>> >>>>> On Jan 22, 2022, at 7:47 AM, Dan Aloni <dan.aloni@vastdata.com> >>>>> wrote: >>>>> >>>>> On Fri, Jan 21, 2022 at 10:32:28PM +0000, Chuck Lever III >>>>> wrote: >>>>>>> On Jan 21, 2022, at 1:50 PM, Dan Aloni >>>>>>> <dan.aloni@vastdata.com> >>>>>>> wrote: >>>>>>> >>>>>>> Due to change 8cfb9015280d ("NFS: Always provide aligned >>>>>>> buffers to the >>>>>>> RPC read layers"), a read of 0xfff is aligned up to server >>>>>>> rsize of >>>>>>> 0x1000. >>>>>>> >>>>>>> As a result, in a test where the server has a file of size >>>>>>> 0x7fffffffffffffff, and the client tries to read from the >>>>>>> offset >>>>>>> 0x7ffffffffffff000, the read causes loff_t overflow in the >>>>>>> server and it >>>>>>> returns an NFS code of EINVAL to the client. The client as >>>>>>> a >>>>>>> result >>>>>>> indefinitely retries the request. >>>>>> >>>>>> An infinite loop in this case is a client bug. >>>>>> >>>>>> Section 3.3.6 of RFC 1813 permits the NFSv3 READ procedure >>>>>> to return NFS3ERR_INVAL. The READ entry in Table 6 of RFC >>>>>> 5661 permits the NFSv4 READ operation to return >>>>>> NFS4ERR_INVAL. >>>>>> >>>>>> Was the client side fix for this issue rejected? >>>>> >>>>> Yeah, see Trond's response in >>>>> >>>>> >>>>> https://lore.kernel.org/linux-nfs/fa9974724216c43f9bdb3fd39555d398fde11e59.camel@hammerspace.com/ >>>>> >>>>> So it is both a client and server bugs? >>>> >>>> Splitting hairs, but yes there are issues on both sides >>>> IMO. Bad behavior due to bugs on both sides is actually >>>> not uncommon. >>>> >>>> Trond is correct that the server is not dealing totally >>>> correctly with the range of values in a READ request. >>>> >>>> However, as I pointed out, the specification permits NFS >>>> servers to return NFS[34]ERR_INVAL on READ. And in fact, >>>> there is already code in the NFSv4 READ path that returns >>>> INVAL, for example: >>>> >>>> 785 if (read->rd_offset >= OFFSET_MAX) >>>> 786 return nfserr_inval; >>>> >>>> I'm not sure the specifications describe precisely when >>>> the server /must/ return INVAL, but the client needs to >>>> be prepared to handle it reasonably. If INVAL results in >>>> an infinite loop, then that's a client bug. >>>> >>>> IMO changing the alignment for that case is a band-aid. >>>> The underlying looping behavior is what is the root >>>> problem. (So... I agree with Trond's NACK, but for >>>> different reasons). >>> >>> If I'm reading Dan's test case correctly, the client is trying to >>> read >>> a full page of 0x1000 bytes starting at offset 0x7fffffffffffff000. >>> That means the end offset for that read is 0x7fffffffffffff000 + >>> 0x1000 >>> - 1 = 0x7fffffffffffffff. >>> >>> IOW: as far as the server is concerned, there is no loff_t overflow >>> on >>> either the start or end offset and so there is no reason for it to >>> return NFS4ERR_INVAL. >> >> Yep, I agree there's server misbehavior, and I think Dan's >> server fix is on point. >> >> I would like to know why the client is looping, though. INVAL >> is a valid response the Linux server already uses in other >> cases and by itself should not trigger a READ retry. >> >> After checking the relevant XDR definitions, an NFS READ error >> response doesn't include the EOF flag, so I'm a little mystified >> why the client would need to retry after receiving INVAL. > > While we could certainly add that error to nfs_error_is_fatal(), the > question is why the client should need to handle NFS4ERR_INVAL if it is > sending valid arguments? As I said: I agree that Dan's test case is sending values in a range that NFSD should handle without error. That does need to be fixed. However, there are other instances where NFSD returns INVAL to a READ (and it has done so for a long while). Those cases really mustn't trigger an unterminating loop, especially since a ^C is not likely to unblock the application. That's why I'm still concerned about behavior when a server returns INVAL on a READ. > 15.1.1.4. NFS4ERR_INVAL (Error Code 22) > > The arguments for this operation are not valid for some reason, even > though they do match those specified in the XDR definition for the > request. > > > Sure... What does that mean, and what do I do? Let me try to paraphrase: A. RFC 1813 and 8881 permit servers to return INVAL on READ, but do not specify under which conditions to use it. This ambiguity might be reason for a server implementation to avoid that status code with READ. Have you considered filing an errata? B. Though the RFCs permit servers to return INVAL on READ, the Linux NFS client does not support it. The client is not spec-compliant in this regard, but that's because of the ambiguity described in A. C. Therefore the Linux NFS client treats INVAL on READ as unexpected input. I claim that when confronted with unexpected input (of any form) a "good quality" client implementation should avoid pathological behavior like unterminating loops.... That behavior is both an attack surface and potentially a problem if the client has to be rebooted to fully recover. The specific behavior of returning INVAL on READ is being addressed in the Linux server, but not root-causing and addressing the client's response to this behavior leaves a large set of potential issues in this same class. -- Chuck Lever ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] NFSD: trim reads past NFS_OFFSET_MAX 2022-01-22 17:05 ` Chuck Lever III 2022-01-22 18:27 ` Trond Myklebust @ 2022-01-22 19:01 ` Dan Aloni 2022-01-22 20:33 ` Chuck Lever III 1 sibling, 1 reply; 21+ messages in thread From: Dan Aloni @ 2022-01-22 19:01 UTC (permalink / raw) To: Chuck Lever III Cc: Anna Schumaker, trondmy, Linux NFS Mailing List, Trond Myklebust On Sat, Jan 22, 2022 at 05:05:49PM +0000, Chuck Lever III wrote: > >>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c > >>> index 738d564ca4ce..754f4e9ff4a2 100644 > >>> --- a/fs/nfsd/vfs.c > >>> +++ b/fs/nfsd/vfs.c > >>> @@ -1046,6 +1046,10 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp, > >>> __be32 err; > >>> > >>> trace_nfsd_read_start(rqstp, fhp, offset, *count); > >>> + > >>> + if (unlikely(offset + *count > NFS_OFFSET_MAX)) > >>> + *count = NFS_OFFSET_MAX - offset; > >> > >> Can @offset ever be larger than NFS_OFFSET_MAX? > > > > We have this check in `nfsd4_read`, `(read->rd_offset >= OFFSET_MAX)`. > > (should it have been `>` rather?). > > Don't think so, a zero-byte READ should be valid. Make sense. BTW, we have a `(argp->offset > NFS_OFFSET_MAX)` check resulting in EINVAL under `nfsd3_proc_commit`. Does it apply to writes as well? > However it's rather interesting that it does not use > NFS_OFFSET_MAX here. Does anyone know why NFSv3 uses > NFS_OFFSET_MAX but NFSv4 and NLM use OFFSET_MAX? NFS_OFFSET_MAX introduced in v2.3.31, which is before `OFFSET_MAX` was moved to a header file, which explains the comment on top of it, outdated for quite awhile: /* * This is really a general kernel constant, but since nothing like * this is defined in the kernel headers, I have to do it here. */ #define NFS_OFFSET_MAX ((__s64)((~(__u64)0) >> 1)) And `OFFSET_MAX` in linux/fs.h was introduced in v2.3.99pre4. Seems `OFFSET_MAX` always corresponds to 64-bit loff_t, so they seem inter-changeable to me. -- Dan Aloni ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] NFSD: trim reads past NFS_OFFSET_MAX 2022-01-22 19:01 ` Dan Aloni @ 2022-01-22 20:33 ` Chuck Lever III 0 siblings, 0 replies; 21+ messages in thread From: Chuck Lever III @ 2022-01-22 20:33 UTC (permalink / raw) To: Dan Aloni Cc: Anna Schumaker, trondmy, Linux NFS Mailing List, Trond Myklebust > On Jan 22, 2022, at 2:01 PM, Dan Aloni <dan.aloni@vastdata.com> wrote: > > On Sat, Jan 22, 2022 at 05:05:49PM +0000, Chuck Lever III wrote: >>>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c >>>>> index 738d564ca4ce..754f4e9ff4a2 100644 >>>>> --- a/fs/nfsd/vfs.c >>>>> +++ b/fs/nfsd/vfs.c >>>>> @@ -1046,6 +1046,10 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp, >>>>> __be32 err; >>>>> >>>>> trace_nfsd_read_start(rqstp, fhp, offset, *count); >>>>> + >>>>> + if (unlikely(offset + *count > NFS_OFFSET_MAX)) >>>>> + *count = NFS_OFFSET_MAX - offset; >>>> >>>> Can @offset ever be larger than NFS_OFFSET_MAX? >>> >>> We have this check in `nfsd4_read`, `(read->rd_offset >= OFFSET_MAX)`. >>> (should it have been `>` rather?). >> >> Don't think so, a zero-byte READ should be valid. > > Make sense. BTW, we have a `(argp->offset > NFS_OFFSET_MAX)` check > resulting in EINVAL under `nfsd3_proc_commit`. Does it apply to writes > as well? Geez, that's whole 'nother can of worms. RFC 1813 section 3.3.21 does not list NFS3ERR_INVAL, and does not discuss what to do if the commit argument values are outside the range which the server or local filesystem supports. RFC 8881 section 15.2 (Table 6) does not list NFS4ERR_INVAL as a valid status code for the COMMIT operation, and likewise section 18.3 does not discuss how the server should respond when the commit argument values are invalid. Aside from nfsd3_proc_commit, nfsd_commit() is used by NFSv3 and NFSv4, and it has: 1129 __be32 err = nfserr_inval; 1130 1131 if (offset < 0) 1132 goto out; 1133 if (count != 0) { 1134 end = offset + (loff_t)count - 1; 1135 if (end < offset) 1136 goto out; 1137 } 1138 which I think is going to be problematic. But no-one has complained, so it's safe to defer changes here to another patch, IMO. >> However it's rather interesting that it does not use >> NFS_OFFSET_MAX here. Does anyone know why NFSv3 uses >> NFS_OFFSET_MAX but NFSv4 and NLM use OFFSET_MAX? > > NFS_OFFSET_MAX introduced in v2.3.31, which is before `OFFSET_MAX` was > moved to a header file, which explains the comment on top of it, > outdated for quite awhile: > > /* > * This is really a general kernel constant, but since nothing like > * this is defined in the kernel headers, I have to do it here. > */ > #define NFS_OFFSET_MAX ((__s64)((~(__u64)0) >> 1)) > > And `OFFSET_MAX` in linux/fs.h was introduced in v2.3.99pre4. Seems > `OFFSET_MAX` always corresponds to 64-bit loff_t, so they seem > inter-changeable to me. For now, add OFFSET_MAX in the NFSv4 paths, and use NFS_OFFSET_MAX in the NFSv3 paths, and at some point someone can propose a clean up to replace NFS_OFFSET_MAX with OFFSET_MAX. -- Chuck Lever ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v2] NFSD: trim reads past NFS_OFFSET_MAX and fix NFSv3 check 2022-01-21 22:32 ` Chuck Lever III 2022-01-22 12:47 ` Dan Aloni @ 2022-01-22 12:49 ` Dan Aloni 2022-01-22 17:37 ` Chuck Lever III 1 sibling, 1 reply; 21+ messages in thread From: Dan Aloni @ 2022-01-22 12:49 UTC (permalink / raw) To: chuck.lever; +Cc: Anna Schumaker, linux-nfs Due to change in client 8cfb9015280d ("NFS: Always provide aligned buffers to the RPC read layers"), a read of 0xfff is aligned up to server rsize of 0x0fff. As a result, in a test where the server has a file of size 0x7fffffffffffffff, and the client tries to read from the offset 0x7ffffffffffff000, the read causes loff_t overflow in the server and it returns an NFS code of EINVAL to the client. The client as a result indefinitely retries the request. This fixes the issue at server side by trimming reads past NFS_OFFSET_MAX. It also adds a missing check for out of bound offset in NFSv3. Fixes: 8cfb9015280d ("NFS: Always provide aligned buffers to the RPC read layers") Signed-off-by: Dan Aloni <dan.aloni@vastdata.com> --- fs/nfsd/nfs4proc.c | 3 +++ fs/nfsd/vfs.c | 11 +++++++++++ 2 files changed, 14 insertions(+) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 486c5dba4b65..3b1e395a93b6 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -788,6 +788,9 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, trace_nfsd_read_start(rqstp, &cstate->current_fh, read->rd_offset, read->rd_length); + if (unlikely(read->rd_offset + read->rd_length > NFS_OFFSET_MAX)) + read->rd_length = NFS_OFFSET_MAX - read->rd_offset; + /* * If we do a zero copy read, then a client will see read data * that reflects the state of the file *after* performing the diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 738d564ca4ce..4a209f807466 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1046,6 +1046,16 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp, __be32 err; trace_nfsd_read_start(rqstp, fhp, offset, *count); + + if (unlikely(offset > NFS_OFFSET_MAX)) { + /* We can return this according to Section 3.3.6 */ + err = nfserr_inval; + goto error; + } + + if (unlikely(offset + *count > NFS_OFFSET_MAX)) + *count = NFS_OFFSET_MAX - offset; + err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf); if (err) return err; @@ -1058,6 +1068,7 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp, nfsd_file_put(nf); +error: trace_nfsd_read_done(rqstp, fhp, offset, *count); return err; -- 2.23.0 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v2] NFSD: trim reads past NFS_OFFSET_MAX and fix NFSv3 check 2022-01-22 12:49 ` [PATCH v2] NFSD: trim reads past NFS_OFFSET_MAX and fix NFSv3 check Dan Aloni @ 2022-01-22 17:37 ` Chuck Lever III 2022-01-23 6:29 ` Dan Aloni 2022-01-23 9:50 ` [PATCH v3] " Dan Aloni 0 siblings, 2 replies; 21+ messages in thread From: Chuck Lever III @ 2022-01-22 17:37 UTC (permalink / raw) To: Dan Aloni; +Cc: Anna Schumaker, Linux NFS Mailing List Some additional comments on v2 below. We need to sort the NFS_OFFSET_MAX v. OFFSET_MAX issue before you send a v3. > On Jan 22, 2022, at 7:49 AM, Dan Aloni <dan.aloni@vastdata.com> wrote: > > Due to change in client 8cfb9015280d ("NFS: Always provide aligned > buffers to the RPC read layers"), a read of 0xfff is aligned up to > server rsize of 0x0fff. scripts/checkpatch.pl will complain about the way you name the commit here. It will want: Due to commit 8cfb9015280d ("NFS: Always provide aligned buffers to the RPC read layers") on the client, > As a result, in a test where the server has a file of size > 0x7fffffffffffffff, and the client tries to read from the offset > 0x7ffffffffffff000, the read causes loff_t overflow in the server and it > returns an NFS code of EINVAL to the client. The client as a result > indefinitely retries the request. > > This fixes the issue at server side by trimming reads past > NFS_OFFSET_MAX. It also adds a missing check for out of bound offset > in NFSv3. RFC 1813 section 3.3.6 does say this: >> It is possible for the server to return fewer than count >> bytes of data. If the server returns less than the count >> requested and eof set to FALSE, the client should issue >> another READ to get the remaining data. A server may >> return less data than requested under several >> circumstances. The file may have been truncated by another >> client or perhaps on the server itself, changing the file >> size from what the requesting client believes to be the >> case. This would reduce the actual amount of data >> available to the client. It is possible that the server >> may back off the transfer size and reduce the read request >> return. Server resource exhaustion may also occur >> necessitating a smaller read return. Similar language in RFC 8881 section 18.22.4: >> If the server returns a "short read" (i.e., fewer data than requested >> and eof is set to FALSE), the client should send another READ to get >> the remaining data. A server may return less data than requested >> under several circumstances. The file may have been truncated by >> another client or perhaps on the server itself, changing the file >> size from what the requesting client believes to be the case. This >> would reduce the actual amount of data available to the client. It >> is possible that the server reduce the transfer size and so return a >> short read result. Server resource exhaustion may also occur in a >> short read. So the server could be returning INVAL and leaving EOF set to false. That might be triggering the client to retry the READ. Does the server need to set the EOF flag if the READ arguments cross the limit of the range that the server can return? Does the client need to check for this case and stop retrying? The specs aren't particularly clear on this matter. > Fixes: 8cfb9015280d ("NFS: Always provide aligned buffers to the RPC read layers") It's arguable that you are fixing 8cfb9015280d. I don't think that commit is actually broken. Also, the server behavior is wrong even before that commit, and that commit is a client change... Mentioning this commit at the top of the patch description is fine, since that is how you discovered the problem, but I'd prefer if you drop this line. The patch does warrant a Cc: stable, though. > Signed-off-by: Dan Aloni <dan.aloni@vastdata.com> > --- > fs/nfsd/nfs4proc.c | 3 +++ > fs/nfsd/vfs.c | 11 +++++++++++ > 2 files changed, 14 insertions(+) > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c > index 486c5dba4b65..3b1e395a93b6 100644 > --- a/fs/nfsd/nfs4proc.c > +++ b/fs/nfsd/nfs4proc.c > @@ -788,6 +788,9 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > trace_nfsd_read_start(rqstp, &cstate->current_fh, > read->rd_offset, read->rd_length); > > + if (unlikely(read->rd_offset + read->rd_length > NFS_OFFSET_MAX)) > + read->rd_length = NFS_OFFSET_MAX - read->rd_offset; > + rd_offset is range-checked before the trace point, so I think this check needs to go before the trace point as well. That way the adjusted argument values are recorded. And we need to verify whether NFS_OFFSET_MAX is the correct constant for this check. > /* > * If we do a zero copy read, then a client will see read data > * that reflects the state of the file *after* performing the > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c > index 738d564ca4ce..4a209f807466 100644 > --- a/fs/nfsd/vfs.c > +++ b/fs/nfsd/vfs.c > @@ -1046,6 +1046,16 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp, > __be32 err; > > trace_nfsd_read_start(rqstp, fhp, offset, *count); > + > + if (unlikely(offset > NFS_OFFSET_MAX)) { > + /* We can return this according to Section 3.3.6 */ RFC 1813 section 3.3.6 says that READ is permitted to return NFS3ERR_INVAL, but does not mandate that status code in this particular case (it's provided for general issues similar to this). So returning INVAL here is an implementation choice. Thus mentioning the spec here is IMO perhaps misleading, so I'd like you to drop this comment. > + err = nfserr_inval; > + goto error; > + } > + > + if (unlikely(offset + *count > NFS_OFFSET_MAX)) > + *count = NFS_OFFSET_MAX - offset; > + Same as above: these range-checks need to go before the trace point, IMO. > err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf); > if (err) > return err; > @@ -1058,6 +1068,7 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp, > > nfsd_file_put(nf); > > +error: > trace_nfsd_read_done(rqstp, fhp, offset, *count); > > return err; > -- > 2.23.0 -- Chuck Lever ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2] NFSD: trim reads past NFS_OFFSET_MAX and fix NFSv3 check 2022-01-22 17:37 ` Chuck Lever III @ 2022-01-23 6:29 ` Dan Aloni 2022-01-23 9:50 ` [PATCH v3] " Dan Aloni 1 sibling, 0 replies; 21+ messages in thread From: Dan Aloni @ 2022-01-23 6:29 UTC (permalink / raw) To: Chuck Lever III; +Cc: Anna Schumaker, Linux NFS Mailing List On Sat, Jan 22, 2022 at 05:37:16PM +0000, Chuck Lever III wrote: > Similar language in RFC 8881 section 18.22.4: > > >> If the server returns a "short read" (i.e., fewer data than requested > >> and eof is set to FALSE), the client should send another READ to get > >> the remaining data. A server may return less data than requested > >> under several circumstances. The file may have been truncated by > >> another client or perhaps on the server itself, changing the file > >> size from what the requesting client believes to be the case. This > >> would reduce the actual amount of data available to the client. It > >> is possible that the server reduce the transfer size and so return a > >> short read result. Server resource exhaustion may also occur in a > >> short read. > > So the server could be returning INVAL and leaving EOF set > to false. That might be triggering the client to retry the > READ. Does the server need to set the EOF flag if the READ > arguments cross the limit of the range that the server can > return? Does the client need to check for this case and > stop retrying? The specs aren't particularly clear on this > matter. But eof only in `resok` and not in `resfail`. For reference, here's the reply I got: Network File System, READ Reply Error: NFS3ERR_INVAL [Program Version: 3] [V3 Procedure: READ (6)] Status: NFS3ERR_INVAL (22) file_attributes Regular File mode: 0600 uid: 0 gid: 0 attributes_follow: value follows (1) attributes Regular File mode: 0600 uid: 0 gid: 0 Type: Regular File (1) Mode: 0600, S_IRUSR, S_IWUSR nlink: 1 uid: 0 gid: 0 size: 9223372036854775807 used: 4096 rdev: 0,0 fsid: 0x0000000000000002 (2) fileid: 69 atime: Jan 18, 2022 20:20:33.611267439 IST seconds: 1642530033 nano seconds: 611267439 mtime: Jan 18, 2022 20:20:33.571266608 IST seconds: 1642530033 nano seconds: 571266608 ctime: Jan 18, 2022 20:20:33.571266608 IST seconds: 1642530033 nano seconds: 571266608 -- Dan Aloni ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v3] NFSD: trim reads past NFS_OFFSET_MAX and fix NFSv3 check 2022-01-22 17:37 ` Chuck Lever III 2022-01-23 6:29 ` Dan Aloni @ 2022-01-23 9:50 ` Dan Aloni 2022-01-23 15:29 ` Trond Myklebust 1 sibling, 1 reply; 21+ messages in thread From: Dan Aloni @ 2022-01-23 9:50 UTC (permalink / raw) To: chuck.lever; +Cc: Anna Schumaker, linux-nfs, stable Due to commit 8cfb9015280d ("NFS: Always provide aligned buffers to the RPC read layers") on the client, a read of 0xfff is aligned up to server rsize of 0x1000. As a result, in a test where the server has a file of size 0x7fffffffffffffff, and the client tries to read from the offset 0x7ffffffffffff000, the read causes loff_t overflow in the server and it returns an NFS code of EINVAL to the client. The client as a result indefinitely retries the request. This fixes the issue at server side by trimming reads past NFS_OFFSET_MAX. It also adds a missing check for out of bound offset in NFSv3, copying a similar check from NFSv4.x. Cc: <stable@vger.kernel.org> Signed-off-by: Dan Aloni <dan.aloni@vastdata.com> --- fs/nfsd/nfs4proc.c | 3 +++ fs/nfsd/vfs.c | 6 ++++++ 2 files changed, 9 insertions(+) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 486c5dba4b65..816bdf212559 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -785,6 +785,9 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (read->rd_offset >= OFFSET_MAX) return nfserr_inval; + if (unlikely(read->rd_offset + read->rd_length > OFFSET_MAX)) + read->rd_length = OFFSET_MAX - read->rd_offset; + trace_nfsd_read_start(rqstp, &cstate->current_fh, read->rd_offset, read->rd_length); diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 738d564ca4ce..ad4df374433e 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1045,6 +1045,12 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file; __be32 err; + if (unlikely(offset >= NFS_OFFSET_MAX)) + return nfserr_inval; + + if (unlikely(offset + *count > NFS_OFFSET_MAX)) + *count = NFS_OFFSET_MAX - offset; + trace_nfsd_read_start(rqstp, fhp, offset, *count); err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf); if (err) -- 2.23.0 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v3] NFSD: trim reads past NFS_OFFSET_MAX and fix NFSv3 check 2022-01-23 9:50 ` [PATCH v3] " Dan Aloni @ 2022-01-23 15:29 ` Trond Myklebust 2022-01-23 17:03 ` Chuck Lever III 0 siblings, 1 reply; 21+ messages in thread From: Trond Myklebust @ 2022-01-23 15:29 UTC (permalink / raw) To: dan.aloni, chuck.lever; +Cc: linux-nfs, stable, Anna.Schumaker On Sun, 2022-01-23 at 11:50 +0200, Dan Aloni wrote: > Due to commit 8cfb9015280d ("NFS: Always provide aligned buffers to > the > RPC read layers") on the client, a read of 0xfff is aligned up to > server > rsize of 0x1000. > > As a result, in a test where the server has a file of size > 0x7fffffffffffffff, and the client tries to read from the offset > 0x7ffffffffffff000, the read causes loff_t overflow in the server and > it > returns an NFS code of EINVAL to the client. The client as a result > indefinitely retries the request. > > This fixes the issue at server side by trimming reads past > NFS_OFFSET_MAX. It also adds a missing check for out of bound offset > in > NFSv3, copying a similar check from NFSv4.x. > > Cc: <stable@vger.kernel.org> > Signed-off-by: Dan Aloni <dan.aloni@vastdata.com> > --- > fs/nfsd/nfs4proc.c | 3 +++ > fs/nfsd/vfs.c | 6 ++++++ > 2 files changed, 9 insertions(+) > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c > index 486c5dba4b65..816bdf212559 100644 > --- a/fs/nfsd/nfs4proc.c > +++ b/fs/nfsd/nfs4proc.c > @@ -785,6 +785,9 @@ nfsd4_read(struct svc_rqst *rqstp, struct > nfsd4_compound_state *cstate, > if (read->rd_offset >= OFFSET_MAX) > return nfserr_inval; > > + if (unlikely(read->rd_offset + read->rd_length > OFFSET_MAX)) > + read->rd_length = OFFSET_MAX - read->rd_offset; > + > trace_nfsd_read_start(rqstp, &cstate->current_fh, > read->rd_offset, read->rd_length); > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c > index 738d564ca4ce..ad4df374433e 100644 > --- a/fs/nfsd/vfs.c > +++ b/fs/nfsd/vfs.c > @@ -1045,6 +1045,12 @@ __be32 nfsd_read(struct svc_rqst *rqstp, > struct svc_fh *fhp, > struct file *file; > __be32 err; > > + if (unlikely(offset >= NFS_OFFSET_MAX)) > + return nfserr_inval; POSIX does allow you to lseek to the maximum filesize offset (sb- >s_maxbytes), however any subsequent read will return 0 bytes (i.e. eof), whereas a subsequent write will return EFBIG. > + > + if (unlikely(offset + *count > NFS_OFFSET_MAX)) > + *count = NFS_OFFSET_MAX - offset; Can we please fix this to use the actual per-filesystem file size limit, which is set as sb->s_maxbytes, instead of using NFS_OFFSET_MAX? Ditto for 'read' above. > + > trace_nfsd_read_start(rqstp, fhp, offset, *count); > err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf); > if (err) -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] NFSD: trim reads past NFS_OFFSET_MAX and fix NFSv3 check 2022-01-23 15:29 ` Trond Myklebust @ 2022-01-23 17:03 ` Chuck Lever III 2022-01-26 16:23 ` Chuck Lever III 0 siblings, 1 reply; 21+ messages in thread From: Chuck Lever III @ 2022-01-23 17:03 UTC (permalink / raw) To: Trond Myklebust; +Cc: dan.aloni, Linux NFS Mailing List, stable, Anna Schumaker > On Jan 23, 2022, at 10:29 AM, Trond Myklebust <trondmy@hammerspace.com> wrote: > > On Sun, 2022-01-23 at 11:50 +0200, Dan Aloni wrote: >> Due to commit 8cfb9015280d ("NFS: Always provide aligned buffers to >> the >> RPC read layers") on the client, a read of 0xfff is aligned up to >> server >> rsize of 0x1000. >> >> As a result, in a test where the server has a file of size >> 0x7fffffffffffffff, and the client tries to read from the offset >> 0x7ffffffffffff000, the read causes loff_t overflow in the server and >> it >> returns an NFS code of EINVAL to the client. The client as a result >> indefinitely retries the request. >> >> This fixes the issue at server side by trimming reads past >> NFS_OFFSET_MAX. It also adds a missing check for out of bound offset >> in >> NFSv3, copying a similar check from NFSv4.x. >> >> Cc: <stable@vger.kernel.org> >> Signed-off-by: Dan Aloni <dan.aloni@vastdata.com> >> --- >> fs/nfsd/nfs4proc.c | 3 +++ >> fs/nfsd/vfs.c | 6 ++++++ >> 2 files changed, 9 insertions(+) >> >> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c >> index 486c5dba4b65..816bdf212559 100644 >> --- a/fs/nfsd/nfs4proc.c >> +++ b/fs/nfsd/nfs4proc.c >> @@ -785,6 +785,9 @@ nfsd4_read(struct svc_rqst *rqstp, struct >> nfsd4_compound_state *cstate, >> if (read->rd_offset >= OFFSET_MAX) >> return nfserr_inval; >> >> + if (unlikely(read->rd_offset + read->rd_length > OFFSET_MAX)) >> + read->rd_length = OFFSET_MAX - read->rd_offset; >> + >> trace_nfsd_read_start(rqstp, &cstate->current_fh, >> read->rd_offset, read->rd_length); >> >> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c >> index 738d564ca4ce..ad4df374433e 100644 >> --- a/fs/nfsd/vfs.c >> +++ b/fs/nfsd/vfs.c >> @@ -1045,6 +1045,12 @@ __be32 nfsd_read(struct svc_rqst *rqstp, >> struct svc_fh *fhp, >> struct file *file; >> __be32 err; >> >> + if (unlikely(offset >= NFS_OFFSET_MAX)) >> + return nfserr_inval; > > POSIX does allow you to lseek to the maximum filesize offset (sb- >> s_maxbytes), however any subsequent read will return 0 bytes (i.e. > eof), whereas a subsequent write will return EFBIG. I'm simply trying to clarify your request. You've stated that the Linux NFS client does not handle INVAL responses, even though both RFC 1813 and 8881 permit it. So are you suggesting (here) that the Linux NFS server should not return INVAL on READs beyond the filesystem's supported maximum file size but instead return a successful 0-byte result with EOF set? >> + >> + if (unlikely(offset + *count > NFS_OFFSET_MAX)) >> + *count = NFS_OFFSET_MAX - offset; > > Can we please fix this to use the actual per-filesystem file size > limit, which is set as sb->s_maxbytes, instead of using NFS_OFFSET_MAX? > > Ditto for 'read' above. That sounds reasonable. But I wonder if there are some other places that need the same treatment. >> + >> trace_nfsd_read_start(rqstp, fhp, offset, *count); >> err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf); >> if (err) -- Chuck Lever ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3] NFSD: trim reads past NFS_OFFSET_MAX and fix NFSv3 check 2022-01-23 17:03 ` Chuck Lever III @ 2022-01-26 16:23 ` Chuck Lever III 0 siblings, 0 replies; 21+ messages in thread From: Chuck Lever III @ 2022-01-26 16:23 UTC (permalink / raw) To: dan.aloni; +Cc: Linux NFS Mailing List, Trond Myklebust, Anna Schumaker Hi Dan- Dropping stable@. No need to copy them on this discussion. Also, you don't need to actually cc: stable when you repost. Just add the tag as you did below. Automation will pick up the patch when it goes into mainline. More below. > On Jan 23, 2022, at 12:03 PM, Chuck Lever III <chuck.lever@oracle.com> wrote: > >> >> On Jan 23, 2022, at 10:29 AM, Trond Myklebust <trondmy@hammerspace.com> wrote: >> >> On Sun, 2022-01-23 at 11:50 +0200, Dan Aloni wrote: >>> Due to commit 8cfb9015280d ("NFS: Always provide aligned buffers to >>> the >>> RPC read layers") on the client, a read of 0xfff is aligned up to >>> server >>> rsize of 0x1000. >>> >>> As a result, in a test where the server has a file of size >>> 0x7fffffffffffffff, and the client tries to read from the offset >>> 0x7ffffffffffff000, the read causes loff_t overflow in the server and >>> it >>> returns an NFS code of EINVAL to the client. The client as a result >>> indefinitely retries the request. >>> >>> This fixes the issue at server side by trimming reads past >>> NFS_OFFSET_MAX. It also adds a missing check for out of bound offset >>> in >>> NFSv3, copying a similar check from NFSv4.x. >>> >>> Cc: <stable@vger.kernel.org> >>> Signed-off-by: Dan Aloni <dan.aloni@vastdata.com> >>> --- >>> fs/nfsd/nfs4proc.c | 3 +++ >>> fs/nfsd/vfs.c | 6 ++++++ >>> 2 files changed, 9 insertions(+) >>> >>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c >>> index 486c5dba4b65..816bdf212559 100644 >>> --- a/fs/nfsd/nfs4proc.c >>> +++ b/fs/nfsd/nfs4proc.c >>> @@ -785,6 +785,9 @@ nfsd4_read(struct svc_rqst *rqstp, struct >>> nfsd4_compound_state *cstate, >>> if (read->rd_offset >= OFFSET_MAX) >>> return nfserr_inval; >>> >>> + if (unlikely(read->rd_offset + read->rd_length > OFFSET_MAX)) >>> + read->rd_length = OFFSET_MAX - read->rd_offset; >>> + >>> trace_nfsd_read_start(rqstp, &cstate->current_fh, >>> read->rd_offset, read->rd_length); >>> >>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c >>> index 738d564ca4ce..ad4df374433e 100644 >>> --- a/fs/nfsd/vfs.c >>> +++ b/fs/nfsd/vfs.c >>> @@ -1045,6 +1045,12 @@ __be32 nfsd_read(struct svc_rqst *rqstp, >>> struct svc_fh *fhp, >>> struct file *file; >>> __be32 err; >>> >>> + if (unlikely(offset >= NFS_OFFSET_MAX)) >>> + return nfserr_inval; >> >> POSIX does allow you to lseek to the maximum filesize offset (sb- >>> s_maxbytes), however any subsequent read will return 0 bytes (i.e. >> eof), whereas a subsequent write will return EFBIG. > > I'm simply trying to clarify your request. > > You've stated that the Linux NFS client does not handle INVAL > responses, even though both RFC 1813 and 8881 permit it. So > are you suggesting (here) that the Linux NFS server should > not return INVAL on READs beyond the filesystem's supported > maximum file size but instead return a successful 0-byte > result with EOF set? After some thought and discussion with Solaris NFS server engineers, I think this is the best response to a READ whose range arguments go past the server's advertised maxfilesize. So can you please confirm that your final fix does: 1. The range of values that was failing before but shouldn't have, now succeeds 2. When the offset is less than maxfilesize but offset+count exceeds it, the READ should succeed but return a short result and set EOF 3. When the range is completely outside maxfilesize, the READ should succeed but return zero bytes and set EOF And I don't mind if you split the fix over two or three patches if that makes it more clear. >>> + >>> + if (unlikely(offset + *count > NFS_OFFSET_MAX)) >>> + *count = NFS_OFFSET_MAX - offset; >> >> Can we please fix this to use the actual per-filesystem file size >> limit, which is set as sb->s_maxbytes, instead of using NFS_OFFSET_MAX? >> >> Ditto for 'read' above. > > That sounds reasonable. I'm wondering whether the VFS itself will bound the range arguments relative to sb->s_maxbytes. I'm told that the kiocb iterators used in fs/nfsd/vfs.c should do that. All that NFSD has to do is ensure that the incoming client values are converted from u64 to loff_t without underflowing. So comparing @offset with OFFSET_MAX here seems like the right thing to do? We just don't want the READ to fail with INVAL. > But I wonder if there are some other > places that need the same treatment. I've confirmed that there /are/ other places that need to be fixed. I've created a series of patches that will address those. The first of those was the COMMIT patch I posted yesterday. Dan, I'd like to include your READ fixes in that series. >>> + >>> trace_nfsd_read_start(rqstp, fhp, offset, *count); >>> err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf); >>> if (err) > > -- > Chuck Lever -- Chuck Lever ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2022-01-26 16:23 UTC | newest] Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-08-27 18:00 [PATCH] NFS: Always provide aligned buffers to the RPC read layers trondmy 2022-01-18 19:26 ` Dan Aloni 2022-01-18 19:33 ` [PATCH] NFS: fix an infinite request retry in an off-by-one last page read Dan Aloni 2022-01-18 19:43 ` Trond Myklebust 2022-01-21 18:50 ` [PATCH] NFSD: trim reads past NFS_OFFSET_MAX Dan Aloni 2022-01-21 22:32 ` Chuck Lever III 2022-01-22 12:47 ` Dan Aloni 2022-01-22 17:05 ` Chuck Lever III 2022-01-22 18:27 ` Trond Myklebust 2022-01-22 20:15 ` Chuck Lever III 2022-01-22 20:30 ` Trond Myklebust 2022-01-23 17:35 ` Chuck Lever III 2022-01-22 19:01 ` Dan Aloni 2022-01-22 20:33 ` Chuck Lever III 2022-01-22 12:49 ` [PATCH v2] NFSD: trim reads past NFS_OFFSET_MAX and fix NFSv3 check Dan Aloni 2022-01-22 17:37 ` Chuck Lever III 2022-01-23 6:29 ` Dan Aloni 2022-01-23 9:50 ` [PATCH v3] " Dan Aloni 2022-01-23 15:29 ` Trond Myklebust 2022-01-23 17:03 ` Chuck Lever III 2022-01-26 16:23 ` Chuck Lever III
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.