Re: [PATCH v3] NFSD: trim reads past NFS_OFFSET_MAX and fix NFSv3 check

From: Chuck Lever III <chuck.lever@oracle.com>
To: "dan.aloni@vastdata.com" <dan.aloni@vastdata.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	Trond Myklebust <trondmy@hammerspace.com>,
	Anna Schumaker <Anna.Schumaker@netapp.com>
Subject: Re: [PATCH v3] NFSD: trim reads past NFS_OFFSET_MAX and fix NFSv3 check
Date: Wed, 26 Jan 2022 16:23:02 +0000	[thread overview]
Message-ID: <F94FB612-02EF-4524-97A7-C4DCD1E1902E@oracle.com> (raw)
In-Reply-To: <A1AF622A-6794-4104-8A89-EA5CD9E19D7D@oracle.com>

Hi Dan-

Dropping stable@. No need to copy them on this discussion.

Also, you don't need to actually cc: stable when you repost.
Just add the tag as you did below. Automation will pick up
the patch when it goes into mainline.

More below.

> On Jan 23, 2022, at 12:03 PM, Chuck Lever III <chuck.lever@oracle.com> wrote:
> 
>> 
>> On Jan 23, 2022, at 10:29 AM, Trond Myklebust <trondmy@hammerspace.com> wrote:
>> 
>> On Sun, 2022-01-23 at 11:50 +0200, Dan Aloni wrote:
>>> Due to commit 8cfb9015280d ("NFS: Always provide aligned buffers to
>>> the
>>> RPC read layers") on the client, a read of 0xfff is aligned up to
>>> server
>>> rsize of 0x1000.
>>> 
>>> As a result, in a test where the server has a file of size
>>> 0x7fffffffffffffff, and the client tries to read from the offset
>>> 0x7ffffffffffff000, the read causes loff_t overflow in the server and
>>> it
>>> returns an NFS code of EINVAL to the client. The client as a result
>>> indefinitely retries the request.
>>> 
>>> This fixes the issue at server side by trimming reads past
>>> NFS_OFFSET_MAX. It also adds a missing check for out of bound offset
>>> in
>>> NFSv3, copying a similar check from NFSv4.x.
>>> 
>>> Cc: <stable@vger.kernel.org>
>>> Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
>>> ---
>>> fs/nfsd/nfs4proc.c | 3 +++
>>> fs/nfsd/vfs.c      | 6 ++++++
>>> 2 files changed, 9 insertions(+)
>>> 
>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>> index 486c5dba4b65..816bdf212559 100644
>>> --- a/fs/nfsd/nfs4proc.c
>>> +++ b/fs/nfsd/nfs4proc.c
>>> @@ -785,6 +785,9 @@ nfsd4_read(struct svc_rqst *rqstp, struct
>>> nfsd4_compound_state *cstate,
>>>        if (read->rd_offset >= OFFSET_MAX)
>>>                return nfserr_inval;
>>> 
>>> +       if (unlikely(read->rd_offset + read->rd_length > OFFSET_MAX))
>>> +               read->rd_length = OFFSET_MAX - read->rd_offset;
>>> +
>>>        trace_nfsd_read_start(rqstp, &cstate->current_fh,
>>>                              read->rd_offset, read->rd_length);
>>> 
>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>> index 738d564ca4ce..ad4df374433e 100644
>>> --- a/fs/nfsd/vfs.c
>>> +++ b/fs/nfsd/vfs.c
>>> @@ -1045,6 +1045,12 @@ __be32 nfsd_read(struct svc_rqst *rqstp,
>>> struct svc_fh *fhp,
>>>        struct file *file;
>>>        __be32 err;
>>> 
>>> +       if (unlikely(offset >= NFS_OFFSET_MAX))
>>> +               return nfserr_inval;
>> 
>> POSIX does allow you to lseek to the maximum filesize offset (sb-
>>> s_maxbytes), however any subsequent read will return 0 bytes (i.e.
>> eof), whereas a subsequent write will return EFBIG.
> 
> I'm simply trying to clarify your request.
> 
> You've stated that the Linux NFS client does not handle INVAL
> responses, even though both RFC 1813 and 8881 permit it. So
> are you suggesting (here) that the Linux NFS server should
> not return INVAL on READs beyond the filesystem's supported
> maximum file size but instead return a successful 0-byte
> result with EOF set?

After some thought and discussion with Solaris NFS server
engineers, I think this is the best response to a READ
whose range arguments go past the server's advertised
maxfilesize.

So can you please confirm that your final fix does:

1. The range of values that was failing before but shouldn't
   have, now succeeds

2. When the offset is less than maxfilesize but offset+count
   exceeds it, the READ should succeed but return a short
   result and set EOF

3. When the range is completely outside maxfilesize, the
   READ should succeed but return zero bytes and set EOF

And I don't mind if you split the fix over two or three
patches if that makes it more clear.

>>> +
>>> +       if (unlikely(offset + *count > NFS_OFFSET_MAX))
>>> +               *count = NFS_OFFSET_MAX - offset;
>> 
>> Can we please fix this to use the actual per-filesystem file size
>> limit, which is set as sb->s_maxbytes, instead of using NFS_OFFSET_MAX?
>> 
>> Ditto for 'read' above.
> 
> That sounds reasonable.

I'm wondering whether the VFS itself will bound the range
arguments relative to sb->s_maxbytes. I'm told that the
kiocb iterators used in fs/nfsd/vfs.c should do that.

All that NFSD has to do is ensure that the incoming
client values are converted from u64 to loff_t without
underflowing. So comparing @offset with OFFSET_MAX here
seems like the right thing to do?

We just don't want the READ to fail with INVAL.

> But I wonder if there are some other
> places that need the same treatment.

I've confirmed that there /are/ other places that need to
be fixed. I've created a series of patches that will address
those. The first of those was the COMMIT patch I posted
yesterday.

Dan, I'd like to include your READ fixes in that series.

>>> +
>>>        trace_nfsd_read_start(rqstp, fhp, offset, *count);
>>>        err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
>>>        if (err)
> 
> --
> Chuck Lever

--
Chuck Lever