All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC 0/6] NFSD size, offset, and count sanity
@ 2022-01-27 16:08 Chuck Lever
  2022-01-27 16:08 ` [PATCH RFC 1/6] NFSD: Fix NFSv4 SETATTR's handling of large file sizes Chuck Lever
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Chuck Lever @ 2022-01-27 16:08 UTC (permalink / raw)
  To: linux-nfs, linux-fsdevel

Dan Aloni reported a problem with the way NFSD's READ implementation
deals with the very upper end of file sizes, and I got interested in
how some of the other operations handled it. I found some issues,
and have started a (growing) pile of patches to deal with them.

Since at least the SETATTR case appears to cause a crash on some
filesystems, I think several of these are 5.17-rc fodder (i.e.,
priority bug fixes). I see that NLM also has potential problems with
how the max file size is handled, but since locking doesn't involve
the page cache, I think fixes in that area can be delayed a bit.

Dan's still working on the READ issue. I need some input on whether
I understand the problem correctly and whether the NFS status codes
I've chosen to use are going to be reasonable or a problem for NFS
clients. I've attempted to stay within the bound of the NFS specs,
but sometimes the spec doesn't provide a mechanism in the protocol
to indicate that the client passed us a bogus size/offset/count.

---

Chuck Lever (6):
      NFSD: Fix NFSv4 SETATTR's handling of large file sizes
      NFSD: Fix NFSv3 SETATTR's handling of large file sizes
      NFSD: COMMIT operations must not return NFS?ERR_INVAL
      NFSD: Replace directory offset placeholder
      NFSD: Remove NFS_OFFSET_MAX
      NFSD: Clamp WRITE offsets


 fs/nfsd/nfs3proc.c  | 32 +++++++++++++++++++++------
 fs/nfsd/nfs3xdr.c   |  4 ++--
 fs/nfsd/nfs4proc.c  |  7 +++++-
 fs/nfsd/nfs4xdr.c   |  2 +-
 fs/nfsd/vfs.c       | 53 ++++++++++++++++++++++++++++++---------------
 fs/nfsd/vfs.h       |  4 ++--
 include/linux/nfs.h |  8 -------
 7 files changed, 72 insertions(+), 38 deletions(-)

--
Chuck Lever

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH RFC 1/6] NFSD: Fix NFSv4 SETATTR's handling of large file sizes
  2022-01-27 16:08 [PATCH RFC 0/6] NFSD size, offset, and count sanity Chuck Lever
@ 2022-01-27 16:08 ` Chuck Lever
  2022-01-28  0:36   ` Dave Chinner
  2022-01-27 16:08 ` [PATCH RFC 2/6] NFSD: Fix NFSv3 " Chuck Lever
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Chuck Lever @ 2022-01-27 16:08 UTC (permalink / raw)
  To: linux-nfs, linux-fsdevel

iattr::ia_size is a loff_t. decode_fattr4() dumps a full u64 value
in there. If that value is larger than S64_MAX, then ia_size has
underflowed.

In this case the negative size is passed through to the VFS and
underlying filesystems. I've observed XFS behavior: it returns
EIO but still attempts to access past the end of the device.
IOW it assumes the caller has already sanity-checked the value.

Have our server return NFS4ERR_FBIG to the client when the passed-in
file size cannot be held in a loff_t variable.

> 15.1.4.4.  NFS4ERR_FBIG (Error Code 27)
>
>    The file is too large.  The operation would have caused the file to
>    grow beyond the server's limit.

It's likely that other NFSv4 operations that take a fattr4 argument
(such as OPEN) have a similar issue).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs4proc.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index ed1ee25647be..b8ac2b9bce74 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -972,6 +972,9 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	int err;
 
 	if (setattr->sa_iattr.ia_valid & ATTR_SIZE) {
+		if (setattr->sa_iattr.ia_size < 0)
+			return nfserr_fbig;
+
 		status = nfs4_preprocess_stateid_op(rqstp, cstate,
 				&cstate->current_fh, &setattr->sa_stateid,
 				WR_STATE, NULL, NULL);


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH RFC 2/6] NFSD: Fix NFSv3 SETATTR's handling of large file sizes
  2022-01-27 16:08 [PATCH RFC 0/6] NFSD size, offset, and count sanity Chuck Lever
  2022-01-27 16:08 ` [PATCH RFC 1/6] NFSD: Fix NFSv4 SETATTR's handling of large file sizes Chuck Lever
@ 2022-01-27 16:08 ` Chuck Lever
  2022-01-27 16:08 ` [PATCH RFC 3/6] NFSD: COMMIT operations must not return NFS?ERR_INVAL Chuck Lever
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Chuck Lever @ 2022-01-27 16:08 UTC (permalink / raw)
  To: linux-nfs, linux-fsdevel

iattr::ia_size is a loff_t, so the XDR decoders must be careful to
deal with incoming client size values that are larger than s64_max.

VFS size comparisons (like in inode_newsize_ok) should now work as
expected -- it returns -EFBIG if the new size is larger than the
underlying filesystem's s_maxbytes.

However, RFC 1813 permits only WRITE to return NFS3ERR_FBIG. Add an
extra check to prevent NFSv3 SETATTR from returning FBIG.

Other NFSv3 procedures that take sattr3 arguments need to be audited
for this issue.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs3proc.c |   19 ++++++++++++++++++-
 fs/nfsd/nfs3xdr.c  |    2 +-
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
index 8ef53f6726ec..aa0f0261ddac 100644
--- a/fs/nfsd/nfs3proc.c
+++ b/fs/nfsd/nfs3proc.c
@@ -66,13 +66,30 @@ nfsd3_proc_setattr(struct svc_rqst *rqstp)
 {
 	struct nfsd3_sattrargs *argp = rqstp->rq_argp;
 	struct nfsd3_attrstat *resp = rqstp->rq_resp;
+	struct iattr *iap = &argp->attrs;
 
 	dprintk("nfsd: SETATTR(3)  %s\n",
 				SVCFH_fmt(&argp->fh));
 
 	fh_copy(&resp->fh, &argp->fh);
-	resp->status = nfsd_setattr(rqstp, &resp->fh, &argp->attrs,
+
+	if (iap->ia_valid & ATTR_SIZE) {
+		struct super_block *sb;
+
+		resp->status = fh_verify(rqstp, &resp->fh, S_IFREG,
+			NFSD_MAY_SATTR|NFSD_MAY_WRITE|NFSD_MAY_OWNER_OVERRIDE);
+		if (resp->status != nfs_ok)
+			goto out;
+
+		resp->status = nfserr_inval;
+		sb = resp->fh.fh_dentry->d_sb;
+		if (iap->ia_size < 0 || iap->ia_size > sb->s_maxbytes)
+			goto out;
+	}
+
+	resp->status = nfsd_setattr(rqstp, &resp->fh, iap,
 				    argp->check_guard, argp->guardtime);
+out:
 	return rpc_success;
 }
 
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index 7c45ba4db61b..2e47a07029f1 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -254,7 +254,7 @@ svcxdr_decode_sattr3(struct svc_rqst *rqstp, struct xdr_stream *xdr,
 		if (xdr_stream_decode_u64(xdr, &newsize) < 0)
 			return false;
 		iap->ia_valid |= ATTR_SIZE;
-		iap->ia_size = min_t(u64, newsize, NFS_OFFSET_MAX);
+		iap->ia_size = newsize;
 	}
 	if (xdr_stream_decode_u32(xdr, &set_it) < 0)
 		return false;


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH RFC 3/6] NFSD: COMMIT operations must not return NFS?ERR_INVAL
  2022-01-27 16:08 [PATCH RFC 0/6] NFSD size, offset, and count sanity Chuck Lever
  2022-01-27 16:08 ` [PATCH RFC 1/6] NFSD: Fix NFSv4 SETATTR's handling of large file sizes Chuck Lever
  2022-01-27 16:08 ` [PATCH RFC 2/6] NFSD: Fix NFSv3 " Chuck Lever
@ 2022-01-27 16:08 ` Chuck Lever
  2022-01-27 16:08 ` [PATCH RFC 4/6] NFSD: Replace directory offset placeholder Chuck Lever
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Chuck Lever @ 2022-01-27 16:08 UTC (permalink / raw)
  To: linux-nfs, linux-fsdevel

Since, well, forever, the Linux NFS server's nfsd_commit() function
has returned nfserr_inval when the passed-in byte range arguments
were non-sensical.

However, according to RFC 1813 section 3.3.21, NFSv3 COMMIT requests
are permitted to return only the following non-zero status codes:

      NFS3ERR_IO
      NFS3ERR_STALE
      NFS3ERR_BADHANDLE
      NFS3ERR_SERVERFAULT

NFS3ERR_INVAL is not included in that list. Likewise, NFS4ERR_INVAL
is not listed in the COMMIT row of Table 6 in RFC 8881.

Instead of dropping or failing a COMMIT request in a byte range that
is not supported, turn it into a valid request by treating one or
both arguments as zero. Offset zero means start-of-file, count zero
means until-end-of-file, so we only ever extend the commit range.
NFS servers are always allowed to commit more and sooner than
requested.

The range check is no longer bounded by NFS_OFFSET_MAX, but rather
by the value that is returned in the maxfilesize field of the NFSv3
FSINFO procedure or the NFSv4 maxfilesize file attribute.

Note that this change results in a new pynfs failure:

CMT4     st_commit.testCommitOverflow                             : RUNNING
CMT4     st_commit.testCommitOverflow                             : FAILURE
           COMMIT with offset + count overflow should return
           NFS4ERR_INVAL, instead got NFS4_OK

IMO the test is not correct as written: RFC 8881 does not allow the
COMMIT operation to return NFS4ERR_INVAL.

Reported-by: Dan Aloni <dan.aloni@vastdata.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs3proc.c |    6 ------
 fs/nfsd/vfs.c      |   53 +++++++++++++++++++++++++++++++++++-----------------
 fs/nfsd/vfs.h      |    4 ++--
 3 files changed, 38 insertions(+), 25 deletions(-)

diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
index aa0f0261ddac..7bca219a8146 100644
--- a/fs/nfsd/nfs3proc.c
+++ b/fs/nfsd/nfs3proc.c
@@ -668,15 +668,9 @@ nfsd3_proc_commit(struct svc_rqst *rqstp)
 				argp->count,
 				(unsigned long long) argp->offset);
 
-	if (argp->offset > NFS_OFFSET_MAX) {
-		resp->status = nfserr_inval;
-		goto out;
-	}
-
 	fh_copy(&resp->fh, &argp->fh);
 	resp->status = nfsd_commit(rqstp, &resp->fh, argp->offset,
 				   argp->count, resp->verf);
-out:
 	return rpc_success;
 }
 
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 99c2b9dfbb10..f9f72b1ecb73 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1110,42 +1110,61 @@ nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset,
 }
 
 #ifdef CONFIG_NFSD_V3
-/*
- * Commit all pending writes to stable storage.
+/**
+ * nfsd_commit - Commit pending writes to stable storage
+ * @rqstp: RPC request being processed
+ * @fhp: NFS filehandle
+ * @offset: raw offset from beginning of file
+ * @count: raw count of bytes to sync
+ * @verf: filled in with the server's current write verifier
  *
- * Note: we only guarantee that data that lies within the range specified
- * by the 'offset' and 'count' parameters will be synced.
+ * Note: we guarantee that data that lies within the range specified
+ * by the 'offset' and 'count' parameters will be synced. The server
+ * is permitted to sync data that lies outside this range at the
+ * same time.
  *
  * Unfortunately we cannot lock the file to make sure we return full WCC
  * data to the client, as locking happens lower down in the filesystem.
+ *
+ * Return values:
+ *   An nfsstat value in network byte order.
  */
 __be32
-nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
-               loff_t offset, unsigned long count, __be32 *verf)
+nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, u64 offset,
+	    u32 count, __be32 *verf)
 {
+	u64			maxbytes;
+	loff_t			start, end;
 	struct nfsd_net		*nn;
 	struct nfsd_file	*nf;
-	loff_t			end = LLONG_MAX;
-	__be32			err = nfserr_inval;
-
-	if (offset < 0)
-		goto out;
-	if (count != 0) {
-		end = offset + (loff_t)count - 1;
-		if (end < offset)
-			goto out;
-	}
+	__be32			err;
 
 	err = nfsd_file_acquire(rqstp, fhp,
 			NFSD_MAY_WRITE|NFSD_MAY_NOT_BREAK_LEASE, &nf);
 	if (err)
 		goto out;
+
+	/*
+	 * Convert the client-provided (offset, count) range to a
+	 * (start, end) range. If the client-provided range falls
+	 * outside the maximum file size of the underlying FS,
+	 * clamp the sync range appropriately.
+	 */
+	start = 0;
+	end = LLONG_MAX;
+	maxbytes = (u64)fhp->fh_dentry->d_sb->s_maxbytes;
+	if (offset < maxbytes) {
+		start = offset;
+		if (count && (offset + count - 1 < maxbytes))
+			end = offset + count - 1;
+	}
+
 	nn = net_generic(nf->nf_net, nfsd_net_id);
 	if (EX_ISSYNC(fhp->fh_export)) {
 		errseq_t since = READ_ONCE(nf->nf_file->f_wb_err);
 		int err2;
 
-		err2 = vfs_fsync_range(nf->nf_file, offset, end, 0);
+		err2 = vfs_fsync_range(nf->nf_file, start, end, 0);
 		switch (err2) {
 		case 0:
 			nfsd_copy_write_verifier(verf, nn);
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index 9f56dcb22ff7..2c43d10e3cab 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -74,8 +74,8 @@ __be32		do_nfsd_create(struct svc_rqst *, struct svc_fh *,
 				char *name, int len, struct iattr *attrs,
 				struct svc_fh *res, int createmode,
 				u32 *verifier, bool *truncp, bool *created);
-__be32		nfsd_commit(struct svc_rqst *, struct svc_fh *,
-				loff_t, unsigned long, __be32 *verf);
+__be32		nfsd_commit(struct svc_rqst *rqst, struct svc_fh *fhp,
+				u64 offset, u32 count, __be32 *verf);
 #endif /* CONFIG_NFSD_V3 */
 #ifdef CONFIG_NFSD_V4
 __be32		nfsd_getxattr(struct svc_rqst *rqstp, struct svc_fh *fhp,


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH RFC 4/6] NFSD: Replace directory offset placeholder
  2022-01-27 16:08 [PATCH RFC 0/6] NFSD size, offset, and count sanity Chuck Lever
                   ` (2 preceding siblings ...)
  2022-01-27 16:08 ` [PATCH RFC 3/6] NFSD: COMMIT operations must not return NFS?ERR_INVAL Chuck Lever
@ 2022-01-27 16:08 ` Chuck Lever
  2022-01-27 16:08 ` [PATCH RFC 5/6] NFSD: Remove NFS_OFFSET_MAX Chuck Lever
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Chuck Lever @ 2022-01-27 16:08 UTC (permalink / raw)
  To: linux-nfs, linux-fsdevel

I'm about to remove NFS_OFFSET_MAX, so use a different symbolic
constant to mark the place of the offset field in directory
entries. OFFSET_MAX has the same value as NFS_OFFSET_MAX.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs3xdr.c |    2 +-
 fs/nfsd/nfs4xdr.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index 2e47a07029f1..0293b8d65f10 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -1060,7 +1060,7 @@ svcxdr_encode_entry3_common(struct nfsd3_readdirres *resp, const char *name,
 		return false;
 	/* cookie */
 	resp->cookie_offset = dirlist->len;
-	if (xdr_stream_encode_u64(xdr, NFS_OFFSET_MAX) < 0)
+	if (xdr_stream_encode_u64(xdr, OFFSET_MAX) < 0)
 		return false;
 
 	return true;
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 7d2217cdaeaa..64d73b750491 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3505,7 +3505,7 @@ nfsd4_encode_dirent(void *ccdv, const char *name, int namlen,
 	p = xdr_reserve_space(xdr, 3*4 + namlen);
 	if (!p)
 		goto fail;
-	p = xdr_encode_hyper(p, NFS_OFFSET_MAX);    /* offset of next entry */
+	p = xdr_encode_hyper(p, OFFSET_MAX);        /* offset of next entry */
 	p = xdr_encode_array(p, name, namlen);      /* name length & name */
 
 	nfserr = nfsd4_encode_dirent_fattr(xdr, cd, name, namlen);


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH RFC 5/6] NFSD: Remove NFS_OFFSET_MAX
  2022-01-27 16:08 [PATCH RFC 0/6] NFSD size, offset, and count sanity Chuck Lever
                   ` (3 preceding siblings ...)
  2022-01-27 16:08 ` [PATCH RFC 4/6] NFSD: Replace directory offset placeholder Chuck Lever
@ 2022-01-27 16:08 ` Chuck Lever
  2022-01-27 16:09 ` [PATCH RFC 6/6] NFSD: Clamp WRITE offsets Chuck Lever
  2022-01-27 18:13 ` [PATCH RFC 0/6] NFSD size, offset, and count sanity Frank Filz
  6 siblings, 0 replies; 11+ messages in thread
From: Chuck Lever @ 2022-01-27 16:08 UTC (permalink / raw)
  To: linux-nfs, linux-fsdevel

This constant was introduced way back in Linux v2.3.y before there
was a kernel-wide OFFSET_MAX value. These days we prefer to check
against the limits in the underlying filesystem instead.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/nfs.h |    8 --------
 1 file changed, 8 deletions(-)

diff --git a/include/linux/nfs.h b/include/linux/nfs.h
index 0dc7ad38a0da..b06375e88e58 100644
--- a/include/linux/nfs.h
+++ b/include/linux/nfs.h
@@ -36,14 +36,6 @@ static inline void nfs_copy_fh(struct nfs_fh *target, const struct nfs_fh *sourc
 	memcpy(target->data, source->data, source->size);
 }
 
-
-/*
- * This is really a general kernel constant, but since nothing like
- * this is defined in the kernel headers, I have to do it here.
- */
-#define NFS_OFFSET_MAX		((__s64)((~(__u64)0) >> 1))
-
-
 enum nfs3_stable_how {
 	NFS_UNSTABLE = 0,
 	NFS_DATA_SYNC = 1,


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH RFC 6/6] NFSD: Clamp WRITE offsets
  2022-01-27 16:08 [PATCH RFC 0/6] NFSD size, offset, and count sanity Chuck Lever
                   ` (4 preceding siblings ...)
  2022-01-27 16:08 ` [PATCH RFC 5/6] NFSD: Remove NFS_OFFSET_MAX Chuck Lever
@ 2022-01-27 16:09 ` Chuck Lever
  2022-01-27 18:13 ` [PATCH RFC 0/6] NFSD size, offset, and count sanity Frank Filz
  6 siblings, 0 replies; 11+ messages in thread
From: Chuck Lever @ 2022-01-27 16:09 UTC (permalink / raw)
  To: linux-nfs, linux-fsdevel

Ensure that a client cannot specify a WRITE range that falls in a
byte range outside what the kernel's internal types (such as loff_t,
which is signed) can represent. The kiocb iterators, invoked in
nfsd_vfs_write(), should properly limit write operations to within
the underlying file system's s_maxbytes.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs3proc.c |    7 +++++++
 fs/nfsd/nfs4proc.c |    4 +++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
index 7bca219a8146..ba72bbff53a5 100644
--- a/fs/nfsd/nfs3proc.c
+++ b/fs/nfsd/nfs3proc.c
@@ -216,6 +216,12 @@ nfsd3_proc_write(struct svc_rqst *rqstp)
 				(unsigned long long) argp->offset,
 				argp->stable? " stable" : "");
 
+	resp->status = nfserr_fbig;
+	if (argp->offset >= OFFSET_MAX)
+		goto out;
+	if (argp->offset + argp->len >= OFFSET_MAX)
+		goto out;
+
 	fh_copy(&resp->fh, &argp->fh);
 	resp->committed = argp->stable;
 	nvecs = svc_fill_write_vector(rqstp, &argp->payload);
@@ -224,6 +230,7 @@ nfsd3_proc_write(struct svc_rqst *rqstp)
 				  rqstp->rq_vec, nvecs, &cnt,
 				  resp->committed, resp->verf);
 	resp->count = cnt;
+out:
 	return rpc_success;
 }
 
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index b8ac2b9bce74..2baf547b344f 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1022,7 +1022,9 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	int nvecs;
 
 	if (write->wr_offset >= OFFSET_MAX)
-		return nfserr_inval;
+		return nfserr_fbig;
+	if (write->wr_offset + write->wr_buflen >= OFFSET_MAX)
+		return nfserr_fbig;
 
 	cnt = write->wr_buflen;
 	trace_nfsd_write_start(rqstp, &cstate->current_fh,


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* RE: [PATCH RFC 0/6] NFSD size, offset, and count sanity
  2022-01-27 16:08 [PATCH RFC 0/6] NFSD size, offset, and count sanity Chuck Lever
                   ` (5 preceding siblings ...)
  2022-01-27 16:09 ` [PATCH RFC 6/6] NFSD: Clamp WRITE offsets Chuck Lever
@ 2022-01-27 18:13 ` Frank Filz
  6 siblings, 0 replies; 11+ messages in thread
From: Frank Filz @ 2022-01-27 18:13 UTC (permalink / raw)
  To: 'Chuck Lever', linux-nfs, linux-fsdevel

Ooh, lots to consider for Ganesha...

And I see you are proposing a new pynfs test case, which I will have to remember the thought behind it and make sure that's covered for NFS v3 when I get back to the pynfs 3 tests.

Frank

> -----Original Message-----
> From: Chuck Lever [mailto:chuck.lever@oracle.com]
> Sent: Thursday, January 27, 2022 8:08 AM
> To: linux-nfs@vger.kernel.org; linux-fsdevel@vger.kernel.org
> Subject: [PATCH RFC 0/6] NFSD size, offset, and count sanity
> 
> Dan Aloni reported a problem with the way NFSD's READ implementation deals
> with the very upper end of file sizes, and I got interested in how some of the
> other operations handled it. I found some issues, and have started a (growing)
> pile of patches to deal with them.
> 
> Since at least the SETATTR case appears to cause a crash on some filesystems, I
> think several of these are 5.17-rc fodder (i.e., priority bug fixes). I see that NLM
> also has potential problems with how the max file size is handled, but since
> locking doesn't involve the page cache, I think fixes in that area can be delayed a
> bit.
> 
> Dan's still working on the READ issue. I need some input on whether I
> understand the problem correctly and whether the NFS status codes I've chosen
> to use are going to be reasonable or a problem for NFS clients. I've attempted to
> stay within the bound of the NFS specs, but sometimes the spec doesn't provide
> a mechanism in the protocol to indicate that the client passed us a bogus
> size/offset/count.
> 
> ---
> 
> Chuck Lever (6):
>       NFSD: Fix NFSv4 SETATTR's handling of large file sizes
>       NFSD: Fix NFSv3 SETATTR's handling of large file sizes
>       NFSD: COMMIT operations must not return NFS?ERR_INVAL
>       NFSD: Replace directory offset placeholder
>       NFSD: Remove NFS_OFFSET_MAX
>       NFSD: Clamp WRITE offsets
> 
> 
>  fs/nfsd/nfs3proc.c  | 32 +++++++++++++++++++++------
>  fs/nfsd/nfs3xdr.c   |  4 ++--
>  fs/nfsd/nfs4proc.c  |  7 +++++-
>  fs/nfsd/nfs4xdr.c   |  2 +-
>  fs/nfsd/vfs.c       | 53 ++++++++++++++++++++++++++++++---------------
>  fs/nfsd/vfs.h       |  4 ++--
>  include/linux/nfs.h |  8 -------
>  7 files changed, 72 insertions(+), 38 deletions(-)
> 
> --
> Chuck Lever


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH RFC 1/6] NFSD: Fix NFSv4 SETATTR's handling of large file sizes
  2022-01-27 16:08 ` [PATCH RFC 1/6] NFSD: Fix NFSv4 SETATTR's handling of large file sizes Chuck Lever
@ 2022-01-28  0:36   ` Dave Chinner
  2022-01-28  1:48     ` Chuck Lever III
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2022-01-28  0:36 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs, linux-fsdevel

On Thu, Jan 27, 2022 at 11:08:31AM -0500, Chuck Lever wrote:
> iattr::ia_size is a loff_t. decode_fattr4() dumps a full u64 value
> in there. If that value is larger than S64_MAX, then ia_size has
> underflowed.
> 
> In this case the negative size is passed through to the VFS and
> underlying filesystems. I've observed XFS behavior: it returns
> EIO but still attempts to access past the end of the device.

What attempts to access beyond the end of the device? A file offset
is not a disk offset, and the filesystem cannot allocate blocks for
IO that are outside the device boundaries. So I don't understand how
setting an inode size of >LLONGMAX can cause the filesystem to
access blocks outside the range it can allocate and map IO to. If
this falls through to trying to access data outside the range the
filesystem is allowed to access then we've got a bug that needs to
be fixed.

Can you please clarify the behaviour that is occurring here (stack
traces demonstrating the IO path that leads to access past the end
of device would be useful) so we can look into this further?

> IOW it assumes the caller has already sanity-checked the value.

Every filesystem assumes that the iattr that is passed to ->setattr
by notify_change() has been sanity checked and the parameters are
within the valid VFS supported ranges, not just XFS. Perhaps this
check should be in notify_change, not in the callers?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH RFC 1/6] NFSD: Fix NFSv4 SETATTR's handling of large file sizes
  2022-01-28  0:36   ` Dave Chinner
@ 2022-01-28  1:48     ` Chuck Lever III
  2022-02-09 21:55       ` Dave Chinner
  0 siblings, 1 reply; 11+ messages in thread
From: Chuck Lever III @ 2022-01-28  1:48 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Linux NFS Mailing List, linux-fsdevel



> On Jan 27, 2022, at 7:36 PM, Dave Chinner <david@fromorbit.com> wrote:
> 
> On Thu, Jan 27, 2022 at 11:08:31AM -0500, Chuck Lever wrote:
>> iattr::ia_size is a loff_t. decode_fattr4() dumps a full u64 value
>> in there. If that value is larger than S64_MAX, then ia_size has
>> underflowed.
>> 
>> In this case the negative size is passed through to the VFS and
>> underlying filesystems. I've observed XFS behavior: it returns
>> EIO but still attempts to access past the end of the device.
> 
> What attempts to access beyond the end of the device? A file offset
> is not a disk offset, and the filesystem cannot allocate blocks for
> IO that are outside the device boundaries. So I don't understand how
> setting an inode size of >LLONGMAX can cause the filesystem to
> access blocks outside the range it can allocate and map IO to. If
> this falls through to trying to access data outside the range the
> filesystem is allowed to access then we've got a bug that needs to
> be fixed.
> 
> Can you please clarify the behaviour that is occurring here (stack
> traces demonstrating the IO path that leads to access past the end
> of device would be useful) so we can look into this further?

Reproducer:

I constructed a pynfs test that sends an NFSv4.0 SETATTR request
to set the file length to U64_MAX. That test was applied to pynfs
today.

git://git.linux-nfs.org/projects/bfields/pynfs.git

I will note that I tried this test against a tmpfs export as
well. No crash, but a subsequent GETATTR returned U64_MAX,
which is surprising. There's really no checking in that path
either.


Note below: md0 is the device where this filesystem resides.
It's a pair of 3TB PCIe NVMe cards striped together. Kernel at
the time on this system was 5.17-rc1 + a few NFSD patches.

Jan 26 16:01:26 klimt.1015granger.net rpc.mountd[972]: v4.0 client attached: 0x61bb6d4261eef9f6 from "192.168.1.67:53636"
Jan 26 16:01:26 klimt.1015granger.net kernel: ------------[ cut here ]------------
Jan 26 16:01:26 klimt.1015granger.net kernel: WARNING: CPU: 2 PID: 1009 at fs/iomap/iter.c:33 iomap_iter+0x1b5/0x272
Jan 26 16:01:26 klimt.1015granger.net kernel: Modules linked in: rfkill rpcrdma rdma_ucm ib_srpt ib_umad ib_isert ib_ipoib iscsi_target_mod ib_>
Jan 26 16:01:26 klimt.1015granger.net kernel: CPU: 2 PID: 1009 Comm: nfsd Not tainted 5.17.0-rc1-00165-g2785fad9b745 #3338
Jan 26 16:01:26 klimt.1015granger.net kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 3.3 10/28/2020
Jan 26 16:01:26 klimt.1015granger.net kernel: RIP: 0010:iomap_iter+0x1b5/0x272
Jan 26 16:01:26 klimt.1015granger.net kernel: Code: 8b 73 08 49 8b 04 24 4d 89 e9 4d 89 f0 48 8b 3b e8 c0 79 8e 00 85 c0 0f 88 c1 00 00 00 48 8>
Jan 26 16:01:26 klimt.1015granger.net kernel: RSP: 0018:ffffa65701ea3a80 EFLAGS: 00010213
Jan 26 16:01:26 klimt.1015granger.net kernel: RAX: 0000000000000000 RBX: ffffa65701ea3ad0 RCX: ffff8ada8cf0f840
Jan 26 16:01:26 klimt.1015granger.net kernel: RDX: ffffffffffffffff RSI: ffff8ada49411000 RDI: ffff8ada8cf0f840
Jan 26 16:01:26 klimt.1015granger.net kernel: RBP: ffffa65701ea3aa0 R08: ffff8ada4a56b000 R09: ffffa65701ea3b40
Jan 26 16:01:26 klimt.1015granger.net kernel: R10: ffffa65701ea39a0 R11: 000000000efc25f5 R12: ffffffffc06d6100
Jan 26 16:01:26 klimt.1015granger.net kernel: R13: ffffa65701ea3b40 R14: ffffa65701ea3af8 R15: ffff8ada4a56b000
Jan 26 16:01:26 klimt.1015granger.net kernel: FS:  0000000000000000(0000) GS:ffff8ae97fd00000(0000) knlGS:0000000000000000
Jan 26 16:01:26 klimt.1015granger.net kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 26 16:01:26 klimt.1015granger.net kernel: CR2: 00007ffff8126260 CR3: 00000001a16dc001 CR4: 00000000001706e0
Jan 26 16:01:26 klimt.1015granger.net kernel: Call Trace:
Jan 26 16:01:26 klimt.1015granger.net kernel:  <TASK>
Jan 26 16:01:26 klimt.1015granger.net kernel:  iomap_zero_range+0x6c/0x1a9
Jan 26 16:01:26 klimt.1015granger.net kernel:  ? __radix_tree_lookup+0x2f/0xac
Jan 26 16:01:26 klimt.1015granger.net kernel:  iomap_truncate_page+0x31/0x36
Jan 26 16:01:26 klimt.1015granger.net kernel:  xfs_truncate_page+0x39/0x3b [xfs]
Jan 26 16:01:26 klimt.1015granger.net kernel:  xfs_setattr_size+0x11a/0x306 [xfs]
Jan 26 16:01:26 klimt.1015granger.net kernel:  xfs_vn_setattr_size+0x4e/0x57 [xfs]
Jan 26 16:01:26 klimt.1015granger.net kernel:  xfs_vn_setattr+0x67/0xb1 [xfs]
Jan 26 16:01:26 klimt.1015granger.net kernel:  notify_change+0x2ac/0x3a2
Jan 26 16:01:26 klimt.1015granger.net kernel:  nfsd_setattr+0x200/0x268 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  ? nfsd_setattr+0x200/0x268 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  nfsd4_setattr+0xf1/0x130 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  nfsd4_proc_compound+0x337/0x474 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  nfsd_dispatch+0x1a9/0x260 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  svc_process_common+0x331/0x4bc [sunrpc]
Jan 26 16:01:26 klimt.1015granger.net kernel:  ? nfsd_svc+0x2f5/0x2f5 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  svc_process+0xc8/0xe7 [sunrpc]
Jan 26 16:01:26 klimt.1015granger.net kernel:  nfsd+0xdd/0x160 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  kthread+0xf7/0xff
Jan 26 16:01:26 klimt.1015granger.net kernel:  ? nfsd_shutdown_threads+0x65/0x65 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  ? kthread_complete_and_exit+0x20/0x20
Jan 26 16:01:26 klimt.1015granger.net kernel:  ret_from_fork+0x22/0x30
Jan 26 16:01:26 klimt.1015granger.net kernel:  </TASK>
Jan 26 16:01:26 klimt.1015granger.net kernel: ---[ end trace 0000000000000000 ]---
Jan 26 16:01:26 klimt.1015granger.net kernel: ------------[ cut here ]------------
Jan 26 16:01:26 klimt.1015granger.net kernel: WARNING: CPU: 2 PID: 1009 at fs/iomap/iter.c:35 iomap_iter+0x1ca/0x272
Jan 26 16:01:26 klimt.1015granger.net kernel: Modules linked in: rfkill rpcrdma rdma_ucm ib_srpt ib_umad ib_isert ib_ipoib iscsi_target_mod ib_>
Jan 26 16:01:26 klimt.1015granger.net kernel: CPU: 2 PID: 1009 Comm: nfsd Tainted: G        W         5.17.0-rc1-00165-g2785fad9b745 #3338
Jan 26 16:01:26 klimt.1015granger.net kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 3.3 10/28/2020
Jan 26 16:01:26 klimt.1015granger.net kernel: RIP: 0010:iomap_iter+0x1ca/0x272
Jan 26 16:01:26 klimt.1015granger.net kernel: Code: 85 c0 0f 88 c1 00 00 00 48 8b 43 30 48 8b 53 08 48 39 d0 7e 02 0f 0b 48 8b 4b 38 48 85 c9 7>
Jan 26 16:01:26 klimt.1015granger.net kernel: RSP: 0018:ffffa65701ea3a80 EFLAGS: 00010293
Jan 26 16:01:26 klimt.1015granger.net kernel: RAX: f8ada41a253c0000 RBX: ffffa65701ea3ad0 RCX: f8ada41a253c0000
Jan 26 16:01:26 klimt.1015granger.net kernel: RDX: ffffffffffffffff RSI: ffff8ada49411000 RDI: ffff8ada8cf0f840
Jan 26 16:01:26 klimt.1015granger.net kernel: RBP: ffffa65701ea3aa0 R08: ffff8ada4a56b000 R09: ffffa65701ea3b40
Jan 26 16:01:26 klimt.1015granger.net kernel: R10: ffffa65701ea39a0 R11: 000000000efc25f5 R12: ffffffffc06d6100
Jan 26 16:01:26 klimt.1015granger.net kernel: R13: ffffa65701ea3b40 R14: ffffa65701ea3af8 R15: ffff8ada4a56b000
Jan 26 16:01:26 klimt.1015granger.net kernel: FS:  0000000000000000(0000) GS:ffff8ae97fd00000(0000) knlGS:0000000000000000
Jan 26 16:01:26 klimt.1015granger.net kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 26 16:01:26 klimt.1015granger.net kernel: CR2: 00007ffff8126260 CR3: 00000001a16dc001 CR4: 00000000001706e0
Jan 26 16:01:26 klimt.1015granger.net kernel: Call Trace:
Jan 26 16:01:26 klimt.1015granger.net kernel:  <TASK>
Jan 26 16:01:26 klimt.1015granger.net kernel:  iomap_zero_range+0x6c/0x1a9
Jan 26 16:01:26 klimt.1015granger.net kernel:  ? __radix_tree_lookup+0x2f/0xac
Jan 26 16:01:26 klimt.1015granger.net kernel:  iomap_truncate_page+0x31/0x36
Jan 26 16:01:26 klimt.1015granger.net kernel:  xfs_truncate_page+0x39/0x3b [xfs]
Jan 26 16:01:26 klimt.1015granger.net kernel:  xfs_setattr_size+0x11a/0x306 [xfs]
Jan 26 16:01:26 klimt.1015granger.net kernel:  xfs_vn_setattr_size+0x4e/0x57 [xfs]
Jan 26 16:01:26 klimt.1015granger.net kernel:  xfs_vn_setattr+0x67/0xb1 [xfs]
Jan 26 16:01:26 klimt.1015granger.net kernel:  notify_change+0x2ac/0x3a2
Jan 26 16:01:26 klimt.1015granger.net kernel:  nfsd_setattr+0x200/0x268 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  ? nfsd_setattr+0x200/0x268 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  nfsd4_setattr+0xf1/0x130 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  nfsd4_proc_compound+0x337/0x474 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  nfsd_dispatch+0x1a9/0x260 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  svc_process_common+0x331/0x4bc [sunrpc]
Jan 26 16:01:26 klimt.1015granger.net kernel:  ? nfsd_svc+0x2f5/0x2f5 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  svc_process+0xc8/0xe7 [sunrpc]
Jan 26 16:01:26 klimt.1015granger.net kernel:  nfsd+0xdd/0x160 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  kthread+0xf7/0xff
Jan 26 16:01:26 klimt.1015granger.net kernel:  ? nfsd_shutdown_threads+0x65/0x65 [nfsd]
Jan 26 16:01:26 klimt.1015granger.net kernel:  ? kthread_complete_and_exit+0x20/0x20
Jan 26 16:01:26 klimt.1015granger.net kernel:  ret_from_fork+0x22/0x30
Jan 26 16:01:26 klimt.1015granger.net kernel:  </TASK>
Jan 26 16:01:26 klimt.1015granger.net kernel: ---[ end trace 0000000000000000 ]---
Jan 26 16:01:26 klimt.1015granger.net kernel: nfsd: attempt to access beyond end of device
                                              md0: rw=2048, want=19907765165852672, limit=12501942272


>> IOW it assumes the caller has already sanity-checked the value.
> 
> Every filesystem assumes that the iattr that is passed to ->setattr
> by notify_change() has been sanity checked and the parameters are
> within the valid VFS supported ranges, not just XFS. Perhaps this
> check should be in notify_change, not in the callers?

My (limited) understanding of the VFS code is that functions at
the notify_change() level expect that its callers will have
already sanitized the input -- those callers are largely the
system call routines. That's why I chose to address this in NFSD.

Maybe inode_newsize_ok() needs to check for negative size values?


--
Chuck Lever




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH RFC 1/6] NFSD: Fix NFSv4 SETATTR's handling of large file sizes
  2022-01-28  1:48     ` Chuck Lever III
@ 2022-02-09 21:55       ` Dave Chinner
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2022-02-09 21:55 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Linux NFS Mailing List, linux-fsdevel

On Fri, Jan 28, 2022 at 01:48:48AM +0000, Chuck Lever III wrote:
> 
> 
> > On Jan 27, 2022, at 7:36 PM, Dave Chinner <david@fromorbit.com> wrote:
> > 
> > On Thu, Jan 27, 2022 at 11:08:31AM -0500, Chuck Lever wrote:
> >> IOW it assumes the caller has already sanity-checked the value.
> > 
> > Every filesystem assumes that the iattr that is passed to ->setattr
> > by notify_change() has been sanity checked and the parameters are
> > within the valid VFS supported ranges, not just XFS. Perhaps this
> > check should be in notify_change, not in the callers?
> 
> My (limited) understanding of the VFS code is that functions at
> the notify_change() level expect that its callers will have
> already sanitized the input -- those callers are largely the
> system call routines. That's why I chose to address this in NFSD.
> 
> Maybe inode_newsize_ok() needs to check for negative size values?

Yeah, that would seem reasonable - the size passed to it is a
loff_t, and it's not checked for overflows/negative values. So if it
checked for offset < 0 if would catch this....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-02-09 22:49 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-27 16:08 [PATCH RFC 0/6] NFSD size, offset, and count sanity Chuck Lever
2022-01-27 16:08 ` [PATCH RFC 1/6] NFSD: Fix NFSv4 SETATTR's handling of large file sizes Chuck Lever
2022-01-28  0:36   ` Dave Chinner
2022-01-28  1:48     ` Chuck Lever III
2022-02-09 21:55       ` Dave Chinner
2022-01-27 16:08 ` [PATCH RFC 2/6] NFSD: Fix NFSv3 " Chuck Lever
2022-01-27 16:08 ` [PATCH RFC 3/6] NFSD: COMMIT operations must not return NFS?ERR_INVAL Chuck Lever
2022-01-27 16:08 ` [PATCH RFC 4/6] NFSD: Replace directory offset placeholder Chuck Lever
2022-01-27 16:08 ` [PATCH RFC 5/6] NFSD: Remove NFS_OFFSET_MAX Chuck Lever
2022-01-27 16:09 ` [PATCH RFC 6/6] NFSD: Clamp WRITE offsets Chuck Lever
2022-01-27 18:13 ` [PATCH RFC 0/6] NFSD size, offset, and count sanity Frank Filz

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.