linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC 0/8] Make NFSv4 OPEN(CREATE) less brittle
@ 2022-04-20 18:28 Chuck Lever
  2022-04-20 18:28 ` [PATCH RFC 1/8] NFSD: Clean up nfsd3_proc_create() Chuck Lever
                   ` (8 more replies)
  0 siblings, 9 replies; 13+ messages in thread
From: Chuck Lever @ 2022-04-20 18:28 UTC (permalink / raw)
  To: linux-nfs, linux-fsdevel

Attempt to address occasional reports of test failures caused by
NFSv4 OPEN(CREATE) having failing internally after the target
file object has been created.

The basic approach is to re-organize the NFSv4 OPEN code path so
that common failure modes occur /before/ the call to vfs_create()
rather than afterwards. In addition, the file is opened and created
atomically so that another client can't race and de-permit the
file just after it was created but before the server has opened it.

So far I haven't found any regressions. However I have not been
able to reproduce the original failures.


---

Chuck Lever (8):
      NFSD: Clean up nfsd3_proc_create()
      NFSD: Avoid calling fh_drop_write() twice in do_nfsd_create()
      NFSD: Refactor nfsd_create_setattr()
      NFSD: Refactor NFSv3 CREATE
      NFSD: Refactor NFSv4 OPEN(CREATE)
      NFSD: Remove do_nfsd_create()
      NFSD: Clean up nfsd_open_verified()
      NFSD: Instantiate a struct file when creating a regular NFSv4 file


 fs/nfsd/filecache.c |  51 +++++++--
 fs/nfsd/filecache.h |   2 +
 fs/nfsd/nfs3proc.c  | 141 +++++++++++++++++++++----
 fs/nfsd/nfs4proc.c  | 197 +++++++++++++++++++++++++++++++++--
 fs/nfsd/nfs4state.c |  16 ++-
 fs/nfsd/vfs.c       | 245 ++++++++++----------------------------------
 fs/nfsd/vfs.h       |  14 +--
 fs/nfsd/xdr4.h      |   1 +
 fs/open.c           |  44 ++++++++
 include/linux/fs.h  |   2 +
 10 files changed, 471 insertions(+), 242 deletions(-)

--
Chuck Lever


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH RFC 1/8] NFSD: Clean up nfsd3_proc_create()
  2022-04-20 18:28 [PATCH RFC 0/8] Make NFSv4 OPEN(CREATE) less brittle Chuck Lever
@ 2022-04-20 18:28 ` Chuck Lever
  2022-04-20 18:28 ` [PATCH RFC 2/8] NFSD: Avoid calling fh_drop_write() twice in do_nfsd_create() Chuck Lever
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Chuck Lever @ 2022-04-20 18:28 UTC (permalink / raw)
  To: linux-nfs, linux-fsdevel

As near as I can tell, mode bit masking and setting S_IFREG is
already done by do_nfsd_create() and vfs_create(). The NFSv4 path
(do_open_lookup), for example, does not bother with this special
processing.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs3proc.c |   16 ++--------------
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
index 936eebd4c56d..981a2a71c5af 100644
--- a/fs/nfsd/nfs3proc.c
+++ b/fs/nfsd/nfs3proc.c
@@ -229,8 +229,7 @@ nfsd3_proc_create(struct svc_rqst *rqstp)
 {
 	struct nfsd3_createargs *argp = rqstp->rq_argp;
 	struct nfsd3_diropres *resp = rqstp->rq_resp;
-	svc_fh		*dirfhp, *newfhp = NULL;
-	struct iattr	*attr;
+	svc_fh *dirfhp, *newfhp;
 
 	dprintk("nfsd: CREATE(3)   %s %.*s\n",
 				SVCFH_fmt(&argp->fh),
@@ -239,20 +238,9 @@ nfsd3_proc_create(struct svc_rqst *rqstp)
 
 	dirfhp = fh_copy(&resp->dirfh, &argp->fh);
 	newfhp = fh_init(&resp->fh, NFS3_FHSIZE);
-	attr   = &argp->attrs;
-
-	/* Unfudge the mode bits */
-	attr->ia_mode &= ~S_IFMT;
-	if (!(attr->ia_valid & ATTR_MODE)) { 
-		attr->ia_valid |= ATTR_MODE;
-		attr->ia_mode = S_IFREG;
-	} else {
-		attr->ia_mode = (attr->ia_mode & ~S_IFMT) | S_IFREG;
-	}
 
-	/* Now create the file and set attributes */
 	resp->status = do_nfsd_create(rqstp, dirfhp, argp->name, argp->len,
-				      attr, newfhp, argp->createmode,
+				      &argp->attrs, newfhp, argp->createmode,
 				      (u32 *)argp->verf, NULL, NULL);
 	return rpc_success;
 }



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH RFC 2/8] NFSD: Avoid calling fh_drop_write() twice in do_nfsd_create()
  2022-04-20 18:28 [PATCH RFC 0/8] Make NFSv4 OPEN(CREATE) less brittle Chuck Lever
  2022-04-20 18:28 ` [PATCH RFC 1/8] NFSD: Clean up nfsd3_proc_create() Chuck Lever
@ 2022-04-20 18:28 ` Chuck Lever
  2022-04-20 18:28 ` [PATCH RFC 3/8] NFSD: Refactor nfsd_create_setattr() Chuck Lever
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Chuck Lever @ 2022-04-20 18:28 UTC (permalink / raw)
  To: linux-nfs, linux-fsdevel

Clean up: The "out" label already invokes fh_drop_write().

Note that fh_drop_write() is already careful not to invoke
mnt_drop_write() if either it has already been done or there is
nothing to drop. Therefore no change in behavior is expected.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/vfs.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index c22ad0532e8e..97fee9b33490 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1485,7 +1485,6 @@ do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		case NFS3_CREATE_GUARDED:
 			err = nfserr_exist;
 		}
-		fh_drop_write(fhp);
 		goto out;
 	}
 
@@ -1493,10 +1492,8 @@ do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		iap->ia_mode &= ~current_umask();
 
 	host_err = vfs_create(&init_user_ns, dirp, dchild, iap->ia_mode, true);
-	if (host_err < 0) {
-		fh_drop_write(fhp);
+	if (host_err < 0)
 		goto out_nfserr;
-	}
 	if (created)
 		*created = true;
 



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH RFC 3/8] NFSD: Refactor nfsd_create_setattr()
  2022-04-20 18:28 [PATCH RFC 0/8] Make NFSv4 OPEN(CREATE) less brittle Chuck Lever
  2022-04-20 18:28 ` [PATCH RFC 1/8] NFSD: Clean up nfsd3_proc_create() Chuck Lever
  2022-04-20 18:28 ` [PATCH RFC 2/8] NFSD: Avoid calling fh_drop_write() twice in do_nfsd_create() Chuck Lever
@ 2022-04-20 18:28 ` Chuck Lever
  2022-04-20 18:29 ` [PATCH RFC 4/8] NFSD: Refactor NFSv3 CREATE Chuck Lever
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Chuck Lever @ 2022-04-20 18:28 UTC (permalink / raw)
  To: linux-nfs, linux-fsdevel

I'd like to move do_nfsd_create() out of vfs.c. Therefore
nfsd_create_setattr() needs to be made publicly visible.

Note that both call sites in vfs.c commit both the new object and
its parent directory, so just combine those common metadata commits
into nfsd_create_setattr().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/vfs.c |   79 ++++++++++++++++++++++++++++++---------------------------
 fs/nfsd/vfs.h |    2 +
 2 files changed, 44 insertions(+), 37 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 97fee9b33490..ed58f7d46730 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1187,14 +1187,26 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, u64 offset,
 	return err;
 }
 
-static __be32
-nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *resfhp,
-			struct iattr *iap)
+/**
+ * nfsd_create_setattr - Set a created file's attributes
+ * @rqstp: RPC transaction being executed
+ * @fhp: NFS filehandle of parent directory
+ * @resfhp: NFS filehandle of new object
+ * @iap: requested attributes of new object
+ *
+ * Returns nfs_ok on success, or an nfsstat in network byte order.
+ */
+__be32
+nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp,
+		    struct svc_fh *resfhp, struct iattr *iap)
 {
+	__be32 status;
+
 	/*
-	 * Mode has already been set earlier in create:
+	 * Mode has already been set by file creation.
 	 */
 	iap->ia_valid &= ~ATTR_MODE;
+
 	/*
 	 * Setting uid/gid works only for root.  Irix appears to
 	 * send along the gid on create when it tries to implement
@@ -1202,10 +1214,31 @@ nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *resfhp,
 	 */
 	if (!uid_eq(current_fsuid(), GLOBAL_ROOT_UID))
 		iap->ia_valid &= ~(ATTR_UID|ATTR_GID);
+
+	/*
+	 * Callers expect new file metadata to be committed even
+	 * if the attributes have not changed.
+	 */
 	if (iap->ia_valid)
-		return nfsd_setattr(rqstp, resfhp, iap, 0, (time64_t)0);
-	/* Callers expect file metadata to be committed here */
-	return nfserrno(commit_metadata(resfhp));
+		status = nfsd_setattr(rqstp, resfhp, iap, 0, (time64_t)0);
+	else
+		status = nfserrno(commit_metadata(resfhp));
+
+	/*
+	 * Transactional filesystems had a chance to commit changes
+	 * for both parent and child simultaneously making the
+	 * following commit_metadata a noop in many cases.
+	 */
+	if (!status)
+		status = nfserrno(commit_metadata(fhp));
+
+	/*
+	 * Update the new filehandle to pick up the new attributes.
+	 */
+	if (!status)
+		status = fh_update(resfhp);
+
+	return status;
 }
 
 /* HPUX client sometimes creates a file in mode 000, and sets size to 0.
@@ -1232,7 +1265,6 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	struct dentry	*dentry, *dchild;
 	struct inode	*dirp;
 	__be32		err;
-	__be32		err2;
 	int		host_err;
 
 	dentry = fhp->fh_dentry;
@@ -1305,22 +1337,8 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (host_err < 0)
 		goto out_nfserr;
 
-	err = nfsd_create_setattr(rqstp, resfhp, iap);
+	err = nfsd_create_setattr(rqstp, fhp, resfhp, iap);
 
-	/*
-	 * nfsd_create_setattr already committed the child.  Transactional
-	 * filesystems had a chance to commit changes for both parent and
-	 * child simultaneously making the following commit_metadata a
-	 * noop.
-	 */
-	err2 = nfserrno(commit_metadata(fhp));
-	if (err2)
-		err = err2;
-	/*
-	 * Update the file handle to get the new inode info.
-	 */
-	if (!err)
-		err = fh_update(resfhp);
 out:
 	dput(dchild);
 	return err;
@@ -1511,20 +1529,7 @@ do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	}
 
  set_attr:
-	err = nfsd_create_setattr(rqstp, resfhp, iap);
-
-	/*
-	 * nfsd_create_setattr already committed the child
-	 * (and possibly also the parent).
-	 */
-	if (!err)
-		err = nfserrno(commit_metadata(fhp));
-
-	/*
-	 * Update the filehandle to get the new inode info.
-	 */
-	if (!err)
-		err = fh_update(resfhp);
+	err = nfsd_create_setattr(rqstp, fhp, resfhp, iap);
 
  out:
 	fh_unlock(fhp);
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index ccb87b2864f6..1f32a83456b0 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -69,6 +69,8 @@ __be32		nfsd_create(struct svc_rqst *, struct svc_fh *,
 				char *name, int len, struct iattr *attrs,
 				int type, dev_t rdev, struct svc_fh *res);
 __be32		nfsd_access(struct svc_rqst *, struct svc_fh *, u32 *, u32 *);
+__be32		nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp,
+				struct svc_fh *resfhp, struct iattr *iap);
 __be32		do_nfsd_create(struct svc_rqst *, struct svc_fh *,
 				char *name, int len, struct iattr *attrs,
 				struct svc_fh *res, int createmode,



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH RFC 4/8] NFSD: Refactor NFSv3 CREATE
  2022-04-20 18:28 [PATCH RFC 0/8] Make NFSv4 OPEN(CREATE) less brittle Chuck Lever
                   ` (2 preceding siblings ...)
  2022-04-20 18:28 ` [PATCH RFC 3/8] NFSD: Refactor nfsd_create_setattr() Chuck Lever
@ 2022-04-20 18:29 ` Chuck Lever
  2022-04-20 19:10   ` J. Bruce Fields
  2022-04-20 18:29 ` [PATCH RFC 5/8] NFSD: Refactor NFSv4 OPEN(CREATE) Chuck Lever
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 13+ messages in thread
From: Chuck Lever @ 2022-04-20 18:29 UTC (permalink / raw)
  To: linux-nfs, linux-fsdevel

The NFSv3 CREATE and NFSv4 OPEN(CREATE) use cases are about to
diverge such that it makes sense to split do_nfsd_create() into one
version for NFSv3 and one for NFSv4.

As a first step, copy do_nfsd_create() to nfs3proc.c and remove
NFSv4-specific logic.

One immediate legibility benefit is that the logic for handling
NFSv3 createhow is now quite straightforward. NFSv4 createhow
has some subtleties that IMO do not belong in generic code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs3proc.c |  127 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 121 insertions(+), 6 deletions(-)

diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
index 981a2a71c5af..981a3a7a6e16 100644
--- a/fs/nfsd/nfs3proc.c
+++ b/fs/nfsd/nfs3proc.c
@@ -8,6 +8,7 @@
 #include <linux/fs.h>
 #include <linux/ext2_fs.h>
 #include <linux/magic.h>
+#include <linux/namei.h>
 
 #include "cache.h"
 #include "xdr3.h"
@@ -220,10 +221,126 @@ nfsd3_proc_write(struct svc_rqst *rqstp)
 }
 
 /*
- * With NFSv3, CREATE processing is a lot easier than with NFSv2.
- * At least in theory; we'll see how it fares in practice when the
- * first reports about SunOS compatibility problems start to pour in...
+ * Implement NFSv3's unchecked, guarded, and exclusive CREATE
+ * semantics for regular files. Except for the created file,
+ * this operation is stateless on the server.
+ *
+ * Upon return, caller must release @fhp and @resfhp.
  */
+static __be32
+nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
+		  struct svc_fh *resfhp, struct nfsd3_createargs *argp)
+{
+	struct iattr *iap = &argp->attrs;
+	struct dentry *parent, *child;
+	__u32 v_mtime, v_atime;
+	struct inode *inode;
+	__be32 status;
+	int host_err;
+
+	if (isdotent(argp->name, argp->len))
+		return nfserr_exist;
+	if (!(iap->ia_valid & ATTR_MODE))
+		iap->ia_mode = 0;
+
+	status = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
+	if (status != nfs_ok)
+		return status;
+
+	parent = fhp->fh_dentry;
+	inode = d_inode(parent);
+
+	host_err = fh_want_write(fhp);
+	if (host_err)
+		return nfserrno(host_err);
+
+	fh_lock_nested(fhp, I_MUTEX_PARENT);
+
+	child = lookup_one_len(argp->name, parent, argp->len);
+	if (IS_ERR(child)) {
+		status = nfserrno(PTR_ERR(child));
+		goto out;
+	}
+
+	if (d_really_is_negative(child)) {
+		status = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
+		if (status != nfs_ok)
+			goto out;
+	}
+
+	status = fh_compose(resfhp, fhp->fh_export, child, fhp);
+	if (status != nfs_ok)
+		goto out;
+
+	v_mtime = 0;
+	v_atime = 0;
+	if (argp->createmode == NFS3_CREATE_EXCLUSIVE) {
+		u32 *verifier = (u32 *)argp->verf;
+
+		/*
+		 * Solaris 7 gets confused (bugid 4218508) if these have
+		 * the high bit set, as do xfs filesystems without the
+		 * "bigtime" feature. So just clear the high bits.
+		 */
+		v_mtime = verifier[0] & 0x7fffffff;
+		v_atime = verifier[1] & 0x7fffffff;
+	}
+
+	if (d_really_is_positive(child)) {
+		status = nfs_ok;
+
+		switch (argp->createmode) {
+		case NFS3_CREATE_UNCHECKED:
+			if (!d_is_reg(child))
+				break;
+			iap->ia_valid &= ATTR_SIZE;
+			goto set_attr;
+		case NFS3_CREATE_GUARDED:
+			status = nfserr_exist;
+			break;
+		case NFS3_CREATE_EXCLUSIVE:
+			if (d_inode(child)->i_mtime.tv_sec == v_mtime &&
+			    d_inode(child)->i_atime.tv_sec == v_atime &&
+			    d_inode(child)->i_size == 0) {
+				break;
+			}
+			status = nfserr_exist;
+		}
+		goto out;
+	}
+
+	if (!IS_POSIXACL(inode))
+		iap->ia_mode &= ~current_umask();
+
+	host_err = vfs_create(&init_user_ns, inode, child, iap->ia_mode, true);
+	if (host_err < 0) {
+		status = nfserrno(host_err);
+		goto out;
+	}
+
+	/* A newly created file already has a file size of zero. */
+	if ((iap->ia_valid & ATTR_SIZE) && (iap->ia_size == 0))
+		iap->ia_valid &= ~ATTR_SIZE;
+	if (argp->createmode == NFS3_CREATE_EXCLUSIVE) {
+		iap->ia_valid = ATTR_MTIME | ATTR_ATIME |
+				ATTR_MTIME_SET | ATTR_ATIME_SET;
+		iap->ia_mtime.tv_sec = v_mtime;
+		iap->ia_atime.tv_sec = v_atime;
+		iap->ia_mtime.tv_nsec = 0;
+		iap->ia_atime.tv_nsec = 0;
+	}
+
+set_attr:
+	status = nfsd_create_setattr(rqstp, fhp, resfhp, iap);
+
+out:
+	fh_unlock(fhp);
+	if (child && !IS_ERR(child))
+		dput(child);
+	fh_drop_write(fhp);
+	return status;
+}
+
 static __be32
 nfsd3_proc_create(struct svc_rqst *rqstp)
 {
@@ -239,9 +356,7 @@ nfsd3_proc_create(struct svc_rqst *rqstp)
 	dirfhp = fh_copy(&resp->dirfh, &argp->fh);
 	newfhp = fh_init(&resp->fh, NFS3_FHSIZE);
 
-	resp->status = do_nfsd_create(rqstp, dirfhp, argp->name, argp->len,
-				      &argp->attrs, newfhp, argp->createmode,
-				      (u32 *)argp->verf, NULL, NULL);
+	resp->status = nfsd3_create_file(rqstp, dirfhp, newfhp, argp);
 	return rpc_success;
 }
 



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH RFC 5/8] NFSD: Refactor NFSv4 OPEN(CREATE)
  2022-04-20 18:28 [PATCH RFC 0/8] Make NFSv4 OPEN(CREATE) less brittle Chuck Lever
                   ` (3 preceding siblings ...)
  2022-04-20 18:29 ` [PATCH RFC 4/8] NFSD: Refactor NFSv3 CREATE Chuck Lever
@ 2022-04-20 18:29 ` Chuck Lever
  2022-04-20 18:29 ` [PATCH RFC 6/8] NFSD: Remove do_nfsd_create() Chuck Lever
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Chuck Lever @ 2022-04-20 18:29 UTC (permalink / raw)
  To: linux-nfs, linux-fsdevel

Copy do_nfsd_create() to nfs4proc.c and remove NFSv3-specific logic.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs4proc.c |  162 +++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 152 insertions(+), 10 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index b207c76a873f..99c0485a29e9 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -37,6 +37,8 @@
 #include <linux/falloc.h>
 #include <linux/slab.h>
 #include <linux/kthread.h>
+#include <linux/namei.h>
+
 #include <linux/sunrpc/addr.h>
 #include <linux/nfs_ssc.h>
 
@@ -235,6 +237,154 @@ static void nfsd4_set_open_owner_reply_cache(struct nfsd4_compound_state *cstate
 			&resfh->fh_handle);
 }
 
+static inline bool nfsd4_create_is_exclusive(int createmode)
+{
+	return createmode == NFS4_CREATE_EXCLUSIVE ||
+		createmode == NFS4_CREATE_EXCLUSIVE4_1;
+}
+
+/*
+ * Implement NFSv4's unchecked, guarded, and exclusive create
+ * semantics for regular files. Open state for this new file is
+ * subsequently fabricated in nfsd4_process_open2().
+ *
+ * Upon return, caller must release @fhp and @resfhp.
+ */
+static __be32
+nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
+		  struct svc_fh *resfhp, struct nfsd4_open *open)
+{
+	struct iattr *iap = &open->op_iattr;
+	struct dentry *parent, *child;
+	__u32 v_mtime, v_atime;
+	struct inode *inode;
+	__be32 status;
+	int host_err;
+
+	if (isdotent(open->op_fname, open->op_fnamelen))
+		return nfserr_exist;
+	if (!(iap->ia_valid & ATTR_MODE))
+		iap->ia_mode = 0;
+
+	status = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
+	if (status != nfs_ok)
+		return status;
+	parent = fhp->fh_dentry;
+	inode = d_inode(parent);
+
+	host_err = fh_want_write(fhp);
+	if (host_err)
+		return nfserrno(host_err);
+
+	fh_lock_nested(fhp, I_MUTEX_PARENT);
+
+	child = lookup_one_len(open->op_fname, parent, open->op_fnamelen);
+	if (IS_ERR(child)) {
+		status = nfserrno(PTR_ERR(child));
+		goto out;
+	}
+
+	if (d_really_is_negative(child)) {
+		status = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
+		if (status != nfs_ok)
+			goto out;
+	}
+
+	status = fh_compose(resfhp, fhp->fh_export, child, fhp);
+	if (status != nfs_ok)
+		goto out;
+
+	v_mtime = 0;
+	v_atime = 0;
+	if (nfsd4_create_is_exclusive(open->op_createmode)) {
+		u32 *verifier = (u32 *)open->op_verf.data;
+
+		/*
+		 * Solaris 7 gets confused (bugid 4218508) if these have
+		 * the high bit set, as do xfs filesystems without the
+		 * "bigtime" feature. So just clear the high bits. If this
+		 * is ever changed to use different attrs for storing the
+		 * verifier, then do_open_lookup() will also need to be
+		 * fixed accordingly.
+		 */
+		v_mtime = verifier[0] & 0x7fffffff;
+		v_atime = verifier[1] & 0x7fffffff;
+	}
+
+	if (d_really_is_positive(child)) {
+		status = nfs_ok;
+
+		switch (open->op_createmode) {
+		case NFS4_CREATE_UNCHECKED:
+			if (!d_is_reg(child))
+				break;
+
+			/*
+			 * In NFSv4, we don't want to truncate the file
+			 * now. This would be wrong if the OPEN fails for
+			 * some other reason. Furthermore, if the size is
+			 * nonzero, we should ignore it according to spec!
+			 */
+			open->op_truncate = (iap->ia_valid & ATTR_SIZE) &&
+						!iap->ia_size;
+			break;
+		case NFS4_CREATE_GUARDED:
+			status = nfserr_exist;
+			break;
+		case NFS4_CREATE_EXCLUSIVE:
+			if (d_inode(child)->i_mtime.tv_sec == v_mtime &&
+			    d_inode(child)->i_atime.tv_sec == v_atime &&
+			    d_inode(child)->i_size == 0) {
+				open->op_created = true;
+				break;		/* subtle */
+			}
+			status = nfserr_exist;
+			break;
+		case NFS4_CREATE_EXCLUSIVE4_1:
+			if (d_inode(child)->i_mtime.tv_sec == v_mtime &&
+			    d_inode(child)->i_atime.tv_sec == v_atime &&
+			    d_inode(child)->i_size == 0) {
+				open->op_created = true;
+				goto set_attr;	/* subtle */
+			}
+			status = nfserr_exist;
+		}
+		goto out;
+	}
+
+	if (!IS_POSIXACL(inode))
+		iap->ia_mode &= ~current_umask();
+
+	host_err = vfs_create(&init_user_ns, inode, child, iap->ia_mode, true);
+	if (host_err < 0) {
+		status = nfserrno(host_err);
+		goto out;
+	}
+	open->op_created = true;
+
+	/* A newly created file already has a file size of zero. */
+	if ((iap->ia_valid & ATTR_SIZE) && (iap->ia_size == 0))
+		iap->ia_valid &= ~ATTR_SIZE;
+	if (nfsd4_create_is_exclusive(open->op_createmode)) {
+		iap->ia_valid = ATTR_MTIME | ATTR_ATIME |
+				ATTR_MTIME_SET|ATTR_ATIME_SET;
+		iap->ia_mtime.tv_sec = v_mtime;
+		iap->ia_atime.tv_sec = v_atime;
+		iap->ia_mtime.tv_nsec = 0;
+		iap->ia_atime.tv_nsec = 0;
+	}
+
+set_attr:
+	status = nfsd_create_setattr(rqstp, fhp, resfhp, iap);
+
+out:
+	fh_unlock(fhp);
+	if (child && !IS_ERR(child))
+		dput(child);
+	fh_drop_write(fhp);
+	return status;
+}
+
 static __be32
 do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, struct nfsd4_open *open, struct svc_fh **resfh)
 {
@@ -264,16 +414,8 @@ do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stru
 		 * yes          | yes    | GUARDED4        | GUARDED4
 		 */
 
-		/*
-		 * Note: create modes (UNCHECKED,GUARDED...) are the same
-		 * in NFSv4 as in v3 except EXCLUSIVE4_1.
-		 */
 		current->fs->umask = open->op_umask;
-		status = do_nfsd_create(rqstp, current_fh, open->op_fname,
-					open->op_fnamelen, &open->op_iattr,
-					*resfh, open->op_createmode,
-					(u32 *)open->op_verf.data,
-					&open->op_truncate, &open->op_created);
+		status = nfsd4_create_file(rqstp, current_fh, *resfh, open);
 		current->fs->umask = 0;
 
 		if (!status && open->op_label.len)
@@ -284,7 +426,7 @@ do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stru
 		 * use the returned bitmask to indicate which attributes
 		 * we used to store the verifier:
 		 */
-		if (nfsd_create_is_exclusive(open->op_createmode) && status == 0)
+		if (nfsd4_create_is_exclusive(open->op_createmode) && status == 0)
 			open->op_bmval[1] |= (FATTR4_WORD1_TIME_ACCESS |
 						FATTR4_WORD1_TIME_MODIFY);
 	} else



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH RFC 6/8] NFSD: Remove do_nfsd_create()
  2022-04-20 18:28 [PATCH RFC 0/8] Make NFSv4 OPEN(CREATE) less brittle Chuck Lever
                   ` (4 preceding siblings ...)
  2022-04-20 18:29 ` [PATCH RFC 5/8] NFSD: Refactor NFSv4 OPEN(CREATE) Chuck Lever
@ 2022-04-20 18:29 ` Chuck Lever
  2022-04-20 18:29 ` [PATCH RFC 7/8] NFSD: Clean up nfsd_open_verified() Chuck Lever
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Chuck Lever @ 2022-04-20 18:29 UTC (permalink / raw)
  To: linux-nfs, linux-fsdevel

Now that its two callers have their own version-specific instance of
this function, do_nfsd_create() is no longer used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/vfs.c |  150 ---------------------------------------------------------
 fs/nfsd/vfs.h |   10 ----
 2 files changed, 160 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index ed58f7d46730..407976830a2b 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1393,156 +1393,6 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 					rdev, resfhp);
 }
 
-/*
- * NFSv3 and NFSv4 version of nfsd_create
- */
-__be32
-do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
-		char *fname, int flen, struct iattr *iap,
-		struct svc_fh *resfhp, int createmode, u32 *verifier,
-	        bool *truncp, bool *created)
-{
-	struct dentry	*dentry, *dchild = NULL;
-	struct inode	*dirp;
-	__be32		err;
-	int		host_err;
-	__u32		v_mtime=0, v_atime=0;
-
-	err = nfserr_perm;
-	if (!flen)
-		goto out;
-	err = nfserr_exist;
-	if (isdotent(fname, flen))
-		goto out;
-	if (!(iap->ia_valid & ATTR_MODE))
-		iap->ia_mode = 0;
-	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
-	if (err)
-		goto out;
-
-	dentry = fhp->fh_dentry;
-	dirp = d_inode(dentry);
-
-	host_err = fh_want_write(fhp);
-	if (host_err)
-		goto out_nfserr;
-
-	fh_lock_nested(fhp, I_MUTEX_PARENT);
-
-	/*
-	 * Compose the response file handle.
-	 */
-	dchild = lookup_one_len(fname, dentry, flen);
-	host_err = PTR_ERR(dchild);
-	if (IS_ERR(dchild))
-		goto out_nfserr;
-
-	/* If file doesn't exist, check for permissions to create one */
-	if (d_really_is_negative(dchild)) {
-		err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
-		if (err)
-			goto out;
-	}
-
-	err = fh_compose(resfhp, fhp->fh_export, dchild, fhp);
-	if (err)
-		goto out;
-
-	if (nfsd_create_is_exclusive(createmode)) {
-		/* solaris7 gets confused (bugid 4218508) if these have
-		 * the high bit set, as do xfs filesystems without the
-		 * "bigtime" feature.  So just clear the high bits. If this is
-		 * ever changed to use different attrs for storing the
-		 * verifier, then do_open_lookup() will also need to be fixed
-		 * accordingly.
-		 */
-		v_mtime = verifier[0]&0x7fffffff;
-		v_atime = verifier[1]&0x7fffffff;
-	}
-	
-	if (d_really_is_positive(dchild)) {
-		err = 0;
-
-		switch (createmode) {
-		case NFS3_CREATE_UNCHECKED:
-			if (! d_is_reg(dchild))
-				goto out;
-			else if (truncp) {
-				/* in nfsv4, we need to treat this case a little
-				 * differently.  we don't want to truncate the
-				 * file now; this would be wrong if the OPEN
-				 * fails for some other reason.  furthermore,
-				 * if the size is nonzero, we should ignore it
-				 * according to spec!
-				 */
-				*truncp = (iap->ia_valid & ATTR_SIZE) && !iap->ia_size;
-			}
-			else {
-				iap->ia_valid &= ATTR_SIZE;
-				goto set_attr;
-			}
-			break;
-		case NFS3_CREATE_EXCLUSIVE:
-			if (   d_inode(dchild)->i_mtime.tv_sec == v_mtime
-			    && d_inode(dchild)->i_atime.tv_sec == v_atime
-			    && d_inode(dchild)->i_size  == 0 ) {
-				if (created)
-					*created = true;
-				break;
-			}
-			fallthrough;
-		case NFS4_CREATE_EXCLUSIVE4_1:
-			if (   d_inode(dchild)->i_mtime.tv_sec == v_mtime
-			    && d_inode(dchild)->i_atime.tv_sec == v_atime
-			    && d_inode(dchild)->i_size  == 0 ) {
-				if (created)
-					*created = true;
-				goto set_attr;
-			}
-			fallthrough;
-		case NFS3_CREATE_GUARDED:
-			err = nfserr_exist;
-		}
-		goto out;
-	}
-
-	if (!IS_POSIXACL(dirp))
-		iap->ia_mode &= ~current_umask();
-
-	host_err = vfs_create(&init_user_ns, dirp, dchild, iap->ia_mode, true);
-	if (host_err < 0)
-		goto out_nfserr;
-	if (created)
-		*created = true;
-
-	nfsd_check_ignore_resizing(iap);
-
-	if (nfsd_create_is_exclusive(createmode)) {
-		/* Cram the verifier into atime/mtime */
-		iap->ia_valid = ATTR_MTIME|ATTR_ATIME
-			| ATTR_MTIME_SET|ATTR_ATIME_SET;
-		/* XXX someone who knows this better please fix it for nsec */ 
-		iap->ia_mtime.tv_sec = v_mtime;
-		iap->ia_atime.tv_sec = v_atime;
-		iap->ia_mtime.tv_nsec = 0;
-		iap->ia_atime.tv_nsec = 0;
-	}
-
- set_attr:
-	err = nfsd_create_setattr(rqstp, fhp, resfhp, iap);
-
- out:
-	fh_unlock(fhp);
-	if (dchild && !IS_ERR(dchild))
-		dput(dchild);
-	fh_drop_write(fhp);
- 	return err;
- 
- out_nfserr:
-	err = nfserrno(host_err);
-	goto out;
-}
-
 /*
  * Read a symlink. On entry, *lenp must contain the maximum path length that
  * fits into the buffer. On return, it contains the true length.
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index 1f32a83456b0..f99794b033a5 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -71,10 +71,6 @@ __be32		nfsd_create(struct svc_rqst *, struct svc_fh *,
 __be32		nfsd_access(struct svc_rqst *, struct svc_fh *, u32 *, u32 *);
 __be32		nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp,
 				struct svc_fh *resfhp, struct iattr *iap);
-__be32		do_nfsd_create(struct svc_rqst *, struct svc_fh *,
-				char *name, int len, struct iattr *attrs,
-				struct svc_fh *res, int createmode,
-				u32 *verifier, bool *truncp, bool *created);
 __be32		nfsd_commit(struct svc_rqst *rqst, struct svc_fh *fhp,
 				u64 offset, u32 count, __be32 *verf);
 #ifdef CONFIG_NFSD_V4
@@ -161,10 +157,4 @@ static inline __be32 fh_getattr(const struct svc_fh *fh, struct kstat *stat)
 				    AT_STATX_SYNC_AS_STAT));
 }
 
-static inline int nfsd_create_is_exclusive(int createmode)
-{
-	return createmode == NFS3_CREATE_EXCLUSIVE
-	       || createmode == NFS4_CREATE_EXCLUSIVE4_1;
-}
-
 #endif /* LINUX_NFSD_VFS_H */



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH RFC 7/8] NFSD: Clean up nfsd_open_verified()
  2022-04-20 18:28 [PATCH RFC 0/8] Make NFSv4 OPEN(CREATE) less brittle Chuck Lever
                   ` (5 preceding siblings ...)
  2022-04-20 18:29 ` [PATCH RFC 6/8] NFSD: Remove do_nfsd_create() Chuck Lever
@ 2022-04-20 18:29 ` Chuck Lever
  2022-04-20 18:29 ` [PATCH RFC 8/8] NFSD: Instantiate a struct file when creating a regular NFSv4 file Chuck Lever
  2022-04-20 19:24 ` [PATCH RFC 0/8] Make NFSv4 OPEN(CREATE) less brittle J. Bruce Fields
  8 siblings, 0 replies; 13+ messages in thread
From: Chuck Lever @ 2022-04-20 18:29 UTC (permalink / raw)
  To: linux-nfs, linux-fsdevel

Its only caller always passes S_IFREG as the @type parameter. As an
additional clean-up, add a kerneldoc comment.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |    4 ++--
 fs/nfsd/vfs.c       |   15 ++++++++++++---
 fs/nfsd/vfs.h       |    2 +-
 3 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 2c1b027774d4..781bef7e42d9 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -995,8 +995,8 @@ nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 
 	nf->nf_mark = nfsd_file_mark_find_or_create(nf);
 	if (nf->nf_mark)
-		status = nfsd_open_verified(rqstp, fhp, S_IFREG,
-				may_flags, &nf->nf_file);
+		status = nfsd_open_verified(rqstp, fhp, may_flags,
+					    &nf->nf_file);
 	else
 		status = nfserr_jukebox;
 	/*
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 407976830a2b..893b5f21dab9 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -827,14 +827,23 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
 	return err;
 }
 
+/**
+ * nfsd_open_verified - Open a regular file for the filecache
+ * @rqstp: RPC request
+ * @fhp: NFS filehandle of the file to open
+ * @may_flags: internal permission flags
+ * @filp: OUT: open "struct file *"
+ *
+ * Returns an nfsstat value in network byte order.
+ */
 __be32
-nfsd_open_verified(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
-		int may_flags, struct file **filp)
+nfsd_open_verified(struct svc_rqst *rqstp, struct svc_fh *fhp, int may_flags,
+		   struct file **filp)
 {
 	__be32 err;
 
 	validate_process_creds();
-	err = __nfsd_open(rqstp, fhp, type, may_flags, filp);
+	err = __nfsd_open(rqstp, fhp, S_IFREG, may_flags, filp);
 	validate_process_creds();
 	return err;
 }
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index f99794b033a5..26347d76f44a 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -86,7 +86,7 @@ __be32		nfsd_setxattr(struct svc_rqst *rqstp, struct svc_fh *fhp,
 int 		nfsd_open_break_lease(struct inode *, int);
 __be32		nfsd_open(struct svc_rqst *, struct svc_fh *, umode_t,
 				int, struct file **);
-__be32		nfsd_open_verified(struct svc_rqst *, struct svc_fh *, umode_t,
+__be32		nfsd_open_verified(struct svc_rqst *, struct svc_fh *,
 				int, struct file **);
 __be32		nfsd_splice_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
 				struct file *file, loff_t offset,



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH RFC 8/8] NFSD: Instantiate a struct file when creating a regular NFSv4 file
  2022-04-20 18:28 [PATCH RFC 0/8] Make NFSv4 OPEN(CREATE) less brittle Chuck Lever
                   ` (6 preceding siblings ...)
  2022-04-20 18:29 ` [PATCH RFC 7/8] NFSD: Clean up nfsd_open_verified() Chuck Lever
@ 2022-04-20 18:29 ` Chuck Lever
  2022-04-20 19:24 ` [PATCH RFC 0/8] Make NFSv4 OPEN(CREATE) less brittle J. Bruce Fields
  8 siblings, 0 replies; 13+ messages in thread
From: Chuck Lever @ 2022-04-20 18:29 UTC (permalink / raw)
  To: linux-nfs, linux-fsdevel

There have been reports of races that cause NFSv4 OPEN(CREATE) to
return an error even though the requested file was created. NFSv4
does not provide a status code for this case.

To mitigate some of these problems, reorganize the NFSv4
OPEN(CREATE) logic to allocate resources before the file is actually
created, and open the new file while the parent directory is still
locked.

Two new APIs are added:

+ Add an API that works like nfsd_file_acquire() but does not open
the underlying file. The OPEN(CREATE) path can use this API when it
already has an open file.

+ Add an API that is kin to dentry_open(). NFSD needs to create a
file and grab an open "struct file *" atomically. The
alloc_empty_file() has to be done before the inode create. If it
fails (for example, because the NFS server has exceeded its
max_files limit), we avoid creating the file and can still return
an error to the NFS client.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=382
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c |   51 ++++++++++++++++++++++++++++++++++++++++++++-------
 fs/nfsd/filecache.h |    2 ++
 fs/nfsd/nfs4proc.c  |   43 +++++++++++++++++++++++++++++++++++++++----
 fs/nfsd/nfs4state.c |   16 +++++++++++++---
 fs/nfsd/xdr4.h      |    1 +
 fs/open.c           |   44 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h  |    2 ++
 7 files changed, 145 insertions(+), 14 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 781bef7e42d9..24772f246461 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -897,9 +897,9 @@ nfsd_file_is_cached(struct inode *inode)
 	return ret;
 }
 
-__be32
-nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
-		  unsigned int may_flags, struct nfsd_file **pnf)
+static __be32
+nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
+		     unsigned int may_flags, struct nfsd_file **pnf, bool open)
 {
 	__be32	status;
 	struct net *net = SVC_NET(rqstp);
@@ -994,10 +994,13 @@ nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		nfsd_file_gc();
 
 	nf->nf_mark = nfsd_file_mark_find_or_create(nf);
-	if (nf->nf_mark)
-		status = nfsd_open_verified(rqstp, fhp, may_flags,
-					    &nf->nf_file);
-	else
+	if (nf->nf_mark) {
+		if (open)
+			status = nfsd_open_verified(rqstp, fhp, may_flags,
+						    &nf->nf_file);
+		else
+			status = nfs_ok;
+	} else
 		status = nfserr_jukebox;
 	/*
 	 * If construction failed, or we raced with a call to unlink()
@@ -1017,6 +1020,40 @@ nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	goto out;
 }
 
+/**
+ * nfsd_file_acquire - Get a struct nfsd_file with an open file
+ * @rqstp: the RPC transaction being executed
+ * @fhp: the NFS filehandle of the file to be opened
+ * @may_flags: NFSD_MAY_ settings for the file
+ * @pnf: OUT: new or found "struct nfsd_file" object
+ *
+ * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in
+ * network byte order is returned.
+ */
+__be32
+nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
+		  unsigned int may_flags, struct nfsd_file **pnf)
+{
+	return nfsd_do_file_acquire(rqstp, fhp, may_flags, pnf, true);
+}
+
+/**
+ * nfsd_file_create - Get a struct nfsd_file, do not open
+ * @rqstp: the RPC transaction being executed
+ * @fhp: the NFS filehandle of the file just created
+ * @may_flags: NFSD_MAY_ settings for the file
+ * @pnf: OUT: new or found "struct nfsd_file" object
+ *
+ * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in
+ * network byte order is returned.
+ */
+__be32
+nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
+		 unsigned int may_flags, struct nfsd_file **pnf)
+{
+	return nfsd_do_file_acquire(rqstp, fhp, may_flags, pnf, false);
+}
+
 /*
  * Note that fields may be added, removed or reordered in the future. Programs
  * scraping this file for info should test the labels to ensure they're
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index 435ceab27897..1da0c79a5580 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -59,5 +59,7 @@ void nfsd_file_close_inode_sync(struct inode *inode);
 bool nfsd_file_is_cached(struct inode *inode);
 __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		  unsigned int may_flags, struct nfsd_file **nfp);
+__be32 nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
+		  unsigned int may_flags, struct nfsd_file **nfp);
 int	nfsd_file_cache_stats_open(struct inode *, struct file *);
 #endif /* _FS_NFSD_FILECACHE_H */
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 99c0485a29e9..28bae84d7809 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -243,6 +243,37 @@ static inline bool nfsd4_create_is_exclusive(int createmode)
 		createmode == NFS4_CREATE_EXCLUSIVE4_1;
 }
 
+static __be32
+nfsd4_vfs_create(struct svc_fh *fhp, struct dentry *child,
+		 struct nfsd4_open *open)
+{
+	struct file *filp;
+	struct path path;
+	int oflags;
+
+	oflags = O_CREAT | O_LARGEFILE;
+	switch (open->op_share_access & NFS4_SHARE_ACCESS_BOTH) {
+	case NFS4_SHARE_ACCESS_WRITE:
+		oflags |= O_WRONLY;
+		break;
+	case NFS4_SHARE_ACCESS_BOTH:
+		oflags |= O_RDWR;
+		break;
+	default:
+		oflags |= O_RDONLY;
+	}
+
+	path.mnt = fhp->fh_export->ex_path.mnt;
+	path.dentry = child;
+	filp = dentry_create(&path, oflags, open->op_iattr.ia_mode,
+			     current_cred());
+	if (IS_ERR(filp))
+		return nfserrno(PTR_ERR(filp));
+
+	open->op_filp = filp;
+	return nfs_ok;
+}
+
 /*
  * Implement NFSv4's unchecked, guarded, and exclusive create
  * semantics for regular files. Open state for this new file is
@@ -355,11 +386,9 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (!IS_POSIXACL(inode))
 		iap->ia_mode &= ~current_umask();
 
-	host_err = vfs_create(&init_user_ns, inode, child, iap->ia_mode, true);
-	if (host_err < 0) {
-		status = nfserrno(host_err);
+	status = nfsd4_vfs_create(fhp, child, open);
+	if (status != nfs_ok)
 		goto out;
-	}
 	open->op_created = true;
 
 	/* A newly created file already has a file size of zero. */
@@ -517,6 +546,8 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		(int)open->op_fnamelen, open->op_fname,
 		open->op_openowner);
 
+	open->op_filp = NULL;
+
 	/* This check required by spec. */
 	if (open->op_create && open->op_claim_type != NFS4_OPEN_CLAIM_NULL)
 		return nfserr_inval;
@@ -613,6 +644,10 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	if (reclaim && !status)
 		nn->somebody_reclaimed = true;
 out:
+	if (open->op_filp) {
+		fput(open->op_filp);
+		open->op_filp = NULL;
+	}
 	if (resfh && resfh != &cstate->current_fh) {
 		fh_dup2(&cstate->current_fh, resfh);
 		fh_put(resfh);
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 234e852fcdfa..e7c28ddf2538 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4985,9 +4985,19 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp,
 
 	if (!fp->fi_fds[oflag]) {
 		spin_unlock(&fp->fi_lock);
-		status = nfsd_file_acquire(rqstp, cur_fh, access, &nf);
-		if (status)
-			goto out_put_access;
+
+		if (!open->op_filp) {
+			status = nfsd_file_acquire(rqstp, cur_fh, access, &nf);
+			if (status != nfs_ok)
+				goto out_put_access;
+		} else {
+			status = nfsd_file_create(rqstp, cur_fh, access, &nf);
+			if (status != nfs_ok)
+				goto out_put_access;
+			nf->nf_file = open->op_filp;
+			open->op_filp = NULL;
+		}
+
 		spin_lock(&fp->fi_lock);
 		if (!fp->fi_fds[oflag]) {
 			fp->fi_fds[oflag] = nf;
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index 846ab6df9d48..7b744011f2d3 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -273,6 +273,7 @@ struct nfsd4_open {
 	bool		op_truncate;        /* used during processing */
 	bool		op_created;         /* used during processing */
 	struct nfs4_openowner *op_openowner; /* used during processing */
+	struct file	*op_filp;           /* used during processing */
 	struct nfs4_file *op_file;          /* used during processing */
 	struct nfs4_ol_stateid *op_stp;	    /* used during processing */
 	struct nfs4_clnt_odstate *op_odstate; /* used during processing */
diff --git a/fs/open.c b/fs/open.c
index 1315253e0247..e3cf72c269e1 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -981,6 +981,50 @@ struct file *dentry_open(const struct path *path, int flags,
 }
 EXPORT_SYMBOL(dentry_open);
 
+/**
+ * dentry_create - Create and open a file
+ * @path: path to create
+ * @flags: O_ flags
+ * @mode: mode bits for new file
+ * @cred: credentials to use
+ *
+ * Caller must hold the parent directory's lock, and have prepared
+ * a negative dentry, placed in @path->dentry, for the new file.
+ *
+ * On success, returns a "struct file *". Otherwise a ERR_PTR
+ * is returned.
+ */
+struct file *dentry_create(const struct path *path, int flags, umode_t mode,
+			   const struct cred *cred)
+{
+	struct dentry *parent;
+	struct file *f;
+	int error;
+
+	validate_creds(cred);
+	f = alloc_empty_file(flags, cred);
+	if (IS_ERR(f))
+		return f;
+
+	parent = dget_parent(path->dentry);
+	error = vfs_create(mnt_user_ns(path->mnt), d_inode(parent),
+			   path->dentry, mode, true);
+	dput(parent);
+	if (error) {
+		fput(f);
+		return ERR_PTR(error);
+	}
+
+	error = vfs_open(path, f);
+	if (error) {
+		fput(f);
+		return ERR_PTR(error);
+	}
+
+	return f;
+}
+EXPORT_SYMBOL(dentry_create);
+
 struct file *open_with_fake_path(const struct path *path, int flags,
 				struct inode *inode, const struct cred *cred)
 {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index bbde95387a23..53501d710441 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2630,6 +2630,8 @@ static inline struct file *file_open_root_mnt(struct vfsmount *mnt,
 			      name, flags, mode);
 }
 extern struct file * dentry_open(const struct path *, int, const struct cred *);
+extern struct file *dentry_create(const struct path *path, int flags,
+				  umode_t mode, const struct cred *cred);
 extern struct file * open_with_fake_path(const struct path *, int,
 					 struct inode*, const struct cred *);
 static inline struct file *file_clone_open(struct file *file)



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC 4/8] NFSD: Refactor NFSv3 CREATE
  2022-04-20 18:29 ` [PATCH RFC 4/8] NFSD: Refactor NFSv3 CREATE Chuck Lever
@ 2022-04-20 19:10   ` J. Bruce Fields
  2022-04-20 19:31     ` Chuck Lever III
  0 siblings, 1 reply; 13+ messages in thread
From: J. Bruce Fields @ 2022-04-20 19:10 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs, linux-fsdevel

On Wed, Apr 20, 2022 at 02:29:00PM -0400, Chuck Lever wrote:
> The NFSv3 CREATE and NFSv4 OPEN(CREATE) use cases are about to
> diverge such that it makes sense to split do_nfsd_create() into one
> version for NFSv3 and one for NFSv4.
> 
> As a first step, copy do_nfsd_create() to nfs3proc.c and remove
> NFSv4-specific logic.
> 
> One immediate legibility benefit is that the logic for handling
> NFSv3 createhow is now quite straightforward. NFSv4 createhow
> has some subtleties that IMO do not belong in generic code.

That makes sense to me, though just eyeballing the two resulting
functions, you end up with a *lot* of duplication.  I wonder if it'd be
possible to keep the two paths free of complications from each other
while sharing more code, e.g. if there are logical blocks of code that
could now be pulled out into common helpers.

--b.

> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/nfs3proc.c |  127 ++++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 121 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
> index 981a2a71c5af..981a3a7a6e16 100644
> --- a/fs/nfsd/nfs3proc.c
> +++ b/fs/nfsd/nfs3proc.c
> @@ -8,6 +8,7 @@
>  #include <linux/fs.h>
>  #include <linux/ext2_fs.h>
>  #include <linux/magic.h>
> +#include <linux/namei.h>
>  
>  #include "cache.h"
>  #include "xdr3.h"
> @@ -220,10 +221,126 @@ nfsd3_proc_write(struct svc_rqst *rqstp)
>  }
>  
>  /*
> - * With NFSv3, CREATE processing is a lot easier than with NFSv2.
> - * At least in theory; we'll see how it fares in practice when the
> - * first reports about SunOS compatibility problems start to pour in...
> + * Implement NFSv3's unchecked, guarded, and exclusive CREATE
> + * semantics for regular files. Except for the created file,
> + * this operation is stateless on the server.
> + *
> + * Upon return, caller must release @fhp and @resfhp.
>   */
> +static __be32
> +nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> +		  struct svc_fh *resfhp, struct nfsd3_createargs *argp)
> +{
> +	struct iattr *iap = &argp->attrs;
> +	struct dentry *parent, *child;
> +	__u32 v_mtime, v_atime;
> +	struct inode *inode;
> +	__be32 status;
> +	int host_err;
> +
> +	if (isdotent(argp->name, argp->len))
> +		return nfserr_exist;
> +	if (!(iap->ia_valid & ATTR_MODE))
> +		iap->ia_mode = 0;
> +
> +	status = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
> +	if (status != nfs_ok)
> +		return status;
> +
> +	parent = fhp->fh_dentry;
> +	inode = d_inode(parent);
> +
> +	host_err = fh_want_write(fhp);
> +	if (host_err)
> +		return nfserrno(host_err);
> +
> +	fh_lock_nested(fhp, I_MUTEX_PARENT);
> +
> +	child = lookup_one_len(argp->name, parent, argp->len);
> +	if (IS_ERR(child)) {
> +		status = nfserrno(PTR_ERR(child));
> +		goto out;
> +	}
> +
> +	if (d_really_is_negative(child)) {
> +		status = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
> +		if (status != nfs_ok)
> +			goto out;
> +	}
> +
> +	status = fh_compose(resfhp, fhp->fh_export, child, fhp);
> +	if (status != nfs_ok)
> +		goto out;
> +
> +	v_mtime = 0;
> +	v_atime = 0;
> +	if (argp->createmode == NFS3_CREATE_EXCLUSIVE) {
> +		u32 *verifier = (u32 *)argp->verf;
> +
> +		/*
> +		 * Solaris 7 gets confused (bugid 4218508) if these have
> +		 * the high bit set, as do xfs filesystems without the
> +		 * "bigtime" feature. So just clear the high bits.
> +		 */
> +		v_mtime = verifier[0] & 0x7fffffff;
> +		v_atime = verifier[1] & 0x7fffffff;
> +	}
> +
> +	if (d_really_is_positive(child)) {
> +		status = nfs_ok;
> +
> +		switch (argp->createmode) {
> +		case NFS3_CREATE_UNCHECKED:
> +			if (!d_is_reg(child))
> +				break;
> +			iap->ia_valid &= ATTR_SIZE;
> +			goto set_attr;
> +		case NFS3_CREATE_GUARDED:
> +			status = nfserr_exist;
> +			break;
> +		case NFS3_CREATE_EXCLUSIVE:
> +			if (d_inode(child)->i_mtime.tv_sec == v_mtime &&
> +			    d_inode(child)->i_atime.tv_sec == v_atime &&
> +			    d_inode(child)->i_size == 0) {
> +				break;
> +			}
> +			status = nfserr_exist;
> +		}
> +		goto out;
> +	}
> +
> +	if (!IS_POSIXACL(inode))
> +		iap->ia_mode &= ~current_umask();
> +
> +	host_err = vfs_create(&init_user_ns, inode, child, iap->ia_mode, true);
> +	if (host_err < 0) {
> +		status = nfserrno(host_err);
> +		goto out;
> +	}
> +
> +	/* A newly created file already has a file size of zero. */
> +	if ((iap->ia_valid & ATTR_SIZE) && (iap->ia_size == 0))
> +		iap->ia_valid &= ~ATTR_SIZE;
> +	if (argp->createmode == NFS3_CREATE_EXCLUSIVE) {
> +		iap->ia_valid = ATTR_MTIME | ATTR_ATIME |
> +				ATTR_MTIME_SET | ATTR_ATIME_SET;
> +		iap->ia_mtime.tv_sec = v_mtime;
> +		iap->ia_atime.tv_sec = v_atime;
> +		iap->ia_mtime.tv_nsec = 0;
> +		iap->ia_atime.tv_nsec = 0;
> +	}
> +
> +set_attr:
> +	status = nfsd_create_setattr(rqstp, fhp, resfhp, iap);
> +
> +out:
> +	fh_unlock(fhp);
> +	if (child && !IS_ERR(child))
> +		dput(child);
> +	fh_drop_write(fhp);
> +	return status;
> +}
> +
>  static __be32
>  nfsd3_proc_create(struct svc_rqst *rqstp)
>  {
> @@ -239,9 +356,7 @@ nfsd3_proc_create(struct svc_rqst *rqstp)
>  	dirfhp = fh_copy(&resp->dirfh, &argp->fh);
>  	newfhp = fh_init(&resp->fh, NFS3_FHSIZE);
>  
> -	resp->status = do_nfsd_create(rqstp, dirfhp, argp->name, argp->len,
> -				      &argp->attrs, newfhp, argp->createmode,
> -				      (u32 *)argp->verf, NULL, NULL);
> +	resp->status = nfsd3_create_file(rqstp, dirfhp, newfhp, argp);
>  	return rpc_success;
>  }
>  
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC 0/8] Make NFSv4 OPEN(CREATE) less brittle
  2022-04-20 18:28 [PATCH RFC 0/8] Make NFSv4 OPEN(CREATE) less brittle Chuck Lever
                   ` (7 preceding siblings ...)
  2022-04-20 18:29 ` [PATCH RFC 8/8] NFSD: Instantiate a struct file when creating a regular NFSv4 file Chuck Lever
@ 2022-04-20 19:24 ` J. Bruce Fields
  8 siblings, 0 replies; 13+ messages in thread
From: J. Bruce Fields @ 2022-04-20 19:24 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs, linux-fsdevel

On Wed, Apr 20, 2022 at 02:28:34PM -0400, Chuck Lever wrote:
> Attempt to address occasional reports of test failures caused by
> NFSv4 OPEN(CREATE) having failing internally after the target
> file object has been created.
> 
> The basic approach is to re-organize the NFSv4 OPEN code path so
> that common failure modes occur /before/ the call to vfs_create()
> rather than afterwards. In addition, the file is opened and created
> atomically so that another client can't race and de-permit the
> file just after it was created but before the server has opened it.
> 
> So far I haven't found any regressions. However I have not been
> able to reproduce the original failures.

I'll admit I don't know how big an impact those races have in practice.
But this has bugged me for a long time--thanks for finally tackling it.

I haven't reviewed it in detail but the basic idea looks good.

--b.

> Chuck Lever (8):
>       NFSD: Clean up nfsd3_proc_create()
>       NFSD: Avoid calling fh_drop_write() twice in do_nfsd_create()
>       NFSD: Refactor nfsd_create_setattr()
>       NFSD: Refactor NFSv3 CREATE
>       NFSD: Refactor NFSv4 OPEN(CREATE)
>       NFSD: Remove do_nfsd_create()
>       NFSD: Clean up nfsd_open_verified()
>       NFSD: Instantiate a struct file when creating a regular NFSv4 file
> 
> 
>  fs/nfsd/filecache.c |  51 +++++++--
>  fs/nfsd/filecache.h |   2 +
>  fs/nfsd/nfs3proc.c  | 141 +++++++++++++++++++++----
>  fs/nfsd/nfs4proc.c  | 197 +++++++++++++++++++++++++++++++++--
>  fs/nfsd/nfs4state.c |  16 ++-
>  fs/nfsd/vfs.c       | 245 ++++++++++----------------------------------
>  fs/nfsd/vfs.h       |  14 +--
>  fs/nfsd/xdr4.h      |   1 +
>  fs/open.c           |  44 ++++++++
>  include/linux/fs.h  |   2 +
>  10 files changed, 471 insertions(+), 242 deletions(-)
> 
> --
> Chuck Lever

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC 4/8] NFSD: Refactor NFSv3 CREATE
  2022-04-20 19:10   ` J. Bruce Fields
@ 2022-04-20 19:31     ` Chuck Lever III
  2022-04-21 16:37       ` Bruce Fields
  0 siblings, 1 reply; 13+ messages in thread
From: Chuck Lever III @ 2022-04-20 19:31 UTC (permalink / raw)
  To: Bruce Fields; +Cc: Linux NFS Mailing List, linux-fsdevel



> On Apr 20, 2022, at 3:10 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> 
> On Wed, Apr 20, 2022 at 02:29:00PM -0400, Chuck Lever wrote:
>> The NFSv3 CREATE and NFSv4 OPEN(CREATE) use cases are about to
>> diverge such that it makes sense to split do_nfsd_create() into one
>> version for NFSv3 and one for NFSv4.
>> 
>> As a first step, copy do_nfsd_create() to nfs3proc.c and remove
>> NFSv4-specific logic.
>> 
>> One immediate legibility benefit is that the logic for handling
>> NFSv3 createhow is now quite straightforward. NFSv4 createhow
>> has some subtleties that IMO do not belong in generic code.
> 
> That makes sense to me, though just eyeballing the two resulting
> functions, you end up with a *lot* of duplication.

About 200 lines. I would feel a little more agreeable to
this if we didn't already have a separate "create" function
for NFSv2 (ie, nfsd_proc_create).


> I wonder if it'd be
> possible to keep the two paths free of complications from each other
> while sharing more code, e.g. if there are logical blocks of code that
> could now be pulled out into common helpers.

I'm open to suggestions, but after the final patch in this
series, I don't see much else that is meaningful that can be
re-used by both. nfsd_create_setattr() was the one area that
seemed both common and heavyweight. The other areas are just
lightweight sanity checks.

And honestly, in this case, I don't think these code paths
are well-served by aggressive code de-duplication. The code
in each case is more readable and less brittle this way. The
NFSv4 code path now has some comments that mark the subtle
differences with NFSv3 exclusive create, and now you can't
break NFSv3 CREATE by making a change to NFSv4 OPEN, which
is far more complex.


--
Chuck Lever




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC 4/8] NFSD: Refactor NFSv3 CREATE
  2022-04-20 19:31     ` Chuck Lever III
@ 2022-04-21 16:37       ` Bruce Fields
  0 siblings, 0 replies; 13+ messages in thread
From: Bruce Fields @ 2022-04-21 16:37 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Linux NFS Mailing List, linux-fsdevel

On Wed, Apr 20, 2022 at 07:31:09PM +0000, Chuck Lever III wrote:
> > I wonder if it'd be
> > possible to keep the two paths free of complications from each other
> > while sharing more code, e.g. if there are logical blocks of code that
> > could now be pulled out into common helpers.
> 
> I'm open to suggestions, but after the final patch in this
> series, I don't see much else that is meaningful that can be
> re-used by both. nfsd_create_setattr() was the one area that
> seemed both common and heavyweight. The other areas are just
> lightweight sanity checks.
> 
> And honestly, in this case, I don't think these code paths
> are well-served by aggressive code de-duplication. The code
> in each case is more readable and less brittle this way. The
> NFSv4 code path now has some comments that mark the subtle
> differences with NFSv3 exclusive create, and now you can't
> break NFSv3 CREATE by making a change to NFSv4 OPEN, which
> is far more complex.

I can live with that.

Also, this passes all my usual regression tests, FWIW.

--b.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-04-21 16:37 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-20 18:28 [PATCH RFC 0/8] Make NFSv4 OPEN(CREATE) less brittle Chuck Lever
2022-04-20 18:28 ` [PATCH RFC 1/8] NFSD: Clean up nfsd3_proc_create() Chuck Lever
2022-04-20 18:28 ` [PATCH RFC 2/8] NFSD: Avoid calling fh_drop_write() twice in do_nfsd_create() Chuck Lever
2022-04-20 18:28 ` [PATCH RFC 3/8] NFSD: Refactor nfsd_create_setattr() Chuck Lever
2022-04-20 18:29 ` [PATCH RFC 4/8] NFSD: Refactor NFSv3 CREATE Chuck Lever
2022-04-20 19:10   ` J. Bruce Fields
2022-04-20 19:31     ` Chuck Lever III
2022-04-21 16:37       ` Bruce Fields
2022-04-20 18:29 ` [PATCH RFC 5/8] NFSD: Refactor NFSv4 OPEN(CREATE) Chuck Lever
2022-04-20 18:29 ` [PATCH RFC 6/8] NFSD: Remove do_nfsd_create() Chuck Lever
2022-04-20 18:29 ` [PATCH RFC 7/8] NFSD: Clean up nfsd_open_verified() Chuck Lever
2022-04-20 18:29 ` [PATCH RFC 8/8] NFSD: Instantiate a struct file when creating a regular NFSv4 file Chuck Lever
2022-04-20 19:24 ` [PATCH RFC 0/8] Make NFSv4 OPEN(CREATE) less brittle J. Bruce Fields

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).