* [PATCH 0/8] NFSD: clean up locking.
@ 2022-07-06  4:18 NeilBrown
  2022-07-06  4:18 ` [PATCH 6/8] NFSD: use explicit lock/unlock for directory ops NeilBrown
                   ` (8 more replies)
  0 siblings, 9 replies; 40+ messages in thread
From: NeilBrown @ 2022-07-06  4:18 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton; +Cc: linux-nfs

This series prepares NFSD to work with a proposed patch set which allows
updates to directories to happen in parallel.  That patch set changes the
way directories are locked, so the present series first cleans up some
locking in nfsd.

Specifically we remove fh_lock() and fh_unlock().
These functions are problematic for a few reasons.
- they are deliberately idempotent - setting or clearing a flag
  so that a second call does nothing.  This makes locking errors harder
  to make, but it results in code that looks wrong ...  and maybe
  sometimes is a little bit wrong.
  Early patches clean up several places where this idempotent nature of
  the functions is depended on, and so make the code clearer.

- they transparently call fh_fill_pre/post_attrs(), including at times
  when this is not necessary.  Having the calls only when necessary is
  marginally more efficient, and arguably makes the code clearer.
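
To illustrate the first point, here is a minimal userspace sketch (not
kernel code) of a flag-based idempotent lock wrapper.  The names toy_inode,
toy_fh, toy_fh_lock() and toy_fh_unlock() are hypothetical stand-ins; the
real fh_lock()/fh_unlock() differ in detail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode lock: tracks state and real unlocks. */
struct toy_inode {
	int held;      /* 1 while locked, 0 otherwise */
	int unlocks;   /* number of real unlock operations performed */
};

/* Hypothetical stand-in for a file handle carrying a "locked" flag. */
struct toy_fh {
	struct toy_inode *inode;
	bool locked;
};

/* Idempotent lock: a second call is silently ignored. */
static void toy_fh_lock(struct toy_fh *fh)
{
	if (fh->locked)
		return;
	assert(fh->inode->held == 0);
	fh->inode->held = 1;
	fh->locked = true;
}

/* Idempotent unlock: only the first call releases the lock. */
static void toy_fh_unlock(struct toy_fh *fh)
{
	if (!fh->locked)
		return;
	fh->inode->held = 0;
	fh->inode->unlocks++;
	fh->locked = false;
}
```

With wrappers like these, a caller that unlocks twice still works, which
is why such code can look wrong without failing.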

nfsd_lookup() currently always locks the directory, though often no lock
is needed.  So a patch in this series reduces this locking.

There is an awkward case that could still be further improved.
NFSv4 open needs to ensure the file is not renamed (or unlinked etc)
between the moment when the open succeeds, and a later moment when a
"lease" is attached to support a delegation.  The handling of this lock
is currently untidy, particularly when creating a file.
It would probably be better to take a lease immediately after
opening the file, and then discard it after deciding not to provide a
delegation.

I have run fstests and cthon tests on this, but I wouldn't be surprised
if there is a corner case that I've missed.

NeilBrown


---

NeilBrown (8):
      NFSD: drop rqstp arg to do_set_nfs4_acl()
      NFSD: change nfsd_create() to unlock directory before returning.
      NFSD: always drop directory lock in nfsd_unlink()
      NFSD: only call fh_unlock() once in nfsd_link()
      NFSD: reduce locking in nfsd_lookup()
      NFSD: use explicit lock/unlock for directory ops
      NFSD: use (un)lock_inode instead of fh_(un)lock for file operations
      NFSD: discard fh_locked flag and fh_lock/fh_unlock


 fs/nfsd/nfs2acl.c   |   6 +-
 fs/nfsd/nfs3acl.c   |   4 +-
 fs/nfsd/nfs3proc.c  |  21 ++---
 fs/nfsd/nfs4acl.c   |  19 ++---
 fs/nfsd/nfs4proc.c  | 106 +++++++++++++++---------
 fs/nfsd/nfs4state.c |   8 +-
 fs/nfsd/nfsfh.c     |   3 +-
 fs/nfsd/nfsfh.h     |  56 +------------
 fs/nfsd/nfsproc.c   |  14 ++--
 fs/nfsd/vfs.c       | 193 ++++++++++++++++++++++++++------------------
 fs/nfsd/vfs.h       |  19 +++--
 11 files changed, 238 insertions(+), 211 deletions(-)

--
Signature



* [PATCH 1/8] NFSD: drop rqstp arg to do_set_nfs4_acl()
  2022-07-06  4:18 [PATCH 0/8] NFSD: clean up locking NeilBrown
                   ` (3 preceding siblings ...)
  2022-07-06  4:18 ` [PATCH 2/8] NFSD: change nfsd_create() to unlock directory before returning NeilBrown
@ 2022-07-06  4:18 ` NeilBrown
  2022-07-06 13:17   ` Jeff Layton
  2022-07-06  4:18 ` [PATCH 4/8] NFSD: only call fh_unlock() once in nfsd_link() NeilBrown
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 40+ messages in thread
From: NeilBrown @ 2022-07-06  4:18 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton; +Cc: linux-nfs

do_set_nfs4_acl() only needs rqstp to pass to nfsd4_set_nfs4_acl().

The latter only needs the rqstp to pass to fh_verify().

In every case that do_set_nfs4_acl() is called, fh_verify() is not
needed.  It is only needed for filehandles received from the client; the
filehandles passed to do_set_nfs4_acl() have just been constructed by
the server, and so must be valid.

So we can change nfsd4_set_nfs4_acl() to only call fh_verify() if rqstp
is not NULL, and always pass NULL from do_set_nfs4_acl().
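
The "verify only when a request context is given" pattern can be sketched
in userspace as follows.  This is an illustration under assumptions, not
the kernel code: toy_rqst, toy_fh, toy_fh_verify() and toy_set_acl() are
hypothetical names.

```c
#include <stddef.h>

struct toy_rqst { int may_sattr; };   /* hypothetical request context */
struct toy_fh { int verified; };      /* hypothetical file handle */

static int toy_verify_calls;          /* counts calls, for illustration */

/* Hypothetical stand-in for fh_verify(): returns 0 on success. */
static int toy_fh_verify(struct toy_rqst *rqstp, struct toy_fh *fhp)
{
	toy_verify_calls++;
	if (!rqstp->may_sattr)
		return -1;
	fhp->verified = 1;
	return 0;
}

/* Verify only handles that arrive with a request; NULL means the
 * handle was built by the server and is trusted.
 */
static int toy_set_acl(struct toy_rqst *rqstp, struct toy_fh *fhp)
{
	if (rqstp) {
		int error = toy_fh_verify(rqstp, fhp);

		if (error)
			return error;
	}
	/* ... the ACL would be applied to fhp here ... */
	return 0;
}
```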

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/nfs4acl.c  |   12 +++++++-----
 fs/nfsd/nfs4proc.c |    9 ++++-----
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/nfsd/nfs4acl.c b/fs/nfsd/nfs4acl.c
index eaa3a0cf38f1..5c9b7e01e8ca 100644
--- a/fs/nfsd/nfs4acl.c
+++ b/fs/nfsd/nfs4acl.c
@@ -753,7 +753,7 @@ static int nfs4_acl_nfsv4_to_posix(struct nfs4_acl *acl,
 
 __be32
 nfsd4_set_nfs4_acl(struct svc_rqst *rqstp, struct svc_fh *fhp,
-		struct nfs4_acl *acl)
+		   struct nfs4_acl *acl)
 {
 	__be32 error;
 	int host_error;
@@ -762,10 +762,12 @@ nfsd4_set_nfs4_acl(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	struct posix_acl *pacl = NULL, *dpacl = NULL;
 	unsigned int flags = 0;
 
-	/* Get inode */
-	error = fh_verify(rqstp, fhp, 0, NFSD_MAY_SATTR);
-	if (error)
-		return error;
+	if (rqstp) {
+		/* Get inode */
+		error = fh_verify(rqstp, fhp, 0, NFSD_MAY_SATTR);
+		if (error)
+			return error;
+	}
 
 	dentry = fhp->fh_dentry;
 	inode = d_inode(dentry);
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 5af9f8d1feb6..60591ceb4985 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -163,12 +163,11 @@ is_create_with_attrs(struct nfsd4_open *open)
  * in the returned attr bitmap.
  */
 static void
-do_set_nfs4_acl(struct svc_rqst *rqstp, struct svc_fh *fhp,
-		struct nfs4_acl *acl, u32 *bmval)
+do_set_nfs4_acl(struct svc_fh *fhp, struct nfs4_acl *acl, u32 *bmval)
 {
 	__be32 status;
 
-	status = nfsd4_set_nfs4_acl(rqstp, fhp, acl);
+	status = nfsd4_set_nfs4_acl(NULL, fhp, acl);
 	if (status)
 		/*
 		 * We should probably fail the whole open at this point,
@@ -474,7 +473,7 @@ do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stru
 		goto out;
 
 	if (is_create_with_attrs(open) && open->op_acl != NULL)
-		do_set_nfs4_acl(rqstp, *resfh, open->op_acl, open->op_bmval);
+		do_set_nfs4_acl(*resfh, open->op_acl, open->op_bmval);
 
 	nfsd4_set_open_owner_reply_cache(cstate, open, *resfh);
 	accmode = NFSD_MAY_NOP;
@@ -861,7 +860,7 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		nfsd4_security_inode_setsecctx(&resfh, &create->cr_label, create->cr_bmval);
 
 	if (create->cr_acl != NULL)
-		do_set_nfs4_acl(rqstp, &resfh, create->cr_acl,
+		do_set_nfs4_acl(&resfh, create->cr_acl,
 				create->cr_bmval);
 
 	fh_unlock(&cstate->current_fh);




* [PATCH 2/8] NFSD: change nfsd_create() to unlock directory before returning.
  2022-07-06  4:18 [PATCH 0/8] NFSD: clean up locking NeilBrown
                   ` (2 preceding siblings ...)
  2022-07-06  4:18 ` [PATCH 7/8] NFSD: use (un)lock_inode instead of fh_(un)lock for file operations NeilBrown
@ 2022-07-06  4:18 ` NeilBrown
  2022-07-06 13:24   ` Jeff Layton
  2022-07-06 16:29   ` Chuck Lever III
  2022-07-06  4:18 ` [PATCH 1/8] NFSD: drop rqstp arg to do_set_nfs4_acl() NeilBrown
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 40+ messages in thread
From: NeilBrown @ 2022-07-06  4:18 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton; +Cc: linux-nfs

nfsd_create() usually exits with the directory still locked.  This
relies on other code to unlock the directory.  Planned future patches
will change how directory locking works so the unlock step may be less
trivial.  It is cleaner to have lock and unlock in the same function.

nfsd4_create() performs some extra changes after the creation and before
the unlock - setting security label and an ACL.  To allow for these to
still be done while locked, we create a function nfsd4_post_create() and
pass it to nfsd_create() when needed.
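
As a rough userspace sketch of this callback arrangement (hypothetical
names throughout: toy_dir, toy_fh, toy_ctx, toy_create(),
toy_post_create()), the creator takes the lock, runs the optional
callback while the lock is still held, and then unlocks in the same
function:

```c
#include <stddef.h>

struct toy_dir { int locked; };                   /* hypothetical directory */
struct toy_fh { int label_set_while_locked; };    /* hypothetical result handle */

typedef void (*toy_post_create_t)(struct toy_fh *fh, void *data);

/* Create an entry; run the optional callback before dropping the lock. */
static int toy_create(struct toy_dir *dir, struct toy_fh *res, int fail,
		      toy_post_create_t post_create, void *data)
{
	int err;

	dir->locked = 1;
	err = fail ? -1 : 0;    /* stand-in for the real creation step */
	if (!err && post_create)
		post_create(res, data);
	dir->locked = 0;        /* lock and unlock in the same function */
	return err;
}

/* Hypothetical caller context passed through the void pointer. */
struct toy_ctx { struct toy_dir *dir; };

/* Records whether the directory was locked when the callback ran. */
static void toy_post_create(struct toy_fh *fh, void *data)
{
	struct toy_ctx *ctx = data;

	fh->label_set_while_locked = ctx->dir->locked;
}
```

Callers that have no extra work to do simply pass NULL for both the
callback and its data.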

nfsd_symlink() DOES usually unlock the directory, but nfsd4_create() may
add a label or ACL - with the directory unlocked.  I don't think symlinks
have ACLs and don't know if they can have labels, so I don't know if
this is of any practical consequence.  For consistency nfsd_symlink() is
changed to accept the same callback and call it if given.

nfsd_symlink() didn't unlock the directory if lookup_one_len() gave an
error.  This is untidy and potentially confusing, and has now been
fixed.  It isn't a practical problem as an eventual fh_put() will unlock
if needed.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/nfs3proc.c |   11 ++++++-----
 fs/nfsd/nfs4proc.c |   38 ++++++++++++++++++++++++--------------
 fs/nfsd/nfsproc.c  |    5 +++--
 fs/nfsd/vfs.c      |   40 +++++++++++++++++++++++++++-------------
 fs/nfsd/vfs.h      |   11 ++++++++---
 5 files changed, 68 insertions(+), 37 deletions(-)

diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
index 981a3a7a6e16..38255365ef71 100644
--- a/fs/nfsd/nfs3proc.c
+++ b/fs/nfsd/nfs3proc.c
@@ -378,8 +378,8 @@ nfsd3_proc_mkdir(struct svc_rqst *rqstp)
 	fh_copy(&resp->dirfh, &argp->fh);
 	fh_init(&resp->fh, NFS3_FHSIZE);
 	resp->status = nfsd_create(rqstp, &resp->dirfh, argp->name, argp->len,
-				   &argp->attrs, S_IFDIR, 0, &resp->fh);
-	fh_unlock(&resp->dirfh);
+				   &argp->attrs, S_IFDIR, 0, &resp->fh,
+				   NULL, NULL);
 	return rpc_success;
 }
 
@@ -414,7 +414,8 @@ nfsd3_proc_symlink(struct svc_rqst *rqstp)
 	fh_copy(&resp->dirfh, &argp->ffh);
 	fh_init(&resp->fh, NFS3_FHSIZE);
 	resp->status = nfsd_symlink(rqstp, &resp->dirfh, argp->fname,
-				    argp->flen, argp->tname, &resp->fh);
+				    argp->flen, argp->tname, &resp->fh,
+				    NULL, NULL);
 	kfree(argp->tname);
 out:
 	return rpc_success;
@@ -453,8 +454,8 @@ nfsd3_proc_mknod(struct svc_rqst *rqstp)
 
 	type = nfs3_ftypes[argp->ftype];
 	resp->status = nfsd_create(rqstp, &resp->dirfh, argp->name, argp->len,
-				   &argp->attrs, type, rdev, &resp->fh);
-	fh_unlock(&resp->dirfh);
+				   &argp->attrs, type, rdev, &resp->fh,
+				   NULL, NULL);
 out:
 	return rpc_success;
 }
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 60591ceb4985..3279daab909d 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -780,6 +780,18 @@ nfsd4_commit(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 			     (__be32 *)commit->co_verf.data);
 }
 
+static void nfsd4_post_create(struct svc_fh *fh, void *vcreate)
+{
+	struct nfsd4_create *create = vcreate;
+
+	if (create->cr_label.len)
+		nfsd4_security_inode_setsecctx(fh, &create->cr_label,
+					       create->cr_bmval);
+
+	if (create->cr_acl != NULL)
+		do_set_nfs4_acl(fh, create->cr_acl, create->cr_bmval);
+}
+
 static __be32
 nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	     union nfsd4_op_u *u)
@@ -805,7 +817,8 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	case NF4LNK:
 		status = nfsd_symlink(rqstp, &cstate->current_fh,
 				      create->cr_name, create->cr_namelen,
-				      create->cr_data, &resfh);
+				      create->cr_data, &resfh,
+				      nfsd4_post_create, create);
 		break;
 
 	case NF4BLK:
@@ -816,7 +829,8 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 			goto out_umask;
 		status = nfsd_create(rqstp, &cstate->current_fh,
 				     create->cr_name, create->cr_namelen,
-				     &create->cr_iattr, S_IFBLK, rdev, &resfh);
+				     &create->cr_iattr, S_IFBLK, rdev, &resfh,
+				     nfsd4_post_create, create);
 		break;
 
 	case NF4CHR:
@@ -827,26 +841,30 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 			goto out_umask;
 		status = nfsd_create(rqstp, &cstate->current_fh,
 				     create->cr_name, create->cr_namelen,
-				     &create->cr_iattr, S_IFCHR, rdev, &resfh);
+				     &create->cr_iattr, S_IFCHR, rdev, &resfh,
+				     nfsd4_post_create, create);
 		break;
 
 	case NF4SOCK:
 		status = nfsd_create(rqstp, &cstate->current_fh,
 				     create->cr_name, create->cr_namelen,
-				     &create->cr_iattr, S_IFSOCK, 0, &resfh);
+				     &create->cr_iattr, S_IFSOCK, 0, &resfh,
+				     nfsd4_post_create, create);
 		break;
 
 	case NF4FIFO:
 		status = nfsd_create(rqstp, &cstate->current_fh,
 				     create->cr_name, create->cr_namelen,
-				     &create->cr_iattr, S_IFIFO, 0, &resfh);
+				     &create->cr_iattr, S_IFIFO, 0, &resfh,
+				     nfsd4_post_create, create);
 		break;
 
 	case NF4DIR:
 		create->cr_iattr.ia_valid &= ~ATTR_SIZE;
 		status = nfsd_create(rqstp, &cstate->current_fh,
 				     create->cr_name, create->cr_namelen,
-				     &create->cr_iattr, S_IFDIR, 0, &resfh);
+				     &create->cr_iattr, S_IFDIR, 0, &resfh,
+				     nfsd4_post_create, create);
 		break;
 
 	default:
@@ -856,14 +874,6 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	if (status)
 		goto out;
 
-	if (create->cr_label.len)
-		nfsd4_security_inode_setsecctx(&resfh, &create->cr_label, create->cr_bmval);
-
-	if (create->cr_acl != NULL)
-		do_set_nfs4_acl(&resfh, create->cr_acl,
-				create->cr_bmval);
-
-	fh_unlock(&cstate->current_fh);
 	set_change_info(&create->cr_cinfo, &cstate->current_fh);
 	fh_dup2(&cstate->current_fh, &resfh);
 out:
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
index fcdab8a8a41f..a25b8e321662 100644
--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -493,7 +493,7 @@ nfsd_proc_symlink(struct svc_rqst *rqstp)
 
 	fh_init(&newfh, NFS_FHSIZE);
 	resp->status = nfsd_symlink(rqstp, &argp->ffh, argp->fname, argp->flen,
-				    argp->tname, &newfh);
+				    argp->tname, &newfh, NULL, NULL);
 
 	kfree(argp->tname);
 	fh_put(&argp->ffh);
@@ -522,7 +522,8 @@ nfsd_proc_mkdir(struct svc_rqst *rqstp)
 	argp->attrs.ia_valid &= ~ATTR_SIZE;
 	fh_init(&resp->fh, NFS_FHSIZE);
 	resp->status = nfsd_create(rqstp, &argp->fh, argp->name, argp->len,
-				   &argp->attrs, S_IFDIR, 0, &resp->fh);
+				   &argp->attrs, S_IFDIR, 0, &resp->fh,
+				   NULL, NULL);
 	fh_put(&argp->fh);
 	if (resp->status != nfs_ok)
 		goto out;
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index d79db56475d4..1e7ca39e8a49 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1366,8 +1366,10 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp,
  */
 __be32
 nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
-		char *fname, int flen, struct iattr *iap,
-		int type, dev_t rdev, struct svc_fh *resfhp)
+	    char *fname, int flen, struct iattr *iap,
+	    int type, dev_t rdev, struct svc_fh *resfhp,
+	    void (*post_create)(struct svc_fh *fh, void *data),
+	    void *data)
 {
 	struct dentry	*dentry, *dchild = NULL;
 	__be32		err;
@@ -1389,8 +1391,10 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	fh_lock_nested(fhp, I_MUTEX_PARENT);
 	dchild = lookup_one_len(fname, dentry, flen);
 	host_err = PTR_ERR(dchild);
-	if (IS_ERR(dchild))
-		return nfserrno(host_err);
+	if (IS_ERR(dchild)) {
+		err = nfserrno(host_err);
+		goto out_unlock;
+	}
 	err = fh_compose(resfhp, fhp->fh_export, dchild, fhp);
 	/*
 	 * We unconditionally drop our ref to dchild as fh_compose will have
@@ -1398,9 +1402,14 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	 */
 	dput(dchild);
 	if (err)
-		return err;
-	return nfsd_create_locked(rqstp, fhp, fname, flen, iap, type,
-					rdev, resfhp);
+		goto out_unlock;
+	err = nfsd_create_locked(rqstp, fhp, fname, flen, iap, type,
+				 rdev, resfhp);
+	if (!err && post_create)
+		post_create(resfhp, data);
+out_unlock:
+	fh_unlock(fhp);
+	return err;
 }
 
 /*
@@ -1447,9 +1456,11 @@ nfsd_readlink(struct svc_rqst *rqstp, struct svc_fh *fhp, char *buf, int *lenp)
  */
 __be32
 nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
-				char *fname, int flen,
-				char *path,
-				struct svc_fh *resfhp)
+	     char *fname, int flen,
+	     char *path,
+	     struct svc_fh *resfhp,
+	     void (*post_create)(struct svc_fh *fh, void *data),
+	     void *data)
 {
 	struct dentry	*dentry, *dnew;
 	__be32		err, cerr;
@@ -1474,12 +1485,12 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	dentry = fhp->fh_dentry;
 	dnew = lookup_one_len(fname, dentry, flen);
 	host_err = PTR_ERR(dnew);
-	if (IS_ERR(dnew))
+	if (IS_ERR(dnew)) {
+		fh_unlock(fhp);
 		goto out_nfserr;
-
+	}
 	host_err = vfs_symlink(&init_user_ns, d_inode(dentry), dnew, path);
 	err = nfserrno(host_err);
-	fh_unlock(fhp);
 	if (!err)
 		err = nfserrno(commit_metadata(fhp));
 
@@ -1488,6 +1499,9 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	cerr = fh_compose(resfhp, fhp->fh_export, dnew, fhp);
 	dput(dnew);
 	if (err==0) err = cerr;
+	if (!err && post_create)
+		post_create(resfhp, data);
+	fh_unlock(fhp);
 out:
 	return err;
 
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index 26347d76f44a..9f4fd3060200 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -66,8 +66,10 @@ __be32		nfsd_create_locked(struct svc_rqst *, struct svc_fh *,
 				char *name, int len, struct iattr *attrs,
 				int type, dev_t rdev, struct svc_fh *res);
 __be32		nfsd_create(struct svc_rqst *, struct svc_fh *,
-				char *name, int len, struct iattr *attrs,
-				int type, dev_t rdev, struct svc_fh *res);
+			    char *name, int len, struct iattr *attrs,
+			    int type, dev_t rdev, struct svc_fh *res,
+			    void (*post_create)(struct svc_fh *fh, void *data),
+			    void *data);
 __be32		nfsd_access(struct svc_rqst *, struct svc_fh *, u32 *, u32 *);
 __be32		nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp,
 				struct svc_fh *resfhp, struct iattr *iap);
@@ -111,7 +113,10 @@ __be32		nfsd_readlink(struct svc_rqst *, struct svc_fh *,
 				char *, int *);
 __be32		nfsd_symlink(struct svc_rqst *, struct svc_fh *,
 				char *name, int len, char *path,
-				struct svc_fh *res);
+				struct svc_fh *res,
+				void (*post_create)(struct svc_fh *fh,
+						    void *data),
+				void *data);
 __be32		nfsd_link(struct svc_rqst *, struct svc_fh *,
 				char *, int, struct svc_fh *);
 ssize_t		nfsd_copy_file_range(struct file *, u64,




* [PATCH 3/8] NFSD: always drop directory lock in nfsd_unlink()
  2022-07-06  4:18 [PATCH 0/8] NFSD: clean up locking NeilBrown
                   ` (5 preceding siblings ...)
  2022-07-06  4:18 ` [PATCH 4/8] NFSD: only call fh_unlock() once in nfsd_link() NeilBrown
@ 2022-07-06  4:18 ` NeilBrown
  2022-07-06 13:30   ` Jeff Layton
  2022-07-06  4:18 ` [PATCH 5/8] NFSD: reduce locking in nfsd_lookup() NeilBrown
  2022-07-06 16:29 ` [PATCH 0/8] NFSD: clean up locking Chuck Lever III
  8 siblings, 1 reply; 40+ messages in thread
From: NeilBrown @ 2022-07-06  4:18 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton; +Cc: linux-nfs

Some error paths in nfsd_unlink() allow it to exit without unlocking the
directory.  This is not a problem in practice as the directory will be
unlocked by an eventual fh_put(), but it is untidy and potentially
confusing.

So always unlock the directory before nfsd_unlink() returns.  This allows
us to remove all the fh_unlock() calls that are immediately after
nfsd_unlink() calls.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/nfs3proc.c |    2 --
 fs/nfsd/nfs4proc.c |    4 +---
 fs/nfsd/vfs.c      |    7 +++++--
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
index 38255365ef71..ad7941001106 100644
--- a/fs/nfsd/nfs3proc.c
+++ b/fs/nfsd/nfs3proc.c
@@ -478,7 +478,6 @@ nfsd3_proc_remove(struct svc_rqst *rqstp)
 	fh_copy(&resp->fh, &argp->fh);
 	resp->status = nfsd_unlink(rqstp, &resp->fh, -S_IFDIR,
 				   argp->name, argp->len);
-	fh_unlock(&resp->fh);
 	return rpc_success;
 }
 
@@ -499,7 +498,6 @@ nfsd3_proc_rmdir(struct svc_rqst *rqstp)
 	fh_copy(&resp->fh, &argp->fh);
 	resp->status = nfsd_unlink(rqstp, &resp->fh, S_IFDIR,
 				   argp->name, argp->len);
-	fh_unlock(&resp->fh);
 	return rpc_success;
 }
 
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 3279daab909d..4737019738ab 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1052,10 +1052,8 @@ nfsd4_remove(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		return nfserr_grace;
 	status = nfsd_unlink(rqstp, &cstate->current_fh, 0,
 			     remove->rm_name, remove->rm_namelen);
-	if (!status) {
-		fh_unlock(&cstate->current_fh);
+	if (!status)
 		set_change_info(&remove->rm_cinfo, &cstate->current_fh);
-	}
 	return status;
 }
 
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 1e7ca39e8a49..3f4579f5775c 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1762,12 +1762,12 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
 	rdentry = lookup_one_len(fname, dentry, flen);
 	host_err = PTR_ERR(rdentry);
 	if (IS_ERR(rdentry))
-		goto out_drop_write;
+		goto out_unlock;
 
 	if (d_really_is_negative(rdentry)) {
 		dput(rdentry);
 		host_err = -ENOENT;
-		goto out_drop_write;
+		goto out_unlock;
 	}
 	rinode = d_inode(rdentry);
 	ihold(rinode);
@@ -1805,6 +1805,9 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
 	}
 out:
 	return err;
+out_unlock:
+	fh_unlock(fhp);
+	goto out_drop_write;
 }
 
 /*




* [PATCH 4/8] NFSD: only call fh_unlock() once in nfsd_link()
  2022-07-06  4:18 [PATCH 0/8] NFSD: clean up locking NeilBrown
                   ` (4 preceding siblings ...)
  2022-07-06  4:18 ` [PATCH 1/8] NFSD: drop rqstp arg to do_set_nfs4_acl() NeilBrown
@ 2022-07-06  4:18 ` NeilBrown
  2022-07-06 13:31   ` Jeff Layton
  2022-07-06 16:29   ` Chuck Lever III
  2022-07-06  4:18 ` [PATCH 3/8] NFSD: always drop directory lock in nfsd_unlink() NeilBrown
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 40+ messages in thread
From: NeilBrown @ 2022-07-06  4:18 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton; +Cc: linux-nfs

On non-error paths, nfsd_link() calls fh_unlock() twice.  This is safe
because fh_unlock() records that the unlock has been done and doesn't
repeat it.
However it makes the code a little confusing and interferes with changes
that are planned for directory locking.

So rearrange the code to ensure fh_unlock() is called exactly once if
fh_lock() was called.
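
The general shape of "unlock exactly once, and only if locked" can be
sketched with goto labels as below.  This is a simplified userspace
illustration with hypothetical names (toy_dir, toy_lock(), toy_unlock(),
toy_link()); it does not reproduce the exact control flow of nfsd_link().

```c
/* Hypothetical directory with a counter of unlock calls. */
struct toy_dir { int held; int unlock_calls; };

static void toy_lock(struct toy_dir *d)
{
	d->held = 1;
}

static void toy_unlock(struct toy_dir *d)
{
	d->held = 0;
	d->unlock_calls++;
}

/* fail_at: 0 = success, 1 = fail before locking, 2 = fail after locking */
static int toy_link(struct toy_dir *d, int fail_at)
{
	int err = 0;

	if (fail_at == 1) {
		err = -1;
		goto out;            /* lock never taken, so no unlock */
	}
	toy_lock(d);
	if (fail_at == 2) {
		err = -2;
		goto out_unlock;
	}
	/* ... the link operation itself would go here ... */
out_unlock:
	toy_unlock(d);
out:
	return err;
}
```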

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/vfs.c |   18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 3f4579f5775c..4916c29af0fa 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1551,8 +1551,10 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
 
 	dnew = lookup_one_len(name, ddir, len);
 	host_err = PTR_ERR(dnew);
-	if (IS_ERR(dnew))
-		goto out_nfserr;
+	if (IS_ERR(dnew)) {
+		err = nfserrno(host_err);
+		goto out_unlock;
+	}
 
 	dold = tfhp->fh_dentry;
 
@@ -1571,17 +1573,17 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
 		else
 			err = nfserrno(host_err);
 	}
-out_dput:
 	dput(dnew);
-out_unlock:
-	fh_unlock(ffhp);
+out_drop_write:
 	fh_drop_write(tfhp);
 out:
 	return err;
 
-out_nfserr:
-	err = nfserrno(host_err);
-	goto out_unlock;
+out_dput:
+	dput(dnew);
+out_unlock:
+	fh_unlock(ffhp);
+	goto out_drop_write;
 }
 
 static void




* [PATCH 5/8] NFSD: reduce locking in nfsd_lookup()
  2022-07-06  4:18 [PATCH 0/8] NFSD: clean up locking NeilBrown
                   ` (6 preceding siblings ...)
  2022-07-06  4:18 ` [PATCH 3/8] NFSD: always drop directory lock in nfsd_unlink() NeilBrown
@ 2022-07-06  4:18 ` NeilBrown
  2022-07-06 13:47   ` Jeff Layton
  2022-07-06 16:29   ` Chuck Lever III
  2022-07-06 16:29 ` [PATCH 0/8] NFSD: clean up locking Chuck Lever III
  8 siblings, 2 replies; 40+ messages in thread
From: NeilBrown @ 2022-07-06  4:18 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton; +Cc: linux-nfs

nfsd_lookup() takes an exclusive lock on the parent inode, but many
callers don't want the lock and may not need to lock at all if the
result is in the dcache.

Change nfsd_lookup() to be passed a bool flag.
If false, don't take the lock.
If true, do take an exclusive lock, and return with it held if
successful.
If nfsd_lookup() returns an error, the lock WILL NOT be held.
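
The contract described above can be sketched in userspace as follows.
The names toy_dir and toy_lookup() are hypothetical, and the "directory"
holds a single entry purely for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical directory: one entry and a lock-state flag. */
struct toy_dir { int held; const char *only_entry; };

/* Returns 0 if name exists, -1 otherwise.
 * With lock == true the directory stays locked on success only.
 */
static int toy_lookup(struct toy_dir *d, const char *name, bool lock)
{
	int err;

	if (lock)
		d->held = 1;
	err = strcmp(name, d->only_entry) == 0 ? 0 : -1;
	if (err && lock)
		d->held = 0;     /* never return an error with the lock held */
	return err;
}
```

A caller that passes true must therefore unlock only when the lookup
succeeded.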

Only nfsd4_open() requests the lock to be held, and does so to block
rename until it decides whether to return a delegation.

NOTE: when nfsd4_open() creates a file, the directory does *NOT* remain
  locked and never has.  So it is possible (though unlikely) for the
  newly created file to be renamed before a delegation is handed out,
  and that would be bad.  This patch does *NOT* fix that, but *DOES*
  take the directory lock immediately after creating the file, which
  reduces the size of the window and ensures that the lock is held
  consistently.  More work is needed to guarantee no rename happens
  before the delegation.

NOTE-2: NFSv4 requires directory changeinfo for OPEN even when a create
  wasn't requested and no change happened.  Now that nfsd_lookup()
  doesn't use fh_lock(), we need explicit fh_fill_pre/post_attrs()
  in the non-create branch of do_open_lookup().

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/nfs3proc.c |    2 +-
 fs/nfsd/nfs4proc.c |   51 ++++++++++++++++++++++++++++------------
 fs/nfsd/nfsproc.c  |    2 +-
 fs/nfsd/vfs.c      |   66 +++++++++++++++++++++++++++++++++++-----------------
 fs/nfsd/vfs.h      |    8 ++++--
 5 files changed, 88 insertions(+), 41 deletions(-)

diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
index ad7941001106..3a67d0afb885 100644
--- a/fs/nfsd/nfs3proc.c
+++ b/fs/nfsd/nfs3proc.c
@@ -96,7 +96,7 @@ nfsd3_proc_lookup(struct svc_rqst *rqstp)
 
 	resp->status = nfsd_lookup(rqstp, &resp->dirfh,
 				   argp->name, argp->len,
-				   &resp->fh);
+				   &resp->fh, false);
 	return rpc_success;
 }
 
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 4737019738ab..6ec22c69cbec 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -414,7 +414,8 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
 }
 
 static __be32
-do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, struct nfsd4_open *open, struct svc_fh **resfh)
+do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
+	       struct nfsd4_open *open, struct svc_fh **resfh)
 {
 	struct svc_fh *current_fh = &cstate->current_fh;
 	int accmode;
@@ -441,11 +442,18 @@ do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stru
 		 * yes          | no     | GUARDED4        | GUARDED4
 		 * yes          | yes    | GUARDED4        | GUARDED4
 		 */
-
 		current->fs->umask = open->op_umask;
 		status = nfsd4_create_file(rqstp, current_fh, *resfh, open);
 		current->fs->umask = 0;
 
+		if (!status)
+			/* We really want to hold the lock from before the
+			 * create to ensure no rename happens, but that
+			 * needs more work...
+			 */
+			inode_lock_nested(current_fh->fh_dentry->d_inode,
+					  I_MUTEX_PARENT);
+
 		if (!status && open->op_label.len)
 			nfsd4_security_inode_setsecctx(*resfh, &open->op_label, open->op_bmval);
 
@@ -457,17 +465,25 @@ do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stru
 		if (nfsd4_create_is_exclusive(open->op_createmode) && status == 0)
 			open->op_bmval[1] |= (FATTR4_WORD1_TIME_ACCESS |
 						FATTR4_WORD1_TIME_MODIFY);
-	} else
-		/*
-		 * Note this may exit with the parent still locked.
-		 * We will hold the lock until nfsd4_open's final
-		 * lookup, to prevent renames or unlinks until we've had
-		 * a chance to an acquire a delegation if appropriate.
+	} else {
+		/* We want to keep the directory locked until we've had a chance
+		 * to acquire a delegation if appropriate, so request that
+		 * nfsd_lookup() hold on to the lock.
 		 */
 		status = nfsd_lookup(rqstp, current_fh,
-				     open->op_fname, open->op_fnamelen, *resfh);
+				     open->op_fname, open->op_fnamelen, *resfh,
+				     true);
+		if (!status) {
+			/* NFSv4 protocol requires change attributes even though
+			 * no change happened.
+			 */
+			fh_fill_pre_attrs(current_fh);
+			fh_fill_post_attrs(current_fh);
+		}
+	}
 	if (status)
-		goto out;
+		return status;
+
 	status = nfsd_check_obj_isreg(*resfh);
 	if (status)
 		goto out;
@@ -483,6 +499,8 @@ do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stru
 	status = do_open_permission(rqstp, *resfh, open, accmode);
 	set_change_info(&open->op_cinfo, current_fh);
 out:
+	if (status)
+		inode_unlock(current_fh->fh_dentry->d_inode);
 	return status;
 }
 
@@ -540,6 +558,7 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	struct net *net = SVC_NET(rqstp);
 	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 	bool reclaim = false;
+	bool locked = false;
 
 	dprintk("NFSD: nfsd4_open filename %.*s op_openowner %p\n",
 		(int)open->op_fnamelen, open->op_fname,
@@ -604,6 +623,7 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		status = do_open_lookup(rqstp, cstate, open, &resfh);
 		if (status)
 			goto out;
+		locked = true;
 		break;
 	case NFS4_OPEN_CLAIM_PREVIOUS:
 		status = nfs4_check_open_reclaim(cstate->clp);
@@ -639,6 +659,8 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		fput(open->op_filp);
 		open->op_filp = NULL;
 	}
+	if (locked)
+		inode_unlock(cstate->current_fh.fh_dentry->d_inode);
 	if (resfh && resfh != &cstate->current_fh) {
 		fh_dup2(&cstate->current_fh, resfh);
 		fh_put(resfh);
@@ -933,7 +955,7 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
 		return nfserr_noent;
 	}
 	fh_put(&tmp_fh);
-	return nfsd_lookup(rqstp, fh, "..", 2, fh);
+	return nfsd_lookup(rqstp, fh, "..", 2, fh, false);
 }
 
 static __be32
@@ -949,7 +971,7 @@ nfsd4_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 {
 	return nfsd_lookup(rqstp, &cstate->current_fh,
 			   u->lookup.lo_name, u->lookup.lo_len,
-			   &cstate->current_fh);
+			   &cstate->current_fh, false);
 }
 
 static __be32
@@ -1089,11 +1111,10 @@ nfsd4_secinfo(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	if (err)
 		return err;
 	err = nfsd_lookup_dentry(rqstp, &cstate->current_fh,
-				    secinfo->si_name, secinfo->si_namelen,
-				    &exp, &dentry);
+				 secinfo->si_name, secinfo->si_namelen,
+				 &exp, &dentry, false);
 	if (err)
 		return err;
-	fh_unlock(&cstate->current_fh);
 	if (d_really_is_negative(dentry)) {
 		exp_put(exp);
 		err = nfserr_noent;
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
index a25b8e321662..ed24fae09517 100644
--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -133,7 +133,7 @@ nfsd_proc_lookup(struct svc_rqst *rqstp)
 
 	fh_init(&resp->fh, NFS_FHSIZE);
 	resp->status = nfsd_lookup(rqstp, &argp->fh, argp->name, argp->len,
-				   &resp->fh);
+				   &resp->fh, false);
 	fh_put(&argp->fh);
 	if (resp->status != nfs_ok)
 		goto out;
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 4916c29af0fa..8e050c6d112a 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -172,7 +172,8 @@ int nfsd_mountpoint(struct dentry *dentry, struct svc_export *exp)
 __be32
 nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		   const char *name, unsigned int len,
-		   struct svc_export **exp_ret, struct dentry **dentry_ret)
+		   struct svc_export **exp_ret, struct dentry **dentry_ret,
+		   bool locked)
 {
 	struct svc_export	*exp;
 	struct dentry		*dparent;
@@ -199,27 +200,31 @@ nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp,
 				goto out_nfserr;
 		}
 	} else {
-		/*
-		 * In the nfsd4_open() case, this may be held across
-		 * subsequent open and delegation acquisition which may
-		 * need to take the child's i_mutex:
-		 */
-		fh_lock_nested(fhp, I_MUTEX_PARENT);
-		dentry = lookup_one_len(name, dparent, len);
+		if (locked)
+			dentry = lookup_one_len(name, dparent, len);
+		else
+			dentry = lookup_one_len_unlocked(name, dparent, len);
 		host_err = PTR_ERR(dentry);
 		if (IS_ERR(dentry))
 			goto out_nfserr;
 		if (nfsd_mountpoint(dentry, exp)) {
 			/*
-			 * We don't need the i_mutex after all.  It's
-			 * still possible we could open this (regular
-			 * files can be mountpoints too), but the
-			 * i_mutex is just there to prevent renames of
-			 * something that we might be about to delegate,
-			 * and a mountpoint won't be renamed:
+			 * nfsd_cross_mnt() may wait for an upcall
+			 * to userspace, and holding i_sem across that
+			 * invites the possibility of a deadlock.
+			 * We don't really need the lock on the parent
+			 * of a mount point as we only need it to guard
+			 * against a rename before we get a lease for a
+			 * delegation.
+			 * So just drop the i_sem and reclaim it.
 			 */
-			fh_unlock(fhp);
-			if ((host_err = nfsd_cross_mnt(rqstp, &dentry, &exp))) {
+			if (locked)
+				inode_unlock(dparent->d_inode);
+			host_err = nfsd_cross_mnt(rqstp, &dentry, &exp);
+			if (locked)
+				inode_lock_nested(dparent->d_inode,
+						  I_MUTEX_PARENT);
+			if (host_err) {
 				dput(dentry);
 				goto out_nfserr;
 			}
@@ -234,7 +239,17 @@ nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	return nfserrno(host_err);
 }
 
-/*
+/**
+ * nfsd_lookup - look up a single path component for nfsd
+ *
+ * @rqstp:   the request context
+ * @fhp:     the file handle of the directory
+ * @name:    the component name, or %NULL to look up parent
+ * @len:     length of name to examine
+ * @resfh:   pointer to pre-initialised filehandle to hold result.
+ * @lock:    if true, lock directory during lookup and keep it locked
+ *           if there is no error.
+ *
  * Look up one component of a pathname.
  * N.B. After this call _both_ fhp and resfh need an fh_put
  *
@@ -244,11 +259,15 @@ nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp,
  * returned. Otherwise the covered directory is returned.
  * NOTE: this mountpoint crossing is not supported properly by all
  *   clients and is explicitly disallowed for NFSv3
- *      NeilBrown <neilb@cse.unsw.edu.au>
+ *
+ * Only nfsd4_open() calls this with @lock set.  It does so to block
+ * renames/unlinks before it possibly gets a lease to provide a
+ * delegation.
  */
 __be32
 nfsd_lookup(struct svc_rqst *rqstp, struct svc_fh *fhp, const char *name,
-				unsigned int len, struct svc_fh *resfh)
+	    unsigned int len, struct svc_fh *resfh,
+	    bool lock)
 {
 	struct svc_export	*exp;
 	struct dentry		*dentry;
@@ -257,9 +276,11 @@ nfsd_lookup(struct svc_rqst *rqstp, struct svc_fh *fhp, const char *name,
 	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
 	if (err)
 		return err;
-	err = nfsd_lookup_dentry(rqstp, fhp, name, len, &exp, &dentry);
+	if (lock)
+		inode_lock_nested(fhp->fh_dentry->d_inode, I_MUTEX_PARENT);
+	err = nfsd_lookup_dentry(rqstp, fhp, name, len, &exp, &dentry, lock);
 	if (err)
-		return err;
+		goto out_err;
 	err = check_nfsd_access(exp, rqstp);
 	if (err)
 		goto out;
@@ -273,6 +294,9 @@ nfsd_lookup(struct svc_rqst *rqstp, struct svc_fh *fhp, const char *name,
 out:
 	dput(dentry);
 	exp_put(exp);
+out_err:
+	if (err && lock)
+		inode_unlock(fhp->fh_dentry->d_inode);
 	return err;
 }
 
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index 9f4fd3060200..290788f007d4 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -45,10 +45,12 @@ typedef int (*nfsd_filldir_t)(void *, const char *, int, loff_t, u64, unsigned);
 int		nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp,
 		                struct svc_export **expp);
 __be32		nfsd_lookup(struct svc_rqst *, struct svc_fh *,
-				const char *, unsigned int, struct svc_fh *);
+			    const char *, unsigned int, struct svc_fh *,
+			    bool);
 __be32		 nfsd_lookup_dentry(struct svc_rqst *, struct svc_fh *,
-				const char *, unsigned int,
-				struct svc_export **, struct dentry **);
+				    const char *, unsigned int,
+				    struct svc_export **, struct dentry **,
+				    bool);
 __be32		nfsd_setattr(struct svc_rqst *, struct svc_fh *,
 				struct iattr *, int, time64_t);
 int nfsd_mountpoint(struct dentry *, struct svc_export *);



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 6/8] NFSD: use explicit lock/unlock for directory ops
  2022-07-06  4:18 [PATCH 0/8] NFSD: clean up locking NeilBrown
@ 2022-07-06  4:18 ` NeilBrown
  2022-07-06 14:05   ` Jeff Layton
                     ` (2 more replies)
  2022-07-06  4:18 ` [PATCH 8/8] NFSD: discard fh_locked flag and fh_lock/fh_unlock NeilBrown
                   ` (7 subsequent siblings)
  8 siblings, 3 replies; 40+ messages in thread
From: NeilBrown @ 2022-07-06  4:18 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton; +Cc: linux-nfs

When creating or unlinking a name in a directory, use explicit
inode_lock_nested() instead of fh_lock(), and explicit calls to
fh_fill_pre_attrs() and fh_fill_post_attrs().  This is already done for
renames.

Also move the 'fill' calls closer to the operation that might change the
attributes.  This way they are avoided on some error paths.

Having the locking explicit will simplify proposed future changes to
locking for directories.  It also makes it easily visible exactly where
pre/post attributes are used - not all callers of fh_lock() actually
need the pre/post attributes.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/nfs3proc.c |    6 ++++--
 fs/nfsd/nfs4proc.c |    6 ++++--
 fs/nfsd/nfsproc.c  |    7 ++++---
 fs/nfsd/vfs.c      |   30 +++++++++++++++++++-----------
 4 files changed, 31 insertions(+), 18 deletions(-)
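
As a rough userspace illustration of the call-site pattern this patch applies (not kernel code): the directory lock is taken explicitly, and the pre/post attributes are sampled only around the step that modifies the directory, so a failed lookup skips them. Every model_* name below is a hypothetical stand-in, and the lock is a plain flag rather than a real rwsem.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the parent directory's struct inode. */
struct model_inode {
	bool locked;	/* models the inode lock being held */
	long change;	/* models the change attribute */
};

/* Hypothetical stand-in for the parts of struct svc_fh used here. */
struct model_fh {
	struct model_inode *dir;
	bool pre_saved, post_saved;
	long pre_change, post_change;
};

static void model_inode_lock(struct model_inode *inode)
{
	assert(!inode->locked);	/* explicit locking must be balanced */
	inode->locked = true;
}

static void model_inode_unlock(struct model_inode *inode)
{
	assert(inode->locked);
	inode->locked = false;
}

static void model_fill_pre_attrs(struct model_fh *fhp)
{
	fhp->pre_change = fhp->dir->change;
	fhp->pre_saved = true;
}

static void model_fill_post_attrs(struct model_fh *fhp)
{
	fhp->post_change = fhp->dir->change;
	fhp->post_saved = true;
}

/* Returns 0 on success, -1 when the (simulated) lookup fails. */
int model_create(struct model_fh *fhp, bool lookup_fails)
{
	int err = 0;

	model_inode_lock(fhp->dir);
	if (lookup_fails) {
		err = -1;
		goto out_unlock;	/* no attributes sampled on this path */
	}
	model_fill_pre_attrs(fhp);
	fhp->dir->change++;		/* the step that modifies the directory */
	model_fill_post_attrs(fhp);
out_unlock:
	model_inode_unlock(fhp->dir);
	return err;
}
```

In this model the lock and unlock sit in the same function, and the attribute sampling is visible at the call site instead of being hidden inside a lock helper.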

diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
index 3a67d0afb885..9629517344ff 100644
--- a/fs/nfsd/nfs3proc.c
+++ b/fs/nfsd/nfs3proc.c
@@ -254,7 +254,7 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (host_err)
 		return nfserrno(host_err);
 
-	fh_lock_nested(fhp, I_MUTEX_PARENT);
+	inode_lock_nested(inode, I_MUTEX_PARENT);
 
 	child = lookup_one_len(argp->name, parent, argp->len);
 	if (IS_ERR(child)) {
@@ -312,11 +312,13 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (!IS_POSIXACL(inode))
 		iap->ia_mode &= ~current_umask();
 
+	fh_fill_pre_attrs(fhp);
 	host_err = vfs_create(&init_user_ns, inode, child, iap->ia_mode, true);
 	if (host_err < 0) {
 		status = nfserrno(host_err);
 		goto out;
 	}
+	fh_fill_post_attrs(fhp);
 
 	/* A newly created file already has a file size of zero. */
 	if ((iap->ia_valid & ATTR_SIZE) && (iap->ia_size == 0))
@@ -334,7 +336,7 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	status = nfsd_create_setattr(rqstp, fhp, resfhp, iap);
 
 out:
-	fh_unlock(fhp);
+	inode_unlock(inode);
 	if (child && !IS_ERR(child))
 		dput(child);
 	fh_drop_write(fhp);
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 6ec22c69cbec..242f059e6788 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -306,7 +306,7 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (host_err)
 		return nfserrno(host_err);
 
-	fh_lock_nested(fhp, I_MUTEX_PARENT);
+	inode_lock_nested(inode, I_MUTEX_PARENT);
 
 	child = lookup_one_len(open->op_fname, parent, open->op_fnamelen);
 	if (IS_ERR(child)) {
@@ -385,10 +385,12 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (!IS_POSIXACL(inode))
 		iap->ia_mode &= ~current_umask();
 
+	fh_fill_pre_attrs(fhp);
 	status = nfsd4_vfs_create(fhp, child, open);
 	if (status != nfs_ok)
 		goto out;
 	open->op_created = true;
+	fh_fill_post_attrs(fhp);
 
 	/* A newly created file already has a file size of zero. */
 	if ((iap->ia_valid & ATTR_SIZE) && (iap->ia_size == 0))
@@ -406,7 +408,7 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	status = nfsd_create_setattr(rqstp, fhp, resfhp, iap);
 
 out:
-	fh_unlock(fhp);
+	inode_unlock(inode);
 	if (child && !IS_ERR(child))
 		dput(child);
 	fh_drop_write(fhp);
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
index ed24fae09517..427c404bc52b 100644
--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -285,7 +285,7 @@ nfsd_proc_create(struct svc_rqst *rqstp)
 		goto done;
 	}
 
-	fh_lock_nested(dirfhp, I_MUTEX_PARENT);
+	inode_lock_nested(dirfhp->fh_dentry->d_inode, I_MUTEX_PARENT);
 	dchild = lookup_one_len(argp->name, dirfhp->fh_dentry, argp->len);
 	if (IS_ERR(dchild)) {
 		resp->status = nfserrno(PTR_ERR(dchild));
@@ -382,6 +382,7 @@ nfsd_proc_create(struct svc_rqst *rqstp)
 	}
 
 	resp->status = nfs_ok;
+	fh_fill_pre_attrs(dirfhp);
 	if (!inode) {
 		/* File doesn't exist. Create it and set attrs */
 		resp->status = nfsd_create_locked(rqstp, dirfhp, argp->name,
@@ -399,10 +400,10 @@ nfsd_proc_create(struct svc_rqst *rqstp)
 			resp->status = nfsd_setattr(rqstp, newfhp, attr, 0,
 						    (time64_t)0);
 	}
+	fh_fill_post_attrs(dirfhp);
 
 out_unlock:
-	/* We don't really need to unlock, as fh_put does it. */
-	fh_unlock(dirfhp);
+	inode_unlock(dirfhp->fh_dentry->d_inode);
 	fh_drop_write(dirfhp);
 done:
 	fh_put(dirfhp);
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 8e050c6d112a..2ca748aa83bb 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1412,7 +1412,7 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (host_err)
 		return nfserrno(host_err);
 
-	fh_lock_nested(fhp, I_MUTEX_PARENT);
+	inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT);
 	dchild = lookup_one_len(fname, dentry, flen);
 	host_err = PTR_ERR(dchild);
 	if (IS_ERR(dchild)) {
@@ -1427,12 +1427,14 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	dput(dchild);
 	if (err)
 		goto out_unlock;
+	fh_fill_pre_attrs(fhp);
 	err = nfsd_create_locked(rqstp, fhp, fname, flen, iap, type,
 				 rdev, resfhp);
 	if (!err && post_create)
 		post_create(resfhp, data);
+	fh_fill_post_attrs(fhp);
 out_unlock:
-	fh_unlock(fhp);
+	inode_unlock(dentry->d_inode);
 	return err;
 }
 
@@ -1505,14 +1507,15 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (host_err)
 		goto out_nfserr;
 
-	fh_lock(fhp);
 	dentry = fhp->fh_dentry;
+	inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT);
 	dnew = lookup_one_len(fname, dentry, flen);
 	host_err = PTR_ERR(dnew);
 	if (IS_ERR(dnew)) {
-		fh_unlock(fhp);
+		inode_unlock(dentry->d_inode);
 		goto out_nfserr;
 	}
+	fh_fill_pre_attrs(fhp);
 	host_err = vfs_symlink(&init_user_ns, d_inode(dentry), dnew, path);
 	err = nfserrno(host_err);
 	if (!err)
@@ -1525,7 +1528,8 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (err==0) err = cerr;
 	if (!err && post_create)
 		post_create(resfhp, data);
-	fh_unlock(fhp);
+	fh_fill_post_attrs(fhp);
+	inode_unlock(dentry->d_inode);
 out:
 	return err;
 
@@ -1569,9 +1573,9 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
 		goto out;
 	}
 
-	fh_lock_nested(ffhp, I_MUTEX_PARENT);
 	ddir = ffhp->fh_dentry;
 	dirp = d_inode(ddir);
+	inode_lock_nested(dirp, I_MUTEX_PARENT);
 
 	dnew = lookup_one_len(name, ddir, len);
 	host_err = PTR_ERR(dnew);
@@ -1585,8 +1589,10 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
 	err = nfserr_noent;
 	if (d_really_is_negative(dold))
 		goto out_dput;
+	fh_fill_pre_attrs(ffhp);
 	host_err = vfs_link(dold, &init_user_ns, dirp, dnew, NULL);
-	fh_unlock(ffhp);
+	fh_fill_post_attrs(ffhp);
+	inode_unlock(dirp);
 	if (!host_err) {
 		err = nfserrno(commit_metadata(ffhp));
 		if (!err)
@@ -1606,7 +1612,7 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
 out_dput:
 	dput(dnew);
 out_unlock:
-	fh_unlock(ffhp);
+	inode_unlock(dirp);
 	goto out_drop_write;
 }
 
@@ -1781,9 +1787,9 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
 	if (host_err)
 		goto out_nfserr;
 
-	fh_lock_nested(fhp, I_MUTEX_PARENT);
 	dentry = fhp->fh_dentry;
 	dirp = d_inode(dentry);
+	inode_lock_nested(dirp, I_MUTEX_PARENT);
 
 	rdentry = lookup_one_len(fname, dentry, flen);
 	host_err = PTR_ERR(rdentry);
@@ -1801,6 +1807,7 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
 	if (!type)
 		type = d_inode(rdentry)->i_mode & S_IFMT;
 
+	fh_fill_pre_attrs(fhp);
 	if (type != S_IFDIR) {
 		if (rdentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK)
 			nfsd_close_cached_files(rdentry);
@@ -1808,8 +1815,9 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
 	} else {
 		host_err = vfs_rmdir(&init_user_ns, dirp, rdentry);
 	}
+	fh_fill_post_attrs(fhp);
 
-	fh_unlock(fhp);
+	inode_unlock(dirp);
 	if (!host_err)
 		host_err = commit_metadata(fhp);
 	dput(rdentry);
@@ -1832,7 +1840,7 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
 out:
 	return err;
 out_unlock:
-	fh_unlock(fhp);
+	inode_unlock(dirp);
 	goto out_drop_write;
 }
 



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 7/8] NFSD: use (un)lock_inode instead of fh_(un)lock for file operations
  2022-07-06  4:18 [PATCH 0/8] NFSD: clean up locking NeilBrown
  2022-07-06  4:18 ` [PATCH 6/8] NFSD: use explicit lock/unlock for directory ops NeilBrown
  2022-07-06  4:18 ` [PATCH 8/8] NFSD: discard fh_locked flag and fh_lock/fh_unlock NeilBrown
@ 2022-07-06  4:18 ` NeilBrown
  2022-07-06 14:10   ` Jeff Layton
  2022-07-06 16:30   ` Chuck Lever III
  2022-07-06  4:18 ` [PATCH 2/8] NFSD: change nfsd_create() to unlock directory before returning NeilBrown
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 40+ messages in thread
From: NeilBrown @ 2022-07-06  4:18 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton; +Cc: linux-nfs

When locking a file to access ACLs and xattrs etc, use explicit locking
with inode_lock() instead of fh_lock().  This means that the calls to
fh_fill_pre/post_attrs() are also explicit, which improves readability and
allows us to place them only where they are needed.  Only the xattr
calls need pre/post information.

When locking a file we don't need I_MUTEX_PARENT as the file is not a
parent of anything, so we can use inode_lock() directly rather than the
inode_lock_nested() call that fh_lock() uses.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/nfs2acl.c   |    6 +++---
 fs/nfsd/nfs3acl.c   |    4 ++--
 fs/nfsd/nfs4acl.c   |    7 +++----
 fs/nfsd/nfs4state.c |    8 ++++----
 fs/nfsd/vfs.c       |   25 ++++++++++++-------------
 5 files changed, 24 insertions(+), 26 deletions(-)
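
A small userspace sketch of the distinction drawn in the commit message, under the assumption that counting calls is enough to show it: ACL-style call sites only take the inode lock, while xattr-style call sites also sample pre/post attributes. The model_* names are hypothetical and the lock is a plain flag, not a kernel rwsem.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the file's struct inode. */
struct model_inode {
	bool locked;
};

/* Hypothetical stand-in for struct svc_fh; counts attribute sampling. */
struct model_fh {
	struct model_inode *inode;
	int pre_fills;	/* times fh_fill_pre_attrs() would have run */
	int post_fills;	/* times fh_fill_post_attrs() would have run */
};

static void model_inode_lock(struct model_inode *inode)
{
	assert(!inode->locked);
	inode->locked = true;
}

static void model_inode_unlock(struct model_inode *inode)
{
	assert(inode->locked);
	inode->locked = false;
}

/* Models the setacl-style call sites: lock only, no attributes. */
int model_setacl(struct model_fh *fhp)
{
	model_inode_lock(fhp->inode);
	/* ... the ACL update would happen here ... */
	model_inode_unlock(fhp->inode);
	return 0;
}

/* Models the setxattr/removexattr call sites: lock plus pre/post attrs. */
int model_setxattr(struct model_fh *fhp)
{
	model_inode_lock(fhp->inode);
	fhp->pre_fills++;
	/* ... the xattr update would happen here ... */
	fhp->post_fills++;
	model_inode_unlock(fhp->inode);
	return 0;
}
```

With a wrapper that always sampled attributes, both functions would bump the counters; making the calls explicit leaves them only where the reply needs them.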

diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c
index b5760801d377..9edd3c1a30fb 100644
--- a/fs/nfsd/nfs2acl.c
+++ b/fs/nfsd/nfs2acl.c
@@ -111,7 +111,7 @@ static __be32 nfsacld_proc_setacl(struct svc_rqst *rqstp)
 	if (error)
 		goto out_errno;
 
-	fh_lock(fh);
+	inode_lock(inode);
 
 	error = set_posix_acl(&init_user_ns, inode, ACL_TYPE_ACCESS,
 			      argp->acl_access);
@@ -122,7 +122,7 @@ static __be32 nfsacld_proc_setacl(struct svc_rqst *rqstp)
 	if (error)
 		goto out_drop_lock;
 
-	fh_unlock(fh);
+	inode_unlock(inode);
 
 	fh_drop_write(fh);
 
@@ -136,7 +136,7 @@ static __be32 nfsacld_proc_setacl(struct svc_rqst *rqstp)
 	return rpc_success;
 
 out_drop_lock:
-	fh_unlock(fh);
+	inode_unlock(inode);
 	fh_drop_write(fh);
 out_errno:
 	resp->status = nfserrno(error);
diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c
index 35b2ebda14da..9446c6743664 100644
--- a/fs/nfsd/nfs3acl.c
+++ b/fs/nfsd/nfs3acl.c
@@ -101,7 +101,7 @@ static __be32 nfsd3_proc_setacl(struct svc_rqst *rqstp)
 	if (error)
 		goto out_errno;
 
-	fh_lock(fh);
+	inode_lock(inode);
 
 	error = set_posix_acl(&init_user_ns, inode, ACL_TYPE_ACCESS,
 			      argp->acl_access);
@@ -111,7 +111,7 @@ static __be32 nfsd3_proc_setacl(struct svc_rqst *rqstp)
 			      argp->acl_default);
 
 out_drop_lock:
-	fh_unlock(fh);
+	inode_unlock(inode);
 	fh_drop_write(fh);
 out_errno:
 	resp->status = nfserrno(error);
diff --git a/fs/nfsd/nfs4acl.c b/fs/nfsd/nfs4acl.c
index 5c9b7e01e8ca..a33cacf62ea0 100644
--- a/fs/nfsd/nfs4acl.c
+++ b/fs/nfsd/nfs4acl.c
@@ -781,19 +781,18 @@ nfsd4_set_nfs4_acl(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (host_error < 0)
 		goto out_nfserr;
 
-	fh_lock(fhp);
+	inode_lock(inode);
 
 	host_error = set_posix_acl(&init_user_ns, inode, ACL_TYPE_ACCESS, pacl);
 	if (host_error < 0)
 		goto out_drop_lock;
 
-	if (S_ISDIR(inode->i_mode)) {
+	if (S_ISDIR(inode->i_mode))
 		host_error = set_posix_acl(&init_user_ns, inode,
 					   ACL_TYPE_DEFAULT, dpacl);
-	}
 
 out_drop_lock:
-	fh_unlock(fhp);
+	inode_unlock(inode);
 
 	posix_acl_release(pacl);
 	posix_acl_release(dpacl);
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 9d1a3e131c49..307317ba9aff 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -7322,21 +7322,21 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 static __be32 nfsd_test_lock(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file_lock *lock)
 {
 	struct nfsd_file *nf;
+	struct inode *inode = fhp->fh_dentry->d_inode;
 	__be32 err;
 
 	err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
 	if (err)
 		return err;
-	fh_lock(fhp); /* to block new leases till after test_lock: */
-	err = nfserrno(nfsd_open_break_lease(fhp->fh_dentry->d_inode,
-							NFSD_MAY_READ));
+	inode_lock(inode); /* to block new leases till after test_lock: */
+	err = nfserrno(nfsd_open_break_lease(inode, NFSD_MAY_READ));
 	if (err)
 		goto out;
 	lock->fl_file = nf->nf_file;
 	err = nfserrno(vfs_test_lock(nf->nf_file, lock));
 	lock->fl_file = NULL;
 out:
-	fh_unlock(fhp);
+	inode_unlock(inode);
 	nfsd_file_put(nf);
 	return err;
 }
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 2ca748aa83bb..2526615285ca 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -444,7 +444,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap,
 			return err;
 	}
 
-	fh_lock(fhp);
+	inode_lock(inode);
 	if (size_change) {
 		/*
 		 * RFC5661, Section 18.30.4:
@@ -480,7 +480,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap,
 	host_err = notify_change(&init_user_ns, dentry, iap, NULL);
 
 out_unlock:
-	fh_unlock(fhp);
+	inode_unlock(inode);
 	if (size_change)
 		put_write_access(inode);
 out:
@@ -2196,12 +2196,8 @@ nfsd_listxattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char **bufp,
 }
 
 /*
- * Removexattr and setxattr need to call fh_lock to both lock the inode
- * and set the change attribute. Since the top-level vfs_removexattr
- * and vfs_setxattr calls already do their own inode_lock calls, call
- * the _locked variant. Pass in a NULL pointer for delegated_inode,
- * and let the client deal with NFS4ERR_DELAY (same as with e.g.
- * setattr and remove).
+ * Pass in a NULL pointer for delegated_inode, and let the client deal
+ * with NFS4ERR_DELAY (same as with e.g.  setattr and remove).
  */
 __be32
 nfsd_removexattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char *name)
@@ -2217,12 +2213,14 @@ nfsd_removexattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char *name)
 	if (ret)
 		return nfserrno(ret);
 
-	fh_lock(fhp);
+	inode_lock(fhp->fh_dentry->d_inode);
+	fh_fill_pre_attrs(fhp);
 
 	ret = __vfs_removexattr_locked(&init_user_ns, fhp->fh_dentry,
 				       name, NULL);
 
-	fh_unlock(fhp);
+	fh_fill_post_attrs(fhp);
+	inode_unlock(fhp->fh_dentry->d_inode);
 	fh_drop_write(fhp);
 
 	return nfsd_xattr_errno(ret);
@@ -2242,12 +2240,13 @@ nfsd_setxattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char *name,
 	ret = fh_want_write(fhp);
 	if (ret)
 		return nfserrno(ret);
-	fh_lock(fhp);
+	inode_lock(fhp->fh_dentry->d_inode);
+	fh_fill_pre_attrs(fhp);
 
 	ret = __vfs_setxattr_locked(&init_user_ns, fhp->fh_dentry, name, buf,
 				    len, flags, NULL);
-
-	fh_unlock(fhp);
+	fh_fill_post_attrs(fhp);
+	inode_unlock(fhp->fh_dentry->d_inode);
 	fh_drop_write(fhp);
 
 	return nfsd_xattr_errno(ret);



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 8/8] NFSD: discard fh_locked flag and fh_lock/fh_unlock
  2022-07-06  4:18 [PATCH 0/8] NFSD: clean up locking NeilBrown
  2022-07-06  4:18 ` [PATCH 6/8] NFSD: use explicit lock/unlock for directory ops NeilBrown
@ 2022-07-06  4:18 ` NeilBrown
  2022-07-06 14:12   ` Jeff Layton
  2022-07-06  4:18 ` [PATCH 7/8] NFSD: use (un)lock_inode instead of fh_(un)lock for file operations NeilBrown
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 40+ messages in thread
From: NeilBrown @ 2022-07-06  4:18 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton; +Cc: linux-nfs

As all inode locking is now fully balanced, fh_put() does not need to
call fh_unlock().
fh_lock() and fh_unlock() are no longer used, so discard them.
These are the only real users of ->fh_locked, so discard that too.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/nfsfh.c |    3 +--
 fs/nfsd/nfsfh.h |   56 ++++---------------------------------------------------
 fs/nfsd/vfs.c   |   17 +----------------
 3 files changed, 6 insertions(+), 70 deletions(-)
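
To illustrate why the flag-based helpers could hide unbalanced locking, here is a simplified userspace model (an assumption-laden sketch, not the removed kernel code). The old-style helpers consult a flag and silently ignore a repeated call; the explicit style has no flag, so an unpaired call is detectable. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct inode; depth counts lock holds. */
struct model_inode {
	int depth;
};

/* Hypothetical stand-in for struct svc_fh with the old fh_locked flag. */
struct model_fh {
	struct model_inode *inode;
	bool fh_locked;
};

/*
 * Old style: the flag turns a second call into a no-op.  (The real
 * fh_lock_nested() also printed a warning in that case.)
 */
void model_fh_lock(struct model_fh *fhp)
{
	if (fhp->fh_locked)
		return;
	fhp->inode->depth++;
	fhp->fh_locked = true;
}

void model_fh_unlock(struct model_fh *fhp)
{
	if (!fhp->fh_locked)
		return;		/* extra unlock is silently ignored */
	fhp->inode->depth--;
	fhp->fh_locked = false;
}

/* New style: no flag.  Returns false when the call is unbalanced. */
bool model_inode_lock(struct model_inode *inode)
{
	if (inode->depth != 0)
		return false;	/* a real lock would block here */
	inode->depth = 1;
	return true;
}

bool model_inode_unlock(struct model_inode *inode)
{
	if (inode->depth != 1)
		return false;	/* unlock without a matching lock */
	inode->depth = 0;
	return true;
}
```

In the model, calling model_fh_unlock() twice leaves no trace, whereas a second model_inode_unlock() reports the imbalance; that is the property the series relies on once all call sites are balanced.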

diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index 5e2ed4b2a925..22a77a5e2327 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -549,7 +549,7 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
 	if (ref_fh == fhp)
 		fh_put(ref_fh);
 
-	if (fhp->fh_locked || fhp->fh_dentry) {
+	if (fhp->fh_dentry) {
 		printk(KERN_ERR "fh_compose: fh %pd2 not initialized!\n",
 		       dentry);
 	}
@@ -681,7 +681,6 @@ fh_put(struct svc_fh *fhp)
 	struct dentry * dentry = fhp->fh_dentry;
 	struct svc_export * exp = fhp->fh_export;
 	if (dentry) {
-		fh_unlock(fhp);
 		fhp->fh_dentry = NULL;
 		dput(dentry);
 		fh_clear_pre_post_attrs(fhp);
diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
index fb9d358a267e..09c654bdf9b0 100644
--- a/fs/nfsd/nfsfh.h
+++ b/fs/nfsd/nfsfh.h
@@ -81,7 +81,6 @@ typedef struct svc_fh {
 	struct dentry *		fh_dentry;	/* validated dentry */
 	struct svc_export *	fh_export;	/* export pointer */
 
-	bool			fh_locked;	/* inode locked by us */
 	bool			fh_want_write;	/* remount protection taken */
 	bool			fh_no_wcc;	/* no wcc data needed */
 	bool			fh_no_atomic_attr;
@@ -93,7 +92,7 @@ typedef struct svc_fh {
 	bool			fh_post_saved;	/* post-op attrs saved */
 	bool			fh_pre_saved;	/* pre-op attrs saved */
 
-	/* Pre-op attributes saved during fh_lock */
+	/* Pre-op attributes saved when inode is locked */
 	__u64			fh_pre_size;	/* size before operation */
 	struct timespec64	fh_pre_mtime;	/* mtime before oper */
 	struct timespec64	fh_pre_ctime;	/* ctime before oper */
@@ -103,7 +102,7 @@ typedef struct svc_fh {
 	 */
 	u64			fh_pre_change;
 
-	/* Post-op attributes saved in fh_unlock */
+	/* Post-op attributes saved in fh_fill_post_attrs() */
 	struct kstat		fh_post_attr;	/* full attrs after operation */
 	u64			fh_post_change; /* nfsv4 change; see above */
 } svc_fh;
@@ -223,8 +222,8 @@ void	fh_put(struct svc_fh *);
 static __inline__ struct svc_fh *
 fh_copy(struct svc_fh *dst, struct svc_fh *src)
 {
-	WARN_ON(src->fh_dentry || src->fh_locked);
-			
+	WARN_ON(src->fh_dentry);
+
 	*dst = *src;
 	return dst;
 }
@@ -323,51 +322,4 @@ static inline u64 nfsd4_change_attribute(struct kstat *stat,
 extern void fh_fill_pre_attrs(struct svc_fh *fhp);
 extern void fh_fill_post_attrs(struct svc_fh *fhp);
 
-
-/*
- * Lock a file handle/inode
- * NOTE: both fh_lock and fh_unlock are done "by hand" in
- * vfs.c:nfsd_rename as it needs to grab 2 i_mutex's at once
- * so, any changes here should be reflected there.
- */
-
-static inline void
-fh_lock_nested(struct svc_fh *fhp, unsigned int subclass)
-{
-	struct dentry	*dentry = fhp->fh_dentry;
-	struct inode	*inode;
-
-	BUG_ON(!dentry);
-
-	if (fhp->fh_locked) {
-		printk(KERN_WARNING "fh_lock: %pd2 already locked!\n",
-			dentry);
-		return;
-	}
-
-	inode = d_inode(dentry);
-	inode_lock_nested(inode, subclass);
-	fh_fill_pre_attrs(fhp);
-	fhp->fh_locked = true;
-}
-
-static inline void
-fh_lock(struct svc_fh *fhp)
-{
-	fh_lock_nested(fhp, I_MUTEX_NORMAL);
-}
-
-/*
- * Unlock a file handle/inode
- */
-static inline void
-fh_unlock(struct svc_fh *fhp)
-{
-	if (fhp->fh_locked) {
-		fh_fill_post_attrs(fhp);
-		inode_unlock(d_inode(fhp->fh_dentry));
-		fhp->fh_locked = false;
-	}
-}
-
 #endif /* _LINUX_NFSD_NFSFH_H */
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 2526615285ca..fe4cdf8ab428 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1305,13 +1305,6 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	dirp = d_inode(dentry);
 
 	dchild = dget(resfhp->fh_dentry);
-	if (!fhp->fh_locked) {
-		WARN_ONCE(1, "nfsd_create: parent %pd2 not locked!\n",
-				dentry);
-		err = nfserr_io;
-		goto out;
-	}
-
 	err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
 	if (err)
 		goto out;
@@ -1674,10 +1667,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 		goto out;
 	}
 
-	/* cannot use fh_lock as we need deadlock protective ordering
-	 * so do it by hand */
 	trap = lock_rename(tdentry, fdentry);
-	ffhp->fh_locked = tfhp->fh_locked = true;
 	fh_fill_pre_attrs(ffhp);
 	fh_fill_pre_attrs(tfhp);
 
@@ -1733,17 +1723,12 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 	dput(odentry);
  out_nfserr:
 	err = nfserrno(host_err);
-	/*
-	 * We cannot rely on fh_unlock on the two filehandles,
-	 * as that would do the wrong thing if the two directories
-	 * were the same, so again we do it by hand.
-	 */
+
 	if (!close_cached) {
 		fh_fill_post_attrs(ffhp);
 		fh_fill_post_attrs(tfhp);
 	}
 	unlock_rename(tdentry, fdentry);
-	ffhp->fh_locked = tfhp->fh_locked = false;
 	fh_drop_write(ffhp);
 
 	/*



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH 1/8] NFSD: drop rqstp arg to do_set_nfs4_acl()
  2022-07-06  4:18 ` [PATCH 1/8] NFSD: drop rqstp arg to do_set_nfs4_acl() NeilBrown
@ 2022-07-06 13:17   ` Jeff Layton
  0 siblings, 0 replies; 40+ messages in thread
From: Jeff Layton @ 2022-07-06 13:17 UTC (permalink / raw)
  To: NeilBrown, Chuck Lever; +Cc: linux-nfs

On Wed, 2022-07-06 at 14:18 +1000, NeilBrown wrote:
> do_set_nfs4_acl() only needs rqstp to pass to nfsd4_set_nfs4_acl()
> 
> The latter only needs the rqstp to pass to fh_verify().
> 
> In every case that do_set_nfs4_acl() is called, fh_verify() is not
> needed.  It is only needed for filehandles received from the client, the
> filehandles passed to do_set_nfs4_acl() have just been constructed by
> the server, and so must be valid.
> 
> So we can change nfsd4_set_nfs4_acl() to only call fh_verify() if rqstp
> is not NULL, and always pass NULL from do_set_nfs4_acl().
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/nfsd/nfs4acl.c  |   12 +++++++-----
>  fs/nfsd/nfs4proc.c |    9 ++++-----
>  2 files changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4acl.c b/fs/nfsd/nfs4acl.c
> index eaa3a0cf38f1..5c9b7e01e8ca 100644
> --- a/fs/nfsd/nfs4acl.c
> +++ b/fs/nfsd/nfs4acl.c
> @@ -753,7 +753,7 @@ static int nfs4_acl_nfsv4_to_posix(struct nfs4_acl *acl,
>  
>  __be32
>  nfsd4_set_nfs4_acl(struct svc_rqst *rqstp, struct svc_fh *fhp,
> -		struct nfs4_acl *acl)
> +		   struct nfs4_acl *acl)
>  {
>  	__be32 error;
>  	int host_error;
> @@ -762,10 +762,12 @@ nfsd4_set_nfs4_acl(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	struct posix_acl *pacl = NULL, *dpacl = NULL;
>  	unsigned int flags = 0;
>  
> -	/* Get inode */
> -	error = fh_verify(rqstp, fhp, 0, NFSD_MAY_SATTR);
> -	if (error)
> -		return error;
> +	if (rqstp) {
> +		/* Get inode */
> +		error = fh_verify(rqstp, fhp, 0, NFSD_MAY_SATTR);
> +		if (error)
> +			return error;
> +	}
>  
>  	dentry = fhp->fh_dentry;
>  	inode = d_inode(dentry);
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 5af9f8d1feb6..60591ceb4985 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -163,12 +163,11 @@ is_create_with_attrs(struct nfsd4_open *open)
>   * in the returned attr bitmap.
>   */
>  static void
> -do_set_nfs4_acl(struct svc_rqst *rqstp, struct svc_fh *fhp,
> -		struct nfs4_acl *acl, u32 *bmval)
> +do_set_nfs4_acl(struct svc_fh *fhp, struct nfs4_acl *acl, u32 *bmval)
>  {
>  	__be32 status;
>  
> -	status = nfsd4_set_nfs4_acl(rqstp, fhp, acl);
> +	status = nfsd4_set_nfs4_acl(NULL, fhp, acl);
>  	if (status)
>  		/*
>  		 * We should probably fail the whole open at this point,
> @@ -474,7 +473,7 @@ do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stru
>  		goto out;
>  
>  	if (is_create_with_attrs(open) && open->op_acl != NULL)
> -		do_set_nfs4_acl(rqstp, *resfh, open->op_acl, open->op_bmval);
> +		do_set_nfs4_acl(*resfh, open->op_acl, open->op_bmval);
>  
>  	nfsd4_set_open_owner_reply_cache(cstate, open, *resfh);
>  	accmode = NFSD_MAY_NOP;
> @@ -861,7 +860,7 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  		nfsd4_security_inode_setsecctx(&resfh, &create->cr_label, create->cr_bmval);
>  
>  	if (create->cr_acl != NULL)
> -		do_set_nfs4_acl(rqstp, &resfh, create->cr_acl,
> +		do_set_nfs4_acl(&resfh, create->cr_acl,
>  				create->cr_bmval);
>  
>  	fh_unlock(&cstate->current_fh);
> 
> 

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 2/8] NFSD: change nfsd_create() to unlock directory before returning.
  2022-07-06  4:18 ` [PATCH 2/8] NFSD: change nfsd_create() to unlock directory before returning NeilBrown
@ 2022-07-06 13:24   ` Jeff Layton
  2022-07-06 16:29   ` Chuck Lever III
  1 sibling, 0 replies; 40+ messages in thread
From: Jeff Layton @ 2022-07-06 13:24 UTC (permalink / raw)
  To: NeilBrown, Chuck Lever; +Cc: linux-nfs

On Wed, 2022-07-06 at 14:18 +1000, NeilBrown wrote:
> nfsd_create() usually exits with the directory still locked.  This
> relies on other code to unlock the directory.  Planned future patches
> will change how directory locking works so the unlock step may be less
> trivial.  It is cleaner to have lock and unlock in the same function.
> 
> nfsd4_create() performs some extra changes after the creation and before
> the unlock - setting security label and an ACL.  To allow for these to
> still be done while locked, we create a function nfsd4_post_create() and
> pass it to nfsd_create() when needed.
> 
> nfsd_symlink() DOES usually unlock the directory, but nfsd4_create() may
> add a label or ACL - with the directory unlocked.  I don't think symlinks
> have ACLs and don't know if they can have labels, so I don't know if
> this is of any practical consequence.  For consistency nfsd_symlink() is
> changed to accept the same callback and call it if given.
> 
> nfsd_symlink() didn't unlock the directory if lookup_one_len() gave an
> error.  This is untidy and potentially confusing, and has now been
> fixed.  It isn't a practical problem as an eventual fh_put() will unlock
> if needed.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/nfsd/nfs3proc.c |   11 ++++++-----
>  fs/nfsd/nfs4proc.c |   38 ++++++++++++++++++++++++--------------
>  fs/nfsd/nfsproc.c  |    5 +++--
>  fs/nfsd/vfs.c      |   40 +++++++++++++++++++++++++++-------------
>  fs/nfsd/vfs.h      |   11 ++++++++---
>  5 files changed, 68 insertions(+), 37 deletions(-)
> 
> diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
> index 981a3a7a6e16..38255365ef71 100644
> --- a/fs/nfsd/nfs3proc.c
> +++ b/fs/nfsd/nfs3proc.c
> @@ -378,8 +378,8 @@ nfsd3_proc_mkdir(struct svc_rqst *rqstp)
>  	fh_copy(&resp->dirfh, &argp->fh);
>  	fh_init(&resp->fh, NFS3_FHSIZE);
>  	resp->status = nfsd_create(rqstp, &resp->dirfh, argp->name, argp->len,
> -				   &argp->attrs, S_IFDIR, 0, &resp->fh);
> -	fh_unlock(&resp->dirfh);
> +				   &argp->attrs, S_IFDIR, 0, &resp->fh,
> +				   NULL, NULL);
>  	return rpc_success;
>  }
>  
> @@ -414,7 +414,8 @@ nfsd3_proc_symlink(struct svc_rqst *rqstp)
>  	fh_copy(&resp->dirfh, &argp->ffh);
>  	fh_init(&resp->fh, NFS3_FHSIZE);
>  	resp->status = nfsd_symlink(rqstp, &resp->dirfh, argp->fname,
> -				    argp->flen, argp->tname, &resp->fh);
> +				    argp->flen, argp->tname, &resp->fh,
> +				    NULL, NULL);
>  	kfree(argp->tname);
>  out:
>  	return rpc_success;
> @@ -453,8 +454,8 @@ nfsd3_proc_mknod(struct svc_rqst *rqstp)
>  
>  	type = nfs3_ftypes[argp->ftype];
>  	resp->status = nfsd_create(rqstp, &resp->dirfh, argp->name, argp->len,
> -				   &argp->attrs, type, rdev, &resp->fh);
> -	fh_unlock(&resp->dirfh);
> +				   &argp->attrs, type, rdev, &resp->fh,
> +				   NULL, NULL);
>  out:
>  	return rpc_success;
>  }
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 60591ceb4985..3279daab909d 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -780,6 +780,18 @@ nfsd4_commit(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  			     (__be32 *)commit->co_verf.data);
>  }
>  
> +static void nfsd4_post_create(struct svc_fh *fh, void *vcreate)
> +{
> +	struct nfsd4_create *create = vcreate;
> +
> +	if (create->cr_label.len)
> +		nfsd4_security_inode_setsecctx(fh, &create->cr_label,
> +					       create->cr_bmval);
> +
> +	if (create->cr_acl != NULL)
> +		do_set_nfs4_acl(fh, create->cr_acl, create->cr_bmval);
> +}
> +
>  static __be32
>  nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  	     union nfsd4_op_u *u)
> @@ -805,7 +817,8 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  	case NF4LNK:
>  		status = nfsd_symlink(rqstp, &cstate->current_fh,
>  				      create->cr_name, create->cr_namelen,
> -				      create->cr_data, &resfh);
> +				      create->cr_data, &resfh,
> +				      nfsd4_post_create, create);
>  		break;
>  
>  	case NF4BLK:
> @@ -816,7 +829,8 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  			goto out_umask;
>  		status = nfsd_create(rqstp, &cstate->current_fh,
>  				     create->cr_name, create->cr_namelen,
> -				     &create->cr_iattr, S_IFBLK, rdev, &resfh);
> +				     &create->cr_iattr, S_IFBLK, rdev, &resfh,
> +				     nfsd4_post_create, create);
>  		break;
>  
>  	case NF4CHR:
> @@ -827,26 +841,30 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  			goto out_umask;
>  		status = nfsd_create(rqstp, &cstate->current_fh,
>  				     create->cr_name, create->cr_namelen,
> -				     &create->cr_iattr, S_IFCHR, rdev, &resfh);
> +				     &create->cr_iattr, S_IFCHR, rdev, &resfh,
> +				     nfsd4_post_create, create);
>  		break;
>  
>  	case NF4SOCK:
>  		status = nfsd_create(rqstp, &cstate->current_fh,
>  				     create->cr_name, create->cr_namelen,
> -				     &create->cr_iattr, S_IFSOCK, 0, &resfh);
> +				     &create->cr_iattr, S_IFSOCK, 0, &resfh,
> +				     nfsd4_post_create, create);
>  		break;
>  
>  	case NF4FIFO:
>  		status = nfsd_create(rqstp, &cstate->current_fh,
>  				     create->cr_name, create->cr_namelen,
> -				     &create->cr_iattr, S_IFIFO, 0, &resfh);
> +				     &create->cr_iattr, S_IFIFO, 0, &resfh,
> +				     nfsd4_post_create, create);
>  		break;
>  
>  	case NF4DIR:
>  		create->cr_iattr.ia_valid &= ~ATTR_SIZE;
>  		status = nfsd_create(rqstp, &cstate->current_fh,
>  				     create->cr_name, create->cr_namelen,
> -				     &create->cr_iattr, S_IFDIR, 0, &resfh);
> +				     &create->cr_iattr, S_IFDIR, 0, &resfh,
> +				     nfsd4_post_create, create);
>  		break;
>  
>  	default:
> @@ -856,14 +874,6 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  	if (status)
>  		goto out;
>  
> -	if (create->cr_label.len)
> -		nfsd4_security_inode_setsecctx(&resfh, &create->cr_label, create->cr_bmval);
> -
> -	if (create->cr_acl != NULL)
> -		do_set_nfs4_acl(&resfh, create->cr_acl,
> -				create->cr_bmval);
> -
> -	fh_unlock(&cstate->current_fh);
>  	set_change_info(&create->cr_cinfo, &cstate->current_fh);
>  	fh_dup2(&cstate->current_fh, &resfh);
>  out:
> diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
> index fcdab8a8a41f..a25b8e321662 100644
> --- a/fs/nfsd/nfsproc.c
> +++ b/fs/nfsd/nfsproc.c
> @@ -493,7 +493,7 @@ nfsd_proc_symlink(struct svc_rqst *rqstp)
>  
>  	fh_init(&newfh, NFS_FHSIZE);
>  	resp->status = nfsd_symlink(rqstp, &argp->ffh, argp->fname, argp->flen,
> -				    argp->tname, &newfh);
> +				    argp->tname, &newfh, NULL, NULL);
>  
>  	kfree(argp->tname);
>  	fh_put(&argp->ffh);
> @@ -522,7 +522,8 @@ nfsd_proc_mkdir(struct svc_rqst *rqstp)
>  	argp->attrs.ia_valid &= ~ATTR_SIZE;
>  	fh_init(&resp->fh, NFS_FHSIZE);
>  	resp->status = nfsd_create(rqstp, &argp->fh, argp->name, argp->len,
> -				   &argp->attrs, S_IFDIR, 0, &resp->fh);
> +				   &argp->attrs, S_IFDIR, 0, &resp->fh,
> +				   NULL, NULL);
>  	fh_put(&argp->fh);
>  	if (resp->status != nfs_ok)
>  		goto out;
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index d79db56475d4..1e7ca39e8a49 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1366,8 +1366,10 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp,
>   */
>  __be32
>  nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> -		char *fname, int flen, struct iattr *iap,
> -		int type, dev_t rdev, struct svc_fh *resfhp)
> +	    char *fname, int flen, struct iattr *iap,
> +	    int type, dev_t rdev, struct svc_fh *resfhp,
> +	    void (*post_create)(struct svc_fh *fh, void *data),
> +	    void *data)
>  {
>  	struct dentry	*dentry, *dchild = NULL;
>  	__be32		err;
> @@ -1389,8 +1391,10 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	fh_lock_nested(fhp, I_MUTEX_PARENT);
>  	dchild = lookup_one_len(fname, dentry, flen);
>  	host_err = PTR_ERR(dchild);
> -	if (IS_ERR(dchild))
> -		return nfserrno(host_err);
> +	if (IS_ERR(dchild)) {
> +		err = nfserrno(host_err);
> +		goto out_unlock;
> +	}
>  	err = fh_compose(resfhp, fhp->fh_export, dchild, fhp);
>  	/*
>  	 * We unconditionally drop our ref to dchild as fh_compose will have
> @@ -1398,9 +1402,14 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	 */
>  	dput(dchild);
>  	if (err)
> -		return err;
> -	return nfsd_create_locked(rqstp, fhp, fname, flen, iap, type,
> -					rdev, resfhp);
> +		goto out_unlock;
> +	err = nfsd_create_locked(rqstp, fhp, fname, flen, iap, type,
> +				 rdev, resfhp);
> +	if (!err && post_create)
> +		post_create(resfhp, data);
> +out_unlock:
> +	fh_unlock(fhp);
> +	return err;
>  }
>  
>  /*
> @@ -1447,9 +1456,11 @@ nfsd_readlink(struct svc_rqst *rqstp, struct svc_fh *fhp, char *buf, int *lenp)
>   */
>  __be32
>  nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
> -				char *fname, int flen,
> -				char *path,
> -				struct svc_fh *resfhp)
> +	     char *fname, int flen,
> +	     char *path,
> +	     struct svc_fh *resfhp,
> +	     void (*post_create)(struct svc_fh *fh, void *data),
> +	     void *data)
>  {
>  	struct dentry	*dentry, *dnew;
>  	__be32		err, cerr;
> @@ -1474,12 +1485,12 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	dentry = fhp->fh_dentry;
>  	dnew = lookup_one_len(fname, dentry, flen);
>  	host_err = PTR_ERR(dnew);
> -	if (IS_ERR(dnew))
> +	if (IS_ERR(dnew)) {
> +		fh_unlock(fhp);
>  		goto out_nfserr;
> -
> +	}
>  	host_err = vfs_symlink(&init_user_ns, d_inode(dentry), dnew, path);
>  	err = nfserrno(host_err);
> -	fh_unlock(fhp);
>  	if (!err)
>  		err = nfserrno(commit_metadata(fhp));
>  
> @@ -1488,6 +1499,9 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	cerr = fh_compose(resfhp, fhp->fh_export, dnew, fhp);
>  	dput(dnew);
>  	if (err==0) err = cerr;
> +	if (!err && post_create)
> +		post_create(resfhp, data);
> +	fh_unlock(fhp);
>  out:
>  	return err;
>  
> diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
> index 26347d76f44a..9f4fd3060200 100644
> --- a/fs/nfsd/vfs.h
> +++ b/fs/nfsd/vfs.h
> @@ -66,8 +66,10 @@ __be32		nfsd_create_locked(struct svc_rqst *, struct svc_fh *,
>  				char *name, int len, struct iattr *attrs,
>  				int type, dev_t rdev, struct svc_fh *res);
>  __be32		nfsd_create(struct svc_rqst *, struct svc_fh *,
> -				char *name, int len, struct iattr *attrs,
> -				int type, dev_t rdev, struct svc_fh *res);
> +			    char *name, int len, struct iattr *attrs,
> +			    int type, dev_t rdev, struct svc_fh *res,
> +			    void (*post_create)(struct svc_fh *fh, void *data),
> +			    void *data);
>  __be32		nfsd_access(struct svc_rqst *, struct svc_fh *, u32 *, u32 *);
>  __be32		nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  				struct svc_fh *resfhp, struct iattr *iap);
> @@ -111,7 +113,10 @@ __be32		nfsd_readlink(struct svc_rqst *, struct svc_fh *,
>  				char *, int *);
>  __be32		nfsd_symlink(struct svc_rqst *, struct svc_fh *,
>  				char *name, int len, char *path,
> -				struct svc_fh *res);
> +				struct svc_fh *res,
> +				void (*post_create)(struct svc_fh *fh,
> +						    void *data),
> +				void *data);
>  __be32		nfsd_link(struct svc_rqst *, struct svc_fh *,
>  				char *, int, struct svc_fh *);
>  ssize_t		nfsd_copy_file_range(struct file *, u64,
> 
> 

Reviewed-by: Jeff Layton <jlayton@kernel.org>
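
The post_create callback that the patch above adds to nfsd_create() and
nfsd_symlink() can be modelled in userspace. The following is only a
sketch under stated assumptions: struct create_dir, model_create() and
record_cb() are hypothetical names, not nfsd code, and the "lock" is a
plain flag. It shows the callback running only on success, while the
directory is still locked, with the unlock on every exit path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a directory; not a kernel type. */
struct create_dir {
	bool locked;
	int  nchildren;
};

typedef void (*post_create_fn)(struct create_dir *dir, void *data);

/* Returns 0 on success, -1 when the simulated create fails. */
static int model_create(struct create_dir *dir, bool fail,
			post_create_fn post_create, void *data)
{
	int err = 0;

	assert(!dir->locked);
	dir->locked = true;
	if (fail) {
		err = -1;
		goto out_unlock;
	}
	dir->nchildren++;
	/* Optional hook, called on success only, with the lock still held. */
	if (post_create)
		post_create(dir, data);
out_unlock:
	dir->locked = false;
	return err;
}

/* Example callback: records how often it ran and whether the lock was held. */
struct cb_record {
	int  calls;
	bool saw_lock;
};

static void record_cb(struct create_dir *dir, void *data)
{
	struct cb_record *rec = data;

	rec->calls++;
	rec->saw_lock = dir->locked;
}
```

Callers that have nothing to do after the create pass NULL for both the
callback and its data, as the NFSv2 and NFSv3 callers do in the patch.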


* Re: [PATCH 3/8] NFSD: always drop directory lock in nfsd_unlink()
  2022-07-06  4:18 ` [PATCH 3/8] NFSD: always drop directory lock in nfsd_unlink() NeilBrown
@ 2022-07-06 13:30   ` Jeff Layton
  0 siblings, 0 replies; 40+ messages in thread
From: Jeff Layton @ 2022-07-06 13:30 UTC (permalink / raw)
  To: NeilBrown, Chuck Lever; +Cc: linux-nfs

On Wed, 2022-07-06 at 14:18 +1000, NeilBrown wrote:
> Some error paths in nfsd_unlink() allow it to exit without unlocking the
> directory.  This is not a problem in practice as the directory will be
> unlocked by an eventual fh_put(), but it is untidy and potentially confusing.
> 
> This allows us to remove all the fh_unlock() calls that are immediately
> after nfsd_unlink() calls.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/nfsd/nfs3proc.c |    2 --
>  fs/nfsd/nfs4proc.c |    4 +---
>  fs/nfsd/vfs.c      |    7 +++++--
>  3 files changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
> index 38255365ef71..ad7941001106 100644
> --- a/fs/nfsd/nfs3proc.c
> +++ b/fs/nfsd/nfs3proc.c
> @@ -478,7 +478,6 @@ nfsd3_proc_remove(struct svc_rqst *rqstp)
>  	fh_copy(&resp->fh, &argp->fh);
>  	resp->status = nfsd_unlink(rqstp, &resp->fh, -S_IFDIR,
>  				   argp->name, argp->len);
> -	fh_unlock(&resp->fh);
>  	return rpc_success;
>  }
>  
> @@ -499,7 +498,6 @@ nfsd3_proc_rmdir(struct svc_rqst *rqstp)
>  	fh_copy(&resp->fh, &argp->fh);
>  	resp->status = nfsd_unlink(rqstp, &resp->fh, S_IFDIR,
>  				   argp->name, argp->len);
> -	fh_unlock(&resp->fh);
>  	return rpc_success;
>  }
>  
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 3279daab909d..4737019738ab 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -1052,10 +1052,8 @@ nfsd4_remove(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  		return nfserr_grace;
>  	status = nfsd_unlink(rqstp, &cstate->current_fh, 0,
>  			     remove->rm_name, remove->rm_namelen);
> -	if (!status) {
> -		fh_unlock(&cstate->current_fh);
> +	if (!status)
>  		set_change_info(&remove->rm_cinfo, &cstate->current_fh);
> -	}
>  	return status;
>  }
>  
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 1e7ca39e8a49..3f4579f5775c 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1762,12 +1762,12 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
>  	rdentry = lookup_one_len(fname, dentry, flen);
>  	host_err = PTR_ERR(rdentry);
>  	if (IS_ERR(rdentry))
> -		goto out_drop_write;
> +		goto out_unlock;
>  
>  	if (d_really_is_negative(rdentry)) {
>  		dput(rdentry);
>  		host_err = -ENOENT;
> -		goto out_drop_write;
> +		goto out_unlock;
>  	}
>  	rinode = d_inode(rdentry);
>  	ihold(rinode);
> @@ -1805,6 +1805,9 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
>  	}
>  out:
>  	return err;
> +out_unlock:
> +	fh_unlock(fhp);
> +	goto out_drop_write;
>  }
>  
>  /*
> 
> 

Reviewed-by: Jeff Layton <jlayton@kernel.org>
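
The label layout used in the vfs.c hunk above, where an error label
placed after the normal return releases the lock and then jumps back
into the shared cleanup, can be modelled in userspace. This is a sketch
only: struct unlink_model, step() and model_unlink() are hypothetical
names, the locks are plain flags, and the model assumes the success
path has already released the directory lock before reaching the
shared cleanup.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical userspace model; none of these names exist in nfsd. */
struct unlink_model {
	bool locked;      /* stands in for the directory lock */
	bool write_held;  /* stands in for the write access that is dropped */
	char trace[32];   /* records the order of cleanup steps */
};

static void step(struct unlink_model *m, const char *s)
{
	strncat(m->trace, s, sizeof(m->trace) - strlen(m->trace) - 1);
}

/* lookup_fails selects the early error path that used to skip the unlock. */
static int model_unlink(struct unlink_model *m, bool lookup_fails)
{
	int err = 0;

	m->trace[0] = '\0';
	m->write_held = true;
	m->locked = true;

	if (lookup_fails) {
		err = -1;
		goto out_unlock;
	}

	/* Success path: the lock is released right after the operation. */
	m->locked = false;
	step(m, "U");

out_drop_write:
	assert(!m->locked);	/* every path must have unlocked by now */
	m->write_held = false;
	step(m, "W");
	return err;

out_unlock:
	/* Error path: unlock here, then rejoin the shared cleanup. */
	m->locked = false;
	step(m, "U");
	goto out_drop_write;
}
```

Both paths record an unlock followed by the write drop, so a caller no
longer needs its own unlock after the call.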


* Re: [PATCH 4/8] NFSD: only call fh_unlock() once in nfsd_link()
  2022-07-06  4:18 ` [PATCH 4/8] NFSD: only call fh_unlock() once in nfsd_link() NeilBrown
@ 2022-07-06 13:31   ` Jeff Layton
  2022-07-06 16:29   ` Chuck Lever III
  1 sibling, 0 replies; 40+ messages in thread
From: Jeff Layton @ 2022-07-06 13:31 UTC (permalink / raw)
  To: NeilBrown, Chuck Lever; +Cc: linux-nfs

On Wed, 2022-07-06 at 14:18 +1000, NeilBrown wrote:
> On non-error paths, nfsd_link() calls fh_unlock() twice.  This is safe
> because fh_unlock() records that the unlock has been done and doesn't
> repeat it.
> However it makes the code a little confusing and interferes with changes
> that are planned for directory locking.
> 
> So rearrange the code to ensure fh_unlock() is called exactly once if
> fh_lock() was called.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/nfsd/vfs.c |   18 ++++++++++--------
>  1 file changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 3f4579f5775c..4916c29af0fa 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1551,8 +1551,10 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
>  
>  	dnew = lookup_one_len(name, ddir, len);
>  	host_err = PTR_ERR(dnew);
> -	if (IS_ERR(dnew))
> -		goto out_nfserr;
> +	if (IS_ERR(dnew)) {
> +		err = nfserrno(host_err);
> +		goto out_unlock;
> +	}
>  
>  	dold = tfhp->fh_dentry;
>  
> @@ -1571,17 +1573,17 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
>  		else
>  			err = nfserrno(host_err);
>  	}
> -out_dput:
>  	dput(dnew);
> -out_unlock:
> -	fh_unlock(ffhp);
> +out_drop_write:
>  	fh_drop_write(tfhp);
>  out:
>  	return err;
>  
> -out_nfserr:
> -	err = nfserrno(host_err);
> -	goto out_unlock;
> +out_dput:
> +	dput(dnew);
> +out_unlock:
> +	fh_unlock(ffhp);
> +	goto out_drop_write;
>  }
>  
>  static void
> 
> 

Reviewed-by: Jeff Layton <jlayton@kernel.org>
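
Why "exactly once" matters becomes visible as soon as the unlock is not
idempotent. The sketch below is a userspace model, not nfsd code:
struct strict_lock, strict_lock_acquire(), strict_lock_release() and
model_link() are hypothetical names. It assumes, as the commit message
implies, that the success path releases the lock itself, so the error
labels must not be reached from it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock that, unlike fh_unlock(), is NOT idempotent. */
struct strict_lock {
	bool held;
	int  unlocks;	/* number of release calls, for inspection */
};

static void strict_lock_acquire(struct strict_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void strict_lock_release(struct strict_lock *l)
{
	assert(l->held);	/* a second release would trip this */
	l->held = false;
	l->unlocks++;
}

/* Returns 0 on success, -1 if the lookup fails, -2 if the link fails. */
static int model_link(struct strict_lock *l, bool lookup_fails,
		      bool link_fails)
{
	int err = 0;

	strict_lock_acquire(l);
	if (lookup_fails) {
		err = -1;
		goto out_unlock;
	}
	if (link_fails) {
		err = -2;
		goto out_dput;
	}
	/* Success: release straight after the operation. */
	strict_lock_release(l);
	/* The dentry reference would be dropped here. */
out_drop_write:
	return err;

out_dput:
	/* The dentry reference would be dropped here too. */
out_unlock:
	strict_lock_release(l);
	goto out_drop_write;
}
```

With the old layout the success path would fall through a second
release, which the assertion in strict_lock_release() would catch.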


* Re: [PATCH 5/8] NFSD: reduce locking in nfsd_lookup()
  2022-07-06  4:18 ` [PATCH 5/8] NFSD: reduce locking in nfsd_lookup() NeilBrown
@ 2022-07-06 13:47   ` Jeff Layton
  2022-07-07  1:26     ` NeilBrown
  2022-07-06 16:29   ` Chuck Lever III
  1 sibling, 1 reply; 40+ messages in thread
From: Jeff Layton @ 2022-07-06 13:47 UTC (permalink / raw)
  To: NeilBrown, Chuck Lever; +Cc: linux-nfs

On Wed, 2022-07-06 at 14:18 +1000, NeilBrown wrote:
> nfsd_lookup() takes an exclusive lock on the parent inode, but many
> callers don't want the lock and may not need to lock at all if the
> result is in the dcache.
> 
> Change nfsd_lookup() to be passed a bool flag.
> If false, don't take the lock.
> If true, do take an exclusive lock, and return with it held if
> successful.
> If nfsd_lookup() returns an error, the lock WILL NOT be held.
> 
> Only nfsd4_open() requests the lock to be held, and does so to block
> rename until it decides whether to return a delegation.
> 
> NOTE: when nfsd4_open() creates a file, the directory does *NOT* remain
>   locked and never has.  So it is possible (though unlikely) for the
>   newly created file to be renamed before a delegation is handed out,
>   and that would be bad.  This patch does *NOT* fix that, but *DOES*
>   take the directory lock immediately after creating the file, which
>   reduces the size of the window and ensures that the lock is held
>   consistently.  More work is needed to guarantee no rename happens
>   before the delegation.
> 

Interesting. Maybe after taking the lock, we could re-vet the dentry vs.
the info in the OPEN request? That way, we'd presumably know that the
above race didn't occur.

> NOTE-2: NFSv4 requires directory changeinfo for OPEN even when a create
>   wasn't requested and no change happened.  Now that nfsd_lookup()
>   doesn't use fh_lock(), we need explicit fh_fill_pre/post_attrs()
>   in the non-create branch of do_open_lookup().
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/nfsd/nfs3proc.c |    2 +-
>  fs/nfsd/nfs4proc.c |   51 ++++++++++++++++++++++++++++------------
>  fs/nfsd/nfsproc.c  |    2 +-
>  fs/nfsd/vfs.c      |   66 +++++++++++++++++++++++++++++++++++-----------------
>  fs/nfsd/vfs.h      |    8 ++++--
>  5 files changed, 88 insertions(+), 41 deletions(-)
> 
> diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
> index ad7941001106..3a67d0afb885 100644
> --- a/fs/nfsd/nfs3proc.c
> +++ b/fs/nfsd/nfs3proc.c
> @@ -96,7 +96,7 @@ nfsd3_proc_lookup(struct svc_rqst *rqstp)
>  
>  	resp->status = nfsd_lookup(rqstp, &resp->dirfh,
>  				   argp->name, argp->len,
> -				   &resp->fh);
> +				   &resp->fh, false);
>  	return rpc_success;
>  }
>  
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 4737019738ab..6ec22c69cbec 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -414,7 +414,8 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  }
>  
>  static __be32
> -do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, struct nfsd4_open *open, struct svc_fh **resfh)
> +do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> +	       struct nfsd4_open *open, struct svc_fh **resfh)
>  {
>  	struct svc_fh *current_fh = &cstate->current_fh;
>  	int accmode;
> @@ -441,11 +442,18 @@ do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stru
>  		 * yes          | no     | GUARDED4        | GUARDED4
>  		 * yes          | yes    | GUARDED4        | GUARDED4
>  		 */
> -
>  		current->fs->umask = open->op_umask;
>  		status = nfsd4_create_file(rqstp, current_fh, *resfh, open);
>  		current->fs->umask = 0;
>  
> +		if (!status)
> +			/* We really want to hold the lock from before the
> +			 * create to ensure no rename happens, but that
> +			 * needs more work...
> +			 */
> +			inode_lock_nested(current_fh->fh_dentry->d_inode,
> +					  I_MUTEX_PARENT);
> +
>  		if (!status && open->op_label.len)
>  			nfsd4_security_inode_setsecctx(*resfh, &open->op_label, open->op_bmval);
>  
> @@ -457,17 +465,25 @@ do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stru
>  		if (nfsd4_create_is_exclusive(open->op_createmode) && status == 0)
>  			open->op_bmval[1] |= (FATTR4_WORD1_TIME_ACCESS |
>  						FATTR4_WORD1_TIME_MODIFY);
> -	} else
> -		/*
> -		 * Note this may exit with the parent still locked.
> -		 * We will hold the lock until nfsd4_open's final
> -		 * lookup, to prevent renames or unlinks until we've had
> -		 * a chance to an acquire a delegation if appropriate.
> +	} else {
> +		/* We want to keep the directory locked until we've had a chance
> +		 * to acquire a delegation if appropriate, so request that
> +		 * nfsd_lookup() hold on to the lock.
>  		 */
>  		status = nfsd_lookup(rqstp, current_fh,
> -				     open->op_fname, open->op_fnamelen, *resfh);
> +				     open->op_fname, open->op_fnamelen, *resfh,
> +				     true);
> +		if (!status) {
> +			/* NFSv4 protocol requires change attributes even though
> +			 * no change happened.
> +			 */
> +			fh_fill_pre_attrs(current_fh);
> +			fh_fill_post_attrs(current_fh);
> +		}
> +	}
>  	if (status)
> -		goto out;
> +		return status;
> +
>  	status = nfsd_check_obj_isreg(*resfh);
>  	if (status)
>  		goto out;
> @@ -483,6 +499,8 @@ do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stru
>  	status = do_open_permission(rqstp, *resfh, open, accmode);
>  	set_change_info(&open->op_cinfo, current_fh);
>  out:
> +	if (status)
> +		inode_unlock(current_fh->fh_dentry->d_inode);
>  	return status;
>  }
>  
> @@ -540,6 +558,7 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  	struct net *net = SVC_NET(rqstp);
>  	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
>  	bool reclaim = false;
> +	bool locked = false;
>  
>  	dprintk("NFSD: nfsd4_open filename %.*s op_openowner %p\n",
>  		(int)open->op_fnamelen, open->op_fname,
> @@ -604,6 +623,7 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  		status = do_open_lookup(rqstp, cstate, open, &resfh);
>  		if (status)
>  			goto out;
> +		locked = true;
>  		break;
>  	case NFS4_OPEN_CLAIM_PREVIOUS:
>  		status = nfs4_check_open_reclaim(cstate->clp);
> @@ -639,6 +659,8 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  		fput(open->op_filp);
>  		open->op_filp = NULL;
>  	}
> +	if (locked)
> +		inode_unlock(cstate->current_fh.fh_dentry->d_inode);
>  	if (resfh && resfh != &cstate->current_fh) {
>  		fh_dup2(&cstate->current_fh, resfh);
>  		fh_put(resfh);
> @@ -933,7 +955,7 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
>  		return nfserr_noent;
>  	}
>  	fh_put(&tmp_fh);
> -	return nfsd_lookup(rqstp, fh, "..", 2, fh);
> +	return nfsd_lookup(rqstp, fh, "..", 2, fh, false);
>  }
>  
>  static __be32
> @@ -949,7 +971,7 @@ nfsd4_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  {
>  	return nfsd_lookup(rqstp, &cstate->current_fh,
>  			   u->lookup.lo_name, u->lookup.lo_len,
> -			   &cstate->current_fh);
> +			   &cstate->current_fh, false);
>  }
>  
>  static __be32
> @@ -1089,11 +1111,10 @@ nfsd4_secinfo(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  	if (err)
>  		return err;
>  	err = nfsd_lookup_dentry(rqstp, &cstate->current_fh,
> -				    secinfo->si_name, secinfo->si_namelen,
> -				    &exp, &dentry);
> +				 secinfo->si_name, secinfo->si_namelen,
> +				 &exp, &dentry, false);
>  	if (err)
>  		return err;
> -	fh_unlock(&cstate->current_fh);
>  	if (d_really_is_negative(dentry)) {
>  		exp_put(exp);
>  		err = nfserr_noent;
> diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
> index a25b8e321662..ed24fae09517 100644
> --- a/fs/nfsd/nfsproc.c
> +++ b/fs/nfsd/nfsproc.c
> @@ -133,7 +133,7 @@ nfsd_proc_lookup(struct svc_rqst *rqstp)
>  
>  	fh_init(&resp->fh, NFS_FHSIZE);
>  	resp->status = nfsd_lookup(rqstp, &argp->fh, argp->name, argp->len,
> -				   &resp->fh);
> +				   &resp->fh, false);
>  	fh_put(&argp->fh);
>  	if (resp->status != nfs_ok)
>  		goto out;
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 4916c29af0fa..8e050c6d112a 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -172,7 +172,8 @@ int nfsd_mountpoint(struct dentry *dentry, struct svc_export *exp)
>  __be32
>  nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  		   const char *name, unsigned int len,
> -		   struct svc_export **exp_ret, struct dentry **dentry_ret)
> +		   struct svc_export **exp_ret, struct dentry **dentry_ret,
> +		   bool locked)
>  {
>  	struct svc_export	*exp;
>  	struct dentry		*dparent;
> @@ -199,27 +200,31 @@ nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  				goto out_nfserr;
>  		}
>  	} else {
> -		/*
> -		 * In the nfsd4_open() case, this may be held across
> -		 * subsequent open and delegation acquisition which may
> -		 * need to take the child's i_mutex:
> -		 */
> -		fh_lock_nested(fhp, I_MUTEX_PARENT);
> -		dentry = lookup_one_len(name, dparent, len);
> +		if (locked)
> +			dentry = lookup_one_len(name, dparent, len);
> +		else
> +			dentry = lookup_one_len_unlocked(name, dparent, len);
>  		host_err = PTR_ERR(dentry);
>  		if (IS_ERR(dentry))
>  			goto out_nfserr;
>  		if (nfsd_mountpoint(dentry, exp)) {
>  			/*
> -			 * We don't need the i_mutex after all.  It's
> -			 * still possible we could open this (regular
> -			 * files can be mountpoints too), but the
> -			 * i_mutex is just there to prevent renames of
> -			 * something that we might be about to delegate,
> -			 * and a mountpoint won't be renamed:
> +			 * nfsd_cross_mnt() may wait for an upcall
> +			 * to userspace, and holding i_sem across that

s/i_sem/i_rwsem/

> +			 * invites the possibility of a deadlock.
> +			 * We don't really need the lock on the parent
> +			 * of a mount point as we only need it to guard
> +			 * against a rename before we get a lease for a
> +			 * delegation.
> +			 * So just drop the i_sem and reclaim it.
>  			 */
> -			fh_unlock(fhp);
> -			if ((host_err = nfsd_cross_mnt(rqstp, &dentry, &exp))) {
> +			if (locked)
> +				inode_unlock(dparent->d_inode);
> +			host_err = nfsd_cross_mnt(rqstp, &dentry, &exp);
> +			if (locked)
> +				inode_lock_nested(dparent->d_inode,
> +						  I_MUTEX_PARENT);
> +			if (host_err) {
>  				dput(dentry);
>  				goto out_nfserr;
>  			}
> @@ -234,7 +239,17 @@ nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	return nfserrno(host_err);
>  }
>  
> -/*
> +/**
> + * nfsd_lookup - look up a single path component for nfsd
> + *
> + * @rqstp:   the request context
> + * @fhp:     the file handle of the directory
> + * @name:    the component name, or %NULL to look up parent
> + * @len:     length of name to examine
> + * @resfh:   pointer to pre-initialised filehandle to hold result.
> + * @lock:    if true, lock directory during lookup and keep it locked
> + *           if there is no error.
> + *
>   * Look up one component of a pathname.
>   * N.B. After this call _both_ fhp and resfh need an fh_put
>   *
> @@ -244,11 +259,15 @@ nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp,
>   * returned. Otherwise the covered directory is returned.
>   * NOTE: this mountpoint crossing is not supported properly by all
>   *   clients and is explicitly disallowed for NFSv3
> - *      NeilBrown <neilb@cse.unsw.edu.au>
> + *
> + * Only nfsd4_open() calls this with @lock set.  It does so to block
> + * renames/unlinks before it possibly gets a lease to provide a
> + * delegation.
>   */
>  __be32
>  nfsd_lookup(struct svc_rqst *rqstp, struct svc_fh *fhp, const char *name,
> -				unsigned int len, struct svc_fh *resfh)
> +	    unsigned int len, struct svc_fh *resfh,
> +	    bool lock)
>  {
>  	struct svc_export	*exp;
>  	struct dentry		*dentry;
> @@ -257,9 +276,11 @@ nfsd_lookup(struct svc_rqst *rqstp, struct svc_fh *fhp, const char *name,
>  	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
>  	if (err)
>  		return err;
> -	err = nfsd_lookup_dentry(rqstp, fhp, name, len, &exp, &dentry);
> +	if (lock)
> +		inode_lock_nested(fhp->fh_dentry->d_inode, I_MUTEX_PARENT);
> +	err = nfsd_lookup_dentry(rqstp, fhp, name, len, &exp, &dentry, lock);
>  	if (err)
> -		return err;
> +		goto out_err;
>  	err = check_nfsd_access(exp, rqstp);
>  	if (err)
>  		goto out;
> @@ -273,6 +294,9 @@ nfsd_lookup(struct svc_rqst *rqstp, struct svc_fh *fhp, const char *name,
>  out:
>  	dput(dentry);
>  	exp_put(exp);
> +out_err:
> +	if (err && lock)
> +		inode_unlock(fhp->fh_dentry->d_inode);
>  	return err;
>  }
>  
> diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
> index 9f4fd3060200..290788f007d4 100644
> --- a/fs/nfsd/vfs.h
> +++ b/fs/nfsd/vfs.h
> @@ -45,10 +45,12 @@ typedef int (*nfsd_filldir_t)(void *, const char *, int, loff_t, u64, unsigned);
>  int		nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp,
>  		                struct svc_export **expp);
>  __be32		nfsd_lookup(struct svc_rqst *, struct svc_fh *,
> -				const char *, unsigned int, struct svc_fh *);
> +			    const char *, unsigned int, struct svc_fh *,
> +			    bool);
>  __be32		 nfsd_lookup_dentry(struct svc_rqst *, struct svc_fh *,
> -				const char *, unsigned int,
> -				struct svc_export **, struct dentry **);
> +				    const char *, unsigned int,
> +				    struct svc_export **, struct dentry **,
> +				    bool);
>  __be32		nfsd_setattr(struct svc_rqst *, struct svc_fh *,
>  				struct iattr *, int, time64_t);
>  int nfsd_mountpoint(struct dentry *, struct svc_export *);
> 
> 

Other than minor comment nit...

Reviewed-by: Jeff Layton <jlayton@kernel.org>
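
The contract described in the commit message (no lock taken when the
flag is false; lock held on return when the flag is true and the lookup
succeeds; lock never held on error) can be sketched in userspace. The
names struct lookup_dir and model_lookup() are hypothetical, and the
directory holds a single name purely to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical directory model; not a kernel structure. */
struct lookup_dir {
	bool        locked;
	const char *only_child;	/* the single name this directory contains */
};

/*
 * Look up @name in @dir.  If @lock is true, take the lock first and keep
 * it held on success; on error the lock is released before returning.
 * Returns 0 on success, -1 if the name is absent.
 */
static int model_lookup(struct lookup_dir *dir, const char *name, bool lock)
{
	int err = 0;

	if (lock) {
		assert(!dir->locked);
		dir->locked = true;
	}
	if (strcmp(name, dir->only_child) != 0)
		err = -1;

	/* Mirrors the out_err handling: never return an error with the lock held. */
	if (err && lock)
		dir->locked = false;
	return err;
}
```

A caller that passes true therefore owns the lock only when the return
value is 0, and is responsible for releasing it later.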


* Re: [PATCH 6/8] NFSD: use explicit lock/unlock for directory ops
  2022-07-06  4:18 ` [PATCH 6/8] NFSD: use explicit lock/unlock for directory ops NeilBrown
@ 2022-07-06 14:05   ` Jeff Layton
  2022-07-06 16:29   ` Chuck Lever III
  2022-07-15 16:11   ` Jeff Layton
  2 siblings, 0 replies; 40+ messages in thread
From: Jeff Layton @ 2022-07-06 14:05 UTC (permalink / raw)
  To: NeilBrown, Chuck Lever; +Cc: linux-nfs

On Wed, 2022-07-06 at 14:18 +1000, NeilBrown wrote:
> When creating or unlinking a name in a directory use explicit
> inode_lock_nested() instead of fh_lock(), and explicit calls to
> fh_fill_pre_attrs() and fh_fill_post_attrs().  This is already done for
> renames.
> 
> Also move the 'fill' calls closer to the operation that might change the
> attributes.  This way they are avoided on some error paths.
> 
> Having the locking explicit will simplify proposed future changes to
> locking for directories.  It also makes it easily visible exactly where
> pre/post attributes are used - not all callers of fh_lock() actually
> need the pre/post attributes.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/nfsd/nfs3proc.c |    6 ++++--
>  fs/nfsd/nfs4proc.c |    6 ++++--
>  fs/nfsd/nfsproc.c  |    7 ++++---
>  fs/nfsd/vfs.c      |   30 +++++++++++++++++++-----------
>  4 files changed, 31 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
> index 3a67d0afb885..9629517344ff 100644
> --- a/fs/nfsd/nfs3proc.c
> +++ b/fs/nfsd/nfs3proc.c
> @@ -254,7 +254,7 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	if (host_err)
>  		return nfserrno(host_err);
>  
> -	fh_lock_nested(fhp, I_MUTEX_PARENT);
> +	inode_lock_nested(inode, I_MUTEX_PARENT);
>  
>  	child = lookup_one_len(argp->name, parent, argp->len);
>  	if (IS_ERR(child)) {
> @@ -312,11 +312,13 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	if (!IS_POSIXACL(inode))
>  		iap->ia_mode &= ~current_umask();
>  
> +	fh_fill_pre_attrs(fhp);
>  	host_err = vfs_create(&init_user_ns, inode, child, iap->ia_mode, true);
>  	if (host_err < 0) {
>  		status = nfserrno(host_err);
>  		goto out;
>  	}
> +	fh_fill_post_attrs(fhp);
>  
>  	/* A newly created file already has a file size of zero. */
>  	if ((iap->ia_valid & ATTR_SIZE) && (iap->ia_size == 0))
> @@ -334,7 +336,7 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	status = nfsd_create_setattr(rqstp, fhp, resfhp, iap);
>  
>  out:
> -	fh_unlock(fhp);
> +	inode_unlock(inode);
>  	if (child && !IS_ERR(child))
>  		dput(child);
>  	fh_drop_write(fhp);
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 6ec22c69cbec..242f059e6788 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -306,7 +306,7 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	if (host_err)
>  		return nfserrno(host_err);
>  
> -	fh_lock_nested(fhp, I_MUTEX_PARENT);
> +	inode_lock_nested(inode, I_MUTEX_PARENT);
>  
>  	child = lookup_one_len(open->op_fname, parent, open->op_fnamelen);
>  	if (IS_ERR(child)) {
> @@ -385,10 +385,12 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	if (!IS_POSIXACL(inode))
>  		iap->ia_mode &= ~current_umask();
>  
> +	fh_fill_pre_attrs(fhp);
>  	status = nfsd4_vfs_create(fhp, child, open);
>  	if (status != nfs_ok)
>  		goto out;
>  	open->op_created = true;
> +	fh_fill_post_attrs(fhp);
>  
>  	/* A newly created file already has a file size of zero. */
>  	if ((iap->ia_valid & ATTR_SIZE) && (iap->ia_size == 0))
> @@ -406,7 +408,7 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	status = nfsd_create_setattr(rqstp, fhp, resfhp, iap);
>  
>  out:
> -	fh_unlock(fhp);
> +	inode_unlock(inode);
>  	if (child && !IS_ERR(child))
>  		dput(child);
>  	fh_drop_write(fhp);
> diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
> index ed24fae09517..427c404bc52b 100644
> --- a/fs/nfsd/nfsproc.c
> +++ b/fs/nfsd/nfsproc.c
> @@ -285,7 +285,7 @@ nfsd_proc_create(struct svc_rqst *rqstp)
>  		goto done;
>  	}
>  
> -	fh_lock_nested(dirfhp, I_MUTEX_PARENT);
> +	inode_lock_nested(dirfhp->fh_dentry->d_inode, I_MUTEX_PARENT);
>  	dchild = lookup_one_len(argp->name, dirfhp->fh_dentry, argp->len);
>  	if (IS_ERR(dchild)) {
>  		resp->status = nfserrno(PTR_ERR(dchild));
> @@ -382,6 +382,7 @@ nfsd_proc_create(struct svc_rqst *rqstp)
>  	}
>  
>  	resp->status = nfs_ok;
> +	fh_fill_pre_attrs(dirfhp);
>  	if (!inode) {
>  		/* File doesn't exist. Create it and set attrs */
>  		resp->status = nfsd_create_locked(rqstp, dirfhp, argp->name,
> @@ -399,10 +400,10 @@ nfsd_proc_create(struct svc_rqst *rqstp)
>  			resp->status = nfsd_setattr(rqstp, newfhp, attr, 0,
>  						    (time64_t)0);
>  	}
> +	fh_fill_post_attrs(dirfhp);
>  
>  out_unlock:
> -	/* We don't really need to unlock, as fh_put does it. */
> -	fh_unlock(dirfhp);
> +	inode_unlock(dirfhp->fh_dentry->d_inode);
>  	fh_drop_write(dirfhp);
>  done:
>  	fh_put(dirfhp);
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 8e050c6d112a..2ca748aa83bb 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1412,7 +1412,7 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	if (host_err)
>  		return nfserrno(host_err);
>  
> -	fh_lock_nested(fhp, I_MUTEX_PARENT);
> +	inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT);
>  	dchild = lookup_one_len(fname, dentry, flen);
>  	host_err = PTR_ERR(dchild);
>  	if (IS_ERR(dchild)) {
> @@ -1427,12 +1427,14 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	dput(dchild);
>  	if (err)
>  		goto out_unlock;
> +	fh_fill_pre_attrs(fhp);
>  	err = nfsd_create_locked(rqstp, fhp, fname, flen, iap, type,
>  				 rdev, resfhp);
>  	if (!err && post_create)
>  		post_create(resfhp, data);
> +	fh_fill_post_attrs(fhp);
>  out_unlock:
> -	fh_unlock(fhp);
> +	inode_unlock(dentry->d_inode);
>  	return err;
>  }
>  
> @@ -1505,14 +1507,15 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	if (host_err)
>  		goto out_nfserr;
>  
> -	fh_lock(fhp);
>  	dentry = fhp->fh_dentry;
> +	inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT);
>  	dnew = lookup_one_len(fname, dentry, flen);
>  	host_err = PTR_ERR(dnew);
>  	if (IS_ERR(dnew)) {
> -		fh_unlock(fhp);
> +		inode_unlock(dentry->d_inode);
>  		goto out_nfserr;
>  	}
> +	fh_fill_pre_attrs(fhp);
>  	host_err = vfs_symlink(&init_user_ns, d_inode(dentry), dnew, path);
>  	err = nfserrno(host_err);
>  	if (!err)
> @@ -1525,7 +1528,8 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	if (err==0) err = cerr;
>  	if (!err && post_create)
>  		post_create(resfhp, data);
> -	fh_unlock(fhp);
> +	fh_fill_post_attrs(fhp);
> +	inode_unlock(dentry->d_inode);
>  out:
>  	return err;
>  
> @@ -1569,9 +1573,9 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
>  		goto out;
>  	}
>  
> -	fh_lock_nested(ffhp, I_MUTEX_PARENT);
>  	ddir = ffhp->fh_dentry;
>  	dirp = d_inode(ddir);
> +	inode_lock_nested(dirp, I_MUTEX_PARENT);
>  
>  	dnew = lookup_one_len(name, ddir, len);
>  	host_err = PTR_ERR(dnew);
> @@ -1585,8 +1589,10 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
>  	err = nfserr_noent;
>  	if (d_really_is_negative(dold))
>  		goto out_dput;
> +	fh_fill_pre_attrs(ffhp);
>  	host_err = vfs_link(dold, &init_user_ns, dirp, dnew, NULL);
> -	fh_unlock(ffhp);
> +	fh_fill_post_attrs(ffhp);
> +	inode_unlock(dirp);
>  	if (!host_err) {
>  		err = nfserrno(commit_metadata(ffhp));
>  		if (!err)
> @@ -1606,7 +1612,7 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
>  out_dput:
>  	dput(dnew);
>  out_unlock:
> -	fh_unlock(ffhp);
> +	inode_unlock(dirp);
>  	goto out_drop_write;
>  }
>  
> @@ -1781,9 +1787,9 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
>  	if (host_err)
>  		goto out_nfserr;
>  
> -	fh_lock_nested(fhp, I_MUTEX_PARENT);
>  	dentry = fhp->fh_dentry;
>  	dirp = d_inode(dentry);
> +	inode_lock_nested(dirp, I_MUTEX_PARENT);
>  
>  	rdentry = lookup_one_len(fname, dentry, flen);
>  	host_err = PTR_ERR(rdentry);
> @@ -1801,6 +1807,7 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
>  	if (!type)
>  		type = d_inode(rdentry)->i_mode & S_IFMT;
>  
> +	fh_fill_pre_attrs(fhp);
>  	if (type != S_IFDIR) {
>  		if (rdentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK)
>  			nfsd_close_cached_files(rdentry);
> @@ -1808,8 +1815,9 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
>  	} else {
>  		host_err = vfs_rmdir(&init_user_ns, dirp, rdentry);
>  	}
> +	fh_fill_post_attrs(fhp);
>  
> -	fh_unlock(fhp);
> +	inode_unlock(dirp);
>  	if (!host_err)
>  		host_err = commit_metadata(fhp);
>  	dput(rdentry);
> @@ -1832,7 +1840,7 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
>  out:
>  	return err;
>  out_unlock:
> -	fh_unlock(fhp);
> +	inode_unlock(dirp);
>  	goto out_drop_write;
>  }
>  
> 
> 

Reviewed-by: Jeff Layton <jlayton@kernel.org>

* Re: [PATCH 7/8] NFSD: use (un)lock_inode instead of fh_(un)lock for file operations
  2022-07-06  4:18 ` [PATCH 7/8] NFSD: use (un)lock_inode instead of fh_(un)lock for file operations NeilBrown
@ 2022-07-06 14:10   ` Jeff Layton
  2022-07-06 16:30   ` Chuck Lever III
  1 sibling, 0 replies; 40+ messages in thread
From: Jeff Layton @ 2022-07-06 14:10 UTC (permalink / raw)
  To: NeilBrown, Chuck Lever; +Cc: linux-nfs

On Wed, 2022-07-06 at 14:18 +1000, NeilBrown wrote:
> When locking a file to access ACLs and xattrs etc, use explicit locking
> with inode_lock() instead of fh_lock().  This means that the calls to
> fh_fill_pre/post_attrs() are also explicit which improves readability and
> allows us to place them only where they are needed.  Only the xattr
> calls need pre/post information.
> 
> When locking a file we don't need I_MUTEX_PARENT as the file is not a
> parent of anything, so we can use inode_lock() directly rather than the
> inode_lock_nested() call that fh_lock() uses.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/nfsd/nfs2acl.c   |    6 +++---
>  fs/nfsd/nfs3acl.c   |    4 ++--
>  fs/nfsd/nfs4acl.c   |    7 +++----
>  fs/nfsd/nfs4state.c |    8 ++++----
>  fs/nfsd/vfs.c       |   25 ++++++++++++-------------
>  5 files changed, 24 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c
> index b5760801d377..9edd3c1a30fb 100644
> --- a/fs/nfsd/nfs2acl.c
> +++ b/fs/nfsd/nfs2acl.c
> @@ -111,7 +111,7 @@ static __be32 nfsacld_proc_setacl(struct svc_rqst *rqstp)
>  	if (error)
>  		goto out_errno;
>  
> -	fh_lock(fh);
> +	inode_lock(inode);
>  
>  	error = set_posix_acl(&init_user_ns, inode, ACL_TYPE_ACCESS,
>  			      argp->acl_access);
> @@ -122,7 +122,7 @@ static __be32 nfsacld_proc_setacl(struct svc_rqst *rqstp)
>  	if (error)
>  		goto out_drop_lock;
>  
> -	fh_unlock(fh);
> +	inode_unlock(inode);
>  
>  	fh_drop_write(fh);
>  
> @@ -136,7 +136,7 @@ static __be32 nfsacld_proc_setacl(struct svc_rqst *rqstp)
>  	return rpc_success;
>  
>  out_drop_lock:
> -	fh_unlock(fh);
> +	inode_unlock(inode);
>  	fh_drop_write(fh);
>  out_errno:
>  	resp->status = nfserrno(error);
> diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c
> index 35b2ebda14da..9446c6743664 100644
> --- a/fs/nfsd/nfs3acl.c
> +++ b/fs/nfsd/nfs3acl.c
> @@ -101,7 +101,7 @@ static __be32 nfsd3_proc_setacl(struct svc_rqst *rqstp)
>  	if (error)
>  		goto out_errno;
>  
> -	fh_lock(fh);
> +	inode_lock(inode);
>  
>  	error = set_posix_acl(&init_user_ns, inode, ACL_TYPE_ACCESS,
>  			      argp->acl_access);
> @@ -111,7 +111,7 @@ static __be32 nfsd3_proc_setacl(struct svc_rqst *rqstp)
>  			      argp->acl_default);
>  
>  out_drop_lock:
> -	fh_unlock(fh);
> +	inode_unlock(inode);
>  	fh_drop_write(fh);
>  out_errno:
>  	resp->status = nfserrno(error);
> diff --git a/fs/nfsd/nfs4acl.c b/fs/nfsd/nfs4acl.c
> index 5c9b7e01e8ca..a33cacf62ea0 100644
> --- a/fs/nfsd/nfs4acl.c
> +++ b/fs/nfsd/nfs4acl.c
> @@ -781,19 +781,18 @@ nfsd4_set_nfs4_acl(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	if (host_error < 0)
>  		goto out_nfserr;
>  
> -	fh_lock(fhp);
> +	inode_lock(inode);
>  
>  	host_error = set_posix_acl(&init_user_ns, inode, ACL_TYPE_ACCESS, pacl);
>  	if (host_error < 0)
>  		goto out_drop_lock;
>  
> -	if (S_ISDIR(inode->i_mode)) {
> +	if (S_ISDIR(inode->i_mode))
>  		host_error = set_posix_acl(&init_user_ns, inode,
>  					   ACL_TYPE_DEFAULT, dpacl);
> -	}
>  
>  out_drop_lock:
> -	fh_unlock(fhp);
> +	inode_unlock(inode);
>  
>  	posix_acl_release(pacl);
>  	posix_acl_release(dpacl);
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 9d1a3e131c49..307317ba9aff 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -7322,21 +7322,21 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  static __be32 nfsd_test_lock(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file_lock *lock)
>  {
>  	struct nfsd_file *nf;
> +	struct inode *inode = fhp->fh_dentry->d_inode;
>  	__be32 err;
>  
>  	err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
>  	if (err)
>  		return err;
> -	fh_lock(fhp); /* to block new leases till after test_lock: */
> -	err = nfserrno(nfsd_open_break_lease(fhp->fh_dentry->d_inode,
> -							NFSD_MAY_READ));
> +	inode_lock(inode); /* to block new leases till after test_lock: */
> +	err = nfserrno(nfsd_open_break_lease(inode, NFSD_MAY_READ));
>  	if (err)
>  		goto out;
>  	lock->fl_file = nf->nf_file;
>  	err = nfserrno(vfs_test_lock(nf->nf_file, lock));
>  	lock->fl_file = NULL;
>  out:
> -	fh_unlock(fhp);
> +	inode_unlock(inode);
>  	nfsd_file_put(nf);
>  	return err;
>  }
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 2ca748aa83bb..2526615285ca 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -444,7 +444,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap,
>  			return err;
>  	}
>  
> -	fh_lock(fhp);
> +	inode_lock(inode);
>  	if (size_change) {
>  		/*
>  		 * RFC5661, Section 18.30.4:
> @@ -480,7 +480,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap,
>  	host_err = notify_change(&init_user_ns, dentry, iap, NULL);
>  
>  out_unlock:
> -	fh_unlock(fhp);
> +	inode_unlock(inode);
>  	if (size_change)
>  		put_write_access(inode);
>  out:
> @@ -2196,12 +2196,8 @@ nfsd_listxattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char **bufp,
>  }
>  
>  /*
> - * Removexattr and setxattr need to call fh_lock to both lock the inode
> - * and set the change attribute. Since the top-level vfs_removexattr
> - * and vfs_setxattr calls already do their own inode_lock calls, call
> - * the _locked variant. Pass in a NULL pointer for delegated_inode,
> - * and let the client deal with NFS4ERR_DELAY (same as with e.g.
> - * setattr and remove).
> + * Pass in a NULL pointer for delegated_inode, and let the client deal
> + * with NFS4ERR_DELAY (same as with e.g.  setattr and remove).
>   */
>  __be32
>  nfsd_removexattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char *name)
> @@ -2217,12 +2213,14 @@ nfsd_removexattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char *name)
>  	if (ret)
>  		return nfserrno(ret);
>  
> -	fh_lock(fhp);
> +	inode_lock(fhp->fh_dentry->d_inode);
> +	fh_fill_pre_attrs(fhp);
>  
>  	ret = __vfs_removexattr_locked(&init_user_ns, fhp->fh_dentry,
>  				       name, NULL);
>  
> -	fh_unlock(fhp);
> +	fh_fill_post_attrs(fhp);
> +	inode_unlock(fhp->fh_dentry->d_inode);
>  	fh_drop_write(fhp);
>  
>  	return nfsd_xattr_errno(ret);
> @@ -2242,12 +2240,13 @@ nfsd_setxattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char *name,
>  	ret = fh_want_write(fhp);
>  	if (ret)
>  		return nfserrno(ret);
> -	fh_lock(fhp);
> +	inode_lock(fhp->fh_dentry->d_inode);
> +	fh_fill_pre_attrs(fhp);
>  
>  	ret = __vfs_setxattr_locked(&init_user_ns, fhp->fh_dentry, name, buf,
>  				    len, flags, NULL);
> -
> -	fh_unlock(fhp);
> +	fh_fill_post_attrs(fhp);
> +	inode_unlock(fhp->fh_dentry->d_inode);
>  	fh_drop_write(fhp);
>  
>  	return nfsd_xattr_errno(ret);
> 
> 

Reviewed-by: Jeff Layton <jlayton@kernel.org>
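
For readers following along, the pattern this patch adopts can be sketched in userspace C. This is only a model under stated assumptions, not the kernel code: an integer flag stands in for the inode lock, a counter stands in for the change attribute, and every model_* name is hypothetical. It shows the lock being taken explicitly, with the pre/post snapshots taken only by the xattr-style path, as the commit message describes.

```c
/* Userspace model only; none of these names exist in the kernel. */
#include <assert.h>
#include <stdbool.h>

struct model_inode {
	int lock_held;		/* 1 while the modelled inode lock is held */
	long change;		/* stands in for the change attribute */
};

struct model_fh {
	struct model_inode *inode;
	bool pre_saved, post_saved;
	long pre_change, post_change;
};

/* Modelled lock: asserts catch unbalanced use instead of hiding it. */
void model_inode_lock(struct model_inode *inode)
{
	assert(!inode->lock_held);
	inode->lock_held = 1;
}

void model_inode_unlock(struct model_inode *inode)
{
	assert(inode->lock_held);
	inode->lock_held = 0;
}

/* Hypothetical analogues of fh_fill_pre_attrs()/fh_fill_post_attrs(). */
void model_fill_pre_attrs(struct model_fh *fh)
{
	fh->pre_change = fh->inode->change;
	fh->pre_saved = true;
}

void model_fill_post_attrs(struct model_fh *fh)
{
	fh->post_change = fh->inode->change;
	fh->post_saved = true;
}

/* xattr-style update: pre/post data is wanted, so snapshot both. */
void model_setxattr(struct model_fh *fh)
{
	model_inode_lock(fh->inode);
	model_fill_pre_attrs(fh);
	fh->inode->change++;		/* the modification itself */
	model_fill_post_attrs(fh);
	model_inode_unlock(fh->inode);
}

/* ACL-style update: lock and modify, but take no snapshots. */
void model_setacl(struct model_fh *fh)
{
	model_inode_lock(fh->inode);
	fh->inode->change++;
	model_inode_unlock(fh->inode);
}
```

In this model, calling model_setacl() leaves pre_saved and post_saved untouched, while model_setxattr() records the change value on either side of the update.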

* Re: [PATCH 8/8] NFSD: discard fh_locked flag and fh_lock/fh_unlock
  2022-07-06  4:18 ` [PATCH 8/8] NFSD: discard fh_locked flag and fh_lock/fh_unlock NeilBrown
@ 2022-07-06 14:12   ` Jeff Layton
  0 siblings, 0 replies; 40+ messages in thread
From: Jeff Layton @ 2022-07-06 14:12 UTC (permalink / raw)
  To: NeilBrown, Chuck Lever; +Cc: linux-nfs

On Wed, 2022-07-06 at 14:18 +1000, NeilBrown wrote:
> As all inode locking is now fully balanced, fh_put() does not need to
> call fh_unlock().
> fh_lock() and fh_unlock() are no longer used, so discard them.
> These are the only real users of ->fh_locked, so discard that too.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/nfsd/nfsfh.c |    3 +--
>  fs/nfsd/nfsfh.h |   56 ++++---------------------------------------------------
>  fs/nfsd/vfs.c   |   17 +----------------
>  3 files changed, 6 insertions(+), 70 deletions(-)
> 
> diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
> index 5e2ed4b2a925..22a77a5e2327 100644
> --- a/fs/nfsd/nfsfh.c
> +++ b/fs/nfsd/nfsfh.c
> @@ -549,7 +549,7 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
>  	if (ref_fh == fhp)
>  		fh_put(ref_fh);
>  
> -	if (fhp->fh_locked || fhp->fh_dentry) {
> +	if (fhp->fh_dentry) {
>  		printk(KERN_ERR "fh_compose: fh %pd2 not initialized!\n",
>  		       dentry);
>  	}
> @@ -681,7 +681,6 @@ fh_put(struct svc_fh *fhp)
>  	struct dentry * dentry = fhp->fh_dentry;
>  	struct svc_export * exp = fhp->fh_export;
>  	if (dentry) {
> -		fh_unlock(fhp);
>  		fhp->fh_dentry = NULL;
>  		dput(dentry);
>  		fh_clear_pre_post_attrs(fhp);
> diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
> index fb9d358a267e..09c654bdf9b0 100644
> --- a/fs/nfsd/nfsfh.h
> +++ b/fs/nfsd/nfsfh.h
> @@ -81,7 +81,6 @@ typedef struct svc_fh {
>  	struct dentry *		fh_dentry;	/* validated dentry */
>  	struct svc_export *	fh_export;	/* export pointer */
>  
> -	bool			fh_locked;	/* inode locked by us */
>  	bool			fh_want_write;	/* remount protection taken */
>  	bool			fh_no_wcc;	/* no wcc data needed */
>  	bool			fh_no_atomic_attr;
> @@ -93,7 +92,7 @@ typedef struct svc_fh {
>  	bool			fh_post_saved;	/* post-op attrs saved */
>  	bool			fh_pre_saved;	/* pre-op attrs saved */
>  
> -	/* Pre-op attributes saved during fh_lock */
> +	/* Pre-op attributes saved when inode is locked */
>  	__u64			fh_pre_size;	/* size before operation */
>  	struct timespec64	fh_pre_mtime;	/* mtime before oper */
>  	struct timespec64	fh_pre_ctime;	/* ctime before oper */
> @@ -103,7 +102,7 @@ typedef struct svc_fh {
>  	 */
>  	u64			fh_pre_change;
>  
> -	/* Post-op attributes saved in fh_unlock */
> +	/* Post-op attributes saved in fh_fill_post_attrs() */
>  	struct kstat		fh_post_attr;	/* full attrs after operation */
>  	u64			fh_post_change; /* nfsv4 change; see above */
>  } svc_fh;
> @@ -223,8 +222,8 @@ void	fh_put(struct svc_fh *);
>  static __inline__ struct svc_fh *
>  fh_copy(struct svc_fh *dst, struct svc_fh *src)
>  {
> -	WARN_ON(src->fh_dentry || src->fh_locked);
> -			
> +	WARN_ON(src->fh_dentry);
> +
>  	*dst = *src;
>  	return dst;
>  }
> @@ -323,51 +322,4 @@ static inline u64 nfsd4_change_attribute(struct kstat *stat,
>  extern void fh_fill_pre_attrs(struct svc_fh *fhp);
>  extern void fh_fill_post_attrs(struct svc_fh *fhp);
>  
> -
> -/*
> - * Lock a file handle/inode
> - * NOTE: both fh_lock and fh_unlock are done "by hand" in
> - * vfs.c:nfsd_rename as it needs to grab 2 i_mutex's at once
> - * so, any changes here should be reflected there.
> - */
> -
> -static inline void
> -fh_lock_nested(struct svc_fh *fhp, unsigned int subclass)
> -{
> -	struct dentry	*dentry = fhp->fh_dentry;
> -	struct inode	*inode;
> -
> -	BUG_ON(!dentry);
> -
> -	if (fhp->fh_locked) {
> -		printk(KERN_WARNING "fh_lock: %pd2 already locked!\n",
> -			dentry);
> -		return;
> -	}
> -
> -	inode = d_inode(dentry);
> -	inode_lock_nested(inode, subclass);
> -	fh_fill_pre_attrs(fhp);
> -	fhp->fh_locked = true;
> -}
> -
> -static inline void
> -fh_lock(struct svc_fh *fhp)
> -{
> -	fh_lock_nested(fhp, I_MUTEX_NORMAL);
> -}
> -
> -/*
> - * Unlock a file handle/inode
> - */
> -static inline void
> -fh_unlock(struct svc_fh *fhp)
> -{
> -	if (fhp->fh_locked) {
> -		fh_fill_post_attrs(fhp);
> -		inode_unlock(d_inode(fhp->fh_dentry));
> -		fhp->fh_locked = false;
> -	}
> -}
> -
>  #endif /* _LINUX_NFSD_NFSFH_H */
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 2526615285ca..fe4cdf8ab428 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1305,13 +1305,6 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	dirp = d_inode(dentry);
>  
>  	dchild = dget(resfhp->fh_dentry);
> -	if (!fhp->fh_locked) {
> -		WARN_ONCE(1, "nfsd_create: parent %pd2 not locked!\n",
> -				dentry);
> -		err = nfserr_io;
> -		goto out;
> -	}
> -
>  	err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
>  	if (err)
>  		goto out;
> @@ -1674,10 +1667,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
>  		goto out;
>  	}
>  
> -	/* cannot use fh_lock as we need deadlock protective ordering
> -	 * so do it by hand */
>  	trap = lock_rename(tdentry, fdentry);
> -	ffhp->fh_locked = tfhp->fh_locked = true;
>  	fh_fill_pre_attrs(ffhp);
>  	fh_fill_pre_attrs(tfhp);
>  
> @@ -1733,17 +1723,12 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
>  	dput(odentry);
>   out_nfserr:
>  	err = nfserrno(host_err);
> -	/*
> -	 * We cannot rely on fh_unlock on the two filehandles,
> -	 * as that would do the wrong thing if the two directories
> -	 * were the same, so again we do it by hand.
> -	 */
> +
>  	if (!close_cached) {
>  		fh_fill_post_attrs(ffhp);
>  		fh_fill_post_attrs(tfhp);
>  	}
>  	unlock_rename(tdentry, fdentry);
> -	ffhp->fh_locked = tfhp->fh_locked = false;
>  	fh_drop_write(ffhp);
>  
>  	/*
> 
> 

Nice cleanup.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
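
To illustrate why the flag could be dropped, here is a small userspace sketch contrasting the two styles. It is a model under assumptions, not the removed kernel helpers: a depth counter stands in for the inode lock and all names are hypothetical. The old-style helpers tolerate repeated calls because of the flag (the real fh_lock_nested() also printed a warning on a repeated lock), whereas the balanced style requires exactly one unlock per lock.

```c
/* Userspace model only; these helpers are illustrative, not kernel API. */
#include <assert.h>
#include <stdbool.h>

struct model_fh {
	int depth;	/* how many times the modelled inode lock is held */
	bool locked;	/* analogue of the removed fh_locked flag */
};

/* Old style: the flag turns a repeated call into a no-op. */
void old_fh_lock(struct model_fh *fh)
{
	if (fh->locked)
		return;		/* repeated lock is ignored */
	fh->depth++;
	fh->locked = true;
}

void old_fh_unlock(struct model_fh *fh)
{
	if (!fh->locked)
		return;		/* repeated unlock is ignored */
	fh->depth--;
	fh->locked = false;
}

/* Balanced style: no flag, so any mismatch trips an assertion. */
void model_inode_lock(struct model_fh *fh)
{
	assert(fh->depth == 0);
	fh->depth++;
}

void model_inode_unlock(struct model_fh *fh)
{
	assert(fh->depth == 1);
	fh->depth--;
}
```

With the old helpers a caller can unlock early and rely on a later unlock doing nothing, which is the idempotent behaviour the cover letter describes. With the balanced helpers the same sequence would fail the assertion, so each code path has to pair its lock and unlock explicitly.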

* Re: [PATCH 0/8] NFSD: clean up locking.
  2022-07-06  4:18 [PATCH 0/8] NFSD: clean up locking NeilBrown
                   ` (7 preceding siblings ...)
  2022-07-06  4:18 ` [PATCH 5/8] NFSD: reduce locking in nfsd_lookup() NeilBrown
@ 2022-07-06 16:29 ` Chuck Lever III
  2022-07-12  2:33   ` NeilBrown
  8 siblings, 1 reply; 40+ messages in thread
From: Chuck Lever III @ 2022-07-06 16:29 UTC (permalink / raw)
  To: Neil Brown; +Cc: Jeff Layton, Linux NFS Mailing List



> On Jul 6, 2022, at 12:18 AM, NeilBrown <neilb@suse.de> wrote:
> 
> This series prepares NFSD to be able to adjust to work with a proposed
> patch which allows updates to directories to happen in parallel.
> This patch set changes the way directories are locked, so the present
> series cleans up some locking in nfsd.
> 
> Specifically we remove fh_lock() and fh_unlock().
> These functions are problematic for a few reasons.
> - they are deliberately idempotent - setting or clearing a flag
>  so that a second call does nothing.  This makes locking errors harder,
>  but it results in code that looks wrong ...  and maybe sometimes is a
>  little bit wrong.
>  Early patches clean up several places where this idempotent nature of
>  the functions is depended on, and so makes the code clearer.
> 
> - they transparently call fh_fill_pre/post_attrs(), including at times
>  when this is not necessary.  Having the calls only when necessary is
>  marginally more efficient, and arguably makes the code clearer.
> 
> nfsd_lookup() currently always locks the directory, though often no lock
> is needed.  So a patch in this series reduces this locking.
> 
> There is an awkward case that could still be further improved.
> NFSv4 open needs to ensure the file is not renamed (or unlinked etc)
> between the moment when the open succeeds, and a later moment when a
> "lease" is attached to support a delegation.  The handling of this lock
> is currently untidy, particularly when creating a file.
> It would probably be better to take a lease immediately after
> opening the file, and then discard it after deciding not to provide a
> delegation.
> 
> I have run fstests and cthon tests on this, but I wouldn't be surprised
> if there is a corner case that I've missed.

Hi Neil, thanks for (re)posting.

Let me make a few general comments here before I send out specific
review nits.

I'm concerned mostly with how this series can be adequately tested.
The two particular areas I'm worried about:

 - There are some changes to NFSv2 code, which is effectively
   fallow. I think I can run v2 tests, once we decide what tests
   should be run.

 - More critically, ("NFSD: reduce locking in nfsd_lookup()") does
   some pretty heavy lifting. How should this change be tested?

Secondarily, the series adds more bells and whistles to the generic
NFSD VFS APIs on behalf of NFSv4-specific requirements. In particular:

 - ("NFSD: change nfsd_create() to unlock directory before returning.")
   makes some big changes to nfsd_create(). But that helper itself
   is pretty small. Seems like cleaner code would result if NFSv4
   had its own version of nfsd_create() to deal with the post-change
   cases.

 - ("NFSD: reduce locking in nfsd_lookup()") has a similar issue:
   nfsd_lookup() is being changed such that its semantics are
   substantially different for NFSv4 than for others. This is
   possibly an indication that nfsd_lookup() should also be
   duplicated into the NFSv4-specific code and the generic VFS
   version should be left alone.

I would prefer the code duplication approach in both these cases,
unless you can convince me that is a bad idea.

Finally, regarding the awkward case you mention above: the NFSv4
OPEN code is a hairy mess, mostly because the protocol is a Swiss
army knife and our implementation has had small fixes plastered
onto it for many years. I won't be disappointed if you don't
manage to address the rename/unlink/delegation race this time
around. Just don't make it worse ;-)

Meanwhile we should start accruing some thoughts and designs
about how this code path needs to work.


> NeilBrown
> 
> 
> ---
> 
> NeilBrown (8):
>      NFSD: drop rqstp arg to do_set_nfs4_acl()
>      NFSD: change nfsd_create() to unlock directory before returning.
>      NFSD: always drop directory lock in nfsd_unlink()
>      NFSD: only call fh_unlock() once in nfsd_link()
>      NFSD: reduce locking in nfsd_lookup()
>      NFSD: use explicit lock/unlock for directory ops
>      NFSD: use (un)lock_inode instead of fh_(un)lock for file operations
>      NFSD: discard fh_locked flag and fh_lock/fh_unlock
> 
> 
> fs/nfsd/nfs2acl.c   |   6 +-
> fs/nfsd/nfs3acl.c   |   4 +-
> fs/nfsd/nfs3proc.c  |  21 ++---
> fs/nfsd/nfs4acl.c   |  19 ++---
> fs/nfsd/nfs4proc.c  | 106 +++++++++++++++---------
> fs/nfsd/nfs4state.c |   8 +-
> fs/nfsd/nfsfh.c     |   3 +-
> fs/nfsd/nfsfh.h     |  56 +------------
> fs/nfsd/nfsproc.c   |  14 ++--
> fs/nfsd/vfs.c       | 193 ++++++++++++++++++++++++++------------------
> fs/nfsd/vfs.h       |  19 +++--
> 11 files changed, 238 insertions(+), 211 deletions(-)
> 
> --
> Signature
> 

--
Chuck Lever




* Re: [PATCH 2/8] NFSD: change nfsd_create() to unlock directory before returning.
  2022-07-06  4:18 ` [PATCH 2/8] NFSD: change nfsd_create() to unlock directory before returning NeilBrown
  2022-07-06 13:24   ` Jeff Layton
@ 2022-07-06 16:29   ` Chuck Lever III
  1 sibling, 0 replies; 40+ messages in thread
From: Chuck Lever III @ 2022-07-06 16:29 UTC (permalink / raw)
  To: Neil Brown; +Cc: Jeff Layton, Linux NFS Mailing List



> On Jul 6, 2022, at 12:18 AM, NeilBrown <neilb@suse.de> wrote:
> 
> nfsd_create() usually exits with the directory still locked.  This
> relies on other code to unlock the directory.  Planned future patches
> will change how directory locking works so the unlock step may be less
> trivial.  It is cleaner to have lock and unlock in the same function.
> 
> nfsd4_create() performs some extra changes after the creation and before
> the unlock - setting security label and an ACL.  To allow for these to
> still be done while locked, we create a function nfsd4_post_create() and
> pass it to nfsd_create() when needed.
> 
> nfsd_symlink() DOES usually unlock the directory, but nfsd4_create() may
> add a label or ACL - with the directory unlocked.  I don't think symlinks
> have ACLs and don't know if they can have labels, so I don't know if
> this is of any practical consequence.  For consistency nfsd_symlink() is
> changed to accept the same callback and call it if given.
> 
> nfsd_symlink() didn't unlock the directory if lookup_one_len() gave an
> error.  This is untidy and potentially confusing, and has now been
> fixed.  It isn't a practical problem as an eventual fh_put() will unlock
> if needed.

I would like confirmation that NFSv4 symlinks cannot have ACLs
or security labels before committing to changing nfsd_symlink()
too. I can have a look at specifications and ask around.
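
As a rough illustration of the callback arrangement described in the quoted commit message, the sketch below models it in userspace C. It is written under assumptions and is not the nfsd code: an integer flag stands in for the directory lock, and model_create(), model_post_create() and the structs are hypothetical names. It shows the create helper taking and dropping the lock itself, and running an optional post-create hook while the lock is still held.

```c
/* Userspace model only; names and types here are illustrative. */
#include <assert.h>
#include <stddef.h>

struct model_dir {
	int lock_held;	/* 1 while the modelled directory lock is held */
	int entries;
};

struct model_file {
	int label;			/* set by the post-create hook */
	int labelled_under_lock;	/* 1 if the hook ran with the lock held */
};

/* Data handed through the opaque pointer to the hook. */
struct model_post_args {
	struct model_dir *dir;
	int label;
};

/* Hook in the spirit of nfsd4_post_create(): extra work after creation. */
void model_post_create(struct model_file *file, void *data)
{
	struct model_post_args *args = data;

	file->label = args->label;
	file->labelled_under_lock = args->dir->lock_held;
}

/*
 * Lock, create, run the optional hook, unlock: all in one function.
 * 'fail' simulates a lookup or create error; returns 0 or -1.
 */
int model_create(struct model_dir *dir, struct model_file *res, int fail,
		 void (*post_create)(struct model_file *file, void *data),
		 void *data)
{
	int err = 0;

	assert(!dir->lock_held);
	dir->lock_held = 1;
	if (fail) {
		err = -1;
		goto out_unlock;	/* error paths unlock too */
	}
	dir->entries++;
	if (post_create)
		post_create(res, data);
out_unlock:
	dir->lock_held = 0;
	return err;
}
```

Callers that have no extra work pass NULL for both the hook and its data, which mirrors how the NFSv2 and NFSv3 callers in the diff pass NULL, NULL.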


> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
> fs/nfsd/nfs3proc.c |   11 ++++++-----
> fs/nfsd/nfs4proc.c |   38 ++++++++++++++++++++++++--------------
> fs/nfsd/nfsproc.c  |    5 +++--
> fs/nfsd/vfs.c      |   40 +++++++++++++++++++++++++++-------------
> fs/nfsd/vfs.h      |   11 ++++++++---
> 5 files changed, 68 insertions(+), 37 deletions(-)
> 
> diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
> index 981a3a7a6e16..38255365ef71 100644
> --- a/fs/nfsd/nfs3proc.c
> +++ b/fs/nfsd/nfs3proc.c
> @@ -378,8 +378,8 @@ nfsd3_proc_mkdir(struct svc_rqst *rqstp)
> 	fh_copy(&resp->dirfh, &argp->fh);
> 	fh_init(&resp->fh, NFS3_FHSIZE);
> 	resp->status = nfsd_create(rqstp, &resp->dirfh, argp->name, argp->len,
> -				   &argp->attrs, S_IFDIR, 0, &resp->fh);
> -	fh_unlock(&resp->dirfh);
> +				   &argp->attrs, S_IFDIR, 0, &resp->fh,
> +				   NULL, NULL);
> 	return rpc_success;
> }
> 
> @@ -414,7 +414,8 @@ nfsd3_proc_symlink(struct svc_rqst *rqstp)
> 	fh_copy(&resp->dirfh, &argp->ffh);
> 	fh_init(&resp->fh, NFS3_FHSIZE);
> 	resp->status = nfsd_symlink(rqstp, &resp->dirfh, argp->fname,
> -				    argp->flen, argp->tname, &resp->fh);
> +				    argp->flen, argp->tname, &resp->fh,
> +				    NULL, NULL);
> 	kfree(argp->tname);
> out:
> 	return rpc_success;
> @@ -453,8 +454,8 @@ nfsd3_proc_mknod(struct svc_rqst *rqstp)
> 
> 	type = nfs3_ftypes[argp->ftype];
> 	resp->status = nfsd_create(rqstp, &resp->dirfh, argp->name, argp->len,
> -				   &argp->attrs, type, rdev, &resp->fh);
> -	fh_unlock(&resp->dirfh);
> +				   &argp->attrs, type, rdev, &resp->fh,
> +				   NULL, NULL);
> out:
> 	return rpc_success;
> }
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 60591ceb4985..3279daab909d 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -780,6 +780,18 @@ nfsd4_commit(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> 			     (__be32 *)commit->co_verf.data);
> }
> 
> +static void nfsd4_post_create(struct svc_fh *fh, void *vcreate)
> +{
> +	struct nfsd4_create *create = vcreate;
> +
> +	if (create->cr_label.len)
> +		nfsd4_security_inode_setsecctx(fh, &create->cr_label,
> +					       create->cr_bmval);
> +
> +	if (create->cr_acl != NULL)
> +		do_set_nfs4_acl(fh, create->cr_acl, create->cr_bmval);
> +}
> +
> static __be32
> nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> 	     union nfsd4_op_u *u)
> @@ -805,7 +817,8 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> 	case NF4LNK:
> 		status = nfsd_symlink(rqstp, &cstate->current_fh,
> 				      create->cr_name, create->cr_namelen,
> -				      create->cr_data, &resfh);
> +				      create->cr_data, &resfh,
> +				      nfsd4_post_create, create);
> 		break;
> 
> 	case NF4BLK:
> @@ -816,7 +829,8 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> 			goto out_umask;
> 		status = nfsd_create(rqstp, &cstate->current_fh,
> 				     create->cr_name, create->cr_namelen,
> -				     &create->cr_iattr, S_IFBLK, rdev, &resfh);
> +				     &create->cr_iattr, S_IFBLK, rdev, &resfh,
> +				     nfsd4_post_create, create);
> 		break;
> 
> 	case NF4CHR:
> @@ -827,26 +841,30 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> 			goto out_umask;
> 		status = nfsd_create(rqstp, &cstate->current_fh,
> 				     create->cr_name, create->cr_namelen,
> -				     &create->cr_iattr, S_IFCHR, rdev, &resfh);
> +				     &create->cr_iattr, S_IFCHR, rdev, &resfh,
> +				     nfsd4_post_create, create);
> 		break;
> 
> 	case NF4SOCK:
> 		status = nfsd_create(rqstp, &cstate->current_fh,
> 				     create->cr_name, create->cr_namelen,
> -				     &create->cr_iattr, S_IFSOCK, 0, &resfh);
> +				     &create->cr_iattr, S_IFSOCK, 0, &resfh,
> +				     nfsd4_post_create, create);
> 		break;
> 
> 	case NF4FIFO:
> 		status = nfsd_create(rqstp, &cstate->current_fh,
> 				     create->cr_name, create->cr_namelen,
> -				     &create->cr_iattr, S_IFIFO, 0, &resfh);
> +				     &create->cr_iattr, S_IFIFO, 0, &resfh,
> +				     nfsd4_post_create, create);
> 		break;
> 
> 	case NF4DIR:
> 		create->cr_iattr.ia_valid &= ~ATTR_SIZE;
> 		status = nfsd_create(rqstp, &cstate->current_fh,
> 				     create->cr_name, create->cr_namelen,
> -				     &create->cr_iattr, S_IFDIR, 0, &resfh);
> +				     &create->cr_iattr, S_IFDIR, 0, &resfh,
> +				     nfsd4_post_create, create);
> 		break;
> 
> 	default:
> @@ -856,14 +874,6 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> 	if (status)
> 		goto out;
> 
> -	if (create->cr_label.len)
> -		nfsd4_security_inode_setsecctx(&resfh, &create->cr_label, create->cr_bmval);
> -
> -	if (create->cr_acl != NULL)
> -		do_set_nfs4_acl(&resfh, create->cr_acl,
> -				create->cr_bmval);
> -
> -	fh_unlock(&cstate->current_fh);
> 	set_change_info(&create->cr_cinfo, &cstate->current_fh);
> 	fh_dup2(&cstate->current_fh, &resfh);
> out:
> diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
> index fcdab8a8a41f..a25b8e321662 100644
> --- a/fs/nfsd/nfsproc.c
> +++ b/fs/nfsd/nfsproc.c
> @@ -493,7 +493,7 @@ nfsd_proc_symlink(struct svc_rqst *rqstp)
> 
> 	fh_init(&newfh, NFS_FHSIZE);
> 	resp->status = nfsd_symlink(rqstp, &argp->ffh, argp->fname, argp->flen,
> -				    argp->tname, &newfh);
> +				    argp->tname, &newfh, NULL, NULL);
> 
> 	kfree(argp->tname);
> 	fh_put(&argp->ffh);
> @@ -522,7 +522,8 @@ nfsd_proc_mkdir(struct svc_rqst *rqstp)
> 	argp->attrs.ia_valid &= ~ATTR_SIZE;
> 	fh_init(&resp->fh, NFS_FHSIZE);
> 	resp->status = nfsd_create(rqstp, &argp->fh, argp->name, argp->len,
> -				   &argp->attrs, S_IFDIR, 0, &resp->fh);
> +				   &argp->attrs, S_IFDIR, 0, &resp->fh,
> +				   NULL, NULL);
> 	fh_put(&argp->fh);
> 	if (resp->status != nfs_ok)
> 		goto out;
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index d79db56475d4..1e7ca39e8a49 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1366,8 +1366,10 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  */
> __be32
> nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> -		char *fname, int flen, struct iattr *iap,
> -		int type, dev_t rdev, struct svc_fh *resfhp)
> +	    char *fname, int flen, struct iattr *iap,
> +	    int type, dev_t rdev, struct svc_fh *resfhp,
> +	    void (*post_create)(struct svc_fh *fh, void *data),
> +	    void *data)
> {
> 	struct dentry	*dentry, *dchild = NULL;
> 	__be32		err;
> @@ -1389,8 +1391,10 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> 	fh_lock_nested(fhp, I_MUTEX_PARENT);
> 	dchild = lookup_one_len(fname, dentry, flen);
> 	host_err = PTR_ERR(dchild);
> -	if (IS_ERR(dchild))
> -		return nfserrno(host_err);
> +	if (IS_ERR(dchild)) {
> +		err = nfserrno(host_err);
> +		goto out_unlock;
> +	}
> 	err = fh_compose(resfhp, fhp->fh_export, dchild, fhp);
> 	/*
> 	 * We unconditionally drop our ref to dchild as fh_compose will have
> @@ -1398,9 +1402,14 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> 	 */
> 	dput(dchild);
> 	if (err)
> -		return err;
> -	return nfsd_create_locked(rqstp, fhp, fname, flen, iap, type,
> -					rdev, resfhp);
> +		goto out_unlock;
> +	err = nfsd_create_locked(rqstp, fhp, fname, flen, iap, type,
> +				 rdev, resfhp);
> +	if (!err && post_create)
> +		post_create(resfhp, data);
> +out_unlock:
> +	fh_unlock(fhp);
> +	return err;
> }
> 
> /*
> @@ -1447,9 +1456,11 @@ nfsd_readlink(struct svc_rqst *rqstp, struct svc_fh *fhp, char *buf, int *lenp)
>  */
> __be32
> nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
> -				char *fname, int flen,
> -				char *path,
> -				struct svc_fh *resfhp)
> +	     char *fname, int flen,
> +	     char *path,
> +	     struct svc_fh *resfhp,
> +	     void (*post_create)(struct svc_fh *fh, void *data),
> +	     void *data)
> {
> 	struct dentry	*dentry, *dnew;
> 	__be32		err, cerr;
> @@ -1474,12 +1485,12 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
> 	dentry = fhp->fh_dentry;
> 	dnew = lookup_one_len(fname, dentry, flen);
> 	host_err = PTR_ERR(dnew);
> -	if (IS_ERR(dnew))
> +	if (IS_ERR(dnew)) {
> +		fh_unlock(fhp);
> 		goto out_nfserr;
> -
> +	}
> 	host_err = vfs_symlink(&init_user_ns, d_inode(dentry), dnew, path);
> 	err = nfserrno(host_err);
> -	fh_unlock(fhp);
> 	if (!err)
> 		err = nfserrno(commit_metadata(fhp));
> 
> @@ -1488,6 +1499,9 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
> 	cerr = fh_compose(resfhp, fhp->fh_export, dnew, fhp);
> 	dput(dnew);
> 	if (err==0) err = cerr;
> +	if (!err && post_create)
> +		post_create(resfhp, data);
> +	fh_unlock(fhp);
> out:
> 	return err;
> 
> diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
> index 26347d76f44a..9f4fd3060200 100644
> --- a/fs/nfsd/vfs.h
> +++ b/fs/nfsd/vfs.h
> @@ -66,8 +66,10 @@ __be32		nfsd_create_locked(struct svc_rqst *, struct svc_fh *,
> 				char *name, int len, struct iattr *attrs,
> 				int type, dev_t rdev, struct svc_fh *res);
> __be32		nfsd_create(struct svc_rqst *, struct svc_fh *,
> -				char *name, int len, struct iattr *attrs,
> -				int type, dev_t rdev, struct svc_fh *res);
> +			    char *name, int len, struct iattr *attrs,
> +			    int type, dev_t rdev, struct svc_fh *res,
> +			    void (*post_create)(struct svc_fh *fh, void *data),
> +			    void *data);
> __be32		nfsd_access(struct svc_rqst *, struct svc_fh *, u32 *, u32 *);
> __be32		nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp,
> 				struct svc_fh *resfhp, struct iattr *iap);
> @@ -111,7 +113,10 @@ __be32		nfsd_readlink(struct svc_rqst *, struct svc_fh *,
> 				char *, int *);
> __be32		nfsd_symlink(struct svc_rqst *, struct svc_fh *,
> 				char *name, int len, char *path,
> -				struct svc_fh *res);
> +				struct svc_fh *res,
> +				void (*post_create)(struct svc_fh *fh,
> +						    void *data),
> +				void *data);
> __be32		nfsd_link(struct svc_rqst *, struct svc_fh *,
> 				char *, int, struct svc_fh *);
> ssize_t		nfsd_copy_file_range(struct file *, u64,
> 
> 

--
Chuck Lever





* Re: [PATCH 4/8] NFSD: only call fh_unlock() once in nfsd_link()
  2022-07-06  4:18 ` [PATCH 4/8] NFSD: only call fh_unlock() once in nfsd_link() NeilBrown
  2022-07-06 13:31   ` Jeff Layton
@ 2022-07-06 16:29   ` Chuck Lever III
  1 sibling, 0 replies; 40+ messages in thread
From: Chuck Lever III @ 2022-07-06 16:29 UTC (permalink / raw)
  To: Neil Brown; +Cc: Jeff Layton, Linux NFS Mailing List



> On Jul 6, 2022, at 12:18 AM, NeilBrown <neilb@suse.de> wrote:
> 
> On non-error paths, nfsd_link() calls fh_unlock() twice.  This is safe
> because fh_unlock() records that the unlock has been done and doesn't
> repeat it.
> However it makes the code a little confusing and interferes with changes
> that are planned for directory locking.
> 
> So rearrange the code to ensure fh_unlock() is called exactly once if
> fh_lock() was called.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
> fs/nfsd/vfs.c |   18 ++++++++++--------
> 1 file changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 3f4579f5775c..4916c29af0fa 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1551,8 +1551,10 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
> 
> 	dnew = lookup_one_len(name, ddir, len);
> 	host_err = PTR_ERR(dnew);
> -	if (IS_ERR(dnew))
> -		goto out_nfserr;
> +	if (IS_ERR(dnew)) {
> +		err = nfserrno(host_err);
> +		goto out_unlock;
> +	}

Nit: Let's do it this way:

	dnew = lookup_one_len(name, ddir, len);
	if (IS_ERR(dnew)) {
		err = nfserrno(PTR_ERR(dnew));
		goto out_unlock;
	}
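
For illustration only, here is a user-space sketch of that shape. It is not
kernel code: every link_demo_* name is a hypothetical stand-in
(link_demo_lookup() for lookup_one_len(), link_demo_status() for nfserrno()),
and the error mapping is invented. It shows the error being converted where it
is detected, and the unlock being reached exactly once on every path.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins; none of these are kernel APIs. */
#define LINK_DEMO_MAX_ERRNO 4095

struct link_demo_dir {
	int locked;		/* 1 while the directory is "locked" */
	int unlock_calls;	/* counts how often the unlock helper ran */
};

struct link_demo_dentry {
	const char *name;
};

static struct link_demo_dentry link_demo_child = { "child" };

/* Encode a negative errno in a pointer, in the style of ERR_PTR(). */
void *link_demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Recover the errno from an encoded pointer, in the style of PTR_ERR(). */
long link_demo_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* True if the pointer encodes an error, in the style of IS_ERR(). */
int link_demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)(-LINK_DEMO_MAX_ERRNO);
}

void link_demo_lock(struct link_demo_dir *d)
{
	assert(!d->locked);
	d->locked = 1;
}

void link_demo_unlock(struct link_demo_dir *d)
{
	assert(d->locked);	/* a second unlock would trip this */
	d->locked = 0;
	d->unlock_calls++;
}

/* Plays the role of lookup_one_len(): a dentry or an encoded error. */
struct link_demo_dentry *link_demo_lookup(const char *name)
{
	if (!name || !name[0])
		return link_demo_err_ptr(-2);	/* -2 stands in for -ENOENT */
	return &link_demo_child;
}

/* Plays the role of nfserrno(); the mapping itself is made up. */
int link_demo_status(long host_err)
{
	return host_err ? (int)(1000 - host_err) : 0;
}

int link_demo_link(struct link_demo_dir *dir, const char *name)
{
	struct link_demo_dentry *dnew;
	int err = 0;

	link_demo_lock(dir);
	dnew = link_demo_lookup(name);
	if (link_demo_is_err(dnew)) {
		/* Convert at the point of failure; no host_err temporary. */
		err = link_demo_status(link_demo_ptr_err(dnew));
		goto out_unlock;
	}
	/* The link operation itself would run here. */
out_unlock:
	link_demo_unlock(dir);
	return err;
}
```

Both the success path and the failed-lookup path fall through the single
out_unlock label, so the assertion in link_demo_unlock() never fires.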

> 	dold = tfhp->fh_dentry;
> 
> @@ -1571,17 +1573,17 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
> 		else
> 			err = nfserrno(host_err);
> 	}
> -out_dput:
> 	dput(dnew);
> -out_unlock:
> -	fh_unlock(ffhp);
> +out_drop_write:
> 	fh_drop_write(tfhp);
> out:
> 	return err;
> 
> -out_nfserr:
> -	err = nfserrno(host_err);
> -	goto out_unlock;
> +out_dput:
> +	dput(dnew);
> +out_unlock:
> +	fh_unlock(ffhp);
> +	goto out_drop_write;
> }
> 
> static void
> 
> 

--
Chuck Lever





* Re: [PATCH 5/8] NFSD: reduce locking in nfsd_lookup()
  2022-07-06  4:18 ` [PATCH 5/8] NFSD: reduce locking in nfsd_lookup() NeilBrown
  2022-07-06 13:47   ` Jeff Layton
@ 2022-07-06 16:29   ` Chuck Lever III
  2022-07-07  1:29     ` NeilBrown
  1 sibling, 1 reply; 40+ messages in thread
From: Chuck Lever III @ 2022-07-06 16:29 UTC (permalink / raw)
  To: Neil Brown; +Cc: Jeff Layton, Linux NFS Mailing List



> On Jul 6, 2022, at 12:18 AM, NeilBrown <neilb@suse.de> wrote:
> 
> nfsd_lookup() takes an exclusive lock on the parent inode, but many
> callers don't want the lock and may not need to lock at all if the
> result is in the dcache.
> 
> Change nfsd_lookup() to be passed a bool flag.
> If false, don't take the lock.
> If true, do take an exclusive lock, and return with it held if
> successful.
> If nfsd_lookup() returns an error, the lock WILL NOT be held.
> 
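A user-space sketch of the contract described above, assuming nothing beyond
the commit message: the lookup_demo_* names are hypothetical, simulated_err
stands in for a failed lookup, and a bool replaces the real inode lock. The
point is only who owns the lock on each return path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a directory inode and its lock. */
struct lookup_demo_inode {
	bool locked;
};

void lookup_demo_lock(struct lookup_demo_inode *dir)
{
	assert(!dir->locked);
	dir->locked = true;
}

void lookup_demo_unlock(struct lookup_demo_inode *dir)
{
	assert(dir->locked);
	dir->locked = false;
}

/*
 * With lock == true the directory is locked for the lookup and stays
 * locked on success; on error it is released before returning.
 * With lock == false the lock is never touched.
 */
int lookup_demo_lookup(struct lookup_demo_inode *dir, int simulated_err,
		       bool lock)
{
	int err;

	if (lock)
		lookup_demo_lock(dir);
	err = simulated_err;	/* the real lookup would happen here */
	if (err && lock)
		lookup_demo_unlock(dir);
	return err;
}

/* Caller in the style of the open path: ownership lives in a local flag. */
int lookup_demo_open(struct lookup_demo_inode *dir, int simulated_err)
{
	bool locked = false;
	int status;

	status = lookup_demo_lookup(dir, simulated_err, true);
	if (status)
		goto out;
	locked = true;
	/* A delegation decision would be made here, with the lock held. */
	assert(dir->locked);
out:
	if (locked)
		lookup_demo_unlock(dir);
	return status;
}
```

The caller only unlocks when its own flag says it took ownership, which is
why a failed lookup must never return with the lock held.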
> Only nfsd4_open() requests the lock to be held, and does so to block
> rename until it decides whether to return a delegation.
> 
> NOTE: when nfsd4_open() creates a file, the directory does *NOT* remain
>  locked and never has.  So it is possible (though unlikely) for the
>  newly created file to be renamed before a delegation is handed out,
>  and that would be bad.  This patch does *NOT* fix that, but *DOES*
>  take the directory lock immediately after creating the file, which
>  reduces the size of the window and ensures that the lock is held
>  consistently.  More work is needed to guarantee no rename happens
>  before the delegation.
> 
> NOTE-2: NFSv4 requires directory changeinfo for OPEN even when a create
>  wasn't requested and no change happened.  Now that nfsd_lookup()
>  doesn't use fh_lock(), we need explicit fh_fill_pre/post_attrs()
>  in the non-create branch of do_open_lookup().
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
> fs/nfsd/nfs3proc.c |    2 +-
> fs/nfsd/nfs4proc.c |   51 ++++++++++++++++++++++++++++------------
> fs/nfsd/nfsproc.c  |    2 +-
> fs/nfsd/vfs.c      |   66 +++++++++++++++++++++++++++++++++++-----------------
> fs/nfsd/vfs.h      |    8 ++++--
> 5 files changed, 88 insertions(+), 41 deletions(-)
> 
> diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
> index ad7941001106..3a67d0afb885 100644
> --- a/fs/nfsd/nfs3proc.c
> +++ b/fs/nfsd/nfs3proc.c
> @@ -96,7 +96,7 @@ nfsd3_proc_lookup(struct svc_rqst *rqstp)
> 
> 	resp->status = nfsd_lookup(rqstp, &resp->dirfh,
> 				   argp->name, argp->len,
> -				   &resp->fh);
> +				   &resp->fh, false);
> 	return rpc_success;
> }
> 
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 4737019738ab..6ec22c69cbec 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -414,7 +414,8 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> }
> 
> static __be32
> -do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, struct nfsd4_open *open, struct svc_fh **resfh)
> +do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> +	       struct nfsd4_open *open, struct svc_fh **resfh)
> {
> 	struct svc_fh *current_fh = &cstate->current_fh;
> 	int accmode;
> @@ -441,11 +442,18 @@ do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stru
> 		 * yes          | no     | GUARDED4        | GUARDED4
> 		 * yes          | yes    | GUARDED4        | GUARDED4
> 		 */
> -
> 		current->fs->umask = open->op_umask;
> 		status = nfsd4_create_file(rqstp, current_fh, *resfh, open);
> 		current->fs->umask = 0;
> 
> +		if (!status)
> +			/* We really want to hold the lock from before the
> +			 * create to ensure no rename happens, but that
> +			 * needs more work...
> +			 */
> +			inode_lock_nested(current_fh->fh_dentry->d_inode,
> +					  I_MUTEX_PARENT);
> +
> 		if (!status && open->op_label.len)
> 			nfsd4_security_inode_setsecctx(*resfh, &open->op_label, open->op_bmval);
> 
> @@ -457,17 +465,25 @@ do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stru
> 		if (nfsd4_create_is_exclusive(open->op_createmode) && status == 0)
> 			open->op_bmval[1] |= (FATTR4_WORD1_TIME_ACCESS |
> 						FATTR4_WORD1_TIME_MODIFY);
> -	} else
> -		/*
> -		 * Note this may exit with the parent still locked.
> -		 * We will hold the lock until nfsd4_open's final
> -		 * lookup, to prevent renames or unlinks until we've had
> -		 * a chance to an acquire a delegation if appropriate.
> +	} else {
> +		/* We want to keep the directory locked until we've had a chance
> +		 * to acquire a delegation if appropriate, so request that
> +		 * nfsd_lookup() hold on to the lock.
> 		 */
> 		status = nfsd_lookup(rqstp, current_fh,
> -				     open->op_fname, open->op_fnamelen, *resfh);
> +				     open->op_fname, open->op_fnamelen, *resfh,
> +				     true);
> +		if (!status) {
> +			/* NFSv4 protocol requires change attributes even though
> +			 * no change happened.
> +			 */
> +			fh_fill_pre_attrs(current_fh);
> +			fh_fill_post_attrs(current_fh);

If this is really correct, the comment should also state that
no concurrent changes to the parent are possible during
the lookup, and thus the pre and post attributes are expected
to be the same always.

Otherwise, this code paragraph looks just a little insane ;-)
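
To make the expectation concrete, a toy user-space model follows. It assumes
that every modification of the directory requires the exclusive lock, which is
exactly the premise the comment would need to state; the cinfo_demo_* names
and the counter standing in for the change attribute are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct cinfo_demo_dir {
	bool locked;
	uint64_t change;	/* stands in for the change attribute */
};

struct cinfo_demo_info {
	uint64_t before;
	uint64_t after;
};

/* In this model every modification requires the lock to be held. */
void cinfo_demo_modify(struct cinfo_demo_dir *dir)
{
	assert(dir->locked);
	dir->change++;
}

struct cinfo_demo_info cinfo_demo_open(struct cinfo_demo_dir *dir, bool create)
{
	struct cinfo_demo_info ci;

	assert(!dir->locked);
	dir->locked = true;
	ci.before = dir->change;	/* role of fh_fill_pre_attrs() */
	if (create)
		cinfo_demo_modify(dir);
	ci.after = dir->change;		/* role of fh_fill_post_attrs() */
	dir->locked = false;
	return ci;
}
```

In the non-create case nothing can run between the two snapshots, so before
and after are equal by construction in this model.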


> +		}
> +	}
> 	if (status)
> -		goto out;
> +		return status;
> +
> 	status = nfsd_check_obj_isreg(*resfh);
> 	if (status)
> 		goto out;
> @@ -483,6 +499,8 @@ do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stru
> 	status = do_open_permission(rqstp, *resfh, open, accmode);
> 	set_change_info(&open->op_cinfo, current_fh);
> out:
> +	if (status)
> +		inode_unlock(current_fh->fh_dentry->d_inode);
> 	return status;
> }
> 
> @@ -540,6 +558,7 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> 	struct net *net = SVC_NET(rqstp);
> 	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> 	bool reclaim = false;
> +	bool locked = false;
> 
> 	dprintk("NFSD: nfsd4_open filename %.*s op_openowner %p\n",
> 		(int)open->op_fnamelen, open->op_fname,
> @@ -604,6 +623,7 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> 		status = do_open_lookup(rqstp, cstate, open, &resfh);
> 		if (status)
> 			goto out;
> +		locked = true;
> 		break;
> 	case NFS4_OPEN_CLAIM_PREVIOUS:
> 		status = nfs4_check_open_reclaim(cstate->clp);
> @@ -639,6 +659,8 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> 		fput(open->op_filp);
> 		open->op_filp = NULL;
> 	}
> +	if (locked)
> +		inode_unlock(cstate->current_fh.fh_dentry->d_inode);
> 	if (resfh && resfh != &cstate->current_fh) {
> 		fh_dup2(&cstate->current_fh, resfh);
> 		fh_put(resfh);
> @@ -933,7 +955,7 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
> 		return nfserr_noent;
> 	}
> 	fh_put(&tmp_fh);
> -	return nfsd_lookup(rqstp, fh, "..", 2, fh);
> +	return nfsd_lookup(rqstp, fh, "..", 2, fh, false);
> }
> 
> static __be32
> @@ -949,7 +971,7 @@ nfsd4_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> {
> 	return nfsd_lookup(rqstp, &cstate->current_fh,
> 			   u->lookup.lo_name, u->lookup.lo_len,
> -			   &cstate->current_fh);
> +			   &cstate->current_fh, false);
> }
> 
> static __be32
> @@ -1089,11 +1111,10 @@ nfsd4_secinfo(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> 	if (err)
> 		return err;
> 	err = nfsd_lookup_dentry(rqstp, &cstate->current_fh,
> -				    secinfo->si_name, secinfo->si_namelen,
> -				    &exp, &dentry);
> +				 secinfo->si_name, secinfo->si_namelen,
> +				 &exp, &dentry, false);
> 	if (err)
> 		return err;
> -	fh_unlock(&cstate->current_fh);
> 	if (d_really_is_negative(dentry)) {
> 		exp_put(exp);
> 		err = nfserr_noent;
> diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
> index a25b8e321662..ed24fae09517 100644
> --- a/fs/nfsd/nfsproc.c
> +++ b/fs/nfsd/nfsproc.c
> @@ -133,7 +133,7 @@ nfsd_proc_lookup(struct svc_rqst *rqstp)
> 
> 	fh_init(&resp->fh, NFS_FHSIZE);
> 	resp->status = nfsd_lookup(rqstp, &argp->fh, argp->name, argp->len,
> -				   &resp->fh);
> +				   &resp->fh, false);
> 	fh_put(&argp->fh);
> 	if (resp->status != nfs_ok)
> 		goto out;
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 4916c29af0fa..8e050c6d112a 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -172,7 +172,8 @@ int nfsd_mountpoint(struct dentry *dentry, struct svc_export *exp)
> __be32
> nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp,
> 		   const char *name, unsigned int len,
> -		   struct svc_export **exp_ret, struct dentry **dentry_ret)
> +		   struct svc_export **exp_ret, struct dentry **dentry_ret,
> +		   bool locked)
> {
> 	struct svc_export	*exp;
> 	struct dentry		*dparent;
> @@ -199,27 +200,31 @@ nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp,
> 				goto out_nfserr;
> 		}
> 	} else {
> -		/*
> -		 * In the nfsd4_open() case, this may be held across
> -		 * subsequent open and delegation acquisition which may
> -		 * need to take the child's i_mutex:
> -		 */
> -		fh_lock_nested(fhp, I_MUTEX_PARENT);
> -		dentry = lookup_one_len(name, dparent, len);
> +		if (locked)
> +			dentry = lookup_one_len(name, dparent, len);
> +		else
> +			dentry = lookup_one_len_unlocked(name, dparent, len);
> 		host_err = PTR_ERR(dentry);
> 		if (IS_ERR(dentry))
> 			goto out_nfserr;
> 		if (nfsd_mountpoint(dentry, exp)) {
> 			/*
> -			 * We don't need the i_mutex after all.  It's
> -			 * still possible we could open this (regular
> -			 * files can be mountpoints too), but the
> -			 * i_mutex is just there to prevent renames of
> -			 * something that we might be about to delegate,
> -			 * and a mountpoint won't be renamed:
> +			 * nfsd_cross_mnt() may wait for an upcall
> +			 * to userspace, and holding i_sem across that
> +			 * invites the possibility of a deadlock.
> +			 * We don't really need the lock on the parent
> +			 * of a mount point was we only need it to guard
> +			 * against a rename before we get a lease for a
> +			 * delegation.
> +			 * So just drop the i_sem and reclaim it.
> 			 */
> -			fh_unlock(fhp);
> -			if ((host_err = nfsd_cross_mnt(rqstp, &dentry, &exp))) {
> +			if (locked)
> +				inode_unlock(dparent->d_inode);
> +			host_err = nfsd_cross_mnt(rqstp, &dentry, &exp);
> +			if (locked)
> +				inode_lock_nested(dparent->d_inode,
> +						  I_MUTEX_PARENT);
> +			if (host_err) {
> 				dput(dentry);
> 				goto out_nfserr;
> 			}
> @@ -234,7 +239,17 @@ nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp,
> 	return nfserrno(host_err);
> }
> 
> -/*
> +/**
> + * nfsd_lookup - look up a single path component for nfsd
> + *
> + * @rqstp:   the request context
> + * @ftp:     the file handle of the directory
> + * @name:    the component name, or %NULL to look up parent
> + * @len:     length of name to examine
> + * @resfh:   pointer to pre-initialised filehandle to hold result.
> + * @lock:    if true, lock directory during lookup and keep it locked
> + *           if there is no error.
> + *
>  * Look up one component of a pathname.
>  * N.B. After this call _both_ fhp and resfh need an fh_put
>  *
> @@ -244,11 +259,15 @@ nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  * returned. Otherwise the covered directory is returned.
>  * NOTE: this mountpoint crossing is not supported properly by all
>  *   clients and is explicitly disallowed for NFSv3
> - *      NeilBrown <neilb@cse.unsw.edu.au>
> + *
> + * Only nfsd4_open() calls this with @lock set.  It does so to block
> + * renames/unlinks before it possibly gets a lease to provide a
> + * delegation.
>  */
> __be32
> nfsd_lookup(struct svc_rqst *rqstp, struct svc_fh *fhp, const char *name,
> -				unsigned int len, struct svc_fh *resfh)
> +	    unsigned int len, struct svc_fh *resfh,
> +	    bool lock)
> {
> 	struct svc_export	*exp;
> 	struct dentry		*dentry;
> @@ -257,9 +276,11 @@ nfsd_lookup(struct svc_rqst *rqstp, struct svc_fh *fhp, const char *name,
> 	err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
> 	if (err)
> 		return err;
> -	err = nfsd_lookup_dentry(rqstp, fhp, name, len, &exp, &dentry);
> +	if (lock)
> +		inode_lock_nested(fhp->fh_dentry->d_inode, I_MUTEX_PARENT);
> +	err = nfsd_lookup_dentry(rqstp, fhp, name, len, &exp, &dentry, lock);
> 	if (err)
> -		return err;
> +		goto out_err;
> 	err = check_nfsd_access(exp, rqstp);
> 	if (err)
> 		goto out;
> @@ -273,6 +294,9 @@ nfsd_lookup(struct svc_rqst *rqstp, struct svc_fh *fhp, const char *name,
> out:
> 	dput(dentry);
> 	exp_put(exp);
> +out_err:
> +	if (err && lock)
> +		inode_unlock(fhp->fh_dentry->d_inode);
> 	return err;
> }
> 
> diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
> index 9f4fd3060200..290788f007d4 100644
> --- a/fs/nfsd/vfs.h
> +++ b/fs/nfsd/vfs.h
> @@ -45,10 +45,12 @@ typedef int (*nfsd_filldir_t)(void *, const char *, int, loff_t, u64, unsigned);
> int		nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp,
> 		                struct svc_export **expp);
> __be32		nfsd_lookup(struct svc_rqst *, struct svc_fh *,
> -				const char *, unsigned int, struct svc_fh *);
> +			    const char *, unsigned int, struct svc_fh *,
> +			    bool);
> __be32		 nfsd_lookup_dentry(struct svc_rqst *, struct svc_fh *,
> -				const char *, unsigned int,
> -				struct svc_export **, struct dentry **);
> +				    const char *, unsigned int,
> +				    struct svc_export **, struct dentry **,
> +				    bool);
> __be32		nfsd_setattr(struct svc_rqst *, struct svc_fh *,
> 				struct iattr *, int, time64_t);
> int nfsd_mountpoint(struct dentry *, struct svc_export *);
> 
> 

--
Chuck Lever





* Re: [PATCH 6/8] NFSD: use explicit lock/unlock for directory ops
  2022-07-06  4:18 ` [PATCH 6/8] NFSD: use explicit lock/unlock for directory ops NeilBrown
  2022-07-06 14:05   ` Jeff Layton
@ 2022-07-06 16:29   ` Chuck Lever III
  2022-07-15 16:11   ` Jeff Layton
  2 siblings, 0 replies; 40+ messages in thread
From: Chuck Lever III @ 2022-07-06 16:29 UTC (permalink / raw)
  To: Neil Brown; +Cc: Jeff Layton, Linux NFS Mailing List



> On Jul 6, 2022, at 12:18 AM, NeilBrown <neilb@suse.de> wrote:
> 
> When creating or unlinking a name in a directory use explicit
> inode_lock_nested() instead of fh_lock(), and explicit calls to
> fh_fill_pre_attrs() and fh_fill_post_attrs().  This is already done for
> renames.
> 
> Also move the 'fill' calls closer to the operation that might change the
> attributes.  This way they are avoided on some error paths.
> 
> Having the locking explicit will simplify proposed future changes to
> locking for directories.  It also makes it easily visible exactly where
> pre/post attributes are used - not all callers of fh_lock() actually
> need the pre/post attributes.
> 
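A small user-space sketch of the ordering described above: lock, look up,
fill pre attributes, operate, fill post attributes, unlock. The create_demo_*
names and the counters are hypothetical; the sketch only shows that a failure
before the operation skips both fill steps.

```c
#include <assert.h>
#include <stdbool.h>

struct create_demo_parent {
	bool locked;
	int pre_fills;	/* times the "pre attrs" step ran */
	int post_fills;	/* times the "post attrs" step ran */
	int entries;	/* names currently in the directory */
};

int create_demo_create(struct create_demo_parent *p, bool lookup_fails)
{
	int err = 0;

	assert(!p->locked);
	p->locked = true;		/* role of inode_lock_nested() */
	if (lookup_fails) {
		err = -1;		/* error before any change is attempted */
		goto out_unlock;
	}
	p->pre_fills++;			/* role of fh_fill_pre_attrs() */
	p->entries++;			/* role of the create operation */
	p->post_fills++;		/* role of fh_fill_post_attrs() */
out_unlock:
	p->locked = false;		/* role of inode_unlock() */
	return err;
}
```

With the fill steps placed next to the operation, the early error path does
no attribute work at all, while the lock is still released on every path.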
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
> fs/nfsd/nfs3proc.c |    6 ++++--
> fs/nfsd/nfs4proc.c |    6 ++++--
> fs/nfsd/nfsproc.c  |    7 ++++---
> fs/nfsd/vfs.c      |   30 +++++++++++++++++++-----------
> 4 files changed, 31 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
> index 3a67d0afb885..9629517344ff 100644
> --- a/fs/nfsd/nfs3proc.c
> +++ b/fs/nfsd/nfs3proc.c
> @@ -254,7 +254,7 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> 	if (host_err)
> 		return nfserrno(host_err);
> 
> -	fh_lock_nested(fhp, I_MUTEX_PARENT);
> +	inode_lock_nested(inode, I_MUTEX_PARENT);
> 
> 	child = lookup_one_len(argp->name, parent, argp->len);
> 	if (IS_ERR(child)) {
> @@ -312,11 +312,13 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> 	if (!IS_POSIXACL(inode))
> 		iap->ia_mode &= ~current_umask();
> 
> +	fh_fill_pre_attrs(fhp);
> 	host_err = vfs_create(&init_user_ns, inode, child, iap->ia_mode, true);
> 	if (host_err < 0) {
> 		status = nfserrno(host_err);
> 		goto out;
> 	}
> +	fh_fill_post_attrs(fhp);
> 
> 	/* A newly created file already has a file size of zero. */
> 	if ((iap->ia_valid & ATTR_SIZE) && (iap->ia_size == 0))
> @@ -334,7 +336,7 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> 	status = nfsd_create_setattr(rqstp, fhp, resfhp, iap);
> 
> out:
> -	fh_unlock(fhp);
> +	inode_unlock(inode);
> 	if (child && !IS_ERR(child))
> 		dput(child);
> 	fh_drop_write(fhp);
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 6ec22c69cbec..242f059e6788 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -306,7 +306,7 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> 	if (host_err)
> 		return nfserrno(host_err);
> 
> -	fh_lock_nested(fhp, I_MUTEX_PARENT);
> +	inode_lock_nested(inode, I_MUTEX_PARENT);
> 
> 	child = lookup_one_len(open->op_fname, parent, open->op_fnamelen);
> 	if (IS_ERR(child)) {
> @@ -385,10 +385,12 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> 	if (!IS_POSIXACL(inode))
> 		iap->ia_mode &= ~current_umask();
> 
> +	fh_fill_pre_attrs(fhp);
> 	status = nfsd4_vfs_create(fhp, child, open);
> 	if (status != nfs_ok)
> 		goto out;
> 	open->op_created = true;
> +	fh_fill_post_attrs(fhp);
> 
> 	/* A newly created file already has a file size of zero. */
> 	if ((iap->ia_valid & ATTR_SIZE) && (iap->ia_size == 0))
> @@ -406,7 +408,7 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> 	status = nfsd_create_setattr(rqstp, fhp, resfhp, iap);
> 
> out:
> -	fh_unlock(fhp);
> +	inode_unlock(inode);
> 	if (child && !IS_ERR(child))
> 		dput(child);
> 	fh_drop_write(fhp);
> diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
> index ed24fae09517..427c404bc52b 100644
> --- a/fs/nfsd/nfsproc.c
> +++ b/fs/nfsd/nfsproc.c
> @@ -285,7 +285,7 @@ nfsd_proc_create(struct svc_rqst *rqstp)
> 		goto done;
> 	}
> 
> -	fh_lock_nested(dirfhp, I_MUTEX_PARENT);
> +	inode_lock_nested(dirfhp->fh_dentry->d_inode, I_MUTEX_PARENT);
> 	dchild = lookup_one_len(argp->name, dirfhp->fh_dentry, argp->len);
> 	if (IS_ERR(dchild)) {
> 		resp->status = nfserrno(PTR_ERR(dchild));
> @@ -382,6 +382,7 @@ nfsd_proc_create(struct svc_rqst *rqstp)
> 	}
> 
> 	resp->status = nfs_ok;
> +	fh_fill_pre_attrs(dirfhp);
> 	if (!inode) {
> 		/* File doesn't exist. Create it and set attrs */
> 		resp->status = nfsd_create_locked(rqstp, dirfhp, argp->name,
> @@ -399,10 +400,10 @@ nfsd_proc_create(struct svc_rqst *rqstp)
> 			resp->status = nfsd_setattr(rqstp, newfhp, attr, 0,
> 						    (time64_t)0);
> 	}
> +	fh_fill_post_attrs(dirfhp);

Are the fh_fill_* twins necessary for NFSv2 CREATE?


> out_unlock:
> -	/* We don't really need to unlock, as fh_put does it. */
> -	fh_unlock(dirfhp);
> +	inode_unlock(dirfhp->fh_dentry->d_inode);
> 	fh_drop_write(dirfhp);
> done:
> 	fh_put(dirfhp);
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 8e050c6d112a..2ca748aa83bb 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1412,7 +1412,7 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> 	if (host_err)
> 		return nfserrno(host_err);
> 
> -	fh_lock_nested(fhp, I_MUTEX_PARENT);
> +	inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT);
> 	dchild = lookup_one_len(fname, dentry, flen);
> 	host_err = PTR_ERR(dchild);
> 	if (IS_ERR(dchild)) {
> @@ -1427,12 +1427,14 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> 	dput(dchild);
> 	if (err)
> 		goto out_unlock;
> +	fh_fill_pre_attrs(fhp);
> 	err = nfsd_create_locked(rqstp, fhp, fname, flen, iap, type,
> 				 rdev, resfhp);
> 	if (!err && post_create)
> 		post_create(resfhp, data);
> +	fh_fill_post_attrs(fhp);
> out_unlock:
> -	fh_unlock(fhp);
> +	inode_unlock(dentry->d_inode);
> 	return err;
> }
> 
> @@ -1505,14 +1507,15 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
> 	if (host_err)
> 		goto out_nfserr;
> 
> -	fh_lock(fhp);
> 	dentry = fhp->fh_dentry;
> +	inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT);
> 	dnew = lookup_one_len(fname, dentry, flen);
> 	host_err = PTR_ERR(dnew);
> 	if (IS_ERR(dnew)) {
> -		fh_unlock(fhp);
> +		inode_unlock(dentry->d_inode);
> 		goto out_nfserr;
> 	}
> +	fh_fill_pre_attrs(fhp);
> 	host_err = vfs_symlink(&init_user_ns, d_inode(dentry), dnew, path);
> 	err = nfserrno(host_err);
> 	if (!err)
> @@ -1525,7 +1528,8 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
> 	if (err==0) err = cerr;
> 	if (!err && post_create)
> 		post_create(resfhp, data);
> -	fh_unlock(fhp);
> +	fh_fill_post_attrs(fhp);
> +	inode_unlock(dentry->d_inode);
> out:
> 	return err;
> 
> @@ -1569,9 +1573,9 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
> 		goto out;
> 	}
> 
> -	fh_lock_nested(ffhp, I_MUTEX_PARENT);
> 	ddir = ffhp->fh_dentry;
> 	dirp = d_inode(ddir);
> +	inode_lock_nested(dirp, I_MUTEX_PARENT);
> 
> 	dnew = lookup_one_len(name, ddir, len);
> 	host_err = PTR_ERR(dnew);
> @@ -1585,8 +1589,10 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
> 	err = nfserr_noent;
> 	if (d_really_is_negative(dold))
> 		goto out_dput;
> +	fh_fill_pre_attrs(ffhp);
> 	host_err = vfs_link(dold, &init_user_ns, dirp, dnew, NULL);
> -	fh_unlock(ffhp);
> +	fh_fill_post_attrs(ffhp);
> +	inode_unlock(dirp);
> 	if (!host_err) {
> 		err = nfserrno(commit_metadata(ffhp));
> 		if (!err)
> @@ -1606,7 +1612,7 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
> out_dput:
> 	dput(dnew);
> out_unlock:
> -	fh_unlock(ffhp);
> +	inode_unlock(dirp);
> 	goto out_drop_write;
> }
> 
> @@ -1781,9 +1787,9 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
> 	if (host_err)
> 		goto out_nfserr;
> 
> -	fh_lock_nested(fhp, I_MUTEX_PARENT);
> 	dentry = fhp->fh_dentry;
> 	dirp = d_inode(dentry);
> +	inode_lock_nested(dirp, I_MUTEX_PARENT);
> 
> 	rdentry = lookup_one_len(fname, dentry, flen);
> 	host_err = PTR_ERR(rdentry);
> @@ -1801,6 +1807,7 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
> 	if (!type)
> 		type = d_inode(rdentry)->i_mode & S_IFMT;
> 
> +	fh_fill_pre_attrs(fhp);
> 	if (type != S_IFDIR) {
> 		if (rdentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK)
> 			nfsd_close_cached_files(rdentry);
> @@ -1808,8 +1815,9 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
> 	} else {
> 		host_err = vfs_rmdir(&init_user_ns, dirp, rdentry);
> 	}
> +	fh_fill_post_attrs(fhp);
> 
> -	fh_unlock(fhp);
> +	inode_unlock(dirp);
> 	if (!host_err)
> 		host_err = commit_metadata(fhp);
> 	dput(rdentry);
> @@ -1832,7 +1840,7 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
> out:
> 	return err;
> out_unlock:
> -	fh_unlock(fhp);
> +	inode_unlock(dirp);
> 	goto out_drop_write;
> }
> 
> 
> 

--
Chuck Lever





* Re: [PATCH 7/8] NFSD: use (un)lock_inode instead of fh_(un)lock for file operations
  2022-07-06  4:18 ` [PATCH 7/8] NFSD: use (un)lock_inode instead of fh_(un)lock for file operations NeilBrown
  2022-07-06 14:10   ` Jeff Layton
@ 2022-07-06 16:30   ` Chuck Lever III
  2022-07-07  1:33     ` NeilBrown
  1 sibling, 1 reply; 40+ messages in thread
From: Chuck Lever III @ 2022-07-06 16:30 UTC (permalink / raw)
  To: Neil Brown; +Cc: Jeff Layton, Linux NFS Mailing List



> On Jul 6, 2022, at 12:18 AM, NeilBrown <neilb@suse.de> wrote:
> 
> When locking a file to access ACLs and xattrs etc, use explicit locking
> with inode_lock() instead of fh_lock().  This means that the calls to
> fh_fill_pre/post_attr() are also explicit which improves readability and
> allows us to place them only where they are needed.  Only the xattr
> calls need pre/post information.
> 
> When locking a file we don't need I_MUTEX_PARENT as the file is not a
> parent of anything, so we can use inode_lock() directly rather than the
> inode_lock_nested() call that fh_lock() uses.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
> fs/nfsd/nfs2acl.c   |    6 +++---
> fs/nfsd/nfs3acl.c   |    4 ++--
> fs/nfsd/nfs4acl.c   |    7 +++----
> fs/nfsd/nfs4state.c |    8 ++++----
> fs/nfsd/vfs.c       |   25 ++++++++++++-------------
> 5 files changed, 24 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c
> index b5760801d377..9edd3c1a30fb 100644
> --- a/fs/nfsd/nfs2acl.c
> +++ b/fs/nfsd/nfs2acl.c
> @@ -111,7 +111,7 @@ static __be32 nfsacld_proc_setacl(struct svc_rqst *rqstp)
> 	if (error)
> 		goto out_errno;
> 
> -	fh_lock(fh);
> +	inode_lock(inode);
> 
> 	error = set_posix_acl(&init_user_ns, inode, ACL_TYPE_ACCESS,
> 			      argp->acl_access);
> @@ -122,7 +122,7 @@ static __be32 nfsacld_proc_setacl(struct svc_rqst *rqstp)
> 	if (error)
> 		goto out_drop_lock;
> 
> -	fh_unlock(fh);
> +	inode_unlock(inode);
> 
> 	fh_drop_write(fh);
> 
> @@ -136,7 +136,7 @@ static __be32 nfsacld_proc_setacl(struct svc_rqst *rqstp)
> 	return rpc_success;
> 
> out_drop_lock:
> -	fh_unlock(fh);
> +	inode_unlock(inode);
> 	fh_drop_write(fh);
> out_errno:
> 	resp->status = nfserrno(error);
> diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c
> index 35b2ebda14da..9446c6743664 100644
> --- a/fs/nfsd/nfs3acl.c
> +++ b/fs/nfsd/nfs3acl.c
> @@ -101,7 +101,7 @@ static __be32 nfsd3_proc_setacl(struct svc_rqst *rqstp)
> 	if (error)
> 		goto out_errno;
> 
> -	fh_lock(fh);
> +	inode_lock(inode);
> 
> 	error = set_posix_acl(&init_user_ns, inode, ACL_TYPE_ACCESS,
> 			      argp->acl_access);
> @@ -111,7 +111,7 @@ static __be32 nfsd3_proc_setacl(struct svc_rqst *rqstp)
> 			      argp->acl_default);
> 
> out_drop_lock:
> -	fh_unlock(fh);
> +	inode_unlock(inode);
> 	fh_drop_write(fh);
> out_errno:
> 	resp->status = nfserrno(error);
> diff --git a/fs/nfsd/nfs4acl.c b/fs/nfsd/nfs4acl.c
> index 5c9b7e01e8ca..a33cacf62ea0 100644
> --- a/fs/nfsd/nfs4acl.c
> +++ b/fs/nfsd/nfs4acl.c
> @@ -781,19 +781,18 @@ nfsd4_set_nfs4_acl(struct svc_rqst *rqstp, struct svc_fh *fhp,
> 	if (host_error < 0)
> 		goto out_nfserr;
> 
> -	fh_lock(fhp);
> +	inode_lock(inode);
> 
> 	host_error = set_posix_acl(&init_user_ns, inode, ACL_TYPE_ACCESS, pacl);
> 	if (host_error < 0)
> 		goto out_drop_lock;
> 
> -	if (S_ISDIR(inode->i_mode)) {
> +	if (S_ISDIR(inode->i_mode))
> 		host_error = set_posix_acl(&init_user_ns, inode,
> 					   ACL_TYPE_DEFAULT, dpacl);
> -	}
> 
> out_drop_lock:
> -	fh_unlock(fhp);
> +	inode_unlock(inode);
> 
> 	posix_acl_release(pacl);
> 	posix_acl_release(dpacl);
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 9d1a3e131c49..307317ba9aff 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -7322,21 +7322,21 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> static __be32 nfsd_test_lock(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file_lock *lock)
> {
> 	struct nfsd_file *nf;
> +	struct inode *inode = fhp->fh_dentry->d_inode;

I don't think this is correct.

nfsd_file_acquire() calls fh_verify(), which can update fhp->fh_dentry.
Anyway, is it guaranteed that fh_dentry is not NULL here?

It would be more defensive to set @inode /after/ the call to
nfsd_file_acquire().
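
A user-space sketch of the ordering I mean, with hypothetical tlock_demo_*
stand-ins rather than the real nfsd types. It assumes a model in which the
acquire step may be the first thing to populate fh_dentry, so the inode
pointer is read only after that step succeeds.

```c
#include <assert.h>
#include <stddef.h>

struct tlock_demo_inode {
	int ino;
};

struct tlock_demo_dentry {
	struct tlock_demo_inode *d_inode;
};

struct tlock_demo_fh {
	struct tlock_demo_dentry *fh_dentry;
};

static struct tlock_demo_inode tlock_demo_ino = { 42 };
static struct tlock_demo_dentry tlock_demo_den = { &tlock_demo_ino };

/* Role of nfsd_file_acquire(): verification may fill in fh_dentry. */
int tlock_demo_acquire(struct tlock_demo_fh *fhp)
{
	if (!fhp->fh_dentry)
		fhp->fh_dentry = &tlock_demo_den;
	return 0;
}

int tlock_demo_test_lock(struct tlock_demo_fh *fhp, int *ino_out)
{
	struct tlock_demo_inode *inode;
	int err;

	err = tlock_demo_acquire(fhp);
	if (err)
		return err;
	/* Only now is fh_dentry known to be set in this model. */
	inode = fhp->fh_dentry->d_inode;
	*ino_out = inode->ino;
	return 0;
}
```

Initialising inode in the declaration instead would dereference a NULL
fh_dentry in this model whenever the handle had not been verified yet.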


> 	__be32 err;
> 
> 	err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
> 	if (err)
> 		return err;
> -	fh_lock(fhp); /* to block new leases till after test_lock: */
> -	err = nfserrno(nfsd_open_break_lease(fhp->fh_dentry->d_inode,
> -							NFSD_MAY_READ));
> +	inode_lock(inode); /* to block new leases till after test_lock: */
> +	err = nfserrno(nfsd_open_break_lease(inode, NFSD_MAY_READ));
> 	if (err)
> 		goto out;
> 	lock->fl_file = nf->nf_file;
> 	err = nfserrno(vfs_test_lock(nf->nf_file, lock));
> 	lock->fl_file = NULL;
> out:
> -	fh_unlock(fhp);
> +	inode_unlock(inode);
> 	nfsd_file_put(nf);
> 	return err;
> }
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 2ca748aa83bb..2526615285ca 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -444,7 +444,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap,
> 			return err;
> 	}
> 
> -	fh_lock(fhp);
> +	inode_lock(inode);
> 	if (size_change) {
> 		/*
> 		 * RFC5661, Section 18.30.4:
> @@ -480,7 +480,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap,
> 	host_err = notify_change(&init_user_ns, dentry, iap, NULL);
> 
> out_unlock:
> -	fh_unlock(fhp);
> +	inode_unlock(inode);
> 	if (size_change)
> 		put_write_access(inode);
> out:
> @@ -2196,12 +2196,8 @@ nfsd_listxattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char **bufp,
> }
> 
> /*
> - * Removexattr and setxattr need to call fh_lock to both lock the inode
> - * and set the change attribute. Since the top-level vfs_removexattr
> - * and vfs_setxattr calls already do their own inode_lock calls, call
> - * the _locked variant. Pass in a NULL pointer for delegated_inode,
> - * and let the client deal with NFS4ERR_DELAY (same as with e.g.
> - * setattr and remove).
> + * Pass in a NULL pointer for delegated_inode, and let the client deal
> + * with NFS4ERR_DELAY (same as with e.g.  setattr and remove).
>  */
> __be32
> nfsd_removexattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char *name)
> @@ -2217,12 +2213,14 @@ nfsd_removexattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char *name)
> 	if (ret)
> 		return nfserrno(ret);
> 
> -	fh_lock(fhp);
> +	inode_lock(fhp->fh_dentry->d_inode);
> +	fh_fill_pre_attrs(fhp);
> 
> 	ret = __vfs_removexattr_locked(&init_user_ns, fhp->fh_dentry,
> 				       name, NULL);
> 
> -	fh_unlock(fhp);
> +	fh_fill_post_attrs(fhp);
> +	inode_unlock(fhp->fh_dentry->d_inode);
> 	fh_drop_write(fhp);
> 
> 	return nfsd_xattr_errno(ret);
> @@ -2242,12 +2240,13 @@ nfsd_setxattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char *name,
> 	ret = fh_want_write(fhp);
> 	if (ret)
> 		return nfserrno(ret);
> -	fh_lock(fhp);
> +	inode_lock(fhp->fh_dentry->d_inode);
> +	fh_fill_pre_attrs(fhp);
> 
> 	ret = __vfs_setxattr_locked(&init_user_ns, fhp->fh_dentry, name, buf,
> 				    len, flags, NULL);
> -
> -	fh_unlock(fhp);
> +	fh_fill_post_attrs(fhp);
> +	inode_unlock(fhp->fh_dentry->d_inode);
> 	fh_drop_write(fhp);
> 
> 	return nfsd_xattr_errno(ret);
> 
> 

--
Chuck Lever





* Re: [PATCH 5/8] NFSD: reduce locking in nfsd_lookup()
  2022-07-06 13:47   ` Jeff Layton
@ 2022-07-07  1:26     ` NeilBrown
  0 siblings, 0 replies; 40+ messages in thread
From: NeilBrown @ 2022-07-07  1:26 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Chuck Lever, linux-nfs

On Wed, 06 Jul 2022, Jeff Layton wrote:
> On Wed, 2022-07-06 at 14:18 +1000, NeilBrown wrote:
> > nfsd_lookup() takes an exclusive lock on the parent inode, but many
> > callers don't want the lock and may not need to lock at all if the
> > result is in the dcache.
> > 
> > Change nfsd_lookup() to be passed a bool flag.
> > If false, don't take the lock.
> > If true, do take an exclusive lock, and return with it held if
> > successful.
> > If nfsd_lookup() returns an error, the lock WILL NOT be held.
> > 
> > Only nfsd4_open() requests the lock to be held, and does so to block
> > rename until it decides whether to return a delegation.
> > 
> > NOTE: when nfsd4_open() creates a file, the directory does *NOT* remain
> >   locked and never has.  So it is possible (though unlikely) for the
> >   newly created file to be renamed before a delegation is handed out,
> >   and that would be bad.  This patch does *NOT* fix that, but *DOES*
> >   take the directory lock immediately after creating the file, which
> >   reduces the size of the window and ensures that the lock is held
> >   consistently.  More work is needed to guarantee no rename happens
> >   before the delegation.
> > 
> 
> Interesting. Maybe after taking the lock, we could re-vet the dentry vs.
> the info in the OPEN request? That way, we'd presumably know that the
> above race didn't occur.

I would lean towards revalidating the dentry after getting the lease.
However I don't think "revalidate the dentry" is quite as easy as I
would like it to be, particularly if you care about bind-mounts of
regular files.

> >  			/*
> > -			 * We don't need the i_mutex after all.  It's
> > -			 * still possible we could open this (regular
> > -			 * files can be mountpoints too), but the
> > -			 * i_mutex is just there to prevent renames of
> > -			 * something that we might be about to delegate,
> > -			 * and a mountpoint won't be renamed:
> > +			 * nfsd_cross_mnt() may wait for an upcall
> > +			 * to userspace, and holding i_sem across that
> 
> s/i_sem/i_rwsem/

But ...  fs/nilfs2/nilfs.h calls it i_sem, as does
fs/jffs2/README.Locking
And 
$ git grep -w i_mutex | wc
    180    1878   13728

But yes, I should spell it i_rwsem... or maybe just "the inode lock".

> 
> Other than minor comment nit...
> 
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> 

Thanks,
NeilBrown


* Re: [PATCH 5/8] NFSD: reduce locking in nfsd_lookup()
  2022-07-06 16:29   ` Chuck Lever III
@ 2022-07-07  1:29     ` NeilBrown
  0 siblings, 0 replies; 40+ messages in thread
From: NeilBrown @ 2022-07-07  1:29 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Jeff Layton, Linux NFS Mailing List

On Thu, 07 Jul 2022, Chuck Lever III wrote:
> > +		/* We want to keep the directory locked until we've had a chance
> > +		 * to acquire a delegation if appropriate, so request that
> > +		 * nfsd_lookup() hold on to the lock.
> > 		 */
> > 		status = nfsd_lookup(rqstp, current_fh,
> > -				     open->op_fname, open->op_fnamelen, *resfh);
> > +				     open->op_fname, open->op_fnamelen, *resfh,
> > +				     true);
> > +		if (!status) {
> > +			/* NFSv4 protocol requires change attributes even though
> > +			 * no change happened.
> > +			 */
> > +			fh_fill_pre_attrs(current_fh);
> > +			fh_fill_post_attrs(current_fh);
> 
> If this is really correct, the comment should also state that
> no concurrent changes to the parent are possible during
> the lookup, and thus the pre and post attributes are expected
> to be the same always.

The earlier comment notes that nfsd_lookup() is called in a way in which
it takes and keeps the lock - that should imply that no other changes
can happen.  But with such insane looking code, it doesn't hurt to be
extra explicit.

> 
> Otherwise, this code paragraph looks just a little insane ;-)
> 
:-)

Thanks,
NeilBrown


* Re: [PATCH 7/8] NFSD: use (un)lock_inode instead of fh_(un)lock for file operations
  2022-07-06 16:30   ` Chuck Lever III
@ 2022-07-07  1:33     ` NeilBrown
  0 siblings, 0 replies; 40+ messages in thread
From: NeilBrown @ 2022-07-07  1:33 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Jeff Layton, Linux NFS Mailing List

On Thu, 07 Jul 2022, Chuck Lever III wrote:
> 
> > On Jul 6, 2022, at 12:18 AM, NeilBrown <neilb@suse.de> wrote:
> > 
> > When locking a file to access ACLs and xattrs etc, use explicit locking
> > with inode_lock() instead of fh_lock().  This means that the calls to
> > fh_fill_pre/post_attr() are also explicit which improves readability and
> > allows us to place them only where they are needed.  Only the xattr
> > calls need pre/post information.
> > 
> > When locking a file we don't need I_MUTEX_PARENT as the file is not a
> > parent of anything, so we can use inode_lock() directly rather than the
> > inode_lock_nested() call that fh_lock() uses.
> > 
> > Signed-off-by: NeilBrown <neilb@suse.de>
> > ---
> > fs/nfsd/nfs2acl.c   |    6 +++---
> > fs/nfsd/nfs3acl.c   |    4 ++--
> > fs/nfsd/nfs4acl.c   |    7 +++----
> > fs/nfsd/nfs4state.c |    8 ++++----
> > fs/nfsd/vfs.c       |   25 ++++++++++++-------------
> > 5 files changed, 24 insertions(+), 26 deletions(-)
> > 
> > diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c
> > index b5760801d377..9edd3c1a30fb 100644
> > --- a/fs/nfsd/nfs2acl.c
> > +++ b/fs/nfsd/nfs2acl.c
> > @@ -111,7 +111,7 @@ static __be32 nfsacld_proc_setacl(struct svc_rqst *rqstp)
> > 	if (error)
> > 		goto out_errno;
> > 
> > -	fh_lock(fh);
> > +	inode_lock(inode);
> > 
> > 	error = set_posix_acl(&init_user_ns, inode, ACL_TYPE_ACCESS,
> > 			      argp->acl_access);
> > @@ -122,7 +122,7 @@ static __be32 nfsacld_proc_setacl(struct svc_rqst *rqstp)
> > 	if (error)
> > 		goto out_drop_lock;
> > 
> > -	fh_unlock(fh);
> > +	inode_unlock(inode);
> > 
> > 	fh_drop_write(fh);
> > 
> > @@ -136,7 +136,7 @@ static __be32 nfsacld_proc_setacl(struct svc_rqst *rqstp)
> > 	return rpc_success;
> > 
> > out_drop_lock:
> > -	fh_unlock(fh);
> > +	inode_unlock(inode);
> > 	fh_drop_write(fh);
> > out_errno:
> > 	resp->status = nfserrno(error);
> > diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c
> > index 35b2ebda14da..9446c6743664 100644
> > --- a/fs/nfsd/nfs3acl.c
> > +++ b/fs/nfsd/nfs3acl.c
> > @@ -101,7 +101,7 @@ static __be32 nfsd3_proc_setacl(struct svc_rqst *rqstp)
> > 	if (error)
> > 		goto out_errno;
> > 
> > -	fh_lock(fh);
> > +	inode_lock(inode);
> > 
> > 	error = set_posix_acl(&init_user_ns, inode, ACL_TYPE_ACCESS,
> > 			      argp->acl_access);
> > @@ -111,7 +111,7 @@ static __be32 nfsd3_proc_setacl(struct svc_rqst *rqstp)
> > 			      argp->acl_default);
> > 
> > out_drop_lock:
> > -	fh_unlock(fh);
> > +	inode_unlock(inode);
> > 	fh_drop_write(fh);
> > out_errno:
> > 	resp->status = nfserrno(error);
> > diff --git a/fs/nfsd/nfs4acl.c b/fs/nfsd/nfs4acl.c
> > index 5c9b7e01e8ca..a33cacf62ea0 100644
> > --- a/fs/nfsd/nfs4acl.c
> > +++ b/fs/nfsd/nfs4acl.c
> > @@ -781,19 +781,18 @@ nfsd4_set_nfs4_acl(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > 	if (host_error < 0)
> > 		goto out_nfserr;
> > 
> > -	fh_lock(fhp);
> > +	inode_lock(inode);
> > 
> > 	host_error = set_posix_acl(&init_user_ns, inode, ACL_TYPE_ACCESS, pacl);
> > 	if (host_error < 0)
> > 		goto out_drop_lock;
> > 
> > -	if (S_ISDIR(inode->i_mode)) {
> > +	if (S_ISDIR(inode->i_mode))
> > 		host_error = set_posix_acl(&init_user_ns, inode,
> > 					   ACL_TYPE_DEFAULT, dpacl);
> > -	}
> > 
> > out_drop_lock:
> > -	fh_unlock(fhp);
> > +	inode_unlock(inode);
> > 
> > 	posix_acl_release(pacl);
> > 	posix_acl_release(dpacl);
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index 9d1a3e131c49..307317ba9aff 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -7322,21 +7322,21 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > static __be32 nfsd_test_lock(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file_lock *lock)
> > {
> > 	struct nfsd_file *nf;
> > +	struct inode *inode = fhp->fh_dentry->d_inode;
> 
> I don't think this is correct.
> 
> nfsd_file_acquire() calls fh_verify(), which can update fhp->fh_dentry.
> Anyway, is it guaranteed that fh_dentry is not NULL here?

nfsd_test_lock() is only ever called from nfsd4_lockt(), and that always
calls fh_verify() before calling nfsd_test_lock().  So the code is safe.

> 
> It would be more defensive to set @inode /after/ the call to
> nfsd_file_acquire().

Yes, that would make it even safer - thanks.
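
For illustration, the reordering agreed on here can be modelled in user
space roughly as follows.  This is only a sketch, not the kernel code: the
struct definitions are minimal stand-ins, and model_file_acquire() and
model_test_lock() are hypothetical names that mimic only the one property
under discussion, namely that the acquire step may fill in fhp->fh_dentry.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-ins for the kernel types; not the real definitions. */
struct inode { int locked; };
struct dentry { struct inode *d_inode; };
struct svc_fh { struct dentry *fh_dentry; };

static void inode_lock(struct inode *inode)
{
	assert(!inode->locked);
	inode->locked = 1;
}

static void inode_unlock(struct inode *inode)
{
	assert(inode->locked);
	inode->locked = 0;
}

/*
 * Hypothetical model of nfsd_file_acquire(): like fh_verify(), it may
 * populate fhp->fh_dentry.  Returns 0 on success, -1 on failure.
 */
static int model_file_acquire(struct svc_fh *fhp, struct dentry *resolved)
{
	if (!resolved)
		return -1;
	if (!fhp->fh_dentry)
		fhp->fh_dentry = resolved;
	return 0;
}

static int model_test_lock(struct svc_fh *fhp, struct dentry *resolved)
{
	struct inode *inode;
	int err;

	err = model_file_acquire(fhp, resolved);
	if (err)
		return err;
	/* fh_dentry is dereferenced only after the acquire has succeeded. */
	inode = fhp->fh_dentry->d_inode;
	inode_lock(inode);
	/* ... break leases and test the lock here ... */
	inode_unlock(inode);
	return 0;
}
```

With this ordering the function no longer relies on its caller having
already verified the file handle.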

NeilBrown


> 
> 
> > 	__be32 err;
> > 
> > 	err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
> > 	if (err)
> > 		return err;
> > -	fh_lock(fhp); /* to block new leases till after test_lock: */
> > -	err = nfserrno(nfsd_open_break_lease(fhp->fh_dentry->d_inode,
> > -							NFSD_MAY_READ));
> > +	inode_lock(inode); /* to block new leases till after test_lock: */
> > +	err = nfserrno(nfsd_open_break_lease(inode, NFSD_MAY_READ));
> > 	if (err)
> > 		goto out;
> > 	lock->fl_file = nf->nf_file;
> > 	err = nfserrno(vfs_test_lock(nf->nf_file, lock));
> > 	lock->fl_file = NULL;
> > out:
> > -	fh_unlock(fhp);
> > +	inode_unlock(inode);
> > 	nfsd_file_put(nf);
> > 	return err;
> > }
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index 2ca748aa83bb..2526615285ca 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -444,7 +444,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap,
> > 			return err;
> > 	}
> > 
> > -	fh_lock(fhp);
> > +	inode_lock(inode);
> > 	if (size_change) {
> > 		/*
> > 		 * RFC5661, Section 18.30.4:
> > @@ -480,7 +480,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap,
> > 	host_err = notify_change(&init_user_ns, dentry, iap, NULL);
> > 
> > out_unlock:
> > -	fh_unlock(fhp);
> > +	inode_unlock(inode);
> > 	if (size_change)
> > 		put_write_access(inode);
> > out:
> > @@ -2196,12 +2196,8 @@ nfsd_listxattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char **bufp,
> > }
> > 
> > /*
> > - * Removexattr and setxattr need to call fh_lock to both lock the inode
> > - * and set the change attribute. Since the top-level vfs_removexattr
> > - * and vfs_setxattr calls already do their own inode_lock calls, call
> > - * the _locked variant. Pass in a NULL pointer for delegated_inode,
> > - * and let the client deal with NFS4ERR_DELAY (same as with e.g.
> > - * setattr and remove).
> > + * Pass in a NULL pointer for delegated_inode, and let the client deal
> > + * with NFS4ERR_DELAY (same as with e.g.  setattr and remove).
> >  */
> > __be32
> > nfsd_removexattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char *name)
> > @@ -2217,12 +2213,14 @@ nfsd_removexattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char *name)
> > 	if (ret)
> > 		return nfserrno(ret);
> > 
> > -	fh_lock(fhp);
> > +	inode_lock(fhp->fh_dentry->d_inode);
> > +	fh_fill_pre_attrs(fhp);
> > 
> > 	ret = __vfs_removexattr_locked(&init_user_ns, fhp->fh_dentry,
> > 				       name, NULL);
> > 
> > -	fh_unlock(fhp);
> > +	fh_fill_post_attrs(fhp);
> > +	inode_unlock(fhp->fh_dentry->d_inode);
> > 	fh_drop_write(fhp);
> > 
> > 	return nfsd_xattr_errno(ret);
> > @@ -2242,12 +2240,13 @@ nfsd_setxattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char *name,
> > 	ret = fh_want_write(fhp);
> > 	if (ret)
> > 		return nfserrno(ret);
> > -	fh_lock(fhp);
> > +	inode_lock(fhp->fh_dentry->d_inode);
> > +	fh_fill_pre_attrs(fhp);
> > 
> > 	ret = __vfs_setxattr_locked(&init_user_ns, fhp->fh_dentry, name, buf,
> > 				    len, flags, NULL);
> > -
> > -	fh_unlock(fhp);
> > +	fh_fill_post_attrs(fhp);
> > +	inode_unlock(fhp->fh_dentry->d_inode);
> > 	fh_drop_write(fhp);
> > 
> > 	return nfsd_xattr_errno(ret);
> > 
> > 
> 
> --
> Chuck Lever
> 
> 
> 
> 


* Re: [PATCH 0/8] NFSD: clean up locking.
  2022-07-06 16:29 ` [PATCH 0/8] NFSD: clean up locking Chuck Lever III
@ 2022-07-12  2:33   ` NeilBrown
  2022-07-12 14:17     ` Chuck Lever III
  0 siblings, 1 reply; 40+ messages in thread
From: NeilBrown @ 2022-07-12  2:33 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Jeff Layton, Linux NFS Mailing List

On Thu, 07 Jul 2022, Chuck Lever III wrote:
> 
> > On Jul 6, 2022, at 12:18 AM, NeilBrown <neilb@suse.de> wrote:
> > 
> > This series prepares NFSD to be able to adjust to work with a proposed
> > patch which allows updates to directories to happen in parallel.
> > This patch set changes the way directories are locked, so the present
> > series cleans up some locking in nfsd.
> > 
> > Specifically we remove fh_lock() and fh_unlock().
> > These functions are problematic for a few reasons.
> > - they are deliberately idempotent - setting or clearing a flag
> >  so that a second call does nothing.  This makes locking errors harder,
> >  but it results in code that looks wrong ...  and maybe sometimes is a
> >  little bit wrong.
> >  Early patches clean up several places where this idempotent nature of
> >  the functions is depended on, and so makes the code clearer.
> > 
> > - they transparently call fh_fill_pre/post_attrs(), including at times
> >  when this is not necessary.  Having the calls only when necessary is
> >  marginally more efficient, and arguably makes the code clearer.
> > 
> > nfsd_lookup() currently always locks the directory, though often no lock
> > is needed.  So a patch in this series reduces this locking.
> > 
> > There is an awkward case that could still be further improved.
> > NFSv4 open needs to ensure the file is not renamed (or unlinked etc)
> > between the moment when the open succeeds, and a later moment when a
> > "lease" is attached to support a delegation.  The handling of this lock
> > is currently untidy, particularly when creating a file.
> > It would probably be better to take a lease immediately after
> > opening the file, and then discard it after deciding not to provide a
> > delegation.
> > 
> > I have run fstests and cthon tests on this, but I wouldn't be surprised
> > if there is a corner case that I've missed.
> 
> Hi Neil, thanks for (re)posting.
> 
> Let me make a few general comments here before I send out specific
> review nits.
> 
> I'm concerned mostly with how this series can be adequately tested.
> The two particular areas I'm worried about:
> 
>  - There are some changes to NFSv2 code, which is effectively
>    fallow. I think I can run v2 tests, once we decide what tests
>    should be run.

I hope we can still test v2... I know it is disabled by default..
If we can't test it, we should consider removing it.

> 
>  - More critically, ("NFSD: reduce locking in nfsd_lookup()") does
>    some pretty heavy lifting. How should this change be tested?

I don't see how there can be any answer other than "run all the tests we
usually run".  lockdep should report any locking strangeness.

> 
> Secondarily, the series adds more bells and whistles to the generic
> NFSD VFS APIs on behalf of NFSv4-specific requirements. In particular:
> 
>  - ("NFSD: change nfsd_create() to unlock directory before returning.")
>    makes some big changes to nfsd_create(). But that helper itself
>    is pretty small. Seems like cleaner code would result if NFSv4
>    had its own version of nfsd_create() to deal with the post-change
>    cases.

I would not like that approach.  Duplicating code is rarely a good idea.
Maybe, rather than passing a function and void* to nfsd_create(), we
could pass an acl and a label and do the setting in vfs.c rather than
nfs4proc.c.  The difficult part of that approach is getting back the
individual error statuses.   That should be solvable though.

> 
>  - ("NFSD: reduce locking in nfsd_lookup()") has a similar issue:
>    nfsd_lookup() is being changed such that its semantics are
>    substantially different for NFSv4 than for others. This is
>    possibly an indication that nfsd_lookup() should also be
>    duplicated into the NFSv4-specific code and the generic VFS
>    version should be left alone.

Again, I don't like duplication.  In this case, I think the longer term
solution is to remove the NFSv4 specific locking differences and solve
the problem differently.  i.e.  don't hold the inode locked, but check
for any possible rename after getting a lease.  Once that is done,
nfsd_lookup() can have saner semantics.
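
A very rough user-space sketch of that "take the lease first, then check
for a rename" idea is below.  Everything in it is an assumption made for
illustration: the struct is a toy stand-in for a dentry, model_try_delegate()
and still_names() are hypothetical helpers, and comparing parent and name is
a deliberately naive check.  As noted elsewhere in this thread, real
revalidation is harder than this, particularly with bind-mounts of regular
files.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-in for a dentry; not the kernel definition. */
struct dentry {
	const struct dentry *d_parent;
	const char *d_name;
	int has_lease;
};

/* Does @d still sit in directory @dir under @name? */
static int still_names(const struct dentry *d, const struct dentry *dir,
		       const char *name)
{
	return d->d_parent == dir && strcmp(d->d_name, name) == 0;
}

/*
 * Attach a lease first, then check that no rename slipped in between the
 * open and the lease.  Returns 1 if a delegation may be handed out,
 * 0 if the lease had to be discarded.
 */
static int model_try_delegate(struct dentry *d, const struct dentry *dir,
			      const char *name)
{
	assert(d && dir && name);
	d->has_lease = 1;		/* stand-in for attaching a lease */
	if (!still_names(d, dir, name)) {
		d->has_lease = 0;	/* renamed in the window: discard */
		return 0;
	}
	return 1;
}
```

The point of the ordering is that once the lease is in place a later rename
has to break it, so only the window before the lease needs checking and the
directory lock need not be held across it.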

> 
> I would prefer the code duplication approach in both these cases,
> unless you can convince me that is a bad idea.

When duplicating code results in substantial simplification in both
copies, then it makes sense.  Otherwise I think the default should be
not to duplicate.

Thanks,
NeilBrown


> 
> Finally, with regard to the awkward case you mention above. The
> NFSv4 OPEN code is a hairy mess, mostly because the protocol is
> a Swiss army knife and our implementation has had small fixes
> plastered onto it for many years. I won't be disappointed if
> you don't manage to address the rename/unlink/delegation race
> you mention above this time around. Just don't make it worse ;-)
> 
> Meanwhile we should start accruing some thoughts and designs
> about how this code path needs to work.
> 
> 
> > NeilBrown
> > 
> > 
> > ---
> > 
> > NeilBrown (8):
> >      NFSD: drop rqstp arg to do_set_nfs4_acl()
> >      NFSD: change nfsd_create() to unlock directory before returning.
> >      NFSD: always drop directory lock in nfsd_unlink()
> >      NFSD: only call fh_unlock() once in nfsd_link()
> >      NFSD: reduce locking in nfsd_lookup()
> >      NFSD: use explicit lock/unlock for directory ops
> >      NFSD: use (un)lock_inode instead of fh_(un)lock for file operations
> >      NFSD: discard fh_locked flag and fh_lock/fh_unlock
> > 
> > 
> > fs/nfsd/nfs2acl.c   |   6 +-
> > fs/nfsd/nfs3acl.c   |   4 +-
> > fs/nfsd/nfs3proc.c  |  21 ++---
> > fs/nfsd/nfs4acl.c   |  19 ++---
> > fs/nfsd/nfs4proc.c  | 106 +++++++++++++++---------
> > fs/nfsd/nfs4state.c |   8 +-
> > fs/nfsd/nfsfh.c     |   3 +-
> > fs/nfsd/nfsfh.h     |  56 +------------
> > fs/nfsd/nfsproc.c   |  14 ++--
> > fs/nfsd/vfs.c       | 193 ++++++++++++++++++++++++++------------------
> > fs/nfsd/vfs.h       |  19 +++--
> > 11 files changed, 238 insertions(+), 211 deletions(-)
> > 
> > --
> > Signature
> > 
> 
> --
> Chuck Lever
> 
> 
> 
> 


* Re: [PATCH 0/8] NFSD: clean up locking.
  2022-07-12  2:33   ` NeilBrown
@ 2022-07-12 14:17     ` Chuck Lever III
  2022-07-13  4:32       ` NeilBrown
  0 siblings, 1 reply; 40+ messages in thread
From: Chuck Lever III @ 2022-07-12 14:17 UTC (permalink / raw)
  To: Neil Brown; +Cc: Jeff Layton, Linux NFS Mailing List



> On Jul 11, 2022, at 10:33 PM, NeilBrown <neilb@suse.de> wrote:
> 
> On Thu, 07 Jul 2022, Chuck Lever III wrote:
>> 
>>> On Jul 6, 2022, at 12:18 AM, NeilBrown <neilb@suse.de> wrote:
>>> 
>>> This series prepares NFSD to be able to adjust to work with a proposed
>>> patch which allows updates to directories to happen in parallel.
>>> This patch set changes the way directories are locked, so the present
>>> series cleans up some locking in nfsd.
>>> 
>>> Specifically we remove fh_lock() and fh_unlock().
>>> These functions are problematic for a few reasons.
>>> - they are deliberately idempotent - setting or clearing a flag
>>> so that a second call does nothing. This makes locking errors harder,
>>> but it results in code that looks wrong ... and maybe sometimes is a
>>> little bit wrong.
>>> Early patches clean up several places where this idempotent nature of
>>> the functions is depended on, and so makes the code clearer.
>>> 
>>> - they transparently call fh_fill_pre/post_attrs(), including at times
>>> when this is not necessary. Having the calls only when necessary is
>>> marginally more efficient, and arguably makes the code clearer.
>>> 
>>> nfsd_lookup() currently always locks the directory, though often no lock
>>> is needed. So a patch in this series reduces this locking.
>>> 
>>> There is an awkward case that could still be further improved.
>>> NFSv4 open needs to ensure the file is not renamed (or unlinked etc)
>>> between the moment when the open succeeds, and a later moment when a
>>> "lease" is attached to support a delegation. The handling of this lock
>>> is currently untidy, particularly when creating a file.
>>> It would probably be better to take a lease immediately after
>>> opening the file, and then discard it after deciding not to provide a
>>> delegation.
>>> 
>>> I have run fstests and cthon tests on this, but I wouldn't be surprised
>>> if there is a corner case that I've missed.
>> 
>> Hi Neil, thanks for (re)posting.
>> 
>> Let me make a few general comments here before I send out specific
>> review nits.
>> 
>> I'm concerned mostly with how this series can be adequately tested.
>> The two particular areas I'm worried about:
>> 
>> - There are some changes to NFSv2 code, which is effectively
>> fallow. I think I can run v2 tests, once we decide what tests
>> should be run.
> 
> I hope we can still test v2... I know it is disabled by default..
> If we can't test it, we should consider removing it.

The work of deprecating and removing NFSv2 is already under way.
I think what I'm asking is if /you/ have tested the NFSv2 changes.


>> Secondarily, the series adds more bells and whistles to the generic
>> NFSD VFS APIs on behalf of NFSv4-specific requirements. In particular:
>> 
>> - ("NFSD: change nfsd_create() to unlock directory before returning.")
>> makes some big changes to nfsd_create(). But that helper itself
>> is pretty small. Seems like cleaner code would result if NFSv4
>> had its own version of nfsd_create() to deal with the post-change
>> cases.
> 
> I would not like that approach. Duplicating code is rarely a good idea.

De-duplicating code /can/ be a good idea, but isn't always a good
idea. If the exceptional cases add a lot of logic, that can make the
de-duplicated code difficult to read and reason about, and it can
make it brittle, just as it does in this case. Modifications on
behalf of NFSv4 in this common piece of code are possibly hazardous
to NFSv3, and navigating around the exception logic makes it
difficult to understand and review.

IMO code duplication is not an appropriate design pattern in this
specific instance.


> Maybe, rather than passing a function and void* to nfsd_create(), we
> could pass an acl and a label and do the setting in vfs.c rather than
> nfs4proc.c. The difficult part of that approach is getting back the
> individual error statuses. That should be solvable though.

The bulk of the work in nfsd_create() is done by lookup_one_len()
and nfsd_create_locked(), both of which are public APIs. The rest
of nfsd_create() is code that appears in several other places
already.


>> - ("NFSD: reduce locking in nfsd_lookup()") has a similar issue:
>> nfsd_lookup() is being changed such that its semantics are
>> substantially different for NFSv4 than for others. This is
>> possibly an indication that nfsd_lookup() should also be
>> duplicated into the NFSv4-specific code and the generic VFS
>> version should be left alone.
> 
> Again, I don't like duplication. In this case, I think the longer term
> solution is to remove the NFSv4 specific locking differences and solve
> the problem differently. i.e. don't hold the inode locked, but check
> for any possible rename after getting a lease. Once that is done,
> nfsd_lookup() can have saner semantics.

Then, perhaps we should bite that bullet and do that work now.


>> I would prefer the code duplication approach in both these cases,
>> unless you can convince me that is a bad idea.
> 
> When duplicating code results in substantial simplification in both
> copies, then it makes sense. Otherwise I think the default should be
> not to duplicate.

I believe that duplicating in both cases above will result in
simpler and less brittle code. That's why I asked for it.

--
Chuck Lever





* Re: [PATCH 0/8] NFSD: clean up locking.
  2022-07-12 14:17     ` Chuck Lever III
@ 2022-07-13  4:32       ` NeilBrown
  2022-07-13 14:15         ` Chuck Lever III
  0 siblings, 1 reply; 40+ messages in thread
From: NeilBrown @ 2022-07-13  4:32 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Jeff Layton, Linux NFS Mailing List

On Wed, 13 Jul 2022, Chuck Lever III wrote:
> 
> > On Jul 11, 2022, at 10:33 PM, NeilBrown <neilb@suse.de> wrote:
> > 
> > On Thu, 07 Jul 2022, Chuck Lever III wrote:
> >> 
> >>> On Jul 6, 2022, at 12:18 AM, NeilBrown <neilb@suse.de> wrote:
> >>> 
> >>> This series prepares NFSD to be able to adjust to work with a proposed
> >>> patch which allows updates to directories to happen in parallel.
> >>> This patch set changes the way directories are locked, so the present
> >>> series cleans up some locking in nfsd.
> >>> 
> >>> Specifically we remove fh_lock() and fh_unlock().
> >>> These functions are problematic for a few reasons.
> >>> - they are deliberately idempotent - setting or clearing a flag
> >>> so that a second call does nothing. This makes locking errors harder,
> >>> but it results in code that looks wrong ... and maybe sometimes is a
> >>> little bit wrong.
> >>> Early patches clean up several places where this idempotent nature of
> >>> the functions is depended on, and so makes the code clearer.
> >>> 
> >>> - they transparently call fh_fill_pre/post_attrs(), including at times
> >>> when this is not necessary. Having the calls only when necessary is
> >>> marginally more efficient, and arguably makes the code clearer.
> >>> 
> >>> nfsd_lookup() currently always locks the directory, though often no lock
> >>> is needed. So a patch in this series reduces this locking.
> >>> 
> >>> There is an awkward case that could still be further improved.
> >>> NFSv4 open needs to ensure the file is not renamed (or unlinked etc)
> >>> between the moment when the open succeeds, and a later moment when a
> >>> "lease" is attached to support a delegation. The handling of this lock
> >>> is currently untidy, particularly when creating a file.
> >>> It would probably be better to take a lease immediately after
> >>> opening the file, and then discard it after deciding not to provide a
> >>> delegation.
> >>> 
> >>> I have run fstests and cthon tests on this, but I wouldn't be surprised
> >>> if there is a corner case that I've missed.
> >> 
> >> Hi Neil, thanks for (re)posting.
> >> 
> >> Let me make a few general comments here before I send out specific
> >> review nits.
> >> 
> >> I'm concerned mostly with how this series can be adequately tested.
> >> The two particular areas I'm worried about:
> >> 
> >> - There are some changes to NFSv2 code, which is effectively
> >> fallow. I think I can run v2 tests, once we decide what tests
> >> should be run.
> > 
> > I hope we can still test v2... I know it is disabled by default..
> > If we can't test it, we should consider removing it.
> 
> The work of deprecating and removing NFSv2 is already under way.
> I think what I'm asking is if /you/ have tested the NFSv2 changes.

That's a question I can answer.  I haven't.  I will.

> 
> 
> >> Secondarily, the series adds more bells and whistles to the generic
> >> NFSD VFS APIs on behalf of NFSv4-specific requirements. In particular:
> >> 
> >> - ("NFSD: change nfsd_create() to unlock directory before returning.")
> >> makes some big changes to nfsd_create(). But that helper itself
> >> is pretty small. Seems like cleaner code would result if NFSv4
> >> had its own version of nfsd_create() to deal with the post-change
> >> cases.
> > 
> > I would not like that approach. Duplicating code is rarely a good idea.
> 
> De-duplicating code /can/ be a good idea, but isn't always a good
> idea. If the exceptional cases add a lot of logic, that can make the
> de-duplicated code difficult to read and reason about, and it can
> make it brittle, just as it does in this case. Modifications on
> behalf of NFSv4 in this common piece of code are possibly hazardous
> to NFSv3, and navigating around the exception logic makes it
> difficult to understand and review.

Are we looking at the same code?
The sum total of extra code needed for v4 is:
- two extra parameters:
	    void (*post_create)(struct svc_fh *fh, void *data),
	    void *data)
- two lines of code:
	if (!err && post_create)
		post_create(resfhp, data);

does that really make anything hard to follow?
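
To make the shape of that hook concrete, here is a small user-space model
of the pattern.  It is a sketch only: struct svc_fh is reduced to two toy
fields, and model_create(), set_acl_hook() and the "fail" parameter are
hypothetical names invented for the example, not the signature used in the
patch.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the file handle; not the kernel definition. */
struct svc_fh {
	int created;
	int acl_set;
};

typedef void (*post_create_fn)(struct svc_fh *fh, void *data);

/*
 * Hypothetical model of a create helper that takes an optional hook.
 * @fail simulates the underlying create failing.
 */
static int model_create(struct svc_fh *resfhp, int fail,
			post_create_fn post_create, void *data)
{
	int err = fail ? -1 : 0;

	assert(resfhp != NULL);
	if (!err)
		resfhp->created = 1;
	/* The two lines quoted above: run the hook only on success. */
	if (!err && post_create)
		post_create(resfhp, data);
	return err;
}

/* Example hook: an NFSv4-style caller applying extra state after create. */
static void set_acl_hook(struct svc_fh *fh, void *data)
{
	fh->acl_set = *(int *)data;
}
```

Callers that need nothing extra pass NULL for both arguments and see the
old behaviour; only a caller that supplies a hook pays for the extra step.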

> 
> IMO code duplication is not an appropriate design pattern in this
> specific instance.

I'm guessing there is a missing negation in there.

> 
> 
> > Maybe, rather than passing a function and void* to nfsd_create(), we
> > could pass an acl and a label and do the setting in vfs.c rather than
> > nfs4proc.c. The difficult part of that approach is getting back the
> > individual error statuses. That should be solvable though.
> 
> The bulk of the work in nfsd_create() is done by lookup_one_len()
> and nfsd_create_locked(), both of which are public APIs. The rest
> of nfsd_create() is code that appears in several other places
> already.

"several" == 1.  The only other call site for nfsd_create_locked() is in
nfsd_proc_create()

I would say that the "bulk" of the work is the interplay between
locking, error checking, and these two functions you mention.

> 
> 
> >> - ("NFSD: reduce locking in nfsd_lookup()") has a similar issue:
> >> nfsd_lookup() is being changed such that its semantics are
> >> substantially different for NFSv4 than for others. This is
> >> possibly an indication that nfsd_lookup() should also be
> >> duplicated into the NFSv4-specific code and the generic VFS
> >> version should be left alone.
> > 
> > Again, I don't like duplication. In this case, I think the longer term
> > solution is to remove the NFSv4 specific locking differences and solve
> > the problem differently. i.e. don't hold the inode locked, but check
> > for any possible rename after getting a lease. Once that is done,
> > nfsd_lookup() can have saner semantics.
> 
> Then, perhaps we should bite that bullet and do that work now.

While this does have an appeal, it also looks like the start of a rabbit
hole where I find I have to fix up a growing collection of problems before
my patch can land.
I think balance is needed - certainly asking for some preliminary
cleanup is appropriate.  Expecting long standing and subtle issues that
are tangential to the main goal to be resolved first is possibly asking
too much.

NeilBrown


> 
> 
> >> I would prefer the code duplication approach in both these cases,
> >> unless you can convince me that is a bad idea.
> > 
> > When duplicating code results in substantial simplification in both
> > copies, then it makes sense. Otherwise I think the default should be
> > not to duplicate.
> 
> I believe that duplicating in both cases above will result in
> simpler and less brittle code. That's why I asked for it.
> 
> --
> Chuck Lever
> 
> 
> 
> 


* Re: [PATCH 0/8] NFSD: clean up locking.
  2022-07-13  4:32       ` NeilBrown
@ 2022-07-13 14:15         ` Chuck Lever III
  2022-07-13 19:13           ` Jeff Layton
  2022-07-15  2:36           ` NeilBrown
  0 siblings, 2 replies; 40+ messages in thread
From: Chuck Lever III @ 2022-07-13 14:15 UTC (permalink / raw)
  To: Neil Brown; +Cc: Jeff Layton, Linux NFS Mailing List



> On Jul 13, 2022, at 12:32 AM, NeilBrown <neilb@suse.de> wrote:
> 
> On Wed, 13 Jul 2022, Chuck Lever III wrote:
>> 
>>> On Jul 11, 2022, at 10:33 PM, NeilBrown <neilb@suse.de> wrote:
>>> 
>>> On Thu, 07 Jul 2022, Chuck Lever III wrote:
>>>> 
>>>>> On Jul 6, 2022, at 12:18 AM, NeilBrown <neilb@suse.de> wrote:
>>>>> 
>>>>> This series prepares NFSD to be able to adjust to work with a proposed
>>>>> patch which allows updates to directories to happen in parallel.
>>>>> This patch set changes the way directories are locked, so the present
>>>>> series cleans up some locking in nfsd.
>>>>> 
>>>>> Specifically we remove fh_lock() and fh_unlock().
>>>>> These functions are problematic for a few reasons.
>>>>> - they are deliberately idempotent - setting or clearing a flag
>>>>> so that a second call does nothing. This makes locking errors harder,
>>>>> but it results in code that looks wrong ... and maybe sometimes is a
>>>>> little bit wrong.
>>>>> Early patches clean up several places where this idempotent nature of
>>>>> the functions is depended on, and so makes the code clearer.
>>>>> 
>>>>> - they transparently call fh_fill_pre/post_attrs(), including at times
>>>>> when this is not necessary. Having the calls only when necessary is
>>>>> marginally more efficient, and arguably makes the code clearer.
>>>>> 
>>>>> nfsd_lookup() currently always locks the directory, though often no lock
>>>>> is needed. So a patch in this series reduces this locking.
>>>>> 
>>>>> There is an awkward case that could still be further improved.
>>>>> NFSv4 open needs to ensure the file is not renamed (or unlinked etc)
>>>>> between the moment when the open succeeds, and a later moment when a
>>>>> "lease" is attached to support a delegation. The handling of this lock
>>>>> is currently untidy, particularly when creating a file.
>>>>> It would probably be better to take a lease immediately after
>>>>> opening the file, and then discard it after deciding not to provide a
>>>>> delegation.
>>>>> 
>>>>> I have run fstests and cthon tests on this, but I wouldn't be surprised
>>>>> if there is a corner case that I've missed.
>>>> 
>>>> Hi Neil, thanks for (re)posting.
>>>> 
>>>> Let me make a few general comments here before I send out specific
>>>> review nits.
>>>> 
>>>> I'm concerned mostly with how this series can be adequately tested.
>>>> The two particular areas I'm worried about:
>>>> 
>>>> - There are some changes to NFSv2 code, which is effectively
>>>> fallow. I think I can run v2 tests, once we decide what tests
>>>> should be run.
>>> 
>>> I hope we can still test v2... I know it is disabled by default..
>>> If we can't test it, we should consider removing it.
>> 
>> The work of deprecating and removing NFSv2 is already under way.
>> I think what I'm asking is if /you/ have tested the NFSv2 changes.
> 
> That's a question I can answer. I haven't. I will.

Just in case it's not clear by now, NFSv2 (and NFSv3)
stability is the reason I don't want to perturb the
NFSD VFS API code in any significant way unless it's
absolutely needed.


>>>> Secondarily, the series adds more bells and whistles to the generic
>>>> NFSD VFS APIs on behalf of NFSv4-specific requirements. In particular:
>>>> 
>>>> - ("NFSD: change nfsd_create() to unlock directory before returning.")
>>>> makes some big changes to nfsd_create(). But that helper itself
>>>> is pretty small. Seems like cleaner code would result if NFSv4
>>>> had its own version of nfsd_create() to deal with the post-change
>>>> cases.
>>> 
>>> I would not like that approach. Duplicating code is rarely a good idea.
>> 
>> De-duplicating code /can/ be a good idea, but isn't always a good
>> idea. If the exceptional cases add a lot of logic, that can make the
>> de-duplicated code difficult to read and reason about, and it can
>> make it brittle, just as it does in this case. Modifications on
>> behalf of NFSv4 in this common piece of code are possibly hazardous
>> to NFSv3, and navigating around the exception logic makes it
>> difficult to understand and review.
> 
> Are we looking at the same code?
> The sum total of extra code needed for v4 is:
> - two extra parameters:
> 	 void (*post_create)(struct svc_fh *fh, void *data),
> 	 void *data)
> - two lines of code:
> 	if (!err && post_create)
> 		post_create(resfhp, data);
> 
> does that really make anything hard to follow?

The synopsis of nfsd_create() is already pretty cluttered.

You're adding that extra code in nfsd_symlink() too IIRC? And,
this change adds a virtual function call (which has significant
overhead these days) for reasons of convenience rather than
necessity.


>> IMO code duplication is not an appropriate design pattern in this
>> specific instance.
> 
> I'm guessing there is a missing negation in there.

Yes, thanks.


>>> Maybe, rather than passing a function and void* to nfsd_create(), we
>>> could pass an acl and a label and do the setting in vfs.c rather than
>>> nfs4proc.c. The difficult part of that approach is getting back the
>>> individual error statuses. That should be solvable though.
>> 
>> The bulk of the work in nfsd_create() is done by lookup_one_len()
>> and nfsd_create_locked(), both of which are public APIs. The rest
>> of nfsd_create() is code that appears in several other places
>> already.
> 
> "several" == 1. The only other call site for nfsd_create_locked() is in
> nfsd_proc_create()

But there are 15 call sites under fs/nfsd/ for lookup_one_len().
It's a pretty common trope.


> I would say that the "bulk" of the work is the interplay between
> locking, error checking, and these two functions you mention.

You're looking at the details of your particular change, and
I'm concerned about how much technical debt is going to
continue to accrue in an area shared with legacy protocol
implementation.

I'm still asking myself if I can live with the extra parameters
and virtual function call. Maybe. IMO, there are three ways
forward:

1. I can merge your patch and move on.
2. I can merge your patch as it is and follow up with clean-up.
3. I can drop your patch and write it the way I prefer.


>>>> - ("NFSD: reduce locking in nfsd_lookup()") has a similar issue:
>>>> nfsd_lookup() is being changed such that its semantics are
>>>> substantially different for NFSv4 than for others. This is
>>>> possibly an indication that nfsd_lookup() should also be
>>>> duplicated into the NFSv4-specific code and the generic VFS
>>>> version should be left alone.
>>> 
>>> Again, I don't like duplication. In this case, I think the longer term
>>> solution is to remove the NFSv4 specific locking differences and solve
>>> the problem differently. i.e. don't hold the inode locked, but check
>>> for any possible rename after getting a lease. Once that is done,
>>> nfsd_lookup() can have saner semantics.
>> 
>> Then, perhaps we should bite that bullet and do that work now.
> 
> While this does have an appeal, it also looks like the start of a rabbit
> hole where I find I have to fix up a growing collection of problems before
> my patch can land.

That's kind of the nature of the beast.

You are requesting changes that add technical debt with the
promise that "sometime in the future" that debt will be
erased. Meanwhile, nfsd_lookup() will be made harder to
maintain, and it will continue to contain a real bug.

I'm saying if there's a real bug here, addressing that should
be the priority.


> I think balance is needed - certainly asking for some preliminary
> cleanup is appropriate. Expecting long standing and subtle issues that
> are tangential to the main goal to be resolved first is possibly asking
> too much.

Fixing the bug seems to me (perhaps naively) to be directly
related to the parallelism of directory operations. Why do
you think it is tangential?

Asking for bugs to be addressed before new features go in
seems sensible to me, and is a common practice to enable the
resulting fixes to be backported. If you don't want to address
the bug you mentioned, then someone else will need to, and
that's fine. I think the priority should be bug fixes first.

Just to be clear, I'm referring to this issue:

>> NOTE: when nfsd4_open() creates a file, the directory does *NOT* remain
>> locked and never has. So it is possible (though unlikely) for the
>> newly created file to be renamed before a delegation is handed out,
>> and that would be bad. This patch does *NOT* fix that, but *DOES*
>> take the directory lock immediately after creating the file, which
>> reduces the size of the window and ensures that the lock is held
>> consistently. More work is needed to guarantee no rename happens
>> before the delegation.
> 
> Interesting. Maybe after taking the lock, we could re-vet the dentry vs.
> the info in the OPEN request? That way, we'd presumably know that the
> above race didn't occur.



--
Chuck Lever





* Re: [PATCH 0/8] NFSD: clean up locking.
  2022-07-13 14:15         ` Chuck Lever III
@ 2022-07-13 19:13           ` Jeff Layton
  2022-07-14 14:32             ` Chuck Lever III
  2022-07-15  2:36           ` NeilBrown
  1 sibling, 1 reply; 40+ messages in thread
From: Jeff Layton @ 2022-07-13 19:13 UTC (permalink / raw)
  To: Chuck Lever III, Neil Brown; +Cc: Linux NFS Mailing List

On Wed, 2022-07-13 at 14:15 +0000, Chuck Lever III wrote:
> 
> > On Jul 13, 2022, at 12:32 AM, NeilBrown <neilb@suse.de> wrote:
> > 
> > On Wed, 13 Jul 2022, Chuck Lever III wrote:
> > > 
> > > > On Jul 11, 2022, at 10:33 PM, NeilBrown <neilb@suse.de> wrote:
> > > > 
> > > > On Thu, 07 Jul 2022, Chuck Lever III wrote:
> > > > > 
> > > > > > On Jul 6, 2022, at 12:18 AM, NeilBrown <neilb@suse.de> wrote:
> > > > > > 
> > > > > > This series prepares NFSD to be able to adjust to work with a proposed
> > > > > > patch which allows updates to directories to happen in parallel.
> > > > > > This patch set changes the way directories are locked, so the present
> > > > > > series cleans up some locking in nfsd.
> > > > > > 
> > > > > > Specifically we remove fh_lock() and fh_unlock().
> > > > > > These functions are problematic for a few reasons.
> > > > > > - they are deliberately idempotent - setting or clearing a flag
> > > > > > so that a second call does nothing. This makes locking errors harder,
> > > > > > but it results in code that looks wrong ... and maybe sometimes is a
> > > > > > little bit wrong.
> > > > > > Early patches clean up several places where this idempotent nature of
> > > > > > the functions is depended on, and so makes the code clearer.
> > > > > > 
> > > > > > - they transparently call fh_fill_pre/post_attrs(), including at times
> > > > > > when this is not necessary. Having the calls only when necessary is
> > > > > > marginally more efficient, and arguably makes the code clearer.
> > > > > > 
> > > > > > nfsd_lookup() currently always locks the directory, though often no lock
> > > > > > is needed. So a patch in this series reduces this locking.
> > > > > > 
> > > > > > There is an awkward case that could still be further improved.
> > > > > > NFSv4 open needs to ensure the file is not renamed (or unlinked etc)
> > > > > > between the moment when the open succeeds, and a later moment when a
> > > > > > "lease" is attached to support a delegation. The handling of this lock
> > > > > > is currently untidy, particularly when creating a file.
> > > > > > It would probably be better to take a lease immediately after
> > > > > > > opening the file, and then discard it after deciding not to provide a
> > > > > > delegation.
> > > > > > 
> > > > > > I have run fstests and cthon tests on this, but I wouldn't be surprised
> > > > > > if there is a corner case that I've missed.
> > > > > 
> > > > > Hi Neil, thanks for (re)posting.
> > > > > 
> > > > > Let me make a few general comments here before I send out specific
> > > > > review nits.
> > > > > 
> > > > > I'm concerned mostly with how this series can be adequately tested.
> > > > > The two particular areas I'm worried about:
> > > > > 
> > > > > - There are some changes to NFSv2 code, which is effectively
> > > > > fallow. I think I can run v2 tests, once we decide what tests
> > > > > should be run.
> > > > 
> > > > I hope we can still test v2... I know it is disabled by default..
> > > > If we can't test it, we should consider removing it.
> > > 
> > > The work of deprecating and removing NFSv2 is already under way.
> > > I think what I'm asking is if /you/ have tested the NFSv2 changes.
> > 
> > That's a question I can answer. I haven't. I will.
> 
> Just in case it's not clear by now, NFSv2 (and NFSv3)
> stability is the reason I don't want to perturb the
> NFSD VFS API code in any significant way unless it's
> absolutely needed.
> 
> 
> > > > > Secondarily, the series adds more bells and whistles to the generic
> > > > > NFSD VFS APIs on behalf of NFSv4-specific requirements. In particular:
> > > > > 
> > > > > - ("NFSD: change nfsd_create() to unlock directory before returning.")
> > > > > makes some big changes to nfsd_create(). But that helper itself
> > > > > is pretty small. Seems like cleaner code would result if NFSv4
> > > > > had its own version of nfsd_create() to deal with the post-change
> > > > > cases.
> > > > 
> > > > I would not like that approach. Duplicating code is rarely a good idea.
> > > 
> > > De-duplicating code /can/ be a good idea, but isn't always a good
> > > idea. If the exceptional cases add a lot of logic, that can make the
> > > de-duplicated code difficult to read and reason about, and it can
> > > make it brittle, just as it does in this case. Modifications on
> > > > behalf of NFSv4 in this common piece of code are possibly hazardous
> > > to NFSv3, and navigating around the exception logic makes it
> > > difficult to understand and review.
> > 
> > Are we looking at the same code?
> > The sum total of extra code needed for v4 is:
> > - two extra parameters:
> > 	 void (*post_create)(struct svc_fh *fh, void *data),
> > 	 void *data)
> > - two lines of code:
> > 	if (!err && post_create)
> > 		post_create(resfhp, data);
> > 
> > does that really make anything hard to follow?
> 
> The synopsis of nfsd_create() is already pretty cluttered.
> 
> You're adding that extra code in nfsd_symlink() too IIRC? And,
> this change adds a virtual function call (which has significant
> overhead these days) for reasons of convenience rather than
> necessity.
> 
> 
> > > IMO code duplication is not an appropriate design pattern in this
> > > specific instance.
> > 
> > I'm guessing there is a missing negation in there.
> 
> Yes, thanks.
> 
> 
> > > > Maybe, rather than passing a function and void* to nfsd_create(), we
> > > > > could pass an acl and a label and do the setting in vfs.c rather than
> > > > nfs4proc.c. The difficult part of that approach is getting back the
> > > > individual error statuses. That should be solvable though.
> > > 
> > > The bulk of the work in nfsd_create() is done by lookup_one_len()
> > > and nfsd_create_locked(), both of which are public APIs. The rest
> > > of nfsd_create() is code that appears in several other places
> > > already.
> > 
> > "several" == 1. The only other call site for nfsd_create_locked() is in
> > nfsd_proc_create()
> 
> But there are 15 call sites under fs/nfsd/ for lookup_one_len().
> It's a pretty common trope.
> 
> 
> > I would say that the "bulk" of the work is the interplay between
> > locking, error checking, and these two functions you mention.
> 
> You're looking at the details of your particular change, and
> I'm concerned about how much technical debt is going to
> continue to accrue in an area shared with legacy protocol
> implementation.
> 
> I'm still asking myself if I can live with the extra parameters
> and virtual function call. Maybe. IMO, there are three ways
> forward:
> 
> 1. I can merge your patch and move on.
> 2. I can merge your patch as it is and follow up with clean-up.
> 3. I can drop your patch and write it the way I prefer.
> 
> 
> > > > > - ("NFSD: reduce locking in nfsd_lookup()") has a similar issue:
> > > > > nfsd_lookup() is being changed such that its semantics are
> > > > > substantially different for NFSv4 than for others. This is
> > > > > possibly an indication that nfsd_lookup() should also be
> > > > > duplicated into the NFSv4-specific code and the generic VFS
> > > > > version should be left alone.
> > > > 
> > > > Again, I don't like duplication. In this case, I think the longer term
> > > > solution is to remove the NFSv4 specific locking differences and solve
> > > > the problem differently. i.e. don't hold the inode locked, but check
> > > > for any possible rename after getting a lease. Once that is done,
> > > > nfsd_lookup() can have saner semantics.
> > > 
> > > Then, perhaps we should bite that bullet and do that work now.
> > 
> > While this does have an appeal, it also looks like the start of a rabbit
> > > hole where I find I have to fix up a growing collection of problems before
> > my patch can land.
> 
> That's kind of the nature of the beast.
> 
> You are requesting changes that add technical debt with the
> promise that "sometime in the future" that debt will be
> erased. Meanwhile, nfsd_lookup() will be made harder to
> maintain, and it will continue to contain a real bug.
> 
> I'm saying if there's a real bug here, addressing that should
> be the priority.
> 
> 
> > I think balance is needed - certainly asking for some preliminary
> > cleanup is appropriate. Expecting long standing and subtle issues that
> > are tangential to the main goal to be resolved first is possibly asking
> > too much.
> 
> Fixing the bug seems to me (perhaps naively) to be directly
> related to the parallelism of directory operations. Why do
> you think it is tangential?
> 
> Asking for bugs to be addressed before new features go in
> seems sensible to me, and is a common practice to enable the
> resulting fixes to be backported. If you don't want to address
> the bug you mentioned, then someone else will need to, and
> that's fine. I think the priority should be bug fixes first.
> 
> Just to be clear, I'm referring to this issue:
> 
> > > NOTE: when nfsd4_open() creates a file, the directory does *NOT* remain
> > > locked and never has. So it is possible (though unlikely) for the
> > > newly created file to be renamed before a delegation is handed out,
> > > and that would be bad. This patch does *NOT* fix that, but *DOES*
> > > take the directory lock immediately after creating the file, which
> > > > reduces the size of the window and ensures that the lock is held
> > > consistently. More work is needed to guarantee no rename happens
> > > before the delegation.
> > 
> > Interesting. Maybe after taking the lock, we could re-vet the dentry vs.
> > the info in the OPEN request? That way, we'd presumably know that the
> > above race didn't occur.
> 
> 

I usually like to see bugfixes first too, but I haven't heard of anyone
ever hitting this problem in the field. We've lived with this bug for a
long time, so I don't see a problem with cleaning up the locking first
and then fixing this on top of that.

That said, if we're concerned about it, we could just add an extra check
to nfs4_set_delegation. Basically, take parent mutex, set the
delegation, walk the inode->i_dentry list and vet that one of them
matches the OPEN args. That change probably wouldn't be too invasive and
should be backportable, but it means taking that mutex twice.
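
The vetting step in that first idea can be sketched in userspace as
follows. This is an illustration only, not kernel code: demo_dentry and
demo_inode are hypothetical stand-ins, the alias chain models
inode->i_dentry, and the caller is assumed to hold the parent's lock so
the result cannot change underneath it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins; not the kernel's dentry or inode. */
struct demo_dentry {
	const struct demo_dentry *parent;
	const char *name;
	struct demo_dentry *next_alias;	/* next name for the same inode */
};

struct demo_inode {
	struct demo_dentry *aliases;	/* models inode->i_dentry */
};

/*
 * Return 1 if some alias of the inode still has the parent and name that
 * the OPEN request named, else 0. A rename that raced with the open would
 * leave no matching alias, so the delegation should then be withheld.
 */
static int demo_open_name_still_valid(const struct demo_inode *inode,
				      const struct demo_dentry *parent,
				      const char *name)
{
	const struct demo_dentry *d;

	assert(inode != NULL && name != NULL);

	for (d = inode->aliases; d; d = d->next_alias)
		if (d->parent == parent && strcmp(d->name, name) == 0)
			return 1;
	return 0;
}
```

Walking every alias rather than checking a single dentry matters for
hard links: the inode may legitimately have other names, and only one of
them needs to match the OPEN arguments.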

The alternate idea would be to try to set the delegation a lot earlier,
while we still hold the parent mutex for the open. That makes the
cleanup trickier, but is potentially more efficient. I think it'd be 
simpler to implement this on top of Neil's patchset since it simplifies
the locking. Backporting that by itself is probably going to be painful
though.

What should we aim for here?
--
Jeff Layton <jlayton@kernel.org>


* Re: [PATCH 0/8] NFSD: clean up locking.
  2022-07-13 19:13           ` Jeff Layton
@ 2022-07-14 14:32             ` Chuck Lever III
  0 siblings, 0 replies; 40+ messages in thread
From: Chuck Lever III @ 2022-07-14 14:32 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Neil Brown, Linux NFS Mailing List


> On Jul 13, 2022, at 3:13 PM, Jeff Layton <jlayton@kernel.org> wrote:
> 
> On Wed, 2022-07-13 at 14:15 +0000, Chuck Lever III wrote:
>> 
>> 
>> Just to be clear, I'm referring to this issue:
>> 
>>>> NOTE: when nfsd4_open() creates a file, the directory does *NOT* remain
>>>> locked and never has. So it is possible (though unlikely) for the
>>>> newly created file to be renamed before a delegation is handed out,
>>>> and that would be bad. This patch does *NOT* fix that, but *DOES*
>>>> take the directory lock immediately after creating the file, which
>>>> reduces the size of the window and ensures that the lock is held
>>>> consistently. More work is needed to guarantee no rename happens
>>>> before the delegation.
>>> 
>>> Interesting. Maybe after taking the lock, we could re-vet the dentry vs.
>>> the info in the OPEN request? That way, we'd presumably know that the
>>> above race didn't occur.
> 
> I usually like to see bugfixes first too, but I haven't heard of anyone
> ever hitting this problem in the field. We've lived with this bug for a
> long time, so I don't see a problem with cleaning up the locking first
> and then fixing this on top of that.
> 
> That said, if we're concerned about it, we could just add an extra check
> to nfs4_set_delegation. Basically, take parent mutex, set the
> delegation, walk the inode->i_dentry list and vet that one of them
> matches the OPEN args. That change probably wouldn't be too invasive and
> should be backportable, but it means taking that mutex twice.
> 
> The alternate idea would be to try to set the delegation a lot earlier,
> while we still hold the parent mutex for the open. That makes the
> cleanup trickier, but is potentially more efficient. I think it'd be 
> simpler to implement this on top of Neil's patchset since it simplifies
> the locking. Backporting that by itself is probably going to be painful
> though.
> 
> What should we aim for here?

If there is consensus that a fix for a possible delegation/rename race is
not necessary in stable kernels, then there is a little more maneuverability --
we can shoot for what is ideal moving forward.

Again, my main concerns are not perturbing the legacy code if we don't have
to, and having the NFSv4 code spell out its locking requirements clearly.
After that, performance is important, so avoiding taking locks and mutexes
(mutices?) unnecessarily would be best.


--
Chuck Lever





* Re: [PATCH 0/8] NFSD: clean up locking.
  2022-07-13 14:15         ` Chuck Lever III
  2022-07-13 19:13           ` Jeff Layton
@ 2022-07-15  2:36           ` NeilBrown
  2022-07-15 13:01             ` Jeff Layton
  2022-07-15 14:06             ` Chuck Lever III
  1 sibling, 2 replies; 40+ messages in thread
From: NeilBrown @ 2022-07-15  2:36 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Jeff Layton, Linux NFS Mailing List

On Thu, 14 Jul 2022, Chuck Lever III wrote:
> 
> > On Jul 13, 2022, at 12:32 AM, NeilBrown <neilb@suse.de> wrote:
> > 
> > On Wed, 13 Jul 2022, Chuck Lever III wrote:
> >> 
> >>> On Jul 11, 2022, at 10:33 PM, NeilBrown <neilb@suse.de> wrote:
> >>> 
> >>> On Thu, 07 Jul 2022, Chuck Lever III wrote:
> >>>> 
> >>>>> On Jul 6, 2022, at 12:18 AM, NeilBrown <neilb@suse.de> wrote:
> >>>>> 
> >>>>> This series prepares NFSD to be able to adjust to work with a proposed
> >>>>> patch which allows updates to directories to happen in parallel.
> >>>>> This patch set changes the way directories are locked, so the present
> >>>>> series cleans up some locking in nfsd.
> >>>>> 
> >>>>> Specifically we remove fh_lock() and fh_unlock().
> >>>>> These functions are problematic for a few reasons.
> >>>>> - they are deliberately idempotent - setting or clearing a flag
> >>>>> so that a second call does nothing. This makes locking errors harder,
> >>>>> but it results in code that looks wrong ... and maybe sometimes is a
> >>>>> little bit wrong.
> >>>>> Early patches clean up several places where this idempotent nature of
> >>>>> the functions is depended on, and so makes the code clearer.
> >>>>> 
> >>>>> - they transparently call fh_fill_pre/post_attrs(), including at times
> >>>>> when this is not necessary. Having the calls only when necessary is
> >>>>> marginally more efficient, and arguably makes the code clearer.
> >>>>> 
> >>>>> nfsd_lookup() currently always locks the directory, though often no lock
> >>>>> is needed. So a patch in this series reduces this locking.
> >>>>> 
> >>>>> There is an awkward case that could still be further improved.
> >>>>> NFSv4 open needs to ensure the file is not renamed (or unlinked etc)
> >>>>> between the moment when the open succeeds, and a later moment when a
> >>>>> "lease" is attached to support a delegation. The handling of this lock
> >>>>> is currently untidy, particularly when creating a file.
> >>>>> It would probably be better to take a lease immediately after
> >>>>> opening the file, and then discard it after deciding not to provide a
> >>>>> delegation.
> >>>>> 
> >>>>> I have run fstests and cthon tests on this, but I wouldn't be surprised
> >>>>> if there is a corner case that I've missed.
> >>>> 
> >>>> Hi Neil, thanks for (re)posting.
> >>>> 
> >>>> Let me make a few general comments here before I send out specific
> >>>> review nits.
> >>>> 
> >>>> I'm concerned mostly with how this series can be adequately tested.
> >>>> The two particular areas I'm worried about:
> >>>> 
> >>>> - There are some changes to NFSv2 code, which is effectively
> >>>> fallow. I think I can run v2 tests, once we decide what tests
> >>>> should be run.
> >>> 
> >>> I hope we can still test v2... I know it is disabled by default..
> >>> If we can't test it, we should consider removing it.
> >> 
> >> The work of deprecating and removing NFSv2 is already under way.
> >> I think what I'm asking is if /you/ have tested the NFSv2 changes.
> > 
> > That's a question I can answer. I haven't. I will.
> 
> Just in case it's not clear by now, NFSv2 (and NFSv3)
> stability is the reason I don't want to perturb the
> NFSD VFS API code in any significant way unless it's
> absolutely needed.

"absolutely" seems like a rather high bar.  It makes the goal of
"stability" seem more like "ossification".
Certainly we don't want to break things.  Equally certainly we will.
I don't think "hold back useful changes out of fear" is the path to
success.  Testing, review, and responding to bug reports is what we find
works best - and what we generally do.

And I wouldn't class NFSv3 at all with NFSv2 (not even in parentheses).
NFSv2 has very few (if any) users in the broad community, so bugs are
likely to go unnoticed for extended periods.  NFSv3 is, I believe, still
widely used, though not as widely as v4.  There are use-cases where I
would recommend v3.

e.g.  a case could certainly be made to not extend my directory-locking
changes to v2-specific code, but v3 should get them.

> 
> 
> >>>> Secondarily, the series adds more bells and whistles to the generic
> >>>> NFSD VFS APIs on behalf of NFSv4-specific requirements. In particular:
> >>>> 
> >>>> - ("NFSD: change nfsd_create() to unlock directory before returning.")
> >>>> makes some big changes to nfsd_create(). But that helper itself
> >>>> is pretty small. Seems like cleaner code would result if NFSv4
> >>>> had its own version of nfsd_create() to deal with the post-change
> >>>> cases.
> >>> 
> >>> I would not like that approach. Duplicating code is rarely a good idea.
> >> 
> >> De-duplicating code /can/ be a good idea, but isn't always a good
> >> idea. If the exceptional cases add a lot of logic, that can make the
> >> de-duplicated code difficult to read and reason about, and it can
> >> make it brittle, just as it does in this case. Modifications on
> >> behalf of NFSv4 in this common piece of code are possibly hazardous
> >> to NFSv3, and navigating around the exception logic makes it
> >> difficult to understand and review.
> > 
> > Are we looking at the same code?
> > The sum total of extra code needed for v4 is:
> > - two extra parameters:
> > 	 void (*post_create)(struct svc_fh *fh, void *data),
> > 	 void *data)
> > - two lines of code:
> > 	if (!err && post_create)
> > 		post_create(resfhp, data);
> > 
> > does that really make anything hard to follow?
> 
> The synopsis of nfsd_create() is already pretty cluttered.

Would it help to collect related details (name, type, attributes, acl,
label) into a struct and pass a reference to that around?
One awkwardness in my latest patch is that the acl and label are not set
in nfsd_create_setattr().  If they were passed around together with the
attributes, that would be a lot easier to achieve.

> 
> You're adding that extra code in nfsd_symlink() too IIRC? And,
> this change adds a virtual function call (which has significant
> overhead these days) for reasons of convenience rather than
> necessity.

"significant"?  In the context of creating a file, I don't think one more
virtual function call is all that significant.

I started writing code to avoid the function pointer, but the function
args list exploded.  If you could be happy with e.g.  a struct
nfs_create_args which contains attrs, acl, label, and was passed into
nfsd_create_setattr(), then I think that would cleanly avoid the desire
for a function pointer.
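
As a rough sketch of what such an argument struct might look like (an
illustration, not a proposed patch): the demo_* names, the field types,
and the two per-item status fields are assumptions made here, the latter
as one possible answer to the earlier question of how to get individual
error statuses back. The acl_result and label_result parameters only
simulate the outcome of the real setters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout; the real field types would be kernel structures. */
struct demo_create_args {
	const char *name;
	int type;
	const void *attrs;
	const void *acl;	/* NULL when the request carries no ACL */
	const void *label;	/* NULL when the request carries no label */
	int acl_status;		/* filled in by the callee */
	int label_status;	/* filled in by the callee */
};

/*
 * Apply the optional items and record each outcome separately, so the
 * caller can report per-item errors without a callback.
 */
static int demo_create_setattr(struct demo_create_args *args,
			       int acl_result, int label_result)
{
	assert(args != NULL);

	args->acl_status = args->acl ? acl_result : 0;
	args->label_status = args->label ? label_result : 0;
	return 0;	/* overall status is kept apart from the two above */
}
```

With everything travelling in one struct, a caller that has no ACL or
label leaves those pointers NULL, and the function signature stays the
same when another optional item is added later.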

> 
> 
> >> IMO code duplication is not an appropriate design pattern in this
> >> specific instance.
> > 
> > I'm guessing there is a missing negation in there.
> 
> Yes, thanks.
> 
> 
> >>> Maybe, rather than passing a function and void* to nfsd_create(), we
> >>> could pass an acl and a label and do the setting in vfs.c rather than
> >>> nfs4proc.c. The difficult part of that approach is getting back the
> >>> individual error statuses. That should be solvable though.
> >> 
> >> The bulk of the work in nfsd_create() is done by lookup_one_len()
> >> and nfsd_create_locked(), both of which are public APIs. The rest
> >> of nfsd_create() is code that appears in several other places
> >> already.
> > 
> > "several" == 1. The only other call site for nfsd_create_locked() is in
> > nfsd_proc_create()
> 
> But there are 15 call sites under fs/nfsd/ for lookup_one_len().
> It's a pretty common trope.
> 
> 
> > I would say that the "bulk" of the work is the interplay between
> > locking, error checking, and these two functions you mention.
> 
> You're looking at the details of your particular change, and
> I'm concerned about how much technical debt is going to
> continue to accrue in an area shared with legacy protocol
> implementation.
> 
> I'm still asking myself if I can live with the extra parameters
> and virtual function call. Maybe. IMO, there are three ways
> forward:
> 
> 1. I can merge your patch and move on.
> 2. I can merge your patch as it is and follow up with clean-up.
> 3. I can drop your patch and write it the way I prefer.
> 
> 
> >>>> - ("NFSD: reduce locking in nfsd_lookup()") has a similar issue:
> >>>> nfsd_lookup() is being changed such that its semantics are
> >>>> substantially different for NFSv4 than for others. This is
> >>>> possibly an indication that nfsd_lookup() should also be
> >>>> duplicated into the NFSv4-specific code and the generic VFS
> >>>> version should be left alone.
> >>> 
> >>> Again, I don't like duplication. In this case, I think the longer term
> >>> solution is to remove the NFSv4 specific locking differences and solve
> >>> the problem differently. i.e. don't hold the inode locked, but check
> >>> for any possible rename after getting a lease. Once that is done,
> >>> nfsd_lookup() can have saner semantics.
> >> 
> >> Then, perhaps we should bite that bullet and do that work now.
> > 
> > While this does have an appeal, it also looks like the start of a rabbit
> > hole where I find have to fix up a growing collection of problems before
> > my patch can land.
> 
> That's kind of the nature of the beast.
> 
> You are requesting changes that add technical debt with the
> promise that "sometime in the future" that debt will be
> erased. Meanwhile, nfsd_lookup() will be made harder to
> maintain, and it will continue to contain a real bug.
> 
> I'm saying if there's a real bug here, addressing that should
> be the priority.
> 
> 
> > I think balance is needed - certainly asking for some preliminary
> > cleanup is appropriate. Expecting long standing and subtle issues that
> > are tangential to the main goal to be resolved first is possibly asking
> > too much.
> 
> Fixing the bug seems to me (perhaps naively) to be directly
> related to the parallelism of directory operations. Why do
> you think it is tangential?
> 

The bug was exposed by the analysis required for the changes I want, but
I think that is as close as the connection goes.
To really see if it is tangential, we would need to come up with a
solution and see how easily it is ported across my patches.

As I said in a reply to Jeff, I don't think locking of the parent
directory should be part of the solution.  After getting a lease, we
repeat the lookup and see if the dentry has changed.  After my patches,
that would mean calling lookup_one_len_unlocked().  Before my patches, it
would mean calling fh_unlock() earlier to ensure the directory is no
longer locked, but still calling lookup_one_len_unlocked().
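
The shape of that check, as a user-space toy rather than kernel code:
struct dir, dir_lookup() and name_still_valid() are invented for
illustration, with dir_lookup() standing in for an unlocked lookup and
an integer standing in for the dentry.

```c
#include <stddef.h>
#include <string.h>

/* Toy directory: a few name -> inode-number entries. */
struct entry { const char *name; int ino; };
struct dir   { struct entry ents[4]; size_t n; };

/* Stand-in for an unlocked lookup: returns the inode number, or 0 if absent. */
int dir_lookup(const struct dir *d, const char *name)
{
	for (size_t i = 0; i < d->n; i++)
		if (strcmp(d->ents[i].name, name) == 0)
			return d->ents[i].ino;
	return 0;
}

/*
 * Called after the lease has been taken on the opened file: repeat the
 * lookup and report whether the name still resolves to the same object.
 * Returns 1 if nothing changed, 0 if a rename or unlink got in first.
 */
int name_still_valid(const struct dir *d, const char *name, int opened_ino)
{
	return dir_lookup(d, name) == opened_ino;
}
```

If the second lookup disagrees, the delegation would simply not be
handed out; no directory lock needs to be held across the window.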


> Asking for bugs to be addressed before new features go in
> seems sensible to me, and is a common practice to enable the
> resulting fixes to be backported. If you don't want to address
> the bug you mentioned, then someone else will need to, and
> that's fine. I think the priority should be bug fixes first.

As a general principle I would agree.
In this case the bug is minor, long standing, and tangential.
And the series imposes enough review burden as it is.

But if Jeff can produce a fix, either to be applied before or after,
then that would be an excellent solution.

Thanks,
NeilBrown

> 
> Just to be clear, I'm referring to this issue:
> 
> >> NOTE: when nfsd4_open() creates a file, the directory does *NOT* remain
> >> locked and never has. So it is possible (though unlikely) for the
> >> newly created file to be renamed before a delegation is handed out,
> >> and that would be bad. This patch does *NOT* fix that, but *DOES*
> >> take the directory lock immediately after creating the file, which
> >> reduces the size of the window and ensure that the lock is held
> >> consistently. More work is needed to guarantee no rename happens
> >> before the delegation.
> > 
> > Interesting. Maybe after taking the lock, we could re-vet the dentry vs.
> > the info in the OPEN request? That way, we'd presumably know that the
> > above race didn't occur.
> 
> 
> 
> --
> Chuck Lever
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/8] NFSD: clean up locking.
  2022-07-15  2:36           ` NeilBrown
@ 2022-07-15 13:01             ` Jeff Layton
  2022-07-15 13:53               ` Jeff Layton
  2022-07-15 14:06             ` Chuck Lever III
  1 sibling, 1 reply; 40+ messages in thread
From: Jeff Layton @ 2022-07-15 13:01 UTC (permalink / raw)
  To: NeilBrown, Chuck Lever III; +Cc: Linux NFS Mailing List

On Fri, 2022-07-15 at 12:36 +1000, NeilBrown wrote:
> On Thu, 14 Jul 2022, Chuck Lever III wrote:
> > 
> > > On Jul 13, 2022, at 12:32 AM, NeilBrown <neilb@suse.de> wrote:
> > > 
> > > On Wed, 13 Jul 2022, Chuck Lever III wrote:
> > > > 
> > > > > On Jul 11, 2022, at 10:33 PM, NeilBrown <neilb@suse.de> wrote:
> > > > > 
> > > > > On Thu, 07 Jul 2022, Chuck Lever III wrote:
> > > > > > 
> > > > > > > On Jul 6, 2022, at 12:18 AM, NeilBrown <neilb@suse.de> wrote:
> > > > > > > 
> > > > > > > This series prepares NFSD to be able to adjust to work with a proposed
> > > > > > > patch which allows updates to directories to happen in parallel.
> > > > > > > This patch set changes the way directories are locked, so the present
> > > > > > > series cleans up some locking in nfsd.
> > > > > > > 
> > > > > > > Specifically we remove fh_lock() and fh_unlock().
> > > > > > > These functions are problematic for a few reasons.
> > > > > > > - they are deliberately idempotent - setting or clearing a flag
> > > > > > > so that a second call does nothing. This makes locking errors harder,
> > > > > > > but it results in code that looks wrong ... and maybe sometimes is a
> > > > > > > little bit wrong.
> > > > > > > Early patches clean up several places where this idempotent nature of
> > > > > > > the functions is depended on, and so makes the code clearer.
> > > > > > > 
> > > > > > > - they transparently call fh_fill_pre/post_attrs(), including at times
> > > > > > > when this is not necessary. Having the calls only when necessary is
> > > > > > > marginally more efficient, and arguably makes the code clearer.
> > > > > > > 
> > > > > > > nfsd_lookup() currently always locks the directory, though often no lock
> > > > > > > is needed. So a patch in this series reduces this locking.
> > > > > > > 
> > > > > > > There is an awkward case that could still be further improved.
> > > > > > > NFSv4 open needs to ensure the file is not renamed (or unlinked etc)
> > > > > > > between the moment when the open succeeds, and a later moment when a
> > > > > > > "lease" is attached to support a delegation. The handling of this lock
> > > > > > > is currently untidy, particularly when creating a file.
> > > > > > > It would probably be better to take a lease immediately after
> > > > > > > opening the file, and then discarding if after deciding not to provide a
> > > > > > > delegation.
> > > > > > > 
> > > > > > > I have run fstests and cthon tests on this, but I wouldn't be surprised
> > > > > > > if there is a corner case that I've missed.
> > > > > > 
> > > > > > Hi Neil, thanks for (re)posting.
> > > > > > 
> > > > > > Let me make a few general comments here before I send out specific
> > > > > > review nits.
> > > > > > 
> > > > > > I'm concerned mostly with how this series can be adequately tested.
> > > > > > The two particular areas I'm worried about:
> > > > > > 
> > > > > > - There are some changes to NFSv2 code, which is effectively
> > > > > > fallow. I think I can run v2 tests, once we decide what tests
> > > > > > should be run.
> > > > > 
> > > > > I hope we can still test v2... I know it is disabled by default..
> > > > > If we can't test it, we should consider removing it.
> > > > 
> > > > The work of deprecating and removing NFSv2 is already under way.
> > > > I think what I'm asking is if /you/ have tested the NFSv2 changes.
> > > 
> > > That's a question I can answer. I haven't. I will.
> > 
> > Just in case it's not clear by now, NFSv2 (and NFSv3)
> > stability is the reason I don't want to perturb the
> > NFSD VFS API code in any significant way unless it's
> > absolutely needed.
> 
> "absolutely" seems like a rather high bar.  It makes the goal of
> "stability" seem more like "ossification".
> Certainly we don't want to break things.  Equally certainly we will.
> I don't think "hold back useful changes out of fear" is the path to
> success.  Testing, review, and responding to bug reports is what we find
> works best - and what we generally do.
> 
> And I wouldn't class NFSv3 at all with NFSv2 (not even in parentheses).
> NFSv2 has very few (if any) users in the broad community, so bugs are
> likely to go unnoticed for extended periods.  NFSv3 is, I believe, still
> widely used, though not as widely as v4.  There are use-cases where I
> would recommend v3.
> 
> e.g.  a case could certainly be made to not extend my directory-locking
> changes to v2-specific code, but v3 should get them.
> 
> > 
> > 
> > > > > > Secondarily, the series adds more bells and whistles to the generic
> > > > > > NFSD VFS APIs on behalf of NFSv4-specific requirements. In particular:
> > > > > > 
> > > > > > - ("NFSD: change nfsd_create() to unlock directory before returning.")
> > > > > > makes some big changes to nfsd_create(). But that helper itself
> > > > > > is pretty small. Seems like cleaner code would result if NFSv4
> > > > > > had its own version of nfsd_create() to deal with the post-change
> > > > > > cases.
> > > > > 
> > > > > I would not like that approach. Duplicating code is rarely a good idea.
> > > > 
> > > > De-duplicating code /can/ be a good idea, but isn't always a good
> > > > idea. If the exceptional cases add a lot of logic, that can make the
> > > > de-duplicated code difficult to read and reason about, and it can
> > > > make it brittle, just as it does in this case. Modifications on
> > > > behalf of NFSv4 in this common piece of code is possibly hazardous
> > > > to NFSv3, and navigating around the exception logic makes it
> > > > difficult to understand and review.
> > > 
> > > Are we looking at the same code?
> > > The sum total of extra code needed for v4 is:
> > > - two extra parameters:
> > > 	 void (*post_create)(struct svc_fh *fh, void *data),
> > > 	 void *data)
> > > - two lines of code:
> > > 	if (!err && post_create)
> > > 		post_create(resfhp, data);
> > > 
> > > does that really make anything hard to follow?
> > 
> > The synopsis of nfsd_create() is already pretty cluttered.
> 
> Would it help to collect related details (name, type, attributes, acl,
> label) into a struct and pass a reference to that around?
> One awkwardness in my latest patch is that the acl and label are not set
> in nfsd_create_setattr().  If they were passed around together with the
> attributes, that would be a lot easier to achieve.
> 
> > 
> > You're adding that extra code in nfsd_symlink() too IIRC? And,
> > this change adds a virtual function call (which has significant
> > overhead these days) for reasons of convenience rather than
> > necessity.
> 
> "significant"?  In context of creation a file, I don't think one more
> virtual function call is all that significant?
> 
> I started writing code to avoid the function pointer, but the function
> args list exploded.  If you could be happy with e.g.  a struct
> nfs_create_args which contains attrs, acl, label, and was passed into
> nfsd_create_setattr(), then I think that would cleanly avoid the desire
> for a function pointer.
> 
> > 
> > 
> > > > IMO code duplication is not an appropriate design pattern in this
> > > > specific instance.
> > > 
> > > I'm guessing there is a missing negation in there.
> > 
> > Yes, thanks.
> > 
> > 
> > > > > Maybe, rather than passing a function and void* to nfsd_create(), we
> > > > > could pass an acl and a label and do the setting in vfs.c rather then
> > > > > nfs4proc.c. The difficult part of that approach is getting back the
> > > > > individual error statuses. That should be solvable though.
> > > > 
> > > > The bulk of the work in nfsd_create() is done by lookup_one_len()
> > > > and nfsd_create_locked(), both of which are public APIs. The rest
> > > > of nfsd_create() is code that appears in several other places
> > > > already.
> > > 
> > > "several" == 1. The only other call site for nfsd_create_locked() is in
> > > nfsd_proc_create()
> > 
> > But there are 15 call sites under fs/nfsd/ for lookup_one_len().
> > It's a pretty common trope.
> > 
> > 
> > > I would say that the "bulk" of the work is the interplay between
> > > locking, error checking, and these two functions you mention.
> > 
> > You're looking at the details of your particular change, and
> > I'm concerned about how much technical debt is going to
> > continue to accrue in an area shared with legacy protocol
> > implementation.
> > 
> > I'm still asking myself if I can live with the extra parameters
> > and virtual function call. Maybe. IMO, there are three ways
> > forward:
> > 
> > 1. I can merge your patch and move on.
> > 2. I can merge your patch as it is and follow up with clean-up.
> > 3. I can drop your patch and write it the way I prefer.
> > 
> > 
> > > > > > - ("NFSD: reduce locking in nfsd_lookup()") has a similar issue:
> > > > > > nfsd_lookup() is being changed such that its semantics are
> > > > > > substantially different for NFSv4 than for others. This is
> > > > > > possibly an indication that nfsd_lookup() should also be
> > > > > > duplicated into the NFSv4-specific code and the generic VFS
> > > > > > version should be left alone.
> > > > > 
> > > > > Again, I don't like duplication. In this case, I think the longer term
> > > > > solution is to remove the NFSv4 specific locking differences and solve
> > > > > the problem differently. i.e. don't hold the inode locked, but check
> > > > > for any possible rename after getting a lease. Once that is done,
> > > > > nfsd_lookup() can have saner semantics.
> > > > 
> > > > Then, perhaps we should bite that bullet and do that work now.
> > > 
> > > While this does have an appeal, it also looks like the start of a rabbit
> > > hole where I find have to fix up a growing collection of problems before
> > > my patch can land.
> > 
> > That's kind of the nature of the beast.
> > 
> > You are requesting changes that add technical debt with the
> > promise that "sometime in the future" that debt will be
> > erased. Meanwhile, nfsd_lookup() will be made harder to
> > maintain, and it will continue to contain a real bug.
> > 
> > I'm saying if there's a real bug here, addressing that should
> > be the priority.
> > 
> > 
> > > I think balance is needed - certainly asking for some preliminary
> > > cleanup is appropriate. Expecting long standing and subtle issues that
> > > are tangential to the main goal to be resolved first is possibly asking
> > > too much.
> > 
> > Fixing the bug seems to me (perhaps naively) to be directly
> > related to the parallelism of directory operations. Why do
> > you think it is tangential?
> > 
> 
> The bug was exposed by the analysis required for the changes I want, but
> I think that is as close as the connection goes.
> To really see if it is tangential, we would need to come up with a
> solution and see how easily it is ported across my patches.
> 
> As I said in a reply to Jeff, I don't think locking of the parent
> directory should be part of the solution.  After getting a lease, we
> repeat the lookup and see if dentry has changed.  After my patches, that
> would mean calling lookup_one_len_unlocked().  Before my patches, it
> would mean calling fh_unlock() earlier to ensure the directory is no
> longer locked but still calling lookup_one_len_unlocked()
> 
> 
> > Asking for bugs to be addressed before new features go in
> > seems sensible to me, and is a common practice to enable the
> > resulting fixes to be backported. If you don't want to address
> > the bug you mentioned, then someone else will need to, and
> > that's fine. I think the priority should be bug fixes first.
> 
> As a general principle I would agree.
> In this case the bug is minor, long standing, and tangential.
> And the series imposes enough review burden as it is.
> 
> But if Jeff can produce a fix, either to be applied before or after,
> then that would be an excellent solution.
> 
> Thanks,
> NeilBrown
> 
> 


FWIW, I hit this while running fstests against the server with some
draft changes. This crash is not in an area I touched, so I suspect
something in Neil's locking cleanup:

[  434.257242] ------------[ cut here ]------------
[  434.257278] kernel BUG at fs/nfsd/xdr4.h:752!
[  434.257755] invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
[  434.257937] CPU: 2 PID: 1202 Comm: nfsd Kdump: loaded Tainted: G           OE     5.19.0-rc5+ #316
[  434.258179] Hardware name: Red Hat KVM/RHEL, BIOS 1.16.0-3.el9 04/01/2014
[  434.258371] RIP: 0010:nfsd4_open+0x1940/0x1a30 [nfsd]
[  434.258615] Code: 40 e8 54 04 e2 e6 48 8b 7c 24 40 41 89 c4 e8 57 7e d4 e6 41 f7 d4 4c 8b 44 24 60 66 44 21 63 44 48 8b 54 24 40 e9 46 fc ff ff <0f> 0b 48 8d bd 88 00 00 00 e8 72 80 d4 e6 4c 8b ad 88 00 00 00 49
[  434.259053] RSP: 0018:ffff888134897c10 EFLAGS: 00010246
[  434.259211] RAX: 0000000000000000 RBX: ffff8881348791a0 RCX: ffffffffc07fbb44
[  434.259408] RDX: 1ffff1102691001a RSI: 0000000000000002 RDI: ffff8881348800d1
[  434.259606] RBP: ffff888134880030 R08: 0000000000000000 R09: 0000000000000001
[  434.259802] R10: fffffbfff542dfac R11: 0000000000000001 R12: 0000000000000000
[  434.260000] R13: ffff888133ab8000 R14: 0000000000000000 R15: ffff8881165db400
[  434.260195] FS:  0000000000000000(0000) GS:ffff888225500000(0000) knlGS:0000000000000000
[  434.260466] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  434.260697] CR2: 0000558890b5d178 CR3: 0000000107844000 CR4: 00000000003506e0
[  434.260949] Call Trace:
[  434.261106]  <TASK>
[  434.261260]  ? nfsd4_decode_notsupp+0x10/0x10 [nfsd]
[  434.261549]  ? nfsd4_interssc_connect+0x8e0/0x8e0 [nfsd]
[  434.261858]  ? preempt_count_sub+0x14/0xc0
[  434.262054]  ? percpu_counter_add_batch+0x60/0xd0
[  434.262261]  nfsd4_proc_compound+0x8c6/0xe90 [nfsd]
[  434.262553]  nfsd_dispatch+0x2a9/0x5c0 [nfsd]
[  434.262836]  svc_process_common+0x6ab/0xc30 [sunrpc]
[  434.263162]  ? svc_create_pooled+0x390/0x390 [sunrpc]
[  434.263484]  ? nfsd_svc+0x830/0x830 [nfsd]
[  434.263755]  ? svc_recv+0xabf/0x1310 [sunrpc]
[  434.264074]  svc_process+0x1a3/0x200 [sunrpc]
[  434.264382]  nfsd+0x1d7/0x390 [nfsd]
[  434.264640]  ? nfsd_shutdown_threads+0x200/0x200 [nfsd]
[  434.264926]  kthread+0x167/0x1a0
[  434.265101]  ? kthread_complete_and_exit+0x20/0x20
[  434.265307]  ret_from_fork+0x22/0x30
[  434.265494]  </TASK>
[  434.265652] Modules linked in: rpcsec_gss_krb5(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) ip_set(E) rfkill(E) nf_tables(E) nfnetlink(E) iTCO_wdt(E) intel_rapl_msr(E) intel_pmc_bxt(E) iTCO_vendor_support(E) joydev(E) intel_rapl_common(E) i2c_i801(E) i2c_smbus(E) virtio_balloon(E) lpc_ich(E) nfsd(OE) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) drm(E) fuse(E) sunrpc(E) xfs(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) serio_raw(E) virtio_blk(E) virtio_console(E) virtio_net(E) net_failover(E) failover(E) qemu_fw_cfg(E)
[  434.267660] ---[ end trace 0000000000000000 ]---
[  434.267870] RIP: 0010:nfsd4_open+0x1940/0x1a30 [nfsd]
[  434.268161] Code: 40 e8 54 04 e2 e6 48 8b 7c 24 40 41 89 c4 e8 57 7e d4 e6 41 f7 d4 4c 8b 44 24 60 66 44 21 63 44 48 8b 54 24 40 e9 46 fc ff ff <0f> 0b 48 8d bd 88 00 00 00 e8 72 80 d4 e6 4c 8b ad 88 00 00 00 49
[  434.268771] RSP: 0018:ffff888134897c10 EFLAGS: 00010246
[  434.268995] RAX: 0000000000000000 RBX: ffff8881348791a0 RCX: ffffffffc07fbb44
[  434.269247] RDX: 1ffff1102691001a RSI: 0000000000000002 RDI: ffff8881348800d1
[  434.269511] RBP: ffff888134880030 R08: 0000000000000000 R09: 0000000000000001
[  434.269775] R10: fffffbfff542dfac R11: 0000000000000001 R12: 0000000000000000
[  434.270030] R13: ffff888133ab8000 R14: 0000000000000000 R15: ffff8881165db400
[  434.270288] FS:  0000000000000000(0000) GS:ffff888225500000(0000) knlGS:0000000000000000
[  434.270632] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  434.270862] CR2: 0000558890b5d178 CR3: 0000000107844000 CR4: 00000000003506e0

faddr2line says:

$ ./scripts/faddr2line --list fs/nfsd/nfsd.ko nfsd4_open+0x1940
nfsd4_open+0x1940/0x1a30:

set_change_info at /home/jlayton/git/linux/fs/nfsd/xdr4.h:752
 747 	#define NFS4_SVC_XDRSIZE		sizeof(struct nfsd4_compoundargs)
 748 	
 749 	static inline void
 750 	set_change_info(struct nfsd4_change_info *cinfo, struct svc_fh *fhp)
 751 	{
>752<		BUG_ON(!fhp->fh_pre_saved);
 753 		cinfo->atomic = (u32)(fhp->fh_post_saved && !fhp->fh_no_atomic_attr);
 754 	
 755 		cinfo->before_change = fhp->fh_pre_change;
 756 		cinfo->after_change = fhp->fh_post_change;
 757 	}

(inlined by) do_open_lookup at /home/jlayton/git/linux/fs/nfsd/nfs4proc.c:502
 497 		accmode = NFSD_MAY_NOP;
 498 		if (open->op_created ||
 499 				open->op_claim_type == NFS4_OPEN_CLAIM_DELEGATE_CUR)
 500 			accmode |= NFSD_MAY_OWNER_OVERRIDE;
 501 		status = do_open_permission(rqstp, *resfh, open, accmode);
>502<		set_change_info(&open->op_cinfo, current_fh);
 503 	out:
 504 		if (status)
 505 			inode_unlock(current_fh->fh_dentry->d_inode);
 506 		return status;
 507 	}

(inlined by) nfsd4_open at /home/jlayton/git/linux/fs/nfsd/nfs4proc.c:625
 620 			goto out;
 621 	
 622 		switch (open->op_claim_type) {
 623 		case NFS4_OPEN_CLAIM_DELEGATE_CUR:
 624 		case NFS4_OPEN_CLAIM_NULL:
>625<			status = do_open_lookup(rqstp, cstate, open, &resfh);
 626 			if (status)
 627 				goto out;
 628 			locked = true;
 629 			break;
 630 		case NFS4_OPEN_CLAIM_PREVIOUS:

I haven't dug down into the actual cause yet.
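
For anyone reading along, the assertion that fired only checks that the
pre-operation attributes were recorded before the change info is
filled in.  A user-space model of that ordering requirement follows.
It is not a diagnosis of this crash: the struct and function names are
made up, the fh_no_atomic_attr term is left out, and it returns -1
where the kernel would BUG.

```c
#include <stdbool.h>

/* Simplified model of the pre/post attribute state kept in a file handle. */
struct fh_model {
	bool pre_saved, post_saved;
	unsigned long pre_change, post_change;
};

struct cinfo_model {
	bool atomic;
	unsigned long before, after;
};

/* Record the change attribute seen before the directory is modified. */
void fill_pre(struct fh_model *fh, unsigned long change)
{
	fh->pre_change = change;
	fh->pre_saved = true;
}

/* Record the change attribute seen after the directory is modified. */
void fill_post(struct fh_model *fh, unsigned long change)
{
	fh->post_change = change;
	fh->post_saved = true;
}

/* Returns -1 in the situation where the real helper hits its BUG_ON(). */
int set_change_info_checked(struct cinfo_model *ci, const struct fh_model *fh)
{
	if (!fh->pre_saved)
		return -1;
	ci->atomic = fh->post_saved;
	ci->before = fh->pre_change;
	ci->after = fh->post_change;
	return 0;
}
```

Any path that reaches the change-info step without first going through
the equivalent of fill_pre() trips the check.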

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/8] NFSD: clean up locking.
  2022-07-15 13:01             ` Jeff Layton
@ 2022-07-15 13:53               ` Jeff Layton
  0 siblings, 0 replies; 40+ messages in thread
From: Jeff Layton @ 2022-07-15 13:53 UTC (permalink / raw)
  To: NeilBrown, Chuck Lever III; +Cc: Linux NFS Mailing List

On Fri, 2022-07-15 at 09:01 -0400, Jeff Layton wrote:
> On Fri, 2022-07-15 at 12:36 +1000, NeilBrown wrote:
> > On Thu, 14 Jul 2022, Chuck Lever III wrote:
> > > 
> > > > On Jul 13, 2022, at 12:32 AM, NeilBrown <neilb@suse.de> wrote:
> > > > 
> > > > On Wed, 13 Jul 2022, Chuck Lever III wrote:
> > > > > 
> > > > > > On Jul 11, 2022, at 10:33 PM, NeilBrown <neilb@suse.de> wrote:
> > > > > > 
> > > > > > On Thu, 07 Jul 2022, Chuck Lever III wrote:
> > > > > > > 
> > > > > > > > On Jul 6, 2022, at 12:18 AM, NeilBrown <neilb@suse.de> wrote:
> > > > > > > > 
> > > > > > > > This series prepares NFSD to be able to adjust to work with a proposed
> > > > > > > > patch which allows updates to directories to happen in parallel.
> > > > > > > > This patch set changes the way directories are locked, so the present
> > > > > > > > series cleans up some locking in nfsd.
> > > > > > > > 
> > > > > > > > Specifically we remove fh_lock() and fh_unlock().
> > > > > > > > These functions are problematic for a few reasons.
> > > > > > > > - they are deliberately idempotent - setting or clearing a flag
> > > > > > > > so that a second call does nothing. This makes locking errors harder,
> > > > > > > > but it results in code that looks wrong ... and maybe sometimes is a
> > > > > > > > little bit wrong.
> > > > > > > > Early patches clean up several places where this idempotent nature of
> > > > > > > > the functions is depended on, and so makes the code clearer.
> > > > > > > > 
> > > > > > > > - they transparently call fh_fill_pre/post_attrs(), including at times
> > > > > > > > when this is not necessary. Having the calls only when necessary is
> > > > > > > > marginally more efficient, and arguably makes the code clearer.
> > > > > > > > 
> > > > > > > > nfsd_lookup() currently always locks the directory, though often no lock
> > > > > > > > is needed. So a patch in this series reduces this locking.
> > > > > > > > 
> > > > > > > > There is an awkward case that could still be further improved.
> > > > > > > > NFSv4 open needs to ensure the file is not renamed (or unlinked etc)
> > > > > > > > between the moment when the open succeeds, and a later moment when a
> > > > > > > > "lease" is attached to support a delegation. The handling of this lock
> > > > > > > > is currently untidy, particularly when creating a file.
> > > > > > > > It would probably be better to take a lease immediately after
> > > > > > > > opening the file, and then discarding if after deciding not to provide a
> > > > > > > > delegation.
> > > > > > > > 
> > > > > > > > I have run fstests and cthon tests on this, but I wouldn't be surprised
> > > > > > > > if there is a corner case that I've missed.
> > > > > > > 
> > > > > > > Hi Neil, thanks for (re)posting.
> > > > > > > 
> > > > > > > Let me make a few general comments here before I send out specific
> > > > > > > review nits.
> > > > > > > 
> > > > > > > I'm concerned mostly with how this series can be adequately tested.
> > > > > > > The two particular areas I'm worried about:
> > > > > > > 
> > > > > > > - There are some changes to NFSv2 code, which is effectively
> > > > > > > fallow. I think I can run v2 tests, once we decide what tests
> > > > > > > should be run.
> > > > > > 
> > > > > > I hope we can still test v2... I know it is disabled by default..
> > > > > > If we can't test it, we should consider removing it.
> > > > > 
> > > > > The work of deprecating and removing NFSv2 is already under way.
> > > > > I think what I'm asking is if /you/ have tested the NFSv2 changes.
> > > > 
> > > > That's a question I can answer. I haven't. I will.
> > > 
> > > Just in case it's not clear by now, NFSv2 (and NFSv3)
> > > stability is the reason I don't want to perturb the
> > > NFSD VFS API code in any significant way unless it's
> > > absolutely needed.
> > 
> > "absolutely" seems like a rather high bar.  It makes the goal of
> > "stability" seem more like "ossification".
> > Certainly we don't want to break things.  Equally certainly we will.
> > I don't think "hold back useful changes out of fear" is the path to
> > success.  Testing, review, and responding to bug reports is what we find
> > works best - and what we generally do.
> > 
> > And I wouldn't class NFSv3 at all with NFSv2 (not even in parentheses).
> > NFSv2 has very few (if any) users in the broad community, so bugs are
> > likely to go unnoticed for extended periods.  NFSv3 is, I believe, still
> > widely used, though not as widely as v4.  There are use-cases where I
> > would recommend v3.
> > 
> > e.g.  a case could certainly be made to not extend my directory-locking
> > changes to v2-specific code, but v3 should get them.
> > 
> > > 
> > > 
> > > > > > > Secondarily, the series adds more bells and whistles to the generic
> > > > > > > NFSD VFS APIs on behalf of NFSv4-specific requirements. In particular:
> > > > > > > 
> > > > > > > - ("NFSD: change nfsd_create() to unlock directory before returning.")
> > > > > > > makes some big changes to nfsd_create(). But that helper itself
> > > > > > > is pretty small. Seems like cleaner code would result if NFSv4
> > > > > > > had its own version of nfsd_create() to deal with the post-change
> > > > > > > cases.
> > > > > > 
> > > > > > I would not like that approach. Duplicating code is rarely a good idea.
> > > > > 
> > > > > De-duplicating code /can/ be a good idea, but isn't always a good
> > > > > idea. If the exceptional cases add a lot of logic, that can make the
> > > > > de-duplicated code difficult to read and reason about, and it can
> > > > > make it brittle, just as it does in this case. Modifications on
> > > > > behalf of NFSv4 in this common piece of code is possibly hazardous
> > > > > to NFSv3, and navigating around the exception logic makes it
> > > > > difficult to understand and review.
> > > > 
> > > > Are we looking at the same code?
> > > > The sum total of extra code needed for v4 is:
> > > > - two extra parameters:
> > > > 	 void (*post_create)(struct svc_fh *fh, void *data),
> > > > 	 void *data)
> > > > - two lines of code:
> > > > 	if (!err && post_create)
> > > > 		post_create(resfhp, data);
> > > > 
> > > > does that really make anything hard to follow?
> > > 
> > > The synopsis of nfsd_create() is already pretty cluttered.
> > 
> > Would it help to collect related details (name, type, attributes, acl,
> > label) into a struct and pass a reference to that around?
> > One awkwardness in my latest patch is that the acl and label are not set
> > in nfsd_create_setattr().  If they were passed around together with the
> > attributes, that would be a lot easier to achieve.
> > 
> > > 
> > > You're adding that extra code in nfsd_symlink() too IIRC? And,
> > > this change adds a virtual function call (which has significant
> > > overhead these days) for reasons of convenience rather than
> > > necessity.
> > 
> > "significant"?  In context of creation a file, I don't think one more
> > virtual function call is all that significant?
> > 
> > I started writing code to avoid the function pointer, but the function
> > args list exploded.  If you could be happy with e.g.  a struct
> > nfs_create_args which contains attrs, acl, label, and was passed into
> > nfsd_create_setattr(), then I think that would cleanly avoid the desire
> > for a function pointer.
> > 
> > > 
> > > 
> > > > > IMO code duplication is not an appropriate design pattern in this
> > > > > specific instance.
> > > > 
> > > > I'm guessing there is a missing negation in there.
> > > 
> > > Yes, thanks.
> > > 
> > > 
> > > > > > Maybe, rather than passing a function and void* to nfsd_create(), we
> > > > > > could pass an acl and a label and do the setting in vfs.c rather then
> > > > > > nfs4proc.c. The difficult part of that approach is getting back the
> > > > > > individual error statuses. That should be solvable though.
> > > > > 
> > > > > The bulk of the work in nfsd_create() is done by lookup_one_len()
> > > > > and nfsd_create_locked(), both of which are public APIs. The rest
> > > > > of nfsd_create() is code that appears in several other places
> > > > > already.
> > > > 
> > > > "several" == 1. The only other call site for nfsd_create_locked() is in
> > > > nfsd_proc_create()
> > > 
> > > But there are 15 call sites under fs/nfsd/ for lookup_one_len().
> > > It's a pretty common trope.
> > > 
> > > 
> > > > I would say that the "bulk" of the work is the interplay between
> > > > locking, error checking, and these two functions you mention.
> > > 
> > > You're looking at the details of your particular change, and
> > > I'm concerned about how much technical debt is going to
> > > continue to accrue in an area shared with legacy protocol
> > > implementation.
> > > 
> > > I'm still asking myself if I can live with the extra parameters
> > > and virtual function call. Maybe. IMO, there are three ways
> > > forward:
> > > 
> > > 1. I can merge your patch and move on.
> > > 2. I can merge your patch as it is and follow up with clean-up.
> > > 3. I can drop your patch and write it the way I prefer.
> > > 
> > > 
> > > > > > > - ("NFSD: reduce locking in nfsd_lookup()") has a similar issue:
> > > > > > > nfsd_lookup() is being changed such that its semantics are
> > > > > > > substantially different for NFSv4 than for others. This is
> > > > > > > possibly an indication that nfsd_lookup() should also be
> > > > > > > duplicated into the NFSv4-specific code and the generic VFS
> > > > > > > version should be left alone.
> > > > > > 
> > > > > > Again, I don't like duplication. In this case, I think the longer term
> > > > > > solution is to remove the NFSv4 specific locking differences and solve
> > > > > > the problem differently. i.e. don't hold the inode locked, but check
> > > > > > for any possible rename after getting a lease. Once that is done,
> > > > > > nfsd_lookup() can have saner semantics.
> > > > > 
> > > > > Then, perhaps we should bite that bullet and do that work now.
> > > > 
> > > > While this does have an appeal, it also looks like the start of a rabbit
> > > > hole where I find I have to fix up a growing collection of problems before
> > > > my patch can land.
> > > 
> > > That's kind of the nature of the beast.
> > > 
> > > You are requesting changes that add technical debt with the
> > > promise that "sometime in the future" that debt will be
> > > erased. Meanwhile, nfsd_lookup() will be made harder to
> > > maintain, and it will continue to contain a real bug.
> > > 
> > > I'm saying if there's a real bug here, addressing that should
> > > be the priority.
> > > 
> > > 
> > > > I think balance is needed - certainly asking for some preliminary
> > > > cleanup is appropriate. Expecting long standing and subtle issues that
> > > > are tangential to the main goal to be resolved first is possibly asking
> > > > too much.
> > > 
> > > Fixing the bug seems to me (perhaps naively) to be directly
> > > related to the parallelism of directory operations. Why do
> > > you think it is tangential?
> > > 
> > 
> > The bug was exposed by the analysis required for the changes I want, but
> > I think that is as close as the connection goes.
> > To really see if it is tangential, we would need to come up with a
> > solution and see how easily it is ported across my patches.
> > 
> > As I said in a reply to Jeff, I don't think locking of the parent
> > directory should be part of the solution.  After getting a lease, we
> > repeat the lookup and see if dentry has changed.  After my patches, that
> > would mean calling lookup_one_len_unlocked().  Before my patches, it
> > would mean calling fh_unlock() earlier to ensure the directory is no
> > longer locked but still calling lookup_one_len_unlocked()
> > 
> > 
> > > Asking for bugs to be addressed before new features go in
> > > seems sensible to me, and is a common practice to enable the
> > > resulting fixes to be backported. If you don't want to address
> > > the bug you mentioned, then someone else will need to, and
> > > that's fine. I think the priority should be bug fixes first.
> > 
> > As a general principle I would agree.
> > In this case the bug is minor, long standing, and tangential.
> > And the series imposes enough review burden as it is.
> > 
> > But if Jeff can produce a fix, either to be applied before or after,
> > then that would be an excellent solution.
> > 
> > Thanks,
> > NeilBrown
> > 
> > 
> 
> 
> FWIW, I hit this while running fstests against the server with some
> draft changes. This crash is not in an area I touched, so I suspect
> something in Neil's locking cleanup:
> 
> [  434.257242] ------------[ cut here ]------------
> [  434.257278] kernel BUG at fs/nfsd/xdr4.h:752!
> [  434.257755] invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
> [  434.257937] CPU: 2 PID: 1202 Comm: nfsd Kdump: loaded Tainted: G           OE     5.19.0-rc5+ #316
> [  434.258179] Hardware name: Red Hat KVM/RHEL, BIOS 1.16.0-3.el9 04/01/2014
> [  434.258371] RIP: 0010:nfsd4_open+0x1940/0x1a30 [nfsd]
> [  434.258615] Code: 40 e8 54 04 e2 e6 48 8b 7c 24 40 41 89 c4 e8 57 7e d4 e6 41 f7 d4 4c 8b 44 24 60 66 44 21 63 44 48 8b 54 24 40 e9 46 fc ff ff <0f> 0b 48 8d bd 88 00 00 00 e8 72 80 d4 e6 4c 8b ad 88 00 00 00 49
> [  434.259053] RSP: 0018:ffff888134897c10 EFLAGS: 00010246
> [  434.259211] RAX: 0000000000000000 RBX: ffff8881348791a0 RCX: ffffffffc07fbb44
> [  434.259408] RDX: 1ffff1102691001a RSI: 0000000000000002 RDI: ffff8881348800d1
> [  434.259606] RBP: ffff888134880030 R08: 0000000000000000 R09: 0000000000000001
> [  434.259802] R10: fffffbfff542dfac R11: 0000000000000001 R12: 0000000000000000
> [  434.260000] R13: ffff888133ab8000 R14: 0000000000000000 R15: ffff8881165db400
> [  434.260195] FS:  0000000000000000(0000) GS:ffff888225500000(0000) knlGS:0000000000000000
> [  434.260466] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  434.260697] CR2: 0000558890b5d178 CR3: 0000000107844000 CR4: 00000000003506e0
> [  434.260949] Call Trace:
> [  434.261106]  <TASK>
> [  434.261260]  ? nfsd4_decode_notsupp+0x10/0x10 [nfsd]
> [  434.261549]  ? nfsd4_interssc_connect+0x8e0/0x8e0 [nfsd]
> [  434.261858]  ? preempt_count_sub+0x14/0xc0
> [  434.262054]  ? percpu_counter_add_batch+0x60/0xd0
> [  434.262261]  nfsd4_proc_compound+0x8c6/0xe90 [nfsd]
> [  434.262553]  nfsd_dispatch+0x2a9/0x5c0 [nfsd]
> [  434.262836]  svc_process_common+0x6ab/0xc30 [sunrpc]
> [  434.263162]  ? svc_create_pooled+0x390/0x390 [sunrpc]
> [  434.263484]  ? nfsd_svc+0x830/0x830 [nfsd]
> [  434.263755]  ? svc_recv+0xabf/0x1310 [sunrpc]
> [  434.264074]  svc_process+0x1a3/0x200 [sunrpc]
> [  434.264382]  nfsd+0x1d7/0x390 [nfsd]
> [  434.264640]  ? nfsd_shutdown_threads+0x200/0x200 [nfsd]
> [  434.264926]  kthread+0x167/0x1a0
> [  434.265101]  ? kthread_complete_and_exit+0x20/0x20
> [  434.265307]  ret_from_fork+0x22/0x30
> [  434.265494]  </TASK>
> [  434.265652] Modules linked in: rpcsec_gss_krb5(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) ip_set(E) rfkill(E) nf_tables(E) nfnetlink(E) iTCO_wdt(E) intel_rapl_msr(E) intel_pmc_bxt(E) iTCO_vendor_support(E) joydev(E) intel_rapl_common(E) i2c_i801(E) i2c_smbus(E) virtio_balloon(E) lpc_ich(E) nfsd(OE) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) drm(E) fuse(E) sunrpc(E) xfs(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) serio_raw(E) virtio_blk(E) virtio_console(E) virtio_net(E) net_failover(E) failover(E) qemu_fw_cfg(E)
> [  434.267660] ---[ end trace 0000000000000000 ]---
> [  434.267870] RIP: 0010:nfsd4_open+0x1940/0x1a30 [nfsd]
> [  434.268161] Code: 40 e8 54 04 e2 e6 48 8b 7c 24 40 41 89 c4 e8 57 7e d4 e6 41 f7 d4 4c 8b 44 24 60 66 44 21 63 44 48 8b 54 24 40 e9 46 fc ff ff <0f> 0b 48 8d bd 88 00 00 00 e8 72 80 d4 e6 4c 8b ad 88 00 00 00 49
> [  434.268771] RSP: 0018:ffff888134897c10 EFLAGS: 00010246
> [  434.268995] RAX: 0000000000000000 RBX: ffff8881348791a0 RCX: ffffffffc07fbb44
> [  434.269247] RDX: 1ffff1102691001a RSI: 0000000000000002 RDI: ffff8881348800d1
> [  434.269511] RBP: ffff888134880030 R08: 0000000000000000 R09: 0000000000000001
> [  434.269775] R10: fffffbfff542dfac R11: 0000000000000001 R12: 0000000000000000
> [  434.270030] R13: ffff888133ab8000 R14: 0000000000000000 R15: ffff8881165db400
> [  434.270288] FS:  0000000000000000(0000) GS:ffff888225500000(0000) knlGS:0000000000000000
> [  434.270632] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  434.270862] CR2: 0000558890b5d178 CR3: 0000000107844000 CR4: 00000000003506e0
> 
> faddr2line says:
> 
> $ ./scripts/faddr2line --list fs/nfsd/nfsd.ko nfsd4_open+0x1940
> nfsd4_open+0x1940/0x1a30:
> 
> set_change_info at /home/jlayton/git/linux/fs/nfsd/xdr4.h:752
>  747 	#define NFS4_SVC_XDRSIZE		sizeof(struct nfsd4_compoundargs)
>  748 	
>  749 	static inline void
>  750 	set_change_info(struct nfsd4_change_info *cinfo, struct svc_fh *fhp)
>  751 	{
> > 752<		BUG_ON(!fhp->fh_pre_saved);
>  753 		cinfo->atomic = (u32)(fhp->fh_post_saved && !fhp->fh_no_atomic_attr);
>  754 	
>  755 		cinfo->before_change = fhp->fh_pre_change;
>  756 		cinfo->after_change = fhp->fh_post_change;
>  757 	}
> 
> (inlined by) do_open_lookup at /home/jlayton/git/linux/fs/nfsd/nfs4proc.c:502
>  497 		accmode = NFSD_MAY_NOP;
>  498 		if (open->op_created ||
>  499 				open->op_claim_type == NFS4_OPEN_CLAIM_DELEGATE_CUR)
>  500 			accmode |= NFSD_MAY_OWNER_OVERRIDE;
>  501 		status = do_open_permission(rqstp, *resfh, open, accmode);
> > 502<		set_change_info(&open->op_cinfo, current_fh);
>  503 	out:
>  504 		if (status)
>  505 			inode_unlock(current_fh->fh_dentry->d_inode);
>  506 		return status;
>  507 	}
> 
> (inlined by) nfsd4_open at /home/jlayton/git/linux/fs/nfsd/nfs4proc.c:625
>  620 			goto out;
>  621 	
>  622 		switch (open->op_claim_type) {
>  623 		case NFS4_OPEN_CLAIM_DELEGATE_CUR:
>  624 		case NFS4_OPEN_CLAIM_NULL:
> > 625<			status = do_open_lookup(rqstp, cstate, open, &resfh);
>  626 			if (status)
>  627 				goto out;
>  628 			locked = true;
>  629 			break;
>  630 		case NFS4_OPEN_CLAIM_PREVIOUS:
> 
> I haven't yet dug down into the actual cause.

Maybe the patch below will fix it? I didn't see other cases where we
could return nfs_ok without setting the pre-op attrs, but there may be some.
Neil, feel free to fold this into the appropriate patch in your series
if you think it's correct:

----------------------8<--------------------
[PATCH] nfsd: ensure we fill in pre-op-attrs in v4.1 exclusive create

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/nfs4proc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 242f059e6788..15a08fdaf8b1 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -374,6 +374,7 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
 			if (d_inode(child)->i_mtime.tv_sec == v_mtime &&
 			    d_inode(child)->i_atime.tv_sec == v_atime &&
 			    d_inode(child)->i_size == 0) {
+				fh_fill_pre_attrs(fhp);
 				open->op_created = true;
 				goto set_attr;	/* subtle */
 			}
-- 
2.36.1
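
For readers following the trace above, here is a minimal userspace sketch of
the invariant that set_change_info() enforces with BUG_ON(). It is a model
under stated assumptions, not kernel code: struct fh_model and every model_*
function are hypothetical stand-ins, and the check returns false where the
real helper would crash. It only shows that a path which reports success
without recording pre-op attrs fails the check, while the same path with an
added fill passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few svc_fh fields that matter here. */
struct fh_model {
	bool     pre_saved;	/* models fh_pre_saved */
	uint64_t pre_change;	/* models fh_pre_change */
	uint64_t dir_change;	/* models the directory's change attribute */
};

/* Models fh_fill_pre_attrs(): record the "before" value and mark it valid. */
static void model_fill_pre_attrs(struct fh_model *fh)
{
	fh->pre_change = fh->dir_change;
	fh->pre_saved = true;
}

/* Returns false where the real set_change_info() would hit BUG_ON(). */
static bool model_set_change_info(const struct fh_model *fh, uint64_t *before)
{
	if (!fh->pre_saved)
		return false;
	*before = fh->pre_change;
	return true;
}

/*
 * Models an early-success exit such as "file already exists and the
 * verifier matches": nothing is created, but the caller still goes on
 * to build the change info. fill_pre selects whether this exit records
 * pre-op attrs first.
 */
static bool model_replay_path(struct fh_model *fh, bool fill_pre)
{
	uint64_t before = 0;

	if (fill_pre)
		model_fill_pre_attrs(fh);
	return model_set_change_info(fh, &before);
}
```

With fill_pre false the model reports the failure that shows up as the BUG
above; with fill_pre true the check passes.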




* Re: [PATCH 0/8] NFSD: clean up locking.
  2022-07-15  2:36           ` NeilBrown
  2022-07-15 13:01             ` Jeff Layton
@ 2022-07-15 14:06             ` Chuck Lever III
  1 sibling, 0 replies; 40+ messages in thread
From: Chuck Lever III @ 2022-07-15 14:06 UTC (permalink / raw)
  To: Neil Brown; +Cc: Jeff Layton, Linux NFS Mailing List



> On Jul 14, 2022, at 10:36 PM, NeilBrown <neilb@suse.de> wrote:
> 
> On Thu, 14 Jul 2022, Chuck Lever III wrote:
>> 
>>> On Jul 13, 2022, at 12:32 AM, NeilBrown <neilb@suse.de> wrote:
>>> 
>>> On Wed, 13 Jul 2022, Chuck Lever III wrote:
>> 
>>>>>> Secondarily, the series adds more bells and whistles to the generic
>>>>>> NFSD VFS APIs on behalf of NFSv4-specific requirements. In particular:
>>>>>> 
>>>>>> - ("NFSD: change nfsd_create() to unlock directory before returning.")
>>>>>> makes some big changes to nfsd_create(). But that helper itself
>>>>>> is pretty small. Seems like cleaner code would result if NFSv4
>>>>>> had its own version of nfsd_create() to deal with the post-change
>>>>>> cases.
>>>>> 
>>>>> I would not like that approach. Duplicating code is rarely a good idea.
>>>> 
>>>> De-duplicating code /can/ be a good idea, but isn't always a good
>>>> idea. If the exceptional cases add a lot of logic, that can make the
>>>> de-duplicated code difficult to read and reason about, and it can
>>>> make it brittle, just as it does in this case. Modifications on
>>>> behalf of NFSv4 in this common piece of code is possibly hazardous
>>>> to NFSv3, and navigating around the exception logic makes it
>>>> difficult to understand and review.
>>> 
>>> Are we looking at the same code?
>>> The sum total of extra code needed for v4 is:
>>> - two extra parameters:
>>> 	 void (*post_create)(struct svc_fh *fh, void *data),
>>> 	 void *data)
>>> - two lines of code:
>>> 	if (!err && post_create)
>>> 		post_create(resfhp, data);
>>> 
>>> does that really make anything hard to follow?
>> 
>> The synopsis of nfsd_create() is already pretty cluttered.
> 
> Would it help to collect related details (name, type, attributes, acl,
> label) into a struct and pass a reference to that around?
> One awkwardness in my latest patch is that the acl and label are not set
> in nfsd_create_setattr().  If they were passed around together with the
> attributes, that would be a lot easier to achieve.

That kind of makes my point: using nfsd_create() for both classic
NFSv2/3 and the new NFSv4 hotness overloads the API.

But OK, you seem very set on this. So, let's construct a set
of parameters for a create operation and give the set a
sensible name. The "_args" suffix already has some meaning
and precedent in this code. How about "struct nfsd_create_info"?
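
To make the suggestion concrete, here is a rough userspace sketch of what
such a parameter set could look like. Everything in it is an assumption made
for illustration: the *_model types are stand-ins for the kernel's attribute,
ACL and label types, and the field names are invented, not taken from any
posted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the kernel types, declared only so the sketch compiles. */
struct iattr_model { unsigned int ia_valid; unsigned int ia_mode; };
struct acl_model   { int entries; };
struct label_model { const char *data; size_t len; };

/*
 * Hypothetical "struct nfsd_create_info": one bundle holding what a
 * create operation needs. Field names are illustrative only.
 */
struct nfsd_create_info_model {
	const char         *name;
	size_t              name_len;
	int                 type;	/* file type selector */
	struct iattr_model *attrs;
	struct acl_model   *acl;	/* NULL when the caller has none */
	struct label_model *label;	/* NULL when the caller has none */
};

/* Build the common part; optional members start out NULL. */
static struct nfsd_create_info_model
model_create_info(const char *name, int type, struct iattr_model *attrs)
{
	struct nfsd_create_info_model ci = {
		.name = name,
		.name_len = strlen(name),
		.type = type,
		.attrs = attrs,
	};
	return ci;
}

/* Count how many optional items the caller supplied. */
static int model_count_optional(const struct nfsd_create_info_model *ci)
{
	return (ci->acl != NULL) + (ci->label != NULL);
}
```

A caller without an ACL or label would leave both optional members NULL,
while another caller could set them after building the common part. The
sketch also shows how easily further optional members could be added.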


>> You're adding that extra code in nfsd_symlink() too IIRC? And,
>> this change adds a virtual function call (which has significant
>> overhead these days) for reasons of convenience rather than
>> necessity.
> 
> "significant"?  In the context of creating a file, I don't think one more
> virtual function call is all that significant?

If you consider how fast underlying storage has become, and how
fast it will become soon (e.g., NVRAM-backed filesystems) then the
cost of a virtual function call becomes significant.

And note there is also control flow integrity. A virtual function
can be used to branch into malicious code unless we are careful
to lock it down correctly.

But these are details, and kind of miss my main point.


> I started writing code to avoid the function pointer, but the function
> args list exploded.  If you could be happy with e.g.  a struct
> nfs_create_args which contains attrs, acl, label, and was passed into
> nfsd_create_setattr(), then I think that would cleanly avoid the desire
> for a function pointer.

Note that my complaint was also about adding more logic in these
functions. It isn't much now, but having an "info struct" means
we can pass all kinds of junk into nfsd_create() and add lots
more exceptions to the code path.

Do you see why I really don't like this approach?


--
Chuck Lever





* Re: [PATCH 6/8] NFSD: use explicit lock/unlock for directory ops
  2022-07-06  4:18 ` [PATCH 6/8] NFSD: use explicit lock/unlock for directory ops NeilBrown
  2022-07-06 14:05   ` Jeff Layton
  2022-07-06 16:29   ` Chuck Lever III
@ 2022-07-15 16:11   ` Jeff Layton
  2022-07-15 18:22     ` Jeff Layton
  2022-07-17 23:43     ` NeilBrown
  2 siblings, 2 replies; 40+ messages in thread
From: Jeff Layton @ 2022-07-15 16:11 UTC (permalink / raw)
  To: NeilBrown, Chuck Lever; +Cc: linux-nfs

On Wed, 2022-07-06 at 14:18 +1000, NeilBrown wrote:
> When creating or unlinking a name in a directory use explicit
> inode_lock_nested() instead of fh_lock(), and explicit calls to
> fh_fill_pre_attrs() and fh_fill_post_attrs().  This is already done for
> renames.
> 
> Also move the 'fill' calls closer to the operation that might change the
> attributes.  This way they are avoided on some error paths.
> 
> Having the locking explicit will simplify proposed future changes to
> locking for directories.  It also makes it easily visible exactly where
> pre/post attributes are used - not all callers of fh_lock() actually
> need the pre/post attributes.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/nfsd/nfs3proc.c |    6 ++++--
>  fs/nfsd/nfs4proc.c |    6 ++++--
>  fs/nfsd/nfsproc.c  |    7 ++++---
>  fs/nfsd/vfs.c      |   30 +++++++++++++++++++-----------
>  4 files changed, 31 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
> index 3a67d0afb885..9629517344ff 100644
> --- a/fs/nfsd/nfs3proc.c
> +++ b/fs/nfsd/nfs3proc.c
> @@ -254,7 +254,7 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	if (host_err)
>  		return nfserrno(host_err);
>  
> -	fh_lock_nested(fhp, I_MUTEX_PARENT);
> +	inode_lock_nested(inode, I_MUTEX_PARENT);
>  
>  	child = lookup_one_len(argp->name, parent, argp->len);
>  	if (IS_ERR(child)) {
> @@ -312,11 +312,13 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	if (!IS_POSIXACL(inode))
>  		iap->ia_mode &= ~current_umask();
>  
> +	fh_fill_pre_attrs(fhp);
>  	host_err = vfs_create(&init_user_ns, inode, child, iap->ia_mode, true);
>  	if (host_err < 0) {
>  		status = nfserrno(host_err);
>  		goto out;
>  	}
> +	fh_fill_post_attrs(fhp);
>  
>  	/* A newly created file already has a file size of zero. */
>  	if ((iap->ia_valid & ATTR_SIZE) && (iap->ia_size == 0))
> @@ -334,7 +336,7 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	status = nfsd_create_setattr(rqstp, fhp, resfhp, iap);
>  
>  out:
> -	fh_unlock(fhp);
> +	inode_unlock(inode);
>  	if (child && !IS_ERR(child))
>  		dput(child);
>  	fh_drop_write(fhp);
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 6ec22c69cbec..242f059e6788 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -306,7 +306,7 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	if (host_err)
>  		return nfserrno(host_err);
>  
> -	fh_lock_nested(fhp, I_MUTEX_PARENT);
> +	inode_lock_nested(inode, I_MUTEX_PARENT);
>  
>  	child = lookup_one_len(open->op_fname, parent, open->op_fnamelen);
>  	if (IS_ERR(child)) {
> @@ -385,10 +385,12 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	if (!IS_POSIXACL(inode))
>  		iap->ia_mode &= ~current_umask();
>  
> +	fh_fill_pre_attrs(fhp);
>  	status = nfsd4_vfs_create(fhp, child, open);
>  	if (status != nfs_ok)
>  		goto out;
>  	open->op_created = true;
> +	fh_fill_post_attrs(fhp);
>  
>  	/* A newly created file already has a file size of zero. */
>  	if ((iap->ia_valid & ATTR_SIZE) && (iap->ia_size == 0))

Should the fh_fill_post_attrs call be done after nfsd_create_setattr
instead in this function? It seems like we're filling out the post-op
attr here before we're actually done changing things...
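
The ordering concern can be shown with a small userspace model. This is only
a sketch: obj_model, snap_model and model_create() are hypothetical, and
whether the follow-up step really changes the attributes that
fh_fill_post_attrs() samples is exactly the open question, so the model takes
it as a parameter (setattr_touches) rather than asserting it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical object whose change counter is being sampled. */
struct obj_model { uint64_t change; };

/* The recorded before/after pair, as reported to the client. */
struct snap_model { uint64_t pre; uint64_t post; };

/* Any modification bumps the change counter. */
static void model_modify(struct obj_model *o)
{
	o->change++;
}

/*
 * Models "create, then a follow-up setattr step".
 * post_after_setattr: sample the post value after the follow-up step
 *                     instead of right after the create.
 * setattr_touches:    assume the follow-up step modifies the sampled
 *                     object (an assumption, not an established fact).
 */
static struct snap_model model_create(struct obj_model *o,
				      bool post_after_setattr,
				      bool setattr_touches)
{
	struct snap_model s = { 0, 0 };

	s.pre = o->change;
	model_modify(o);			/* the create itself */
	if (!post_after_setattr)
		s.post = o->change;		/* early sample */
	if (setattr_touches)
		model_modify(o);		/* the follow-up step */
	if (post_after_setattr)
		s.post = o->change;		/* late sample */
	return s;
}
```

If the follow-up step touches the sampled object, the early sample is stale
by the time the operation finishes. If it does not, both placements record
the same value.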

> @@ -406,7 +408,7 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	status = nfsd_create_setattr(rqstp, fhp, resfhp, iap);
>  
>  out:
> -	fh_unlock(fhp);
> +	inode_unlock(inode);
>  	if (child && !IS_ERR(child))
>  		dput(child);
>  	fh_drop_write(fhp);
> diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
> index ed24fae09517..427c404bc52b 100644
> --- a/fs/nfsd/nfsproc.c
> +++ b/fs/nfsd/nfsproc.c
> @@ -285,7 +285,7 @@ nfsd_proc_create(struct svc_rqst *rqstp)
>  		goto done;
>  	}
>  
> -	fh_lock_nested(dirfhp, I_MUTEX_PARENT);
> +	inode_lock_nested(dirfhp->fh_dentry->d_inode, I_MUTEX_PARENT);
>  	dchild = lookup_one_len(argp->name, dirfhp->fh_dentry, argp->len);
>  	if (IS_ERR(dchild)) {
>  		resp->status = nfserrno(PTR_ERR(dchild));
> @@ -382,6 +382,7 @@ nfsd_proc_create(struct svc_rqst *rqstp)
>  	}
>  
>  	resp->status = nfs_ok;
> +	fh_fill_pre_attrs(dirfhp);
>  	if (!inode) {
>  		/* File doesn't exist. Create it and set attrs */
>  		resp->status = nfsd_create_locked(rqstp, dirfhp, argp->name,
> @@ -399,10 +400,10 @@ nfsd_proc_create(struct svc_rqst *rqstp)
>  			resp->status = nfsd_setattr(rqstp, newfhp, attr, 0,
>  						    (time64_t)0);
>  	}
> +	fh_fill_post_attrs(dirfhp);
>  
>  out_unlock:
> -	/* We don't really need to unlock, as fh_put does it. */
> -	fh_unlock(dirfhp);
> +	inode_unlock(dirfhp->fh_dentry->d_inode);
>  	fh_drop_write(dirfhp);
>  done:
>  	fh_put(dirfhp);
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 8e050c6d112a..2ca748aa83bb 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1412,7 +1412,7 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	if (host_err)
>  		return nfserrno(host_err);
>  
> -	fh_lock_nested(fhp, I_MUTEX_PARENT);
> +	inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT);
>  	dchild = lookup_one_len(fname, dentry, flen);
>  	host_err = PTR_ERR(dchild);
>  	if (IS_ERR(dchild)) {
> @@ -1427,12 +1427,14 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	dput(dchild);
>  	if (err)
>  		goto out_unlock;
> +	fh_fill_pre_attrs(fhp);
>  	err = nfsd_create_locked(rqstp, fhp, fname, flen, iap, type,
>  				 rdev, resfhp);
>  	if (!err && post_create)
>  		post_create(resfhp, data);
> +	fh_fill_post_attrs(fhp);
>  out_unlock:
> -	fh_unlock(fhp);
> +	inode_unlock(dentry->d_inode);
>  	return err;
>  }
>  
> @@ -1505,14 +1507,15 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	if (host_err)
>  		goto out_nfserr;
>  
> -	fh_lock(fhp);
>  	dentry = fhp->fh_dentry;
> +	inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT);
>  	dnew = lookup_one_len(fname, dentry, flen);
>  	host_err = PTR_ERR(dnew);
>  	if (IS_ERR(dnew)) {
> -		fh_unlock(fhp);
> +		inode_unlock(dentry->d_inode);
>  		goto out_nfserr;
>  	}
> +	fh_fill_pre_attrs(fhp);
>  	host_err = vfs_symlink(&init_user_ns, d_inode(dentry), dnew, path);
>  	err = nfserrno(host_err);
>  	if (!err)
> @@ -1525,7 +1528,8 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	if (err==0) err = cerr;
>  	if (!err && post_create)
>  		post_create(resfhp, data);
> -	fh_unlock(fhp);
> +	fh_fill_post_attrs(fhp);
> +	inode_unlock(dentry->d_inode);
>  out:
>  	return err;
>  
> @@ -1569,9 +1573,9 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
>  		goto out;
>  	}
>  
> -	fh_lock_nested(ffhp, I_MUTEX_PARENT);
>  	ddir = ffhp->fh_dentry;
>  	dirp = d_inode(ddir);
> +	inode_lock_nested(dirp, I_MUTEX_PARENT);
>  
>  	dnew = lookup_one_len(name, ddir, len);
>  	host_err = PTR_ERR(dnew);
> @@ -1585,8 +1589,10 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
>  	err = nfserr_noent;
>  	if (d_really_is_negative(dold))
>  		goto out_dput;
> +	fh_fill_pre_attrs(ffhp);
>  	host_err = vfs_link(dold, &init_user_ns, dirp, dnew, NULL);
> -	fh_unlock(ffhp);
> +	fh_fill_post_attrs(ffhp);
> +	inode_unlock(dirp);
>  	if (!host_err) {
>  		err = nfserrno(commit_metadata(ffhp));
>  		if (!err)
> @@ -1606,7 +1612,7 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
>  out_dput:
>  	dput(dnew);
>  out_unlock:
> -	fh_unlock(ffhp);
> +	inode_unlock(dirp);
>  	goto out_drop_write;
>  }
>  
> @@ -1781,9 +1787,9 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
>  	if (host_err)
>  		goto out_nfserr;
>  
> -	fh_lock_nested(fhp, I_MUTEX_PARENT);
>  	dentry = fhp->fh_dentry;
>  	dirp = d_inode(dentry);
> +	inode_lock_nested(dirp, I_MUTEX_PARENT);
>  
>  	rdentry = lookup_one_len(fname, dentry, flen);
>  	host_err = PTR_ERR(rdentry);
> @@ -1801,6 +1807,7 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
>  	if (!type)
>  		type = d_inode(rdentry)->i_mode & S_IFMT;
>  
> +	fh_fill_pre_attrs(fhp);
>  	if (type != S_IFDIR) {
>  		if (rdentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK)
>  			nfsd_close_cached_files(rdentry);
> @@ -1808,8 +1815,9 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
>  	} else {
>  		host_err = vfs_rmdir(&init_user_ns, dirp, rdentry);
>  	}
> +	fh_fill_post_attrs(fhp);
>  
> -	fh_unlock(fhp);
> +	inode_unlock(dirp);
>  	if (!host_err)
>  		host_err = commit_metadata(fhp);
>  	dput(rdentry);
> @@ -1832,7 +1840,7 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
>  out:
>  	return err;
>  out_unlock:
> -	fh_unlock(fhp);
> +	inode_unlock(dirp);
>  	goto out_drop_write;
>  }
>  
> 
> 

-- 
Jeff Layton <jlayton@kernel.org>


* Re: [PATCH 6/8] NFSD: use explicit lock/unlock for directory ops
  2022-07-15 16:11   ` Jeff Layton
@ 2022-07-15 18:22     ` Jeff Layton
  2022-07-17 23:59       ` NeilBrown
  2022-07-17 23:43     ` NeilBrown
  1 sibling, 1 reply; 40+ messages in thread
From: Jeff Layton @ 2022-07-15 18:22 UTC (permalink / raw)
  To: NeilBrown, Chuck Lever; +Cc: linux-nfs

On Fri, 2022-07-15 at 12:11 -0400, Jeff Layton wrote:
> On Wed, 2022-07-06 at 14:18 +1000, NeilBrown wrote:
> > When creating or unlinking a name in a directory use explicit
> > inode_lock_nested() instead of fh_lock(), and explicit calls to
> > fh_fill_pre_attrs() and fh_fill_post_attrs().  This is already done for
> > renames.
> > 
> > Also move the 'fill' calls closer to the operation that might change the
> > attributes.  This way they are avoided on some error paths.
> > 
> > Having the locking explicit will simplify proposed future changes to
> > locking for directories.  It also makes it easily visible exactly where
> > pre/post attributes are used - not all callers of fh_lock() actually
> > need the pre/post attributes.
> > 
> > Signed-off-by: NeilBrown <neilb@suse.de>
> > ---
> >  fs/nfsd/nfs3proc.c |    6 ++++--
> >  fs/nfsd/nfs4proc.c |    6 ++++--
> >  fs/nfsd/nfsproc.c  |    7 ++++---
> >  fs/nfsd/vfs.c      |   30 +++++++++++++++++++-----------
> >  4 files changed, 31 insertions(+), 18 deletions(-)
> > 
> > diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
> > index 3a67d0afb885..9629517344ff 100644
> > --- a/fs/nfsd/nfs3proc.c
> > +++ b/fs/nfsd/nfs3proc.c
> > @@ -254,7 +254,7 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  	if (host_err)
> >  		return nfserrno(host_err);
> >  
> > -	fh_lock_nested(fhp, I_MUTEX_PARENT);
> > +	inode_lock_nested(inode, I_MUTEX_PARENT);
> >  
> >  	child = lookup_one_len(argp->name, parent, argp->len);
> >  	if (IS_ERR(child)) {
> > @@ -312,11 +312,13 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  	if (!IS_POSIXACL(inode))
> >  		iap->ia_mode &= ~current_umask();
> >  
> > +	fh_fill_pre_attrs(fhp);
> >  	host_err = vfs_create(&init_user_ns, inode, child, iap->ia_mode, true);
> >  	if (host_err < 0) {
> >  		status = nfserrno(host_err);
> >  		goto out;
> >  	}
> > +	fh_fill_post_attrs(fhp);
> >  
> >  	/* A newly created file already has a file size of zero. */
> >  	if ((iap->ia_valid & ATTR_SIZE) && (iap->ia_size == 0))
> > @@ -334,7 +336,7 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  	status = nfsd_create_setattr(rqstp, fhp, resfhp, iap);
> >  
> >  out:
> > -	fh_unlock(fhp);
> > +	inode_unlock(inode);
> >  	if (child && !IS_ERR(child))
> >  		dput(child);
> >  	fh_drop_write(fhp);
> > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > index 6ec22c69cbec..242f059e6788 100644
> > --- a/fs/nfsd/nfs4proc.c
> > +++ b/fs/nfsd/nfs4proc.c
> > @@ -306,7 +306,7 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  	if (host_err)
> >  		return nfserrno(host_err);
> >  
> > -	fh_lock_nested(fhp, I_MUTEX_PARENT);
> > +	inode_lock_nested(inode, I_MUTEX_PARENT);
> >  
> >  	child = lookup_one_len(open->op_fname, parent, open->op_fnamelen);
> >  	if (IS_ERR(child)) {
> > @@ -385,10 +385,12 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  	if (!IS_POSIXACL(inode))
> >  		iap->ia_mode &= ~current_umask();
> >  
> > +	fh_fill_pre_attrs(fhp);
> >  	status = nfsd4_vfs_create(fhp, child, open);
> >  	if (status != nfs_ok)
> >  		goto out;
> >  	open->op_created = true;
> > +	fh_fill_post_attrs(fhp);
> >  
> >  	/* A newly created file already has a file size of zero. */
> >  	if ((iap->ia_valid & ATTR_SIZE) && (iap->ia_size == 0))
> 
> Should the fh_fill_post_attrs call be done after nfsd_create_setattr
> instead in this function? It seems like we're filling out the post-op
> attr here before we're actually done changing things...
> 
> > @@ -406,7 +408,7 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  	status = nfsd_create_setattr(rqstp, fhp, resfhp, iap);
> >  
> >  out:
> > -	fh_unlock(fhp);
> > +	inode_unlock(inode);
> >  	if (child && !IS_ERR(child))
> >  		dput(child);
> >  	fh_drop_write(fhp);
> > diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
> > index ed24fae09517..427c404bc52b 100644
> > --- a/fs/nfsd/nfsproc.c
> > +++ b/fs/nfsd/nfsproc.c
> > @@ -285,7 +285,7 @@ nfsd_proc_create(struct svc_rqst *rqstp)
> >  		goto done;
> >  	}
> >  
> > -	fh_lock_nested(dirfhp, I_MUTEX_PARENT);
> > +	inode_lock_nested(dirfhp->fh_dentry->d_inode, I_MUTEX_PARENT);
> >  	dchild = lookup_one_len(argp->name, dirfhp->fh_dentry, argp->len);
> >  	if (IS_ERR(dchild)) {
> >  		resp->status = nfserrno(PTR_ERR(dchild));
> > @@ -382,6 +382,7 @@ nfsd_proc_create(struct svc_rqst *rqstp)
> >  	}
> >  
> >  	resp->status = nfs_ok;
> > +	fh_fill_pre_attrs(dirfhp);
> >  	if (!inode) {
> >  		/* File doesn't exist. Create it and set attrs */
> >  		resp->status = nfsd_create_locked(rqstp, dirfhp, argp->name,
> > @@ -399,10 +400,10 @@ nfsd_proc_create(struct svc_rqst *rqstp)
> >  			resp->status = nfsd_setattr(rqstp, newfhp, attr, 0,
> >  						    (time64_t)0);
> >  	}
> > +	fh_fill_post_attrs(dirfhp);
> >  
> >  out_unlock:
> > -	/* We don't really need to unlock, as fh_put does it. */
> > -	fh_unlock(dirfhp);
> > +	inode_unlock(dirfhp->fh_dentry->d_inode);
> >  	fh_drop_write(dirfhp);
> >  done:
> >  	fh_put(dirfhp);
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index 8e050c6d112a..2ca748aa83bb 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -1412,7 +1412,7 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  	if (host_err)
> >  		return nfserrno(host_err);
> >  
> > -	fh_lock_nested(fhp, I_MUTEX_PARENT);
> > +	inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT);
> >  	dchild = lookup_one_len(fname, dentry, flen);
> >  	host_err = PTR_ERR(dchild);
> >  	if (IS_ERR(dchild)) {
> > @@ -1427,12 +1427,14 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  	dput(dchild);
> >  	if (err)
> >  		goto out_unlock;
> > +	fh_fill_pre_attrs(fhp);
> >  	err = nfsd_create_locked(rqstp, fhp, fname, flen, iap, type,
> >  				 rdev, resfhp);
> >  	if (!err && post_create)
> >  		post_create(resfhp, data);
> > +	fh_fill_post_attrs(fhp);
> >  out_unlock:
> > -	fh_unlock(fhp);
> > +	inode_unlock(dentry->d_inode);
> >  	return err;
> >  }
> >  
> > @@ -1505,14 +1507,15 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  	if (host_err)
> >  		goto out_nfserr;
> >  
> > -	fh_lock(fhp);
> >  	dentry = fhp->fh_dentry;
> > +	inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT);
> >  	dnew = lookup_one_len(fname, dentry, flen);
> >  	host_err = PTR_ERR(dnew);
> >  	if (IS_ERR(dnew)) {
> > -		fh_unlock(fhp);
> > +		inode_unlock(dentry->d_inode);
> >  		goto out_nfserr;
> >  	}
> > +	fh_fill_pre_attrs(fhp);
> >  	host_err = vfs_symlink(&init_user_ns, d_inode(dentry), dnew, path);
> >  	err = nfserrno(host_err);
> >  	if (!err)
> > @@ -1525,7 +1528,8 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  	if (err==0) err = cerr;
> >  	if (!err && post_create)
> >  		post_create(resfhp, data);
> > -	fh_unlock(fhp);
> > +	fh_fill_post_attrs(fhp);
> > +	inode_unlock(dentry->d_inode);
> >  out:
> >  	return err;
> >  
> > @@ -1569,9 +1573,9 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
> >  		goto out;
> >  	}
> >  
> > -	fh_lock_nested(ffhp, I_MUTEX_PARENT);
> >  	ddir = ffhp->fh_dentry;
> >  	dirp = d_inode(ddir);
> > +	inode_lock_nested(dirp, I_MUTEX_PARENT);
> >  
> >  	dnew = lookup_one_len(name, ddir, len);
> >  	host_err = PTR_ERR(dnew);
> > @@ -1585,8 +1589,10 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
> >  	err = nfserr_noent;
> >  	if (d_really_is_negative(dold))
> >  		goto out_dput;
> > +	fh_fill_pre_attrs(ffhp);
> >  	host_err = vfs_link(dold, &init_user_ns, dirp, dnew, NULL);
> > -	fh_unlock(ffhp);
> > +	fh_fill_post_attrs(ffhp);
> > +	inode_unlock(dirp);
> >  	if (!host_err) {
> >  		err = nfserrno(commit_metadata(ffhp));
> >  		if (!err)
> > @@ -1606,7 +1612,7 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
> >  out_dput:
> >  	dput(dnew);
> >  out_unlock:
> > -	fh_unlock(ffhp);
> > +	inode_unlock(dirp);
> >  	goto out_drop_write;
> >  }
> >  
> > @@ -1781,9 +1787,9 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
> >  	if (host_err)
> >  		goto out_nfserr;
> >  
> > -	fh_lock_nested(fhp, I_MUTEX_PARENT);
> >  	dentry = fhp->fh_dentry;
> >  	dirp = d_inode(dentry);
> > +	inode_lock_nested(dirp, I_MUTEX_PARENT);
> >  
> >  	rdentry = lookup_one_len(fname, dentry, flen);
> >  	host_err = PTR_ERR(rdentry);
> > @@ -1801,6 +1807,7 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
> >  	if (!type)
> >  		type = d_inode(rdentry)->i_mode & S_IFMT;
> >  
> > +	fh_fill_pre_attrs(fhp);
> >  	if (type != S_IFDIR) {
> >  		if (rdentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK)
> >  			nfsd_close_cached_files(rdentry);
> > @@ -1808,8 +1815,9 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
> >  	} else {
> >  		host_err = vfs_rmdir(&init_user_ns, dirp, rdentry);
> >  	}
> > +	fh_fill_post_attrs(fhp);
> >  
> > -	fh_unlock(fhp);
> > +	inode_unlock(dirp);
> >  	if (!host_err)
> >  		host_err = commit_metadata(fhp);
> >  	dput(rdentry);
> > @@ -1832,7 +1840,7 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
> >  out:
> >  	return err;
> >  out_unlock:
> > -	fh_unlock(fhp);
> > +	inode_unlock(dirp);
> >  	goto out_drop_write;
> >  }
> >  
> > 
> > 
> 


[PATCH] SQUASH: nfsd: ensure we fill in pre-op-attrs in
 nfsd4_create_file

In some cases, the pre-op attrs are left uninitialized. This also ensures
that the post-op attrs are properly filled in all cases.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/nfs4proc.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 242f059e6788..05652a7dabe8 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -346,6 +346,7 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
 
 		switch (open->op_createmode) {
 		case NFS4_CREATE_UNCHECKED:
+			fh_fill_pre_attrs(fhp);
 			if (!d_is_reg(child))
 				break;
 
@@ -365,6 +366,7 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
 			if (d_inode(child)->i_mtime.tv_sec == v_mtime &&
 			    d_inode(child)->i_atime.tv_sec == v_atime &&
 			    d_inode(child)->i_size == 0) {
+				fh_fill_pre_attrs(fhp);
 				open->op_created = true;
 				break;		/* subtle */
 			}
@@ -374,6 +376,7 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
 			if (d_inode(child)->i_mtime.tv_sec == v_mtime &&
 			    d_inode(child)->i_atime.tv_sec == v_atime &&
 			    d_inode(child)->i_size == 0) {
+				fh_fill_pre_attrs(fhp);
 				open->op_created = true;
 				goto set_attr;	/* subtle */
 			}
@@ -385,12 +388,10 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (!IS_POSIXACL(inode))
 		iap->ia_mode &= ~current_umask();
 
-	fh_fill_pre_attrs(fhp);
 	status = nfsd4_vfs_create(fhp, child, open);
 	if (status != nfs_ok)
 		goto out;
 	open->op_created = true;
-	fh_fill_post_attrs(fhp);
 
 	/* A newly created file already has a file size of zero. */
 	if ((iap->ia_valid & ATTR_SIZE) && (iap->ia_size == 0))
@@ -408,6 +409,8 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	status = nfsd_create_setattr(rqstp, fhp, resfhp, iap);
 
 out:
+	if (status == nfs_ok)
+		fh_fill_post_attrs(fhp);
 	inode_unlock(inode);
 	if (child && !IS_ERR(child))
 		dput(child);
-- 
2.36.1




* Re: [PATCH 6/8] NFSD: use explicit lock/unlock for directory ops
  2022-07-15 16:11   ` Jeff Layton
  2022-07-15 18:22     ` Jeff Layton
@ 2022-07-17 23:43     ` NeilBrown
  1 sibling, 0 replies; 40+ messages in thread
From: NeilBrown @ 2022-07-17 23:43 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Chuck Lever, linux-nfs

On Sat, 16 Jul 2022, Jeff Layton wrote:
> On Wed, 2022-07-06 at 14:18 +1000, NeilBrown wrote:
> >  
> > +	fh_fill_pre_attrs(fhp);
> >  	status = nfsd4_vfs_create(fhp, child, open);
> >  	if (status != nfs_ok)
> >  		goto out;
> >  	open->op_created = true;
> > +	fh_fill_post_attrs(fhp);
> >  
> >  	/* A newly created file already has a file size of zero. */
> >  	if ((iap->ia_valid & ATTR_SIZE) && (iap->ia_size == 0))
> 
> Should the fh_fill_post_attrs call be done after nfsd_create_setattr
> instead in this function? It seems like we're filling out the post-op
> attr here before we're actually done changing things...

nfsd_create_setattr() only affects the newly created thing, so it should
not be changing any attributes of the directory that it was created in.
So it should not matter for correctness where fh_fill_post_attrs() is
called, as long as it is between nfsd4_vfs_create() and inode_unlock().

I preferred closer to the former.
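
To illustrate the argument above, here is a minimal userspace sketch, not
kernel code. Every name in it (model_inode, model_fh, model_create_file and
the rest) is hypothetical, and the single "change" counter is only an assumed
stand-in for the directory attributes that the real fh_fill_pre_attrs() and
fh_fill_post_attrs() record. It shows that when setattr touches only the
child, the post-op snapshot of the directory is the same whether it is taken
right after the create or just before the unlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are not the kernel's inode or svc_fh. */
struct model_inode {
	uint64_t change;	/* assumed stand-in for the change attribute */
	uint64_t size;
};

struct model_fh {
	struct model_inode *dir;
	uint64_t pre_change;
	uint64_t post_change;
};

static void model_fill_pre_attrs(struct model_fh *fh)
{
	fh->pre_change = fh->dir->change;
}

static void model_fill_post_attrs(struct model_fh *fh)
{
	fh->post_change = fh->dir->change;
}

/* Creating an entry modifies the parent directory. */
static void model_create(struct model_inode *dir, struct model_inode *child)
{
	child->change = 1;
	child->size = 0;
	dir->change++;
}

/* Setting attributes modifies only the child, never the directory. */
static void model_setattr(struct model_inode *child, uint64_t size)
{
	child->size = size;
	child->change++;
}

/*
 * Returns the post-op change value recorded for the directory.
 * 'early' selects whether the snapshot is taken right after the
 * create or only after the setattr step.
 */
uint64_t model_create_file(struct model_inode *dir, bool early)
{
	struct model_inode child = { 0 };
	struct model_fh fh = { .dir = dir };

	/* The directory lock would be held from here ... */
	model_fill_pre_attrs(&fh);
	model_create(dir, &child);
	if (early)
		model_fill_post_attrs(&fh);
	model_setattr(&child, 4096);
	if (!early)
		model_fill_post_attrs(&fh);
	/* ... until here. */

	assert(fh.post_change == fh.pre_change + 1);
	return fh.post_change;
}
```

Calling model_create_file() with early set to true or false on directories
that start with the same change value yields the same result in this model,
which is the sense in which the placement is a matter of preference.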

Thanks,
NeilBrown


* Re: [PATCH 6/8] NFSD: use explicit lock/unlock for directory ops
  2022-07-15 18:22     ` Jeff Layton
@ 2022-07-17 23:59       ` NeilBrown
  0 siblings, 0 replies; 40+ messages in thread
From: NeilBrown @ 2022-07-17 23:59 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Chuck Lever, linux-nfs

On Sat, 16 Jul 2022, Jeff Layton wrote:
> On Fri, 2022-07-15 at 12:11 -0400, Jeff Layton wrote:
> 
> 
> [PATCH] SQUASH: nfsd: ensure we fill in pre-op-attrs in
>  nfsd4_create_file
> 
> In some cases, the pre-op attrs are left uninitialized. This also ensures
> that the post-op attrs are properly filled in all cases.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>

Thanks Jeff, but I think this is noisier than necessary.
The problem is that the d_really_is_positive() case doesn't actually change
the directory (obviously) but can still succeed - so pre/post attributes are
needed by NFSv4 even though they aren't really relevant.

I would rather use the same approach as in the !open->op_create branch
in do_open_lookup():
			fh_fill_pre_attrs(current_fh);
			fh_fill_post_attrs(current_fh);
with a comment explaining that as the directory is locked, and as it
isn't being changed, this makes sense.
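
A minimal userspace sketch of that idea follows. It is not kernel code: the
sketch_dir and sketch_fh types, the sketch_* helpers, and the boolean "locked"
flag are all hypothetical stand-ins, assumed here only to model the directory
lock and the saved attributes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; not the kernel's svc_fh or inode. */
struct sketch_dir {
	uint64_t change;	/* assumed stand-in for the change attribute */
	bool locked;		/* assumed stand-in for the inode lock */
};

struct sketch_fh {
	struct sketch_dir *dir;
	uint64_t pre, post;
	bool pre_saved, post_saved;
};

static void sketch_fill_pre_attrs(struct sketch_fh *fh)
{
	assert(fh->dir->locked);	/* snapshot is only stable under the lock */
	fh->pre = fh->dir->change;
	fh->pre_saved = true;
}

static void sketch_fill_post_attrs(struct sketch_fh *fh)
{
	assert(fh->dir->locked);
	fh->post = fh->dir->change;
	fh->post_saved = true;
}

/* Models a "create" request that finds the file already present. */
bool sketch_open_existing(struct sketch_fh *fh)
{
	fh->dir->locked = true;
	/*
	 * The directory is locked and is not being changed, so both
	 * snapshots can be taken back to back and must be identical.
	 */
	sketch_fill_pre_attrs(fh);
	sketch_fill_post_attrs(fh);
	fh->dir->locked = false;

	return fh->pre_saved && fh->post_saved && fh->pre == fh->post;
}
```

In this model the reply always carries both values, and they are equal
because nothing modifies the directory between the two snapshots.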

I'll fold that in.

Thanks,
NeilBrown




end of thread, other threads:[~2022-07-18  0:00 UTC | newest]

Thread overview: 40+ messages
2022-07-06  4:18 [PATCH 0/8] NFSD: clean up locking NeilBrown
2022-07-06  4:18 ` [PATCH 6/8] NFSD: use explicit lock/unlock for directory ops NeilBrown
2022-07-06 14:05   ` Jeff Layton
2022-07-06 16:29   ` Chuck Lever III
2022-07-15 16:11   ` Jeff Layton
2022-07-15 18:22     ` Jeff Layton
2022-07-17 23:59       ` NeilBrown
2022-07-17 23:43     ` NeilBrown
2022-07-06  4:18 ` [PATCH 8/8] NFSD: discard fh_locked flag and fh_lock/fh_unlock NeilBrown
2022-07-06 14:12   ` Jeff Layton
2022-07-06  4:18 ` [PATCH 7/8] NFSD: use (un)lock_inode instead of fh_(un)lock for file operations NeilBrown
2022-07-06 14:10   ` Jeff Layton
2022-07-06 16:30   ` Chuck Lever III
2022-07-07  1:33     ` NeilBrown
2022-07-06  4:18 ` [PATCH 2/8] NFSD: change nfsd_create() to unlock directory before returning NeilBrown
2022-07-06 13:24   ` Jeff Layton
2022-07-06 16:29   ` Chuck Lever III
2022-07-06  4:18 ` [PATCH 1/8] NFSD: drop rqstp arg to do_set_nfs4_acl() NeilBrown
2022-07-06 13:17   ` Jeff Layton
2022-07-06  4:18 ` [PATCH 4/8] NFSD: only call fh_unlock() once in nfsd_link() NeilBrown
2022-07-06 13:31   ` Jeff Layton
2022-07-06 16:29   ` Chuck Lever III
2022-07-06  4:18 ` [PATCH 3/8] NFSD: always drop directory lock in nfsd_unlink() NeilBrown
2022-07-06 13:30   ` Jeff Layton
2022-07-06  4:18 ` [PATCH 5/8] NFSD: reduce locking in nfsd_lookup() NeilBrown
2022-07-06 13:47   ` Jeff Layton
2022-07-07  1:26     ` NeilBrown
2022-07-06 16:29   ` Chuck Lever III
2022-07-07  1:29     ` NeilBrown
2022-07-06 16:29 ` [PATCH 0/8] NFSD: clean up locking Chuck Lever III
2022-07-12  2:33   ` NeilBrown
2022-07-12 14:17     ` Chuck Lever III
2022-07-13  4:32       ` NeilBrown
2022-07-13 14:15         ` Chuck Lever III
2022-07-13 19:13           ` Jeff Layton
2022-07-14 14:32             ` Chuck Lever III
2022-07-15  2:36           ` NeilBrown
2022-07-15 13:01             ` Jeff Layton
2022-07-15 13:53               ` Jeff Layton
2022-07-15 14:06             ` Chuck Lever III
