All of lore.kernel.org
 help / color / mirror / Atom feed
From: Trond Myklebust <trondmy@gmail.com>
To: "J. Bruce Fields" <bfields@redhat.com>
Cc: linux-nfs@vger.kernel.org
Subject: [PATCH v2 14/16] nfsd: close cached files prior to a REMOVE or RENAME that would replace target
Date: Sun, 18 Aug 2019 14:18:57 -0400	[thread overview]
Message-ID: <20190818181859.8458-15-trond.myklebust@hammerspace.com> (raw)
In-Reply-To: <20190818181859.8458-14-trond.myklebust@hammerspace.com>

From: Jeff Layton <jeff.layton@primarydata.com>

It's not uncommon for some workloads to do a bunch of I/O to a file and
delete it just afterward. If knfsd has a cached open file however, then
the file may still be open when the dentry is unlinked. If the
underlying filesystem is nfs, then that could trigger it to do a
sillyrename.

On a REMOVE or RENAME scan the nfsd_file cache for open files that
correspond to the inode, and proactively unhash and put their
references. This should prevent any delete-on-last-close activity from
occurring, solely due to knfsd's open file cache.

This must be done synchronously though so we use the variants that call
flush_delayed_fput. There are deadlock possibilities if you call
flush_delayed_fput while holding locks, however. In the case of
nfsd_rename, we don't even do the lookups of the dentries to be renamed
until we've locked for rename.

Once we've figured out what the target dentry is for a rename, check to
see whether there are cached open files associated with it. If there
are, then unwind all of the locking, close them all, and then reattempt
the rename.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfsd/vfs.c | 62 +++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 53 insertions(+), 9 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 8e2c8f36eba3..84e87772c2b8 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1590,6 +1590,26 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
 	goto out_unlock;
 }
 
+static void
+nfsd_close_cached_files(struct dentry *dentry)
+{
+	struct inode *inode = d_inode(dentry);
+
+	if (inode && S_ISREG(inode->i_mode))
+		nfsd_file_close_inode_sync(inode);
+}
+
+static bool
+nfsd_has_cached_files(struct dentry *dentry)
+{
+	bool		ret = false;
+	struct inode *inode = d_inode(dentry);
+
+	if (inode && S_ISREG(inode->i_mode))
+		ret = nfsd_file_is_cached(inode);
+	return ret;
+}
+
 /*
  * Rename a file
  * N.B. After this call _both_ ffhp and tfhp need an fh_put
@@ -1602,6 +1622,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 	struct inode	*fdir, *tdir;
 	__be32		err;
 	int		host_err;
+	bool		has_cached = false;
 
 	err = fh_verify(rqstp, ffhp, S_IFDIR, NFSD_MAY_REMOVE);
 	if (err)
@@ -1620,6 +1641,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 	if (!flen || isdotent(fname, flen) || !tlen || isdotent(tname, tlen))
 		goto out;
 
+retry:
 	host_err = fh_want_write(ffhp);
 	if (host_err) {
 		err = nfserrno(host_err);
@@ -1659,11 +1681,16 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 	if (ffhp->fh_export->ex_path.dentry != tfhp->fh_export->ex_path.dentry)
 		goto out_dput_new;
 
-	host_err = vfs_rename(fdir, odentry, tdir, ndentry, NULL, 0);
-	if (!host_err) {
-		host_err = commit_metadata(tfhp);
-		if (!host_err)
-			host_err = commit_metadata(ffhp);
+	if (nfsd_has_cached_files(ndentry)) {
+		has_cached = true;
+		goto out_dput_old;
+	} else {
+		host_err = vfs_rename(fdir, odentry, tdir, ndentry, NULL, 0);
+		if (!host_err) {
+			host_err = commit_metadata(tfhp);
+			if (!host_err)
+				host_err = commit_metadata(ffhp);
+		}
 	}
  out_dput_new:
 	dput(ndentry);
@@ -1676,12 +1703,26 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 	 * as that would do the wrong thing if the two directories
 	 * were the same, so again we do it by hand.
 	 */
-	fill_post_wcc(ffhp);
-	fill_post_wcc(tfhp);
+	if (!has_cached) {
+		fill_post_wcc(ffhp);
+		fill_post_wcc(tfhp);
+	}
 	unlock_rename(tdentry, fdentry);
 	ffhp->fh_locked = tfhp->fh_locked = false;
 	fh_drop_write(ffhp);
 
+	/*
+	 * If the target dentry has cached open files, then we need to try to
+	 * close them prior to doing the rename. Flushing delayed fput
+	 * shouldn't be done with locks held however, so we delay it until this
+	 * point and then reattempt the whole shebang.
+	 */
+	if (has_cached) {
+		has_cached = false;
+		nfsd_close_cached_files(ndentry);
+		dput(ndentry);
+		goto retry;
+	}
 out:
 	return err;
 }
@@ -1728,10 +1769,13 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
 	if (!type)
 		type = d_inode(rdentry)->i_mode & S_IFMT;
 
-	if (type != S_IFDIR)
+	if (type != S_IFDIR) {
+		nfsd_close_cached_files(rdentry);
 		host_err = vfs_unlink(dirp, rdentry, NULL);
-	else
+	} else {
 		host_err = vfs_rmdir(dirp, rdentry);
+	}
+
 	if (!host_err)
 		host_err = commit_metadata(fhp);
 	dput(rdentry);
-- 
2.21.0


  reply	other threads:[~2019-08-18 18:21 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-18 18:18 [PATCH v2 00/16] Cache open file descriptors in knfsd Trond Myklebust
2019-08-18 18:18 ` [PATCH v2 01/16] sunrpc: add a new cache_detail operation for when a cache is flushed Trond Myklebust
2019-08-18 18:18   ` [PATCH v2 02/16] locks: create a new notifier chain for lease attempts Trond Myklebust
2019-08-18 18:18     ` [PATCH v2 03/16] notify: export symbols for use by the knfsd file cache Trond Myklebust
2019-08-18 18:18       ` [PATCH v2 04/16] vfs: Export flush_delayed_fput for use by knfsd Trond Myklebust
2019-08-18 18:18         ` [PATCH v2 05/16] nfsd: add a new struct file caching facility to nfsd Trond Myklebust
2019-08-18 18:18           ` [PATCH v2 06/16] nfsd: hook up nfsd_write to the new nfsd_file cache Trond Myklebust
2019-08-18 18:18             ` [PATCH v2 07/16] nfsd: hook up nfsd_read to the " Trond Myklebust
2019-08-18 18:18               ` [PATCH v2 08/16] nfsd: hook nfsd_commit up " Trond Myklebust
2019-08-18 18:18                 ` [PATCH v2 09/16] nfsd: convert nfs4_file->fi_fds array to use nfsd_files Trond Myklebust
2019-08-18 18:18                   ` [PATCH v2 10/16] nfsd: convert fi_deleg_file and ls_file fields to nfsd_file Trond Myklebust
2019-08-18 18:18                     ` [PATCH v2 11/16] nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache Trond Myklebust
2019-08-18 18:18                       ` [PATCH v2 12/16] nfsd: have nfsd_test_lock use " Trond Myklebust
2019-08-18 18:18                         ` [PATCH v2 13/16] nfsd: rip out the raparms cache Trond Myklebust
2019-08-18 18:18                           ` Trond Myklebust [this message]
2019-08-18 18:18                             ` [PATCH v2 15/16] nfsd: Fix up some unused variable warnings Trond Myklebust
2019-08-18 18:18                               ` [PATCH v2 16/16] nfsd: Fix the documentation for svcxdr_tmpalloc() Trond Myklebust
2019-08-19 15:01 ` [PATCH v2 00/16] Cache open file descriptors in knfsd J. Bruce Fields
2019-08-19 15:11   ` J. Bruce Fields

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190818181859.8458-15-trond.myklebust@hammerspace.com \
    --to=trondmy@gmail.com \
    --cc=bfields@redhat.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.