From: Miklos Szeredi <miklos@szeredi.hu>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: fuse-devel <fuse-devel@lists.sourceforge.net>,
	virtio-fs-list <virtio-fs@redhat.com>,
	Vivek Goyal <vgoyal@redhat.com>
Subject: Re: [Virtio-fs] [fuse-devel] 'FORGET' ordering semantics (vs unlink & NFS)
Date: Tue, 5 Jan 2021 12:24:48 +0100
Message-ID: <CAJfpegvm0ZYojTeakkXgJ+Q1=k1UFiu=p3VeOST29PXjZGDreA@mail.gmail.com>
In-Reply-To: <20210104185655.GN2972@work-vm>

[-- Attachment #1: Type: text/plain, Size: 2308 bytes --]

On Mon, Jan 4, 2021 at 7:57 PM Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
>
> * Vivek Goyal (vgoyal@redhat.com) wrote:
> > On Mon, Jan 04, 2021 at 04:00:13PM +0000, Dr. David Alan Gilbert wrote:
> > > Hi,
> > >   On virtio-fs we're hitting a problem with NFS, where
> > > unlinking a file in a directory and then rmdir'ing that
> > > directory fails complaining about the directory not being empty.
> > >
> > > The problem here is that if a file has an open fd, NFS doesn't
> > > actually delete the file on unlink, it just renames it to
> > > a hidden file (e.g. .nfs*******).  That open file is there because
> > > the 'FORGET' hasn't completed yet by the time the rmdir is issued.
> > >
> > > Question:
> > >   a) In the FUSE protocol, are requests assumed to complete in order?
> > > I.e. for the sequence unlink, forget, rmdir: is it required that
> > > 'forget' completes before the rmdir is processed?
> > >      (In virtiofs we've been processing requests, in parallel, and
> > > have sent forgets down a separate queue to keep them out of the way).
> > >
> > >   b) 'forget' doesn't send a reply - so the kernel can't wait for the
> > > client to have finished it;  do we need a synchronous forget here?
> >
> > Even if we introduce a synchronous forget, will that really fix the
> > issue? For example, this could also happen if a file has been unlinked
> > but is still open and the directory is being removed.
> >
> > fd = open(foo/bar.txt)
> > unlink foo/bar.txt
> > rmdir foo
> > close(fd).
> >
> > In this case, the final forget should go after the fd has been closed.
> > It's not a forget race.
> >
> > I wrote a test case for this and it works on regular file systems.
> >
> > https://github.com/rhvgoyal/misc/blob/master/virtiofs-tests/rmdir.c
> >
> > I suspect it will fail on NFS because I am assuming that the temporary
> > file will be there until the final close(fd) happens. If that's the
> > case, this is an NFS-specific issue because its behavior is different
> > from other file systems.
>
> That's true; but that's NFS just being NFS; in our case we're keeping
> an fd open even though the guest has been smart enough not to; so we're
> causing the NFS oddity when it wouldn't normally happen.
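
For reference, the open / unlink / rmdir / close sequence quoted above can
be sketched as a small C helper. This is only an illustration under stated
assumptions, not the rmdir.c test linked above: the function name, the
directory naming scheme and the cleanup logic are all hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the rmdir.c test linked above): under 'base',
 * create a directory and a file, then run open / unlink / rmdir / close.
 * Returns 0 if rmdir succeeded while the fd was still open, otherwise the
 * errno of the first failing step.
 */
int rmdir_with_open_unlinked_file(const char *base)
{
	char dir[4096], file[4200];
	int fd, n, err = 0;

	n = snprintf(dir, sizeof(dir), "%s/rmdir-test-%ld", base, (long)getpid());
	if (n < 0 || (size_t)n >= sizeof(dir))
		return ENAMETOOLONG;
	snprintf(file, sizeof(file), "%s/bar.txt", dir);

	if (mkdir(dir, 0700) < 0)
		return errno;

	fd = open(file, O_CREAT | O_RDWR, 0600);
	if (fd < 0) {
		err = errno;
		rmdir(dir);
		return err;
	}

	if (unlink(file) < 0)
		err = errno;
	else if (rmdir(dir) < 0)
		err = errno;	/* e.g. ENOTEMPTY if a .nfs* file was left behind */

	close(fd);
	if (err) {
		/* Best-effort cleanup once the fd is closed. */
		unlink(file);
		rmdir(dir);
	}
	return err;
}
```

Calling rmdir_with_open_unlinked_file() with a directory on a local
filesystem is expected to return 0; on an NFS mount the rmdir step would
presumably fail for as long as the fd is held open.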

I can't think of anything better than a synchronous forget.  A patch is
attached; it has only been compile-tested.

Thanks,
Miklos

[-- Attachment #2: fuse-sync-forget.patch --]
[-- Type: text/x-patch, Size: 3319 bytes --]

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 78f9f209078c..daa4e669441d 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -373,6 +373,26 @@ static struct vfsmount *fuse_dentry_automount(struct path *path)
 	return ERR_PTR(err);
 }
 
+static void fuse_dentry_iput(struct dentry *dentry, struct inode *inode)
+{
+	if (!__lockref_is_dead(&dentry->d_lockref)) {
+		/*
+		 * This is an unlink/rmdir and removing the last ref to the
+		 * dentry.  Use synchronous FORGET in case filesystem requests
+		 * it.
+		 *
+		 * FIXME: This is racy!  Two or more instances of
+		 * fuse_dentry_iput() could be running concurrently (unlink of
+		 * several aliases in different directories).
+		 */
+		set_bit(FUSE_I_SYNC_FORGET, &get_fuse_inode(inode)->state);
+		iput(inode);
+		clear_bit(FUSE_I_SYNC_FORGET, &get_fuse_inode(inode)->state);
+	} else {
+		iput(inode);
+	}
+}
+
 const struct dentry_operations fuse_dentry_operations = {
 	.d_revalidate	= fuse_dentry_revalidate,
 	.d_delete	= fuse_dentry_delete,
@@ -381,6 +401,7 @@ const struct dentry_operations fuse_dentry_operations = {
 	.d_release	= fuse_dentry_release,
 #endif
 	.d_automount	= fuse_dentry_automount,
+	.d_iput		= fuse_dentry_iput,
 };
 
 const struct dentry_operations fuse_root_dentry_operations = {
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 7c4b8cb93f9f..0820b7a63ca7 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -174,6 +174,8 @@ enum {
 	FUSE_I_SIZE_UNSTABLE,
 	/* Bad inode */
 	FUSE_I_BAD,
+	/* Synchronous forget requested */
+	FUSE_I_SYNC_FORGET,
 };
 
 struct fuse_conn;
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index b0e18b470e91..a49ff30d1ecc 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -115,6 +115,26 @@ static void fuse_free_inode(struct inode *inode)
 	kmem_cache_free(fuse_inode_cachep, fi);
 }
 
+static void fuse_sync_forget(struct inode *inode)
+{
+	struct fuse_mount *fm = get_fuse_mount(inode);
+	struct fuse_inode *fi = get_fuse_inode(inode);
+	struct fuse_forget_in inarg;
+	FUSE_ARGS(args);
+
+	memset(&inarg, 0, sizeof(inarg));
+	inarg.nlookup = fi->nlookup;
+	args.opcode = FUSE_SYNC_FORGET;
+	args.nodeid = fi->nodeid;
+	args.in_numargs = 1;
+	args.in_args[0].size = sizeof(inarg);
+	args.in_args[0].value = &inarg;
+	args.force = true;
+
+	fuse_simple_request(fm, &args);
+	/* ignore errors */
+}
+
 static void fuse_evict_inode(struct inode *inode)
 {
 	struct fuse_inode *fi = get_fuse_inode(inode);
@@ -127,9 +147,13 @@ static void fuse_evict_inode(struct inode *inode)
 		if (FUSE_IS_DAX(inode))
 			fuse_dax_inode_cleanup(inode);
 		if (fi->nlookup) {
-			fuse_queue_forget(fc, fi->forget, fi->nodeid,
-					  fi->nlookup);
-			fi->forget = NULL;
+			if (test_bit(FUSE_I_SYNC_FORGET, &fi->state)) {
+				fuse_sync_forget(inode);
+			} else {
+				fuse_queue_forget(fc, fi->forget, fi->nodeid,
+						  fi->nlookup);
+				fi->forget = NULL;
+			}
 		}
 	}
 	if (S_ISREG(inode->i_mode) && !fuse_is_bad(inode)) {
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 98ca64d1beb6..cfcf95cfde76 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -499,6 +499,7 @@ enum fuse_opcode {
 	FUSE_COPY_FILE_RANGE	= 47,
 	FUSE_SETUPMAPPING	= 48,
 	FUSE_REMOVEMAPPING	= 49,
+	FUSE_SYNC_FORGET	= 50,
 
 	/* CUSE specific operations */
 	CUSE_INIT		= 4096,
